Introducing Terran: A human perception library

Computer vision has seen tremendous change in the last decade. Along this journey, the nature of the techniques used has shifted from fixed-pipeline algorithms based mostly on heuristics and little data to resource-intensive, general-purpose models that require both heavy computing power and a lot of data. This shift, while unlocking new capabilities, presents some drawbacks.

Most of the work done at the research level is fragmented: researchers, by the very nature of good research, need to focus on one aspect of a problem at a time when planning their experiments. And while there are papers that unify techniques, they are few and far between, since such work is hard to evaluate and the bar for what counts as a novel contribution is stringent. Most of these works come from non-academic initiatives such as fast.ai.

The objective of research is to present and validate a novel approach to a problem. That being the end goal, the focus is on the results and the published paper, not on the code used to run the experiments. Even when the code is made publicly available, more often than not it tends to be hard to extend, optimize, or integrate with other pieces of code.

Commercially available APIs may solve this issue since they are usually easy to use and integrate. However, they come with another set of problems.

First and foremost, the code is not auditable. Since these are commercial solutions, the code is not open source, so you cannot be certain of what it is doing. You may trust the company you are working with, but that may not be enough for other stakeholders, and it is always nice to be able to check what is being done. Also, since they are usually cloud solutions, you have to share your data with them, which is a no-go for some companies when the data is sensitive.

There are some great services from big-name companies, which may work fine for some sets of problems. Nevertheless, their best models remain behind closed doors, and the techniques they use may be unknown. While you can easily use Amazon Rekognition or Google Vision AI, you can't know what's under the hood. This may be fine for some scenarios, but in other cases you need a sense of where the solution fails, or which biases it has.

We believe that there's a place for an open-source initiative focusing on addressing the above points.

That's where Terran comes in.

An example of Face Detection made with Terran

Terran is an open-source human perception library. We focus on providing open, auditable, and performant alternatives to closed approaches: solutions that can run on your own computer or device and that adapt to the environment they're in. But most importantly, code that you can test (without piling up costs), understand, and modify to your needs.

We strive for code that is easy to integrate and understand. While it is open source, it is not targeted specifically at researchers: we want to make computer vision accessible to anyone who can code, and enable them to create innovative applications. To achieve this, Terran was built with these guiding principles in mind:

  • A self-explanatory API that tells you what you are doing.
  • No intricate configuration files.
  • Simple I/O operations for any kind of video or image.

The focus is on being a production-grade library, not a toy project. Only models whose performance can be assessed, and whose computational and memory requirements are reasonable and known, will be included.
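To give a sense of what that API looks like, here is a minimal sketch built from the same functions the walkthrough below uses (the image path is hypothetical, and we assume vis_faces accepts the list of detections returned by face_detection):

from terran.face import face_detection
from terran.io import open_image
from terran.vis import display_image, vis_faces

# Open an image and detect every face in it.
image = open_image('selfie.jpg')
faces = face_detection(image)

# Draw the detections over the image and show it on screen.
display_image(vis_faces(image, faces))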

An example of Pose Estimation using Terran

Let's see a small example of what we can build with it.

We could build our own image album creator that groups photos of the same people, a feature that services such as Google Photos and Apple Photos provide. For the sake of keeping it simple and focusing on Terran, let's tackle the core of the problem: finding the photos in which a given person is present.

We will start with some imports:

import click

from pathlib import Path
from scipy.spatial.distance import cosine

from terran.face import extract_features, face_detection
from terran.io import open_image, resolve_images
from terran.vis import display_image, vis_faces

We will use click to create a small CLI tool (we love click; highly recommended), pathlib to handle paths, and a distance measure from scipy. Finally, we import some functions from Terran's modules:

  1. from terran.face, core functions to detect faces and extract features from them
  2. from terran.io, utility functions to resolve paths and open images
  3. and from terran.vis, some visualization functions, which are extremely important and useful when working with images!

Now we will create a function that deals with the problem of finding the photos where a person is present. You'll see how easy it is with Terran!

@click.command(name='match-dir')
@click.argument('reference')
@click.argument('image-dir')
@click.option('--batch-size', type=int, default=1)
@click.option('--threshold', type=float, default=0.5)
@click.option('--display', is_flag=True, default=False)
def match_directory(reference, image_dir, batch_size, threshold, display):
    # Open the reference image and make sure it contains exactly one face.
    reference = open_image(reference)
    faces_in_reference = face_detection(reference)

    if len(faces_in_reference) != 1:
        click.echo('Reference image must have exactly one face.')
        return

    # Extract the features that represent the person we are looking for.
    ref_feature = extract_features(reference, faces_in_reference[0])

Our function takes two arguments, a reference image and a directory of photos to scan, plus a few options. Our first step is to extract, from the reference image, the set of features that represents the person we are looking for.

Then, we get the paths to all of the images that we will be scanning:

    paths = resolve_images(
        Path(image_dir).expanduser(),
        batch_size=batch_size
    )

This Terran function yields the image paths in batches, so you can process several images at once (if needed).
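As a rough illustration of the shape of that output (the directory name here is made up), iterating over it yields one list of paths per batch:

# With, say, five images in the directory and batch_size=2, we would
# expect three batches, holding two, two, and one path respectively.
for batch_paths in resolve_images(Path('~/photos').expanduser(), batch_size=2):
    print(len(batch_paths), batch_paths)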

So the only thing remaining is, for each image: open it, detect the faces, extract their features, and compare them with our reference features. Easy enough with Terran!

    for batch_paths in paths:
        # Open them
        batch_images = list(map(open_image, batch_paths))

        # Detect the faces
        faces_per_image = face_detection(batch_images)

        # Get the features
        features_per_image = extract_features(batch_images, faces_per_image)

        for path, image, faces, features in zip(
            batch_paths, batch_images, faces_per_image, features_per_image
        ):
            for face, feature in zip(faces, features):
                # Compare each feature with our reference one. Note that
                # `cosine` returns a distance, so lower means a closer match.
                confidence = cosine(ref_feature, feature)

                # If it's a match, print it and optionally display it!
                if confidence < threshold:
                    click.echo(f'{path}, confidence = {confidence:.2f}')
                    if display:
                        display_image(vis_faces(image, face))
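A quick note on the threshold: scipy's cosine computes the cosine distance between two vectors, that is, one minus their cosine similarity. It is 0.0 for vectors pointing in the same direction and grows as they diverge, which is why a lower value means a closer match above. A quick sanity check, with made-up vectors rather than actual face features:

import numpy as np
from scipy.spatial.distance import cosine

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # Same direction as `a`, different magnitude.
c = np.array([0.0, 1.0])  # Orthogonal to `a`.

print(cosine(a, b))  # 0.0: same direction, a perfect match.
print(cosine(a, c))  # 1.0: orthogonal, no similarity.

If you see too many false positives, lower --threshold; if genuine matches are missed, raise it.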

All the code together:

import click

from pathlib import Path
from scipy.spatial.distance import cosine

from terran.face import extract_features, face_detection
from terran.io import open_image, resolve_images
from terran.vis import display_image, vis_faces

@click.command(name='match-dir')
@click.argument('reference')
@click.argument('image-dir')
@click.option('--batch-size', type=int, default=1)
@click.option('--threshold', type=float, default=0.5)
@click.option('--display', is_flag=True, default=False)
def match_directory(reference, image_dir, batch_size, threshold, display):
    reference = open_image(reference)
    faces_in_reference = face_detection(reference)
    if len(faces_in_reference) != 1:
        click.echo('Reference image must have exactly one face.')
        return
    ref_feature = extract_features(reference, faces_in_reference[0])

    paths = resolve_images(
        Path(image_dir).expanduser(),
        batch_size=batch_size
    )
    for batch_paths in paths:
        batch_images = list(map(open_image, batch_paths))
        faces_per_image = face_detection(batch_images)
        features_per_image = extract_features(batch_images, faces_per_image)

        for path, image, faces, features in zip(
            batch_paths, batch_images, faces_per_image, features_per_image
        ):
            for face, feature in zip(faces, features):
                confidence = cosine(ref_feature, feature)
                if confidence < threshold:
                    click.echo(f'{path}, confidence = {confidence:.2f}')
                    if display:
                        display_image(vis_faces(image, face))

if __name__ == '__main__':
    match_directory()
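To try it out, save the script (the filename below is arbitrary) and point it at a reference photo and a directory of images:

python match_dir.py reference.jpg ~/Pictures/vacation --batch-size 4 --display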

With just a few lines of code, you can easily check which images you are present in! From your own computer, without uploading your photos anywhere, and with parameters you can tweak to fit your case (take a look at --threshold). You can also find this example on our GitHub, along with the rest of Terran, to see how it all works.

Right now, Terran has models for Face Detection, Face Recognition, and Pose Estimation. We think several applications can be built on top of these features, such as controlling your computer with body gestures, or changing what gets displayed depending on whether a face is present. You could even use Terran as an interface to interact with your computer using your body, among many other ideas.

In the future, we plan to:

  • Add more lightweight models for these same problems, in case you need to run on a small device.
  • Add models that tackle other problems, e.g. segmentation.
  • Add the functionality to train or fine-tune the existing models.

While this is our current roadmap, feel free to create an issue on GitHub to tell us if you'd like us to work on anything in particular.

Please feel free to use it and tell us what you think! And of course, if you build something using it, share it with us!
