Building an Image Classification Pipeline Using Serverless Architecture

Jason Fox

Background

The International Committee of the Red Cross (ICRC) Trace-the-Face program uses photos to match up missing family members who have been separated due to migration and conflict. We have been working with the ICRC to update and extend Trace-the-Face to perform automatic face detection and matching using machine learning, via Microsoft Cognitive Services Face API.

As we discussed how Trace-the-Face could use computer vision, the ICRC grew increasingly intrigued and compiled a long list of challenges that the technology could potentially address. Through a series of workshops and discussions, we designed a centralized image classification pipeline that could be integrated across the organization to build up a rich, tagged image set over time – including faces found in images from any source – and address many of those computer vision challenges.

The Solution

In February, ICRC and Microsoft started a collaboration on Imaginem, an image classification pipeline built on top of Azure Functions. If you’re not familiar with Azure Functions, it is a framework for building serverless microservices that can easily be deployed to the cloud. Azure Functions is built on top of Azure App Service, which means it is easy to manage and comes with built-in scaling. Visual Studio supports developing Azure Functions, but you can just as well use your favorite text editor to write either C# or JavaScript files.

The choice to go serverless gave us the flexibility to add or remove functions as the pipeline grows. The initial focus was general image classification plus face detection and matching. We also plan to incorporate custom image classification and object detection built with ML frameworks like CNTK and TensorFlow.

Our architecture uses message queuing to move images along the pipeline. Each message contains the information needed to move the image to the next step, including a link to the image blob and the classification properties collected so far.
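
To make that concrete, a queue message in a pipeline like this might look like the JSON below. The field names image_url, pipeline and results are illustrative rather than the exact Imaginem contract; job_definition.image_parameters is the property used by the code further down.

{
    "image_url": "https://<storage account>.blob.core.windows.net/images/photo1.jpg",
    "job_definition": {
        "pipeline": "generalclassification,ocr,facedetection,facecrop,faceprint,facematch,pipelineoutput",
        "image_parameters": {}
    },
    "results": {
        "generalclassification": { "tags": [ "person", "outdoor" ] }
    }
}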

The Code

Each step in the pipeline is an Azure Function; building the pipeline involved creating each function and adding its input queue name to the pipeline definition.

You can set these steps pre-deployment in the ARM deployment parameters:

    "pipelineDefinition": {
      "value": "generalclassification,ocr,facedetection,facecrop,faceprint,facematch,pipelineoutput"
    },

If you want to change the steps post-deployment, you can simply edit the value in your Function App’s application settings (which are exposed to the functions as environment variables).

Below is a simple function that breaks down a message, retrieves the image from Azure Blob Storage and then sends it to the Microsoft Cognitive Services Computer Vision API for image tagging. This example is written in C#, but you could also write it in JavaScript.

#load "..\Common\FunctionHelper.csx"
#load "..\Common\ComputerVisionFunctions.csx"

using System.Net.Http.Headers;
using System.Text;
using System.Net.Http;
using System.Web;
using System.Runtime;
using Newtonsoft.Json;

// Name of this pipeline step.
private const string ClassifierName = "generalclassification";

// Triggered when a new message lands on this step's input queue (see function.json).
public static void Run(string inputMsg, TraceWriter log)
{
    log.Info(inputMsg);

    // PipelineHelper.Process enforces the message contract, runs the classifier below
    // and forwards the enriched message to the next queue in the pipeline.
    PipelineHelper.Process(GeneralClassificationFunction, ClassifierName, inputMsg, log);
}

public static dynamic GeneralClassificationFunction(dynamic inputJson, string imageUrl, TraceWriter log)
{
    // Per-image parameters from the job definition (not needed for general classification).
    var parameters = inputJson.job_definition.image_parameters;

    // Tag the image with the Computer Vision API and return the parsed JSON result.
    var response = ComputerVisionFunctions.AnalyzeImageAsync(imageUrl, log).Result;
    return JsonConvert.DeserializeObject(response);
}

The code calls helper methods that let the function take part in the pipeline in a well-defined way: they enforce the message contract and then advance the message to the next queue in the pipeline definition. These methods are defined in the GitHub repo if you want to see exactly what they do in the context of the pipeline.
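
To make the flow easier to follow, here is a simplified sketch of what a helper like PipelineHelper.Process could do. It is not the implementation from the repo, and the message fields image_url, results and job_definition.pipeline are the same illustrative names used in the sample message earlier.

#r "Microsoft.WindowsAzure.Storage"

using System;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json.Linq;

public static class PipelineHelper
{
    public static void Process(
        Func<dynamic, string, TraceWriter, dynamic> classifier,
        string classifierName,
        string inputMsg,
        TraceWriter log)
    {
        var message = JObject.Parse(inputMsg);
        string imageUrl = (string)message["image_url"];

        // Run this step's classifier and record its result under the classifier name.
        if (message["results"] == null) message["results"] = new JObject();
        message["results"][classifierName] = (JToken)classifier(message, imageUrl, log);

        // Work out the next step from the comma-separated pipeline definition.
        string[] steps = ((string)message["job_definition"]["pipeline"]).Split(',');
        int current = Array.IndexOf(steps, classifierName);
        if (current < 0 || current == steps.Length - 1)
        {
            log.Info("End of pipeline reached.");
            return;
        }

        // Forward the enriched message to the next step's input queue.
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"));
        var queue = account.CreateCloudQueueClient().GetQueueReference(steps[current + 1]);
        queue.CreateIfNotExists();
        queue.AddMessage(new CloudQueueMessage(message.ToString()));
    }
}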

The first parameter of the function’s Run method is defined as string inputMsg and is bound to an Azure Functions QueueTrigger. The framework watches the Azure Storage Queue named in the function.json file and triggers the function whenever a new message appears.

Here is a sample of a function definition for a QueueTrigger from the documentation:

{
    "type": "queueTrigger",
    "direction": "in",
    "name": "<The name used to identify the trigger data in your code>",
    "queueName": "<Name of queue to poll>",
    "connection":"<Name of app setting - see below>"
}

To add a custom step to the pipeline (a sketch of such a step follows the list):

  1. Create a new Azure Storage Queue in the Storage Account attached to your Function App
  2. Add a new function to the Function App with a QueueTrigger
  3. Set the name field to “inputMsg” to match our Run method
  4. Set the queueName field to the name of the input queue you created in step 1
  5. Set the connection field to the name of the app setting (defined in appsettings.json) that holds the Storage Account connection string
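
For example, a hypothetical custom step named customstep (the queue name and returned result here are placeholders) would get a binding like this in its function.json:

{
    "bindings": [
        {
            "type": "queueTrigger",
            "direction": "in",
            "name": "inputMsg",
            "queueName": "customstep",
            "connection": "AzureWebJobsStorage"
        }
    ],
    "disabled": false
}

Its run.csx can then follow the same pattern as the general classification function above:

#load "..\Common\FunctionHelper.csx"

using Newtonsoft.Json;

private const string ClassifierName = "customstep";

public static void Run(string inputMsg, TraceWriter log)
{
    log.Info(inputMsg);
    PipelineHelper.Process(CustomStepFunction, ClassifierName, inputMsg, log);
}

public static dynamic CustomStepFunction(dynamic inputJson, string imageUrl, TraceWriter log)
{
    // Do whatever work the step needs here; return a JSON-serializable result
    // that will travel with the message through the rest of the pipeline.
    return JsonConvert.DeserializeObject("{ \"processed\": true }");
}

Remember to add the new step’s name to the pipelineDefinition setting so the pipeline knows where the step fits.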

Keep in mind that a custom step can perform any action on an image at its point in the pipeline. For example, we’ve had partners use a step to combine multiple images into a grid for submission to the Cognitive Services Face API in order to save on API calls.
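
As a rough illustration of that idea (this is not code from the Imaginem repo), a step could tile images into a grid with System.Drawing before calling the Face API, and later map the returned face rectangles back to the source images using the cell positions:

#r "System.Drawing"

using System;
using System.Drawing;

public static Bitmap CombineIntoGrid(Image[] images, int columns, int cellSize)
{
    // Lay the images out on a single bitmap, one fixed-size cell per image.
    int rows = (int)Math.Ceiling(images.Length / (double)columns);
    var grid = new Bitmap(columns * cellSize, rows * cellSize);

    using (var g = Graphics.FromImage(grid))
    {
        g.Clear(Color.White);
        for (int i = 0; i < images.Length; i++)
        {
            int x = (i % columns) * cellSize;
            int y = (i / columns) * cellSize;
            // Each image is scaled into its cell; keeping track of the cell position
            // lets you translate the API's face rectangles back to the original image.
            g.DrawImage(images[i], new Rectangle(x, y, cellSize, cellSize));
        }
    }
    return grid;
}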

Conclusion

Through our collaboration with ICRC, we created a compact, reusable framework for processing images using message queues. It detects faces, searches for similar or matching faces, identifies objects, and extracts text from images. The results can be easily extracted from the JSON output or retrieved from the SQL database.

Imaginem could be reused in many scenarios that require facial recognition and matching.  We welcome your feedback and PRs!

Note: This code was built using preview tools for Azure Functions in Visual Studio 2015. The Azure Functions team recommends using the Visual Studio 2017 tooling.

