Accelerate web forms using Azure Form Recognizer client library

dinoesposito

This post was written by guest blogger and Developer Technologies MVP Dino Esposito.

One way in which artificial intelligence (AI) provides concrete help is in the automation of repetitive chores. More generally, AI helps reduce the number of steps required to accomplish a task. Put another way, one of the most relevant aids we can expect from AI in everyday operations is the simplification of workflows. Human operators can achieve the same results with less effort and can avoid error-prone actions.

In this article, I aim to illustrate just one of these scenarios: completing the passport-specific fields of a registration form in a B2C portal. In similar situations, it’s fairly common today that the backend service requires both:

  • The actual photo of the passport—typically the first page with picture and details.
  • Individual items of textual information, such as first and last name, birth date, country of residence, expiry date, and number.

Users are often presented with a webpage containing a file upload component and text fields for the data, and they're expected to complete both manually. Why not upload just the file and let the application fill in the rest?

To fill in those fields automatically, you don't need to host a dedicated OCR service yourself. The Azure Form Recognizer client library makes this scenario possible.

The Form Recognizer Cognitive Service

Azure Form Recognizer uses pre-trained machine learning models to extract several types of information from the documents you submit. Depending on the class of the document, the returned information can take the form of key-value pairs, tables, or plain text. Any returned information is packaged in a JSON container.

You need to know which API to call, so you must know which type of document you’re submitting. If you call an API that doesn't match the document type, the document might still be processed, but the result will likely be of low quality. Support for document classification is on the roadmap, though. In the future, you'll be able to send a document without classifying it beforehand. The Azure model will figure out the type, whether a passport or an invoice, and return the appropriate fields.

The returned information is often structured enough to be immediately usable as is. In general, Form Recognizer is a powerful tool to automate document data processing in software applications. Usage of the service isn’t limited to .NET applications.

To start using Form Recognizer from within an application, you first need to create a specific Azure resource. From the Azure home page, you select the Form Recognizer type and confirm to create it. See Figure 1.

Figure 1: Creating a Form Recognizer resource in Azure

Next, you select an existing subscription, name the new Form Recognizer resource, and indicate the Azure resource group that will contain it. You also need to specify the Azure region of choice and opt for a pricing tier. A free plan is possible too. The chosen name will determine the actual URL you’ll be using from within any client application. From the portal, you also get the API key that each and every call requires.

Set up a sample ASP.NET Core client application

To test the capabilities of the Form Recognizer service, we need a host application, such as an ASP.NET Core application. In this article, I’ll use the starter template you can freely download from https://ybq-dev.azurewebsites.net. At the end of the day, it’s a plain ASP.NET Core MVC application with one key controller action and one crucial Razor view. The user interface of the sample page I’ll discuss is shown in Figure 2.

Figure 2: The sample passport upload page

The sample page contains a file upload section and a classic array of form text input fields to collect name, country, passport number, and relevant dates. The idea is that the user selects the upload button, selects a photo from the local computer, and uploads it to the ASP.NET Core controller action. As you can see in Figure 3, the photo has been selected but the other text fields are still empty. They will be filled when the postback action, which takes place via AJAX, returns.

Figure 3: Ready to upload a passport photo

The host page has the following markup layout:

<form id="passport-form" asp-antiforgery="true" method="post">
    <hidden id="signalrId" />
    <div class="row">
        <div class="col-12 col-md-4 text-center">
            <fileupload id="passport"
                        accept=".jpeg,.jpg,.png"
                        placeholder="<i class='fal fa-passport'></i>" />
            <div class="mt-3">
                <button disabled id="passport-trigger" 
                        type="button" 
                        class="btn btn-primary">UPLOAD</button>
            </div>
        </div>   
        <div>
            <!-- Form fields -->
        </div>     
    </div>
</form>

The hidden and fileupload custom tags are shortcuts for the markup and JavaScript necessary to model a canonical hidden field and a file input tag that’s smoother to use for the end user. Both tags are coded as ASP.NET Core Tag Helpers, fully integrated in the starter template used for the demo. The hidden field is related to the setup of the SignalR connection that will monitor the server-side interaction with the Azure Form Recognizer resource.

The following code represents the ASP.NET Core controller action method that receives the HTTP POST from the HTML form:

public class TaskController : Controller
{
    private readonly IHubContext<DefaultMonitorHub> _srContext;
    private readonly TaskService _task;

    public TaskController(IHubContext<DefaultMonitorHub> srContext) 
    {
        _srContext = srContext;
        _task = new TaskService();
    }

    [HttpPost]
    [ActionName("passport")]
    public async Task<IActionResult> ProcessPassportUpload(
        IFormFile passport,
        string signalrId)
    {
        // Set up SignalR connection to pass over to the method that
        // orchestrates the interaction with the recognizer 
        var signalR = new ConnectionDescriptor<DefaultMonitorHub>(
            signalrId,
            _srContext);

        // Invoke the Form Recognizer client to process the photo
        // ...
    }
}

The file uploaded from the client reaches the controller action method wrapped up in an IFormFile object. From here, you open a file stream and trigger the form recognizer. The Form Recognizer client will upload the photo to the Azure service, have it processed, and return the extracted data.

The Form Recognizer client

To connect to the previously created Form Recognizer resource from the host ASP.NET Core application, install the Azure.AI.FormRecognizer NuGet package. The connection point between your client application and the Azure service is the FormRecognizerClient class defined in the package. This class lets you arrange a passport analysis in just a few intelligible steps.

Parsing the passport image is therefore more of a workflow than a single operation. Hence, it’s a good idea to isolate all the steps in an application service method invoked from the controller action method. The _task member is declared as a field of the controller and initialized in its constructor.

// Invoke the Form Recognizer client to process the photo
var (mrz, confidence) = await _task
    .ParsePassportFile(passport.OpenReadStream(), signalR);

The ParsePassportFile method receives the stream opened on the uploaded file and starts working with the classes in the Form Recognizer library. A SignalR Hub object is also passed to send updates to the browser while the operations take place.

// SignalR notification
await signalR.Hub.Clients.Client(signalR.ConnectionId)
    .SendAsync("updateStatus", "Uploading file to Azure cloud...");

The client object requires the URL of the dedicated Form Recognizer resource and the related API key.

// xxx is the display name of the Azure resource
const string Endpoint = "https://xxx.cognitiveservices.azure.com";
const string ApiKey = "...";
var client = new FormRecognizerClient(
    new Uri(Endpoint), 
    new AzureKeyCredential(ApiKey));

Once you have the client object, the StartRecognizeIdentityDocumentsAsync method takes the stream of the uploaded photo and returns a temporary object of type RecognizeIdentityDocumentsOperation. On this object, you call the WaitForCompletionAsync method, which waits while the service processes the photo with the trained model.

var operation = await client.StartRecognizeIdentityDocumentsAsync(stream);
var response = await operation.WaitForCompletionAsync();

The response you get is a collection of recognized forms. In general, the uploaded photo can contain multiple smaller images that can be recognized as known types of forms.

RecognizedFormCollection identityDocuments = response.Value;
if (identityDocuments.Count == 0)
    return (null, 0);

RecognizedForm identityDocument = identityDocuments.FirstOrDefault();
if (identityDocument == null)
    return (null, 0);

In this example, I focus on the first returned document. In a realistic scenario, you might want to apply any selection logic that fits, such as processing all returned documents, or only documents with a given number of pages or of a certain type. Here’s a list of properties you can query on the RecognizedForm class.

Property            Description
Fields              List of information items recognized in the document, such as the MRZ area in a passport
FormType            Type of the document. For example, Invoice or Passport.
FormTypeConfidence  Value between 0 and 1 that denotes the system’s confidence in the recognized document type
ModelId             Identifier (if any) of the model used to analyze the document
PageRange           Range of pages that the document spans

The actual content recognized in the scanned document is returned as a collection of fields. The next step, therefore, is getting hold of the list of fields for the recognized type of document and extracting information. Here’s the information captured when a valid passport first page photo is uploaded.

Figure 4: Information captured when a valid passport first page photo is uploaded

The collection of fields contains just one element, the document has only one page, and the type is passport. The list of possible fields is outlined in the service documentation at Form Recognizer ID document model.

For a passport, the sole field returned is MachineReadableZone (MRZ) which corresponds to the two (or maybe three) lines of text at the bottom of the page interspersed with several angle bracket characters. The following code returns the passport MRZ as a single continuous string with a blank in lieu of line breaks.

if (!identityDocument.Fields.TryGetValue("MachineReadableZone", out FormField mrzField))
    return (null, 0);
var mrz = mrzField.ValueData.Text;

As mentioned, the MRZ string you obtain in this way contains some blanks. With a call to String.Replace, you remove them and obtain a continuous string as if the passport MRZ rows were just one.

mrz = mrz.Replace(" ", "");

The final step consists in parsing the MRZ record to extract first and last name, date of birth, passport number, gender, expiry, and issuing country.

Parse the MRZ record

The MRZ record follows an international format made of plain information laid out at specific lengths and many check digits. The sample application provides an MRZ parser class that works in two steps. First, it extracts strings following the known schema, then computes and compares check digits. Second, it cleans the data. For example, it converts date strings into plain .NET DateTime objects. As a result, one gets a PassportData C# data structure that contains all relevant passport information. The data structure, along with the value of model confidence, is serialized to JSON and returned to the caller.

In version 2.1 of the client library, you also find some automatic parsing of MRZ. In version 3.0, instead, you find more fields, improved model quality, and automatic recognition of document type.

The following snippet shows the final lines of code in the controller action method.

var (mrz, confidence) = await _task
    .ParsePassportFile(passport.OpenReadStream(), signalR);
return mrz == null 
    ? Json(new {passport = new PassportData(false), confidence = confidence})
    : Json(new {passport = mrz.Data, confidence = confidence});
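The check-digit step of the MRZ parser can be sketched in a few lines. The block below is a minimal illustration in JavaScript, not the sample application's C# parser. It assumes the two-line, 88-character passport layout defined by ICAO Doc 9303 (TD3), and the function names and the shape of the returned object are hypothetical. It validates only the check digits for passport number, date of birth, and expiry date, and it leaves dates as raw YYMMDD strings.

```javascript
// Sketch only: helper names and the returned shape are hypothetical.
// Field positions assume the two-line passport MRZ (ICAO 9303 TD3, 44 + 44 characters).

// Map an MRZ character to its numeric value: digits as-is, A-Z as 10-35, "<" as 0.
function mrzCharValue(ch) {
    if (ch === "<") return 0;
    if (ch >= "0" && ch <= "9") return ch.charCodeAt(0) - 48;
    if (ch >= "A" && ch <= "Z") return ch.charCodeAt(0) - 55;
    throw new Error("Unexpected MRZ character: " + ch);
}

// Weighted sum with repeating weights 7, 3, 1, taken modulo 10.
function mrzCheckDigit(text) {
    const weights = [7, 3, 1];
    let sum = 0;
    for (let i = 0; i < text.length; i++) {
        sum += mrzCharValue(text[i]) * weights[i % 3];
    }
    return sum % 10;
}

// Expects the MRZ as one continuous string with blanks already removed.
function parsePassportMrz(mrz) {
    if (typeof mrz !== "string" || mrz.length !== 88) return { valid: false };
    const line1 = mrz.substring(0, 44);
    const line2 = mrz.substring(44);

    // Names start at position 6 of line 1: SURNAME<<GIVEN<NAMES, padded with "<".
    const [lastName, givenNames = ""] = line1.substring(5).split("<<");
    const clean = (text) => text.replace(/</g, " ").trim();

    const field = (start, length) => line2.substring(start, start + length);
    // A field passes when its computed check digit equals the digit that follows it.
    const passes = (start, length) =>
        mrzCheckDigit(field(start, length)) === Number(field(start + length, 1));

    return {
        valid: passes(0, 9) && passes(13, 6) && passes(21, 6),
        lastName: clean(lastName),
        firstName: clean(givenNames),
        issuerCountryCode: line1.substring(2, 5),
        number: field(0, 9).replace(/</g, ""),
        dateOfBirth: field(13, 6),  // raw YYMMDD; century resolution is left out
        gender: field(20, 1),
        dateOfExpiry: field(21, 6), // raw YYMMDD
    };
}
```

Called on the blank-free MRZ string, parsePassportMrz reports valid: false as soon as one of the three check digits doesn't match, which is a cheap way to catch OCR misreads before the fields are shown to the user.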

To finish, let’s see how the downloaded information makes its way back to the originating HTML page.

Update the requesting webpage

The call to Form Recognizer starts from an HTML button, and the controller action returns JSON data ready to be incorporated in the live page. Here’s an example of the JavaScript code responsible for parsing the JSON response:

function (data) {
    var response = JSON.parse(data);
    if (response.passport.valid) {
        console.log(response.passport.firstName);
        console.log(response.passport.lastName);
        console.log(response.passport.issuerCountryCode);
        console.log(response.passport.dateOfExpiry.substring(0, 10));
        console.log(response.passport.dateOfBirth.substring(0, 10));
        console.log(response.passport.number);
    }
}
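To go from logging the values to filling the form, the response can be mapped to input values first and then applied to the page. The following is a sketch under assumptions: the property names come from the response shown above, while the input ids, the function names, and the optional confidence threshold are hypothetical and not part of the sample application.

```javascript
// Sketch only: input ids, function names, and minConfidence are hypothetical.

// Turn the parsed JSON response into a map of input id -> value.
// Returns an empty map when the passport is invalid or confidence is too low.
function toFieldValues(response, minConfidence = 0) {
    const passport = response && response.passport;
    if (!passport || !passport.valid) return {};
    if (typeof response.confidence === "number" && response.confidence < minConfidence) {
        return {};
    }
    // Dates arrive as full date-time strings; keep only the yyyy-MM-dd part.
    const dateOnly = (value) => (value || "").substring(0, 10);
    return {
        "passport-first-name": passport.firstName,
        "passport-last-name": passport.lastName,
        "passport-country": passport.issuerCountryCode,
        "passport-expiry": dateOnly(passport.dateOfExpiry),
        "passport-birth": dateOnly(passport.dateOfBirth),
        "passport-number": passport.number,
    };
}

// Copy the values into the matching inputs, skipping ids that aren't on the page.
function fillForm(doc, values) {
    for (const [id, value] of Object.entries(values)) {
        const input = doc.getElementById(id);
        if (input) input.value = value;
    }
}
```

In the page, the callback would reduce to something like fillForm(document, toFieldValues(JSON.parse(data))). Keeping the mapping separate from the DOM access makes it easy to check without a browser.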

Figure 5: The sample page after the form recognizer

In summary, the simple upload of a passport photo triggered a call to the Azure Form Recognizer service and returned a parsed passport class to fill out remaining input fields. This feature saves the end user from some mundane data entry. It doesn’t cost much to arrange. Furthermore, it smooths the user experience in web forms that require uploading passport information, or form data in general.

What’s new in library version 3.0

The code shown above refers to version 2.1 of the library. In the 3.0 version, new classes were added and, as far as ID documents are concerned, the work of parsing the MRZ string is done automatically by the API. To use this new API, you just need to upgrade the related NuGet package. Here’s an example that gets the last name of the passport holder directly.

// Using version 3.0
var client = new DocumentAnalysisClient(
    new Uri(Endpoint),
    new AzureKeyCredential(ApiKey));
var operation = 
    await client.StartAnalyzeDocumentAsync("prebuilt-idDocument", stream);
await operation.WaitForCompletionAsync();
AnalyzeResult result = operation.Value;
AnalyzedDocument doc = result.Documents.FirstOrDefault();

if (doc == null)
    return (null, 0);

// Extract MRZ info
if (!doc.Fields.TryGetValue("MachineReadableZone", out DocumentField mrzField))
    return (null, 0);

// Extract last name
var mrzParts = mrzField.AsDictionary();
var ln = mrzParts["LastName"].Content;

The Form Recognizer service is continually updated. Information about the features of the latest version is available at https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/whats-new?tabs=csharp.

More about the Form Recognizer service

The Form Recognizer service’s free tier:

  • Works with images up to 4 MB—no smaller than 50×50 pixels and no larger than 10000×10000.
  • Allows up to 500 pages per month at a maximum rate of 20 calls per minute.
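Because these limits are known up front, the page could reject an unsuitable image before uploading it. Here's a minimal sketch in JavaScript. The function and constant names are hypothetical, the numbers are the ones listed above (they may change with the service), and reading 4 MB as 4 × 1024 × 1024 bytes is an assumption.

```javascript
// Sketch only: names are hypothetical; limits mirror the free-tier figures quoted in this article.
const FREE_TIER_LIMITS = {
    maxBytes: 4 * 1024 * 1024, // assumes 1 MB = 1024 * 1024 bytes
    minSide: 50,               // pixels
    maxSide: 10000,            // pixels
};

// Returns a list of problems; an empty list means the image is within the limits.
function checkImageLimits({ sizeInBytes, width, height }, limits = FREE_TIER_LIMITS) {
    const problems = [];
    if (sizeInBytes > limits.maxBytes) {
        problems.push("File exceeds the size limit.");
    }
    if (width < limits.minSide || height < limits.minSide) {
        problems.push("Image is smaller than the minimum dimensions.");
    }
    if (width > limits.maxSide || height > limits.maxSide) {
        problems.push("Image is larger than the maximum dimensions.");
    }
    return problems;
}
```

The size in bytes is available from the selected File object in the browser, while width and height would have to be read after loading the image. That part is left out here.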

The pre-trained model works well enough for standard passports without requiring further training. Finally, in addition to the REST API, the Form Recognizer service is also available through a Docker container to run the AI engine in your environment.
