November 11th, 2021

Introducing the new Azure Form Recognizer libraries **Beta**

Sameeksha Vaity
Software Engineer

This blog post highlights important changes and features in the new Azure Form Recognizer client libraries. You’re encouraged to try the libraries and provide feedback for consideration before the General Availability (GA) release.

Some of the changes and new features in this beta release include:

  • Introduction of DocumentAnalysisClient and DocumentModelAdministrationClient
  • Unification of the document analysis method to be used for prebuilt models and custom models
  • General document analysis (prebuilt-document)
  • Get/list models and operations
  • Cross-page elements & bounding regions
  • Build model

In this blog post, Java is the primary language used to showcase the new features and changes. For language-specific improvements and features, see Conclusion.

Introduction of DocumentAnalysisClient and DocumentModelAdministrationClient

This 4.0 Beta 1 version of the azure-ai-formrecognizer Java library replaces the FormRecognizerClient and FormTrainingClient with DocumentAnalysisClient and DocumentModelAdministrationClient, respectively. The new clients provide support for the features added by the service in API version 2021-09-30-preview and later.

Previously, with version 3.x.x, you instantiated FormRecognizerClient as follows:

FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Now, with version 4.x.x, you instantiate DocumentAnalysisClient as follows:

DocumentAnalysisClient documentAnalysisClient = new DocumentAnalysisClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Similarly, in 4.x.x, FormTrainingClient and FormTrainingAsyncClient are replaced by DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient, which provide the synchronous and asynchronous operations, respectively. Both are instantiated via the DocumentModelAdministrationClientBuilder.

Previously, with version 3.x.x, you instantiated FormTrainingClient as follows:

FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Now, with version 4.x.x, you instantiate DocumentModelAdministrationClient as follows:

DocumentModelAdministrationClient documentModelAdminClient = new DocumentModelAdministrationClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Unification of the document analysis method

With 4.x.x, the following methods have been replaced with a unified method called beginAnalyzeDocument:

  • beginRecognizeBusinessCards
  • beginRecognizeContent
  • beginRecognizeCustomForms
  • beginRecognizeIdentityDocuments
  • beginRecognizeInvoices
  • beginRecognizeReceipts

The 4.x.x version combines layout analysis, prebuilt models, and custom models into a single operation. The method accepts a string with the desired model ID for analysis. The model ID can be any of the prebuilt model IDs, the layout model ID, or a custom model ID.

| 3.1.x | 4.x.x | Model ID | Features |
| --- | --- | --- | --- |
| beginRecognizeBusinessCards / beginRecognizeBusinessCardsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-businessCard" | Text extraction and prebuilt fields and values related to English business cards |
| beginRecognizeContent / beginRecognizeContentFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-layout" | Text extraction, selection marks, tables |
| beginRecognizeCustomForms / beginRecognizeCustomFormsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "{custom-model-id}" | Text extraction, selection marks, tables, labeled fields, and values from your custom documents |
| beginRecognizeIdentityDocuments / beginRecognizeIdentityDocumentsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-idDocument" | Text extraction and prebuilt fields and values related to US driver licenses and international passports |
| beginRecognizeInvoices / beginRecognizeInvoicesFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-invoice" | Text extraction, selection marks, tables, and prebuilt fields and values related to English invoices |
| beginRecognizeReceipts / beginRecognizeReceiptsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-receipt" | Text extraction and prebuilt fields and values related to English sales receipts |
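
When migrating call sites, it can help to keep the mapping from 3.1.x method names to 4.x.x model IDs in one place. The following is a minimal sketch of such a lookup, using only the JDK. The ModelIdMigration class and its modelIdFor method are hypothetical helpers written for this post, not part of the Form Recognizer library, and "my-custom-model" is a made-up custom model ID.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical migration helper; not part of the Form Recognizer library.
final class ModelIdMigration {

    // 3.1.x method name -> model ID to pass to beginAnalyzeDocument in 4.x.x.
    private static final Map<String, String> MODEL_IDS;

    static {
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("beginRecognizeBusinessCards", "prebuilt-businessCard");
        ids.put("beginRecognizeContent", "prebuilt-layout");
        ids.put("beginRecognizeIdentityDocuments", "prebuilt-idDocument");
        ids.put("beginRecognizeInvoices", "prebuilt-invoice");
        ids.put("beginRecognizeReceipts", "prebuilt-receipt");
        MODEL_IDS = Collections.unmodifiableMap(ids);
    }

    private ModelIdMigration() { }

    // Returns the model ID that replaces a 3.1.x method. The "FromUrl" variants
    // map to the same model ID as their stream-based counterparts, and
    // beginRecognizeCustomForms maps to the caller's own custom model ID.
    static String modelIdFor(String legacyMethod, String customModelId) {
        String suffix = "FromUrl";
        String base = legacyMethod.endsWith(suffix)
            ? legacyMethod.substring(0, legacyMethod.length() - suffix.length())
            : legacyMethod;
        if ("beginRecognizeCustomForms".equals(base)) {
            return customModelId;
        }
        String modelId = MODEL_IDS.get(base);
        if (modelId == null) {
            throw new IllegalArgumentException("Unknown 3.1.x method: " + legacyMethod);
        }
        return modelId;
    }

    public static void main(String[] args) {
        // prints "prebuilt-receipt"
        System.out.println(modelIdFor("beginRecognizeReceiptsFromUrl", null));
        // prints "my-custom-model"
        System.out.println(modelIdFor("beginRecognizeCustomForms", "my-custom-model"));
    }
}
```

The returned string is what you would pass as the model ID argument to beginAnalyzeDocument or beginAnalyzeDocumentFromUrl.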

The unified method returns an AnalyzeResult model that makes document elements (tables, words, styles) easier to access by surfacing them at the top level, in contrast to the previously returned RecognizedForm.

The list of supported prebuilt model IDs can be found here.

General document analysis (prebuilt-document)

The 4.x.x version of the library:

  • No longer requires training to extract general key-value pairs.
  • Uses the prebuilt model `prebuilt-document` to extract entities, key-value pairs, and layout from a document.

The prebuilt-document model provides functionality similar to that of unlabeled custom models from the previous library, without the need to train a model.

Example of using prebuilt-document for extracting document data

String documentUrl = "{document-url}";
String modelId = "prebuilt-document";

SyncPoller<DocumentOperationResult, AnalyzeResult> analyzeDocumentPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl(modelId, documentUrl);

AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();

// extracting page level information of the document 
analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // document element - lines accessible on page level
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line %s is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingBox().toString()));

    // document element - words accessible on page level
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word %s has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});

// tables found in the document
List<DocumentTable> tables = analyzeResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell -> {
        System.out.printf("Cell '%s', has row index %d and column index %d.%n",
            documentTableCell.getContent(),
            documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
    });
    System.out.println();
}

// Entities analyzed from the document
analyzeResult.getEntities().forEach(documentEntity -> {
    System.out.printf("Entity category: %s, sub-category: %s%n",
        documentEntity.getCategory(), documentEntity.getSubCategory());
    System.out.printf("Entity content: %s%n", documentEntity.getContent());
    System.out.printf("Entity confidence: %.2f%n", documentEntity.getConfidence());
});

// Key-value pairs extracted from the document
analyzeResult.getKeyValuePairs().forEach(documentKeyValuePair -> {
    System.out.printf("Key content: %s%n", documentKeyValuePair.getKey().getContent());
    System.out.printf("Key content bounding region: %s%n",
        documentKeyValuePair.getKey().getBoundingRegions().toString());

    if (documentKeyValuePair.getValue() != null) {
        System.out.printf("Value content: %s%n", documentKeyValuePair.getValue().getContent());
        System.out.printf("Value content bounding region: %s%n",
            documentKeyValuePair.getValue().getBoundingRegions().toString());
    }
});

Get/list models and operations

With 4.x.x, the listModels operation returns a paged list of prebuilt and custom models. In addition, the getModel method returns the field schema (the field names and types that the model can extract) for the specified model.

Furthermore, the getModel and listModels methods no longer return the models that didn’t succeed during model creation. These failed creation operations can only be retrieved using the getOperation and listOperations methods. However, these methods can only retrieve the data for an operation that has occurred in the past 24 hours.

Cross-page elements and bounding regions

The 4.x.x version of the Form Recognizer library provides an improved experience to define elements located on documents. It introduces the BoundingRegion model, which helps account for elements that can span multiple pages. Each bounding region is composed of the one-based page number and the bounding box coordinates within that page.
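
To make the idea concrete, here is a minimal sketch of how an element with several bounding regions can be grouped by page. It uses only the JDK. The RegionSketch class is a stand-in written for this post and is not the library's BoundingRegion type; the eight-number coordinate layout and the sample values are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CrossPageElements {

    // Stand-in for illustration only; the library ships its own BoundingRegion model.
    static final class RegionSketch {
        final int pageNumber;          // one-based page number
        final List<Float> boundingBox; // coordinates within that page (assumed layout)

        RegionSketch(int pageNumber, List<Float> boundingBox) {
            if (pageNumber < 1) {
                throw new IllegalArgumentException("Page numbers are one-based.");
            }
            this.pageNumber = pageNumber;
            this.boundingBox = Collections.unmodifiableList(new ArrayList<>(boundingBox));
        }
    }

    // Groups an element's regions by page, in ascending page order.
    static Map<Integer, List<RegionSketch>> byPage(List<RegionSketch> regions) {
        Map<Integer, List<RegionSketch>> grouped = new TreeMap<>();
        for (RegionSketch region : regions) {
            grouped.computeIfAbsent(region.pageNumber, page -> new ArrayList<>()).add(region);
        }
        return grouped;
    }

    // An element spans multiple pages when its regions cover more than one page.
    static boolean spansMultiplePages(List<RegionSketch> regions) {
        return byPage(regions).size() > 1;
    }

    public static void main(String[] args) {
        // Hypothetical table that starts on page 1 and continues on page 2.
        List<RegionSketch> tableRegions = Arrays.asList(
            new RegionSketch(1, Arrays.asList(1.0f, 8.0f, 7.5f, 8.0f, 7.5f, 10.5f, 1.0f, 10.5f)),
            new RegionSketch(2, Arrays.asList(1.0f, 1.0f, 7.5f, 1.0f, 7.5f, 3.0f, 1.0f, 3.0f)));

        System.out.println("Spans multiple pages: " + spansMultiplePages(tableRegions));
        byPage(tableRegions).forEach((page, regions) ->
            System.out.printf("Page %d: %d region(s)%n", page, regions.size()));
    }
}
```

With the real library, the same grouping could be applied to the list returned by getBoundingRegions(), as used in the key-value pair example earlier in this post.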

Build model

The beginBuildModel method of the 4.x.x library replaces the beginTraining method of the 3.1.x library. Unlike beginTraining, beginBuildModel doesn't take the useTrainingLabels parameter, because the prebuilt-document model can extract general key-value pairs without training.

With the 4.x.x version of the library:

  • The newest Form Recognizer service APIs no longer require training to extract general key-value pairs, so beginBuildModel doesn't have the useTrainingLabels parameter that beginTraining required.
  • Users can now assign their own model IDs and specify a description when building, composing, or copying models.

| 3.1.x | 4.x.x |
| --- | --- |
| beginTraining(String trainingFilesUrl, boolean useTrainingLabels, TrainingOptions trainingOptions) | beginBuildModel(String trainingFilesUrl, String modelId, BuildModelOptions buildModelOptions) |

Note: You can use the Form Recognizer Studio preview for creating a labeled file for your training forms.

Conclusion

The new Form Recognizer libraries enhance the analysis mechanisms and provide new features and capabilities.

For language-specific reference documentation, examples, and migration guides, see the following resources:

You’re encouraged to provide feedback before the library reaches GA. To report issues or send feedback to the Azure SDK engineering team, use the language-specific links below:

Author

I am a Software Engineer currently working at Microsoft on the Azure SDK Java team, located on the Redmond campus. I love working on this team, which is customer focused, driven by talent, and shares my love for writing code and solving complex issues. In my spare time, you'll mostly find me cooking experimental dishes and enjoying nature with some good music in the background!
