Introducing the new Azure Form Recognizer libraries **Beta**

This blog post highlights important changes and features in the new Azure Form Recognizer client libraries. You’re encouraged to try the libraries and provide feedback for consideration before the General Availability (GA) release.

Some of the changes and new features in this beta release include:

Introduction of DocumentAnalysisClient and DocumentModelAdministrationClient
Unification of the document analysis method to be used for prebuilt models and custom models
General document analysis (prebuilt-document)
Get/list models and operations
Cross-page elements & bounding regions
Build model

In this blog post, Java is the primary language used to showcase the new features and changes. For language-specific improvements and features, see Conclusion.

Introduction of `DocumentAnalysisClient` and `DocumentModelAdministrationClient`

This 4.0 Beta 1 version of the azure-ai-formrecognizer Java library replaces the FormRecognizerClient and FormTrainingClient with DocumentAnalysisClient and DocumentModelAdministrationClient, respectively. The new clients provide support for the features added by the service in API version 2021-09-30-preview and later.

Previously, instantiating a FormRecognizerClient with version 3.x.x:

FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Now, instantiating a DocumentAnalysisClient with version 4.x.x:

DocumentAnalysisClient documentAnalysisClient = new DocumentAnalysisClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Similarly, in 4.x.x, FormTrainingClient and FormTrainingAsyncClient are replaced with DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient, both instantiated via the DocumentModelAdministrationClientBuilder. Synchronous operations live on DocumentModelAdministrationClient, and asynchronous operations live on DocumentModelAdministrationAsyncClient.

Previously, instantiating a FormTrainingClient with version 3.x.x:

FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Now, instantiating a DocumentModelAdministrationClient with version 4.x.x:

DocumentModelAdministrationClient documentModelAdminClient = new DocumentModelAdministrationClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();

Unification of the document analysis method

With 4.x.x, the following methods have been replaced with a unified method called beginAnalyzeDocument:

beginRecognizeBusinessCards
beginRecognizeContent
beginRecognizeCustomForms
beginRecognizeIdentityDocuments
beginRecognizeInvoices
beginRecognizeReceipts

The 4.x.x version combines layout analysis, prebuilt models, and custom models into a single operation. The method accepts a string with the ID of the model to use for analysis. The model ID can be any of the prebuilt model IDs, the layout model ID, or a custom model ID.

3.1.x	4.x.x	Model ID	Features
`beginRecognizeBusinessCards` / `beginRecognizeBusinessCardsFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“prebuilt-businessCard”	Text extraction and prebuilt fields and values related to English business cards
`beginRecognizeContent` / `beginRecognizeContentFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“prebuilt-layout”	Text extraction, selection marks, tables
`beginRecognizeCustomForms` / `beginRecognizeCustomFormsFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“{custom-model-id}”	Text extraction, selection marks, tables, labeled fields, and values from your custom documents
`beginRecognizeIdentityDocuments` / `beginRecognizeIdentityDocumentsFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“prebuilt-idDocument”	Text extraction and prebuilt fields and values related to US driver licenses and international passports
`beginRecognizeInvoices` / `beginRecognizeInvoicesFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“prebuilt-invoice”	Text extraction, selection marks, tables, and prebuilt fields and values related to English invoices
`beginRecognizeReceipts` / `beginRecognizeReceiptsFromUrl`	`beginAnalyzeDocument`/`beginAnalyzeDocumentFromUrl`	“prebuilt-receipt”	Text extraction and prebuilt fields and values related to English sales receipts
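
When migrating many call sites, it can help to keep the table above in code. The following is a minimal sketch, assuming a hypothetical helper class named `LegacyMethodToModelId` that isn't part of the library. It only resolves a 3.1.x method name to the model ID string you would pass to `beginAnalyzeDocument` or `beginAnalyzeDocumentFromUrl`. It doesn't call the service.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical migration helper (not part of azure-ai-formrecognizer).
// It mirrors the 3.1.x -> 4.x.x table: each legacy method maps to a model ID.
final class LegacyMethodToModelId {
    private static final Map<String, String> MODEL_IDS = new LinkedHashMap<>();

    static {
        MODEL_IDS.put("beginRecognizeBusinessCards", "prebuilt-businessCard");
        MODEL_IDS.put("beginRecognizeContent", "prebuilt-layout");
        MODEL_IDS.put("beginRecognizeIdentityDocuments", "prebuilt-idDocument");
        MODEL_IDS.put("beginRecognizeInvoices", "prebuilt-invoice");
        MODEL_IDS.put("beginRecognizeReceipts", "prebuilt-receipt");
    }

    private LegacyMethodToModelId() {
    }

    // Returns the model ID to use in 4.x.x for a given 3.1.x method name.
    // customModelId is only consulted for beginRecognizeCustomForms.
    static String modelIdFor(String legacyMethod, String customModelId) {
        // The "FromUrl" variants use the same model ID as the stream variants.
        String name = legacyMethod.endsWith("FromUrl")
            ? legacyMethod.substring(0, legacyMethod.length() - "FromUrl".length())
            : legacyMethod;

        if ("beginRecognizeCustomForms".equals(name)) {
            if (customModelId == null || customModelId.isEmpty()) {
                throw new IllegalArgumentException("A custom model ID is required");
            }
            return customModelId;
        }

        String modelId = MODEL_IDS.get(name);
        if (modelId == null) {
            throw new IllegalArgumentException("Unknown 3.1.x method: " + legacyMethod);
        }
        return modelId;
    }

    public static void main(String[] args) {
        System.out.println(modelIdFor("beginRecognizeReceiptsFromUrl", null));   // prebuilt-receipt
        System.out.println(modelIdFor("beginRecognizeCustomForms", "my-model")); // my-model
    }
}
```

The returned string is what replaces the choice of method in 3.1.x: the call itself is always `beginAnalyzeDocument` or `beginAnalyzeDocumentFromUrl`.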

The unified method returns an AnalyzeResult model that exposes the document elements (tables, words, styles) at the top level, in contrast to the previously returned RecognizedForm.

The list of supported prebuilt model IDs can be found here.

General document analysis (`prebuilt-document`)

The 4.x.x version of the library:

No longer requires training to extract general key-value pairs.
Uses the prebuilt model `prebuilt-document` to extract entities, key-value pairs, and layout from a document.

The prebuilt-document model provides functionality similar to that of unlabeled custom models from the previous library, without the need to train a model.

Example of using `prebuilt-document` for extracting document data

String documentUrl = "{document-url}";
String modelId = "prebuilt-document";

SyncPoller<DocumentOperationResult, AnalyzeResult> analyzeDocumentPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl(modelId, documentUrl);

AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();

// extracting page level information of the document 
analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // document element - lines accessible on page level
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line %s is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingBox().toString()));

    // document element - words accessible on page level
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word %s has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});

// tables found in the document
List<DocumentTable> tables = analyzeResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell -> {
        System.out.printf("Cell '%s', has row index %d and column index %d.%n",
            documentTableCell.getContent(),
            documentTableCell.getRowIndex(), documentTableCell.getColumnIndex());
    });
    System.out.println();
}

// Entities analyzed from the document
analyzeResult.getEntities().forEach(documentEntity -> {
    System.out.printf("Entity category: %s, sub-category: %s%n",
        documentEntity.getCategory(), documentEntity.getSubCategory());
    System.out.printf("Entity content: %s%n", documentEntity.getContent());
    System.out.printf("Entity confidence: %.2f%n", documentEntity.getConfidence());
});

// Key-value pairs extracted from the document
analyzeResult.getKeyValuePairs().forEach(documentKeyValuePair -> {
    System.out.printf("Key content: %s%n", documentKeyValuePair.getKey().getContent());
    System.out.printf("Key content bounding region: %s%n",
        documentKeyValuePair.getKey().getBoundingRegions().toString());

    if (documentKeyValuePair.getValue() != null) {
        System.out.printf("Value content: %s%n", documentKeyValuePair.getValue().getContent());
        System.out.printf("Value content bounding region: %s%n", documentKeyValuePair.getValue().getBoundingRegions().toString());        
    }
});

Get/list models and operations

With 4.x.x, the listModels operation returns a paged list of prebuilt and custom models. Also, the getModel method returns the field schema (the field names and types that the model can extract) for the specified model.

Furthermore, the getModel and listModels methods no longer return the models that didn’t succeed during model creation. These failed creation operations can only be retrieved using the getOperation and listOperations methods. However, these methods can only retrieve the data for an operation that has occurred in the past 24 hours.
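
To make the 24-hour window concrete, here's a minimal sketch that models it locally. The `OperationSketch` and `FailedBuildLookupSketch` classes, the status strings, and the sample timestamps are all assumptions made for illustration. They aren't the library's types, and the service applies the retention rule itself.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an operation record; not the library's model type.
final class OperationSketch {
    final String operationId;
    final String status;     // assumed values: "succeeded" or "failed"
    final Instant createdOn;

    OperationSketch(String operationId, String status, Instant createdOn) {
        this.operationId = operationId;
        this.status = status;
        this.createdOn = createdOn;
    }
}

final class FailedBuildLookupSketch {
    // Operation data is only retrievable for 24 hours, per the text above.
    static final Duration RETENTION = Duration.ofHours(24);

    // Returns the IDs of failed operations that are still inside the window.
    static List<String> retrievableFailures(List<OperationSketch> operations, Instant now) {
        List<String> ids = new ArrayList<>();
        for (OperationSketch operation : operations) {
            boolean failed = "failed".equals(operation.status);
            Duration age = Duration.between(operation.createdOn, now);
            boolean withinWindow = age.compareTo(RETENTION) <= 0;
            if (failed && withinWindow) {
                ids.add(operation.operationId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2021-11-10T12:00:00Z");
        List<OperationSketch> operations = Arrays.asList(
            new OperationSketch("op-1", "failed", now.minus(Duration.ofHours(2))),
            new OperationSketch("op-2", "failed", now.minus(Duration.ofHours(30))),
            new OperationSketch("op-3", "succeeded", now.minus(Duration.ofHours(1))));
        System.out.println(retrievableFailures(operations, now)); // [op-1]
    }
}
```

In this sketch, `op-2` failed too, but it's older than 24 hours, so its details would no longer be available.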

Cross-page elements and bounding regions

The 4.x.x version of the Form Recognizer library provides an improved experience to define elements located on documents. It introduces the BoundingRegion model, which helps account for elements that can span multiple pages. Each bounding region is composed of the one-based page number and the bounding box coordinates within that page.
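
The following minimal sketch shows how an element that spans pages can be described by several bounding regions. `RegionSketch` and `CrossPageElementSketch` are hypothetical stand-ins, not the library's BoundingRegion model, and representing a bounding box as eight numbers (four x,y corner points) is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a bounding region: a one-based page number plus
// the bounding box coordinates within that page.
final class RegionSketch {
    final int pageNumber;
    final double[] boundingBox; // assumed layout: x1, y1, x2, y2, x3, y3, x4, y4

    RegionSketch(int pageNumber, double... boundingBox) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("Page numbers are one-based");
        }
        this.pageNumber = pageNumber;
        this.boundingBox = boundingBox.clone();
    }
}

final class CrossPageElementSketch {
    // Groups the regions of a single element by page, in page order.
    static Map<Integer, List<double[]>> boxesByPage(List<RegionSketch> regions) {
        Map<Integer, List<double[]>> byPage = new TreeMap<>();
        for (RegionSketch region : regions) {
            byPage.computeIfAbsent(region.pageNumber, page -> new ArrayList<>())
                .add(region.boundingBox);
        }
        return byPage;
    }

    public static void main(String[] args) {
        // A table that starts at the bottom of page 1 and continues on page 2.
        List<RegionSketch> tableRegions = Arrays.asList(
            new RegionSketch(1, 1.0, 8.0, 7.5, 8.0, 7.5, 10.5, 1.0, 10.5),
            new RegionSketch(2, 1.0, 1.0, 7.5, 1.0, 7.5, 3.0, 1.0, 3.0));

        boxesByPage(tableRegions).forEach((page, boxes) ->
            System.out.printf("Page %d holds %d region(s)%n", page, boxes.size()));
    }
}
```

A single bounding box can't express this case, because its coordinates are only meaningful within one page. A list of regions keeps each box tied to its page.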

Build model

The beginBuildModel method of the 4.x.x library replaces the beginTraining method of the 3.1.x library. Unlike beginTraining, beginBuildModel doesn't take the useTrainingLabels parameter, because the prebuilt-document model can extract general key-value pairs without training.

With the 4.x.x version of the library:

The newest Form Recognizer service APIs no longer require training to extract general key-value pairs, so beginBuildModel has no useTrainingLabels parameter.
Users can now assign their own model IDs and specify a description when building, composing, or copying models.

3.1.x	4.x.x
`beginTraining(String trainingFilesUrl, boolean useTrainingLabels, TrainingOptions trainingOptions)`	`beginBuildModel(String trainingFilesUrl, String modelId, BuildModelOptions buildModelOptions)`

Note: You can use the Form Recognizer Studio preview for creating a labeled file for your training forms.

Conclusion

The new Form Recognizer libraries enhance the analysis mechanisms and provide new features and capabilities.

For language-specific reference documentation, examples, and migration guides, see the following resources:

.NET: Document Reference | README | Samples | Migration Guide
Java: Document Reference | README | Samples | Migration Guide
JavaScript/TypeScript: Document Reference | README | Samples | Migration Guide
Python: Document Reference | README | Samples | Migration Guide

You’re encouraged to provide feedback before the library reaches GA. To report issues or send feedback to the Azure SDK engineering team, use the language-specific links below:

Azure SDK Releases

Azure SDK Blog Contributions

Thank you for reading this Azure SDK blog post! We hope that you learned something new and welcome you to share this post. We’re open to Azure SDK blog contributions. Contact us at azsdkblog@microsoft.com with your idea, and we’ll get you set up as a guest blogger.

Azure SDK Links

Azure SDK Website: aka.ms/azsdk
Azure SDK Intro (3-minute video): aka.ms/azsdk/intro
Azure SDK Intro Deck (PowerPoint deck): aka.ms/azsdk/intro/deck
Azure SDK Releases: aka.ms/azsdk/releases
Azure SDK Blog: aka.ms/azsdk/blog
Azure SDK Twitter: twitter.com/AzureSDK
Azure SDK Design Guidelines: aka.ms/azsdk/guide
Azure SDKs & Tools: azure.microsoft.com/downloads
Azure SDK Central Repository: github.com/azure/azure-sdk
Azure SDK for .NET: github.com/azure/azure-sdk-for-net
Azure SDK for Java: github.com/azure/azure-sdk-for-java
Azure SDK for Python: github.com/azure/azure-sdk-for-python
Azure SDK for JavaScript/TypeScript: github.com/azure/azure-sdk-for-js
Azure SDK for Android: github.com/Azure/azure-sdk-for-android
Azure SDK for iOS: github.com/Azure/azure-sdk-for-ios
Azure SDK for Go: github.com/Azure/azure-sdk-for-go
Azure SDK for C: github.com/Azure/azure-sdk-for-c
Azure SDK for C++: github.com/Azure/azure-sdk-for-cpp
