This blog post highlights important changes and features in the new Azure Form Recognizer client libraries. You’re encouraged to try the libraries and provide feedback for consideration before the General Availability (GA) release.
Some of the changes and new features in this beta release include:
- Introduction of DocumentAnalysisClient and DocumentModelAdministrationClient
- Unification of the document analysis method to be used for prebuilt models and custom models
- General document analysis (prebuilt-document)
- Get/list models and operations
- Cross-page elements and bounding regions
- Build model
This blog post uses Java to showcase the new features and changes. For language-specific improvements and features, see the Conclusion section.
Introduction of DocumentAnalysisClient and DocumentModelAdministrationClient
This 4.0 Beta 1 version of the azure-ai-formrecognizer Java library replaces FormRecognizerClient and FormTrainingClient with DocumentAnalysisClient and DocumentModelAdministrationClient, respectively. The new clients support the features added by the service in API version 2021-09-30-preview and later.
Previously, instantiating a FormRecognizerClient with version 3.x.x:
FormRecognizerClient formRecognizerClient = new FormRecognizerClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
Now, instantiating a DocumentAnalysisClient with version 4.x.x:
DocumentAnalysisClient documentAnalysisClient = new DocumentAnalysisClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
Similarly, in 4.x.x, FormTrainingClient and FormTrainingAsyncClient were replaced with DocumentModelAdministrationClient and DocumentModelAdministrationAsyncClient, both instantiated via DocumentModelAdministrationClientBuilder. Synchronous operations are available on DocumentModelAdministrationClient and asynchronous operations on DocumentModelAdministrationAsyncClient.
Previously, instantiating a FormTrainingClient with version 3.x.x:
FormTrainingClient formTrainingClient = new FormTrainingClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
Now, instantiating a DocumentModelAdministrationClient with version 4.x.x:
DocumentModelAdministrationClient documentModelAdminClient = new DocumentModelAdministrationClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("{endpoint}")
    .buildClient();
Unification of the document analysis method
With 4.x.x, the following methods have been replaced with a unified method called beginAnalyzeDocument:
- beginRecognizeBusinessCards
- beginRecognizeContent
- beginRecognizeCustomForms
- beginRecognizeIdentityDocuments
- beginRecognizeInvoices
- beginRecognizeReceipts
The 4.x.x version combines layout analysis, prebuilt models, and custom models into a single operation. The method accepts a string containing the ID of the model to use for analysis, which can be any of the prebuilt model IDs, the layout model ID, or a custom model ID.
3.1.x | 4.x.x | Model ID | Features |
---|---|---|---|
beginRecognizeBusinessCards / beginRecognizeBusinessCardsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-businessCard" | Text extraction and prebuilt fields and values related to English business cards |
beginRecognizeContent / beginRecognizeContentFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-layout" | Text extraction, selection marks, tables |
beginRecognizeCustomForms / beginRecognizeCustomFormsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "{custom-model-id}" | Text extraction, selection marks, tables, labeled fields, and values from your custom documents |
beginRecognizeIdentityDocuments / beginRecognizeIdentityDocumentsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-idDocument" | Text extraction and prebuilt fields and values related to US driver licenses and international passports |
beginRecognizeInvoices / beginRecognizeInvoicesFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-invoice" | Text extraction, selection marks, tables, and prebuilt fields and values related to English invoices |
beginRecognizeReceipts / beginRecognizeReceiptsFromUrl | beginAnalyzeDocument / beginAnalyzeDocumentFromUrl | "prebuilt-receipt" | Text extraction and prebuilt fields and values related to English sales receipts |
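As an illustration of the table above, the following sketch encodes the mapping from 3.1.x method names to 4.x.x model IDs as a small lookup you might use while migrating code. ModelIdMigration and modelIdFor are hypothetical names invented for this example, not part of azure-ai-formrecognizer; only the model ID strings come from the table. The returned string is what you would pass as the model ID argument of beginAnalyzeDocument or beginAnalyzeDocumentFromUrl.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical migration helper (not part of azure-ai-formrecognizer): maps a
 * 3.1.x beginRecognize* method name to the model ID used with the unified
 * 4.x.x beginAnalyzeDocument / beginAnalyzeDocumentFromUrl methods.
 */
final class ModelIdMigration {

    // Prebuilt mappings copied from the migration table; FromUrl overloads share the same ID.
    private static final Map<String, String> PREBUILT_MODEL_IDS = Map.of(
            "beginRecognizeBusinessCards", "prebuilt-businessCard",
            "beginRecognizeContent", "prebuilt-layout",
            "beginRecognizeIdentityDocuments", "prebuilt-idDocument",
            "beginRecognizeInvoices", "prebuilt-invoice",
            "beginRecognizeReceipts", "prebuilt-receipt");

    private ModelIdMigration() {
    }

    /**
     * Returns the 4.x.x model ID for a 3.1.x method name, or empty if the name is unknown.
     * For beginRecognizeCustomForms the model ID is your own custom model ID, so the
     * caller supplies it.
     */
    static Optional<String> modelIdFor(String legacyMethod, String customModelId) {
        // Treat "...FromUrl" overloads the same as their stream-based counterparts.
        String baseName = legacyMethod.endsWith("FromUrl")
                ? legacyMethod.substring(0, legacyMethod.length() - "FromUrl".length())
                : legacyMethod;
        if (baseName.equals("beginRecognizeCustomForms")) {
            return Optional.ofNullable(customModelId);
        }
        return Optional.ofNullable(PREBUILT_MODEL_IDS.get(baseName));
    }

    public static void main(String[] args) {
        System.out.println(modelIdFor("beginRecognizeReceiptsFromUrl", null).orElse("unknown"));
        System.out.println(modelIdFor("beginRecognizeCustomForms", "my-custom-model").orElse("unknown"));
    }
}
```

Running main prints prebuilt-receipt followed by my-custom-model. Folding the FromUrl overloads into the same entry mirrors the 4.x.x design, where the stream-based and URL-based analyze methods take the same model ID and differ only in how the document is supplied.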
The unified method returns an AnalyzeResult model, which makes document elements such as tables, words, and styles easier to access than the previously returned RecognizedForm by surfacing them at the top level of the result.
The full list of supported prebuilt model IDs is available in the Form Recognizer service documentation.
General document analysis (prebuilt-document)
The 4.x.x version of the library:
- No longer requires training to extract general key-value pairs.
- Uses the prebuilt-document model to extract entities, key-value pairs, and layout from a document.
The prebuilt-document model provides functionality similar to the unlabeled custom models of the previous library, without the need to train a model.
Example of using prebuilt-document to extract document data:
String documentUrl = "{document-url}";
String modelId = "prebuilt-document";
SyncPoller<DocumentOperationResult, AnalyzeResult> analyzeDocumentPoller =
    documentAnalysisClient.beginAnalyzeDocumentFromUrl(modelId, documentUrl);
AnalyzeResult analyzeResult = analyzeDocumentPoller.getFinalResult();

// Extract page-level information from the document
analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());

    // Document element: lines, accessible at the page level
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line '%s' is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingBox().toString()));

    // Document element: words, accessible at the page level
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word '%s' has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});

// Tables found in the document
List<DocumentTable> tables = analyzeResult.getTables();
for (int i = 0; i < tables.size(); i++) {
    DocumentTable documentTable = tables.get(i);
    System.out.printf("Table %d has %d rows and %d columns.%n", i, documentTable.getRowCount(),
        documentTable.getColumnCount());
    documentTable.getCells().forEach(documentTableCell ->
        System.out.printf("Cell '%s' has row index %d and column index %d.%n",
            documentTableCell.getContent(),
            documentTableCell.getRowIndex(),
            documentTableCell.getColumnIndex()));
    System.out.println();
}

// Entities analyzed from the document
analyzeResult.getEntities().forEach(documentEntity -> {
    System.out.printf("Entity category: %s, sub-category: %s%n",
        documentEntity.getCategory(), documentEntity.getSubCategory());
    System.out.printf("Entity content: %s%n", documentEntity.getContent());
    System.out.printf("Entity confidence: %.2f%n", documentEntity.getConfidence());
});

// Key-value pairs extracted from the document
analyzeResult.getKeyValuePairs().forEach(documentKeyValuePair -> {
    System.out.printf("Key content: %s%n", documentKeyValuePair.getKey().getContent());
    System.out.printf("Key content bounding region: %s%n",
        documentKeyValuePair.getKey().getBoundingRegions().toString());
    // A key can be detected without an associated value
    if (documentKeyValuePair.getValue() != null) {
        System.out.printf("Value content: %s%n", documentKeyValuePair.getValue().getContent());
        System.out.printf("Value content bounding region: %s%n",
            documentKeyValuePair.getValue().getBoundingRegions().toString());
    }
});
Get/list models and operations
With 4.x.x, the listModels operation returns a paged list of both prebuilt and custom models. In addition, the getModel method returns the field schema (the field names and types that the model can extract) for the specified model.
Furthermore, the getModel and listModels methods no longer return models whose creation did not succeed. Failed creation operations can only be retrieved with the getOperation and listOperations methods, and these methods only return data for operations that occurred in the past 24 hours.
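The retrieval rules above can be modeled in a few lines. The following is a minimal in-memory sketch, not SDK code: OperationRecord and RetrievalRules are hypothetical types, prebuilt models are omitted, and treating an operation exactly 24 hours old as expired is an assumption, since the exact boundary is not specified here. It shows that a failed creation never shows up among the models, while any operation, successful or not, drops out of the operation list once it is older than 24 hours.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a model-creation operation (not an azure-ai-formrecognizer type). */
final class OperationRecord {
    final String modelId;
    final boolean succeeded;
    final Instant createdOn;

    OperationRecord(String modelId, boolean succeeded, Instant createdOn) {
        this.modelId = modelId;
        this.succeeded = succeeded;
        this.createdOn = createdOn;
    }
}

/** Hypothetical model of what the model listing and the operation listing each expose. */
final class RetrievalRules {
    // Operation data is only retrievable for operations from the past 24 hours.
    static final Duration OPERATION_RETENTION = Duration.ofHours(24);

    private RetrievalRules() {
    }

    /** Custom models a listModels-style call would report: successful creations only. */
    static List<String> visibleModels(List<OperationRecord> operations) {
        // Assumption: a successfully built model stays listed regardless of when it was built.
        return operations.stream()
                .filter(op -> op.succeeded)
                .map(op -> op.modelId)
                .collect(Collectors.toList());
    }

    /** Operations a listOperations-style call would report: any outcome, within the window. */
    static List<OperationRecord> visibleOperations(List<OperationRecord> operations, Instant now) {
        // Assumption: an operation exactly 24 hours old is treated as expired.
        Instant cutoff = now.minus(OPERATION_RETENTION);
        return operations.stream()
                .filter(op -> op.createdOn.isAfter(cutoff))
                .collect(Collectors.toList());
    }
}
```

In practice this means a failed build should be investigated promptly: after 24 hours, neither the model listing nor the operation listing will report it.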
Cross-page elements and bounding regions
The 4.x.x version of the Form Recognizer library improves how the location of elements in a document is described. It introduces the BoundingRegion model, which accounts for elements that can span multiple pages. Each bounding region consists of a one-based page number and the bounding box coordinates within that page.
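To make the page-spanning idea concrete, here is a minimal sketch built on hypothetical Region and Element classes that mirror the description above. They are not the SDK's BoundingRegion type, and representing the bounding box as a flat list of x/y coordinates is an assumption. A table that starts on page 2 and continues on page 3 carries one region per page, so the set of pages it touches is derived from its regions rather than from a single page index.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical mirror of the BoundingRegion idea (not the SDK class):
 * a one-based page number plus bounding box coordinates on that page.
 */
final class Region {
    final int pageNumber;          // one-based page number
    final List<Float> boundingBox; // assumed: flat list of x/y coordinates in the page's unit

    Region(int pageNumber, List<Float> boundingBox) {
        if (pageNumber < 1) {
            throw new IllegalArgumentException("page numbers are one-based");
        }
        this.pageNumber = pageNumber;
        this.boundingBox = List.copyOf(boundingBox);
    }
}

/** Hypothetical document element (for example, a table) located by one or more regions. */
final class Element {
    final String content;
    final List<Region> regions;

    Element(String content, List<Region> regions) {
        this.content = content;
        this.regions = List.copyOf(regions);
    }

    /** Distinct pages the element touches, in ascending order. */
    SortedSet<Integer> pages() {
        SortedSet<Integer> pages = new TreeSet<>();
        for (Region region : regions) {
            pages.add(region.pageNumber);
        }
        return pages;
    }

    /** True when the element's regions fall on more than one page. */
    boolean spansMultiplePages() {
        return pages().size() > 1;
    }
}
```

With the real library, the same information comes from an element's getBoundingRegions() list, as used for key-value pairs in the prebuilt-document example earlier in this post.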
Build model
The beginBuildModel method in the 4.x.x library replaces the beginTraining method from the 3.1.x library. Unlike beginTraining, beginBuildModel does not take the useTrainingLabels parameter, because the prebuilt-document model can extract general key-value pairs without training.
With the 4.x.x version of the library:
- The newest Form Recognizer service APIs no longer require training to extract general key-value pairs, so the useTrainingLabels parameter has been removed from beginBuildModel.
- Users can now assign their own model IDs and specify a description when building, composing, or copying models.
3.1.x | 4.x.x |
---|---|
beginTraining(String trainingFilesUrl, boolean useTrainingLabels, TrainingOptions trainingOptions) | beginBuildModel(String trainingFilesUrl, String modelId, BuildModelOptions buildModelOptions) |
Note: You can use the Form Recognizer Studio preview for creating a labeled file for your training forms.
Conclusion
The new Form Recognizer libraries enhance document analysis and add new features and capabilities.
For language-specific reference documentation, examples, and migration guides, see the following resources:
- .NET: Document Reference | README | Samples | Migration Guide
- Java: Document Reference | README | Samples | Migration Guide
- JavaScript/TypeScript: Document Reference | README | Samples | Migration Guide
- Python: Document Reference | README | Samples | Migration Guide
You’re encouraged to provide feedback before the library reaches GA. To report issues or send feedback to the Azure SDK engineering team, open an issue in the Azure SDK GitHub repository for your language, listed under Azure SDK Links.
Azure SDK Blog Contributions
Thank you for reading this Azure SDK blog post! We hope that you learned something new and welcome you to share this post. We’re open to Azure SDK blog contributions. Contact us at azsdkblog@microsoft.com with your idea, and we’ll get you set up as a guest blogger.
Azure SDK Links
- Azure SDK Website: aka.ms/azsdk
- Azure SDK Intro (3-minute video): aka.ms/azsdk/intro
- Azure SDK Intro Deck (PowerPoint deck): aka.ms/azsdk/intro/deck
- Azure SDK Releases: aka.ms/azsdk/releases
- Azure SDK Blog: aka.ms/azsdk/blog
- Azure SDK Twitter: twitter.com/AzureSDK
- Azure SDK Design Guidelines: aka.ms/azsdk/guide
- Azure SDKs & Tools: azure.microsoft.com/downloads
- Azure SDK Central Repository: github.com/azure/azure-sdk
- Azure SDK for .NET: github.com/azure/azure-sdk-for-net
- Azure SDK for Java: github.com/azure/azure-sdk-for-java
- Azure SDK for Python: github.com/azure/azure-sdk-for-python
- Azure SDK for JavaScript/TypeScript: github.com/azure/azure-sdk-for-js
- Azure SDK for Android: github.com/Azure/azure-sdk-for-android
- Azure SDK for iOS: github.com/Azure/azure-sdk-for-ios
- Azure SDK for Go: github.com/Azure/azure-sdk-for-go
- Azure SDK for C: github.com/Azure/azure-sdk-for-c
- Azure SDK for C++: github.com/Azure/azure-sdk-for-cpp