We’re pleased to announce a stable release of the Azure Form Recognizer (now known as Document Intelligence) libraries for .NET, Python, Java, and JavaScript/TypeScript.
Highlighted features
The new, stable release of the Form Recognizer client libraries targets the 2023-07-31 service version and includes many new features and quality improvements. For a complete list of what’s new, see What’s new in Form Recognizer?.
This blog post highlights the following features:
- Build a custom classification model for document splitting and classification
- Add-on recognition capabilities
- New prebuilt models support
Library availability
The new Form Recognizer libraries can be downloaded from each language’s preferred package manager.
Language | Package | Command | Project | Get started |
---|---|---|---|---|
.NET | NuGet | dotnet add package Azure.AI.FormRecognizer | link | link |
Python | PyPI | pip install azure-ai-formrecognizer | link | link |
Java | Maven | Add to POM.xml file | link | link |
JavaScript/TypeScript | npm | npm install @azure/ai-form-recognizer | link | link |
Document classification
One of the most significant improvements in this version of the Form Recognizer library is support for building a custom classification model for document splitting and classification. With a custom classification model, users can now analyze a single- or multi-file document to identify whether it contains any of the trained document types.
The following samples illustrate how a single custom classification model can be built to analyze input documents related to a loan application package using the Form Recognizer library for Java.
Build a classification model
The following code builds a custom classification model trained to analyze a loan application package containing a loan application form, payslip, and bank statement.
Java
DocumentModelAdministrationClient client = new DocumentModelAdministrationClientBuilder()
    .credential(new AzureKeyCredential("{key}"))
    .endpoint("https://{endpoint}.cognitiveservices.azure.com/")
    .buildClient();
// Provide source for training the model
ContentSource loanApplnFormSource = new BlobContentSource("{SAS URL to your container}");
ContentSource payslipSource = new BlobContentSource("{SAS URL to your container}");
ContentSource bankStatementSource = new BlobContentSource("{SAS URL to your container}");
HashMap<String, ClassifierDocumentTypeDetails> docTypes = new HashMap<>();
docTypes.put("loan application form", new ClassifierDocumentTypeDetails(loanApplnFormSource));
docTypes.put("payslip", new ClassifierDocumentTypeDetails(payslipSource));
docTypes.put("bank statement", new ClassifierDocumentTypeDetails(bankStatementSource));
/**
* Alternatively, if you have a flat list of files to train the model, you can use the
* BlobFileListContentSource type to train the model.
*/
ContentSource loanApplnFormListSource
    = new BlobFileListContentSource("{SAS URL to your container}", "Loan-Application-Documents.jsonl");
HashMap<String, ClassifierDocumentTypeDetails> fileListDocTypes = new HashMap<>();
fileListDocTypes.put("loan application form", new ClassifierDocumentTypeDetails(loanApplnFormListSource));
// Build a custom classifier document model
SyncPoller<OperationResult, DocumentClassifierDetails> buildOperationPoller
    = client.beginBuildDocumentClassifier(docTypes);
DocumentClassifierDetails documentClassifierDetails = buildOperationPoller.getFinalResult();
// Get the custom built classifier ID
System.out.printf("Classifier ID: %s%n", documentClassifierDetails.getClassifierId());
Find a similar example in other languages here:
- .NET – Build a document classifier
- Python – Build a document classifier
- JavaScript – Build a document classifier
Note: Users can also build a classification model using Document Intelligence Studio.
Analyze a document using classification model
Now that the custom classification model is built, it can be used to identify the page ranges of the individual documents in an input file, such as loan application forms, payslips, or bank statements.
For example, suppose a user wants to classify a single document that contains a mix of loan application forms and payslips. The following code shows how.
Java
// File URL to analyze
String documentUrl = "{URL to the sample document}";
SyncPoller<OperationResult, AnalyzeResult> syncPoller
    = client.beginClassifyDocumentFromUrl(documentClassifierDetails.getClassifierId(),
        documentUrl);
AnalyzeResult analyzeResult = syncPoller.getFinalResult();
// Notice the classified documents under each doc type
analyzeResult.getDocuments()
    .forEach(analyzedDocument -> System.out.printf("Doc Type: %s%n", analyzedDocument.getDocType()));
// Get identified/classified page/page ranges
analyzeResult.getPages().forEach(documentPage -> {
    System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
        documentPage.getWidth(),
        documentPage.getHeight(),
        documentPage.getUnit());
    // lines
    documentPage.getLines().forEach(documentLine ->
        System.out.printf("Line '%s' is within a bounding box %s.%n",
            documentLine.getContent(),
            documentLine.getBoundingPolygon().toString()));
    // words
    documentPage.getWords().forEach(documentWord ->
        System.out.printf("Word '%s' has a confidence score of %.2f.%n",
            documentWord.getContent(),
            documentWord.getConfidence()));
});
Find the preceding example in other languages here:
- .NET – Classify a document
- Python – Classify a document
- JavaScript – Classify a document
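To make the idea of "page ranges per document type" concrete, here is a minimal Python sketch of how a classification result might be summarized. It does not call the service: `ClassifiedDoc` is a hypothetical stand-in for one classified document, and its `page_numbers` field is an assumption that stands for the pages reported in that document's bounding regions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassifiedDoc:
    """Hypothetical stand-in for one classified document in an analysis result."""
    doc_type: str
    page_numbers: List[int]  # 1-based pages covered by this document (assumed shape)


def summarize_page_ranges(docs: List[ClassifiedDoc]) -> List[Tuple[str, int, int]]:
    """Return (doc_type, first_page, last_page) per document, ordered by first page."""
    ranges = [
        (doc.doc_type, min(doc.page_numbers), max(doc.page_numbers))
        for doc in docs
        if doc.page_numbers  # skip entries that report no pages
    ]
    return sorted(ranges, key=lambda r: r[1])


if __name__ == "__main__":
    sample = [
        ClassifiedDoc("payslip", [4, 5]),
        ClassifiedDoc("loan application form", [1, 2, 3]),
    ]
    for doc_type, first, last in summarize_page_ranges(sample):
        print(f"{doc_type}: pages {first}-{last}")
```

With a real result, the same summary could be fed from each classified document's type and the page numbers of its bounding regions.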
Add-on recognition capabilities
Form Recognizer now supports more sophisticated analysis capabilities. These optional capabilities can be enabled or disabled depending on the document extraction scenario. The following add-on capabilities are available for service version 2023-07-31 and later releases:
- ocr.barcode – Extract barcodes from the document layout.
- ocr.highResolution – Recognize small text in large documents.
- ocr.formula – Detect formulas in documents, such as mathematical equations.
- ocr.font – Recognize font-related properties of extracted text.
Users can use the add-on capabilities by including the DocumentAnalysisFeature object in the analysis request.
Barcode recognition
Barcodes in many documents can now be detected using the barcode recognition feature of the library. Examples include healthcare and procurement-related document types in which critical information, such as a patient ID, is encoded in a barcode. The detected barcodes are represented in the barcodes collection as a top-level property under DocumentPage. Each object describes the:
- Barcode type (QRCode, UPCA, etc.).
- Decoded value (general string representing URL, number, or other data).
- Bounding polygon.
- Span in the document content where the decoded barcode value resides.
- Overall extraction confidence.
Users can use the following code to access the first barcode on the first page of their analyzed result object.
Java
DocumentBarcode barcode =
analyzeResult.getPages().get(0).getBarcodes().get(0);
System.out.printf("Barcode kind: '%s'", barcode.getKind());
// Output:
// Barcode kind: 'Code39'
.NET
DocumentBarcode barcode = analyzeResult.Pages[0].Barcodes[0];
Console.WriteLine($"Barcode kind: '{barcode.Kind}'");
// Output:
// Barcode kind: 'Code39'
Python
print(f"Barcode kind: {result.pages[0].barcodes[0].kind}") # "Code39"
JavaScript
const [barcode1, barcode2] = analyzeResult.pages?.[0].barcodes as DocumentBarcode[];
console.log(barcode1.kind); // "Code39"
console.log(barcode1.value); // "D589992-X"
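Because each barcode carries a type, a decoded value, and a confidence, a common follow-up is to keep only the values that are trustworthy enough to act on. The sketch below illustrates that idea in Python with a hypothetical `Barcode` stand-in rather than the SDK type; the 0.8 default threshold is an arbitrary assumption, not a service recommendation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Barcode:
    """Hypothetical stand-in carrying three of the barcode properties listed above."""
    kind: str
    value: str
    confidence: float


def confident_values(
    barcodes: Iterable[Barcode], kind: str, min_confidence: float = 0.8
) -> List[str]:
    """Return decoded values for barcodes of `kind` at or above the confidence threshold."""
    return [
        b.value
        for b in barcodes
        if b.kind == kind and b.confidence >= min_confidence
    ]


if __name__ == "__main__":
    page_barcodes = [
        Barcode("Code39", "D589992-X", 0.95),
        Barcode("QRCode", "https://example.com", 0.99),
    ]
    print(confident_values(page_barcodes, "Code39"))
```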
High-resolution recognition
With this add-on feature, users can now more easily extract content from complex documents that combine graphical and structural elements and use varying fonts, sizes, and orientations.
For example, the following code includes the high-resolution recognition add-on feature when analyzing a document:
Java
SyncPoller<OperationResult, AnalyzeResult> syncPoller
= client.beginAnalyzeDocumentFromUrl("prebuilt-layout", "sourceUrl",
new AnalyzeDocumentOptions()
.setDocumentAnalysisFeatures(Collections.singletonList(DocumentAnalysisFeature.OCR_HIGH_RESOLUTION)));
AnalyzeResult analyzeResult = syncPoller.getFinalResult();
.NET
var documentUri = new Uri("source-url");
var options = new AnalyzeDocumentOptions
{
Features = { DocumentAnalysisFeature.OcrHighResolution }
};
AnalyzeDocumentOperation operation = client.AnalyzeDocumentFromUri(
WaitUntil.Completed, "prebuilt-layout", documentUri, options);
AnalyzeResult analyzeResult = operation.Value;
Python
poller = document_analysis_client.begin_analyze_document(
    "prebuilt-layout",
    document=document_to_analyze,
    features=[AnalysisFeature.OCR_HIGH_RESOLUTION],
)
result = poller.result()
JavaScript
const poller = await client.beginAnalyzeDocumentFromUrl("prebuilt-layout", "source-url", {
  features: [FormRecognizerFeature.OcrHighResolution],
});
const analyzeResult = await poller.pollUntilDone();
Detect formulas
Formulas are often found in scientific documents and can now be detected with this add-on feature. The detected formulas are represented in the formulas collection as a top-level property under DocumentPage. Each object describes the formula kind (inline or display), its LaTeX representation as the value, and its polygon coordinates.
Java
DocumentFormula formula = analyzeResult.getPages().get(0).getFormulas().get(0);
System.out.printf("Formula kind: '%s' %n", formula.getKind());
System.out.printf("Formula value: '%s'", formula.getValue());
// Output:
// Formula kind: 'inline'
// Formula value: 'a+b=c'
.NET
DocumentFormula formula = analyzeResult.Pages[0].Formulas[0];
Console.WriteLine($"Formula kind: '{formula.Kind}'");
Console.WriteLine($"Formula value: '{formula.Value}'");
// Output:
// Formula kind: 'inline'
// Formula value: 'a+b=c'
Python
formula = result.pages[0].formulas[0]
print(f"Formula kind: {formula.kind}") # Formula kind: inline
print(f"Formula value: {formula.value}") # Formula value: a+b=c
JavaScript
const [formula1, formula2] = analyzeResult.pages?.[0].formulas as DocumentFormula[];
console.log(formula1.kind); // "inline"
console.log(formula1.value); // "a+b=c"
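Since each detected formula exposes a kind (inline or display) and a LaTeX value, one possible downstream step is to wrap the value in delimiters that match its kind before rendering it. The following Python sketch shows that idea; `formula_to_latex` is a hypothetical helper, not part of any Form Recognizer library, and the choice of `$...$` and `\[...\]` delimiters is an assumption about the target renderer.

```python
def formula_to_latex(kind: str, value: str) -> str:
    """Wrap a detected formula's LaTeX value in delimiters that match its kind."""
    if kind == "inline":
        return f"${value}$"
    if kind == "display":
        return f"\\[{value}\\]"
    # Only the two kinds described above are handled in this sketch.
    raise ValueError(f"Unexpected formula kind: {kind!r}")


if __name__ == "__main__":
    print(formula_to_latex("inline", "a+b=c"))
```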
Font extraction
This add-on feature enables users to detect various font properties associated with the extracted text in the document. The detected font properties are represented in the top-level styles collection of the analyzed result. Each DocumentStyle provides font-related properties for the extracted text: similarFontFamily (the visually most similar font within a documented set of supported fonts), fontStyle, fontWeight, color, and backgroundColor.
The following code samples read font properties from a document analyzed with DocumentAnalysisFeature.STYLE_FONT enabled:
Java
DocumentStyle documentStyle = analyzeResult.getStyles().get(0);
System.out.printf("Font style: '%s' %n", documentStyle.getFontStyle());
System.out.printf("Font background color: '%s'", documentStyle.getBackgroundColor());
// Output:
// Font style: 'italic'
// Font background color: '#0000FF'
.NET
DocumentStyle documentStyle = analyzeResult.Styles[0];
Console.WriteLine($"Font style: '{documentStyle.FontStyle}'");
Console.WriteLine($"Font background color: '{documentStyle.BackgroundColor}'");
// Output:
// Font style: 'italic'
// Font background color: '#0000FF'
Python
for style in result.styles:
    if style.font_style:
        print(f"Font style: '{style.font_style}'")  # Font style: 'italic'
    if style.background_color:
        print(f"Font background color: '{style.background_color}'")  # Font background color: '#0000FF'
JavaScript
const [style1, style2] = analyzeResult.styles as DocumentStyle[];
console.log(style1.fontStyle); // "italic"
console.log(style2.backgroundColor); // "#0000FF"
Support for new prebuilt models
New prebuilt models are now supported with Form Recognizer libraries to analyze:
- Contracts (prebuilt-contract)
- Tax forms (prebuilt-tax.us.1098, prebuilt-tax.us.1098E, prebuilt-tax.us.1098T)
- Health insurance cards (prebuilt-healthInsuranceCard.us)
Prebuilt models offer the convenience of extracting fields from a document without having to build a model. To find more information about models, including a list of supported prebuilt models, see Form Recognizer models.
The following code analyzes a health insurance card using a prebuilt model provided by the service:
Java
// getSyncPoller() applies when `client` is a DocumentAnalysisAsyncClient
SyncPoller<OperationResult, AnalyzeResult> syncPoller
    = client.beginAnalyzeDocumentFromUrl("prebuilt-healthInsuranceCard.us", "URL to health document").getSyncPoller();
AnalyzeResult analyzeResult = syncPoller.getFinalResult();
for (int i = 0; i < analyzeResult.getDocuments().size(); i++) {
    System.out.printf("--------Analyzing health care card %d--------%n", i);
    AnalyzedDocument analyzedHealthCard = analyzeResult.getDocuments().get(i);
    Map<String, DocumentField> healthCardFields = analyzedHealthCard.getFields();
    System.out.printf("Health care insurer: '%s'%n", healthCardFields.get("Insurer").getValueAsString());
    System.out.println("--------Member details --------");
    DocumentField memberDocumentField = healthCardFields.get("Member");
    if (memberDocumentField != null) {
        if (DocumentFieldType.MAP == memberDocumentField.getType()) {
            memberDocumentField.getValueAsMap().forEach((key, documentField) -> {
                if ("Member.Name".equals(key)) {
                    if (DocumentFieldType.STRING == documentField.getType()) {
                        String name = documentField.getValueAsString();
                        System.out.printf("\tMember Name: %s, confidence: %.2f%n", name, documentField.getConfidence());
                    }
                }
                if ("Member.BirthDate".equals(key)) {
                    if (DocumentFieldType.DATE == documentField.getType()) {
                        LocalDate birthDate = documentField.getValueAsDate();
                        System.out.printf("\tMember birth date: %s, confidence: %.2f%n",
                            birthDate, documentField.getConfidence());
                    }
                }
            });
        }
    }
}
Find a similar example in other languages here:
- .NET – Analyze a document with a prebuilt model – ‘prebuilt-invoice’
- Python – Analyze with prebuilt model – ‘prebuilt-tax.us.W-2’
- JavaScript – Analyze with prebuilt model – ‘prebuilt-receipt’
Learn more
To learn more and to try the new features, see these links to our official documentation:
Give us your feedback
We appreciate your feedback and encourage you to share your thoughts with us. We thrive on improvement and would welcome any suggestions you may have. Let’s work together to make our experience even better!
You can reach out to us by filing issues in the language-specific GitHub repository:
Include the “[Form Recognizer]” string in the issue title so it gets routed to the right people.