September 15th, 2023

New features in the Azure Form Recognizer client libraries

Sameeksha Vaity
Software Engineer

We’re pleased to announce a stable release of the Azure Form Recognizer (now known as Document Intelligence) libraries for .NET, Python, Java, and JavaScript/TypeScript.

Highlighted features

The new, stable release of the Form Recognizer client libraries targets the 2023-07-31 service version and includes many new features and quality improvements. For a complete list of what’s new, see What’s new in Form Recognizer? This blog post highlights document classification, add-on recognition capabilities, and support for new prebuilt models.

Library availability

The new Form Recognizer libraries can be downloaded from each language’s preferred package manager.

Language              | Package | Command                                    | Project | Get started
.NET                  | NuGet   | dotnet add package Azure.AI.FormRecognizer | link    | link
Python                | PyPI    | pip install azure-ai-formrecognizer        | link    | link
Java                  | Maven   | Add to POM.xml file                        | link    | link
JavaScript/TypeScript | npm     | npm install @azure/ai-form-recognizer      | link    | link

Document classification

One of the most significant improvements in this version of the Form Recognizer library is support for building a custom classification model for document splitting and classification. With a custom-built classification model, users can now analyze a single- or multi-file document to identify whether any of the trained document types are contained within an input file.

The following samples illustrate how a single custom classification model can be built to analyze input documents related to a loan application package using the Form Recognizer library for Java.

Build a classification model

The following code builds a custom classification model trained to analyze a loan application package containing a loan application form, payslip, and bank statement.

Java
DocumentModelAdministrationClient client = new DocumentModelAdministrationClientBuilder()
  .credential(new AzureKeyCredential("{key}"))
  .endpoint("https://{endpoint}.cognitiveservices.azure.com/")
  .buildClient();

// Provide source for training the model
ContentSource loanApplnFormSource = new BlobContentSource("{SAS URL to your container}");
ContentSource payslipSource = new BlobContentSource("{SAS URL to your container}");
ContentSource bankStatementSource = new BlobContentSource("{SAS URL to your container}");

HashMap<String, ClassifierDocumentTypeDetails> docTypes = new HashMap<>();
docTypes.put("loan application form", new ClassifierDocumentTypeDetails(loanApplnFormSource));
docTypes.put("payslip", new ClassifierDocumentTypeDetails(payslipSource));
docTypes.put("bank statement", new ClassifierDocumentTypeDetails(bankStatementSource));

/**
 * Alternatively, if you have a flat list of files to train the model, you can use the
 * BlobFileListContentSource type to train the model.
 */
ContentSource loanApplnFormListSource 
  = new BlobFileListContentSource("{SAS URL to your container}", "Loan-Application-Documents.jsonl");

HashMap<String, ClassifierDocumentTypeDetails> fileListDocTypes = new HashMap<>();
fileListDocTypes.put("loan application form", new ClassifierDocumentTypeDetails(loanApplnFormListSource));

// Build a custom classifier document model
SyncPoller<OperationResult, DocumentClassifierDetails> buildOperationPoller
  = client.beginBuildDocumentClassifier(docTypes);

DocumentClassifierDetails documentClassifierDetails = buildOperationPoller.getFinalResult();

// Get the custom built classifier ID 
System.out.printf("Classifier ID: %s%n", documentClassifierDetails.getClassifierId());
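
As a rough sketch of how a flat file list such as Loan-Application-Documents.jsonl might be produced, the following Python snippet writes one JSON object per line. It assumes each line has the form {"file": "<path within the container>"}. That shape, the helper name build_file_list, and the sample paths are illustrative assumptions rather than something taken from the sample above, so check the service documentation for the exact format.

```python
import json


def build_file_list(paths):
    """Return JSONL text with one {"file": ...} object per document path."""
    lines = []
    for path in paths:
        # Use forward slashes so the path reads as a blob path (assumed format).
        normalized = path.replace("\\", "/")
        lines.append(json.dumps({"file": normalized}))
    return "\n".join(lines) + "\n"


# Hypothetical blob paths inside the training container.
loan_application_files = [
    "loan-forms/application-001.pdf",
    "loan-forms\\application-002.pdf",
]

file_list_text = build_file_list(loan_application_files)
print(file_list_text, end="")
```

The resulting text could then be uploaded to the container under the file name passed to BlobFileListContentSource.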

Find a similar example in other languages here:

Note: Users can also build a classification model using Document Intelligence Studio.

Analyze a document using classification model

Now that the custom classification model is built, it can be used to identify the page ranges of the individual documents in an input file, such as loan application forms, payslips, or bank statements.

For example, suppose a user wants to classify a single document containing a mix of loan application forms and payslips. The following code shows how the classifier can be used for that.

Java
// File URL to analyze
String documentUrl = "{URL to the sample document}";
// Classification is an analysis operation: client is assumed here to be a DocumentAnalysisClient
SyncPoller<OperationResult, AnalyzeResult> syncPoller
  = client.beginClassifyDocumentFromUrl(documentClassifierDetails.getClassifierId(),
      documentUrl);
AnalyzeResult analyzeResult = syncPoller.getFinalResult();

// Notice the classified documents under each doc type
analyzeResult.getDocuments()
  .forEach(analyzedDocument -> System.out.printf("Doc Type: %s%n", analyzedDocument.getDocType()));

// Get identified/classified page/page ranges
analyzeResult.getPages().forEach(documentPage -> {
  System.out.printf("Page has width: %.2f and height: %.2f, measured with unit: %s%n",
      documentPage.getWidth(),
      documentPage.getHeight(),
      documentPage.getUnit());

  // lines
  documentPage.getLines().forEach(documentLine ->
      System.out.printf("Line '%s' is within a bounding box %s.%n",
          documentLine.getContent(),
          documentLine.getBoundingPolygon().toString()));

  // words
  documentPage.getWords().forEach(documentWord ->
      System.out.printf("Word '%s' has a confidence score of %.2f.%n",
          documentWord.getContent(),
          documentWord.getConfidence()));
});
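
To make the idea of "page ranges per document type" concrete, here is a minimal Python sketch that groups classified documents by type. It does not call the service. The list of dictionaries is a hypothetical stand-in for the classified documents in a result, and the keys doc_type, confidence, and pages are names chosen for this illustration.

```python
from collections import defaultdict

# Stand-in for the classified documents in a result: each entry carries a doc
# type, a confidence, and the 1-based page numbers it was found on.
classified_documents = [
    {"doc_type": "loan application form", "confidence": 0.97, "pages": [1, 2, 3]},
    {"doc_type": "payslip", "confidence": 0.91, "pages": [4]},
    {"doc_type": "loan application form", "confidence": 0.88, "pages": [5, 6]},
]


def page_ranges_by_type(documents):
    """Map each doc type to a list of (first_page, last_page) tuples."""
    ranges = defaultdict(list)
    for document in documents:
        pages = sorted(document["pages"])
        if pages:  # skip entries without any page information
            ranges[document["doc_type"]].append((pages[0], pages[-1]))
    return dict(ranges)


ranges = page_ranges_by_type(classified_documents)
for doc_type, spans in ranges.items():
    print(doc_type, spans)
```

Each range could then be used to split the input file into one file per detected document.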

Find the preceding example in other languages here:

Add-on recognition capabilities

Form Recognizer now supports more sophisticated analysis capabilities. These optional capabilities can be enabled and disabled depending on the scenario of the document extraction. The following add-on capabilities are available for service version 2023-07-31 and later releases:

  • ocr.barcode – Support for extracting layout barcodes.
  • ocr.highResolution – Recognize small text in large documents.
  • ocr.formula – Detect formulas in documents, such as mathematical equations.
  • ocr.font – Recognize font-related properties of extracted text.

Users enable the add-on capabilities by passing one or more DocumentAnalysisFeature values in the analysis request.
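
Because the add-ons are optional, it can help to derive the list of requested features from what is known about the document. The following Python sketch shows one way to do that. The AddOn enum is a local stand-in that mirrors the add-on names listed above; it is not the SDK enum, and select_add_ons and its flags are hypothetical names for this illustration.

```python
from enum import Enum


class AddOn(Enum):
    """Local stand-in for the add-on names listed above (not the SDK enum)."""

    BARCODE = "ocr.barcode"
    HIGH_RESOLUTION = "ocr.highResolution"
    FORMULA = "ocr.formula"
    FONT = "ocr.font"


def select_add_ons(has_barcodes=False, has_small_text=False,
                   has_formulas=False, needs_font_info=False):
    """Return the add-ons to request, in a stable order."""
    wanted = [
        (has_barcodes, AddOn.BARCODE),
        (has_small_text, AddOn.HIGH_RESOLUTION),
        (has_formulas, AddOn.FORMULA),
        (needs_font_info, AddOn.FONT),
    ]
    return [add_on for enabled, add_on in wanted if enabled]


features = select_add_ons(has_barcodes=True, has_formulas=True)
print([feature.value for feature in features])
```

In real code, the selected values would be mapped to the feature type of the client library in use before being passed with the analysis request.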

Barcode recognition

Barcodes in many kinds of documents can now be detected using the barcode recognition feature of the library. Examples include healthcare and procurement-related document types in which critical information, like a patient ID, is encoded in a barcode. The detected barcodes are represented in the barcodes collection as a top-level property under DocumentPage. Each object describes the:

  • Barcode type (QRCode, UPCA, etc.).
  • Decoded value (general string representing URL, number, or other data).
  • Bounding polygon.
  • Span locating the barcode within the extracted content.
  • Overall extraction confidence.

Users can use the following code to access the first barcode on the first page of their analyzed result object.

Java
DocumentBarcode barcode =
    analyzeResult.getPages().get(0).getBarcodes().get(0);
System.out.printf("Barcode kind: '%s'", barcode.getKind());

// Output:
// Barcode kind: 'Code39'

.NET
DocumentBarcode barcode = analyzeResult.Pages[0].Barcodes[0];

Console.WriteLine($"Barcode kind: '{barcode.Kind}'");

// Output:
// Barcode kind: 'Code39'

Python
print(f"Barcode kind: {result.pages[0].barcodes[0].kind}")  # Barcode kind: Code39

JavaScript
const [barcode1, barcode2] = analyzeResult.pages?.[0].barcodes as DocumentBarcode[];
console.log(barcode1.kind); // "Code39"
console.log(barcode1.value); // "D589992-X"
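
The samples above index straight into the first barcode, which fails when a page has none. A slightly more defensive approach might look like the following Python sketch. The Barcode dataclass is a stand-in carrying a subset of the properties described above, and first_barcode and its 0.5 default threshold are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Barcode:
    """Stand-in carrying a subset of the barcode properties described above."""

    kind: str
    value: str
    confidence: float


def first_barcode(barcodes: Optional[List[Barcode]],
                  min_confidence: float = 0.5) -> Optional[Barcode]:
    """Return the first barcode at or above min_confidence, else None."""
    for barcode in barcodes or []:  # tolerate a missing or empty collection
        if barcode.confidence >= min_confidence:
            return barcode
    return None


page_barcodes = [
    Barcode(kind="QRCode", value="https://example.com", confidence=0.30),
    Barcode(kind="Code39", value="D589992-X", confidence=0.95),
]
best = first_barcode(page_barcodes)
print(best.kind if best else "no barcode found")
```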

High-resolution recognition

With this add-on feature, users can now easily extract content from complex documents that mix graphical and structural elements and have varying fonts, sizes, and orientations.

For example, the following code includes the high-resolution recognition add-on feature when analyzing a document:

Java
SyncPoller<OperationResult, AnalyzeResult> syncPoller
  = client.beginAnalyzeDocumentFromUrl("prebuilt-layout", "sourceUrl", 
      new AnalyzeDocumentOptions()
        .setDocumentAnalysisFeatures(Collections.singletonList(DocumentAnalysisFeature.OCR_HIGH_RESOLUTION)));
AnalyzeResult analyzeResult = syncPoller.getFinalResult();

.NET
var documentUri = new Uri("source-url");
var options = new AnalyzeDocumentOptions
{
    Features = { DocumentAnalysisFeature.OcrHighResolution }
};

AnalyzeDocumentOperation operation = client.AnalyzeDocumentFromUri(
    WaitUntil.Completed, "prebuilt-layout", documentUri, options);
AnalyzeResult analyzeResult = operation.Value;

Python
poller = document_analysis_client.begin_analyze_document(
    "prebuilt-layout",
    document=document_to_analyze,
    features=[AnalysisFeature.OCR_HIGH_RESOLUTION],
)
result = poller.result()

JavaScript
const poller = await client.beginAnalyzeDocumentFromUrl("prebuilt-layout", "source-url", {
  features: [FormRecognizerFeature.OcrHighResolution],
});
const analyzeResult = await poller.pollUntilDone();

Detect formulas

Formulas are often found in scientific documents and can now be detected with this add-on feature. The detected formulas are represented in the formulas collection as a top-level property under DocumentPage. Each object describes the formula kind as inline or display, its LaTeX representation as the value, and its polygon coordinates.

Java
DocumentFormula formula = analyzeResult.getPages().get(0).getFormulas().get(0);
System.out.printf("Formula kind: '%s' %n", formula.getKind());
System.out.printf("Formula value: '%s'", formula.getValue());

// Output:
// Formula kind: 'inline'
// Formula value: 'a+b=c'

.NET
DocumentFormula formula = analyzeResult.Pages[0].Formulas[0];

Console.WriteLine($"Formula kind: '{formula.Kind}'");
Console.WriteLine($"Formula value: '{formula.Value}'");

// Output:
// Formula kind: 'inline'
// Formula value: 'a+b=c'

Python
formula = result.pages[0].formulas[0]
print(f"Formula kind: {formula.kind}")  # Formula kind: inline
print(f"Formula value: {formula.value}")  # Formula value: a+b=c

JavaScript
const [formula1, formula2] = analyzeResult.pages?.[0].formulas as DocumentFormula[];
console.log(formula1.kind); // "inline"
console.log(formula1.value); // "a+b=c"
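
Since each formula comes back as a kind plus a LaTeX value, one natural follow-up is to re-emit it with matching math delimiters. The Python sketch below illustrates that idea on plain tuples. The helper name to_latex_markup, the choice of $...$ and $$...$$ delimiters, and the sample values are assumptions for this example, not part of the library.

```python
def to_latex_markup(kind, value):
    """Wrap a LaTeX formula value in inline or display math delimiters."""
    if kind == "display":
        return f"$${value}$$"
    # Treat anything else as inline, the other kind described above.
    return f"${value}$"


# Hypothetical (kind, value) pairs standing in for detected formulas.
detected = [("inline", "a+b=c"), ("display", r"\frac{x}{y}")]
rendered = [to_latex_markup(kind, value) for kind, value in detected]
print(rendered)
```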

Font extraction

This add-on feature enables users to detect various font properties associated with the extracted text in the document. The detected font properties are represented in the top-level styles collection under AnalyzeResult. Each DocumentStyle provides font-related properties for the extracted text: similarFontFamily (the visually most similar font within a supported set of fonts), fontStyle, fontWeight, color, and backgroundColor.

The following code sample reads font properties from a result that was analyzed with the DocumentAnalysisFeature.STYLE_FONT add-on enabled:

Java
DocumentStyle documentStyle = analyzeResult.getStyles().get(0);
System.out.printf("Font style: '%s' %n", documentStyle.getFontStyle());
System.out.printf("Font background color: '%s'", documentStyle.getBackgroundColor());

// Output:
// Font style: 'italic'
// Font background color: '#0000FF'

.NET
DocumentStyle documentStyle = analyzeResult.Styles[0];

Console.WriteLine($"Font style: '{documentStyle.FontStyle}'");
Console.WriteLine($"Font background color: '{documentStyle.BackgroundColor}'");

// Output:
// Font style: 'italic'
// Font background color: '#0000FF'

Python
for style in result.styles:
    if style.font_style:
        print(f"Font style: '{style.font_style}'")  # Font style: 'italic'
    if style.background_color:
        print(f"Background color: '{style.background_color}'")  # Background color: '#0000FF'

JavaScript
const [style1, style2] = analyzeResult.styles as DocumentStyle[];
console.log(style1.fontStyle);       // "italic"
console.log(style2.backgroundColor); // "#0000FF"
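
A style is most useful once it is tied back to the text it applies to. The Python sketch below shows that lookup on stand-in types. It assumes each style carries spans (an offset and a length) that index into the result's full text content; the Span and Style dataclasses, the styled_text helper, and the sample content are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    offset: int  # start index into the full text content
    length: int  # number of characters covered


@dataclass
class Style:
    spans: List[Span]
    font_style: Optional[str] = None
    background_color: Optional[str] = None


def styled_text(content: str, style: Style) -> str:
    """Concatenate the pieces of content that a style's spans point at."""
    return "".join(content[s.offset:s.offset + s.length] for s in style.spans)


content = "Total due: 42.00 USD"
styles = [
    Style(spans=[Span(offset=0, length=9)], font_style="italic"),
    Style(spans=[Span(offset=11, length=5)], background_color="#0000FF"),
]

texts = [styled_text(content, style) for style in styles]
for style, text in zip(styles, texts):
    print(repr(text), style.font_style, style.background_color)
```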

Support for new prebuilt models

New prebuilt models are now supported with Form Recognizer libraries to analyze:

  • Contracts (prebuilt-contract)
  • Tax forms (prebuilt-tax.us.1098, prebuilt-tax.us.1098E, prebuilt-tax.us.1098T)
  • Health insurance cards (prebuilt-healthInsuranceCard.us)

Prebuilt models offer the convenience of extracting fields from a document without having to build a model. To find more information about models, including a list of supported prebuilt models, see Form Recognizer models.

The following code analyzes a health insurance card using a prebuilt model provided by the service:

Java
SyncPoller<OperationResult, AnalyzeResult> syncPoller
  = client.beginAnalyzeDocumentFromUrl("prebuilt-healthInsuranceCard.us", "URL to health document");

AnalyzeResult analyzeResult = syncPoller.getFinalResult();

for (int i = 0; i < analyzeResult.getDocuments().size(); i++) {
  System.out.printf("--------Analyzing health care card %d--------%n", i);
  AnalyzedDocument analyzedHealthCard = analyzeResult.getDocuments().get(i);
  Map<String, DocumentField> healthCardFields = analyzedHealthCard.getFields();
  System.out.printf("Health care insurer: '%s'%n", healthCardFields.get("Insurer").getValueAsString());
  System.out.println("--------Member details --------");
  DocumentField memberDocumentField = healthCardFields.get("Member");
  if (memberDocumentField != null) { 
    if (DocumentFieldType.MAP == memberDocumentField.getType()) {
      memberDocumentField.getValueAsMap().forEach((key, documentField) -> {
        if ("Member.Name".equals(key)) {
          if (DocumentFieldType.STRING == documentField.getType()) {
            String name = documentField.getValueAsString();
            System.out.printf("\tMember Name: %s, confidence: %.2f%n", name, documentField.getConfidence());
          }
        }
        if ("Member.BirthDate".equals(key)) {
          if (DocumentFieldType.DATE == documentField.getType()) {
            LocalDate birthDate = documentField.getValueAsDate();
            System.out.printf("\tMember birth date: %s, confidence: %.2f%n",
              birthDate, documentField.getConfidence());
          }
        }
      });
    }
  }
}
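
The sample prints every field it finds along with its confidence. In practice, a caller may want to keep only the values the service is reasonably sure about. The following Python sketch shows that filtering step on a plain dictionary. The field names, values, the confident_fields helper, and the 0.8 threshold are all hypothetical stand-ins, not output from the prebuilt model.

```python
def confident_fields(fields, threshold=0.8):
    """Keep only the field values whose confidence meets the threshold."""
    return {
        name: field["value"]
        for name, field in fields.items()
        if field["confidence"] >= threshold
    }


# Hypothetical extracted member fields with made-up confidences.
member_fields = {
    "Name": {"value": "Jane Doe", "confidence": 0.95},
    "BirthDate": {"value": "1990-01-31", "confidence": 0.62},
}

accepted = confident_fields(member_fields)
print(accepted)
```

Fields that fall below the threshold could be routed to a manual review step instead of being dropped.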

Find a similar example in other languages here:

Learn more

To learn more and to try the new features, see these links to our official documentation:

Give us your feedback

We appreciate your feedback and encourage you to share your thoughts with us. We thrive on improvement and would welcome any suggestions you may have. Let’s work together to make our experience even better!

You can reach out to us by filing issues in the language-specific GitHub repository:

Include the “[Form Recognizer]” string in the issue title so it gets routed to the right people.

Author

Sameeksha Vaity
Software Engineer

I am a Software Engineer currently working at Microsoft on the Azure SDK Java team, located on the Redmond campus. I love working for this team, which is customer focused, driven by talent, and shares my love of writing code and solving complex issues. In my spare time, you will mostly find me cooking experimental dishes and enjoying nature with some good music in the background!
