Performing OCR for iOS, Android, and Windows with Microsoft Cognitive Services

Pierce Boggan

The Computer Vision API from Microsoft Cognitive Services.

Optical character recognition, commonly known as OCR, detects the text found in an image or video and extracts the recognized words. OCR lets us provide a much better user experience: instead of performing data entry by hand on a mobile device, users can simply take a photo, and OCR extracts the required information without any further interaction. Many mobile apps have adopted OCR as a primary input source, such as apps for tracking expenses or managing business cards, and you may have seen the feature used to “scan” your information off of a credit card rather than typing it in.

In this blog post, you will learn how to add OCR to your mobile apps in just a few lines of code by building an invoice-tracking app for iOS, Android, and Windows with Xamarin.Forms and Microsoft Cognitive Services.

Getting Started with the Computer Vision API

Microsoft Cognitive Services lets you build apps with powerful algorithms in just a few lines of code, offering 22 APIs that do everything from facial recognition to OCR. The APIs are broken down into five main categories: vision, speech, language, knowledge, and search. The Computer Vision API allows us to extract rich information from images, such as recognizing objects or extracting text with OCR.

Let’s build an app that tracks due dates for payments by taking photos of invoices. To get started, download the starter code for the app. This contains one screen that allows users to view invoices, along with a button for tracking new invoices.

Obtaining an API Key

Getting started with Microsoft Cognitive Services is completely free. To begin using the Computer Vision API, we must first obtain an API key for the service. To do this, visit the Computer Vision API page and click “Get started for free.”

Check the row with “Computer Vision” as the product name and click “Subscribe.” Once registered, you should have a free subscription to the Computer Vision APIs, including API keys.

Screenshot of the Computer Vision API after it has been subscribed to.
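Once you have a key, it can help to keep it in one place in the shared project so it’s easy to swap out later. The helper class below is just an illustrative sketch, not part of the starter code:

// Hypothetical helper for keeping the Computer Vision key in a single place.
public static class ApiKeys
{
    // Paste the key from your Computer Vision subscription here.
    public const string ComputerVision = "{YOUR-API-KEY-HERE}";
}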

Taking a Photo

Now that we have an API key, it’s time to add the functionality to take a photo that we will later perform OCR on. Taking photos is a platform-specific functionality, but thanks to the Media Plugin for Xamarin and Windows, we can add the ability to take or select photos from shared code quickly and easily. Add the Media Plugin for Xamarin and Windows NuGet package to both the PCL and platform-specific projects.

Let’s add the logic to take the photo. Open up InvoicesViewModel and scroll down to the ExecuteAddInvoiceCommandAsync method. This is where we’ll add the logic to take a photo, perform OCR, and add the invoice to the view.

Add the following code under the “Add camera logic” comment:

// Bring these namespaces in at the top of the file
using Plugin.Media;
using Plugin.Media.Abstractions;
...
await CrossMedia.Current.Initialize();

MediaFile photo;
if (CrossMedia.Current.IsCameraAvailable)
{
    // Take a new photo with the device camera and save it to the "Invoices" folder.
    photo = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions
    {
        Directory = "Invoices",
        Name = "Invoice.jpg"
    });
}
else
{
    // No camera available (for example, a simulator), so let the user pick an existing photo.
    photo = await CrossMedia.Current.PickPhotoAsync();
}

After initializing the Media Plugin, we check whether a camera is available on the device. If it is, we take a photo with the camera; if not, we let the user pick a photo from their library. That’s all of the code required to take photos on iOS, Android, and Windows.
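One detail worth guarding against: both TakePhotoAsync and PickPhotoAsync return null if the user cancels, so a small check (an addition to the original sample) keeps us from sending a missing photo to the OCR service:

// The user may cancel the camera or the picker; in that case there is nothing to OCR.
if (photo == null)
    return;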

Performing OCR

Now we’re ready to upload our photo to the Computer Vision API to perform OCR. Because all Microsoft Cognitive Services APIs are just RESTful APIs, we can use HttpClient to call into them just like any other service. To make it as easy as possible to consume Microsoft Cognitive Services APIs, the team has created client libraries and shipped them on NuGet. To add the Computer Vision API to our project, add the Microsoft.ProjectOxford.Vision NuGet to the PCL project.
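As an aside, if you’d rather skip the client library, the same OCR call can be made with HttpClient against the REST endpoint directly. The sketch below assumes the v1.0 OCR endpoint in the West US region, so check the endpoint listed with your subscription before using it:

// Rough sketch of calling the OCR REST endpoint with HttpClient.
// Requires System.Net.Http and System.Net.Http.Headers at the top of the file.
using (var httpClient = new HttpClient())
using (var content = new StreamContent(photo.GetStream()))
{
    httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "{YOUR-API-KEY-HERE}");
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

    var response = await httpClient.PostAsync(
        "https://westus.api.cognitive.microsoft.com/vision/v1.0/ocr", content);
    var json = await response.Content.ReadAsStringAsync();
    // The JSON payload contains the same regions/lines/words structure the client library exposes.
}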

In the ExecuteAddInvoiceCommandAsync method under the “add OCR logic” comment, add the following code:

// Bring these namespaces in at the top of the file
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
...
OcrResults text;
var client = new VisionServiceClient("{YOUR-API-KEY-HERE}");
using (var photoStream = photo.GetStream())
{
    // Upload the photo and let the Computer Vision API run OCR over it.
    text = await client.RecognizeTextAsync(photoStream);
}

The Computer Vision API automatically detects the text in the uploaded image and can recognize characters from 21 different languages. If required, it will even detect the text’s orientation and correct for rotation, so users don’t have to take a perfect photo. The recognized text is returned as regions, each of which contains lines of text, and each line contains individual words.
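To get a feel for that structure, here’s a quick sketch (using System.Linq) that flattens the OcrResults into plain text, one output line per recognized line:

// Walk regions -> lines -> words and join the words back into readable lines.
var recognizedText = string.Join(
    Environment.NewLine,
    text.Regions
        .SelectMany(region => region.Lines)
        .Select(line => string.Join(" ", line.Words.Select(word => word.Text))));
System.Diagnostics.Debug.WriteLine(recognizedText);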

Next, let’s grab the total from the invoice by selecting the largest dollar amount found in the OCR results. After finding the total, we create a new Invoice and add it to our data-bound ObservableCollection.

double total = 0.0;
foreach (var region in text.Regions)
{
    foreach (var line in region.Lines)
    {
        foreach (var word in line.Words)
        {
            // Treat any word containing "$" as a candidate amount and keep the largest one.
            if (word.Text.Contains("$"))
            {
                double number;
                if (Double.TryParse(word.Text.Replace("$", ""), out number))
                {
                    total = (number > total) ? number : total;
                }
            }
        }
    }
}

// Add the new invoice to the data-bound collection.
Invoices.Add(new Invoice
{
    Total = total,
    Photo = photo.Path,
    TimeStamp = DateTime.Now
});

We’ve just built an app to track invoices using Microsoft Cognitive Services and Xamarin! The app uses the Computer Vision API’s OCR functionality to extract the total from an invoice. We could even extend it to extract due dates with OCR and automatically add a calendar event to remind users when an invoice is due.
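As a rough starting point for that idea, we could scan the recognized words for anything that parses as a date. The sketch below (again using System.Linq) only catches single-word dates such as “07/15/2017” and would need more work for formats like “July 15, 2017”:

// Look for the first word that parses as a date and treat it as the due date.
DateTime dueDate = DateTime.MinValue;
foreach (var word in text.Regions.SelectMany(r => r.Lines).SelectMany(l => l.Words))
{
    if (DateTime.TryParse(word.Text, out dueDate))
    {
        // A calendar reminder could be created here for dueDate.
        break;
    }
}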

App built using Xamarin and Microsoft Cognitive Services' Computer Vision API.

Wrapping Up

In this blog post, you learned how to use Microsoft Cognitive Services’ free Computer Vision API to build an app that tracks due dates from invoices using OCR. Microsoft Cognitive Services allows us to build smarter apps with powerful algorithms, such as facial recognition, in just a few lines of code. For more information about Microsoft Cognitive Services, check out the service portal for a full listing of the available APIs, along with documentation. For more information about OCR, download the InvoiceIt sample or visit the documentation.

3 comments


  • Robert Edyvean

    Looks great but getting “Package Microsoft.ProjectOxford.Vision.DotNetCore 1.1.0 is not compatible with netstandard2.0 – Microsoft.ProjectOxford.Vision.DotNetCore 1.1.0 supports: netcoreapp1.1 ” when trying to install the NuGet package. As far as I know netstandard2.0 would be the default for Xamarin development at the moment?

    • Stephen Spencer

      Did you ever get a response to this? I have the same problem….

  • Suraj Binorkar

    I registered for the 7-day free API and wanted to test it first, but I am getting a null value when I try to return the RecognizeTextAsync result. Let me share the details:

    OcrResults text;
    if (CrossMedia.Current.IsCameraAvailable)
    {
        photo = await CrossMedia.Current.TakePhotoAsync(new StoreCameraMediaOptions
        {
            Directory = "Invoices",
            Name = "Invoice.jpg"
        });
    }
    else
    {
        photo = await CrossMedia.Current.PickPhotoAsync();
    }
    var client = new VisionServiceClient("{api key}");
    using (var photoStream = photo.GetStream())
    {
        text = await client.RecognizeTextAsync(photoStream); // It's giving me a null value
    }

    Can you please suggest what the issue is? Am I doing something wrong? Please let me know. Thanks in advance 🙂
