{"id":2935,"date":"2017-05-12T19:11:22","date_gmt":"2017-05-12T19:11:22","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/?p=2935"},"modified":"2020-03-19T20:47:09","modified_gmt":"2020-03-20T03:47:09","slug":"food-classification-custom-vision-service","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/food-classification-custom-vision-service\/","title":{"rendered":"Food Classification with Custom Vision Service"},"content":{"rendered":"<h2>Background<\/h2>\n<p>Classification of a photo using machine learning tools can be challenging. Over the last year, significant improvements in the algorithms that power these tools have dramatically improved their efficacy. Developers are now able to create state-of-the-art\u00a0complex models using powerful tools like Microsoft Cognitive Toolkit, Tensorflow, Theano, Caffe, and others.<\/p>\n<p>With just a few clicks and no custom code, you can easily build your own predictive model using\u00a0<a href=\"http:\/\/CustomVision.ai\">CustomVision.ai<\/a>,\u00a0a new addition to\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/\">Microsoft Cognitive Services<\/a>. The Custom Vision Service makes image classification powered by Deep Neural Nets very accessible to developers. Let&#8217;s explore it!<\/p>\n<h3>Scenario<\/h3>\n<p>In our recent engagement with <a href=\"https:\/\/www.vectorform.com\/\">Vectorform<\/a>, we built a simple Android application that allows a user to obtain a\u00a0food&#8217;s nutritional values based on a photo of that food.\u00a0 To make this scenario simpler, we will assume that the target photo has either a single food item in it or that the user will indicate the food item in question.<\/p>\n<p>The main functionality of the app (image recognition) is powered by Custom Vision, where we will detect what the item is: for example, an apple or a tomato. 
Once we know what the food is, our goal of finding nutritional info from publicly available services is easy.<\/p>\n<h2>What is Custom Vision Service?<\/h2>\n<p>To put it simply, Custom Vision\u00a0is the younger sibling of Microsoft&#8217;s\u00a0<a href=\"https:\/\/www.microsoft.com\/cognitive-services\/en-us\/computer-vision-api\">Computer Vision API<\/a> with one\u00a0big difference: in Custom Vision, you can fine-tune a predictive model to the dataset at hand (hence the &#8220;Custom&#8221; part). Custom Vision\u00a0utilizes the concept of <a href=\"http:\/\/www.kdnuggets.com\/2015\/08\/recycling-deep-learning-representations-transfer-ml.html\">Transfer Learning<\/a>, a method in which a powerful pre-trained model (ResNet, AlexNet) is taught to pay more attention to the distinctive features seen in user-provided dataset and classes.<\/p>\n<h2>Obtaining\u00a0the data<\/h2>\n<p>Obtaining labeled data for our app took several hours.\u00a0 We wanted to use food pictures taken by real people (not stock photos) and to have 100-200 images per class.\u00a0 <a href=\"https:\/\/www.flickr.com\/\">Flickr<\/a> has quite a variety of client-side tools that allow image search and bulk download. Search results, however, were not 100% accurate and required a bit of cleanup (like removing a few photos of dogs and kids from a\u00a0dataset representing the food class &#8220;cupcake&#8221;).<\/p>\n<p>Once the data was downloaded (one image class per folder) it was split into train and test subsets. 
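To make the split concrete, here is a minimal sketch of this kind of partitioning (hypothetical folder names; the repository's prepareTrainTestImages.py is the script we actually used):

```python
import random
import shutil
from pathlib import Path

def split_dataset(src, dst, train_frac=0.75, seed=42):
    """Copy images from src/<class>/ into dst/train/<class>/ and dst/test/<class>/."""
    rng = random.Random(seed)
    for class_dir in Path(src).iterdir():
        if not class_dir.is_dir():
            continue
        images = sorted(class_dir.iterdir())
        rng.shuffle(images)  # randomize before splitting
        cut = int(len(images) * train_frac)
        for subset, files in (('train', images[:cut]), ('test', images[cut:])):
            out = Path(dst) / subset / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy(f, out / f.name)
```

With the default `train_frac=0.75`, this reproduces the 3/4 train, 1/4 test split used below.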
This is an optional step that you can take if you desire a more detailed evaluation of model performance.<\/p>\n<p>We created the script <a href=\"https:\/\/github.com\/CatalystCode\/Custom-Vison-Service\/blob\/master\/FoodClassification\/Scripts\/prepareTrainTestImages.py\"><em>prepareTrainTestImages.py<\/em><\/a>\u00a0 to partition the data into\u00a03\/4 for training and 1\/4 for testing.<\/p>\n<h2>Training the model<\/h2>\n<p>Now let&#8217;s go to <a href=\"http:\/\/CustomVision.ai\">CustomVision.ai<\/a>\u00a0and try training different food classification models. When doing this for the first time you may want to use <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/custom-vision-service\/getting-started-build-a-classifier\">this overview<\/a> as a\u00a0point of reference.<\/p>\n<p>We will try building a few models, starting with a super simple model with 5 classes, then expanding the model to classify 14 types of food. Next, we will see if grouping the classes helps improve prediction accuracy.<\/p>\n<h3>Tiny model<\/h3>\n<p>To get started, we&#8217;ll classify the following foods. For the purposes of this example, we&#8217;ll select foods that are visually distinct from each other:<\/p>\n<ul>\n<li>Apple<\/li>\n<li>Banana<\/li>\n<li>Cake<\/li>\n<li>Fries<\/li>\n<li>Sandwich<\/li>\n<\/ul>\n<p>Creating a classification model in Custom Vision is easy:<\/p>\n<ol>\n<li>Create a\u00a0name for the project<\/li>\n<li>Create a\u00a0new class and upload corresponding training images<\/li>\n<li>Repeat step #2 as needed<\/li>\n<li>Press the &#8220;Train&#8221; button<\/li>\n<li>Sit back and relax<\/li>\n<\/ol>\n<p>Once the model is trained, the statistics below are displayed in the Custom Vision\u00a0portal for each iteration. We had pretty good model performance out of the box, with both precision and recall exceeding 90%! Precision tells us what percentage of the model&#8217;s predictions for a given class were correct. 
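As a refresher on what these two metrics mean, here's a tiny sketch computing them from (true, predicted) label pairs (made-up data, not our actual results):

```python
def precision_recall(pairs, cls):
    """pairs: (true_label, predicted_label) tuples for a test set."""
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    predicted = sum(1 for _, p in pairs if p == cls)  # all predictions of cls
    actual = sum(1 for t, _ in pairs if t == cls)     # all true members of cls
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

pairs = [('apple', 'apple'), ('apple', 'cake'),
         ('cake', 'apple'), ('cake', 'cake'), ('cake', 'cake')]
print(precision_recall(pairs, 'apple'))  # -> (0.5, 0.5)
```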
Recall measures how much of a class the classifier can detect (what percentage of the apples in the test set were classified as such).<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-3357 size-medium\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/5classes_PR.png\" alt=\"Precision and recall for model\" width=\"300\" height=\"176\" \/><\/p>\n<p>Custom Vision\u00a0introduces\u00a0a \u201cProbability Threshold\u201d slider (we used the default value of 90%) that is used to calculate Precision and Recall. When interpreting the predictions you get a probability per tag; for example,\u00a0the probability that picture A contains an apple is 95%. \u00a0If the probability threshold is 90%, then this example counts as a &#8220;correct prediction&#8221;. \u00a0Depending on your application needs, you may want to set a higher or lower probability threshold.<\/p>\n<p>By going to the &#8220;Training Images&#8221; tab, you can view the images that confused the model. Some &#8220;cake&#8221; samples below were incorrectly classified as &#8220;sandwich&#8221;. And honestly, they do resemble a sandwich a bit.<img decoding=\"async\" class=\"aligncenter wp-image-3356 size-medium\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/cake_sandw.png\" alt=\"misclassified cake\" width=\"300\" height=\"300\" \/><\/p>\n<p>To get a more detailed analysis of how each class performs, we have created a script that sends test images to the model endpoint and evaluates the results. See <a href=\"https:\/\/github.com\/CatalystCode\/Custom-Vison-Service\/blob\/master\/FoodClassification\/Scripts\/evalWebservice.py\"><em>evalWebservice.py<\/em><\/a> (in the\u00a0<a href=\"https:\/\/github.com\/CatalystCode\/Custom-Vison-Service\/tree\/master\/FoodClassification\">GitHub repository<\/a>).<\/p>\n<p>The script expects a CSV file with two columns as input.\u00a0 Column 1 should have the path to the test image. 
Column 2 should have the name of the true class (matching the exact &#8220;tag&#8221; name in the Custom Vision\u00a0UI). Before running the script, modify your endpoint\/key accordingly.<\/p>\n<p>Once the script is done testing the endpoint, the following table is displayed:<img decoding=\"async\" class=\"aligncenter wp-image-3096\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/3.png\" alt=\"Confusion matrix for the model\" width=\"550\" height=\"421\" \/><\/p>\n<p>This visualization is called a\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Confusion_matrix\">Confusion Matrix<\/a>; items on the diagonal are cases where the model&#8217;s prediction is correct (that is, 95 images were predicted to be sandwiches and they were in fact sandwiches). In\u00a0our model, train and test images each contain one type of food, so when interpreting Custom Vision\u00a0predictions and building the Confusion Matrix we focus on the class with the highest probability.<\/p>\n<p>The numbers that are off the main diagonal show where the model made classification mistakes. 
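Building such a matrix from (true, predicted) pairs takes only a few lines; here is a sketch with made-up labels:

```python
from collections import Counter

def confusion_matrix(pairs, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(pairs)  # tally each (true, predicted) combination
    return [[counts[(t, p)] for p in classes] for t in classes]

classes = ['apple', 'banana', 'cake']
pairs = [('apple', 'apple'), ('banana', 'cake'), ('cake', 'cake')]
matrix = confusion_matrix(pairs, classes)
# matrix[i][j] counts images of classes[i] predicted as classes[j]
```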
Nine bananas were incorrectly classified as cakes, and seven cakes as sandwiches.\u00a0 Those mistakes give us good insight: the color and texture-rich variety of cakes makes it quite hard for the model to learn the requisite class features; as a result, this class is\u00a0easy to confuse with something else.<\/p>\n<p><img decoding=\"async\" class=\"wp-image-3097 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/4.png\" alt=\"Precision, recall, F1-Score\" width=\"459\" height=\"165\" \/><\/p>\n<p>In this console output of <a href=\"https:\/\/github.com\/CatalystCode\/Custom-Vison-Service\/blob\/master\/FoodClassification\/Scripts\/evalWebservice.py\"><em>evalWebservice.py<\/em><\/a>,\u00a0 we see that &#8220;apples&#8221; are the most correctly predicted food (with high precision, high recall, and thus a high <a href=\"https:\/\/en.wikipedia.org\/wiki\/F1_score\">F1-score<\/a>).\u00a0 &#8220;Cake&#8221; has the lowest precision: only 86% of &#8220;cake&#8221; predictions were correct (compared to the glorious 90%+ for other classes).<\/p>\n<h2>Bigger model<\/h2>\n<p>Our 5-class model did reasonably well and it took us only a\u00a0few minutes to train! 
\u00a0Most of the time was spent uploading the images.<\/p>\n<p>Let&#8217;s see what happens if we expand the model by adding nine more food classes.<\/p>\n<ul>\n<li><span style=\"color: #000000;\">Apple<\/span><\/li>\n<li><span style=\"color: #000000;\">Banana<\/span><\/li>\n<li><strong><span style=\"color: #0000ff;\">Bell pepper (new)<\/span><\/strong><\/li>\n<li><strong><span style=\"color: #0000ff;\">Burger (new)<\/span><\/strong><\/li>\n<li><span style=\"color: #000000;\">Cake<\/span><\/li>\n<li><strong><span style=\"color: #0000ff;\">Canned drinks (new)<\/span><\/strong><\/li>\n<li><strong><span style=\"color: #0000ff;\">Cupcake (new)<\/span><\/strong><\/li>\n<li><span style=\"color: #000000;\">Fries<\/span><\/li>\n<li><strong><span style=\"color: #0000ff;\">Green salad (new)<\/span><\/strong><\/li>\n<li><strong><span style=\"color: #0000ff;\">Ice cream (new)<\/span><\/strong><\/li>\n<li><strong><span style=\"color: #0000ff;\">Onion (new)<\/span><\/strong><\/li>\n<li><strong><span style=\"color: #0000ff;\">Pomegranate (new)<\/span><\/strong><\/li>\n<li><span style=\"color: #000000;\">Sandwich<\/span><\/li>\n<li><strong><span style=\"color: #0000ff;\">Tomato (new)<\/span><\/strong><\/li>\n<\/ul>\n<p>In our first iteration, we purposefully provided classes of food items with quite different appearances. Now, we will be providing some similar-looking foods to see how well the Custom Vision-based model can distinguish items like apples vs. tomatoes (both are round, shiny, often red objects), cakes vs. cupcakes vs. ice cream (colorful, varied textures), and sandwiches vs. burgers (different types of bread\/buns). Despite the variety of items this time, we&#8217;ll give Custom Vision\u00a0only 100-300 training images per class. Again, our images are mostly amateur photography and of varied quality. 
\u00a0For a\u00a0very simple model, you could try an even smaller amount of training data.<\/p>\n<p>Here are the results of training this model with 14 classes (ouch!):<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-3369 size-medium\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/5-1.png\" alt=\"Precision and recall\" width=\"300\" height=\"204\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>As the classifier is shown a larger variety of classes (some of which are very tricky!) it starts making more mistakes.<\/p>\n<p>Let&#8217;s look at the Confusion Matrix to understand where the classifier is making most of its mistakes.<\/p>\n<p><img decoding=\"async\" class=\"size-full wp-image-3100 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/6-1.png\" alt=\"\" width=\"1636\" height=\"1312\" \/><\/p>\n<p>As we can see &#8220;tomato&#8221; has a splendid recall (out of 99 tomatoes in test set, 98 were correctly classified). However, the precision of the &#8220;tomato&#8221; class is damaged by the addition of &#8220;bellpepper&#8221;. A similar situation exists with &#8220;sandwich&#8221; and &#8220;burger&#8221;. &#8220;Fries&#8221;, &#8220;greenSalad&#8221; and &#8220;cans&#8221;, however, are performing quite well.<\/p>\n<p>If we look at the columns of the Confusion Matrix we can see that the &#8220;iceCream&#8221; class is leading the board as the class that other items are most often confused with. We can see 24 bananas were incorrectly\u00a0classified as &#8220;iceCream&#8221; (do we have lots of white-yellow ice cream in the training set?), as well as 31 cakes and 44 cupcakes (probably because cupcakes and ice cream cones often have similar swirly tops).<\/p>\n<h2>&#8220;Layered&#8221; Model<\/h2>\n<p>Now let&#8217;s deal with some of the problematic classes. 
We&#8217;ll do this by creating a layered model, in which we use two models to address the problematic classes.<\/p>\n<p>We&#8217;ll let the uber-model\u00a0determine if something falls into the general sandwich-burger food type. Then we&#8217;ll make an additional call to a specialized small model that was trained only on 2 classes: #1 Sandwich, and #2 Burger. We&#8217;ll do the same with the tomato-bell pepper case.<\/p>\n<p><img decoding=\"async\" class=\"wp-image-3102 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/7.png\" alt=\"Layered model architecture\" width=\"625\" height=\"493\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s train the uber-model first (where we combined several foods together).<\/p>\n<p><em>Note: we&#8217;re using the same train and test images, and just grouping them differently.<\/em><\/p>\n<p>After training, we see a definite boost in overall Precision and Recall for the model: 8% in precision and 4.5% in recall.<\/p>\n<p>Now let&#8217;s train the Layer 2 models for tomatoes vs. bell peppers and sandwiches vs. burgers and see how they perform.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-3104\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/8.png\" alt=\"Confusion matrix\" width=\"1134\" height=\"941\" \/><\/p>\n<p>There is definitely an improvement!<\/p>\n<p>Previously, most of the bell peppers (36 out of 49) were classified as tomatoes and 0 bell peppers were classified correctly. In the layered model, we are starting to get correct predictions for bell peppers. 
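At prediction time, this layering boils down to a simple dispatch: call the uber-model first, and if it returns one of the combined classes, call the corresponding specialized model. A sketch (hypothetical tag names and endpoint placeholders; `predict` stands in for an HTTP call to a Custom Vision prediction endpoint):

```python
# Combined layer-1 tags mapped to their specialized layer-2 endpoints
# (hypothetical names and placeholder URLs)
LAYER2_ENDPOINTS = {
    'belltomato': 'https://<layer2-tomato-vs-bellpepper-endpoint>',
    'sandwichburger': 'https://<layer2-sandwich-vs-burger-endpoint>',
}
UBER_ENDPOINT = 'https://<layer1-uber-model-endpoint>'

def classify_layered(image, predict):
    """predict(endpoint, image) -> (tag, probability) of the top prediction."""
    tag, prob = predict(UBER_ENDPOINT, image)
    if tag in LAYER2_ENDPOINTS:
        # Refine the combined class with the specialized layer-2 model.
        tag, prob = predict(LAYER2_ENDPOINTS[tag], image)
    return tag, prob
```

Classes the uber-model already handles on its own (e.g. fries) pass through unchanged; only the combined classes trigger a second call.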
\u00a0It&#8217;s quite fascinating, especially if we remember that we&#8217;re training on the same set of images!<\/p>\n<p>As an observation, since the uber-model in Layer 1 has combined the classes of tomatoes and bell peppers (mostly red, somewhat round, glossy objects), some of the &#8220;apples&#8221; started to get misclassified as the class &#8220;belltomato&#8221;. That&#8217;s understandable, as a red apple does look similar to a tomato. Adding more data to the classes or maybe creating a &#8220;belltomatapple&#8221; class may be helpful in the next iteration.<\/p>\n<p>With sandwiches and burgers, there is also a positive trend. In the original iteration, zero burger predictions were made, and pretty much all burgers were incorrectly classified as sandwiches. In the layered model, we had 39 correct burger predictions.<\/p>\n<p>A layered model approach is only one of many options to consider for improving the model&#8217;s performance. Increasing the number of training images and further refining image quality are definitely worth trying as well.<\/p>\n<h2 id=\"toc_0\">Android App<\/h2>\n<p>To demonstrate the full power of the Custom Vision\u00a0classification we built, we created a mobile application on Android that would capture a food-related image, hit our endpoint to determine what food is pictured in the image, use the Nutritionix service to get nutritional information about that food, then display the results to the user.<\/p>\n<p>The app provides several ways to capture an image for use:<\/p>\n<ol>\n<li>Use an image URL<\/li>\n<li>Capture a photo with the phone&#8217;s camera<\/li>\n<li>Choose a photo from the phone&#8217;s gallery<\/li>\n<\/ol>\n<p>Once we added support for classifying food items via the Custom Vision\u00a0endpoint in the Android application, we used a nutrition API to retrieve relevant nutritional information for the food identified by the Custom Vision\u00a0endpoint. 
\u00a0In this demo, we leveraged the <a href=\"https:\/\/www.nutritionix.com\/business\/api\">Nutritionix Nutrition API<\/a>.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/app_home_screenshot.png\" alt=\"App Home\" width=\"293\" height=\"521\" \/><\/p>\n<p>In the following sections, we will describe and show the code for the integration\/consumption of the Custom Vision\u00a0endpoint.<\/p>\n<h3 id=\"toc_1\">Code<\/h3>\n<p>Below is an example of how you can call Microsoft\u2019s Custom Vision\u00a0using Java in an Android app.<\/p>\n<p>The snippet below requires an API endpoint from the Microsoft Custom Vision Service.<\/p>\n<h3 id=\"toc_3\">AndroidManifest.xml<\/h3>\n<p>Here we declare the required user permissions, application activities, and application services.<\/p>\n<pre class=\"lang:default decode:true \">&lt;?xml version=\"1.0\" encoding=\"utf-8\"?&gt;\r\n&lt;manifest xmlns:android=\"http:\/\/schemas.android.com\/apk\/res\/android\"\r\n    package=\"com.claudiusmbemba.irisdemo\"&gt;\r\n\r\n    &lt;uses-feature\r\n        android:name=\"android.hardware.camera\"\r\n        android:required=\"true\" \/&gt;\r\n\r\n    &lt;uses-permission android:name=\"android.permission.INTERNET\" \/&gt;\r\n    &lt;uses-permission android:name=\"android.permission.ACCESS_NETWORK_STATE\" \/&gt;\r\n    &lt;uses-permission android:name=\"android.permission.CAMERA\" \/&gt;\r\n    &lt;uses-permission android:name=\"android.permission.WRITE_EXTERNAL_STORAGE\" \/&gt;\r\n\r\n    &lt;application\r\n        android:allowBackup=\"true\"\r\n        android:icon=\"@mipmap\/ic_launcher\"\r\n        android:label=\"@string\/app_name\"\r\n        android:roundIcon=\"@mipmap\/ic_launcher_round\"\r\n        android:supportsRtl=\"true\"\r\n        android:theme=\"@style\/AppTheme\"&gt;\r\n        &lt;activity\r\n            android:name=\".MainActivity\"\r\n            android:label=\"@string\/app_name\"\r\n  
          android:theme=\"@style\/AppTheme.NoActionBar\"&gt;\r\n            &lt;intent-filter&gt;\r\n                &lt;action android:name=\"android.intent.action.MAIN\" \/&gt;\r\n\r\n                &lt;category android:name=\"android.intent.category.LAUNCHER\" \/&gt;\r\n            &lt;\/intent-filter&gt;\r\n        &lt;\/activity&gt;\r\n\r\n        &lt;service\r\n            android:name=\".services.IrisService\"\r\n            android:exported=\"false\" \/&gt;\r\n\r\n        &lt;activity android:name=\".NutritionActivity\" \/&gt;\r\n\r\n        &lt;service\r\n            android:name=\".services.NutritionixService\"\r\n            android:exported=\"false\" \/&gt;\r\n    &lt;\/application&gt;\r\n\r\n&lt;\/manifest&gt;<\/pre>\n<h3>Main.java<\/h3>\n<p>At the top of our Main file, we declare our Custom Vision\u00a0endpoint as well as the Nutritionix endpoint (using a custom string formatter)<\/p>\n<pre class=\"lang:default decode:true\"># Variable declarations\r\n    ... (removed for brevity)\r\n    \/\/TODO: CHANGE ME!!\r\n    private final String ENDPOINT = \"your-cvs-endpoint-url\";\r\n    private final String NUTRI_ENDPOINT = \"https:\/\/api.nutritionix.com\/v1_1\/search\/%s\";\r\n    public static final String FOOD_RESULT = \"FOOD_RESULT\";\r\n    public static final String NUTRITION_RESULT = \"NUTRITION_RESULT\";\r\n    public static final String IRIS_REQUEST = \"IRIS_REQUEST\";\r\n    ... 
(removed for brevity)<\/pre>\n<h3>BroadcastReceivers<\/h3>\n<p>In order to get the results from our Async Background API calls, BroadcastReceivers are created and configured for each service (Custom Vision\u00a0API &amp; Nutritionix API).<\/p>\n<pre class=\"lang:default decode:true\"> private BroadcastReceiver irisReceiver = new BroadcastReceiver() {\r\n        @Override\r\n        public void onReceive(Context context, final Intent intent) {\r\n\r\n            runOnUiThread(new Runnable() {\r\n                @Override\r\n                public void run() {\r\n                    if (intent.getExtras().containsKey(IrisService.IRIS_SERVICE_ERROR)) {\r\n                        String msg = intent.getStringExtra(IrisService.IRIS_SERVICE_ERROR);\r\n                        resultTV.setText(msg);\r\n                        Toast.makeText(getApplicationContext(), msg, Toast.LENGTH_SHORT).show();\r\n                    } else if (intent.getExtras().containsKey(IrisService.IRIS_SERVICE_PAYLOAD)) {\r\n                        IrisData irisData = (IrisData) intent\r\n                                .getParcelableExtra(IrisService.IRIS_SERVICE_PAYLOAD);\r\n                        food_result = irisData.getClassifications().get(0);\r\n                        clearText();\r\n                        String msg = String.format(\"I'm %.0f%% confident that this is a %s \\n\", food_result.getProbability() * 100, food_result.getClass_());\r\n                        resultTV.append(msg);\r\n\r\n                        for (int i = 0; i &lt; irisData.getClassifications().size(); i++) {\r\n                            Log.i(TAG, \"onReceive: \" + irisData.getClassifications().get(i).getClass_());\r\n                        }\r\n                        requestNutritionInfo();\r\n                    }\r\n                }\r\n            });\r\n\r\n        }\r\n    };\r\n\r\n    private BroadcastReceiver nutritionixReceiver = new BroadcastReceiver() {\r\n        @Override\r\n        public 
void onReceive(Context context, Intent intent) {\r\n            if (intent.getExtras().containsKey(NutritionixService.NUTRITION_SERVICE_ERROR)) {\r\n                String msg = intent.getStringExtra(NutritionixService.NUTRITION_SERVICE_ERROR);\r\n                Toast.makeText(getApplicationContext(), msg, Toast.LENGTH_SHORT).show();\r\n            } else if (intent.getExtras().containsKey(NutritionixService.NUTRITION_SERVICE_PAYLOAD)) {\r\n                NutritionixData results = (NutritionixData) intent.getParcelableExtra(NutritionixService.NUTRITION_SERVICE_PAYLOAD);\r\n                nutritionixHit = results.getHits().get(0);\r\n                nutritionButton.setEnabled(true);\r\n            }\r\n        }\r\n    };<\/pre>\n<h3>Hitting the Custom Vision Endpoint<\/h3>\n<p>In order to call the Custom Vision\u00a0Endpoint, we build a RequestPackage either with the image URL provided or the device image, converted to a byteArray, and then set the request method to &#8220;POST&#8221;.<\/p>\n<p>After packaging that RequestPackage object into our intent, which we constructed from the Custom Vision\u00a0Class (<em>IrisService.class<\/em>), we can then start the service. 
When the result from that API call is returned, it will be received by the <code>irisReceiver<\/code> BroadcastReceiver mentioned above.<\/p>\n<pre class=\"lang:default decode:true\">  private void requestIrisService(final String type) {\r\n\r\n        final Bitmap croppedImage = image.getCroppedImage();\r\n\r\n        AsyncTask.execute(new Runnable() {\r\n            @Override\r\n            public void run() {\r\n                RequestPackage requestPackage = new RequestPackage();\r\n                Intent intent = new Intent(MainActivity.this, IrisService.class);\r\n                requestPackage.setParam(IRIS_REQUEST, \"IRIS\");\r\n\r\n                if (type.equals(URL)) {\r\n                    requestPackage.setEndPoint(String.format(ENDPOINT, URL));\r\n                    requestPackage.setParam(\"Url\", urlText.getText().toString());\r\n                } else if (type.equals(IMAGE)) {\r\n                    ByteArrayOutputStream stream = new ByteArrayOutputStream();\r\n                    croppedImage.compress(Bitmap.CompressFormat.JPEG, 50, stream);\r\n                    byte[] byteArray = stream.toByteArray();\r\n                    Log.d(TAG, \"requestIrisService: byte array size = \" + byteArray.length);\r\n                    requestPackage.setEndPoint(String.format(ENDPOINT, IMAGE));\r\n                    intent.putExtra(IrisService.REQUEST_IMAGE, byteArray);\r\n                }\r\n\r\n                requestPackage.setMethod(\"POST\");\r\n                intent.putExtra(IrisService.REQUEST_PACKAGE, requestPackage);\r\n\r\n                try {\r\n                    startService(intent);\r\n                } catch (Exception e) {\r\n                    runOnUiThread(new Runnable() {\r\n                        @Override\r\n                        public void run() {\r\n                            resultTV.setVisibility(View.GONE);\r\n                            Toast.makeText(getApplicationContext(), \"Image too large.\", 
Toast.LENGTH_LONG).show();\r\n                        }\r\n                    });\r\n\r\n                    e.printStackTrace();\r\n                }\r\n            }\r\n        });\r\n    }<\/pre>\n<h3>Helpers\/HttpHelper.java<\/h3>\n<h4 id=\"toc_8\">Making the HTTP Request<\/h4>\n<p>The static helper method\u00a0<code>makeRequest()<\/code>\u00a0below is called by the service class <code>IrisService.class<\/code>, which passes in the RequestPackage mentioned above and an optional InputStream (if an image is used).<\/p>\n<p>In order to successfully make the HTTP request,\u00a0an <code>OkHttpClient<\/code> client object is constructed and a <code>Request.Builder<\/code> is configured with the Custom Vision\u00a0predictionKey passed as a request header. Then, the client is used to execute the request. On success, the method returns the response body as a string; otherwise, it throws an exception.<\/p>\n<pre class=\"lang:default decode:true\">public static String makeRequest(RequestPackage requestPackage, InputStream data)\r\n            throws Exception {\r\n\r\n        String address = requestPackage.getEndpoint();\r\n\r\n        OkHttpClient client = new OkHttpClient();\r\n\r\n        Request.Builder requestBuilder = new Request.Builder();\r\n\r\n        iris = (requestPackage.getParams().containsKey(MainActivity.IRIS_REQUEST)) ? 
true : false;\r\n\r\n        if (requestPackage.getMethod().equals(\"POST\")) {\r\n            RequestBody requestBody = null;\r\n            if (iris) {\r\n                \/\/TODO: CHANGE ME!!\r\n                requestBuilder.addHeader(\"Prediction-Key\",\"a5427...\");\r\n                if (requestPackage.getParams().containsKey(\"Url\")) {\r\n                    requestBuilder.addHeader(\"Content-Type\",\"application\/json\");\r\n                    JSONObject json = new JSONObject(requestPackage.getParams());\r\n                    requestBody = RequestBody.create(MediaType.parse(\"application\/json; charset=utf-8\"), String.valueOf(json));\r\n                } else {\r\n                    if (data != null) {\r\n                        requestBuilder.addHeader(\"Content-Type\",\"application\/octet-stream\");\r\n                        requestBody = RequestBodyUtil.create(MediaType.parse(\"application\/octet-stream; charset=utf-8\"), data);\r\n                    } else {\r\n                        throw new Exception(\"No image data found\");\r\n                    }\r\n                }\r\n            } else {\r\n                MultipartBody.Builder builder = new MultipartBody.Builder()\r\n                        .setType(MultipartBody.FORM);\r\n                Map&lt;String, String&gt; params = requestPackage.getParams();\r\n                for (String key : params.keySet()) {\r\n                    builder.addFormDataPart(key, params.get(key));\r\n                }\r\n                requestBody = builder.build();\r\n            }\r\n            requestBuilder.method(\"POST\", requestBody);\r\n        } else if (requestPackage.getMethod().equals(\"GET\")) {\r\n            address = String.format(\"%s?%s\", address, requestPackage.getEncodedParams());\r\n        }\r\n\r\n        requestBuilder.url(address);\r\n\r\n        Request request = requestBuilder.build();\r\n        Response response = client.newCall(request).execute();\r\n        if 
(response.isSuccessful()) {\r\n            return response.body().string();\r\n        } else {\r\n            throw new IOException(\"Exception: response code \" + response.code());\r\n        }\r\n    }<\/pre>\n<h2>Screenshots<\/h2>\n<p><img decoding=\"async\" class=\"alignleft\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/app_thinking_screenshot.png\" alt=\"Making CVS Request\" width=\"211\" height=\"375\" \/><\/p>\n<p><img decoding=\"async\" class=\"alignleft\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/app_result_screenshot.png\" alt=\"CVS Result\" width=\"211\" height=\"375\" \/><\/p>\n<p><img decoding=\"async\" class=\"alignleft\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2017\/05\/app_cropper_screenshot.png\" alt=\"Cropping Feature\" width=\"211\" height=\"375\" \/><\/p>\n<h2 id=\"toc_10\"><\/h2>\n<h2 id=\"toc_10\">Conclusions<\/h2>\n<p>Custom Vision Service brings domain-specific Deep Neural Network-powered image recognition to your fingertips. Building a quick proof of concept app with a handful of classes is very simple. You can also easily make a prediction endpoint to experiment with, which works well in domains where the\u00a0number of classes is finite and the visual appearances of those classes are distinct. Simplicity comes with a\u00a0cost, however, as you can customize the training data, but not the algorithms. Additionally, as we&#8217;ve seen in this post, closely related classes need to be specifically addressed with techniques like layered models.<\/p>\n<p>Nevertheless, Custom Vision\u00a0is suitable for a broad range of domains. 
For example, you could use this technique to detect\u00a0items in a customer&#8217;s online cart, recognize UI elements, pre-filter images into categories to simplify further analyses, and so on.<\/p>\n<h2>Additional Information<\/h2>\n<p>The code for this example is available in our <a href=\"https:\/\/github.com\/CatalystCode\/Custom-Vison-Service\/tree\/master\/FoodClassification\">GitHub repository<\/a>.<\/p>\n<p>Custom Vision Service official documentation, overview and tutorials \u00a0are <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/custom-vision-service\/home\">here<\/a>.<\/p>\n<hr \/>\n<p>Cover image from <a href=\"https:\/\/unsplash.com\/collections\/599230\/phone-and-food\">Unsplash<\/a>, used under <a href=\"https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\">CC0 license<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How we created an app to identify foods and their nutritional content by using the new Custom Vision Service to leverage domain-specific image recognition powered by DNNs.<\/p>\n","protected":false},"author":21373,"featured_media":10981,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[19],"tags":[42,139,206,250,259],"class_list":["post-2935","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-android","tag-custom-vision-service","tag-image","tag-microsoft-cognitive-services","tag-mobile"],"acf":[],"blog_post_summary":"<p>How we created an app to identify foods and their nutritional content by using the new Custom Vision Service to leverage domain-specific image recognition powered by 
DNNs.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2935","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21373"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2935"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2935\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/10981"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2935"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2935"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2935"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}