{"id":3062,"date":"2023-02-09T14:06:45","date_gmt":"2023-02-09T22:06:45","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3062"},"modified":"2023-12-20T10:11:08","modified_gmt":"2023-12-20T18:11:08","slug":"onnx-machine-learning-2","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/onnx-machine-learning-2\/","title":{"rendered":"ONNX runtime inputs and outputs"},"content":{"rendered":"<p>\n  Hello Android developers,\n<\/p>\n<p>\n  Last week we got an <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/onnx-machine-learning-1\/\">ONNX runtime demo running on Android<\/a>, which classified the subject of images being streamed from the device\u2019s camera. Setup required downloading a pre-trained model and adding it to the sample app on GitHub. This week we\u2019re going to look into the details of preparing inputs for the model, following the sample app\u2019s code.\n<\/p>\n<h2>Model inputs<\/h2>\n<p>\n  Pre-trained models in formats that can be shared across platforms are incredibly powerful, but each model needs its inputs supplied in a known and repeatable way to produce accurate results. Models typically specify the expected format of their input parameters.\n<\/p>\n<p>\n  For example, on the information page for the <a href=\"https:\/\/pytorch.org\/hub\/pytorch_vision_mobilenet_v2\/\">MobileNet V2 model<\/a> (from last week&#8217;s sample), you\u2019ll find the following information along with sample Python code that shows how to pre-process image data before sending it to the model.\n<\/p>\n<blockquote><p>All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape\u00a0(3 x H x W), where\u00a0H\u00a0and\u00a0W\u00a0are expected to be at least\u00a0224. 
The images have to be loaded in to a range of\u00a0[0, 1]\u00a0and then normalized using\u00a0mean = [0.485, 0.456, 0.406]\u00a0and\u00a0std = [0.229, 0.224, 0.225].<\/p><\/blockquote>\n<p>\n  You can use a <a href=\"https:\/\/colab.research.google.com\/github\/pytorch\/pytorch.github.io\/blob\/master\/assets\/hub\/pytorch_vision_mobilenet_v2.ipynb\">Python notebook to interactively step through<\/a> the pre-processing code and test the model, for example with this sample image input:\n<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/02\/onnx-python.png\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/02\/onnx-python.png\" alt=\"\" width=\"990\" height=\"307\" class=\"alignnone size-full wp-image-3065\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/02\/onnx-python.png 990w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/02\/onnx-python-300x93.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/02\/onnx-python-768x238.png 768w\" sizes=\"(max-width: 990px) 100vw, 990px\" \/><\/a><\/p>\n<p>\n  Many clients (including mobile apps on Android) can\u2019t easily re-use the Python code provided and will have to implement their own pre-processing in a native language like Java or Kotlin. 
The following sections show examples of how the Python code can be adapted to interact with an ONNX model on Android using Kotlin.\n<\/p>\n<h2>Image input formatting on Android<\/h2>\n<p>\n  The sample does its image pre-processing in the <code>analyze<\/code> function in <a href=\"https:\/\/github.com\/microsoft\/onnxruntime-inference-examples\/blob\/f6a8f370333d547023c8f2fee4a42627d0f67a14\/mobile\/examples\/image_classification\/android\/app\/src\/main\/java\/ai\/onnxruntime\/example\/imageclassifier\/ORTAnalyzer.kt#L75\">ORTAnalyzer.kt<\/a>.\n<\/p>\n<p>\n  The first step is resizing to the required dimensions (224&#215;224) with this function call:\n<\/p>\n<pre>Bitmap.createScaledBitmap(it, 224, 224, false)<\/pre>\n<p>\n  Further manipulation is done in the <code>preProcess<\/code> function which is defined in <a href=\"https:\/\/github.com\/microsoft\/onnxruntime-inference-examples\/blob\/f6a8f370333d547023c8f2fee4a42627d0f67a14\/mobile\/examples\/image_classification\/android\/app\/src\/main\/java\/ai\/onnxruntime\/example\/imageclassifier\/ImageUtil.kt#L29\">ImageUtil.kt<\/a>. 
In this code snippet you can see the normalization process on each pixel that matches the parameters specified for the model:\n<\/p>\n<pre>bitmap.getPixels(bmpData, 0, bitmap.width, 0, 0, bitmap.width, bitmap.height)\r\nfor (i in 0..IMAGE_SIZE_X - 1) {\r\n    for (j in 0..IMAGE_SIZE_Y - 1) {\r\n        val idx = IMAGE_SIZE_Y * i + j\r\n        val pixelValue = bmpData[idx]\r\n        imgData.put(idx, (((pixelValue shr 16 and 0xFF) \/ 255f - 0.485f) \/ 0.229f))\r\n        imgData.put(idx + stride, (((pixelValue shr 8 and 0xFF) \/ 255f - 0.456f) \/ 0.224f))\r\n        imgData.put(idx + stride * 2, (((pixelValue and 0xFF) \/ 255f - 0.406f) \/ 0.225f))\r\n    }\r\n}<\/pre>\n<p>\n  You\u2019ll find some additional image processing code for bitmap conversion to the correct format for ONNX in the <strong><a href=\"https:\/\/github.com\/microsoft\/onnxruntime-inference-examples\/blob\/f6a8f370333d547023c8f2fee4a42627d0f67a14\/mobile\/examples\/image_classification\/android\/app\/src\/main\/java\/ai\/onnxruntime\/example\/imageclassifier\/ImageUtil.kt\">ImageUtil.kt<\/a><\/strong> file.\n<\/p>\n<h2>Output parsing<\/h2>\n<p>\n  The <a href=\"https:\/\/pytorch.org\/hub\/pytorch_vision_mobilenet_v2\/\">model page<\/a> sample code also contains information about how to parse the output:\n<\/p>\n<blockquote><p># Tensor of shape 1000, with confidence scores over Imagenet&#8217;s 1000 classes\n<br\/># The output has unnormalized scores. To get probabilities, you can run a softmax on it.<\/p><\/blockquote>\n<p>\n  This means the result is a thousand \u2018confidence scores\u2019, one for each possible classification. The model result doesn\u2019t include the text descriptions of each possible classification, so as a consumer of the model you need to separately download the <a href=\"https:\/\/raw.githubusercontent.com\/pytorch\/hub\/master\/imagenet_classes.txt\">classes list<\/a> and match them up with the scores. 
\n<\/p>\n<p>\n  You can find the code that extracts the scores and runs them through the custom functions <code>softMax<\/code> and <code>getTop3<\/code> to get the indices of the highest scoring classifications in <strong><a href=\"https:\/\/github.com\/microsoft\/onnxruntime-inference-examples\/blob\/f6a8f370333d547023c8f2fee4a42627d0f67a14\/mobile\/examples\/image_classification\/android\/app\/src\/main\/java\/ai\/onnxruntime\/example\/imageclassifier\/ORTAnalyzer.kt#L96\">ORTAnalyzer.kt<\/a><\/strong>:\n<\/p>\n<pre>val rawOutput = ((output?.get(0)?.value) as Array&lt;FloatArray&gt;)[0]\r\nval probabilities = softMax(rawOutput)\r\nresult.detectedIndices = getTop3(probabilities)\r\n<\/pre>\n<p>\n  To display the results with the correct labels, the downloaded classes file (placed in the raw resources directory) is parsed in <strong><a href=\"https:\/\/github.com\/microsoft\/onnxruntime-inference-examples\/blob\/f6a8f370333d547023c8f2fee4a42627d0f67a14\/mobile\/examples\/image_classification\/android\/app\/src\/main\/java\/ai\/onnxruntime\/example\/imageclassifier\/MainActivity.kt#L146\">MainActivity.kt<\/a><\/strong>:\n<\/p>\n<pre>resources.openRawResource(R.raw.imagenet_classes).bufferedReader().readLines()<\/pre>\n<p>\n  When the results are displayed, the indices of the highest scoring classifications are used to show the correct text values from the list:\n<\/p>\n<pre>detected_item_1.text = labelData[result.detectedIndices[0]]\r\ndetected_item_value_1.text = \"%.2f%%\".format(result.detectedScore[0] * 100)<\/pre>\n<p>\n  Any model you decide to add to your apps will have its own input parameter formatting requirements and output parsing rules, so remember to check the model\u2019s documentation while implementing. \n<\/p>\n<p>\n  Not all models have images as their input or collections of data as the output. 
For a simpler example, this <a href=\"https:\/\/github.com\/shubham0204\/Scikit_Learn_Android_Demo\/blob\/a1dbeec675892728524064792134864e590e497b\/app\/src\/main\/java\/com\/mobileml\/shubham0204\/scikitlearndemo\/MainActivity.kt#L48\">linear regression sample on GitHub<\/a> shows how to provide a simple numerical value to a model which returns a single numerical result (check the <a href=\"https:\/\/towardsdatascience.com\/deploying-scikit-learn-models-in-android-apps-with-onnx-b3adabe16bab\">associated blog<\/a> for instructions to build, convert, and place the model in the Android code).\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  More information about the ONNX Runtime is available at\u00a0<a href=\"https:\/\/onnxruntime.ai\/\" target=\"_blank\" rel=\"noopener\">onnxruntime.ai<\/a>\u00a0and also on\u00a0<a href=\"https:\/\/www.youtube.com\/onnxruntime\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>.\n<\/p>\n<p>\n  If you have any questions about applying machine learning, or would like to tell us about your apps, use the\u00a0<a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\" target=\"_blank\" rel=\"noopener\">feedback forum<\/a>\u00a0or message us on\u00a0<a href=\"https:\/\/twitter.com\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There won\u2019t be a livestream this week, but check out the\u00a0<a href=\"https:\/\/youtube.com\/c\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">archives on YouTube<\/a>. We\u2019ll see you online again soon!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello Android developers, Last week we got an ONNX runtime demo running on Android, which classified the subject of images being streamed from the device\u2019s camera. Setup required downloading a pre-trained model and adding it to the sample app on GitHub. 
This week we\u2019re going to look into the details of preparing inputs for the [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3065,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[740],"tags":[473,729,728],"class_list":["post-3062","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-kotlin","tag-machine-learning","tag-onnx"],"acf":[],"blog_post_summary":"<p>Hello Android developers, Last week we got an ONNX runtime demo running on Android, which classified the subject of images being streamed from the device\u2019s camera. Setup required downloading a pre-trained model and adding it to the sample app on GitHub. This week we\u2019re going to look into the details of preparing inputs for the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3062","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3062"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3062\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3065"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3062"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3062"},{"taxon
omy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3062"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}