{"id":2175,"date":"2015-07-21T16:34:28","date_gmt":"2015-07-21T23:34:28","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/index.php\/2015\/07\/21\/using-camera-stream-for-real-time-object-tracking-in-windows-apps\/"},"modified":"2020-03-18T23:20:51","modified_gmt":"2020-03-19T06:20:51","slug":"using-camera-stream-for-real-time-object-tracking-in-windows-apps","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/using-camera-stream-for-real-time-object-tracking-in-windows-apps\/","title":{"rendered":"Using camera stream for real time object tracking in Windows apps"},"content":{"rendered":"<h1 id=\"summary\">Summary<\/h1>\n<p>The imaging and computing capabilities in commodity mobile devices provide exciting possibilities for computer vision based solutions. While some computer vision scenarios are purpose-built from the get-go with mobile device usage in mind, we have also witnessed several scenarios where existing legacy solutions could benefit from replacing the often expensive and immobile hardware (camera and computing) setup with modern smartphones and their imaging capabilities.<\/p>\n<p>As an example of this, we ran a pilot with a leading sports brand to enable tracking and analyzing object movement \u2013 consider a ball leaving a bat or a club at arbitrary launch angles, high velocity and varying spin rates.<\/p>\n<p>The partner wanted to explore how well modern smartphones would suit their needs and enable moving away from the usage patterns and restrictions caused by their existing imaging hardware and analytics pipeline.<\/p>\n<p>As a byproduct of the pilot, we identified reusable patterns and ways to avoid common pitfalls when implementing object tracking in Windows apps. 
This case study, the referenced blog and sample code will introduce our key findings and help the reader identify the best approaches to object detection and tracking to use in their Windows apps.<\/p>\n<h1 id=\"solution\">Solution<\/h1>\n<p>The scenario we faced could be broken down into this pipeline:<\/p>\n<ol>\n<li>Capture video data from a Windows phone<\/li>\n<li>Understand the data format provided to us<\/li>\n<li>Identify the object we\u2019re interested in tracking from the video data<\/li>\n<li>Lock and track the object while it blazes through the field of view<\/li>\n<li>Analyze what actually happened to the object<\/li>\n<li>Don\u2019t mess up the performance during steps 1-5. Easy.<\/li>\n<\/ol>\n<h2 id=\"capture-video-data-from-a-windows-device\">Capture video data from a Windows device<\/h2>\n<p>Capturing video frames in Windows apps is well documented in the Windows platform documentation. Use these MSDN resources to understand how to get the video data:<\/p>\n<p><a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/windows\/apps\/xaml\/Dn642092\">Quickstart: Capturing video by using the MediaCapture API<\/a><\/p>\n<p><a href=\"https:\/\/code.msdn.microsoft.com\/windowsapps\/Media-Capture-Sample-adf87622\">Media capture using capture device sample app<\/a><\/p>\n<h2 id=\"understand-the-data-format\">Understand the data format<\/h2>\n<p>Before processing the data, we must first understand the data characteristics (format and frequency). 
Here is a sample of the image data that we are processing.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2015-07-21-Using-camera-stream-for-real-time-object-tracking-in-Windows-apps_images-image001.png\" alt=\"Target object locked\" \/><\/p>\n<p><em>A four pixel section in NV12 image data &#8211; location of the bytes in NV12 byte array A, where h is the height of the image and w is the width of the image (in bytes).<\/em><\/p>\n<p>In the first part of Tomi Paananen\u2019s blog posts, he <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=361\">introduces two common YUV color space formats<\/a> and then dives into details on how to work with NV12 format, which is a conventional data type for smartphone cameras.<\/p>\n<h2 id=\"identify-the-object-were-interested-in-tracking-from-the-video-data\">Identify the object we\u2019re interested in tracking from the video data<\/h2>\n<p>Now that we understand the type of data provided to us, it\u2019s time to think about how to identify the objects we\u2019re looking for from the video data.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2015-07-21-Using-camera-stream-for-real-time-object-tracking-in-Windows-apps_images-image002.png\" alt=\"Object motion captured\" \/><\/p>\n<p>Red areas filtered out from a video frame using the problem solving pipeline described below.<\/p>\n<p>The problem solving pipeline described below is broken down in detail in a blog post that explains approaches to <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=481\">detecting areas of similar pixels using threshold values and applying versions of the convex hull algorithm to ease detection of objects<\/a>.<\/p>\n<table>\n<thead>\n<tr>\n<th>What is the result?<\/th>\n<th>How do we get there?<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>An object, with a specific shape, identified in the image 
(location, size, shape etc.)<\/td>\n<td>Find the center of mass of the two dimensional object and its boundary.<\/td>\n<\/tr>\n<tr>\n<td>Boundary of the detected object.<\/td>\n<td>Apply convex hull algorithm to object map (extended binary image, where the background is removed).<\/td>\n<\/tr>\n<tr>\n<td>Object map, where all significant (suspected) objects are separated and the background is removed.<\/td>\n<td>Individualize objects from a binary image.<\/td>\n<\/tr>\n<tr>\n<td>Binary image in which objects have value 1 and background value 0.<\/td>\n<td>Apply algorithm to extract objects with some criteria from the background. E.g. chroma filter.<\/td>\n<\/tr>\n<tr>\n<td>Chroma filter implementation to extract objects from image.<\/td>\n<td>Start coding!<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2 id=\"lock-and-track-the-object-while-it-moves-through-the-field-of-view\">Lock and track the object while it moves through the field of view<\/h2>\n<p>Since our object tracking solution is required to provide real-time analysis while running on (Windows) mobile devices, we need to be careful about which algorithms are used and which set of pixels from a frame they are applied to.<\/p>\n<p>Instead of applying all methods described in the previous section on each analyzed frame, our sample implementation provides a way to lock the object of interest and, in following frames, apply chroma filtering to only the small subset of frame pixels which contain the object we want to track.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2015-07-21-Using-camera-stream-for-real-time-object-tracking-in-Windows-apps_images-image003.png\" alt=\"VaatturiLockedToObjectScaled\" \/><\/p>\n<p>Target object locked, and tracking limited to the region marked by the green rectangle<\/p>\n<p>This <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=581\">blog post explains the object locking and partial pixel map 
tracking<\/a> in more detail.<\/p>\n<h2 id=\"analyze-what-actually-happened-to-the-object\">Analyze what actually happened to the object<\/h2>\n<p>Unsurprisingly, we eventually arrive at a situation that is very common in technology and life: we have raw and interesting data \u2013 what should we actually do to analyze it?<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2015-07-21-Using-camera-stream-for-real-time-object-tracking-in-Windows-apps_images-image004.png\" alt=\"VaatturiObjectMotionCapturedScaled\" \/><\/p>\n<p>Object (USB cannon projectile wrapped with pink sticker) motion captured<\/p>\n<p>Tomi Paananen and Juhana Koski provide some concrete tips for the (nearly) real time analysis \u2013 explaining<\/p>\n<p>a) How a ring buffer approach to buffering video was used in our solution to strike a good compromise on consumed resources while still allowing a bit of post processing on the captured video frames<\/p>\n<p>b) How to iterate through the buffered video frames to <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=581\">understand object displacement<\/a> using the pipeline mentioned earlier in this post<\/p>\n<p>In our scenario, further data analysis could include, for example, real time speed, launch angle and rotation related calculations \u2013 and perhaps pushing that information to a secondary, non-real time, analytics pipeline that could utilize Azure Machine Learning or other approaches to get more insight or predictions.<\/p>\n<h1 id=\"next-steps\">Next Steps<\/h1>\n<p>We still have a way to go in reaching our stretch goal of analyzing moving objects at a high frame rate (&gt; 400 fps) using mobile devices. 
The next areas we would focus on are improving the performance of the pipeline under challenging conditions such as variations in lighting, distortion of the object\u2019s shape, and color blending after a heavy impact causing high velocity.<\/p>\n<h1 id=\"reusable-assets\">Reusable assets<\/h1>\n<p>Tracking objects from video feed \u2013 3 part blog series<\/p>\n<ul>\n<li>Image data formats &#8211; <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=361\">http:\/\/tomipaananen.azurewebsites.net\/?p=361<\/a><\/li>\n<li>Identifying a stationary object &#8211; <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=481\">http:\/\/tomipaananen.azurewebsites.net\/?p=481<\/a><\/li>\n<li>Detecting object displacement &#8211; <a href=\"http:\/\/tomipaananen.azurewebsites.net\/?p=581\">http:\/\/tomipaananen.azurewebsites.net\/?p=581<\/a><\/li>\n<li>Buffering video frames using a ring buffer &#8211; <a href=\"http:\/\/juhana.cloudapp.net\/?p=181\">http:\/\/juhana.cloudapp.net\/?p=181<\/a><\/li>\n<li>Object tracking demo app source code (Windows) &#8211; <a href=\"https:\/\/github.com\/tompaana\/object-tracking-demo\">https:\/\/github.com\/tompaana\/object-tracking-demo<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Findings from a pilot project where we identified reusable patterns, and ways to avoid common pitfalls when implementing object detection and tracking in Windows apps. 
<\/p>\n","protected":false},"author":21375,"featured_media":12732,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[19],"tags":[127,239,389],"class_list":["post-2175","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-computer-vision","tag-machine-learning-ml","tag-windows-phone-apps"],"acf":[],"blog_post_summary":"<p>Findings from a pilot project where we identified reusable patterns, and ways to avoid common pitfalls when implementing object detection and tracking in Windows apps. <\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2175","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21375"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2175"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2175\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/12732"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2175"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2175"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}