{"id":2146,"date":"2016-02-08T17:05:12","date_gmt":"2016-02-09T01:05:12","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/index.php\/2016\/02\/08\/video-tagging-tool-for-video-processing-and-image-recognition\/"},"modified":"2020-03-18T13:59:54","modified_gmt":"2020-03-18T20:59:54","slug":"video-tagging-tool-for-video-processing-and-image-recognition","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/video-tagging-tool-for-video-processing-and-image-recognition\/","title":{"rendered":"Video Tagging Tool for Video-Processing and Image Recognition"},"content":{"rendered":"<p><em>Image: <a href=\"https:\/\/www.flickr.com\/photos\/deepfrozen\/19357743954\">DJI Inspire 1 Drone<\/a> by <a href=\"https:\/\/www.flickr.com\/photos\/deepfrozen\">DFSB DE<\/a>, used by <a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/2.0\">CC BY-SA 2.0<\/a><\/em><\/p>\n<h2 id=\"background\">Background<\/h2>\n<p>The drone industry is a fast growing space with more and more players joining this field in a rapid pace.\nMany drone applications, in particular, autonomous drone ones, require vision capabilities. The camera feed is being processed by the drone in order to support tasks such as tracking, navigating, filming, and more.\nCompanies that are using video processing and image recognition as part of their solutions, need the ability to train their algorithms to identify objects in a video stream. 
Whether it\u2019s a drone trained to identify suspicious activities such as human presence where it shouldn\u2019t be, or a robot navigating indoors following its human owner, they all need the ability to identify and track objects around them.\nTo improve the recognition\/tracking algorithms, there\u2019s a need to create a collection of manually tagged videos, which serves as the <em>ground truth<\/em>.<\/p>\n<h2 id=\"the-problem\">The Problem<\/h2>\n<p>The tagging work today is Sisyphean manual labor done by humans (e.g., via <a href=\"https:\/\/en.wikipedia.org\/wiki\/Amazon_Mechanical_Turk\">Mechanical Turk<\/a>). The workers sit in front of the video, watch it frame by frame, and maintain Excel files describing each frame in the video stream. The data kept for each frame is the frame number, the objects found in the frame, and the areas in the frame where these objects are located.\nThe video and the tagging metadata are then used by the algorithm to learn how to identify these objects. The tagging data created manually by a worker (the <em>ground truth<\/em>) is also used to run quality and regression tests for the algorithm, by comparing the algorithm\u2019s tagging results to the worker\u2019s tagging data.<\/p>\n<p>Exploring the web for existing solutions didn\u2019t yield great results. While larger companies in the video-processing space develop their own internal tools for the tagging work, small startups that don\u2019t have the bandwidth to invest in developing such tools (which are also not part of their IP) do the work manually, tagging videos frame by frame using Excel. 
We couldn\u2019t find any tool that we could easily just take and use.<\/p>\n<p>Solutions similar to Mechanical Turk have the following limitations, which led us to decide to develop this tool:\n1) Lack of, or poor, video tagging support; there aren\u2019t good open-source (OSS) tools today.\n2) A focus on high quantity over high quality.\n3) Sometimes there\u2019s a business need to keep the videos confidential and allow tagging by trusted people only.<\/p>\n<h2 id=\"the-engagement\">The Engagement<\/h2>\n<p>We engaged with <a href=\"http:\/\/www.percepto.co\/\">Percepto<\/a> and <a href=\"http:\/\/thirdeye-systems.com\/\">ThirdEye Systems<\/a>, two startups that provide vision capabilities for drones. Both startups needed a frame-by-frame video tagging tool to use with their video-processing algorithms, as described above.\nWe met each of the startups for a few days of hacking to discuss their scenarios, brainstorm, and scope the problem. We then decided to develop a video tagging tool with the most essential features to address their needs. The video tagging tool is composed of the following two main modules:<\/p>\n<h3 id=\"html-video-tagging-control\">HTML Video-Tagging Control<\/h3>\n<p>This is a <a href=\"https:\/\/www.polymer-project.org\/1.0\/\">Polymer<\/a>-based web control that consumers can add to their web pages just like any other native HTML element. 
It provides basic functionality such as navigating through the video frames, selecting an area or a specific location on a frame, tagging it with one or more tags, and more.\nA detailed feature list, code, a demo, and usage instructions can be found in the project\u2019s <a href=\"https:\/\/github.com\/CatalystCode\/video-tagging\">GitHub repository<\/a>.<\/p>\n<h3 id=\"a-video-tagging-web-tool\">A Video Tagging Web Tool<\/h3>\n<p>This is a single-page, AngularJS-based web application that provides a basic, holistic solution for managing users, videos, and tagging jobs.\nIt uses the Video-Tagging HTML control, demonstrating its use in an actual web app.\nThe tool comes with built-in authentication and authorization mechanisms. We used Google login for authentication, and defined two main roles for authorization: each user is either an Admin or an Editor.<\/p>\n<p>An Admin is able to add users, upload videos, and create video-tagging jobs (assigning users to videos). An Admin can also review and approve video-tagging jobs, as well as fix specific tags while reviewing.\nAn Editor can only view their job list and do the actual tagging work. When the tagging work is done, the Editor sends the job to an Admin for review and approval.\nIn the end, the tags can also be downloaded (in JSON format) to be used with the video-processing algorithms.<\/p>\n<p>The data schema of each entity (user, video, job) was designed to be extensible. Users can use the tool as is, with its current DB schema (SQL Server), and add more data items without changing the schema. In the tool, for example, we keep various metadata items for a job, such as RegionType, which defines whether we would like to tag a specific location or an area in the frames.<\/p>\n<p>The server-side code isn\u2019t aware of this data; it is just used as a pipe between the client side and the storage layer. 
It was important for us to provide a framework that enables adding features without changing the schema, or at least minimizes the changes required to add more features to the tool.<\/p>\n<p>A detailed feature list, code, and usage instructions can be found in the project\u2019s <a href=\"https:\/\/github.com\/CatalystCode\/VideoTaggingTool\">GitHub repository<\/a>.<\/p>\n<h2 id=\"the-tool-in-action\">The Tool in Action<\/h2>\n<h3 id=\"managing-videos\">Managing videos<\/h3>\n<ul>\n<li>View the videos list<\/li>\n<li>Filter videos by labels<\/li>\n<li>Edit videos<\/li>\n<li>Create a tagging job for a video<\/li>\n<li>Download tags for a video\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2016-02-09-Developing-a-Video-Tagging-Tool-for-Video-Processing-and-Image-Recognition-videos.png\" alt=\"Managing videos\" \/><\/li>\n<\/ul>\n<h3 id=\"creating-a-job\">Creating a job<\/h3>\n<ul>\n<li>Assign a video to a user for tagging<\/li>\n<li>Configure the tagging job\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2016-02-09-Developing-a-Video-Tagging-Tool-for-Video-Processing-and-Image-Recognition-newjob.png\" alt=\"Creating a job\" \/><\/li>\n<\/ul>\n<blockquote><p>The settings section is UI-specific: neither the server side nor the DB layer is aware of these data items<\/p><\/blockquote>\n<h3 id=\"the-tagging\">The Tagging<\/h3>\n<ul>\n<li>Navigate frame by frame<\/li>\n<li>Select areas on a frame, and tags for each area<\/li>\n<li>Video controls<\/li>\n<li>Review a tagging job<\/li>\n<li>Send a job for review by an Admin, or approve the tagging job\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2020\/03\/2016-02-09-Developing-a-Video-Tagging-Tool-for-Video-Processing-and-Image-Recognition-tagging.png\" alt=\"Tagging\" \/><\/li>\n<\/ul>\n<h2 id=\"opportunities-for-reuse\">Opportunities for 
reuse<\/h2>\n<p>As previously mentioned, the tool provides a holistic solution with basic functionality for managing users, videos, and tagging tasks. It can be used by anyone with a similar need for video tagging, whether for drones, robots, or other applications.\nIt can either be used as is, or be extended with more features based on your needs. For example, you could extend the shapes of the tagging area by implementing a more specific polygon for tagging, support multiple users tagging the same video, or calculate an average\/conflation of their results.\nWe encourage you to contribute to the project by sending pull requests for new features so that others can benefit from your work as well.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Developing a frame-by-frame video tagging tool for video processing and image recognition.<\/p>\n","protected":false},"author":21349,"featured_media":12522,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[19],"tags":[161,239,375],"class_list":["post-2146","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-drones","tag-machine-learning-ml","tag-video-tagging"],"acf":[],"blog_post_summary":"<p>Developing a frame-by-frame video tagging tool for video processing and image 
recognition.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21349"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2146"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2146\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/12522"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}