{"id":13552,"date":"2021-06-07T09:26:36","date_gmt":"2021-06-07T16:26:36","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cse\/?p=13552"},"modified":"2021-06-07T09:32:59","modified_gmt":"2021-06-07T16:32:59","slug":"using-azure-cognitive-services-to-analyse-evidence-in-public-safety-and-justice","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/using-azure-cognitive-services-to-analyse-evidence-in-public-safety-and-justice\/","title":{"rendered":"Using Azure Cognitive Services to Analyse Evidence in Public Safety and Justice"},"content":{"rendered":"<h3 aria-level=\"2\"><a href=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-13683\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER.jpg\" alt=\"Image Using Azure Cognitive services to analyse evidence in public safety and justice HEADER\" width=\"2356\" height=\"1194\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER.jpg 2356w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER-300x152.jpg 300w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER-1024x519.jpg 1024w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER-768x389.jpg 768w, 
https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER-1536x778.jpg 1536w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/Using-Azure-Cognitive-services-to-analyse-evidence-in-public-safety-and-justice-HEADER-2048x1038.jpg 2048w\" sizes=\"(max-width: 2356px) 100vw, 2356px\" \/><\/a><\/h3>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Background<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p>A scenario commonly encountered in public safety and justice is the need to collect, store and index digital data recovered from devices, so that investigating officers can perform objective, evidence-based analysis. This evidence can be in the form of media files (video, audio, or image files) or machine-readable documents (such as text documents and spreadsheets) 
and is typically collected from devices using digital forensics systems and compiled into case-specific \u2018workspaces\u2019 using legal e-discovery tools.<\/p>\n<p>CSE recently partnered with a customer to develop an <em>advanced evidence analysis platform<\/em> that uses <em>Azure AI technology<\/em> to automatically label media and documents collected as evidence.<\/p>\n<p>&nbsp;<\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Problem<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p>The customer has an investigation team that collects, preserves and stores multiple terabytes of evidence in the form of images, audio, video, and digital text files recovered directly from devices (e.g., laptops and mobile phones) and other relevant sources.<\/p>\n<p>The data consists of raw data files and metadata recovered using <a href=\"https:\/\/www.nuix.com\/\"><strong>NUIX<\/strong><\/a>, an on-premises digital forensics tool. A team of twenty to thirty investigators and legal officers search, examine, and compile this data as evidence using <a href=\"https:\/\/www.relativity.com\/\"><strong>Relativity<\/strong><\/a>, a legal e-discovery system.<\/p>\n<p>The investigators needed better methods to review this very large volume of data and identify the files needed as evidence. Typically, content moderators and translators manually examined each file and tagged it based on their interpretation of the content. This process was time-consuming and potentially prone to error and bias, and a large backlog of digital evidence was building up, waiting to be analysed. 
They wanted to use AI to \u201clook inside\u201d the recovered files to identify relevant information and enable investigators to build evidence more effectively and efficiently.<\/p>\n<p>&nbsp;<\/p>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Challenges we needed to consider<\/span><\/b><\/h4>\n<ul>\n<li>The data was highly sensitive and confidential, so securing it at rest and in transit was an important area of focus.<\/li>\n<li>The solution needed to use pre-built AI services as much as possible, both to limit the need to train staff in machine learning and to quickly reap the benefits of the suite of tools offered by Azure Cognitive Services.<\/li>\n<li>The services and the data storage needed to support large volumes of raw data (between 500 TB and 1 PB), uploaded over a low-bandwidth connection from the on-premises office in batches of up to 100,000 files.<\/li>\n<li>Since Azure Cognitive Services has rate limits, concurrency control and throttling had to be implemented in the workflow.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Solution<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p>The solution we proposed focused on three areas: securely uploading and preparing raw data for processing in the cloud, labelling the data with AI-derived insights, and presenting those insights to the investigation team in an easy-to-consume form so that they could process the evidence more effectively.<\/p>\n<p><strong>Secure data ingestion and storage<\/strong><\/p>\n<ul>\n<li>Data recovered by the NUIX e-forensics tool is uploaded to a secure data lake.<\/li>\n<li>The data is validated, cleaned, and prepared for enrichment.<\/li>\n<\/ul>\n<p><strong>Secure AI enrichment and orchestration<\/strong><\/p>\n<ul>\n<li>The uploaded data is 
run through Azure AI technologies to automatically extract insights and provide language translation.<\/li>\n<li>The first phase focussed on video, image, and digital text files.<\/li>\n<\/ul>\n<p><strong>Search and e-discovery<\/strong><\/p>\n<ul>\n<li>The native data, its metadata and the AI enrichments are exported to the Relativity e-discovery tool, making it possible to search on these insights and create reports. For example, if video evidence was analysed, the user would get searchable insights such as known places or objects, on-screen text, and a transcript of the audio.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Personas<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h4>\n<table style=\"height: 22px; width: 39.0751%; border-style: none;\" data-tablestyle=\"MsoTableGrid\" data-tablelook=\"1184\" aria-rowcount=\"1\">\n<tbody>\n<tr style=\"height: 42px;\" aria-rowindex=\"1\">\n<td style=\"width: 29.5033%; height: 22px;\" data-celllook=\"4369\"><a href=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/admin.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-13565\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/admin.jpg\" alt=\"Image admin\" width=\"78\" height=\"78\" \/><\/a><\/td>\n<td style=\"width: 164.169%; height: 22px; text-align: left;\" data-celllook=\"4369\"><span data-contrast=\"auto\">Content\u00a0Administrator<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:259}\">\u00a0<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span data-contrast=\"auto\">The Content Administrator\u2019s objective is to make sure that 
the evidence retrieved from the e-forensics system is available for processing on the cloud platform. They are responsible for correcting any validation errors in the metadata and index file (e.g., missing files, incorrect formats, or incomplete uploads) with the help of the e-forensics system.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<table style=\"width: 39.0841%; height: 81px;\" data-tablestyle=\"MsoTableGrid\" data-tablelook=\"1184\" aria-rowcount=\"1\">\n<tbody>\n<tr style=\"height: 81px;\" aria-rowindex=\"1\">\n<td style=\"width: 29.0274%; height: 81px;\" data-celllook=\"4369\"><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:259}\"><a href=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/investigator.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-13567\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/investigator.jpg\" alt=\"Image investigator\" width=\"77\" height=\"77\" \/><\/a><\/span><\/td>\n<td style=\"width: 316.659%; height: 81px;\" data-celllook=\"4369\"><span data-contrast=\"auto\">Investigator<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:259}\">\u00a0<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span data-contrast=\"auto\">The Investigator\u2019s objective is to use the data in the e-discovery system as evidence in legal prosecutions. They log in to the e-discovery tool to search for and collate the files needed for their 
investigation. They can also use the Video Indexer portal to help search for and discover video files.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p aria-level=\"2\"><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Outcome<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Design considerations<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h4>\n<h5>Logic Apps vs. Durable Functions vs. Azure Cognitive Search<\/h5>\n<p>Early in the project, the team had to choose between Logic Apps, Durable Functions, and Azure Cognitive Search for implementing the enrichment workflow. We chose serverless technologies over a container-based implementation to free the customer from the burden of managing the hosting platform. Several considerations informed this choice:<\/p>\n<ol>\n<li>Cognitive Search does not easily integrate long-running functions. 
One of the main AI services we planned to use was Video Indexer, which can take many minutes to process a large video, whereas a Cognitive Search custom skill has a maximum timeout of 230 seconds.<\/li>\n<li>This project did not need a search index, as indexing and search were already provided by the e-discovery system.<\/li>\n<li>Complex branching logic is not easily represented in Cognitive Search pipelines.<\/li>\n<li>Logic Apps have many built-in connectors for Cognitive Services and storage (with retry and exponential backoff), making it relatively easy to start delivering value out of the box.<\/li>\n<li>Logic Apps provide built-in support for throttling, which is crucial given the rate limits imposed by Azure Cognitive Services.<\/li>\n<li>The graphical design interface of Logic Apps made the design and review of conditional logic easier, especially in the MVP phase, which involved a high degree of iterative design and implementation based on customer feedback.<\/li>\n<\/ol>\n<p>For these reasons, we decided to use Logic Apps for the branching logic and the connections to Cognitive Services and storage, and Azure Functions for any custom implementation.<\/p>\n<p aria-level=\"2\"><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Architecture<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h4>\n<h3 aria-level=\"2\"><a href=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture.png\"><img decoding=\"async\" class=\"aligncenter wp-image-13571 size-full\" 
src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture.png\" alt=\"Image architecture\" width=\"1635\" height=\"864\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture.png 1635w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture-300x159.png 300w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture-1024x541.png 1024w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture-768x406.png 768w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/architecture-1536x812.png 1536w\" sizes=\"(max-width: 1635px) 100vw, 1635px\" \/><\/a><\/h3>\n<p>Nuix (a digital forensics tool) is used by field personnel to extract the image (i.e., the contents of the hard drive) from devices. The image consists of native files and their associated metadata. The device image is then securely uploaded to Azure Data Lake, where it is picked up for validation and preparation.<\/p>\n<p>A Logic Apps workflow examines the native files and triggers different workflows depending on whether they are videos, images, or machine-readable documents. These files are then passed through Azure Cognitive Services and ML services such as Video Indexer, Computer Vision, Text Translator and Text Analytics for AI-based enrichment.<\/p>\n<p>The output of the AI services, along with the native file and the e-forensics metadata, is then exported to the RelativityOne e-discovery system. 
There, Investigators can search on much richer context, such as image descriptions, locations, landmarks, and video transcripts.<\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h5 aria-level=\"4\"><i><span data-contrast=\"none\">Technologies used<\/span><\/i><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h5>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/\">Azure Cognitive Services<\/a>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-gb\/azure\/azure-video-analyzer\/video-analyzer-for-media-docs\/\">Video Indexer<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/computer-vision\/\">Computer Vision<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/translator\/\">Text Translator<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/text-analytics\/\">Text Analytics<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/logic-apps\/\">Azure Logic Apps<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-functions\/\">Azure Functions<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/service-bus-messaging\/\">Azure Service Bus<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/storage\/blobs\/data-lake-storage-introduction\">Azure Data Lake<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/\">Azure Cosmos DB<\/a><\/li>\n<\/ul>\n<p><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h5 aria-level=\"4\"><i><span data-contrast=\"none\">Enrichment workflow details<\/span><\/i><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h5>\n<h3 aria-level=\"2\"><a href=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-13583\" src=\"https:\/\/devblogs.microsoft.com\/cse\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture.png\" alt=\"Image System Architecture\" width=\"1382\" height=\"1567\" srcset=\"https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture.png 1382w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture-265x300.png 265w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture-903x1024.png 903w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture-768x871.png 768w, https:\/\/devblogs.microsoft.com\/ise\/wp-content\/uploads\/sites\/55\/2021\/05\/System-Architecture-1355x1536.png 1355w\" sizes=\"(max-width: 1382px) 100vw, 1382px\" \/><\/a><\/h3>\n<ol>\n<li>The image of a device is securely uploaded to Azure Data Lake. This upload contains the native files, any extracted text, and the \u201cloadData\u201d (i.e., the metadata extracted by NUIX). 
We referred to this as a \u2018batch\u2019.<\/li>\n<li>The NUIXValidator function validates and cleans the incoming data and creates one message per native file on the NUIXImporter Service Bus.<\/li>\n<li>The main orchestration workflow is triggered by messages appearing on the Service Bus.<\/li>\n<li>The duplicate detection function checks whether the same file has been processed before and, if so, skips the entire AI enrichment branch of the workflow.<\/li>\n<li>Based on its file type, the file is then processed by a video, image, or digital text workflow. Each of these calls out to specific Cognitive Services to derive AI-based insights into the content of the file. The digital text workflow is further augmented by a custom model trained to extract entities specific to the customer.<\/li>\n<li>The AI enrichments are saved back to the data lake, and a message containing the location of the native file, the NUIX metadata and the AI enrichments is put on the RelativityImporter Service Bus.<\/li>\n<li data-leveltext=\"%1.\" data-font=\"Calibri, Calibri_MSFontService, sans-serif\" data-listid=\"24\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\">The RelativityImporter maps the incoming message to the configured fields in the Relativity workspace, and the file is then ready for search and e-discovery!<\/li>\n<\/ol>\n<p aria-level=\"4\"><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h5 aria-level=\"4\"><i><span data-contrast=\"none\">De-duplication<\/span><\/i><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h5>\n<p><span data-contrast=\"auto\">In evidence analysis systems, the same files are very likely to be extracted from multiple sources or devices, so we needed a way to identify duplicates and avoid wasting compute resources on costly AI analysis that would produce the same result.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The approach we took was to create a duplicate detection function that calculated a hash of each file entering the enrichment pipeline. If the hash did not match an existing one, the system had not seen the file before, so it was processed through the AI services. After processing, the AI insights were collated as JSON and saved to the data lake with the hash as the filename. If the hash was already in the system, the AI insights were simply retrieved from the data lake instead of re-running enrichment on the file. Each time AI insights were saved to the data lake, they were stamped with the version of the system that generated them.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For example, if the system had not previously seen a video file, it would be sent to Azure Video Indexer for processing. The insights from Video Indexer would then be stored in the data lake and sent to the RelativityImporter. 
Alternatively, if the file hash was found, the Video Indexer enrichments would be retrieved from the data lake (skipping the Azure Video Indexer step) and sent in the message to the RelativityImporter. This avoided the cost of running AI processing on the same video content twice.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">A possible extension to this system would also exclude <\/span><i><span data-contrast=\"auto\">similar<\/span><\/i><span data-contrast=\"auto\"> content from re-processing, e.g., videos with different encodings or images with different resolutions.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p aria-level=\"4\"><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h5 aria-level=\"4\"><i><span data-contrast=\"none\">Re-processing<\/span><\/i><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h5>\n<p><span data-contrast=\"auto\">At times, content that has already been exported to the e-discovery tool may need to be run through the enrichment workflow again. 
Situations that call for this include an upgrade to an Azure AI service, an enhancement to the enrichment workflow, or a processing error that has since been fixed.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For example, consider a scenario where videos recorded in a language unsupported by Video Indexer were processed through the workflow and exported to Relativity. Later, Video Indexer improved its service and added support for this language. In this case, the Content Administrator might decide to re-run all such videos and update the records in Relativity with the additional insights now available. However, the de-duplication logic would prevent these videos from being processed by Video Indexer again, because they would be considered duplicates of existing data. We therefore designed a solution whereby the enrichment pipeline could be run for files even if they were flagged as duplicates.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For the MVP, we took a straightforward approach that allowed the Content Administrator to flag either an entire batch or specific documents within a batch for re-processing. To do so, the Content Administrator would upload a simple text file alongside the batch of data in the data lake, listing the identifiers of the files to re-process; to re-process the entire batch, the text file would be left empty. 
This would signal the system to \u201cforce-enrich\u201d the files in the batch even if they were duplicates.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p aria-level=\"4\"><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Observability and monitoring<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">As with any distributed system, the ability to observe and monitor the system consistently was key to operating it successfully in production. We used Azure Monitor, which provides various mechanisms to collect and present logs, metrics, and trace data.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">For coded components (such as Azure Functions, APIs, and containers), Application Insights was used to collect logs and telemetry. Azure Logic Apps is already integrated with Azure Monitor and provides trace logs, which can be supplemented by adding tracked properties. 
The gathered log data was queried and analysed using Log Analytics, and Azure Dashboards were created for common scenarios.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The logs and metrics emitted from code also included additional contextual properties that made it possible to query consistently across multiple components. This helped when tracing the path of a file through a workflow (using correlationID), finding the errors in a batch of files (using batchId), or identifying the current version of the system (using SystemVersion and ComponentVersion).<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h4 aria-level=\"3\"><b><span data-contrast=\"auto\">Performance considerations<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">By default, Logic Apps place no limit on the number of concurrently running trigger instances. There is, however, a limit on action executions: 100,000 per 5-minute rolling interval by default, which can be raised to 300,000 by enabling <\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/logic-apps\/logic-apps-limits-and-config#run-in-high-throughput-mode\"><span data-contrast=\"none\">High Throughput Mode<\/span><\/a><span data-contrast=\"auto\">.<\/span><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">However, the downstream services (Video Indexer, Computer Vision, etc.) all have rate limits, which the Logic App could easily hit if left to run in its default mode (in normal operation, we expected the customer to process up to 100,000 files in a single batch). To mitigate the impact on downstream services, we enabled concurrency control, a Logic App setting that limits the number of trigger instances the Logic App runs at any given time. Concurrency control can be set to allow up to 50 concurrent instances.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In our solution, we enabled concurrency control on the main orchestration Logic App and ran load tests to determine the concurrency setting best suited to the Cognitive Services tiers selected by the customer. The Service Bus populated by the NUIXValidator function served as a temporary store for incoming files while they waited to be processed by the orchestrator workflow.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The solution can be extended further to set different concurrency settings for the file-specific \u2018child\u2019 Logic Apps, so that concurrency can be tuned to the rate limits of the Cognitive Services used for each file type. 
Also, to increase the throughput of the system, requests can be load-balanced across multiple Cognitive Services instances.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Benefits for the customer<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">As the customer starts to use this solution in production, there will be an initial phase in which they load their existing backlog of data into the cloud to be analysed by this system and uploaded to Relativity, leading to a substantial improvement in the discoverability of files to use as evidence. 
Since investigators are already proficient in Relativity, there is no need to train them on new tooling, and they now have the additional option of using the Video Indexer portal for finer-grained interaction with video data.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Because the solution largely uses pre-built AI services and serverless technologies, the customer automatically benefits as these services improve. The re-processing scenario lets them re-run the improved AI processing on data that has already been imported.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The asynchronous nature of the Logic Apps workflow makes the solution better suited to low-bandwidth connections from on-premises servers and to processing large media files. Concurrency control lets the customer tune the processing of large batches to avoid hitting the consumption limits of the AI services.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Lastly, the solution has some cost-saving strategies built in. When Video Indexer processes video files, they are stored in a Media Services account. 
Hence, to save storage costs, the native video files are not exported to Relativity; instead, a pointer to the Video Indexer portal is exported in their place. Also, skipping the processing of duplicate files avoids unnecessary AI processing costs.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Conclusion<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Our project tackled a common scenario in public safety and justice, where large amounts of data gathered during an investigation need to be analysed and labelled so that they can be used as evidence for prosecution. The decision to use out-of-the-box AI services and serverless technology gave the customer faster onboarding onto Azure, the ability to unlock more value from their data, and an agile path into production.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">We hope that this code story will encourage other digital transformations that leverage Azure\u2019s AI services to gain insights quickly and easily from large data sets, without the need to deploy custom machine learning models.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">For additional information<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">For a generalised code sample demonstrating how this solution was built using Logic Apps, Azure Functions and Azure Cognitive Services, refer to the <a href=\"https:\/\/github.com\/Azure-Samples\/media-services-video-indexer\/tree\/master\/AIEnrichmentPipeline\">AI Enrichment Pipeline sample in the Video Indexer Azure Samples GitHub repository<\/a>.\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"2\"><b><span data-contrast=\"auto\">Contributors<\/span><\/b><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Alysha Arshad, Peter Daukintis, April Edwards, Adrian Gonzalez, Lawrence Gripper, Randy Guthrie, Martin Kearn, Peter Maynard, David Moore, Martin Peck, Shane Peckham, Dave Storey<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Cover Photo by <\/span><a href=\"https:\/\/unsplash.com\/@markusspiske?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\"><span data-contrast=\"none\">Markus Spiske<\/span><\/a><span 
data-contrast=\"auto\">\u00a0on\u00a0<\/span><a href=\"https:\/\/unsplash.com\/?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText\"><span data-contrast=\"none\">Unsplash<\/span><\/a><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>A scenario commonly encountered in public safety and justice is the need to collect and index digital data recovered from devices, so that investigating officers can perform evidence-based analysis. We recently built an advanced evidence analysis platform that uses Azure AI services for automated labelling of media and documents. <\/p>\n","protected":false},"author":52539,"featured_media":13683,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14,1],"tags":[],"class_list":["post-13552","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cognitive-services","category-cse"],"acf":[],"blog_post_summary":"<p>A scenario commonly encountered in public safety and justice is the need to collect and index digital data recovered from devices, so that investigating officers can perform evidence-based analysis. We recently built an advanced evidence analysis platform that uses Azure AI services for automated labelling of media and documents. 
<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/13552","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/52539"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=13552"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/13552\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/13683"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=13552"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=13552"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=13552"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}