{"id":21671,"date":"2024-02-21T09:00:59","date_gmt":"2024-02-21T17:00:59","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azuregov\/?p=21671"},"modified":"2024-02-20T15:49:21","modified_gmt":"2024-02-20T23:49:21","slug":"document-translation-solution","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azuregov\/document-translation-solution\/","title":{"rendered":"Comprehensive Document Translation Solution"},"content":{"rendered":"<p><span style=\"font-size: 12pt;\"><strong>Background and Use Cases<\/strong><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">Many of our customers have a requirement to translate documents from a variety of languages into a common language to ensure their mission success. These documents can be in a variety of formats, with unique page layouts and styles, and they may contain images with embedded text essential for a complete understanding by the reader. In many scenarios, there are large numbers of documents that must be translated quickly and securely to ensure mission success.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">Below are some common use cases for government organizations requiring document translation:<\/span><\/p>\n<ul>\n<li><span style=\"font-size: 12pt;\"><strong>Intelligence and Security<\/strong>: translating foreign documents and communications to monitor threats and understand global dynamics.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>International Cooperation and Alliances<\/strong>: Translating treaties, agreements, and training materials in support of global military alliances.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>Local Engagement and Stability Operations<\/strong>: Translation in support of humanitarian, disaster relief, and local engagement.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>Technical and Equipment Manuals<\/strong>: Translation required to ensure correct use and maintenance of diverse technologies and 
equipment. While international support often includes financial and equipment aid, a significant challenge arises when equipment manuals are not in the recipient&#8217;s native language. This impedes the effective and timely use of the equipment, highlighting the critical need for document translation to ensure the success of missions.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>Government Communications<\/strong>: Translating official communications, public service announcements, and information about public health, safety, and welfare ensures that all members of a diverse population have access to important information.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>Immigration Services<\/strong>: Translating documents related to immigration, visas, and citizenship services helps streamline the process for both applicants and the authorities.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-size: 12pt;\"><strong>The Challenge<\/strong><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">The native Azure Document Translation service is feature rich and can translate complex documents across a multitude of languages and preserve the original document structure and data format. However, it does not support translation of text embedded in images in digital documents. Often, the text located inside of images can be critical for an accurate and complete understanding and therefore it is a \u201cmust have\u201d capability for our customers.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">Our challenge was to find the perfect balance between the accuracy of digital text-only documents and the completeness of scanned documents.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>The Solution<\/strong><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">Our <em>Comprehensive Document Translation Solution<\/em> solves this problem through a \u201cHybrid Translation\u201d approach. 
The Hybrid Translation process splits the digital PDF into two files. One file is a digital document that contains all the text-only pages. The other file is a scanned document that contains all the pages that have images, including images with embedded text. The solution then translates the two files separately. Translating each file separately yields the most accurate translation and layout for text-only digital pages and the completeness of scanned pages.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">After both files are translated, the solution \u201cstitches\u201d the complete document back together in the correct page order, taking the most accurate translation of each page from either the digital or the scanned document.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">For flexible application, the solution provides these options:<\/span><\/p>\n<ol>\n<li><span style=\"font-size: 12pt;\"><strong>Scanned-Only Translation<\/strong>: Converts a document to a scanned version and translates that version to ensure complete translation, including images with text.<\/span><\/li>\n<li><span style=\"font-size: 12pt;\"><strong>Hybrid Translation<\/strong>: Provides the best quality and a complete translation. Hybrid Translation combines the best aspects of digital page translation and scanned page translation as needed.<\/span><\/li>\n<\/ol>\n<p><span style=\"font-size: 12pt;\">The Comprehensive Document Translation Solution is built on several Azure services and capabilities that allow for a fast, secure, and scalable translation process. Each service has its own pricing and scaling options, so the total cost of the solution depends on the options selected.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">The core functionality of the solution is an Azure Functions application consisting of three functions. The functions are written in Python and use open-source libraries for PDF conversion, splitting, and processing. 
The documents, both original and translated, reside securely in Azure Storage and are accessible only to users and services with correctly configured access control. Language translation is provided by Azure AI Translator, a cloud-based neural machine translation service (part of the Azure AI family of services).<\/span><\/p>\n<p><span style=\"font-size: 12pt;\">This solution uses several Azure services, including Document Translation, Storage, Functions, and Event Grid.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><em style=\"text-align: var(--bs-body-text-align);\">Note: Many customers require secure network enclaves to translate their documents, and this solution integrates easily with most security policies and network architectures.<\/em><\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Supported File Types<\/strong><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">The Comprehensive Document Translation Solution supports image files (BMP, PNG, JPG) and the file types documented here: <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/translator\/document-translation\/overview#supported-document-formats\">https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/translator\/document-translation\/overview#supported-document-formats<\/a><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">Image files are converted to PDF before translation.<\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><em>Note: Currently, Microsoft Office documents (such as Word, Excel, and PowerPoint) that contain images with embedded text must be converted to PDF before they are loaded into the solution.<\/em><\/span><\/p>\n<p><span style=\"font-size: 12pt;\"><strong>Code Repository<\/strong><\/span><\/p>\n<p><span style=\"font-size: 12pt;\">The Comprehensive Document Translation Solution is an open-source project made available by the US Regulated Industries of Microsoft. 
The source code and instructions are available here: <a href=\"https:\/\/github.com\/usri\/Comprehensive-Document-Translator\">GitHub Repository<\/a><\/span><\/p>\n<p><span style=\"font-size: 10pt;\"><em>Additional Contributors: Jose Alanis, Joshua Donnelly, Elliott Fields, Krishna Doss Mohan, Krishnakumar Muthukrishnan<\/em><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Background and Use Cases Many of our customers have a requirement to translate documents from a variety of languages into a common language to ensure their mission success. These documents can be in a variety of formats, with unique page layouts and styles, and they may contain images with embedded text essential for a complete [&hellip;]<\/p>\n","protected":false},"author":57266,"featured_media":21344,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,1],"tags":[3480],"class_list":["post-21671","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-announcements","category-azuregov","tag-document-translation"],"acf":[],"blog_post_summary":"<p>Background and Use Cases Many of our customers have a requirement to translate documents from a variety of languages into a common language to ensure their mission success. 
These documents can be in a variety of formats, with unique page layouts and styles, and they may contain images with embedded text essential for a complete [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/posts\/21671","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/users\/57266"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/comments?post=21671"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/posts\/21671\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/media\/21344"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/media?parent=21671"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/categories?post=21671"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azuregov\/wp-json\/wp\/v2\/tags?post=21671"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}