{"id":2119,"date":"2017-01-14T16:00:00","date_gmt":"2017-01-14T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/reallifecode\/index.php\/2017\/01\/14\/building-luis-models-for-unsupported-languages-with-machine-translation\/"},"modified":"2020-03-15T06:05:26","modified_gmt":"2020-03-15T13:05:26","slug":"building-luis-models-for-unsupported-languages-with-machine-translation","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/ise\/building-luis-models-for-unsupported-languages-with-machine-translation\/","title":{"rendered":"Building LUIS Models for Unsupported Languages with Machine Translation"},"content":{"rendered":"<h2 id=\"overview\">Overview<\/h2>\n<p>The following case study outlines a method for providing additional language support for LUIS using the <a href=\"https:\/\/www.microsoft.com\/cognitive-services\/en-us\/translator-api\">Microsoft Translation Cognitive API<\/a>.<\/p>\n<h2 id=\"background\">Background<\/h2>\n<p><a href=\"http:\/\/moed.ai\/\">Moed.ai<\/a> is an Israeli startup that enables service providers to manage and fill their business calendar with a unified cloud-based platform that is accessible from any device.<\/p>\n<p>Customers can configure scheduling of their services, resources and calendars using Moed.ai\u2019s designated dashboard. Resources can be objects, such as cars or meeting rooms, as well as people, such as test drivers or sales representatives in a car dealership. The platform manages a calendar for each of these resources and uses it to schedule meetings with customers\u2019 clients based on availability.<\/p>\n<p>Moed.ai is developing a chat bot for each of its customers so that those customers\u2019 clients can schedule services more comfortably through natural language on their preferred channel (Facebook Messenger, Slack, Skype, etc.).<\/p>\n<h2 id=\"the-problem\">The Problem<\/h2>\n<p>Because Moed.ai is an Israeli company, many of its customers are native Hebrew speakers. 
While the company provides an English version of its bot that can extract intents and entities, it wants to provide equal functionality for its Hebrew bot service. Unfortunately, <a href=\"https:\/\/www.luis.ai\/\">LUIS<\/a>, which the company was interested in using for intent and entity extraction, does not currently have native support for the Hebrew language.<\/p>\n<h2 id=\"the-solution\">The Solution<\/h2>\n<p>The goal of the engagement was to work with Moed.ai to identify a valid approach for providing Hebrew support for LUIS using the <a href=\"https:\/\/www.microsoft.com\/cognitive-services\/en-us\/translator-api\">Translation Cognitive Service<\/a>. During the course of the engagement, we compared two approaches for providing Hebrew support. While the first approach of feeding translated text directly from the <a href=\"https:\/\/www.microsoft.com\/cognitive-services\/en-us\/translator-api\">Translation Cognitive Service<\/a> into an existing English LUIS model produced disappointing results, we identified a more accurate method.<\/p>\n<p>We trained the LUIS model in a novel way, using malformed translated text instead of a list of pre-curated English samples. 
This approach enabled us to close the gap between the translation service\u2019s output and proper English.<\/p>\n<p>To understand why this approach is more accurate, let\u2019s look at the following scenario:<\/p>\n<p>Assume a user wants to query a bot with the four Hebrew phrases below.<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e4\u05d2\u05d9\u05e9\u05d4\r\n  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df \r\n  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df \u05dc\u05de\u05d7\u05e8\r\n  \u05d0\u05e4\u05e9\u05e8 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df \u05dc\u05de\u05d7\u05e8?\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>The English equivalent for these phrases is the following:<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>I want to schedule a meeting.\r\nI want to schedule a test drive.\r\nI want to schedule a test drive for tomorrow.\r\nCan I schedule a test drive tomorrow?\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>Yet the translation service translates the phrases as:<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>I want to schedule an appointment.\r\nI want to schedule a test drive.\r\nI want to make a test tomorrow.\r\nCan set a test tomorrow?\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>Note that while the first two phrases and their translations are nearly identical, there is a gap between the translation of the last two phrases (\u201cI want to make a test tomorrow.\u201d, \u201cCan set a test tomorrow?\u201d) and their proper English meaning (\u201cI want to schedule a test drive for tomorrow.\u201d, \u201cCan I schedule a test drive tomorrow?\u201d).<\/p>\n<p>For example, in both phrases, the 
translation service substituted the word <strong>test<\/strong> for the concept <strong>test drive<\/strong>, which has a very different meaning despite being a close literal translation. A LUIS model trained only on proper English queries, such as \u201cI want to schedule a test drive for tomorrow\u201d, would struggle to identify such substitutions since they are unique to the way Hebrew is translated into English. Differences in grammatical structure and word usage between languages often lead to consistent but distinctive errors in translated texts.<\/p>\n<p>However, if we train the model on translated Hebrew, it can learn to map the malformed Hebrew translation to its intended meaning. Over time, as the model learns the particular ways in which Hebrew translations are erroneous in a given context, its results should become increasingly accurate.<\/p>\n<h3 id=\"how-to-use\">How to Use<\/h3>\n<p>The following section outlines how to train and use our Node.js module for integrating additional language support for bots. 
It is assumed that the user has already created a <a href=\"https:\/\/www.luis.ai\/\">LUIS<\/a> application and has generated a key for the <a href=\"https:\/\/www.microsoft.com\/cognitive-services\/en-us\/translator-api\">Translation Cognitive Service<\/a>.<\/p>\n<p>1) Compile a list of commands in the unsupported language (in this case Hebrew) such as:<\/p>\n<div class=\"highlighter-rouge\">\n<pre class=\"highlight\"><code>  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e4\u05d2\u05d9\u05e9\u05d4             \/\/ I want to schedule an appointment\r\n  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df        \/\/ I want to schedule a test drive\r\n  \u05d0\u05e0\u05d9 \u05e8\u05d5\u05e6\u05d4 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df \u05dc\u05de\u05d7\u05e8   \/\/ I want to schedule a test drive for tomorrow\r\n  \u05d0\u05e4\u05e9\u05e8 \u05dc\u05e7\u05d1\u05d5\u05e2 \u05e0\u05e1\u05d9\u05e2\u05ea \u05de\u05d1\u05d7\u05df \u05dc\u05de\u05d7\u05e8?      
\/\/ Can I schedule a test drive tomorrow?\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>2) Run the <a href=\"https:\/\/github.com\/CatalystCode\/Universal-Language-Intelligence-Service\/blob\/master\/ulis\/tools\/bulkTranslateAndTrain.js\">Bulk Translate and Insert into LUIS<\/a> script.<\/p>\n<p>3) Tag translations, intents and entities using the <a href=\"https:\/\/www.luis.ai\/\">LUIS portal<\/a>.<\/p>\n<p>4) Use the <a href=\"https:\/\/github.com\/CatalystCode\/Universal-Language-Intelligence-Service\/blob\/master\/ulis\/tools\/translateAndTrainBot.js\">train and test bot<\/a> with the <a href=\"https:\/\/www.luis.ai\/\">LUIS portal<\/a> to validate and re-train your model until it learns to map the translations to their intended meanings in the new language.<\/p>\n<p>5) Use the <a href=\"https:\/\/github.com\/CatalystCode\/Universal-Language-Intelligence-Service\">ULIS npm module<\/a> to consume your trained LUIS model and integrate the service into your application.<\/p>\n<h2 id=\"code\">Code<\/h2>\n<p>You can find the notebook and code for implementing this methodology <a href=\"https:\/\/github.com\/CatalystCode\/Universal-Language-Intelligence-Service\">on GitHub<\/a>.<\/p>\n<h2 id=\"opportunities-for-reuse\">Opportunities for Reuse<\/h2>\n<p>The methodology outlined in this code story can be used to provide natural language intent and entity extraction for any language supported by the Translation Cognitive Service. 
It can be reused to provide localization support in many Conversation as a Platform scenarios for a more immersive bot experience.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A wrapper for the Microsoft LUIS Cognitive Service that provides universal language support (after training) using the Cognitive Service Translation API.<\/p>\n","protected":false},"author":21353,"featured_media":11048,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[13,19],"tags":[110,132,231,250,261,266,361],"class_list":["post-2119","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bots","category-machine-learning","tag-bots","tag-conversation-as-a-platform","tag-language-understanding-intelligent-service-luis","tag-microsoft-cognitive-services","tag-moed-ai","tag-multilingual-support","tag-translator-api"],"acf":[],"blog_post_summary":"<p>A wrapper for the Microsoft LUIS Cognitive Service that provides universal language support (after training) using the Cognitive Service Translation 
API.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2119","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/users\/21353"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/comments?post=2119"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/posts\/2119\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media\/11048"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/media?parent=2119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/categories?post=2119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/ise\/wp-json\/wp\/v2\/tags?post=2119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}