Building LUIS Models for Unsupported Languages with Machine Translation

Ari Bornstein

Overview

The following case study outlines a method for providing additional language support for LUIS using the Microsoft Translation Cognitive API.

Background

Moed.ai is an Israeli startup that enables service providers to manage and fill their business calendars with a unified cloud-based platform that is accessible from any device.

Customers can configure scheduling of their services, resources and calendars using Moed.ai’s designated dashboard. Resources can be objects, such as cars or meeting rooms, as well as people, such as test drivers or sales representatives in a car dealership. The platform manages a calendar for each of these resources and uses it to schedule meetings with customers’ clients based on availability.

Moed.ai is developing a chat bot for each of their customers so that their customers’ clients can schedule services more comfortably through natural language on their preferred channel (Facebook Messenger, Slack, Skype, etc.).

The Problem

As an Israeli company, many of Moed.ai’s customers are native Hebrew speakers. While they provide an English version of their bot that can extract intents and entities, they want to provide equal functionality for their Hebrew bot service. Unfortunately, LUIS, which they were interested in using for intent and entity extraction, does not currently have native support for the Hebrew language.

The Solution

The goal of the engagement was to work with Moed.ai to identify a valid approach for providing Hebrew support for LUIS using the Translation Cognitive Service. During the course of the engagement, we compared two approaches. The first approach, feeding translated text directly from the Translation Cognitive Service into an existing English LUIS model, provided disappointing results, but we were successful in determining a more accurate method.

We trained the LUIS model in a novel way, using malformed translated text instead of a list of pre-curated English samples. This approach enabled us to close the gap between the translation service’s output and proper English.

To understand why this approach works more accurately, let’s look at the following scenario:

Assume a user wants to query a bot with the four Hebrew phrases below.

  אני רוצה לקבוע פגישה
  אני רוצה לקבוע נסיעת מבחן 
  אני רוצה לקבוע נסיעת מבחן למחר
  אפשר לקבוע נסיעת מבחן למחר?

The English equivalent for these phrases is the following:

I want to schedule a meeting.
I want to schedule a test drive.
I want to schedule a test drive for tomorrow.
Can I schedule a test drive tomorrow?

Yet the translation service translates the phrases as:

I want to schedule an appointment.
I want to schedule a test drive.
I want to make a test tomorrow.
Can set a test tomorrow?

Note that while the first two phrases and their translations are nearly identical, there is a gap between the translations of the last two phrases (“I want to make a test tomorrow.”, “Can set a test tomorrow?”) and their proper English meanings (“I want to schedule a test drive for tomorrow.”, “Can I schedule a test drive tomorrow?”).

For example, in both phrases, the translation service substituted the word “test” for the concept “test drive”, which has a very different meaning despite being a close literal translation. A LUIS model trained only on proper English queries, such as “I want to schedule a test drive for tomorrow”, would struggle to identify such substitutions since they are unique to the way Hebrew is translated into English. Differences in grammatical structure and word usage between languages often lead to consistent but unique errors in translated texts.

However, if we train the model on translated Hebrew, the service will quickly learn to identify the gaps between the malformed Hebrew translation and its intended meaning. Over time, as the model learns the unique ways in which Hebrew translations are erroneous in a given context, it will provide more and more accurate results.
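To make the gap concrete, here is a minimal Python sketch that measures word overlap (Jaccard similarity over token sets) between the intended English and the machine translation for three of the phrase pairs above. The helper names `tokens` and `jaccard` are ours, and token overlap is only a rough illustration of surface difference. It is not how LUIS scores utterances.

```python
import re


def tokens(text):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


# (intended English, machine translation) pairs from the example above
pairs = [
    ("I want to schedule a test drive.",
     "I want to schedule a test drive."),
    ("I want to schedule a test drive for tomorrow.",
     "I want to make a test tomorrow."),
    ("Can I schedule a test drive tomorrow?",
     "Can set a test tomorrow?"),
]

for intended, translated in pairs:
    print(f"{jaccard(intended, translated):.2f}  {translated}")
```

The first pair overlaps completely (1.00), while the last two share only 0.60 and 0.50 of their combined vocabulary. A model that has only seen the proper English phrasing has to bridge that difference at query time, whereas a model trained on the translated text sees the same wording in training and in production.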

How to Use

The following section outlines how to train and use our node module for integrating additional language support for bots. It is assumed that the user has already created a LUIS application and has generated a key for the Translation Cognitive Service.

1) Compile a list of commands in the unsupported language (in this case Hebrew) such as:

  אני רוצה לקבוע פגישה             // I want to schedule an appointment
  אני רוצה לקבוע נסיעת מבחן        // I want to schedule a test drive
  אני רוצה לקבוע נסיעת מבחן למחר   // I want to schedule a test drive for tomorrow
  אפשר לקבוע נסיעת מבחן למחר?      // Can I schedule a test drive tomorrow?

2) Run the Bulk Translate and Insert into LUIS script.

3) Tag the translated utterances with intents and entities using the LUIS portal.

4) Use the train and test features of the LUIS portal to validate and re-train your model until it learns to map the translations to their intended meanings.

5) Use the ULIS npm module to consume your trained LUIS model and integrate the service into your application.
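The data flow behind steps 2 and 5 can be sketched as follows. This is an illustrative Python outline, not the ULIS module's API (which is a node module) and not the author's script. The `translate` and `query_luis` callables are hypothetical stand-ins for the real service calls, and the `text` / `intentName` / `entityLabels` field names are an assumption about the LUIS batch example format that should be checked against the current documentation.

```python
from typing import Callable, Dict, List


def build_luis_examples(utterances: List[str],
                        translate: Callable[[str], str],
                        intent_name: str = "None") -> List[Dict]:
    """Step 2: translate each utterance and wrap it as an unlabeled example.

    The machine translation is kept verbatim, errors included, because
    that is the text the model will receive at query time.
    Field names are assumed, not confirmed by the article.
    """
    return [
        {"text": translate(u), "intentName": intent_name, "entityLabels": []}
        for u in utterances
    ]


def understand(utterance: str,
               translate: Callable[[str], str],
               query_luis: Callable[[str], Dict]) -> Dict:
    """Step 5: translate the user's message, then query the trained model."""
    return query_luis(translate(utterance))


# Canned translations from the article, standing in for the translation service.
CANNED = {
    "אני רוצה לקבוע פגישה": "I want to schedule an appointment.",
    "אפשר לקבוע נסיעת מבחן למחר?": "Can set a test tomorrow?",
}


def fake_translate(text: str) -> str:
    """Hypothetical stub: looks up a canned translation instead of calling an API."""
    return CANNED[text]


def fake_query_luis(text: str) -> Dict:
    """Hypothetical stub: echoes the query instead of calling a LUIS endpoint."""
    return {"query": text}


examples = build_luis_examples(list(CANNED), fake_translate)
print(examples)
print(understand("אפשר לקבוע נסיעת מבחן למחר?", fake_translate, fake_query_luis))
```

Passing the translator and the LUIS client in as functions keeps the sketch runnable offline and makes the key design point visible: the same translation step feeds both training data and live queries, so the model is trained on exactly the kind of text it will later be asked to interpret.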

Code

You can find the notebook and code for implementing this methodology on GitHub.

Opportunities for Reuse

The methodology outlined in this code story can be used to provide natural language intent and entity extraction for any language supported by the Translation Cognitive Service. It can be reused to provide localization support in many Conversation as a Platform scenarios for a more immersive bot experience.

2 comments

  • lauren

    Hello Ari,
    I really enjoyed reading your post and have one question. Does ULIS itself do any action related to “learning”? I am still confused about where the line is between ULIS and LUIS. In my understanding, ULIS is just a wrapper for sending translated text (done by Bing) to LUIS. Once all the data is sent to LUIS, the user needs to tag each intent and entity. Then, using ULIS, the user adds more and more examples to make the model more accurate. Am I on the right track? Thank you so much for your helpful blog.