Leveraging Azure Search for Implementing a QnA Bot in Unsupported Languages

Ami Turgman

December 10th, 2016

Image: F.A.Q by photosteve101, used under CC BY 2.0

Background

Our team recently worked with a Korean company that wanted to expose a QnA (Frequently Asked Questions / Questions and Answers) bot in their web portal to help answer common user questions. Bot Framework provides a QnA service (QnA Maker) that takes a list of common questions and answers related to your domain, called a QnA corpus, as input. After indexing the QnA corpus, the QnA Maker provides a simple REST API service to get the most suitable answer for a free text-based question.

The Problem

Since the QnA Maker currently does not support Korean, we had to develop similar QnA functionality without using the QnA Maker service.

The Solution

This post describes a method for providing a backend service for QnA services in languages that the QnA Maker does not currently support, using Azure Search. For the purpose of demonstration, we will build a QnA service in the English language. However, you can apply the following steps to any language currently supported by Azure Search (please refer to the list of supported languages).

Azure Search is a search engine as a service offered by Azure. Azure Search can be used to index custom data and run search queries on the indexed corpus. Using Azure Search, we can index our QnA items and find the most appropriate item and label based on the question that the user asked. In addition to supporting most languages, the following two features make Azure Search the right tool for building a custom QnA service:

Analyzers – an analyzer processes both the indexed text and the search query using language-specific rules, such as word breaking and reducing words to their root forms, to increase the likelihood of finding a related match in the corpus.
Scoring Profile – a scoring profile lets us assign larger or smaller weights to the different fields of the indexed items. Azure Search uses these weights when it scores matches, so the profile directly affects how results are ranked.

In our solution we will provide Azure Search with our QnA corpus, in the form of a table containing the following schema: [id, category, url, question, answer, keywords]

id is the unique key for each item
keywords is a collection of keywords that best describe the question
url is for cases where we want to direct the user to an external URL as part of the answer
category will be used and explained later in a more advanced scenario

A sample Excel file is available.

Using the scoring profile feature, we can define a few scoring profiles and then try each to see which provides better results. There are no magic numbers here and we’ll have to experiment with different weights so that we get the most accurate results.

Our solution is based on the keywords field, which captures the essence of the question text. By giving this field the highest weight, we “push” the results in that direction: items whose keywords most closely match the words in the user's question receive a higher score.

In this demo, I’ll create a scoring profile that defines the following numerical weights for the fields as explained above:

keywords – 5
question – 3
answer – 2

This solution assumes that the keywords field contains the most relevant words that users are likely to use when entering a question, so it gets the highest weight. The question field should also be considered, but with a lower weight than the keywords field. We also include the answer field in the scoring profile, since we can assume that some of the words in the question also appear in the answer, but it gets the lowest weight.
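For reference, the same weights can be written as a scoring profile fragment of the kind that appears in an Azure Search index definition, instead of being entered through the portal. The following is a minimal sketch in Python: the profile name matches the one used in the query strings later in this post, while the helper name build_scoring_profile is hypothetical and only there for illustration.

```python
import json


def build_scoring_profile(name, weights):
    """Return a scoring-profile fragment with per-field text weights.

    The shape (name + text.weights) follows the scoring profile section
    of an Azure Search index definition; the helper itself is hypothetical.
    """
    return {"name": name, "text": {"weights": dict(weights)}}


# Weights from this post: keywords > question > answer.
profile = build_scoring_profile(
    "qna-scoring-profile",
    {"keywords": 5, "question": 3, "answer": 2},
)

print(json.dumps(profile, indent=2))
```

Keeping the weights in one dictionary makes it easy to define several profiles with different values and compare their results.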

To simplify the process of creating this solution, we will use a small QnA corpus. Since we use a small corpus, it may seem that we can get good results without using the scoring profile feature, but this is only because of the limited size of our corpus. The more items the corpus contains, the more the scoring profile will improve the accuracy of the results.

The following steps outline how to build a QnA service with Azure Search:

Open the Azure Portal and create a new resource group. Under this resource group, create a new Azure Search service as demonstrated here:

Provide a name for your search service and fill in the rest of the required fields, then press OK:

This is what it looks like when our Azure Search service is deployed:

Azure Search can import data from a few different data sources, such as DocumentDB, SQL, or Azure Table. In this example, I created a SQL DB with one table and used it as the source for the search service:

Fill in the required fields:

This is how the final result should look:

I used SQL Server Management Studio to connect to my SQL DB. I then created a table using the following query:

Then I filled in the table with a few questions and answers:

Now that the data is ready, it’s time to import it into the Search service. Follow the red squares to import the data from SQL:

After connecting to the SQL database, the data import wizard asks us to define the index properties. Provide a name for the index, and select the ID field in the Key select box.

Next, we’ll need to select a few properties for each field in our table:

Retrievable should be checked if we want to retrieve this field as part of the search result
Filterable should be selected if we want to be able to filter based on this field
Sortable should be checked if we want to be able to sort based on this field
Facetable should be checked if we want to be able to get counters for the number of results grouped by this field. We’ll see that in action next.
Searchable should be used to mark the fields we would like the search engine to use when performing the actual search

The next step is to define the analyzer that will be used for each field. In this case, we used English-Microsoft for all of the searchable fields. Choose the corresponding Microsoft analyzer for the language of your own corpus:

For this demo, I chose Once for the scheduler, but this is the place where you can provide scheduling details for when the service should re-index your data for changes and updates:

Click OK. When finished, you should receive a notification that the import process completed successfully.
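The choices made in the wizard correspond roughly to an index definition like the one sketched below in Python. Several details are assumptions rather than values taken from this post: every field is typed as Edm.String, only category is marked filterable and facetable (because it is the field faceted on later), the index name qna-index is made up, and the helper names are hypothetical. The analyzer name en.microsoft is the identifier for the English Microsoft analyzer.

```python
import json

# Fields the search engine should match against (per this post).
SEARCHABLE_FIELDS = {"question", "answer", "keywords"}
ALL_FIELDS = ["id", "category", "url", "question", "answer", "keywords"]


def build_field(name, analyzer="en.microsoft"):
    """Describe one index field and its attributes (assumed settings)."""
    searchable = name in SEARCHABLE_FIELDS
    field = {
        "name": name,
        "type": "Edm.String",            # assumption: all fields are strings
        "key": name == "id",             # id is the unique key
        "retrievable": True,             # return every field in results
        "searchable": searchable,
        "filterable": name == "category",
        "facetable": name == "category",
        "sortable": False,
    }
    if searchable:
        # Analyzers only apply to searchable fields.
        field["analyzer"] = analyzer
    return field


def build_index(index_name, analyzer="en.microsoft"):
    """Assemble a full index definition for the QnA schema."""
    return {
        "name": index_name,
        "fields": [build_field(name, analyzer) for name in ALL_FIELDS],
    }


index_definition = build_index("qna-index")  # hypothetical index name
print(json.dumps(index_definition, indent=2))
```

For another language, only the analyzer argument should need to change, for example ko.microsoft for Korean.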

Next, follow the red squares in the screenshot below to create your scoring profile:

Next, we will need to provide weights for the different fields, as explained above:

We are now ready to start searching our QnA corpus. Click the Search explorer button as demonstrated in the following screenshot:

The Search explorer provides an easy way to invoke REST API calls from within the portal. This is where we can experiment with our search service.

Use search=your_free_text_question in the Query string input as demonstrated below. This query will invoke a simple search on the searchable fields without using the scoring profile feature.

Now let’s use the scoring profile by adding it to the query string:

search=in which mode am i&scoringProfile=qna-scoring-profile

Note: Azure Search returned different items and scores than before (the last item is 14 instead of 5, as it was when we searched without the scoring profile). With such a small number of items, it is difficult to demonstrate how much the scoring profile helps. However, with a large number of items in the corpus, the scoring profile makes a noticeable difference.

Learn more about search parameters and how to call the search service API
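Outside the portal, a bot backend can send the same query to the search service's REST endpoint. The sketch below uses only the Python standard library. The service name, index name, and API key are placeholders, the api-version value is an assumption (use whichever version your service supports), and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2016-09-01"  # assumption: pick the version your service supports


def build_search_url(service, index, question, scoring_profile=None):
    """Build the GET URL for a free-text search against an index."""
    params = {"api-version": API_VERSION, "search": question}
    if scoring_profile:
        params["scoringProfile"] = scoring_profile
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


def search(service, index, api_key, question, scoring_profile=None):
    """Run the search and return the list of matching documents."""
    url = build_search_url(service, index, question, scoring_profile)
    request = urllib.request.Request(url, headers={"api-key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]


# Placeholder service and index names; no network call is made here.
url = build_search_url(
    "my-service", "qna-index", "in which mode am i", "qna-scoring-profile"
)
print(url)
```

Each returned document carries an @search.score value, so the bot can reply with the answer field of the highest-scoring item.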

In this sample, we used the facet feature to get a count of the search results under each category. This feature can be useful when the results are spread across categories: we can show the user the number of results in each category and ask which one they would like to browse. After the user chooses a category, we can drill down with an additional search filtered on that category. The following query requests the category counts:

search=in which mode am i?&scoringProfile=qna-scoring-profile&facet=category
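The facet counts come back in the response under @search.facets, and the drill-down can be expressed with a $filter parameter on the follow-up search. The sketch below shows both steps in Python; the sample response is made up for illustration (it is not output from the demo corpus), and the helper names are hypothetical.

```python
def facet_counts(response, field="category"):
    """Map each facet value to its result count, e.g. {"Modes": 3}."""
    facets = response.get("@search.facets", {}).get(field, [])
    return {facet["value"]: facet["count"] for facet in facets}


def category_filter(category):
    """Build an OData filter for one category, escaping single quotes."""
    escaped = category.replace("'", "''")
    return f"category eq '{escaped}'"


# Made-up response fragment in the shape of a faceted search result.
sample_response = {
    "@search.facets": {
        "category": [
            {"value": "Modes", "count": 3},
            {"value": "Billing", "count": 1},
        ]
    },
    "value": [],
}

counts = facet_counts(sample_response)
print(counts)

# Parameters for the follow-up search once the user picks a category.
drill_down_params = {
    "search": "in which mode am i?",
    "scoringProfile": "qna-scoring-profile",
    "$filter": category_filter("Modes"),
}
print(drill_down_params["$filter"])
```

The filter only works on fields that were marked Filterable when the index was defined.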

Opportunities for Reuse

The work outlined in this case study can be reused for providing language support in any bot QnA scenario. For example, this approach can be used to build a QnA system for Norwegian or Hebrew, two languages not natively supported by QnA Maker.

Note: This approach is useful beyond bot clients. The technique described in this post can be used to build a general, domain-specific search service, which can be tweaked further with scoring profiles.

Additionally, because Azure Search provides the ability to label individually indexed items, the methodology above can be used as a scalable multiclass text classification method. This possibility is critical because many bots need to categorize intent based on messages from users. While solutions such as LUIS or Azure ML exist, they can sometimes struggle when there are more than 10-15 classification categories. For example, if a doctor needs to classify a medical protocol out of hundreds of medical protocols, this approach can be used to classify text, given a large enough corpus of related protocols.
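One way to turn search results into a class label is to let the top hits vote for their category, weighted by search score. This voting rule is an assumption made for illustration, not something prescribed by this post; simply taking the category of the single best hit would be an equally valid starting point. The sample hits below are made up.

```python
from collections import defaultdict


def classify(hits, top_k=5):
    """Return the category with the highest summed score among the top hits.

    `hits` is a list of search results ordered by relevance, each with a
    "category" label and an "@search.score" value. Returns None if empty.
    """
    totals = defaultdict(float)
    for hit in hits[:top_k]:
        totals[hit["category"]] += hit["@search.score"]
    if not totals:
        return None
    return max(totals, key=totals.get)


# Made-up hits: protocol-b wins on combined score (3.4 vs 2.1).
sample_hits = [
    {"category": "protocol-a", "@search.score": 2.1},
    {"category": "protocol-b", "@search.score": 1.9},
    {"category": "protocol-b", "@search.score": 1.5},
]

label = classify(sample_hits)
print(label)
```

A score threshold could be added so that weak matches return no label instead of a low-confidence guess.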