December 6th, 2024

Customer Case Study: How to use the Elasticsearch Vector Store Connector for Microsoft Semantic Kernel in AI agent development

Today we’re excited to feature the Elastic team to share more about their Elasticsearch Vector Store connector for Microsoft Semantic Kernel. Read the entire announcement here. I’ll turn it over to Srikanth Manvi and Florian Bernd to dive into it.

In collaboration with the Microsoft Semantic Kernel team, we are announcing the availability of the Semantic Kernel Elasticsearch Vector Store Connector for Microsoft Semantic Kernel (.NET) users. Semantic Kernel simplifies building enterprise-grade AI agents, including the capability to enhance large language models (LLMs) with more relevant, data-driven responses from a vector store. Semantic Kernel provides a seamless abstraction layer for interacting with vector stores like Elasticsearch, offering essential features such as creating, listing, and deleting collections of records, and uploading, retrieving, and deleting individual records.
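As an illustrative sketch of what that abstraction surface looks like (method names follow the Microsoft.Extensions.VectorData abstractions used later in this post; exact signatures may differ between preview versions, and `MyRecord` is a hypothetical data model):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.VectorData;

// Hypothetical minimal data model for this sketch.
public sealed record MyRecord
{
    [VectorStoreRecordKey]
    public required string Id { get; set; }
}

public static class VectorStoreSketch
{
    public static async Task RunAsync(IVectorStore vectorStore, MyRecord record)
    {
        // Get a typed handle to a named collection of records.
        var collection = vectorStore.GetCollection<string, MyRecord>("my-collection");

        // Create the collection if it does not exist yet.
        await collection.CreateCollectionIfNotExistsAsync();

        // Upload, retrieve, and delete individual records.
        await collection.UpsertAsync(record);
        var fetched = await collection.GetAsync(record.Id);
        await collection.DeleteAsync(record.Id);

        // List all collections in the store.
        await foreach (var name in vectorStore.ListCollectionNamesAsync())
        {
            Console.WriteLine(name);
        }

        // Delete the whole collection.
        await collection.DeleteCollectionAsync();
    }
}
```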

The out-of-the-box Semantic Kernel Elasticsearch Vector Store Connector supports the Semantic Kernel vector store abstractions, which makes it very easy for developers to plug in Elasticsearch as a vector store while building AI agents.

Elasticsearch has a strong foundation in the open-source community and recently adopted the AGPL license. Combined with the open-source Microsoft Semantic Kernel, these tools offer a powerful, enterprise-ready solution. You can get started locally by spinning up Elasticsearch in a few minutes with this command: `curl -fsSL https://elastic.co/start-local | sh` (refer to start-local for details), then move to a cloud-hosted or self-hosted deployment when productionizing your AI agents.

In this blog, we look at how to use the Semantic Kernel Elasticsearch Vector Store Connector in a Semantic Kernel application. A Python version of the connector will be made available in the future.

High-level scenario

In the following section, we go through an example. At a high level, we are building a RAG (Retrieval-Augmented Generation) application that takes a user’s question as input and returns an answer. We will use Azure OpenAI as the LLM (a local LLM can be used as well), Elasticsearch as the vector store, and Semantic Kernel (.NET) as the framework that ties all the components together.

If you are not familiar with RAG architectures, this article provides a quick introduction: https://www.elastic.co/search-labs/blog/retrieval-augmented-generation-rag.

The answer is generated by the LLM, which is fed with context relevant to the question, retrieved from the Elasticsearch vector store. The response also includes the source that the LLM used as context.

RAG Example

In this specific example, we build an application that allows users to ask questions about hotels stored in an internal hotel database. The user could, for example, search for a specific hotel based on different criteria, or ask for a list of hotels.

For the example database, we generated a list of hotels containing 100 entries. The sample size is intentionally small to allow you to try out the connector demo as easily as possible. In a real-world application, the Elasticsearch connector would show its advantages over other options, such as the `InMemory` vector store implementation, especially when working with extremely large amounts of data.

The complete demo application can be found in the Elasticsearch vector store connector repository.

Let’s start with adding the required NuGet packages and using directives to our project:

dotnet add package "Elastic.Clients.Elasticsearch" -v 8.16.2
dotnet add package "Elastic.SemanticKernel.Connectors.Elasticsearch" -v 0.1.2
dotnet add package "Microsoft.Extensions.Hosting" -v 9.0.0
dotnet add package "Microsoft.SemanticKernel.Connectors.AzureOpenAI" -v 1.30.0
dotnet add package "Microsoft.SemanticKernel.PromptTemplates.Handlebars" -v 1.30.0

using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

using Elastic.Clients.Elasticsearch;
using Elastic.Transport;

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Data;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.PromptTemplates.Handlebars;

We can now create our data model and provide it with Semantic Kernel-specific attributes that define the storage model schema and give some hints for the text search:

/// <summary>
/// Data model for storing a "hotel" with a name, a description, a description embedding, and an optional reference link.
/// </summary>
public sealed record Hotel
{
	[VectorStoreRecordKey]
	public required string HotelId { get; set; }

	[TextSearchResultName]
	[VectorStoreRecordData(IsFilterable = true)]
	public required string HotelName { get; set; }

	[TextSearchResultValue]
	[VectorStoreRecordData(IsFullTextSearchable = true)]
	public required string Description { get; set; }

	[VectorStoreRecordVector(1536, DistanceFunction.CosineSimilarity, IndexKind.Hnsw)]
	public ReadOnlyMemory<float>? DescriptionEmbedding { get; set; }

	[TextSearchResultLink]
	[VectorStoreRecordData]
	public string? ReferenceLink { get; set; }
}

The Storage Model Schema attributes (`VectorStore*`) are most relevant for the actual use of the Elasticsearch Vector Store Connector, namely:

  • VectorStoreRecordKey to mark a property on a record class as the key under which the record is stored in a vector store.
  • VectorStoreRecordData to mark a property on a record class as ‘data’.
  • VectorStoreRecordVector to mark a property on a record class as a vector.

All of these attributes accept various optional parameters that can be used to further customize the storage model. In the case of VectorStoreRecordVector, for example, it is possible to specify a different distance function or a different index type.

The text search attributes (TextSearch*) will be important in the last step of this example. We will come back to them later.

In the next step, we initialize the Semantic Kernel engine and obtain references to the core services. In a real-world application, dependency injection should be used instead of directly accessing the service collection. The same applies to the hardcoded configuration and secrets, which should be read via a configuration provider instead:

var builder = Host.CreateApplicationBuilder(args);

// Register AI services.
var kernelBuilder = builder.Services.AddKernel();

kernelBuilder.AddAzureOpenAIChatCompletion("gpt-4o", "https://my-service.openai.azure.com", "my_token");

kernelBuilder.AddAzureOpenAITextEmbeddingGeneration("ada-002", "https://my-service.openai.azure.com", "my_token");

// Register text search service.
kernelBuilder.AddVectorStoreTextSearch<Hotel>();

// Register Elasticsearch vector store.
var elasticsearchClientSettings = new ElasticsearchClientSettings(new Uri("https://my-elasticsearch-instance.cloud"))
    .Authentication(new BasicAuthentication("elastic", "my_password"));

kernelBuilder.AddElasticsearchVectorStoreRecordCollection<string, Hotel>("skhotels", elasticsearchClientSettings);

// Build the host.
using var host = builder.Build();

// For demo purposes, we access the services directly without using a DI context.

var kernel = host.Services.GetService<Kernel>()!;
var embeddings = host.Services.GetService<ITextEmbeddingGenerationService>()!;
var vectorStoreCollection = host.Services.GetService<IVectorStoreRecordCollection<string, Hotel>>()!;

// Register search plugin.
var textSearch = host.Services.GetService<VectorStoreTextSearch<Hotel>>()!;
kernel.Plugins.Add(textSearch.CreateWithGetTextSearchResults("SearchPlugin"));

The vectorStoreCollection service can now be used to create the collection and to ingest a few demo records:

await vectorStoreCollection.CreateCollectionIfNotExistsAsync();

// CSV format: ID;Hotel Name;Description;Reference Link
var hotels = (await File.ReadAllLinesAsync("hotels.csv"))
    .Select(x => x.Split(';'));

foreach (var chunk in hotels.Chunk(25))
{
    var descriptionEmbeddings = await embeddings.GenerateEmbeddingsAsync(chunk.Select(x => x[2]).ToArray());
    
    for (var i = 0; i < chunk.Length; ++i)
    {
        var hotel = chunk[i];
        await vectorStoreCollection.UpsertAsync(new Hotel
        {
            HotelId = hotel[0],
            HotelName = hotel[1],
            Description = hotel[2],
            DescriptionEmbedding = descriptionEmbeddings[i],
            ReferenceLink = hotel[3]
        });
    }
}

This shows how Semantic Kernel reduces the complexity of working with a vector store to a few simple method calls.

Under the hood, a new index is created in Elasticsearch and all the necessary property mappings are created. Our data set is then mapped completely transparently into the storage model and finally stored in the index. Below is how the mappings look in Elasticsearch.

{
  "mappings": {
    "properties": {
      "descriptionEmbedding": {
        "dims": 1536,
        "index": true,
        "index_options": {
          "type": "hnsw"
        },
        "similarity": "cosine",
        "type": "dense_vector"
      },
      "hotelName": {
        "type": "keyword"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

The embeddings.GenerateEmbeddingsAsync() calls transparently invoke the configured Azure OpenAI embedding generation service.

Even more magic can be observed in the last step of this demo.

With just a single call to InvokePromptAsync, all of the following operations are performed when the user asks a question about the data:

1. An embedding for the user’s question is generated

2. The vector store is searched for relevant entries

3. The results of the query are inserted into a prompt template

4. The actual query in the form of the final prompt is sent to the AI chat completion service

// Invoke the LLM with a template that uses the search plugin to
// 1. get related information to the user query from the vector store
// 2. add the information to the LLM prompt.
var response = await kernel.InvokePromptAsync(
    promptTemplate: """
                    Please use this information to answer the question:
                    {{#with (SearchPlugin-GetTextSearchResults question)}}
                      {{#each this}}
                        Name: {{Name}}
                        Value: {{Value}}
                        Source: {{Link}}
                        -----------------
                      {{/each}}
                    {{/with}}
                    
                    Include the source of relevant information in the response.

                    Question: {{question}}
                    """,
    arguments: new KernelArguments
    {
        { "question", "Please show me all hotels that have a rooftop bar." },
    },
    templateFormat: "handlebars",
    promptTemplateFactory: new HandlebarsPromptTemplateFactory());

Remember the TextSearch* attributes we previously defined on our data model? These attributes enable us to use corresponding placeholders in our prompt template, which are automatically populated with information from the entries in the vector store.

The final response to our question “Please show me all hotels that have a rooftop bar.” is as follows:

Console.WriteLine(response.ToString());

// > The hotel that has a rooftop bar is Skyline Suites. You can find more information about this hotel [here](https://example.com/yz567).

The answer correctly refers to the following entry in our hotels.csv:

9;Skyline Suites;Offering panoramic city views from every suite, this hotel is perfect for those who love the urban landscape. Enjoy luxurious amenities, a rooftop bar, and close proximity to attractions. Luxurious and contemporary.;https://example.com/yz567

This example shows very well how Microsoft Semantic Kernel achieves a significant reduction in complexity through its well-thought-out abstractions, while also enabling a very high degree of flexibility. By changing a single line of code, for example, the vector store or the AI services used can be replaced without having to refactor any other part of the code.
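As a sketch of that flexibility (assuming the `Microsoft.SemanticKernel.Connectors.InMemory` package and its `AddInMemoryVectorStoreRecordCollection` registration method, which are not part of this demo), swapping the vector store would look like this:

```csharp
// As used in this demo: an Elasticsearch-backed collection.
kernelBuilder.AddElasticsearchVectorStoreRecordCollection<string, Hotel>(
    "skhotels", elasticsearchClientSettings);

// Hypothetical one-line swap to the InMemory connector for local testing:
// kernelBuilder.AddInMemoryVectorStoreRecordCollection<string, Hotel>("skhotels");
```

The rest of the application code stays unchanged, since it only depends on the abstractions.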

At the same time, the framework provides an enormous set of high-level functionality, such as the `InvokePrompt` function, or the template or search plugin system.

The complete demo application can be found in the Elasticsearch vector store connector repository.

What’s next?

  • We showed how the Elasticsearch vector store can be easily plugged into Semantic Kernel while building GenAI applications in .NET. Stay tuned for a Python integration next.
  • As Semantic Kernel builds abstractions for advanced search features like hybrid search, the Elasticsearch connector will enable .NET developers to easily implement them while using Semantic Kernel.

Elasticsearch has native integrations with industry-leading GenAI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps with Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.
