October 29th, 2024

Introducing Microsoft.Extensions.VectorData Preview

Luis Quintanilla
Program Manager

We are excited to introduce the Microsoft.Extensions.VectorData.Abstractions library, now available in preview.

Just as the Microsoft.Extensions.AI libraries offer a unified layer for working with AI services, this package provides the .NET ecosystem with abstractions that help integrate vector stores into .NET applications and libraries.

Why vector stores?

Vector databases are important for tasks like search and grounding Generative AI responses.

Similar to how relational databases and document databases are optimized for structured and semi-structured data, vector databases are built to efficiently store, index, and manage data represented as embedding vectors. As a result, the indexing algorithms used by vector databases are optimized to efficiently retrieve data that can be used downstream in your applications.
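To make "data represented as embedding vectors" a little more concrete, here is a minimal sketch of the comparison that similarity search relies on. The three-dimensional vectors are made-up values for illustration only (real embedding models produce hundreds of dimensions), and the VectorMath helper is a hypothetical name, not part of any library mentioned in this post.

```csharp
using System;

// Made-up 3-dimensional "embeddings" used purely for illustration.
float[] lionKing = { 0.9f, 0.1f, 0.0f };
float[] shrek    = { 0.8f, 0.2f, 0.1f };
float[] matrix   = { 0.0f, 0.2f, 0.9f };

// Vectors pointing in a similar direction score close to 1.
Console.WriteLine(VectorMath.CosineSimilarity(lionKing, shrek));
// Vectors pointing in very different directions score close to 0.
Console.WriteLine(VectorMath.CosineSimilarity(lionKing, matrix));

// Hypothetical helper, not a library API.
public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns NaN if either vector has zero length.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

A vector database applies the same idea at scale, using index structures so that it does not have to compare the query against every stored vector.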

What is Microsoft.Extensions.VectorData?

Microsoft.Extensions.VectorData is a set of core .NET libraries developed in collaboration with Semantic Kernel and the broader .NET ecosystem. These libraries provide a unified layer of C# abstractions for interacting with vector stores.

The abstractions in Microsoft.Extensions.VectorData provide library authors and developers with the following functionality:

  • Perform Create-Read-Update-Delete (CRUD) operations on vector stores.
  • Use vector and text search on vector stores.
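As a rough mental model of the CRUD half of that list, the sketch below implements the four operations on a dictionary-backed toy collection. ToyVectorCollection and its method names are hypothetical and chosen for this illustration; they are not the interfaces defined by Microsoft.Extensions.VectorData.

```csharp
using System;
using System.Collections.Generic;

var collection = new ToyVectorCollection();

collection.Upsert(1, new[] { 1f, 0f });   // create
collection.Upsert(1, new[] { 0f, 1f });   // update: the same key replaces the record
float[]? stored = collection.Get(1);      // read
Console.WriteLine(stored is null ? "missing" : string.Join(", ", stored));
collection.Delete(1);                     // delete
Console.WriteLine(collection.Count);      // the collection is empty again

// Hypothetical toy type, not a library API.
public sealed class ToyVectorCollection
{
    private readonly Dictionary<int, float[]> _records = new();

    public int Count => _records.Count;

    // Inserts a new record or overwrites the one stored under the same key.
    public void Upsert(int key, float[] vector) => _records[key] = vector;

    // Returns the stored vector, or null when the key is unknown.
    public float[]? Get(int key) =>
        _records.TryGetValue(key, out var vector) ? vector : null;

    // Returns true when a record was actually removed.
    public bool Delete(int key) => _records.Remove(key);
}
```

The value of a shared abstraction is that application code written against operations like these does not need to change when the backing store does.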

How to get started?

The easiest way to get started with Microsoft.Extensions.VectorData abstractions is by using any of the Semantic Kernel vector store connectors.

In this example, I’ll be using the in-memory vector store implementation.

To complement this sample and make it feel more real, we’ll also be using the Ollama reference implementation in Microsoft.Extensions.AI. However, any of the other implementations that support embedding generation will work as well.

Create application and add NuGet packages

  1. Create a C# console application.

  2. Install the following NuGet packages

  3. Add the following using directives to your application.

using System.Collections.ObjectModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.InMemory;

Store data

The scenario used by this sample performs semantic search over a collection of movies.

Define data models

Start by defining a class to represent your movie data.

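public class Movie
{
    // Unique identifier of the record in the collection.
    [VectorStoreRecordKey]
    public int Key { get; set; }

    // Plain data fields stored alongside the vector.
    [VectorStoreRecordData]
    public string Title { get; set; }

    [VectorStoreRecordData]
    public string Description { get; set; }

    // 384-dimensional embedding, compared using cosine similarity.
    [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector { get; set; }
}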

By using attributes like VectorStoreRecordKey, VectorStoreRecordVector, and VectorStoreRecordData, you can annotate your data models to make it easier for vector store implementations to map POCOs to their underlying data models. See the Semantic Kernel Vector Store Data Model learn page for more information on the options supported by each attribute.
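To illustrate why annotations help with mapping, here is a minimal sketch of how a connector could use reflection to discover which property is the key and which holds the vector. The DemoKey and DemoVector attributes, the DemoMovie class, and the RecordInspector helper are all hypothetical stand-ins written for this example; the real connectors may work quite differently.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Look up the annotated properties on the record type.
var keyProp = RecordInspector.FindProperty<DemoKeyAttribute>(typeof(DemoMovie));
var vectorProp = RecordInspector.FindProperty<DemoVectorAttribute>(typeof(DemoMovie));
var dimensions = vectorProp.GetCustomAttribute<DemoVectorAttribute>()!.Dimensions;

Console.WriteLine($"Key: {keyProp.Name}, Vector: {vectorProp.Name} ({dimensions} dimensions)");

// Hypothetical attributes, standing in for the real annotations.
[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoKeyAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class DemoVectorAttribute : Attribute
{
    public DemoVectorAttribute(int dimensions) => Dimensions = dimensions;
    public int Dimensions { get; }
}

public class DemoMovie
{
    [DemoKey] public int Key { get; set; }
    public string Title { get; set; } = "";
    [DemoVector(384)] public ReadOnlyMemory<float> Vector { get; set; }
}

public static class RecordInspector
{
    // Returns the single public property carrying TAttribute.
    // Single() throws if no property, or more than one, is annotated.
    public static PropertyInfo FindProperty<TAttribute>(Type recordType)
        where TAttribute : Attribute =>
        recordType.GetProperties()
            .Single(p => p.GetCustomAttribute<TAttribute>() is not null);
}
```

Because the metadata lives on the model, the same class can in principle be handed to different store implementations without per-store mapping code.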

Create vector store and movie collection

Now that you’ve defined your data models, create a vector store with a collection to store movie data.

var movieData = new List<Movie>()
{
    new Movie
    {
        Key = 0,
        Title = "Lion King",
        Description = "The Lion King is a classic Disney animated film that tells the story of a young lion named Simba who embarks on a journey to reclaim his throne as the king of the Pride Lands after the tragic death of his father."
    },
    new Movie
    {
        Key = 1,
        Title = "Inception",
        Description = "Inception is a science fiction film directed by Christopher Nolan that follows a group of thieves who enter the dreams of their targets to steal information."
    },
    new Movie
    {
        Key = 2,
        Title = "The Matrix",
        Description = "The Matrix is a science fiction film directed by the Wachowskis that follows a computer hacker named Neo who discovers that the world he lives in is a simulated reality created by machines."
    },
    new Movie
    {
        Key = 3,
        Title = "Shrek",
        Description = "Shrek is an animated film that tells the story of an ogre named Shrek who embarks on a quest to rescue Princess Fiona from a dragon and bring her back to the kingdom of Duloc."
    }
};

var vectorStore = new InMemoryVectorStore();

var movies = vectorStore.GetCollection<int, Movie>("movies");

await movies.CreateCollectionIfNotExistsAsync();

Create embedding generator

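To generate embeddings, use one of the models provided by Ollama. In this sample, the model used is all-minilm, but any embedding model would work as long as the dimension declared on the Movie.Vector property (384 here) matches the size of the vectors the model produces.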

  1. Install Ollama.
  2. Download the all-minilm model.
  3. Configure an OllamaEmbeddingGenerator in your application:

    IEmbeddingGenerator<string,Embedding<float>> generator = 
        new OllamaEmbeddingGenerator(new Uri("http://localhost:11434/"), "all-minilm");

Generate embeddings

Now that you have an embedding generator, use it to generate embeddings for your movie data and store them in your vector store.

foreach(var movie in movieData)
{
    movie.Vector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
    await movies.UpsertAsync(movie);
}

Query data

Now that you have data in your data store, you can query it.

Generate query embedding

Generate an embedding for the query “A family friendly movie”.

var query = "A family friendly movie";
var queryEmbedding = await generator.GenerateEmbeddingVectorAsync(query);

Query your data store

Now that you have an embedding for your query, you can use it to search your data store for relevant results.

var searchOptions = new VectorSearchOptions()
{
    Top = 1,
    VectorPropertyName = "Vector"
};

var results = await movies.VectorizedSearchAsync(queryEmbedding, searchOptions);

await foreach(var result in results.Results)
{
    Console.WriteLine($"Title: {result.Record.Title}");
    Console.WriteLine($"Description: {result.Record.Description}");
    Console.WriteLine($"Score: {result.Score}");
    Console.WriteLine();
}

The result should look as follows:

[Image: console output displaying Lion King as a query result]
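Conceptually, a vectorized search scores each stored record against the query embedding and returns the best matches, which is what the Top option limits. The sketch below shows that idea as a brute-force scan with cosine similarity. The ToySearch helper and the three-dimensional vectors are assumptions made up for this illustration; they are not output from all-minilm, and this is not how the in-memory connector is necessarily implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up 3-dimensional vectors keyed by title, for illustration only.
var records = new Dictionary<string, float[]>
{
    ["Lion King"] = new[] { 0.9f, 0.1f, 0.0f },
    ["Inception"] = new[] { 0.1f, 0.3f, 0.8f },
    ["Shrek"]     = new[] { 0.8f, 0.2f, 0.1f },
};
float[] query = { 1.0f, 0.0f, 0.0f };

// Ask for the two closest records, best match first.
foreach (var (title, score) in ToySearch.TopK(records, query, top: 2))
    Console.WriteLine($"{title}: {score:F3}");

// Hypothetical helper, not a library API.
public static class ToySearch
{
    // Scores every record against the query and keeps the 'top' highest scores.
    public static List<(string Key, float Score)> TopK(
        IReadOnlyDictionary<string, float[]> records, float[] query, int top) =>
        records
            .Select(r => (r.Key, Score: Cosine(query, r.Value)))
            .OrderByDescending(r => r.Score)
            .Take(top)
            .ToList();

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length, non-zero vectors.
    private static float Cosine(float[] a, float[] b)
    {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

With these toy values, Lion King ranks ahead of Shrek and Inception is cut off by the top limit. A production store would typically replace the linear scan with an approximate nearest-neighbor index.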

Continue learning

For detailed documentation on using the abstractions, see the Semantic Kernel Vector Store Learn Site.

Learn more from the following samples:

What’s next for Microsoft.Extensions.VectorData?

Similar to Microsoft.Extensions.AI, we plan to:

  • Continue collaborating with Semantic Kernel to build on top of the Microsoft.Extensions.VectorData abstractions and bring more streamlined experiences to RAG scenarios. Check out the Semantic Kernel blog to learn more.
  • Work with vector store partners in the ecosystem that offer client SDKs, library authors, and developers across the .NET ecosystem on Microsoft.Extensions.VectorData adoption.

We’re excited for you to start building using Microsoft.Extensions.VectorData.

Try it out and give us feedback!

Author

Luis Quintanilla is a program manager based out of the New York City area working on machine learning for .NET. He's passionate about helping others be successful developing machine learning applications.

3 comments

  • Sarosh Wadia

    Hi!

    Great code example!

    Now that Azure SQL supports vector data type, is there code examples to use that in place of the InMemoryVectorStore() in your example to store the same data in Azure SQL?

    Thanks

  • Charles Chen

    At first glance, this feels like it really falls short.

    <code>

    Why not make this generic so the can be an expression on the type instead of a string?

    <code>

    ??? Why not use the metadata added from the attributes to automatically set the vector property?

    Does not feel very ergonomic at all.

    • Luis Quintanilla (Microsoft employee, Author)

      Hi Charles,

      Thanks for the feedback.

      If I understand correctly, the question is, if you have a Vector property, why not just use and have the vector property name inferred?

      I can see how that may be confusing given the naming of the data model. In short, it's not a 1:1 mapping. You could have N number of vector properties. In which case, when you perform your query, you'd still need to specify which of the properties...
