Introducing Microsoft.Extensions.VectorData Preview

We are excited to introduce the Microsoft.Extensions.VectorData.Abstractions library, now available in preview.

Just as the Microsoft.Extensions.AI libraries offer a unified layer for working with AI services, this package provides the .NET ecosystem with abstractions that help integrate vector stores into .NET applications and libraries.

Why vector stores?

Vector databases are important for tasks such as search and grounding generative AI responses.

Just as relational databases and document databases are optimized for structured and semi-structured data, vector databases are built to efficiently store, index, and manage data represented as embedding vectors. As a result, the indexing algorithms used by vector databases are optimized to efficiently retrieve the items whose vectors are most similar to a query vector, which your applications can then use downstream.
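Vector similarity is often measured with cosine similarity, the metric the Movie model later in this post requests via DistanceFunction.CosineSimilarity. The following minimal sketch computes it in plain C# with no vector store involved: the dot product of two vectors divided by the product of their lengths, so vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1. The `VectorMath` class is a hypothetical name used only for illustration; it is not part of Microsoft.Extensions.VectorData.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Microsoft.Extensions.VectorData.
public static class VectorMath
{
    // Cosine similarity = dot(a, b) / (|a| * |b|), a value in [-1, 1].
    // 1 means same direction, 0 means orthogonal, -1 means opposite.
    public static float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same number of dimensions.");

        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as having no similarity.
        if (normA == 0f || normB == 0f)
            return 0f;

        return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

For example, VectorMath.CosineSimilarity(new float[] { 1f, 0f }, new float[] { 0f, 1f }) returns 0. For real workloads, the System.Numerics.Tensors package offers a vectorized TensorPrimitives.CosineSimilarity that you may prefer over a hand-written loop.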

What is Microsoft.Extensions.VectorData?

Microsoft.Extensions.VectorData is a set of core .NET libraries developed in collaboration with Semantic Kernel and the broader .NET ecosystem. These libraries provide a unified layer of C# abstractions for interacting with vector stores.

The abstractions in Microsoft.Extensions.VectorData provide library authors and developers with the following functionality:

Perform create, read, update, and delete (CRUD) operations on vector stores.
Use vector and text search on vector stores.
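To picture what the CRUD half of that list means for a vector store, the sketch below models a collection as records keyed by ID, each stored alongside its embedding. It is a hypothetical, synchronous, in-memory `ToyVectorCollection` written only for illustration; the real abstractions are asynchronous (for example, UpsertAsync, used later in this post) and backed by a provider, so their names and signatures differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, illustration-only collection; NOT the Microsoft.Extensions.VectorData API.
// Each record is stored under its key together with its embedding vector.
public sealed class ToyVectorCollection<TKey, TRecord> where TKey : notnull
{
    private readonly Dictionary<TKey, (TRecord Record, ReadOnlyMemory<float> Vector)> _items = new();

    public int Count => _items.Count;

    // Create/Update: "upsert" inserts a new record or replaces an existing one with the same key.
    public void Upsert(TKey key, TRecord record, ReadOnlyMemory<float> vector) =>
        _items[key] = (record, vector);

    // Read: returns true and the record if the key exists.
    public bool TryGet(TKey key, out TRecord? record)
    {
        if (_items.TryGetValue(key, out var item))
        {
            record = item.Record;
            return true;
        }

        record = default;
        return false;
    }

    // Delete: returns false if there was nothing to remove.
    public bool Delete(TKey key) => _items.Remove(key);
}
```

Calling Upsert twice with the same key leaves a single record holding the second value, which is the insert-or-replace behavior the name "upsert" refers to.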

How to get started?

The easiest way to get started with Microsoft.Extensions.VectorData abstractions is by using any of the Semantic Kernel vector store connectors.

In this example, I’ll be using the in-memory vector store implementation.

To complement this sample and make it more realistic, we’ll also use the Ollama reference implementation in Microsoft.Extensions.AI. However, any other implementation that supports embedding generation works as well.

Create application and add NuGet packages

Create a C# console application.
Install the following NuGet packages
Add the following using statements to your application.

using System.Collections.ObjectModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.VectorData;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.InMemory;

Store data

The scenario used by this sample performs semantic search over a collection of movies.

Define data models

Start by defining a class to represent your movie data.

public class Movie
{
    [VectorStoreRecordKey]
    public int Key {get;set;}

    [VectorStoreRecordData] 
    public string Title {get;set;}

    [VectorStoreRecordData]
    public string Description {get;set;}

    [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
    public ReadOnlyMemory<float> Vector {get;set;}
}

By using attributes such as VectorStoreRecordKey, VectorStoreRecordVector, and VectorStoreRecordData, you can annotate your data models so that vector store implementations can more easily map POCO objects to their underlying data models. See the Semantic Kernel Vector Store Data Model Learn page for more information on the options supported by each attribute.
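Because these attributes are ordinary .NET metadata, a connector can discover a model's key and vector properties through reflection instead of requiring you to describe the schema by hand. The sketch below illustrates that general idea using hypothetical `KeyField` and `VectorField` attributes and a hypothetical `RecordSchema` helper; it does not use the real VectorStoreRecord* attributes and does not describe how any particular connector is implemented.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the real attributes; illustration only.
[AttributeUsage(AttributeTargets.Property)]
public sealed class KeyFieldAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class VectorFieldAttribute : Attribute
{
    public VectorFieldAttribute(int dimensions) => Dimensions = dimensions;
    public int Dimensions { get; }
}

public static class RecordSchema
{
    // Finds the single property marked as the key.
    public static string KeyPropertyName<T>() =>
        typeof(T).GetProperties()
            .Single(p => p.GetCustomAttribute<KeyFieldAttribute>() is not null)
            .Name;

    // Lists every vector property together with its declared dimension count.
    public static (string Name, int Dimensions)[] VectorProperties<T>() =>
        typeof(T).GetProperties()
            .Select(p => (Property: p, Attribute: p.GetCustomAttribute<VectorFieldAttribute>()))
            .Where(x => x.Attribute is not null)
            .Select(x => (x.Property.Name, x.Attribute!.Dimensions))
            .ToArray();
}

// Example model annotated with the hypothetical attributes.
public class SampleMovie
{
    [KeyField] public int Key { get; set; }
    public string Title { get; set; } = "";
    [VectorField(384)] public ReadOnlyMemory<float> TitleVector { get; set; }
    [VectorField(384)] public ReadOnlyMemory<float> DescriptionVector { get; set; }
}
```

Against SampleMovie, RecordSchema.KeyPropertyName<SampleMovie>() reports Key, and RecordSchema.VectorProperties<SampleMovie>() reports both TitleVector and DescriptionVector with 384 dimensions.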

Create vector store and movie collection

Now that you’ve defined your data models, create a vector store with a collection to store movie data.

var movieData = new List<Movie>()
{
    new Movie
        {
            Key=0, 
            Title="Lion King", 
            Description="The Lion King is a classic Disney animated film that tells the story of a young lion named Simba who embarks on a journey to reclaim his throne as the king of the Pride Lands after the tragic death of his father."
        },
    new Movie
        {
            Key=1,
            Title="Inception", 
            Description="Inception is a science fiction film directed by Christopher Nolan that follows a group of thieves who enter the dreams of their targets to steal information."
        },
    new Movie
        {
            Key=2,
            Title="The Matrix", 
            Description="The Matrix is a science fiction film directed by the Wachowskis that follows a computer hacker named Neo who discovers that the world he lives in is a simulated reality created by machines."
        },
    new Movie
        {
            Key=3,
            Title="Shrek", 
            Description="Shrek is an animated film that tells the story of an ogre named Shrek who embarks on a quest to rescue Princess Fiona from a dragon and bring her back to the kingdom of Duloc."
        }
};

var vectorStore = new InMemoryVectorStore();

var movies = vectorStore.GetCollection<int, Movie>("movies");

await movies.CreateCollectionIfNotExistsAsync();

Create embedding generator

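To generate embeddings, use one of the models provided by Ollama. This sample uses the all-minilm model, but any embedding model will work as long as the dimension count declared in the VectorStoreRecordVector attribute (384 for all-minilm) matches the model's output.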

Install Ollama.
Download the all-minilm model (for example, by running ollama pull all-minilm).

Configure an OllamaEmbeddingGenerator in your application:

IEmbeddingGenerator<string,Embedding<float>> generator = 
    new OllamaEmbeddingGenerator(new Uri("http://localhost:11434/"), "all-minilm");
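Before wiring the generator into the rest of the sample, it can help to confirm that Ollama is running and serving the model. The sketch below calls Ollama's REST API directly, assuming its POST /api/embed endpoint, which takes a JSON body with `model` and `input` and responds with an `embeddings` array. The `OllamaEmbedProbe` class is a hypothetical helper for this check only and is independent of OllamaEmbeddingGenerator, which you would still use in the rest of the sample.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical probe for Ollama's REST API (assumes POST /api/embed).
public static class OllamaEmbedProbe
{
    // Builds the JSON body, e.g. {"model":"all-minilm","input":"..."}.
    public static string BuildRequest(string model, string input) =>
        JsonSerializer.Serialize(new { model, input });

    // Extracts the first vector from a response shaped like {"embeddings":[[0.1, ...]]}.
    public static float[] ParseFirstEmbedding(string json)
    {
        using var doc = JsonDocument.Parse(json);
        JsonElement first = doc.RootElement.GetProperty("embeddings")[0];

        var vector = new float[first.GetArrayLength()];
        int i = 0;
        foreach (JsonElement value in first.EnumerateArray())
            vector[i++] = value.GetSingle();
        return vector;
    }

    // Sends the request to a running Ollama instance and returns the embedding.
    public static async Task<float[]> EmbedAsync(HttpClient http, Uri endpoint, string model, string input)
    {
        using var content = new StringContent(BuildRequest(model, input), Encoding.UTF8, "application/json");
        using var response = await http.PostAsync(new Uri(endpoint, "api/embed"), content);
        response.EnsureSuccessStatusCode();
        return ParseFirstEmbedding(await response.Content.ReadAsStringAsync());
    }
}
```

For example, await OllamaEmbedProbe.EmbedAsync(new HttpClient(), new Uri("http://localhost:11434/"), "all-minilm", "A family friendly movie") should return a 384-element array if the all-minilm model is available locally.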

Generate embeddings

Now that you have an embedding generator, use it to generate embeddings for your movie data and store them in your vector store.

foreach(var movie in movieData)
{
    movie.Vector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
    await movies.UpsertAsync(movie);
}
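Because the Movie model declares 384 dimensions on its vector property, every vector you upsert should have exactly that many values; all-minilm produces 384, but a different model may not. A small guard such as the following catches a mismatch before the record reaches the store. `EmbeddingGuard` is a hypothetical helper, not part of the library.

```csharp
using System;

// Hypothetical guard; not part of Microsoft.Extensions.VectorData.
public static class EmbeddingGuard
{
    // Must match the dimension declared in [VectorStoreRecordVector(384, ...)] on Movie.Vector.
    public const int ExpectedDimensions = 384;

    // Returns the vector unchanged, or throws if its length differs from the declared dimension.
    public static ReadOnlyMemory<float> EnsureDimensions(ReadOnlyMemory<float> vector, int expected = ExpectedDimensions)
    {
        if (vector.Length != expected)
            throw new InvalidOperationException(
                $"Embedding has {vector.Length} dimensions, but the data model declares {expected}.");

        return vector;
    }
}
```

Inside the loop above, you could wrap the call as movie.Vector = EmbeddingGuard.EnsureDimensions(await generator.GenerateEmbeddingVectorAsync(movie.Description));.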

Query data

Now that you have data in your data store, you can query it.

Generate query embedding

Generate an embedding for the query “A family friendly movie”.

var query = "A family friendly movie";
var queryEmbedding = await generator.GenerateEmbeddingVectorAsync(query);

Query your data store

Now that you have an embedding for your query, you can use it to search your data store for relevant results.

var searchOptions = new VectorSearchOptions()
{
    Top = 1,
    VectorPropertyName = "Vector"
};

var results = await movies.VectorizedSearchAsync(queryEmbedding, searchOptions);

await foreach(var result in results.Results)
{
    Console.WriteLine($"Title: {result.Record.Title}");
    Console.WriteLine($"Description: {result.Record.Description}");
    Console.WriteLine($"Score: {result.Score}");
    Console.WriteLine();
}

The result should look as follows:

[Image: Console output displaying Lion King as the query result]
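Conceptually, a search with Top = 1 compares the query vector with the stored vectors and keeps the single best-scoring record. The following brute-force sketch spells out that ranking step with cosine similarity over a hypothetical list of (Title, Vector) pairs; it is an illustration only and makes no claim about how the in-memory connector is implemented. Production vector databases typically rely on approximate nearest-neighbor indexes so they do not have to score every record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: exact (brute-force) top-k search ranked by cosine similarity.
public static class BruteForceSearch
{
    public static List<(string Title, float Score)> TopK(
        IEnumerable<(string Title, float[] Vector)> items, float[] query, int top) =>
        items
            .Select(item => (item.Title, Score: Cosine(query, item.Vector)))
            .OrderByDescending(result => result.Score) // best match first
            .Take(top)
            .ToList();

    // Assumes both vectors have the same length.
    private static float Cosine(float[] a, float[] b)
    {
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        return normA == 0f || normB == 0f ? 0f : dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
    }
}
```

Raising `top` returns the next-closest titles in descending score order, analogous to increasing Top in VectorSearchOptions.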

Continue learning

For detailed documentation on using the abstractions, see the Semantic Kernel Vector Store Learn site.

Learn more from the following samples:

What’s next for Microsoft.Extensions.VectorData?

Similar to Microsoft.Extensions.AI, we plan to:

Continue collaborating with Semantic Kernel to build on top of the abstractions, as well as Microsoft.Extensions.VectorData, to bring more streamlined experiences to RAG scenarios. Check out the Semantic Kernel blog to learn more.
Work with vector store partners that offer client SDKs, as well as library authors and developers across the .NET ecosystem, on adopting Microsoft.Extensions.VectorData.

We’re excited for you to start building with Microsoft.Extensions.VectorData.

Try it out and give us feedback!

Luis Quintanilla

3 comments

Sarosh Wadia November 20, 2024

Hi!

Great code example!

Now that Azure SQL supports the vector data type, are there code examples that use it in place of the InMemoryVectorStore() in your example to store the same data in Azure SQL?

Thanks

Charles Chen October 30, 2024
At first glance, this feels like it really falls short.
```
var searchOptions = new VectorSearchOptions()
{
    Top = 1,
    VectorPropertyName = "Vector"
};
```
Why not make this generic so the VectorPropertyName can be an expression on the type instead of a string?
```
foreach(var movie in movieData)
{
    movie.Vector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
    await movies.UpsertAsync(movie);
}
```
??? Why not use the metadata added from the attributes to automatically set the vector property?

Does not feel very ergonomic at all.
- Luis Quintanilla Author October 31, 2024
  Hi Charles,
  
  Thanks for the feedback.
  
  If I understand correctly, the question is: if you have a Vector property, why not just use VectorSearchOptions<T> and have the vector property name inferred?
  
  I can see how that may be confusing given the naming of the data model. In short, it’s not a 1:1 mapping. You could have any number of vector properties, in which case you’d still need to specify which property you’re searching against when you perform your query.
  
  public class Movie
  {
      [VectorStoreRecordKey]
      public int Key {get;set;}
  
      [VectorStoreRecordData]
      public string Title {get;set;}
  
      [VectorStoreRecordData]
      public string Description {get;set;}
  
      [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
      public ReadOnlyMemory<float> TitleVector {get;set;}
  
      [VectorStoreRecordVector(384, DistanceFunction.CosineSimilarity)]
      public ReadOnlyMemory<float> DescriptionVector {get;set;}
  }
  
  //...
  
  var searchOptions = new VectorSearchOptions()
  {
      Top = 1,
      VectorPropertyName = nameof(Movie.TitleVector)
  };
  
  Similarly, because there may be additional vector properties, some of which may serve different purposes or use different models, you’d want to set them individually.
  
  foreach(var movie in movieData)
  {
      movie.TitleVector = await generator.GenerateEmbeddingVectorAsync(movie.Title);
      movie.DescriptionVector = await generator.GenerateEmbeddingVectorAsync(movie.Description);
      await movies.UpsertAsync(movie);
  }
  
  That said, if you were building a library on top of the VectorData abstractions, or if your vector store provider SDK (e.g. CharlesVectorStore) implemented IVectorStore and you wanted to further simplify and abstract away ingestion and retrieval in the ways you’ve mentioned, it should be feasible.
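The expression-based selection suggested in this thread can be prototyped on top of the abstractions. The following is a minimal sketch of a hypothetical `PropertyNames.Of` helper that turns a lambda such as m => m.TitleVector into the string "TitleVector"; it is not part of Microsoft.Extensions.VectorData, and it only handles simple property access.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical helper: turns m => m.TitleVector into "TitleVector".
public static class PropertyNames
{
    public static string Of<T, TProperty>(Expression<Func<T, TProperty>> selector)
    {
        Expression body = selector.Body;

        // Value-type properties selected as object arrive wrapped in a Convert node; unwrap it.
        if (body is UnaryExpression { NodeType: ExpressionType.Convert } convert)
            body = convert.Operand;

        // Accept only a direct property access such as m.TitleVector.
        if (body is MemberExpression { Member: PropertyInfo property })
            return property.Name;

        throw new ArgumentException(
            "Selector must be a simple property access, such as m => m.TitleVector.", nameof(selector));
    }
}
```

With the preview API used in the post, this could be written as VectorPropertyName = PropertyNames.Of((Movie m) => m.TitleVector). For the simple case, nameof(Movie.TitleVector) already gives compile-time checking; an expression-based helper mainly pays off when a library passes property selectors around generically.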
