Introducing Vector Search in Azure Cosmos DB for MongoDB vCore

James Codella

Gahl Levy


We are thrilled to announce the release of Vector Search in Azure Cosmos DB for MongoDB vCore, which will be showcased at Microsoft Build. This innovative feature opens a world of new opportunities for building intelligent AI-powered applications and makes Azure Cosmos DB for MongoDB vCore the first among MongoDB-compatible offerings to feature Vector Search!

With Vector Search, you can now seamlessly integrate AI-based applications, including those using OpenAI embeddings, with your data already stored in Cosmos DB. You can store, index, and query high-dimensional vector data directly in Azure Cosmos DB for MongoDB vCore, eliminating the need to transfer your data to more expensive alternatives for vector similarity search capabilities.

This comprehensive solution streamlines your AI application development by reducing complexity and enhancing efficiency. With Vector Search, you’ll be able to unlock new insights from your data that were previously hidden or hard to find, leading to more accurate and powerful applications.

 

Vector search is a method that helps you find similar items based on their data characteristics rather than exact matches on a property field. This is especially useful in applications such as searching for similar text, finding related images, making recommendations, or even detecting anomalies. It works by taking the vector representations (lists of numbers) of your data that you have created using an ML model, or an embeddings API such as Azure OpenAI Service Embeddings or Hugging Face on Azure. It then measures the distance between the data vectors and your query vector. The data vectors that are closest to your query vector are the ones that are found to be most similar semantically.
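To make the idea of "distance between vectors" concrete, here is a minimal sketch in plain JavaScript. The toy two-dimensional vectors and the function names are illustrative assumptions, not part of the service; real embeddings have hundreds or thousands of dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Values near 1 mean the vectors point in nearly the same direction.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimensionality');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Euclidean distance: smaller values mean the vectors are closer together.
function euclideanDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimensionality');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Hypothetical toy embeddings, chosen only for illustration.
const toyQuery = [1, 0];
const toySimilar = [0.9, 0.1];
const toyUnrelated = [0, 1];

console.log(cosineSimilarity(toyQuery, toySimilar));   // close to 1
console.log(cosineSimilarity(toyQuery, toyUnrelated)); // 0 (orthogonal vectors)
```

Whichever measure is used, the principle is the same: the data vectors that score as closest to the query vector are treated as the most similar.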

By integrating vector search capabilities natively, you can now unlock the full potential of your data in your intelligent applications.

 

Create a Vector Index

You can create vector indexes to power vector search using the following createIndexes Spec template:

db.runCommand({
  createIndexes: 'exampleCollection',
  indexes: [
    {
      name: 'vectorSearchIndex',
      key: {
        "vectorContent": "cosmosSearch"
      },
      cosmosSearchOptions: {
        kind: 'vector-ivf',
        numLists: 100,
        similarity: 'COS',
        dimensions: 3
      }
    }
  ]
});

This command creates a vector-ivf index against the vectorContent property in the documents stored in the collection exampleCollection. The cosmosSearchOptions property specifies the parameters for the IVF vector index. In our small example, we specify dimensions: 3 because each vector has a dimensionality (or size) of 3. The dimensionality of your own vectors may be different. For example, if you were using OpenAI Embeddings, this would be set to 1536.
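If you create indexes for several collections or embedding sizes, it may help to generate the command from a small helper. The sketch below is one possible approach; buildVectorIndexCommand is a hypothetical helper name, and it only reuses the option names and values shown in the template above.

```javascript
// Hypothetical helper: builds the createIndexes command from the template above.
// Only the collection name, vector path, dimensions, and numLists vary.
function buildVectorIndexCommand(collection, path, dimensions, numLists = 100) {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error('dimensions must be a positive integer');
  }
  return {
    createIndexes: collection,
    indexes: [
      {
        name: 'vectorSearchIndex',
        key: { [path]: 'cosmosSearch' },
        cosmosSearchOptions: {
          kind: 'vector-ivf',
          numLists: numLists,
          similarity: 'COS',
          dimensions: dimensions
        }
      }
    ]
  };
}

// Example: an index sized for 1536-dimensional embeddings.
const indexCommand = buildVectorIndexCommand('exampleCollection', 'vectorContent', 1536);

// In the Mongo shell you would then run:
// db.runCommand(indexCommand);
```

Keeping the dimensions value in one place makes it easier to keep the index definition in step with the embedding model you use.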

Add vectors to your database

To add vectors to your database’s collection, you can use the OpenAI Embeddings model, another API (such as Hugging Face), or your own ML model to generate embeddings from the data. In this example, we’ll insert a few documents that contain sample embeddings:

db.exampleCollection.insertMany([
  {name: "Eugenia Lopez", bio: "Eugenia is the CEO of AdventureWorks.", vectorContent: [0.51, 0.12, 0.23]},
  {name: "Cameron Baker", bio: "Cameron Baker is the CFO of AdventureWorks.", vectorContent: [0.55, 0.89, 0.44]},
  {name: "Jessie Irwin", bio: "Jessie Irwin is the former CEO of AdventureWorks and now the director of the Our Planet initiative.", vectorContent: [0.13, 0.92, 0.85]},
  {name: "Rory Nguyen", bio: "Rory Nguyen is the founder of AdventureWorks and the president of the Our Planet initiative.", vectorContent: [0.91, 0.76, 0.83]},
]);
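Because the index is declared with a fixed number of dimensions, it can be worth checking your documents on the client before inserting them. The following is a minimal sketch of such a check; findInvalidVectors and the deliberately malformed second document are hypothetical, and how the service itself treats mismatched vectors is not covered here.

```javascript
// Hypothetical client-side check: returns the documents whose vector at `path`
// is missing, has the wrong length, or contains non-numeric values.
function findInvalidVectors(docs, path, dimensions) {
  return docs.filter((doc) => {
    const vector = doc[path];
    return (
      !Array.isArray(vector) ||
      vector.length !== dimensions ||
      !vector.every((x) => typeof x === 'number' && Number.isFinite(x))
    );
  });
}

const candidateDocs = [
  { name: 'Eugenia Lopez', vectorContent: [0.51, 0.12, 0.23] },
  // Deliberately malformed example: only two components instead of three.
  { name: 'Hypothetical bad document', vectorContent: [0.55, 0.89] }
];

const invalidDocs = findInvalidVectors(candidateDocs, 'vectorContent', 3);
console.log(invalidDocs.map((doc) => doc.name));

// In the Mongo shell, you might insert only when nothing is flagged:
// if (invalidDocs.length === 0) db.exampleCollection.insertMany(candidateDocs);
```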

Continuing with the above example, let’s create another vector, queryVector. Vector search measures the distance between queryVector and the vectors in the vectorContent path of your documents. You can set the number of results the search returns by setting the parameter k, which we’ll set to 2.

const queryVector = [0.52, 0.28, 0.12];
db.exampleCollection.aggregate([
  {
    $search: {
      "cosmosSearch": {
        "vector": queryVector,
        "path": "vectorContent",
        "k": 2
      },
      "returnStoredSource": true
    }
  }
]);
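If you run searches with different vectors or values of k, the pipeline can be generated by a small function. This is a sketch only: buildVectorSearchPipeline is a hypothetical helper, and the optional $project stage that exposes a similarity score through the searchScore metadata field is an assumption you should verify against the current service documentation before relying on it.

```javascript
// Hypothetical helper: builds the aggregation pipeline used in the query above.
function buildVectorSearchPipeline(vector, path, k, includeScore = false) {
  const pipeline = [
    {
      $search: {
        cosmosSearch: { vector: vector, path: path, k: k },
        returnStoredSource: true
      }
    }
  ];
  if (includeScore) {
    // Assumption: the score is available as the 'searchScore' metadata field.
    pipeline.push({
      $project: {
        similarityScore: { $meta: 'searchScore' },
        document: '$$ROOT'
      }
    });
  }
  return pipeline;
}

const searchPipeline = buildVectorSearchPipeline([0.52, 0.28, 0.12], 'vectorContent', 2);

// In the Mongo shell you would then run:
// db.exampleCollection.aggregate(searchPipeline);
```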

We can perform a vector search using queryVector as an input via the Mongo shell. The search result (shown below) is a list of the two most similar items to the query vector, sorted by their similarity scores. In the example below, the document for Eugenia Lopez has the vectorContent that is most similar to queryVector, followed by Rory Nguyen.

[
  {
    _id: ObjectId("645acb54413be5502badff94"),
    name: 'Eugenia Lopez',
    bio: 'Eugenia is the CEO of AdventureWorks.',
    vectorContent: [ 0.51, 0.12, 0.23 ]
  },
  {
    _id: ObjectId("645acb54413be5502badff97"),
    name: 'Rory Nguyen',
    bio: 'Rory Nguyen is the founder of AdventureWorks and the president of the Our Planet initiative.',
    vectorContent: [ 0.91, 0.76, 0.83 ]
  }
]
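You can sanity-check this ordering without a database by ranking the four sample vectors with an exhaustive cosine-similarity comparison. The sketch below is a client-side illustration only, with hypothetical helper names; the service uses an IVF index, which is an approximate method, so on large collections its results are not guaranteed to match an exhaustive search.

```javascript
// The sample vectors from the insertMany example above.
const sampleDocs = [
  { name: 'Eugenia Lopez', vectorContent: [0.51, 0.12, 0.23] },
  { name: 'Cameron Baker', vectorContent: [0.55, 0.89, 0.44] },
  { name: 'Jessie Irwin', vectorContent: [0.13, 0.92, 0.85] },
  { name: 'Rory Nguyen', vectorContent: [0.91, 0.76, 0.83] }
];

// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive top-k: score every document, sort by descending similarity, keep k.
function bruteForceTopK(docs, vector, k) {
  return docs
    .map((doc) => ({ name: doc.name, score: cosine(doc.vectorContent, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const ranked = bruteForceTopK(sampleDocs, [0.52, 0.28, 0.12], 2);
console.log(ranked.map((r) => r.name)); // Eugenia Lopez first, then Rory Nguyen
```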

Next Steps

Vector Search is a game-changer for developers looking to use AI capabilities in their applications. Azure Cosmos DB for MongoDB vCore offers a single, seamless solution for transactional data and vector search utilizing embeddings from the Azure OpenAI Service API or other solutions. You’re now equipped to create smarter, more efficient, and user-focused applications that stand out. Check out these resources to help you get started:

Get Started with Azure Cosmos DB for free

Azure Cosmos DB is a fully managed NoSQL and relational database for modern app development with SLA-backed speed and availability, automatic and instant scalability, and support for open source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on Twitter, YouTube, and LinkedIn.

1 comment


  • Leo Wang 3

    This is fantastic.
    Thank you Microsoft for bringing the Vector capability to Azure.

    In my typical Python code, there is a vector database, just a local one like Chroma or FAISS.
    How easy is it to replace it with Cosmos DB (with which I have no prior experience)?

    I also had another look at the LangChain docs and saw that its vectorstore supports Azure Cognitive Search and Supabase (Postgres), both of which are already supported within Azure. Should I take this approach instead of using Cosmos DB if I want to bring my Python web app to Azure App Service?

    Best regards,
    Nhtkid
