June 2nd, 2026
0 reactions

Announcing the Public Preview of Integrated Embeddings in Azure Cosmos DB: Build AI Apps With Embeddings That Stay in Sync

Principal Product Manager

AI applications built on Azure Cosmos DB depend on embeddings for grounded results. Keeping them in sync with your data is the hard part: it means building and operating a separate data pipeline to track changes, call an embedding model, and write the results back to Azure Cosmos DB. In practice, that pipeline also has to handle failures and retries, throttling, scaling, and monitoring as your data and traffic grow.

Integrated Embeddings in Azure Cosmos DB, now in Public Preview, removes that heavy lifting. Azure Cosmos DB automatically generates and maintains the embeddings for you as items are written and updated, so the vectors stored alongside your items always reflect the current state of your data. You configure it by specifying the source properties to embed, the Microsoft Foundry embedding model to use, and the path where the generated embeddings are stored, and then focus on building AI applications, such as Retrieval-Augmented Generation (RAG), on top of your data.

How Integrated Embeddings works

Integrated Embeddings is configured through a new embeddingSource block in the container vector policy. The rest of the policy (vector path, dimensions, distance function) stays the same. This block tells Azure Cosmos DB three things:

  • What to embed: one or more item properties listed in sourcePaths. When you list multiple paths, the values are combined into a single input for the embedding model. An item is re-embedded only when one of these properties changes.
  • What to embed with: a Microsoft Foundry embedding model deployment, identified by deploymentName, modelName, and endpoint.
  • How to authenticate: authType: "Entra" — currently the only supported value.

For example, here is a vector policy that embeds the /text property of each item using text-embedding-3-small and writes the resulting vector to /embedding:

{
  "vectorEmbeddings": [
    {
      "path": "/embedding",
      "dataType": "float32",
      "dimensions": 1536,
      "distanceFunction": "cosine",
      "embeddingSource": {
        "sourcePaths": ["/text"],
        "deploymentName": "text-embedding-3-small",
        "modelName": "text-embedding-3-small",
        "endpoint": "https://<foundry-resource-name>.openai.azure.com/",
        "authType": "Entra"
      }
    }
  ]
}

 

At the time of Public Preview, the following Azure OpenAI embedding models are supported through Microsoft Foundry: text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002.

Integrated Embeddings in action

To get started with a simple example, try the quickstart in the docs. It walks through creating a container with an embeddingSource policy, inserting a few sample items, and verifying that Azure Cosmos DB writes an embedding to each item.

The walkthrough below is a longer, end-to-end example. You’ll load a dataset of outdoor products into Azure Cosmos DB, let Integrated Embeddings generate the vectors, and then run a vector search against the container.

Before you start, complete the Integrated Embeddings prerequisites from the documentation: vector search, change feed mode enabled, and a Microsoft Foundry model deployment. The Azure Cosmos DB account’s managed identity also needs the Cognitive Services OpenAI User role on the Microsoft Foundry resource so it can call the model.

In addition, the principal you sign in as needs two role assignments on the Azure Cosmos DB account so the sample app can act on your behalf:

  • Cosmos DB Operator (Azure RBAC) to create the database and container through Azure Resource Manager.
  • Cosmos DB Built-in Data Contributor (Azure Cosmos DB RBAC) to upsert and read items. See how to assign these roles in the getting started section.

In step 4 the sample app calls the Microsoft Foundry embedding deployment directly to embed the query string. For simplicity, the sample uses an API key for this call (Integrated Embeddings itself uses Entra ID, as configured in the vector policy). Make sure you have the API key for your Foundry resource ahead of time and have it ready to drop into .env file.

Set up the sample application

The sample app is written in Python; you’ll need Python 3.x installed locally. Clone the GitHub repository and install dependencies:

git clone https://github.com/abhirockzz/integrated-embeddings-sample
cd integrated-embeddings-sample

# create a virtual environment and install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Copy .env.example to .env and fill in your Azure Cosmos DB account endpoint and Microsoft Foundry deployment details:

cp .env.example .env
# edit .env

Sign in to the Azure CLI so the sample app can authenticate to Azure Cosmos DB and Microsoft Foundry using your identity:

az login

Step 1: Create the database and container

This script creates the database and a container along with a vector embedding policy that has an embeddingSource block with source path /description, model text-embedding-3-small, and output stored at /embedding. It adds a quantizedFlat vector index on /embedding so you can query the embeddings in step 4.

Run the script:

python create_db_and_container.py

The container is provisioned with autoscale 1,000 RU/s.

Step 2: Insert sample data

This script upserts 100 outdoor-product items from items.json into the container. Each item has an id, a name, a description, a category, tags, and a few other fields — but only /description is sent to the embedding model, per the policy you set in step 1.

python insert_sample_data.py

You’ll see one line per item as it’s inserted. None of the items have an embedding field yet; Azure Cosmos DB picks up the changes and generates embeddings asynchronously.

Step 3: Verify embeddings in the Azure portal

Open the Azure portal, navigate to your Azure Cosmos DB account → Data Explorer, open the container you created in step 1, and run the query below.

SELECT VALUE COUNT(1) FROM c WHERE IS_DEFINED(c.embedding)

When the count reaches 100, all embeddings have been generated.

Step 4: Run a vector search

Now that every item has an embedding, you can run a vector search against the container. The script embeds your query string by calling the same Microsoft Foundry embedding model deployment that Azure Cosmos DB used for the items (using the FOUNDRY_API_KEY you set in .env file), then runs a VectorDistance() query to find the closest items by cosine similarity.

python vector_search.py "I need to stay warm on a cold ski trip"

Running this returns something similar to the following results. You should see a mix of ski-related gloves and jackets along with some cold-weather sleeping bags — all relevant to the concept of “staying warm on a cold ski trip,” even though none of the item descriptions contain those exact words.

Query: 'I need to stay warm on a cold ski trip'
Top 5 results:

1. Studio Talon Insulated Storm Glove
category=Winter Sports, Gloves, Insulated Gloves score=0.4974
2. Prairie Nomad Waterproof Resort Shell Jacket
category=Skiing, Outerwear, Shell Jackets score=0.4923
3. Ridge Drift Touchscreen Insulated Ski Glove
category=Winter Sports, Gloves, Insulated Gloves score=0.4855
4. Everest All-Weather Short 850 Fill Trail Sack
category=Camping, Sleeping Bags, Down Bags score=0.4756
5. Brook Shift 850-Fill Trail Sack Sleeping Bag
category=Camping, Sleeping Bags, Down Bags score=0.4570

A couple more queries to give you a feel for what’s possible.

Try a planning-style query. Instead of gear, you’ll get trip-planning books and trail guides:

python vector_search.py "plan a long hike in unfamiliar terrain"

Or try a more specific query. The top result is a one-person shelter, with closely related tents below it:

python vector_search.py "easy to set up shelter for one person"

You can use --top-k to control how many results are returned (defaults to 5):

python vector_search.py "lightweight cookware for backpacking" --top-k 3

Pure vector search returns the most semantically similar items but doesn’t account for exact keyword matches. For queries where both semantic intent and specific terms matter, Azure Cosmos DB for NoSQL supports hybrid search, which combines vector similarity and full-text (BM25) ranking using Reciprocal Rank Fusion. You can also add a WHERE clause to narrow results to a specific category or tag. All of these queries run against the same embeddings that Integrated Embeddings generates and keeps in sync.

Build a simple RAG agent on top of the data

Retrieval-Augmented Generation (RAG) is a pattern where a language model answers user questions by first retrieving relevant content from a knowledge base, then using that content as grounding for its response. For RAG over your Azure Cosmos DB data, the retrieval step is vector search and the knowledge base is your container.

To turn the vector search into a Retrieval-Augmented Generation (RAG) application, we wrap it as a tool that a language model can call. We use a simple LangChain agent with a retrieve_context tool that embeds the user’s query and runs the same VectorDistance() search you saw in step 4. The agent decides when to call the tool, reads the results, and answers in natural language. To keep the agent grounded in your data, the system prompt instructs the model to recommend only products that appear in the retrieved results and to ignore any instructions contained in the retrieved text.

The agent needs a large language model (LLM). Deploy the model (for example gpt-5.4) in your Microsoft Foundry resource and set FOUNDRY_CHAT_DEPLOYMENT in .env to the deployment name. The agent uses the same FOUNDRY_API_KEY for both chat and query-time embedding calls.

Once you start the agent, it opens a simple interactive prompt where you can ask catalog-style questions:

python rag_agent.py

Try out a few queries. For example, ask about a product category and the agent surfaces every relevant item:

You: What sleeping bags do you have for cold nights?

The agent calls the retrieval tool, gets back the three cold-weather down bags in the catalog, and lists them with their shared 850-fill warmth and use cases.

Ask about a specific product feature and the agent filters the results for you:

You: What ski goggles do you have with a magnetic lens?

Vector search returns all three ski goggles in the catalog, but the agent recommends only the two that actually have a magnetic lens system. This is the agent + RAG advantage on top of pure vector search: broad retrieval, narrow reasoning.

Integrated Embeddings keeps the item embeddings in sync with the source data automatically, so the agent’s retrieval stays accurate as products are added, updated, or removed. You don’t have to build or run a separate embedding pipeline to keep the index fresh.

Other ways to configure Integrated Embeddings

You can embed more than one property at a time by listing multiple paths in sourcePaths. Azure Cosmos DB concatenates the values into a single input for the embedding model. This is useful when no single field carries enough information. For example, a product title is usually too short on its own, but combining /title and /description produces a richer vector.

{
  "vectorEmbeddings": [
    {
      "path": "/embedding",
      "dataType": "float32",
      "dimensions": 1536,
      "distanceFunction": "cosine",
      "embeddingSource": {
        "sourcePaths": [
          "/title",
          "/description"
        ],
        "deploymentName": "text-embedding-3-small",
        "modelName": "text-embedding-3-small",
        "endpoint": "https://<foundry-resource-name>.openai.azure.com/",
        "authType": "Entra"
      }
    }
  ]
}

You can also generate multiple embeddings by adding more entries to vectorEmbeddings. Each entry has its own path, model, and source properties, and Azure Cosmos DB maintains all of the vectors in parallel.

The example below generates /desc_embedding from /description using text-embedding-3-large, and /title_embedding from /title using text-embedding-3-small.

{
  "vectorEmbeddings": [
    {
      "path": "/desc_embedding",
      "dataType": "float32",
      "dimensions": 3072,
      "distanceFunction": "cosine",
      "embeddingSource": {
        "sourcePaths": [
          "/description"
        ],
        "deploymentName": "text-embedding-3-large",
        "modelName": "text-embedding-3-large",
        "endpoint": "https://<foundry-resource-name>.openai.azure.com/",
        "authType": "Entra"
      }
    },
    {
      "path": "/title_embedding",
      "dataType": "float32",
      "dimensions": 1536,
      "distanceFunction": "cosine",
      "embeddingSource": {
        "sourcePaths": [
          "/title"
        ],
        "deploymentName": "text-embedding-3-small",
        "modelName": "text-embedding-3-small",
        "endpoint": "https://<foundry-resource-name>.openai.azure.com/",
        "authType": "Entra"
      }
    }
  ]
}

What’s supported in Public Preview

You can configure Integrated Embeddings today through the Azure Cosmos DB SDK (for Python) with key-based authentication, or through the Azure Cosmos DB management SDK (Python and JavaScript) with Microsoft Entra ID. Both options are demonstrated in the documentation. Support across the Azure CLI, ARM, Bicep, and other SDKs will be added in subsequent releases.

Azure Portal support for configuring and managing Integrated Embeddings (in Data Explorer) is not available yet. As we work on adding this, you can configure Integrated Embeddings through one of the SDK options.

Get started

With Integrated Embeddings, Azure Cosmos DB keeps vector embeddings in sync with your data automatically, so you no longer need to build and operate separate pipelines to do it. Integrated Embeddings uses your existing Microsoft Foundry and Azure Cosmos DB resources, so the only costs are the Foundry inference calls and the request units used to read the change feed and write embeddings back to your items.

To start building:

We’d love your feedback during preview! Reach out to us at CosmosSearch@Microsoft.com.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless NoSQL and vector database for modern app development, including AI applications. With its SLA-backed speed and availability as well as instant dynamic scalability, it is ideal for real-time NoSQL and MongoDB applications that require high performance and distributed computing over massive volumes of NoSQL and vector data.

To stay in the loop on Azure Cosmos DB updates, follow us on XYouTube, and LinkedIn.  Join the discussion with other developers on the #nosql channel on the Microsoft Open Source Discord.

Author

Abhishek Gupta
Principal Product Manager

Principal Product Manager in the Azure Cosmos DB team.

0 comments