{"id":12178,"date":"2026-05-01T09:43:29","date_gmt":"2026-05-01T16:43:29","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=12178"},"modified":"2026-05-01T09:43:29","modified_gmt":"2026-05-01T16:43:29","slug":"langchain-azure-cosmos-db-agents-rag","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/langchain-azure-cosmos-db-agents-rag\/","title":{"rendered":"Introducing langchain-azure-cosmosdb: Build Agentic Apps and RAG with One Database"},"content":{"rendered":"<h1>Build AI Agents and RAG Applications with the New LangChain + LangGraph Connector for Azure Cosmos DB<\/h1>\n<p>Building AI agents and RAG applications today means stitching together half a dozen services: a vector database, a chat history store, a checkpointer for agent state, a semantic cache, and a long-term memory layer. Each adds operational overhead, latency, and technical debt.<\/p>\n<p>langchain-azure-cosmosdb collapses that stack. It&#8217;s a Python LangChain and LangGraph connector that turns Azure Cosmos DB for NoSQL into the single persistence layer for all of your agentic app scenarios.<\/p>\n<p>Azure Cosmos DB for NoSQL natively supports <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/vector-search\">vector<\/a>, <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/gen-ai\/full-text-search?context=\/azure\/cosmos-db\/context\/context\">full-text<\/a> (lexical), and <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/gen-ai\/hybrid-search?context=\/azure\/cosmos-db\/context\/context\">hybrid<\/a> search, combined with high elasticity, automatic sharding, <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/provision-throughput-autoscale?context=\/azure\/cosmos-db\/context\/context\">autoscale<\/a> or <a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/serverless\">serverless<\/a> models, and up to a 99.999% SLA. 
It scales vector search from thousands to billions of vectors and is also the database powering scenarios across OpenAI, including ChatGPT histories and memories. This integration lets you build agentic apps on one efficient, highly scalable source of truth for all of their data.<\/p>\n<p>It&#8217;s available today on <a href=\"http:\/\/pypi.org\/project\/langchain-azure-cosmosdb\">PyPI<\/a> and <a href=\"http:\/\/github.com\/langchain-ai\/langchain-azure\">GitHub<\/a>, and can be installed with:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">pip install langchain-azure-cosmosdb<\/code><\/pre>\n<h2><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2026\/05\/langchain-azure-cosmosdb.png\" alt=\"null\" \/><\/h2>\n<h2>Why a Dedicated Connector?<\/h2>\n<p>When building AI agents and RAG applications, developers often face a fragmented stack:<\/p>\n<ul>\n<li><strong>Vector search<\/strong> requires a separate vector database or service<\/li>\n<li><strong>Chat history<\/strong> needs another storage backend<\/li>\n<li><strong>Agent state checkpointing<\/strong> adds yet another system<\/li>\n<li><strong>Semantic caching<\/strong> means even more infrastructure<\/li>\n<li><strong>Long-term memory<\/strong> requires yet another bolt-on<\/li>\n<\/ul>\n<p>This sprawl leads to complex integrations, higher operational costs, and a larger security surface. 
It also makes global distribution nearly impossible to achieve consistently across all these components.<\/p>\n<p>The langchain-azure-cosmosdb package brings all of these capabilities directly into the LangChain and LangGraph ecosystem, so you can build your agentic apps on a single, highly scalable database, simplifying your app architecture and avoiding the need for specialized vector DBs or search engines.<\/p>\n<h2>What&#8217;s Included<\/h2>\n<p>The package ships with six integrations, each available in both synchronous and asynchronous variants:<\/p>\n<table>\n<thead>\n<tr>\n<th><strong>Integration<\/strong><\/th>\n<th><strong>Sync<\/strong><\/th>\n<th><strong>Async<\/strong><\/th>\n<th><strong>What It Does<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Vector Store<\/strong><\/td>\n<td>AzureCosmosDBNoSqlVectorSearch<\/td>\n<td>AsyncAzureCosmosDBNoSqlVectorSearch<\/td>\n<td>Vector, full-text, hybrid, and weighted hybrid search<\/td>\n<\/tr>\n<tr>\n<td><strong>Semantic Cache<\/strong><\/td>\n<td>AzureCosmosDBNoSqlSemanticCache<\/td>\n<td>AsyncAzureCosmosDBNoSqlSemanticCache<\/td>\n<td>Cache LLM responses to reduce latency and cost<\/td>\n<\/tr>\n<tr>\n<td><strong>Chat History<\/strong><\/td>\n<td>CosmosDBChatMessageHistory<\/td>\n<td>AsyncCosmosDBChatMessageHistory<\/td>\n<td>Persist conversation history with TTL support<\/td>\n<\/tr>\n<tr>\n<td><strong>LangGraph Checkpointer<\/strong><\/td>\n<td>CosmosDBSaverSync<\/td>\n<td>CosmosDBSaver<\/td>\n<td>Graph state persistence for multi-turn agents<\/td>\n<\/tr>\n<tr>\n<td><strong>LangGraph Cache<\/strong><\/td>\n<td>CosmosDBCacheSync<\/td>\n<td>CosmosDBCache<\/td>\n<td>Node-level result caching for graph workflows<\/td>\n<\/tr>\n<tr>\n<td><strong>LangGraph Store<\/strong><\/td>\n<td>CosmosDBStore<\/td>\n<td>AsyncCosmosDBStore<\/td>\n<td>Long-term memory with namespace organization and semantic search<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Every integration supports both <strong>access 
key<\/strong> and <strong>Microsoft Entra ID (Managed Identity)<\/strong> authentication out of the box.<\/p>\n<h2>Semantic Search: Beyond Basic Search<\/h2>\n<p>Azure Cosmos DB&#8217;s vector search capabilities go well beyond basic similarity matching. The connector exposes all search modes:<\/p>\n<ul>\n<li><strong>Vector similarity search<\/strong> with DiskANN or Quantized Flat vector indexes for efficient similarity search at any scale<\/li>\n<li><strong>Full-text search<\/strong> with BM25 ranking<\/li>\n<li><strong>Hybrid search<\/strong> combining vector and full-text with RRF (Reciprocal Rank Fusion)<\/li>\n<li><strong>Weighted hybrid search<\/strong> for fine-tuned control over vector vs. text relevance<\/li>\n<\/ul>\n<p>Here&#8217;s how to set up a vector store and run a hybrid search:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from azure.cosmos import CosmosClient, PartitionKey\r\nfrom azure.identity import DefaultAzureCredential\r\nfrom langchain_openai import AzureOpenAIEmbeddings\r\nfrom langchain_azure_cosmosdb import AzureCosmosDBNoSqlVectorSearch\r\n\r\n\r\ncosmos_client = CosmosClient(\r\n    \"&lt;endpoint&gt;\",\r\n    credential=DefaultAzureCredential(),\r\n)\r\n\r\nvectorstore = AzureCosmosDBNoSqlVectorSearch(\r\n    cosmos_client=cosmos_client,\r\n    embedding=AzureOpenAIEmbeddings(\r\n        azure_endpoint=\"&lt;openai-endpoint&gt;\",\r\n        azure_deployment=\"text-embedding-3-small\",\r\n    ),\r\n    vector_embedding_policy={\r\n        \"vectorEmbeddings\": [\r\n            {\r\n                \"path\": \"\/embedding\",\r\n                \"dataType\": \"float32\",\r\n                \"distanceFunction\": \"cosine\",\r\n                \"dimensions\": 1536,\r\n            }\r\n        ]\r\n    },\r\n    indexing_policy={\r\n        \"vectorIndexes\": [\r\n            {\r\n                \"path\": \"\/embedding\",\r\n                \"type\": \"diskANN\",\r\n            }\r\n        ],\r\n        
\"fullTextIndexes\": [\r\n            {\r\n                \"path\": \"\/text\",\r\n            }\r\n        ],\r\n    },\r\n    cosmos_container_properties={\r\n        \"partition_key\": PartitionKey(path=\"\/id\"),\r\n    },\r\n    cosmos_database_properties={\r\n        \"id\": \"my-rag-db\",\r\n    },\r\n    vector_search_fields={\r\n        \"text_field\": \"text\",\r\n        \"embedding_field\": \"embedding\",\r\n    },\r\n    full_text_search_enabled=True,\r\n)\r\n\r\n# Add documents\r\nvectorstore.add_texts(\r\n    [\r\n        \"Azure Cosmos DB is a globally distributed database.\",\r\n    ]\r\n)\r\n\r\n# Hybrid search\r\nresults = vectorstore.similarity_search(\r\n    \"distributed database\",\r\n    k=5,\r\n    search_type=\"hybrid\",\r\n    full_text_rank_filter=[\r\n        {\r\n            \"search_field\": \"text\",\r\n            \"search_text\": \"distributed\",\r\n        }\r\n    ],\r\n)<\/code><\/pre>\n<h2>Building a Multi-Turn Agent with LangGraph<\/h2>\n<p>One of the most powerful use cases is building conversational agents that remember context across turns. 
With CosmosDBSaverSync, your LangGraph agent&#8217;s state is persisted to Cosmos DB automatically:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from typing import Annotated, TypedDict\r\n\r\nfrom langchain_openai import AzureChatOpenAI\r\nfrom langgraph.graph import StateGraph, START, END\r\nfrom langgraph.graph.message import add_messages\r\nfrom langchain_azure_cosmosdb import CosmosDBSaverSync\r\n\r\n\r\n# Initialize LLM\r\nllm = AzureChatOpenAI(\r\n    azure_endpoint=\"&lt;openai-endpoint&gt;\",\r\n    azure_deployment=\"&lt;chat-deployment&gt;\",\r\n)\r\n\r\n# Create checkpointer (falls back to DefaultAzureCredential when no key is given)\r\ncheckpointer = CosmosDBSaverSync(\r\n    database_name=\"agents-db\",\r\n    container_name=\"checkpoints\",\r\n    endpoint=\"&lt;cosmos-endpoint&gt;\",\r\n)\r\n\r\n\r\n# Define state: the add_messages reducer appends new messages to the list\r\n# instead of overwriting it, so earlier turns stay in the conversation\r\nclass State(TypedDict):\r\n    messages: Annotated[list, add_messages]\r\n\r\n\r\n# Define a simple chatbot graph\r\ndef chatbot(state):\r\n    return {\r\n        \"messages\": [\r\n            llm.invoke(state[\"messages\"]),\r\n        ]\r\n    }\r\n\r\n\r\ngraph = StateGraph(State)\r\ngraph.add_node(\"chatbot\", chatbot)\r\ngraph.add_edge(START, \"chatbot\")\r\ngraph.add_edge(\"chatbot\", END)\r\n\r\napp = graph.compile(checkpointer=checkpointer)\r\n\r\n# Multi-turn conversation: state persists across invocations\r\nconfig = {\r\n    \"configurable\": {\r\n        \"thread_id\": \"user-123\",\r\n    }\r\n}\r\n\r\napp.invoke(\r\n    {\r\n        \"messages\": [\r\n            (\"user\", \"Hi, I'm Alice!\"),\r\n        ]\r\n    },\r\n    config=config,\r\n)\r\n\r\napp.invoke(\r\n    {\r\n        \"messages\": [\r\n            (\"user\", \"What's my name?\"),\r\n        ]\r\n    },\r\n    config=config,\r\n)<\/code><\/pre>\n<p>The checkpointer stores each graph step as a separate document in Cosmos DB, with support for get_state(), get_state_history(), and thread isolation.<\/p>\n<h2>Long-Term Memory with LangGraph Store<\/h2>\n<p>For agents that need to 
remember facts across sessions (user preferences, learned knowledge, extracted entities), CosmosDBStore provides namespace-organized storage with optional semantic search:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from azure.identity import DefaultAzureCredential\r\nfrom langchain_openai import AzureOpenAIEmbeddings\r\nfrom langchain_azure_cosmosdb import CosmosDBStore\r\n\r\n\r\n# Initialize embeddings\r\nembeddings = AzureOpenAIEmbeddings(\r\n    azure_endpoint=\"&lt;openai-endpoint&gt;\",\r\n    azure_deployment=\"text-embedding-3-small\",\r\n)\r\n\r\n# Create store\r\nstore = CosmosDBStore.from_endpoint(\r\n    endpoint=\"&lt;cosmos-endpoint&gt;\",\r\n    credential=DefaultAzureCredential(),\r\n    database_name=\"agents-db\",\r\n    container_name=\"memory\",\r\n    index={\r\n        \"dims\": 1536,\r\n        \"embed\": embeddings,\r\n        \"fields\": [\"text\"],\r\n    },\r\n)\r\n\r\nstore.setup()\r\n\r\n# Store user preferences\r\nstore.put(\r\n    (\"users\", \"alice\", \"preferences\"),\r\n    \"coffee\",\r\n    {\r\n        \"text\": \"Dark roast with oat milk\",\r\n    },\r\n)\r\n\r\n# Semantic search across all users\r\nresults = store.search(\r\n    (\"users\",),\r\n    query=\"beverage preferences\",\r\n    limit=5,\r\n)<\/code><\/pre>\n<h2>Semantic Caching: Reduce Costs and Latency<\/h2>\n<p>Identical or semantically similar prompts hitting your LLM repeatedly? 
The semantic cache stores responses and returns them for similar queries, cutting both API costs and response times:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from langchain_core.globals import set_llm_cache\r\n\r\nfrom langchain_azure_cosmosdb import (\r\n    AzureCosmosDBNoSqlSemanticCache,\r\n)\r\n\r\n\r\n# Assumes cosmos_client, embeddings, and llm from the earlier examples.\r\n# vector_policy, indexing_policy, and container_props are assumed to hold the\r\n# same dictionaries that were passed to the vector store above.\r\ncache = AzureCosmosDBNoSqlSemanticCache(\r\n    cosmos_client=cosmos_client,\r\n    embedding=embeddings,\r\n    vector_embedding_policy=vector_policy,\r\n    indexing_policy=indexing_policy,\r\n    cosmos_container_properties=container_props,\r\n    cosmos_database_properties={\r\n        \"id\": \"cache-db\",\r\n    },\r\n    vector_search_fields={\r\n        \"text_field\": \"text\",\r\n        \"embedding_field\": \"embedding\",\r\n    },\r\n    score_threshold=0.5,  # Configurable similarity threshold\r\n)\r\n\r\nset_llm_cache(cache)\r\n\r\n\r\n# First call: ~3s (hits LLM)\r\nllm.invoke(\"What is Azure Cosmos DB?\")\r\n\r\n\r\n# Second call: ~0.2s (cache hit)\r\nllm.invoke(\"What is Azure Cosmos DB?\")\r\n\r\n\r\n# Similar prompt: ~0.2s (semantic cache hit)\r\nllm.invoke(\"Describe Azure Cosmos DB briefly\")<\/code><\/pre>\n<h2>Full Async Support<\/h2>\n<p>Every integration has a native async counterpart using azure.cosmos.aio, so you can build high-throughput applications without blocking the event loop:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">import asyncio\r\n\r\nfrom langchain_azure_cosmosdb import CosmosDBSaver\r\n\r\n\r\nasync def main():\r\n    # graph and config are the StateGraph and thread config defined earlier\r\n    async with CosmosDBSaver.from_conn_info(\r\n        endpoint=\"&lt;cosmos-endpoint&gt;\",\r\n        key=\"&lt;key&gt;\",\r\n        database_name=\"agents-db\",\r\n        container_name=\"checkpoints\",\r\n    ) as checkpointer:\r\n        app = graph.compile(checkpointer=checkpointer)\r\n\r\n        result = await app.ainvoke(\r\n            {\"messages\": [(\"user\", \"Hi, I'm Alice!\")]},\r\n            config=config,\r\n        )\r\n\r\n\r\nasyncio.run(main())<\/code><\/pre>\n<h2>Enterprise-Ready from Day One<\/h2>\n<ul>\n<li><strong>Microsoft Entra ID \/ Managed 
Identity<\/strong>: All integrations that create their own Cosmos client automatically fall back to DefaultAzureCredential when no key is provided, so there are no secrets to manage.<\/li>\n<li><strong>Global Distribution<\/strong>: Cosmos DB&#8217;s multi-region writes and up to 99.999% SLA extend to your AI agent&#8217;s state, memory, and vector store.<\/li>\n<li><strong>DiskANN Indexing<\/strong>: Purpose-built for high-dimensional vector search at scale, delivering low-latency results even with millions of vectors.<\/li>\n<li><strong>User Agent Tracking<\/strong>: Every client connection includes a per-integration user agent string for usage tracking and diagnostics.<\/li>\n<\/ul>\n<h2>Get Started<\/h2>\n<p>Install the package:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">pip install langchain-azure-cosmosdb<\/code><\/pre>\n<p>Explore the samples:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/langchain-ai\/langchain-azure\/tree\/main\/samples\/cosmosdb-nosql\">10 runnable samples<\/a> covering every integration<\/li>\n<li><a href=\"https:\/\/github.com\/langchain-ai\/langchain-azure\/tree\/main\/libs\/azure-cosmosdb\">Package documentation<\/a><\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/langchain-azure-cosmosdb\/\">PyPI page<\/a><\/li>\n<\/ul>\n<p>Explore the documentation:<\/p>\n<ul>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/free-tier\">Azure Cosmos DB Free Tier<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/vector-search\">Vector indexing &amp; search<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/gen-ai\/full-text-search\">Full-text indexing &amp; search<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/gen-ai\/hybrid-search\">Hybrid search<\/a><\/li>\n<\/ul>\n<p>We&#8217;d love to hear your feedback. 
Try it out, build something amazing, and let us know how it goes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Build AI Agents and RAG Applications with the New LangChain + LangGraph Connector for Azure Cosmos DB Building AI agents and RAG applications today means stitching together half a dozen services, a vector database, a chat history store, a checkpointer for agent state, a semantic cache, a long-term memory layer. Each adds operational overhead, latency, [&hellip;]<\/p>\n","protected":false},"author":118435,"featured_media":12181,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1610,1980,14,1928,1217],"tags":[1946,499,1920,2016,1312,1866],"class_list":["post-12178","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-azure-cosmos-db","category-core-sql-api","category-multi-agent","category-python-sdk","tag-agents","tag-azure-cosmos-db","tag-langchain","tag-langgraph","tag-python","tag-vector-database"],"acf":[],"blog_post_summary":"<p>Build AI Agents and RAG Applications with the New LangChain + LangGraph Connector for Azure Cosmos DB Building AI agents and RAG applications today means stitching together half a dozen services, a vector database, a chat history store, a checkpointer for agent state, a semantic cache, a long-term memory layer. 
Each adds operational overhead, latency, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/12178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/118435"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=12178"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/12178\/revisions"}],"predecessor-version":[{"id":12200,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/12178\/revisions\/12200"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/12181"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=12178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=12178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=12178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}