{"id":10775,"date":"2025-08-06T04:00:21","date_gmt":"2025-08-06T11:00:21","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=10775"},"modified":"2025-08-05T10:08:04","modified_gmt":"2025-08-05T17:08:04","slug":"build-a-rag-application-with-langchain-and-local-llms-powered-by-ollama","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/build-a-rag-application-with-langchain-and-local-llms-powered-by-ollama\/","title":{"rendered":"Build a RAG application with LangChain and Local LLMs powered by Ollama"},"content":{"rendered":"<p>Local large language models (LLMs) provide significant advantages for developers and organizations. Key benefits include enhanced data privacy, as sensitive information remains entirely within your own infrastructure, and offline functionality, enabling uninterrupted work even without internet access. While cloud-based LLM services are convenient, running models locally gives you full control over model behavior, performance tuning, and potential cost savings. This makes them ideal for experimentation before running production workloads.<\/p>\n<p>The ecosystem for local LLMs has matured significantly, with several excellent options available, such as\u00a0<a href=\"https:\/\/ollama.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Ollama<\/a>,\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/foundry-local\/get-started\" target=\"_blank\" rel=\"noopener noreferrer\">Foundry Local<\/a>,\u00a0<a href=\"https:\/\/docs.docker.com\/ai\/model-runner\/\" target=\"_blank\" rel=\"noopener noreferrer\">Docker Model Runner<\/a>, and more. 
Most popular AI\/agent frameworks, including\u00a0<a href=\"https:\/\/python.langchain.com\/docs\/how_to\/local_llms\/\" target=\"_blank\" rel=\"noopener noreferrer\">LangChain<\/a>\u00a0and\u00a0<a href=\"https:\/\/langchain-ai.github.io\/langgraph\/tutorials\/rag\/langgraph_self_rag_local\" target=\"_blank\" rel=\"noopener noreferrer\">LangGraph<\/a>, integrate with these local model runners, making it easy to plug them into your projects.<\/p>\n<p><center><iframe src=\"\/\/www.youtube.com\/embed\/AQ-h1JHaX7I\" width=\"560\" height=\"314\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/center><\/p>\n<h2><a name=\"what-will-you-learn\"><\/a>What will you learn?<\/h2>\n<p>This blog post will illustrate how to use local LLMs with\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/gen-ai\/why-cosmos-ai\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Cosmos DB as a vector database<\/a>\u00a0for retrieval-augmented generation (RAG) scenarios. It will guide you through setting up a local LLM solution, configuring Azure Cosmos DB, loading data, performing vector searches, and executing RAG queries. You can either use the\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/emulator\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Cosmos DB emulator<\/a> for local development or connect to an Azure Cosmos DB account in the cloud. You will use Ollama (an open-source solution) to run LLMs locally on your own machine. 
It lets you download, run, and interact with a variety of LLMs (like Llama 3, Mistral, and others) using simple commands \u2013 no cloud access or complex setup required.<\/p>\n<p>By the end of this blog post, you will have a working local RAG setup that leverages Ollama and Azure Cosmos DB. The sample app uses <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/gen-ai\/integrations?context=%2Fazure%2Fcosmos-db%2Fnosql%2Fcontext%2Fcontext\" target=\"_blank\" rel=\"noopener noreferrer\">LangChain integration with Azure Cosmos DB<\/a>\u00a0to perform embedding, data loading, and vector search. You can easily adapt it to other frameworks like LlamaIndex.<\/p>\n<p><div class=\"alert alert-success\">Refer to <a href=\"https:\/\/github.com\/abhirockzz\/local-llms-rag-cosmosdb\">this GitHub repository<\/a> for the sample app code.<\/div><\/p>\n<p>Alright, let&#8217;s dive in!<\/p>\n<h2><a name=\"setup-ollama\"><\/a>Set up Ollama<\/h2>\n<p>To get started with Ollama, follow the\u00a0<a href=\"https:\/\/github.com\/ollama\/ollama?tab=readme-ov-file#ollama\" target=\"_blank\" rel=\"noopener noreferrer\">official installation guide<\/a> on GitHub to install it on your system. The installation process is straightforward across different platforms. 
For example, on Linux systems, you can install Ollama with a single command:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">curl -fsSL https:\/\/ollama.com\/install.sh | sh<\/code><\/pre>\n<p>Once installed, start the Ollama service by running:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">ollama serve<\/code><\/pre>\n<p>This blog post demonstrates the integration using two specific models from the Ollama library:<\/p>\n<ul>\n<li><strong><a href=\"https:\/\/ollama.com\/library\/mxbai-embed-large\" target=\"_blank\" rel=\"noopener noreferrer\">mxbai-embed-large<\/a><\/strong>\u00a0&#8211; A high-quality embedding model with 1024 dimensions, ideal for generating vector representations of text<\/li>\n<li><strong><a href=\"https:\/\/ollama.com\/library\/llama3:8b\" target=\"_blank\" rel=\"noopener noreferrer\">llama3<\/a><\/strong>\u00a0&#8211; The 8B parameter variant of Meta&#8217;s Llama 3, which serves as our chat model for the RAG pipeline<\/li>\n<\/ul>\n<p>Download both models using the following commands. 
Note that this process may take several minutes depending on your internet connection speed, as these are substantial model files:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">ollama pull mxbai-embed-large \r\nollama pull llama3:8b<\/code><\/pre>\n<\/div>\n<h3><a name=\"something-to-keep-in-mind-\"><\/a>Something to keep in mind &#8230;<\/h3>\n<p>While tools like Ollama make it straightforward to run local LLMs, hardware requirements depend on the specific model and your performance expectations. Lightweight models (such as Llama 2 7B or Phi-2) can run on modern CPUs with as little as 8 GB RAM, though performance may be limited. Larger models (like Llama 3 70B or Mixtral) typically require a dedicated GPU with at least 16 GB VRAM for efficient inference.<\/p>\n<p>Ollama supports both CPU and GPU execution. On CPU-only systems, you can expect slower response times, especially with larger models or concurrent requests. Using a compatible GPU significantly accelerates inference, which is especially important for demanding workloads.<\/p>\n<h2><a name=\"setup-azure-cosmos-db\"><\/a>Set up Azure Cosmos DB<\/h2>\n<p>Since you&#8217;re working with local models, you&#8217;ll likely want to use the Azure Cosmos DB emulator for local development. 
The emulator provides a local environment that mimics the Azure Cosmos DB service, enabling you to develop and test your applications without incurring costs or requiring an internet connection.<\/p>\n<p><div class=\"alert alert-success\">Alternatively, you can use the cloud-based Azure Cosmos DB service. Simply <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/nosql\/quickstart-portal\" target=\"_blank\" rel=\"noopener noreferrer\">create an Azure Cosmos DB for NoSQL account<\/a>\u00a0and\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/nosql\/vector-search#enable-the-vector-indexing-and-search-feature\" target=\"_blank\" rel=\"noopener noreferrer\">enable the vector search feature<\/a>. Make sure to log in using\u00a0<code>az cli<\/code>\u00a0with an identity that has RBAC permissions for the account, since the application uses\u00a0<code>DefaultAzureCredential<\/code>\u00a0for authentication (not key-based authentication).<\/div><\/p>\n<p>The emulator is available as a Docker container, which is the recommended way to run it. Here are the steps to pull and start the Cosmos DB emulator. 
The commands shown are for Linux &#8211;\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/how-to-develop-emulator?tabs=docker-linux%2Ccsharp&amp;pivots=api-nosql#start-the-emulator\" target=\"_blank\" rel=\"noopener noreferrer\">refer to the documentation<\/a> for other platform options.<\/p>\n<p><div class=\"alert alert-success\">If you don&#8217;t have Docker installed, please refer to the\u00a0<a href=\"https:\/\/docs.docker.com\/get-docker\/\" target=\"_blank\" rel=\"noopener noreferrer\">Docker installation guide<\/a>.<\/div><\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code>docker pull mcr.microsoft.com\/cosmosdb\/linux\/azure-cosmos-emulator:latest\r\n\r\ndocker run <span class=\"nt\">--publish<\/span> 8081:8081 <span class=\"nt\">-e<\/span> <span class=\"nv\">AZURE_COSMOS_EMULATOR_PARTITION_COUNT<\/span><span class=\"o\">=<\/span>1 mcr.microsoft.com\/cosmosdb\/linux\/azure-cosmos-emulator:latest\r\n<\/code><\/pre>\n<\/div>\n<p>Next, <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/cosmos-db\/how-to-develop-emulator?tabs=docker-linux%2Ccsharp&amp;pivots=api-nosql#import-the-emulators-tlsssl-certificate\" target=\"_blank\" rel=\"noopener noreferrer\">configure the emulator SSL certificate<\/a>. For example, on the Linux system I was using, I ran the following commands to download the certificate and regenerate the certificate bundle:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">curl --insecure https:\/\/localhost:8081\/_explorer\/emulator.pem &gt; ~\/emulatorcert.crt \r\nsudo update-ca-certificates<\/code><\/pre>\n<p>You should see output similar to this:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>Updating certificates in \/etc\/ssl\/certs...\r\nrehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL\r\n1 added, 0 removed; done.\r\nRunning hooks in \/etc\/ca-certificates\/update.d...\r\ndone.<\/code><\/pre>\n<\/div>\n<h2><a name=\"load-data-into-azure-cosmos-db\"><\/a>Load data into Azure Cosmos DB<\/h2>\n<p>Now that both Ollama and Azure Cosmos DB are set up, it&#8217;s time to populate our vector database with some sample data. For this demonstration, we&#8217;ll use Azure Cosmos DB&#8217;s own documentation as our data source. 
The loader will fetch markdown content directly from the <a href=\"https:\/\/github.com\/MicrosoftDocs\/azure-databases-docs\/\">documentation repository<\/a>, specifically focusing on articles about Azure Cosmos DB <a href=\"https:\/\/raw.githubusercontent.com\/MicrosoftDocs\/azure-databases-docs\/refs\/heads\/main\/articles\/cosmos-db\/nosql\/vector-search.md\" target=\"_blank\" rel=\"noopener noreferrer\">vector search<\/a>\u00a0functionality.<\/p>\n<p>Our data loading process will read these documentation articles, generate embeddings using the\u00a0<code>mxbai-embed-large<\/code>\u00a0model, and store both the content and vector representations in Azure Cosmos DB for retrieval.<\/p>\n<p>Start by cloning the GitHub repository containing the sample application:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code>git clone https:\/\/github.com\/abhirockzz\/local-llms-rag-cosmosdb\r\n<span class=\"nb\">cd <\/span>local-llms-rag-cosmosdb<\/code><\/pre>\n<\/div>\n<p>Before running the loader application, ensure you have Python 3 installed on your system. Create a virtual environment and install the required dependencies:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">python3 -m venv .venv \r\nsource .venv\/bin\/activate \r\n\r\npip3 install -r requirements.txt<\/code><\/pre>\n<p>Next, configure the environment variables and execute the loading script. The example below uses the Azure Cosmos DB emulator for local development. If you prefer to use the cloud service instead, simply set the <code>COSMOS_DB_URL<\/code>\u00a0variable to your Azure Cosmos DB account URL and remove the\u00a0<code>USE_EMULATOR<\/code>\u00a0variable.<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\"># export COSMOS_DB_URL=\"https:\/\/&lt;Cosmos DB account name&gt;.documents.azure.com:443\/\" \r\n\r\nexport USE_EMULATOR=\"true\" \r\nexport DATABASE_NAME=\"rag_local_llm_db\" \r\nexport CONTAINER_NAME=\"docs\" \r\nexport EMBEDDINGS_MODEL=\"mxbai-embed-large\" \r\nexport DIMENSIONS=\"1024\" \r\n\r\npython3 load_data.py<\/code><\/pre>\n<p>The script will automatically create the database and container if they don&#8217;t already exist. Once the data loading process completes successfully, you should see output similar to this:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>Uploading documents to Azure Cosmos DB ['https:\/\/raw.githubusercontent.com\/MicrosoftDocs\/azure-databases-docs\/refs\/heads\/main\/articles\/cosmos-db\/nosql\/vector-search.md', 'https:\/\/raw.githubusercontent.com\/MicrosoftDocs\/azure-databases-docs\/refs\/heads\/main\/articles\/cosmos-db\/nosql\/multi-tenancy-vector-search.md']\r\nUsing database: rag_local_llm_db, container: docs\r\nUsing embedding model: mxbai-embed-large with dimensions: 1024\r\nCreated instance of AzureCosmosDBNoSqlVectorSearch\r\nLoading 26 document chunks from 2 documents\r\nData loaded into Azure Cosmos DB<\/code><\/pre>\n<\/div>\n<p>To confirm that your data has been loaded successfully, you can inspect the results using the Azure Cosmos DB Data Explorer. If you&#8217;re using the emulator, navigate to\u00a0<code>https:\/\/localhost:8081\/_explorer\/index.html<\/code>\u00a0in your browser. 
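<\/p>\n<p>As an aside, the chunk count reported by the loader comes from splitting each markdown file into overlapping pieces before embedding. Here is a minimal illustration of that idea in plain Python; the chunk size and overlap values are hypothetical and the sample app&#8217;s actual splitter settings differ:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">def chunk_text(text, size=400, overlap=50):\r\n    # Fixed-size chunks that overlap, so context is not cut off at a boundary.\r\n    step = size - overlap\r\n    return [text[start:start + size] for start in range(0, len(text), step)]\r\n\r\ndocs = ['first markdown article ' * 40, 'second markdown article ' * 30]\r\nchunks = [c for doc in docs for c in chunk_text(doc)]\r\nprint(f'Loading {len(chunks)} document chunks from {len(docs)} documents')<\/code><\/pre>\n<p>The overlap preserves context that would otherwise be lost where one chunk ends and the next begins.<\/p>\n<p>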
You should see the same number of documents in your container as the number of chunks reported by the loader application.<\/p>\n<h2><a name=\"run-vector-search-queries\"><\/a>Run vector search queries<\/h2>\n<p>Now that your data is loaded, let&#8217;s test the vector search functionality. Set the same environment variables used for data loading and run the vector search script with your desired query:<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight shell\"><code><span class=\"c\"># export COSMOS_DB_URL=\"https:\/\/&lt;Cosmos DB account name&gt;.documents.azure.com:443\/\"<\/span>\r\n<span class=\"nb\">export <\/span><span class=\"nv\">USE_EMULATOR<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"true\"<\/span>\r\n<span class=\"nb\">export <\/span><span class=\"nv\">DATABASE_NAME<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"rag_local_llm_db\"<\/span>\r\n<span class=\"nb\">export <\/span><span class=\"nv\">CONTAINER_NAME<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"docs\"<\/span>\r\n<span class=\"nb\">export <\/span><span class=\"nv\">EMBEDDINGS_MODEL<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"mxbai-embed-large\"<\/span>\r\n<span class=\"nb\">export <\/span><span class=\"nv\">DIMENSIONS<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"1024\"<\/span>\r\n\r\npython3 vector_search.py <span class=\"s2\">\"show me an example of a vector embedding policy\"<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p>The script will process your query through the embedding model and perform a similarity search against the stored document vectors. You should see output similar to the following:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">Searching top 5 results for query: \"show me an example of a vector embedding policy\" \r\nUsing database: rag_local_llm_db, container: docs \r\nUsing embedding model: mxbai-embed-large with dimensions: 1024 \r\n\r\nCreated instance of AzureCosmosDBNoSqlVectorSearch \r\n\r\nScore: 0.7437641827298191 \r\n\r\nContent: \r\n\r\n``` ### A policy with two vector paths \/\/....<\/code><\/pre>\n<p>The output shows the top five results ordered by their similarity scores, with higher scores indicating better matches to your query.<\/p>\n<p><div class=\"alert alert-primary\">To modify the number of results returned, you can add the\u00a0<code>top_k<\/code>\u00a0argument. For example, to retrieve the top 10 results, run:\u00a0<code>python3 vector_search.py \"show me an example of a vector embedding policy\" 10<\/code><\/div><\/p>\n<h2><a name=\"execute-retrievalaugmented-generation-rag-queries\"><\/a>Execute Retrieval-Augmented Generation (RAG) queries<\/h2>\n<p>Now we will put it all together with a simple chat-based interface that leverages the <code>llama3<\/code>\u00a0model to generate responses based on the contextual information retrieved from Azure Cosmos DB.<\/p>\n<p>Configure the environment variables needed for the RAG application and launch the script:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\"># export COSMOS_DB_URL=\"https:\/\/&lt;Cosmos DB account name&gt;.documents.azure.com:443\/\" \r\nexport USE_EMULATOR=\"true\" \r\nexport DATABASE_NAME=\"rag_local_llm_db\" \r\nexport CONTAINER_NAME=\"docs\" \r\nexport EMBEDDINGS_MODEL=\"mxbai-embed-large\" \r\nexport DIMENSIONS=\"1024\" \r\nexport CHAT_MODEL=\"llama3\" \r\n\r\npython3 rag_chain.py<\/code><\/pre>\n<p>Once the application initializes, you&#8217;ll see output confirming the RAG chain setup:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">Building RAG chain. \r\nUsing model: llama3 \r\nUsing database: rag_local_llm_db, container: docs \r\nUsing embedding model: mxbai-embed-large with dimensions: 1024 \r\n\r\nCreated instance of AzureCosmosDBNoSqlVectorSearch \r\n\r\nEnter your questions below. Type 'exit' to quit, 'clear' to clear chat history, 'history' to view chat history. \r\n\r\n[User]:<\/code><\/pre>\n<p>Ask questions about the Azure Cosmos DB vector search documentation that you&#8217;ve loaded. 
For instance, try asking\u00a0<code>show me an example of a vector embedding policy<\/code>, and you&#8217;ll see a response like this (note that responses may vary slightly, even across different runs):<\/p>\n<div class=\"highlight js-code-highlight\">\n<pre class=\"highlight plaintext\"><code>\r\n\/\/...\r\n[User]: show me an example of a vector embedding policy\r\n[Assistant]: Here is an example of a vector embedding policy:\r\n\r\n{\r\n    \"vectorEmbeddings\": [\r\n        {\r\n            \"path\":\"\/vector1\",\r\n            \"dataType\":\"float32\",\r\n            \"distanceFunction\":\"cosine\",\r\n            \"dimensions\":1536\r\n        },\r\n        {\r\n            \"path\":\"\/vector2\",\r\n            \"dataType\":\"int8\",\r\n            \"distanceFunction\":\"dotproduct\",\r\n            \"dimensions\":100\r\n        }\r\n    ]\r\n}\r\n\r\nThis policy defines two vector embeddings: one with the path `\/vector1`, using `float32` data type, cosine distance function, and having 1536 dimensions; and another with the path `\/vector2`, using `int8` data type, dot product distance function, and having 100 dimensions.\r\n\r\n\r\n<\/code><\/pre>\n<\/div>\n<p>To further explore the capabilities of your RAG system, try these additional example queries:<\/p>\n<ul>\n<li>&#8220;What is the maximum supported dimension for vector embeddings in Azure Cosmos DB?&#8221;<\/li>\n<li>&#8220;Is it suitable for large scale data?&#8221;<\/li>\n<li>&#8220;Is there a benefit to using the flat index type?&#8221;<\/li>\n<\/ul>\n<p><div class=\"alert alert-success\">You can enter &#8216;exit&#8217; to quit the application, &#8216;clear&#8217; to clear chat history, or &#8216;history&#8217; to view your previous interactions. Feel free to experiment with different data sources and queries. 
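<\/div><\/p>\n<p>The similarity scores shown in the vector search output come from the distance function declared in the vector embedding policy (cosine is a common choice). As a rough illustration of what cosine similarity measures, here is a minimal pure-Python sketch using toy 2-dimensional vectors; real queries compare 1024-dimensional <code>mxbai-embed-large<\/code> vectors:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">import math\r\n\r\ndef cosine_similarity(a, b):\r\n    # dot(a, b) divided by the product of vector lengths; 1.0 means same direction.\r\n    dot = sum(x * y for x, y in zip(a, b))\r\n    norm_a = math.sqrt(sum(x * x for x in a))\r\n    norm_b = math.sqrt(sum(x * x for x in b))\r\n    return dot \/ (norm_a * norm_b)\r\n\r\nprint(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # prints 1.0 (identical direction)\r\nprint(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # prints 0.0 (unrelated)<\/code><\/pre>\n<p>Scores closer to 1.0 mean a stored chunk&#8217;s embedding points in nearly the same direction as the query embedding, which is why the best matches in the earlier output carried the highest scores.<\/p>\n<p><div class=\"alert alert-success\">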
To modify the number of vector search results used as context, you can add the\u00a0<code>TOP_K<\/code>\u00a0environment variable (defaults to 5).<\/div><\/p>\n<h2><a name=\"wrap-up\"><\/a>Wrap up<\/h2>\n<p>In this walkthrough, you followed step-by-step instructions to set up a complete RAG application that runs entirely on your local infrastructure: installing and configuring Ollama with embedding and chat models, loading documentation data, and running RAG queries through an interactive chat interface.<\/p>\n<p>Running models locally brings clear advantages in cost, data privacy, and the ability to work offline. However, you need to plan for appropriate hardware, particularly for larger models that perform best with dedicated GPUs and sufficient memory. Weighing model size, performance, and resource requirements is crucial when planning your local AI setup.<\/p>\n<p>Have you experimented with local LLMs in your projects? What challenges or benefits have you encountered when moving from cloud-based to local AI solutions? Perhaps you have used both approaches? 
Share your experience and feedback!<\/p>\n<h2>About Azure Cosmos DB<\/h2>\n<p>Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. To stay in the loop on Azure Cosmos DB updates, follow us on\u00a0<a href=\"https:\/\/twitter.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">X<\/a>,\u00a0<a href=\"https:\/\/aka.ms\/AzureCosmosDBYouTube\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/www.linkedin.com\/company\/azure-cosmos-db\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>.<\/p>\n<p>To easily build your first database, watch our\u00a0<a href=\"https:\/\/youtube.com\/playlist?list=PLmamF3YkHLoLLGUtSoxmUkORcWaTyHlXp\" target=\"_blank\" rel=\"noopener\">Get Started videos<\/a>\u00a0on YouTube and explore ways to\u00a0<a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/optimize-dev-test\" target=\"_blank\" rel=\"noopener\">dev\/test free.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Local large language models (LLMs) provide significant advantages for developers and organizations. Key benefits include enhanced data privacy, as sensitive information remains entirely within your own infrastructure, and offline functionality, enabling uninterrupted work even without internet access. 
While cloud-based LLM services are convenient, running models locally gives you full control over model behavior, performance tuning, [&hellip;]<\/p>\n","protected":false},"author":181737,"featured_media":10785,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14],"tags":[499,1312,1866],"class_list":["post-10775","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","tag-azure-cosmos-db","tag-python","tag-vector-database"],"acf":[],"blog_post_summary":"<p>Local large language models (LLMs) provide significant advantages for developers and organizations. Key benefits include enhanced data privacy, as sensitive information remains entirely within your own infrastructure, and offline functionality, enabling uninterrupted work even without internet access. While cloud-based LLM services are convenient, running models locally gives you full control over model behavior, performance tuning, 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/10775","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/181737"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=10775"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/10775\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/10785"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=10775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=10775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=10775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}