{"id":2092,"date":"2026-04-09T12:00:00","date_gmt":"2026-04-09T19:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=2092"},"modified":"2026-04-09T11:04:30","modified_gmt":"2026-04-09T18:04:30","slug":"foundry-local-ga","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/foundry-local-ga\/","title":{"rendered":"Foundry Local is now Generally Available"},"content":{"rendered":"<p>Today we\u2019re thrilled to announce the <strong>General Availability (GA) of Foundry Local<\/strong> \u2014 Microsoft\u2019s cross-platform local AI solution that lets developers bring <strong>AI directly into their applications<\/strong> across modalities like <strong>chat<\/strong> and <strong>audio<\/strong>, with no cloud dependency, no network latency, and no per-token costs.<\/p>\n<p>Whether you&#8217;re building a desktop assistant, a healthcare decision-support tool, a private coding companion, or an offline-capable edge application, Foundry Local gives you production-grade AI that runs entirely on the user&#8217;s machine.<\/p>\n<h2>What is Foundry Local?<\/h2>\n<p>Microsoft Foundry spans cloud to edge \u2014 from Microsoft Foundry in the cloud for frontier models, agents, and fine-tuning, to Foundry Local for on-premises and distributed deployments validated on Azure Local, to Foundry Local running natively across Windows, MacOS, Android, and other devices including phones, laptops, and desktops.<\/p>\n<p>Foundry Local is an end-to-end local AI solution in a compact package that is small\u00a0enough to bundle directly inside your application installer without meaningfully impacting download size. 
The small size and zero dependencies let you ship a fully self-contained AI-powered app the same way you would ship any other desktop or edge application, and keep your CI\/CD artifacts lean.<\/p>\n<h3>How Foundry\u00a0Local works<\/h3>\n<p>The following high-level architecture diagram shows the components of the Foundry Local stack:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/04\/foundry-local-architecture.webp\"><img decoding=\"async\" class=\"alignnone wp-image-2091\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/04\/foundry-local-architecture-300x158.webp\" alt=\"Foundry Local architecture diagram\" width=\"720\" height=\"379\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/04\/foundry-local-architecture-300x158.webp 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/04\/foundry-local-architecture.webp 750w\" sizes=\"(max-width: 720px) 100vw, 720px\" \/><\/a><\/p>\n<p>First, you add the Foundry Local SDK to your application:<\/p>\n<pre><code class=\"language-bash\">npm install foundry-local-sdk # JavaScript\r\npip install foundry-local-sdk # Python\r\ndotnet add package Microsoft.AI.Foundry.Local # C#\r\ncargo add foundry-local-sdk # Rust<\/code><\/pre>\n<p>When you install the SDK, the <strong>Foundry Local Core<\/strong> and <a href=\"https:\/\/onnxruntime.ai\/\">ONNX Runtime<\/a> binaries are automatically downloaded and bundled into your application at build time as a dependency. The Foundry Local SDKs (Python, JavaScript, C#, Rust) are thin wrappers that call into the Foundry Local Core native library, which is the main runtime managing the model lifecycle (download, load into device memory, inference management, and unload). 
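<\/p>\n<p>That lifecycle maps onto the SDK in just a few calls. Here is a minimal Python sketch reusing the same APIs as the Get Started sample below (the model alias is illustrative):<\/p>\n<pre><code class=\"language-python\">from foundry_local_sdk import Configuration, FoundryLocalManager\r\n\r\n# One-time initialization of the Foundry Local Core\r\nconfig = Configuration(app_name=\"my_app\")\r\nFoundryLocalManager.initialize(config)\r\nmanager = FoundryLocalManager.instance\r\n\r\n# Lifecycle: download (first run only), load into device memory, infer, unload\r\nmodel = manager.catalog.get_model(\"qwen2.5-0.5b\")\r\nmodel.download()\r\nmodel.load()\r\n# ... run inference via model.get_chat_client() ...\r\nmodel.unload()<\/code><\/pre>\n<p>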
Foundry Local integrates with the Foundry Catalog to download models on first run that are optimized for the device hardware; this selection happens intelligently, so end users get the most performant model for their hardware. On subsequent runs, the model is loaded from the local cache on the user&#8217;s device.<\/p>\n<p>Foundry Local is <strong>cross-platform: Windows, Linux, and macOS<\/strong>. On Windows it integrates with <a href=\"https:\/\/learn.microsoft.com\/windows\/ai\/new-windows-ml\/overview\">Windows\u00a0ML (WinML)<\/a> for inferencing and to acquire hardware-matched execution provider plugins from the OS via Windows Update, which ensures driver compatibility and version negotiation. Your end users never need to install device drivers to get optimal performance. On macOS, Foundry Local runs natively on the Apple Silicon GPU through\u00a0<a href=\"https:\/\/developer.apple.com\/metal\/\">Metal<\/a>.<\/p>\n<p>The Foundry Local inference APIs support the OpenAI request\/response format for chat completions and audio transcription, as well as the <a href=\"https:\/\/www.openresponses.org\/\">Open Responses API<\/a> format, allowing you to seamlessly switch between cloud and on-device inference without the overhead and complexity of spinning up a local HTTP webserver. 
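<\/p>\n<p>Because the request and response shapes follow the OpenAI format, a payload you already build for a cloud endpoint carries over unchanged. A small sketch (the <code>client<\/code> here is the chat client obtained in the Get Started sample below):<\/p>\n<pre><code class=\"language-python\"># The same OpenAI-format messages work against cloud or on-device endpoints\r\nmessages = [\r\n    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\r\n    {\"role\": \"user\", \"content\": \"Summarize this note in one sentence.\"},\r\n]\r\n# With Foundry Local, pass the list straight to the in-process chat client:\r\n# response = client.complete_chat(messages)\r\n# print(response.choices[0].message.content)<\/code><\/pre>\n<p>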
However, if your scenario demands an OpenAI-compliant HTTP webserver for REST calls, Foundry Local has you covered: you can configure an optional webserver on initialization.<\/p>\n<h3>Foundry Local capabilities<\/h3>\n<ul>\n<li>Ship AI features with <strong>zero user setup<\/strong> \u2014 self-contained, with no external dependencies such as a CLI or third-party app install for end users.<\/li>\n<li>Combine <strong>speech-to-text, tool calling, and chat in a single, unified SDK<\/strong> \u2014 no need to juggle multiple SDKs.<\/li>\n<li><strong>Automatic hardware acceleration<\/strong> \u2014 GPU, NPU, or CPU fallback with zero detection code required.<\/li>\n<li><strong>Stream responses token-by-token<\/strong> for real-time UX.<\/li>\n<li><strong>Works offline<\/strong> \u2014 user data never leaves the device, and responses start with zero network latency.<\/li>\n<li><strong>Multi-language SDK support<\/strong> (C#, Python, JavaScript, Rust).<\/li>\n<li><strong>Resumable downloads<\/strong> \u2014 should your users lose connection or close your app, the model resumes downloading from where it left off.<\/li>\n<li><strong>Curated, optimized local models<\/strong> (via the Foundry Model Catalog) with support for:\n<ul>\n<li>GPT OSS<\/li>\n<li>Qwen family<\/li>\n<li>Whisper<\/li>\n<li>DeepSeek<\/li>\n<li>Mistral<\/li>\n<li>Phi<\/li>\n<\/ul>\n<\/li>\n<li><strong>Cross-platform support<\/strong> \u2014 Windows, macOS (Apple Silicon), and Linux x64.<\/li>\n<li><strong>(Optional) OpenAI-compatible HTTP endpoint<\/strong>.<\/li>\n<\/ul>\n<h2>What&#8217;s next<\/h2>\n<p>General availability is a milestone, not a finish line. Here&#8217;s what&#8217;s ahead:<\/p>\n<ul>\n<li><strong>Foundry Local powered by Azure Local<\/strong> brings models and agentic AI \u2014 including RAG and chat \u2014 to customer-owned distributed infrastructure. 
This is in preview now, with more coming soon.<\/li>\n<li><strong>Expanded model catalog<\/strong> \u2014 more models across more domains, with community contributions.<\/li>\n<li><strong>Real-time audio transcription<\/strong> \u2014 transcribe in real time from a microphone, ideal for live captioning scenarios.<\/li>\n<li><strong>Enhanced hardware support<\/strong> \u2014 broader NPU and GPU coverage as the silicon landscape evolves.<\/li>\n<li><strong>Enhanced shared cache<\/strong> \u2014 allow models to be shared between applications.<\/li>\n<\/ul>\n<h2>Get Started Today<\/h2>\n<p>Build your first app with Foundry Local by installing the Python SDK (other languages are also available):<\/p>\n<pre><code class=\"language-bash\"># Windows (recommended for hardware acceleration)\r\npip install foundry-local-sdk-winml\r\n# macOS\/Linux\r\npip install foundry-local-sdk<\/code><\/pre>\n<p>Then run the following code:<\/p>\n<pre><code class=\"language-python\">from foundry_local_sdk import Configuration, FoundryLocalManager\r\n\r\n# Initialize Foundry Local\r\nconfig = Configuration(app_name=\"foundry_local_samples\")\r\nFoundryLocalManager.initialize(config)\r\nmanager = FoundryLocalManager.instance\r\n\r\n# Download and load a model from the catalog\r\nmodel = manager.catalog.get_model(\"qwen2.5-0.5b\")\r\nmodel.download()\r\nmodel.load()\r\n\r\n# Get a chat client\r\nclient = model.get_chat_client()\r\n\r\n# Create and send a message in OpenAI format\r\nmessages = [{\"role\": \"user\", \"content\": \"What is the golden ratio?\"}]\r\nresponse = client.complete_chat(messages)\r\n\r\n# Print the response (OpenAI format)\r\nprint(f\"Response: {response.choices[0].message.content}\")\r\n\r\n# Unload the model from memory\r\nmodel.unload()<\/code><\/pre>\n<h2>Learn more<\/h2>\n<ul>\n<li>Samples for each language are provided in the <a href=\"https:\/\/github.com\/microsoft\/Foundry-Local\">Foundry Local GitHub repository<\/a>.<\/li>\n<li><a 
href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/foundry-local\/\">Foundry Local documentation<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Ship local AI to millions of devices &#8211; fast, private on-device inference with no per-token costs.<\/p>\n","protected":false},"author":189734,"featured_media":1563,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[32,31,33,35,117],"class_list":["post-2092","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-foundry","tag-ai","tag-azure","tag-foundry","tag-local-ai","tag-on-device"],"acf":[],"blog_post_summary":"<p>Ship local AI to millions of devices &#8211; fast, private on-device inference with no per-token costs.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2092","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/189734"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=2092"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2092\/revisions"}],"predecessor-version":[{"id":2097,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2092\/revisions\/2097"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/1563"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=2092"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com
\/foundry\/wp-json\/wp\/v2\/categories?post=2092"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=2092"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}