{"id":2219,"date":"2026-05-12T09:51:39","date_gmt":"2026-05-12T16:51:39","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=2219"},"modified":"2026-05-12T09:51:39","modified_gmt":"2026-05-12T16:51:39","slug":"foundry-local-v1-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/foundry-local-v1-1\/","title":{"rendered":"Foundry Local 1.1: Live Transcription, Embeddings, and Responses API"},"content":{"rendered":"<p>Today we&#8217;re announcing the <strong>1.1.0 release of Foundry Local<\/strong> \u2014 Microsoft&#8217;s cross-platform local AI solution that lets developers bring <strong>AI directly into their applications<\/strong> with no cloud dependency, no network latency, and no per-token costs.<\/p>\n<p>This release adds the following:<\/p>\n<ul>\n<li><strong>Live audio transcription<\/strong> for real-time speech-to-text scenarios like captioning, voice UIs, and meeting transcription.<\/li>\n<li><strong>Text embeddings<\/strong> for semantic search, RAG, clustering, and similarity matching use cases.<\/li>\n<li><strong>Responses API<\/strong> support for structured agentic interactions, including tool calling and multimodal vision-language input.<\/li>\n<li><strong>WebGPU execution provider plugin<\/strong> delivered separately to reduce the default package size for applications that don&#8217;t need it.<\/li>\n<li><strong>Reduced JavaScript package size<\/strong> by replacing the koffi FFI layer with a custom Node-API C addon.<\/li>\n<li><strong>Broader .NET compatibility<\/strong> by targeting lower framework versions in the C# SDK.<\/li>\n<\/ul>\n<h2>What&#8217;s new<\/h2>\n<h3>Live Transcription API<\/h3>\n<p>Foundry Local now supports <strong>real-time speech-to-text streaming<\/strong> directly from a microphone \u2014 ideal for live captioning, voice-driven UIs, meeting transcription, and accessibility scenarios. 
The new Live Transcription API lets you push raw PCM audio chunks and receive transcription results as they arrive, with clear <code>is_final<\/code> markers distinguishing interim from finalized text.<\/p>\n<p>The API is built around a simple session-based pattern available across all SDK language bindings (JavaScript, C#, Python, Rust):<\/p>\n<ol>\n<li>Load a streaming speech model from the catalog<\/li>\n<li>Create a live transcription session with audio settings (sample rate, channels, language)<\/li>\n<li>Start the session and begin appending audio data<\/li>\n<li>Consume transcription results via an async stream<\/li>\n<\/ol>\n<h4>Example usage<\/h4>\n<p><div class=\"alert alert-primary\">The examples in this article use the Python SDK. Equivalent JavaScript, Rust, and C# bindings are available for every example; see the Foundry Local samples on GitHub.<\/div><\/p>\n<pre><code class=\"language-python\">\"\"\"\r\nLive microphone transcription using Foundry Local.\r\n\r\nThis script loads a streaming speech model, captures audio from the\r\nmicrophone via PyAudio, and prints transcription results in real time.\r\n\r\nRequirements:\r\n    pip install foundry-local-sdk pyaudio\r\n\"\"\"\r\n\r\nimport threading\r\nimport pyaudio\r\nfrom foundry_local_sdk import Configuration, FoundryLocalManager\r\n\r\n# ---------------------------------------------------------------------------\r\n# 1. Initialize Foundry Local\r\n# ---------------------------------------------------------------------------\r\n\r\nconfig = Configuration(app_name=\"foundry_local_samples\")\r\nFoundryLocalManager.initialize(config)\r\nmanager = FoundryLocalManager.instance\r\n\r\n# ---------------------------------------------------------------------------\r\n# 2. 
Download and load the streaming speech model\r\n# ---------------------------------------------------------------------------\r\n\r\nmodel = manager.catalog.get_model(\"nemotron-speech-streaming-en-0.6b\")\r\n\r\nif not model.is_cached:\r\n    print(\"Downloading model...\")\r\n    model.download(\r\n        lambda progress: print(f\"\\r  Progress: {progress:.1f}%\", end=\"\", flush=True)\r\n    )\r\n    print(\"\\n  Download complete.\")\r\n\r\nmodel.load()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 3. Create a live transcription session\r\n# ---------------------------------------------------------------------------\r\n\r\naudio_client = model.get_audio_client()\r\nsession = audio_client.create_live_transcription_session()\r\nsession.settings.sample_rate = 16000\r\nsession.settings.channels = 1\r\nsession.settings.language = \"en\"\r\n\r\nsession.start()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 4. Read transcription results in a background thread\r\n# ---------------------------------------------------------------------------\r\n\r\ndef read_results():\r\n    for result in session.get_stream():\r\n        text = result.content[0].text if result.content else \"\"\r\n        if result.is_final:\r\n            print(f\"\\n  [FINAL] {text}\")\r\n        elif text:\r\n            print(text, end=\"\", flush=True)\r\n\r\nread_thread = threading.Thread(target=read_results, daemon=True)\r\nread_thread.start()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 5. 
Capture microphone audio and feed it to the session\r\n# ---------------------------------------------------------------------------\r\n\r\nRATE, CHANNELS, CHUNK = 16000, 1, 480  # 30 ms frames\r\npa = pyaudio.PyAudio()\r\nstream = pa.open(\r\n    format=pyaudio.paInt16,\r\n    channels=CHANNELS,\r\n    rate=RATE,\r\n    input=True,\r\n    frames_per_buffer=CHUNK,\r\n)\r\n\r\nprint(\"Speak into your microphone. Press Ctrl+C to stop.\\n\")\r\ntry:\r\n    while True:\r\n        pcm_data = stream.read(CHUNK, exception_on_overflow=False)\r\n        session.append(pcm_data)\r\nexcept KeyboardInterrupt:\r\n    print(\"\\nStopping...\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 6. Cleanup\r\n# ---------------------------------------------------------------------------\r\n\r\nstream.close()\r\npa.terminate()\r\nsession.stop()\r\nread_thread.join(timeout=5)\r\nmodel.unload()<\/code><\/pre>\n<h4>Optimized for on-device streaming ASR<\/h4>\n<p>To identify the best model for real-time on-device transcription, we conducted a systematic empirical study across over 50 configurations spanning encoder-decoder, transducer, and LLM-based ASR architectures \u2014 including OpenAI Whisper, NVIDIA Nemotron, Parakeet TDT, Canary, Conformer Transducer, and Qwen3-ASR \u2014 evaluated across batch, chunked, and streaming inference modes.<\/p>\n<p>From this study, we identified <strong>NVIDIA&#8217;s Nemotron Speech Streaming<\/strong> as the strongest candidate for real-time English streaming on resource-constrained hardware. We then re-implemented the complete streaming inference pipeline in ONNX Runtime and applied multiple post-training quantization strategies \u2014 including importance-weighted k-quant, mixed-precision schemes, and round-to-nearest quantization \u2014 combined with graph-level operator fusion. 
These optimizations <strong>reduced the model from 2.47 GB to as little as 0.67 GB<\/strong> while maintaining word error rate (WER) within 1% absolute of the full-precision PyTorch baseline.<\/p>\n<p>Our recommended configuration, the <strong>int4 k-quant variant<\/strong>, achieves <strong>8.20% average streaming WER<\/strong> across eight standard benchmarks, running comfortably <strong>faster than real-time on CPU<\/strong> with 0.56s algorithmic latency \u2014 establishing a new quality-efficiency Pareto point for on-device streaming ASR.<\/p>\n<p>The model is available in the Foundry catalog as <code>nemotron-speech-streaming-en-0.6b<\/code>.<\/p>\n<p>For the full methodology and benchmark results, see our paper: <a href=\"https:\/\/arxiv.org\/abs\/2604.14493\">Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference<\/a> (arXiv:2604.14493).<\/p>\n<h3>Embeddings API for semantic search scenarios<\/h3>\n<p>Foundry Local now supports <strong>text embedding generation<\/strong> across all four SDKs (C#, JavaScript, Python, and Rust). Embeddings unlock a wide range of local AI scenarios including <strong>semantic search<\/strong>, <strong>RAG (retrieval-augmented generation)<\/strong>, <strong>clustering<\/strong>, and <strong>similarity matching<\/strong> \u2014 all running entirely on-device.<\/p>\n<p>The Embeddings API supports both <strong>single and batch input<\/strong>, with configurable dimensions and encoding format. 
Responses follow the OpenAI embeddings format for seamless cloud-to-edge portability.<\/p>\n<h4>Example usage<\/h4>\n<p>The following example pairs Foundry Local embeddings with <a href=\"https:\/\/www.trychroma.com\/products\/chromadb\">ChromaDB<\/a> to build a fully local semantic search pipeline \u2014 documents are embedded and indexed in-memory, then natural-language queries are matched to the most relevant results.<\/p>\n<pre><code class=\"language-python\">\"\"\"\r\nSemantic search using Foundry Local embeddings and ChromaDB.\r\n\r\nThis script loads an embedding model locally, indexes a set of documents\r\ninto an in-memory ChromaDB collection, and performs natural-language\r\nsemantic queries against them \u2014 all running on-device.\r\n\r\nRequirements:\r\n    pip install foundry-local-sdk chromadb\r\n\"\"\"\r\n\r\nimport chromadb\r\n\r\nfrom foundry_local_sdk import Configuration, FoundryLocalManager\r\n\r\n# ---------------------------------------------------------------------------\r\n# 1. Initialize Foundry Local\r\n# ---------------------------------------------------------------------------\r\n\r\nconfig = Configuration(app_name=\"foundry_local_samples\")\r\nFoundryLocalManager.initialize(config)\r\nmanager = FoundryLocalManager.instance\r\n\r\n# ---------------------------------------------------------------------------\r\n# 2. Enable additional hardware acceleration for end users\r\n# ---------------------------------------------------------------------------\r\n\r\nmanager.download_and_register_eps(\r\n    progress_callback=lambda ep, progress: print(\r\n        f\"\\r  Downloading EP '{ep}': {progress:.1f}%\", end=\"\", flush=True\r\n    )\r\n)\r\nprint(\"\\n  EP registration complete.\\n\")\r\n\r\nprint(\"Available EPs:\")\r\nfor ep in manager.discover_eps():\r\n    print(f\"  {ep.name} (registered: {ep.is_registered})\")\r\nprint()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 3. 
Download and load an embedding model\r\n# ---------------------------------------------------------------------------\r\n\r\nmodel = manager.catalog.get_model(\"qwen3-embedding-0.6b\")\r\n\r\nif not model.is_cached:\r\n    print(\"Downloading model...\")\r\n    model.download(\r\n        lambda progress: print(f\"\\r  Progress: {progress:.1f}%\", end=\"\", flush=True)\r\n    )\r\n    print(\"\\n  Download complete.\")\r\n\r\nmodel.load()\r\n\r\nclient = model.get_embedding_client()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 4. Build a knowledge base\r\n# ---------------------------------------------------------------------------\r\n\r\ndocuments = [\r\n    \"Python is a high-level programming language known for its readability and versatility.\",\r\n    \"Rust is a systems programming language focused on safety, speed, and concurrency.\",\r\n    \"Machine learning is a subset of artificial intelligence that learns from data.\",\r\n    \"The capital of France is Paris, known for the Eiffel Tower.\",\r\n    \"Docker containers package applications with their dependencies for consistent deployment.\",\r\n    \"PostgreSQL is a powerful open-source relational database system.\",\r\n    \"Neural networks are computing systems inspired by biological brain structures.\",\r\n    \"Kubernetes orchestrates containerized workloads across clusters of machines.\",\r\n    \"The Python GIL limits true multi-threading for CPU-bound tasks.\",\r\n    \"Vector databases store and search high-dimensional embeddings efficiently.\",\r\n]\r\n\r\nprint(\"Generating embeddings for knowledge base...\")\r\nbatch_response = client.generate_embeddings(documents)\r\nembeddings = [item.embedding for item in batch_response.data]\r\nprint(f\"Indexed {len(embeddings)} documents ({len(embeddings[0])} dimensions each)\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 5. 
Store embeddings in ChromaDB\r\n# ---------------------------------------------------------------------------\r\n\r\nchroma = chromadb.Client()\r\ncollection = chroma.create_collection(\r\n    name=\"knowledge_base\", metadata={\"hnsw:space\": \"cosine\"}\r\n)\r\ncollection.add(\r\n    ids=[f\"doc-{i}\" for i in range(len(documents))],\r\n    embeddings=embeddings,\r\n    documents=documents,\r\n)\r\n\r\n# ---------------------------------------------------------------------------\r\n# 6. Semantic search\r\n# ---------------------------------------------------------------------------\r\n\r\nqueries = [\r\n    \"What programming language is good for beginners?\",\r\n    \"How do I deploy applications in production?\",\r\n    \"Tell me about AI and deep learning\",\r\n]\r\n\r\nfor query in queries:\r\n    query_embedding = client.generate_embedding(query).data[0].embedding\r\n    results = collection.query(query_embeddings=[query_embedding], n_results=3)\r\n\r\n    print(f'\\n\ud83d\udd0d Query: \"{query}\"')\r\n    for doc, distance in zip(results[\"documents\"][0], results[\"distances\"][0]):\r\n        print(f\"   [{1 - distance:.3f}] {doc}\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 7. Cleanup\r\n# ---------------------------------------------------------------------------\r\n\r\nmodel.unload()<\/code><\/pre>\n<h3>Responses API<\/h3>\n<p>Foundry Local now includes an <a href=\"https:\/\/www.openresponses.org\/\">Open Responses API<\/a> client, bringing structured agentic AI capabilities to on-device inference. 
The Responses API provides a higher-level abstraction over chat completions with built-in support for:<\/p>\n<ul>\n<li><strong>Streaming<\/strong> \u2014 token-by-token server-sent events<\/li>\n<li><strong>Multi-turn conversations<\/strong> \u2014 chain responses with <code>previous_response_id<\/code><\/li>\n<li><strong>Tool calling<\/strong> \u2014 define function tools and handle tool call\/result round-trips<\/li>\n<li><strong>Vision<\/strong> \u2014 pass images alongside text input (model-dependent)<\/li>\n<\/ul>\n<h4>Example usage<\/h4>\n<p>With the Foundry Local 1.1 release we&#8217;ve also added <strong>Qwen3.5 VLM<\/strong> to the model catalog \u2014 a natively multimodal vision-language model that can reason over images and text together. Smaller variants (3B, 7B) are optimized for on-device inference, making it practical to run vision tasks locally without cloud dependencies.<\/p>\n<p>This enables scenarios like document understanding, diagram analysis, UI screenshot interpretation, and visual question answering \u2014 all running entirely on-device. For example, the following code streams a description of an image from the Qwen3.5 VLM using the Responses API:<\/p>\n<pre><code class=\"language-python\">\"\"\"\r\nImage description using Foundry Local and the OpenAI Responses API.\r\n\r\nThis script loads a vision-language model locally via Foundry Local,\r\nstarts the built-in web service, and uses the OpenAI SDK's Responses API\r\nto stream a description of an image \u2014 all running on-device.\r\n\r\nRequirements:\r\n    pip install foundry-local-sdk openai Pillow\r\n\"\"\"\r\n\r\nimport base64\r\nimport io\r\nimport urllib.request\r\n\r\nfrom PIL import Image\r\nfrom openai import OpenAI\r\n\r\nfrom foundry_local_sdk import Configuration, FoundryLocalManager\r\n\r\n# ---------------------------------------------------------------------------\r\n# 1. 
Initialize Foundry Local\r\n# ---------------------------------------------------------------------------\r\n\r\nconfig = Configuration(app_name=\"foundry_local_samples\")\r\nFoundryLocalManager.initialize(config)\r\nmanager = FoundryLocalManager.instance\r\n\r\n# ---------------------------------------------------------------------------\r\n# 2. Enable additional hardware acceleration for end users\r\n# ---------------------------------------------------------------------------\r\n\r\nmanager.download_and_register_eps(\r\n    progress_callback=lambda ep, progress: print(\r\n        f\"\\r  Downloading EP '{ep}': {progress:.1f}%\", end=\"\", flush=True\r\n    )\r\n)\r\nprint(\"\\n  EP registration complete.\\n\")\r\n\r\nprint(\"Available EPs:\")\r\nfor ep in manager.discover_eps():\r\n    print(f\"  {ep.name} (registered: {ep.is_registered})\")\r\nprint()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 3. Download and load the vision model\r\n# ---------------------------------------------------------------------------\r\n\r\nmodel = manager.catalog.get_model(\"qwen3-vl-2b-instruct\")\r\n\r\nif not model.is_cached:\r\n    print(\"Downloading model...\")\r\n    model.download(\r\n        lambda progress: print(f\"\\r  Progress: {progress:.1f}%\", end=\"\", flush=True)\r\n    )\r\n    print(\"\\n  Download complete.\")\r\n\r\nprint(\"Loading model...\")\r\nmodel.load()\r\nprint(\"Model ready.\\n\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 4. Start the Foundry Local web service\r\n# ---------------------------------------------------------------------------\r\n\r\nmanager.start_web_service()\r\nbase_url = manager.urls[0].rstrip(\"\/\") + \"\/v1\"\r\nclient = OpenAI(base_url=base_url, api_key=\"notneeded\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 5. 
Prepare the image\r\n# ---------------------------------------------------------------------------\r\n\r\nimage_url = (\r\n    \"https:\/\/github.com\/microsoft\/Foundry-Local\/blob\/main\/\"\r\n    \"samples\/python\/web-server-responses-vision\/src\/test_image.jpg?raw=true\"\r\n)\r\n\r\nprint(f\"Fetching image: {image_url}\")\r\nwith urllib.request.urlopen(image_url) as resp:\r\n    img = Image.open(io.BytesIO(resp.read()))\r\n\r\nimg.thumbnail((512, 512))\r\nbuf = io.BytesIO()\r\nimg.save(buf, format=\"JPEG\")\r\nimage_b64 = base64.b64encode(buf.getvalue()).decode()\r\n\r\n# ---------------------------------------------------------------------------\r\n# 6. Call the Responses API with vision input\r\n# ---------------------------------------------------------------------------\r\n\r\nvision_input = [\r\n    {\r\n        \"type\": \"message\",\r\n        \"role\": \"user\",\r\n        \"content\": [\r\n            {\"type\": \"input_text\", \"text\": \"Describe what you see in this image.\"},\r\n            {\r\n                \"type\": \"input_image\",\r\n                \"image_data\": image_b64,\r\n                \"media_type\": \"image\/jpeg\",\r\n            },\r\n        ],\r\n    }\r\n]\r\n\r\nprint(\"Streaming response:\\n\")\r\nstream = client.responses.create(\r\n    model=model.id,\r\n    input=\"placeholder\",\r\n    extra_body={\"input\": vision_input},\r\n    stream=True,\r\n)\r\n\r\nfor event in stream:\r\n    if getattr(event, \"type\", None) == \"response.output_text.delta\":\r\n        print(getattr(event, \"delta\", \"\"), end=\"\", flush=True)\r\nprint(\"\\n\")\r\n\r\n# ---------------------------------------------------------------------------\r\n# 7. 
Cleanup\r\n# ---------------------------------------------------------------------------\r\n\r\nclient.close()\r\nmanager.stop_web_service()\r\nmodel.unload()\r\nprint(\"Done.\")<\/code><\/pre>\n<h3>WebGPU Execution Provider Plugin<\/h3>\n<p>The WebGPU execution provider is now delivered as a <strong>separate plugin<\/strong> rather than being bundled with the Windows ONNX Runtime package. This change <strong>reduces the default package size<\/strong> for applications that don&#8217;t need WebGPU, while keeping it available as an on-demand plugin for scenarios that require it. The plugin is automatically acquired via the standard execution provider download mechanism \u2014 no changes are needed in your application code.<\/p>\n<h3>.NET SDK: Broader Compatibility<\/h3>\n<p>The C# SDK packages now target lower framework versions, broadening compatibility for applications that haven&#8217;t yet upgraded to the latest .NET runtime:<\/p>\n<ul>\n<li><strong><code>Microsoft.AI.Foundry.Local<\/code><\/strong> now targets <strong><code>netstandard2.0<\/code><\/strong> (previously <code>net9.0<\/code>) \u2014 compatible with .NET Framework 4.6.1+, .NET Core 2.0+, Mono, Xamarin, and Unity. This makes it straightforward to add local AI capabilities to existing .NET applications regardless of runtime version.<\/li>\n<li><strong><code>Microsoft.AI.Foundry.Local.WinML<\/code><\/strong> now targets <strong><code>net8.0<\/code><\/strong> (previously <code>net9.0<\/code>) \u2014 providing Windows hardware acceleration via GPU\/NPU execution providers while maintaining broad compatibility across modern .NET LTS runtimes.<\/li>\n<\/ul>\n<h3>Reduced JavaScript package size<\/h3>\n<p>The JavaScript SDK&#8217;s native interop layer has been rewritten from <a href=\"https:\/\/koffi.dev\/\">koffi<\/a> (a runtime FFI library) to a purpose-built <strong>Node-API C addon<\/strong> with prebuilt binaries shipped per platform. 
This removes the large <code>koffi<\/code> dependency from the package while keeping the SDK&#8217;s public API surface unchanged.<\/p>\n<p>The benefits include:<\/p>\n<ul>\n<li><strong>~27 MB smaller install footprint<\/strong> \u2014 eliminates the koffi transitive dependency tree<\/li>\n<li><strong>Faster load times<\/strong> \u2014 prebuilt <code>.node<\/code> binaries load directly without runtime FFI setup<\/li>\n<li><strong>Better stability<\/strong> \u2014 Node-API provides a stable ABI across Node.js versions, avoiding breakage on engine upgrades<\/li>\n<li><strong>No native compilation required<\/strong> \u2014 prebuilt addons for each platform (Windows, macOS, Linux) ship with the npm package, so <code>npm install<\/code> just works without a C toolchain<\/li>\n<\/ul>\n<h2>Get Started<\/h2>\n<p>Update to Foundry Local 1.1.0 by installing the latest SDK for your language:<\/p>\n<pre><code class=\"language-bash\"># Python\r\npip install foundry-local-sdk --upgrade # macOS\/Linux\r\npip install foundry-local-sdk-winml --upgrade # Windows\r\n\r\n# JavaScript\r\nnpm install foundry-local-sdk@latest # macOS\/Linux\r\nnpm install foundry-local-sdk-winml@latest # Windows\r\n\r\n# C#\r\ndotnet add package Microsoft.AI.Foundry.Local # macOS\/Linux\r\ndotnet add package Microsoft.AI.Foundry.Local.WinML # Windows\r\n\r\n# Rust\r\ncargo add foundry-local-sdk # macOS\/Linux\r\ncargo add foundry-local-sdk --features winml # Windows<\/code><\/pre>\n<h2>What&#8217;s coming in the next release<\/h2>\n<ul>\n<li><strong>C++ language binding<\/strong> \u2014 already available for early testing and feedback in the <a href=\"https:\/\/github.com\/microsoft\/Foundry-Local\/tree\/main\/sdk\/cpp\">Foundry Local GitHub repo<\/a>.<\/li>\n<li><strong>Smaller package size<\/strong> \u2014 further reductions to the core runtime footprint.<\/li>\n<li><strong>Audio enhancements<\/strong> \u2014 word- and segment-level timestamps, and additional language support for live 
transcription.<\/li>\n<\/ul>\n<h2>Learn more<\/h2>\n<ul>\n<li>Samples for each language and feature are provided in the <a href=\"https:\/\/github.com\/microsoft\/Foundry-Local\">Foundry Local GitHub repository<\/a>.<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/foundry-local\/\">Foundry Local documentation<\/a><\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2604.14493\">Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference (arXiv:2604.14493)<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Foundry Local 1.1 adds live transcription, embeddings, Responses API, WebGPU plugin, and download cancellation.<\/p>\n","protected":false},"author":189734,"featured_media":1563,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[32,31,33,35,117],"class_list":["post-2219","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-foundry","tag-ai","tag-azure","tag-foundry","tag-local-ai","tag-on-device"],"acf":[],"blog_post_summary":"<p>Foundry Local 1.1 adds live transcription, embeddings, Responses API, WebGPU plugin, and download 
cancellation.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2219","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/189734"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=2219"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2219\/revisions"}],"predecessor-version":[{"id":2222,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2219\/revisions\/2222"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/1563"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=2219"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=2219"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=2219"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}