{"id":711,"date":"2025-05-21T15:30:54","date_gmt":"2025-05-21T22:30:54","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=711"},"modified":"2025-05-23T10:20:47","modified_gmt":"2025-05-23T17:20:47","slug":"foundry-local-a-new-era-of-edge-ai","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/foundry-local-a-new-era-of-edge-ai\/","title":{"rendered":"Foundry Local: A New Era of Edge AI"},"content":{"rendered":"<p>At Microsoft Build 2025, Microsoft unveiled its groundbreaking\u00a0<strong>Foundry Local<\/strong>\u00a0solution for edge devices\u2014an efficient platform specifically designed for local AI inference. As a critical component of Microsoft&#8217;s AI strategy, Foundry Local empowers developers to smoothly deploy and run Small Language Models (SLMs) on resource-constrained edge devices, opening new possibilities for the convergence of edge computing and artificial intelligence.<\/p>\n<p><span style=\"font-size: 14pt;\"><strong>Core Architecture and Technical Advantages<\/strong><\/span><\/p>\n<p><strong>Technical Foundation: The ONNX Ecosystem<\/strong><\/p>\n<p><strong>Foundry Local<\/strong>\u00a0is built on\u00a0<strong>ONNX<\/strong>\u00a0(Open Neural Network Exchange)\u2014a mature, open standard for model interoperability. 
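<\/p>\n<p>For example, a model trained in PyTorch can be exported to ONNX in just a few lines. The snippet below is a minimal sketch with a toy network; the model, shapes, and tensor names are purely illustrative:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">import torch\r\nimport torch.nn as nn\r\n\r\n# A toy model standing in for a real network\r\nmodel = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))\r\nmodel.eval()\r\n\r\n# Export to ONNX with a sample input that fixes the input shape\r\ndummy = torch.randn(1, 16)\r\ntorch.onnx.export(model, dummy, \"model.onnx\",\r\n                  input_names=[\"input\"], output_names=[\"logits\"])<\/code><\/pre>\n<p>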
As a widely recognized model exchange format in machine learning and deep learning, ONNX brings significant advantages to Foundry Local:<\/p>\n<ul>\n<li><strong>Broad Compatibility<\/strong>: Supports models converted from various deep learning frameworks (PyTorch, TensorFlow, JAX, etc.)<\/li>\n<li><strong>Cross-Platform Optimization<\/strong>: Delivers highly optimized inference performance across different hardware architectures (CPU, GPU, NPU)<\/li>\n<li><strong>Rich Tooling Ecosystem<\/strong>: Leverages mature tools like Microsoft Olive for model optimization and quantization<\/li>\n<\/ul>\n<p><strong>Comprehensive Development Toolkit<\/strong><\/p>\n<p>Foundry Local offers a one-stop development experience:<\/p>\n<ul>\n<li><strong>Diverse Interface Options<\/strong>\n<ul>\n<li><strong>Command Line Interface (CLI)<\/strong>: Provides powerful model management, deployment, and testing capabilities<\/li>\n<li><strong>Multi-language SDKs<\/strong>: Currently supports Node.js and Python, offering native programming experiences<\/li>\n<li><strong>RESTful API<\/strong>: Standardized, OpenAI-compatible interface supporting seamless integration with various applications<\/li>\n<\/ul>\n<\/li>\n<li><strong>Developer-Optimized Experience<\/strong>\n<ul>\n<li>Clean, intuitive API design<\/li>\n<li>Comprehensive documentation and code examples<\/li>\n<li>Built-in model management and monitoring tools<\/li>\n<\/ul>\n<\/li>\n<li><strong>Edge-First Performance Optimization<\/strong>\n<ul>\n<li>Memory utilization optimized for resource-constrained environments<\/li>\n<li>Intelligent caching and inference acceleration techniques<\/li>\n<li>Flexible deployment options supporting various device configurations<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>Breakthrough Technical Advantages<\/strong><\/p>\n<p>Foundry Local brings four key advantages to edge AI applications:<\/p>\n<ul>\n<li><strong>Ultra-Low Latency Experience<\/strong>\n<ul>\n<li>Local inference eliminates 
network communication overhead<\/li>\n<li>Millisecond-level response times, enabling native-like fluid interactions<\/li>\n<li>Adaptive batch processing optimization to increase throughput<\/li>\n<\/ul>\n<\/li>\n<li><strong>Complete Offline Capability<\/strong>\n<ul>\n<li>No continuous internet connection required, suitable for network-limited environments<\/li>\n<li>Full functionality maintained while offline<\/li>\n<li>Ideal for remote areas, edge devices, or environments with strict network security requirements<\/li>\n<\/ul>\n<\/li>\n<li><strong>Enterprise-Grade Data Privacy<\/strong>\n<ul>\n<li>Sensitive data processed entirely locally, no cloud uploads required<\/li>\n<li>Compliant with strict regulatory requirements<\/li>\n<li>Reduced data breach risks, enhanced customer trust<\/li>\n<\/ul>\n<\/li>\n<li><strong>Maximized Resource Efficiency<\/strong>\n<ul>\n<li>Finely tuned model quantization significantly reduces memory requirements<\/li>\n<li>Dynamic resource allocation adapts to device load variations<\/li>\n<li>Battery-friendly design extends edge device usage time<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-size: 14pt;\"><strong>Building Cloud-Edge Collaborative AI Solutions<\/strong><\/span><\/p>\n<p>By combining Azure AI Foundry&#8217;s cloud-based Model Catalog, powerful computing resources, and unified management platform, developers can build customized EdgeAI solutions that meet various business requirements. This &#8220;cloud training, edge inference&#8221; model enables enterprises to balance computational costs, performance, and privacy requirements.<\/p>\n<p><strong>Exploring and Selecting Ideal Models<\/strong><\/p>\n<p>Let&#8217;s start by exploring the pre-configured models provided by Foundry Local. 
With a simple CLI command, we can view all available options:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">foundry model list<\/code><\/pre>\n<p>After executing the command above, you&#8217;ll see output similar to the following:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01.png\"><img decoding=\"async\" class=\" wp-image-712 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-300x220.png\" alt=\"01 image\" width=\"677\" height=\"496\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-300x220.png 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-1024x750.png 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-768x562.png 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-1536x1125.png 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/01-2048x1500.png 2048w\" sizes=\"(max-width: 677px) 100vw, 677px\" \/><\/a><\/p>\n<p>Currently, Foundry Local natively supports multiple high-quality small language models, including:<\/p>\n<ul>\n<li><strong>Microsoft Phi Series<\/strong>: Phi-3, Phi-3.5, Phi-4-mini, and the reasoning-optimized Phi-4-mini-reasoning<\/li>\n<li><strong>Alibaba Qwen Series<\/strong>: Lightweight models with excellent Chinese language capabilities<\/li>\n<li><strong>Mistral AI Series<\/strong>: Open-source models that perform exceptionally well with small parameter counts<\/li>\n<\/ul>\n<p><strong>Expanding Model Selection<\/strong><\/p>\n<p>For enterprise applications, pre-configured models may not satisfy specific requirements. 
In such cases, Azure AI Foundry&#8217;s Model Catalog offers a vast selection of over\u00a0<strong>11,000<\/strong>\u00a0diverse models.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02.png\"><img decoding=\"async\" class=\"wp-image-713 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-300x128.png\" alt=\"02 image\" width=\"722\" height=\"308\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-300x128.png 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-1024x438.png 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-768x329.png 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-1536x657.png 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/02-2048x877.png 2048w\" sizes=\"(max-width: 722px) 100vw, 722px\" \/><\/a><\/p>\n<p>When selecting models suitable for edge deployment, consider the following factors:<\/p>\n<table style=\"width: 73.3212%; height: 72px;\">\n<thead>\n<tr style=\"height: 24px;\">\n<td style=\"height: 24px;\"><strong>Model Parameters<\/strong><\/td>\n<td style=\"height: 24px;\"><strong>Recommended Devices<\/strong><\/td>\n<td style=\"height: 24px;\"><strong>Use Cases<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 24px;\">\n<td style=\"height: 24px;\">1B-3B<\/td>\n<td style=\"height: 24px;\">Edge devices, IoT devices<\/td>\n<td style=\"height: 24px;\">Simple conversations, classification tasks<\/td>\n<\/tr>\n<tr style=\"height: 24px;\">\n<td style=\"height: 24px;\">3B-7B<\/td>\n<td style=\"height: 24px;\">Edge servers, high-end devices<\/td>\n<td style=\"height: 24px;\">Complex reasoning, multimodal tasks<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>For EdgeAI applications, Microsoft Phi, Mistral AI, and 
Llama series models with parameter counts between 1B and 7B are typically ideal choices, striking a good balance between performance and resource consumption.<\/p>\n<p>After selecting a base model, developers can use it directly or fine-tune it for specific tasks. This article focuses on direct usage scenarios (for detailed fine-tuning guides, please refer to my separate specialized blog post).<\/p>\n<p><strong>Efficient Model Conversion and Quantization<\/strong><\/p>\n<p>Foundry Local requires models in the ONNX format. To convert models obtained from Azure AI Foundry into a format suitable for edge deployment, we need to perform model conversion and quantization.<\/p>\n<p><strong>Choosing a Conversion Environment<\/strong><\/p>\n<p>You can perform model conversion on your local workstation or leverage Azure ML cloud environments. For large models, Azure ML is strongly recommended as it provides:<\/p>\n<ul>\n<li><strong>Flexible Computing Resources<\/strong>: On-demand selection of CPU or GPU for conversion<\/li>\n<li><strong>Scalability<\/strong>: Process large models without local hardware limitations<\/li>\n<li><strong>Pre-configured Environments<\/strong>: Skip complex environment setup<\/li>\n<\/ul>\n<p><strong>Using Microsoft Olive for Model Optimization<\/strong><\/p>\n<p><a href=\"https:\/\/github.com\/microsoft\/Olive\">Microsoft Olive<\/a>\u00a0is Microsoft&#8217;s model optimization tool, specifically designed to convert various models to high-performance ONNX format. 
It supports optimization of mainstream models like Phi, Llama, Mistral, and Qwen.<\/p>\n<p><strong>Environment Setup<\/strong>:<\/p>\n<p>Create a new Notebook in Azure ML Studio, select the &#8220;Python 3.10-Azure ML&#8221; environment, and install the following key dependencies:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\"># Note: onnxruntime must be recent enough to pair with onnxruntime-genai 0.7.x; adjust if your environment differs\r\n!pip install olive-ai==0.8.0 onnxruntime-genai==0.7.1 onnxruntime==1.22.0 transformers==4.51.3<\/code><\/pre>\n<p><strong>Executing Model Conversion<\/strong>:<\/p>\n<p>Use the following command to convert the model to INT4 quantized ONNX format, significantly reducing model size while preserving inference performance:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">!olive auto-opt \\\r\n  --model_name_or_path {Your Model at Azure Model Location} \\\r\n  --provider CPUExecutionProvider \\\r\n  --use_model_builder \\\r\n  --precision int4 \\\r\n  --output_path {Your ONNX Model output path} \\\r\n  --log_level 1 \\\r\n  --trust_remote_code<\/code><\/pre>\n<p><strong>Tip<\/strong>: For edge devices, INT4 quantization typically reduces model size to approximately 25% of the original size while maintaining about 95% of the performance. 
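<\/p>\n<p>A quick back-of-envelope calculation illustrates that size claim. This is only a sketch: real ONNX files also carry graph metadata, and quantization schemes keep some tensors at higher precision:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">params = 1_000_000_000  # a 1B-parameter model\r\nfp16_gb = params * 2 \/ 1024**3    # FP16: 2 bytes per weight\r\nint4_gb = params * 0.5 \/ 1024**3  # INT4: 4 bits per weight\r\nprint(f\"FP16: {fp16_gb:.2f} GB, INT4: {int4_gb:.2f} GB ({int4_gb \/ fp16_gb:.0%} of original)\")\r\n# prints: FP16: 1.86 GB, INT4: 0.47 GB (25% of original)<\/code><\/pre>\n<p>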
For scenarios requiring higher accuracy, consider using INT8 quantization instead.<\/p>\n<p><strong>Cloud Model Management and Version Control<\/strong><\/p>\n<p>Saving converted models to the Azure ML model registry is a best practice, providing professional version control, access management, and deployment tracking capabilities.<\/p>\n<p><strong>Model Registration and Management<\/strong><\/p>\n<p>The Azure ML model registry enables teams to:<\/p>\n<ul>\n<li><strong>Centralized Management<\/strong>: Store and manage all models in a single location<\/li>\n<li><strong>Version Control<\/strong>: Track model evolution and roll back to previous versions at any time<\/li>\n<li><strong>Metadata Tagging<\/strong>: Add key information to models, such as accuracy, size, and purpose<\/li>\n<li><strong>Access Control<\/strong>: Set fine-grained permissions for security<\/li>\n<\/ul>\n<p>The image below shows the model management interface in Azure ML:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03.png\"><img decoding=\"async\" class=\"wp-image-714 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03-300x104.png\" alt=\"03 image\" width=\"695\" height=\"241\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03-300x104.png 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03-1024x355.png 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03-768x266.png 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03-1536x532.png 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/03.png 1686w\" sizes=\"(max-width: 695px) 100vw, 695px\" \/><\/a><\/p>\n<p>Here&#8217;s example code for registering an ONNX model to Azure ML: <a 
href=\"https:\/\/github.com\/microsoft\/Build25-LAB329\/blob\/main\/Lab329\/Notebook\/04.AzureML_RegisterToAzureML.ipynb\">https:\/\/github.com\/microsoft\/Build25-LAB329\/blob\/main\/Lab329\/Notebook\/04.AzureML_RegisterToAzureML.ipynb<\/a><\/p>\n<p><strong>Deploying Models to Edge Devices<\/strong><\/p>\n<p>Foundry Local provides a straightforward deployment process, enabling developers to quickly integrate custom models into edge applications. Here&#8217;s a detailed step-by-step guide:<\/p>\n<ol>\n<li><strong>Retrieve Optimized Models from the Cloud<\/strong><\/li>\n<\/ol>\n<p>First, download the optimized ONNX model from the Azure ML model registry using the appropriate code.\u00a0 <a href=\"https:\/\/github.com\/microsoft\/Build25-LAB329\/blob\/main\/Lab329\/Notebook\/05.Local_Download.ipynb\">https:\/\/github.com\/microsoft\/Build25-LAB329\/blob\/main\/Lab329\/Notebook\/05.Local_Download.ipynb<\/a><\/p>\n<ol start=\"2\">\n<li><strong>Place Model Files Correctly<\/strong><\/li>\n<\/ol>\n<p>Place the downloaded model files in the Foundry Local model directory:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\"># Create a directory for the model\r\nmkdir -p .\/models\/llama\/\r\n\r\n# Move model files to the appropriate location\r\n# Note: may need to adjust based on the specific model structure\r\nmv .\/downloaded-model\/* .\/models\/llama\/<\/code><\/pre>\n<ol start=\"3\">\n<li><strong>Create an inference_model.json configuration file in the model directory, defining model metadata and prompt templates:<\/strong><\/li>\n<\/ol>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">{\r\n  \"Name\": \"llama-3.2-1b-onnx\",\r\n  \"PromptTemplate\": {\r\n    \"assistant\": \"{Content}\",\r\n    \"prompt\": \"&lt;|begin_of_text|&gt;&lt;|start_header_id|&gt;system&lt;|end_header_id|&gt;You are my EdgeAI assistant, help me to answer 
question&lt;|eot_id|&gt;&lt;|start_header_id|&gt;user&lt;|end_header_id|&gt;\\n\\n{Content}&lt;|eot_id|&gt;&lt;|start_header_id|&gt;assistant&lt;|end_header_id|&gt;\\n\\n\"\r\n  }\r\n}<\/code><\/pre>\n<p><strong>Configuration Notes<\/strong>:<\/p>\n<ul>\n<li><strong>Name<\/strong>: Unique identifier for the model, used to reference it in Foundry Local<\/li>\n<li><strong>PromptTemplate<\/strong>: Defines how inputs and outputs are formatted, supports custom system prompts and special tokens<\/li>\n<\/ul>\n<ol start=\"4\">\n<li><strong>Verify Model Deployment<\/strong><\/li>\n<\/ol>\n<p>Use the Foundry Local CLI to test if the model is correctly deployed and can run:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">foundry cache cd models\r\nfoundry model run llama-3.2-1b-onnx --verbose<\/code><\/pre>\n<p>After execution, you&#8217;ll see output similar to the image below, indicating the model has been successfully loaded and can respond to queries:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04.png\"><img decoding=\"async\" class=\"wp-image-715 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04-300x213.png\" alt=\"04 image\" width=\"623\" height=\"442\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04-300x213.png 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04-1024x727.png 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04-768x545.png 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04-1536x1091.png 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/05\/04.png 1676w\" sizes=\"(max-width: 623px) 100vw, 623px\" \/><\/a><\/p>\n<p><strong>Integrating Foundry Local into Applications<\/strong><\/p>\n<p>After successfully 
deploying the model, you can use Foundry Local in your applications through multiple approaches, including SDKs and REST APIs. Let&#8217;s explore the integration options:<\/p>\n<p><strong>Using SDK for Integration<\/strong><\/p>\n<p>Foundry Local provides native SDKs for Python and Node.js, allowing developers to easily integrate edge AI capabilities into existing applications. Its APIs are intentionally designed to be compatible with the OpenAI API, making migration from cloud to edge straightforward.<\/p>\n<p><strong>Example with Streaming<\/strong>:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">import openai\r\nfrom foundry_local import FoundryLocalManager\r\n\r\n# Initialize the Foundry Local manager and get connection details\r\nmanager = FoundryLocalManager()\r\nalias = \"llama-3.2-1b-onnx-int4-cpu\"\r\n\r\n# Create a client using the OpenAI-compatible interface\r\nclient = openai.OpenAI(\r\n    base_url=manager.endpoint,\r\n    api_key=manager.api_key\r\n)\r\n\r\n# Stream responses for better UX\r\nstream = client.chat.completions.create(\r\n    model=alias,\r\n    messages=[{\"role\": \"user\", \"content\": \"explain 1+1=2 ?\"}],\r\n    stream=True\r\n)\r\n\r\n# Process the streamed response\r\nfor chunk in stream:\r\n    if chunk.choices[0].delta.content is not None:\r\n        print(chunk.choices[0].delta.content, end=\"\", flush=True)<\/code><\/pre>\n<p><strong>Application Scenarios and Best Practices<\/strong><\/p>\n<p>Foundry Local&#8217;s flexible architecture supports various edge AI application scenarios:<\/p>\n<p><strong>Key Application Areas<\/strong><\/p>\n<ul>\n<li><strong>Smart Home Devices<\/strong>: Provide offline voice assistant capabilities for smart speakers, home control centers, etc.<\/li>\n<li><strong>Industrial 
IoT<\/strong>: Deploy intelligent monitoring systems on factory floors without transmitting sensitive data to the cloud<\/li>\n<li><strong>Medical Devices<\/strong>: Provide AI-assisted diagnostic capabilities for medical devices while complying with regulations like HIPAA<\/li>\n<li><strong>Field Service Applications<\/strong>: Deliver offline AI support for service personnel in remote or poorly networked areas<\/li>\n<li><strong>Retail Smart Terminals<\/strong>: Offer personalized recommendations and customer service while protecting customer privacy<\/li>\n<\/ul>\n<p><strong>Deployment Best Practices<\/strong><\/p>\n<ul>\n<li><strong>Cache Optimization<\/strong>: Configure appropriate cache sizes to improve response times for common queries<\/li>\n<li><strong>Concurrent Processing<\/strong>: Adjust batch processing parameters to optimize throughput in multi-user scenarios<\/li>\n<li><strong>Monitoring and Updates<\/strong>: Implement model performance monitoring and update models regularly to improve accuracy<\/li>\n<\/ul>\n<p><strong>Security and Compliance Considerations<\/strong><\/p>\n<p>When deploying AI at the edge, security becomes a critical concern. Foundry Local implements several security features:<\/p>\n<ul>\n<li><strong>Model Integrity<\/strong>: Cryptographic verification of models to prevent tampering<\/li>\n<li><strong>Access Control<\/strong>: API key authentication and role-based access for multi-user deployments<\/li>\n<li><strong>Data Protection<\/strong>: Local processing eliminates data transmission risks<\/li>\n<li><strong>Audit Logging<\/strong>: Comprehensive logging of model usage and performance metrics<\/li>\n<\/ul>\n<p>For regulated industries, Foundry Local&#8217;s on-device processing simplifies compliance with privacy regulations and industry-specific standards by keeping sensitive data within organizational boundaries.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>Foundry Local represents an important step in bringing artificial intelligence technology from the cloud to the edge. 
By bringing AI capabilities directly to user devices, it not only addresses key challenges of latency, privacy, and connection reliability but also provides developers with a powerful foundation for building next-generation intelligent applications.<\/p>\n<p>As edge AI technology continues to develop, we can expect to see more and more innovative applications emerging across various industries, bringing users smarter, more private, and more efficient experiences. Whether you&#8217;re just beginning your AI development journey or seeking to optimize existing solutions as a professional developer, Foundry Local offers a platform worth exploring.<\/p>\n<p><strong>Resources<\/strong><\/p>\n<p>Official resources for learning more about Foundry Local and related technologies:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/microsoft\/Foundry-Local\">Microsoft Foundry Local Repository<\/a>\u00a0&#8211; Official codebase, documentation, and examples<\/li>\n<li><a href=\"https:\/\/github.com\/microsoft\/Olive\">Microsoft Olive Repository<\/a>\u00a0&#8211; Tool for optimizing and converting models<\/li>\n<li><a href=\"https:\/\/github.com\/microsoft\/Foundry-Local\/blob\/main\/docs\/how-to\/compile-models-for-foundry-local.md\">Custom Model Deployment Guide<\/a>\u00a0&#8211; Detailed deployment documentation and examples<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/what-is-azure-ai-foundry\">Azure AI Foundry Overview<\/a>\u00a0&#8211; Learn about the cloud AI platform<\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/concepts\/foundry-models-overview\">Azure AI Model Catalog<\/a> &#8211; Explore available pre-trained models<\/li>\n<li><a href=\"https:\/\/github.com\/microsoft\/Build25-LAB329\">Fine-Tune End-to-End<\/a> Distillation Models with Azure AI Foundry 
Models<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>At Microsoft Build 2025, Microsoft unveiled its groundbreaking\u00a0Foundry Local\u00a0solution for edge devices\u2014an efficient platform specifically designed for local AI inference. As a critical component of Microsoft&#8217;s AI strategy, Foundry Local empowers developers to smoothly deploy and run Small Language Models (SLMs) on resource-constrained edge devices, opening new possibilities for the convergence of edge computing and [&hellip;]<\/p>\n","protected":false},"author":106050,"featured_media":1563,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[37,1],"tags":[],"class_list":["post-711","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-foundrylocal","category-microsoft-foundry"],"acf":[],"blog_post_summary":"<p>At Microsoft Build 2025, Microsoft unveiled its groundbreaking\u00a0Foundry Local\u00a0solution for edge devices\u2014an efficient platform specifically designed for local AI inference. 
As a critical component of Microsoft&#8217;s AI strategy, Foundry Local empowers developers to smoothly deploy and run Small Language Models (SLMs) on resource-constrained edge devices, opening new possibilities for the convergence of edge computing and [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/711","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/106050"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=711"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/711\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/1563"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=711"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=711"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=711"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}