{"id":2415,"date":"2026-06-03T09:00:43","date_gmt":"2026-06-03T16:00:43","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=2415"},"modified":"2026-06-03T09:29:24","modified_gmt":"2026-06-03T16:29:24","slug":"agent-optimizer-build2026","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/agent-optimizer-build2026\/","title":{"rendered":"Introducing Agent Optimizer in Foundry Agent Service"},"content":{"rendered":"<p>With\u00a0<a href=\"https:\/\/devblogs.microsoft.com\/foundry\/introducing-the-new-hosted-agents-in-foundry-agent-service-secure-scalable-compute-built-for-agents\/\"><em>hosted agents<\/em><\/a>, we made it straightforward to build and deploy agents on Foundry. You write your logic, run azd deploy, and your agent is live. But \u201clive\u201d and \u201cproduction-ready\u201d aren\u2019t the same thing.<\/p>\n<p>The gap shows up quickly. Your customer support agent handles requests, but it forgets to ask for an order number before looking up status. It answers warranty questions without checking the purchase date. It gives electrical wiring advice when it should decline and recommend a professional. Each fix means rewriting your system prompt, testing by hand, and hoping you didn\u2019t break something else in the process.<\/p>\n<p>For one agent, that\u2019s manageable. For a team running ten agents across different domains, it\u2019s a bottleneck that doesn\u2019t scale. 
We heard this from developers consistently:\u00a0<em>the hard part isn\u2019t building the agent, it\u2019s getting the agent to behave correctly across all the scenarios it needs to handle.<\/em><\/p>\n<p>Today we\u2019re excited to introduce the\u00a0<strong>agent optimizer in Foundry Agent Service<\/strong>, available now in private preview, with public preview to follow in 30 days.<\/p>\n<p>Sign up for the private preview at\u00a0<a href=\"https:\/\/aka.ms\/Agent-Optimizer-Private-Preview\">aka.ms\/Agent-Optimizer-Private-Preview<\/a>.<\/p>\n<h2>What is the agent optimizer in Foundry Agent Service?<\/h2>\n<p>Agent optimizer evaluates your hosted agent against defined criteria, generates better configurations, and ranks the results so you can deploy the best one. It automates the improvement loop that most teams do by hand today.\u00a0Here\u2019s how it works. The optimizer runs a closed-loop cycle:<\/p>\n<ol>\n<li><strong>Evaluate the baseline.<\/strong>\u00a0Your agent processes a set of tasks, each with explicit pass\/fail criteria. The result is a composite score from 0.0 to 1.0.<\/li>\n<li><strong>Generate candidates.<\/strong>\u00a0Guided by what failed, the optimizer produces new configurations. You choose the optimization target: instruction rewrites your system prompt, skill generates reusable procedures, or model finds the best deployment for your quality\/cost trade-off.<\/li>\n<li><strong>Evaluate candidates.<\/strong>\u00a0Each candidate runs against the same task set.<\/li>\n<li><strong>Rank and recommend.<\/strong>\u00a0Results are sorted by score. 
You see per-task breakdowns and token costs for each candidate before you commit.<\/li>\n<li><strong>Deploy the winner.<\/strong>\u00a0One command promotes the winning configuration to your live agent.<\/li>\n<\/ol>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1.webp\"><img decoding=\"async\" class=\"wp-image-2450 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1-300x94.webp\" alt=\"optimization cycle 1 image\" width=\"730\" height=\"229\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1-300x94.webp 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1-1024x320.webp 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1-768x240.webp 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1-1536x480.webp 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/optimization-cycle-1.webp 1600w\" sizes=\"(max-width: 730px) 100vw, 730px\" \/><\/a><\/p>\n<p>The entire process runs in the cloud. Start it with\u00a0<code>azd ai agent optimize<\/code>, and a typical run completes in a few minutes. You don\u2019t need to provision any additional infrastructure. If you have a hosted agent deployed, you\u2019re ready to optimize.<\/p>\n<h2>Developer experience<\/h2>\n<p>Agent optimizer is built for developers, making it easier to take agents into production with confidence. Here\u2019s what this looks like in practice. Say you have a customer support agent for a consumer electronics company. 
Your current system prompt is bare: &#8220;<em>You are a helpful customer support agent.<\/em>&#8221;<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">$ azd ai agent optimize\r\n \r\nOptimizing agent \"travel-approver\"...\r\n  Baseline saved to .agent_configs\\baseline\\metadata.yaml\r\n  Job ID: opt_999f814ed77a\u2026.\r\n  Status: pending\r\n  Portal: https:\/\/ai.azure.com\/nextgen\/....\r\n  \u2807 completed \u00b7 iteration 2 \u00b7 score: 1.00 \u00b7 9m0s\r\nResults:\r\n  Candidate              Score    Pass  Eval\r\n  \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500 \u2500\u2500\u2500\u2500\u2500\u2500\u2500 \u2500\u2500\u2500\u2500\u2500\u2500\u2500  \u2500\u2500\u2500\u2500\u2500\u2500\r\n  baseline                0.60     71%  View\r\n  candidate_1 \u2605           0.92    100%  View\r\n\r\n  Candidate IDs:\r\n      baseline             cand_200345f6c7\u2026\r\n    \u2605 candidate_1          cand_300d8e4e3\u2026\r\n\r\n  Apply the best candidate locally, then deploy:\r\n    azd ai agent optimize apply --candidate cand_300d8e4e3\u2026\r\n    azd deploy<\/code><\/pre>\n<p>From 0.60 to 0.92. No model retraining. No code changes. Using synthetic data or historical traces of how your agent performed and evaluator signals that identified where it fell short, the optimizer rewrote the system prompt\/skills\/tools to strengthen return policies, escalation procedures, troubleshooting frameworks, and safety boundaries. The changes were driven by observed behavior and scored against the criteria you defined.<\/p>\n<h3>Get your agent optimizer-ready<\/h3>\n<p>Create a\u00a0.agent_configs\/baseline\/\u00a0directory in your agent project. 
At minimum, you need an\u00a0instructions.md\u00a0file that captures your agent&#8217;s system prompt:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">.agent_configs\/\r\n \u2514\u2500\u2500 baseline\/\r\n     \u251c\u2500\u2500 metadata.yaml       # points to your config files\r\n     \u251c\u2500\u2500 instructions.md     # your agent's system prompt\r\n     \u251c\u2500\u2500 skills\/             # skill definitions (optional)\r\n     \u2502   \u2514\u2500\u2500 &lt;skill_name&gt;\/\r\n     \u2502       \u2514\u2500\u2500 SKILL.md\r\n     \u2514\u2500\u2500 tools.json          # tool definitions in OpenAI function-calling format (optional)<\/code><\/pre>\n<p>The integration is a <code>load_config()<\/code> call at startup. It reads the optimized configuration when the service injects one during evaluation, and falls back to your defaults in normal operation.<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from azure.ai.agentserver.optimization import load_config\r\n   \r\nconfig = load_config()  # baseline or optimized agent config\r\n   \r\n# includes skill catalog if skills were generated\r\ninstructions = config.compose_instructions()\r\n   \r\n# Model optimization \u2014 optimizer tries different models from a candidate list\r\nmodel = config.model or \"gpt-4o\"\r\n   \r\n# Tool description optimization \u2014 rewrites tool docstrings for better function-calling\r\ntools = [my_search_tool, my_calculator_tool]\r\nconfig.apply_tool_descriptions(tools)\r\n   \r\nagent = create_agent(\r\n  system_prompt=instructions,\r\n  model=model,\r\n  tools=tools,\r\n)<\/code><\/pre>\n<p>That\u2019s it. Your agent works with or without optimization. No feature flags, no conditional logic. When the optimizer evaluates your agent, it injects candidate configurations through an environment variable. 
In production, that variable is absent and your defaults apply.<\/p>\n<h2>What gets optimized?<\/h2>\n<p>Agent optimizer gives you four targets. You can use them individually or combine them in a single run.<\/p>\n<p><strong>Instruction<\/strong>\u00a0is the default target. The optimizer analyzes where your agent\u2019s responses fall short, then generates alternative system prompts that address those gaps. For our customer support agent, the optimizer added return window details, warranty coverage specifics, and clear boundaries around electrical and medical advice.<\/p>\n<p><strong>Skill<\/strong>\u00a0generates reusable, named procedures that get appended to your agent\u2019s instructions. Think of them as playbooks: an escalation procedure, a troubleshooting sequence, a formatting template. Each skill has a name, description, and implementation body that the agent follows when the situation matches. Use this when your agent needs repeatable multi-step behaviors that a single prompt rewrite can\u2019t capture.<\/p>\n<p><strong>Model<\/strong> lets the optimizer evaluate your agent across multiple model deployments in the same run. If you&#8217;re wondering whether gpt-5-mini handles your workload as well as gpt-5, or whether stepping up to gpt-5.4 gives you a meaningful quality bump on the dimensions that matter to your agent, the optimizer scores each option against your evaluators and shows you which one produces better responses. You pick the model based on what actually performs, not on gut feel.<\/p>\n<p><strong>Tool Descriptions<\/strong> lets the optimizer improve how your agent understands and uses its local function tools. It rewrites tool descriptions and parameter definitions so the agent picks the right tool more reliably. 
For a customer support agent, the optimizer might clarify when to call an order lookup tool versus a knowledge base search, tighten parameter requirements, define fallback behavior when a tool fails, or specify situations where the agent should answer directly instead of making a call. Today this covers the tools defined in your agent&#8217;s own tool set (not external tools like MCP servers); the optimizer refines what&#8217;s already wired up rather than reaching beyond the agent boundary.<\/p>\n<pre class=\"language-bash\"><code class=\"language-bash\"><span class=\"token function\">azd<\/span> ai agent optimize   # auto-detects the targets from the provided agent config<\/code><\/pre>\n<p>You can also combine targets in a configuration file to run them all at once.<\/p>\n<h2>From evaluation to action<\/h2>\n<p>Foundry already gives you observability and evaluation for your agents. Tracing captures every interaction.\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/foundry\/observability\/how-to\/evaluate-agent\"><em>Agent evaluation<\/em><\/a>\u00a0scores your agent\u2019s behavior against quality criteria. Agent optimizer is where you take action on what those systems tell you.\u00a0When paired with Foundry\u2019s observability and evaluation stack, the improvement cycle becomes:<\/p>\n<ol>\n<li>Your agent runs in production. Traces capture every interaction.<\/li>\n<li>Evaluation scores behavior against defined criteria.<\/li>\n<li>Agent optimizer generates better configurations based on what the evaluations surface.<\/li>\n<li>You deploy the winning candidate as a new versioned hosted agent.<\/li>\n<li>Evaluation runs again to confirm improvement.<\/li>\n<\/ol>\n<p><strong>Starting from zero.<\/strong>\u00a0Most teams don\u2019t have evaluation datasets ready on day one. We\u2019ve seen this over and over. 
The eval init command solves the cold-start problem by generating both a dataset and evaluation criteria from your agent\u2019s existing instructions:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">$ azd ai agent eval init\r\n \r\n Eval suite created\r\n   Dataset:    customer-support (2.0), 15 tasks\r\n   Evaluator:  customer-support (1)\r\n \r\n   Evaluator dimensions (6):\r\n     Weight  Dimension\r\n     \u2500\u2500\u2500\u2500\u2500\u2500  \u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\r\n         10  policy_compliance\r\n          6  resolution_accuracy\r\n          5  troubleshooting_structure\r\n          4  communication_clarity\r\n          3  safety_boundaries\r\n          5  general_quality<\/code><\/pre>\n<p>&nbsp;<\/p>\n<p>No manual test-writing required. The system looks at what your agent is supposed to do and generates tasks that test whether it actually does it. For our customer support agent, it produced tasks around order inquiries, returns, warranty claims, troubleshooting, escalation, and out-of-scope requests. 
Each task has specific pass\/fail criteria tied to the agent\u2019s responsibilities.<\/p>\n<p>This connects to the broader evaluation capabilities in Foundry:<\/p>\n<ul>\n<li><strong>Built-in evaluators<\/strong>\u00a0for task adherence, groundedness, and safety<\/li>\n<li><strong>Synthetic data generation<\/strong>\u00a0when you don\u2019t have production traffic yet<\/li>\n<li><strong>Continuous evaluation<\/strong>\u00a0to detect regressions after deployment<\/li>\n<li><strong>Portal UI<\/strong>\u00a0to browse results and compare candidates side-by-side<\/li>\n<\/ul>\n<p>The whole flow lives in the same azd toolchain you already use:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">azd ai agent init        # scaffold your agent\r\nazd deploy               # ship to Foundry\r\nazd ai agent eval init   # generate evaluation criteria\r\nazd ai agent eval run    # score your agent\r\nazd ai agent optimize    # improve it\r\nazd ai agent optimize apply --candidate \r\nazd deploy               # deploy the optimized agent<\/code><\/pre>\n<p>Each promoted candidate becomes a new hosted agent version. Versioned, auditable, and rollback-ready. 
Tracing captures every optimization run so you have full visibility into what changed and why.\u00a0The Foundry portal gives you a visual interface to browse optimization runs, inspect candidate details, and get the exact CLI commands to deploy.<\/p>\n<p><strong>Optimization runs overview:<\/strong><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-scaled.webp\"><img decoding=\"async\" class=\"wp-image-2431\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-300x95.webp\" alt=\"portal optimize runs image\" width=\"818\" height=\"259\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-300x95.webp 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-1024x325.webp 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-768x243.webp 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-1536x487.webp 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-optimize-runs-1-2048x649.webp 2048w\" sizes=\"(max-width: 818px) 100vw, 818px\" \/><\/a><\/p>\n<p><strong>Run details with candidate comparison:<\/strong><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-scaled.webp\"><img decoding=\"async\" class=\" wp-image-2434\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-300x156.webp\" alt=\"Agent Optimizer run details\" width=\"683\" height=\"355\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-300x156.webp 300w, 
https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-1024x531.webp 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-768x398.webp 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-1536x797.webp 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-run-details-1-2048x1062.webp 2048w\" sizes=\"(max-width: 683px) 100vw, 683px\" \/><\/a><\/p>\n<p><strong>Candidate deployment:<\/strong><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-scaled.webp\"><img decoding=\"async\" class=\" wp-image-2435\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-300x149.webp\" alt=\"portal deploy candidate image\" width=\"689\" height=\"342\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-300x149.webp 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-1024x509.webp 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-768x381.webp 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-1536x763.webp 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/05\/portal-deploy-candidate-1-2048x1017.webp 2048w\" sizes=\"(max-width: 689px) 100vw, 689px\" \/><\/a><\/p>\n<p>Industry leaders are already seeing the value of a more systematic approach to moving agents into production:<\/p>\n<blockquote><p>\u201cAgent Optimizer is a vital step in helping enterprises move AI agents beyond proof of concept and into trusted production use. 
By bringing together governance, observability, and continuous improvement, it helps organizations reduce hallucinations, enhance safety, and continuously evaluate and optimize agent performance. As these capabilities continue to evolve\u2014including Context Engineering and AgentOps, one of the core technologies behind NTT DATA\u2019s Smart AI Agent\u00ae concept\u2014we believe Agent Optimizer will play an important role in enabling business leaders to confidently adopt agentic AI at scale.\u201d<\/p>\n<p>\u2014\u00a0<strong>Yuji Shono<\/strong>, Head of the Global AI Office, NTT Data Group Corporation<\/p><\/blockquote>\n<h2>Get started<\/h2>\n<p>We want to make it easy for you to try agent optimizer today. Here\u2019s how to get going:<\/p>\n<ol>\n<li><a href=\"https:\/\/aka.ms\/Agent-Optimizer-Private-Preview\">Sign up for private preview (closing in 30 days)<\/a><\/li>\n<li><strong>Run the quickstart.<\/strong>\u00a0<a href=\"https:\/\/aka.ms\/ao\/quickstart\"><em>Optimize your hosted agent<\/em><\/a>\u00a0walks you through your first optimization in under 15 minutes.<\/li>\n<li><strong>Read the concepts.<\/strong>\u00a0<a href=\"https:\/\/aka.ms\/ao\/docs\"><em>Agent optimizer overview<\/em><\/a>\u00a0covers scoring, targets, and configuration in detail.<\/li>\n<li><strong>Try the sample.<\/strong>\u00a0The\u00a0<a href=\"https:\/\/aka.ms\/faos\/samples\"><em>customer support sample<\/em><\/a>\u00a0gives you a working agent with an evaluation dataset ready to optimize.<\/li>\n<\/ol>\n<p>We\u2019re excited to see what you build and how much better your agents get with optimization in the loop. Get started today and let us know how it goes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>With\u00a0hosted agents, we made it straightforward to build and deploy agents on Foundry. You write your logic, run azd deploy, and your agent is live. But \u201clive\u201d and \u201cproduction-ready\u201d aren\u2019t the same thing. The gap shows up quickly. 
Your customer support agent handles requests, but it forgets to ask for an order number before looking [&hellip;]<\/p>\n","protected":false},"author":26108,"featured_media":2584,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[112,163,1],"tags":[],"class_list":["post-2415","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-foundry-agent-service","category-microsoft-build","category-microsoft-foundry"],"acf":[],"blog_post_summary":"<p>With\u00a0hosted agents, we made it straightforward to build and deploy agents on Foundry. You write your logic, run azd deploy, and your agent is live. But \u201clive\u201d and \u201cproduction-ready\u201d aren\u2019t the same thing. The gap shows up quickly. Your customer support agent handles requests, but it forgets to ask for an order number before looking [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2415","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/26108"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=2415"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2415\/revisions"}],"predecessor-version":[{"id":2558,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2415\/revisions\/2558"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/2584"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent
=2415"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=2415"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=2415"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}