{"id":2601,"date":"2026-06-03T15:44:01","date_gmt":"2026-06-03T22:44:01","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=2601"},"modified":"2026-06-03T15:52:59","modified_gmt":"2026-06-03T22:52:59","slug":"build-2026-from-observability-to-roi-for-ai-agents-on-any-framework","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/build-2026-from-observability-to-roi-for-ai-agents-on-any-framework\/","title":{"rendered":"Build 2026: From observability to ROI for AI agents on any framework\u00a0"},"content":{"rendered":"<p><i><span data-contrast=\"auto\">9 min read \u00b7 June 3, 2026 \u00b7 Sebastian Kohlmeier<\/span><\/i><span data-ccp-props=\"{&quot;335559738&quot;:180,&quot;335559739&quot;:180}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Shipping an AI agent is the easy part. Keeping it\u00a0accurate, safe, and accountable in production is where teams get stuck. Agents are\u00a0non-deterministic. 
Their behavior shifts as models update, tools change, and traffic patterns\u00a0evolve, and most of that drift happens silently, long after the demo.\u00a0<\/span><b><span data-contrast=\"auto\">End-to-end observability\u00a0<\/span><\/b><span data-contrast=\"auto\">covering the full development lifecycle is how you close that gap: See every step an agent takes, evaluate quality and safety against criteria you define,\u00a0optimize\u00a0what\u00a0isn\u2019t\u00a0working, and prove the business value of what is.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This spring we hit a major milestone \u2014\u00a0<\/span><b><span data-contrast=\"auto\">tracing and evaluations in Microsoft Foundry reached general availability\u00a0<\/span><\/b><span data-contrast=\"auto\">with hosted agents coming soon.\u00a0Every team building on Foundry can rely on them in production today. At Build 2026,\u00a0we\u2019re\u00a0extending that foundation to\u00a0<\/span><b><span data-contrast=\"auto\">any agent framework, any deployment target, and the full Agent DevOps loop<\/span><\/b><span data-contrast=\"auto\">\u00a0\u2014 from the first inference call to the ROI dashboard your CFO will ask about.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This post walks through the key capabilities\u00a0we\u2019re\u00a0landing in BRK252 \u2014 From observability to ROI for AI agents on any framework:\u00a0<\/span><b><span data-contrast=\"auto\">Interoperability, context-specific rubric evaluators that are multi-turn enabled, code-first observability, optimization, and business ROI.<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span style=\"color: #ff0000;\"><a 
href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Agent-DevOps-Lifecyclle.webp\"><img decoding=\"async\" class=\"wp-image-2613 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Agent-DevOps-Lifecyclle-300x168.webp\" alt=\"Agent DevOps Lifecycle image\" width=\"450\" height=\"252\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Agent-DevOps-Lifecyclle-300x168.webp 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Agent-DevOps-Lifecyclle-768x429.webp 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Agent-DevOps-Lifecyclle.webp 993w\" sizes=\"(max-width: 450px) 100vw, 450px\" \/><\/a><\/span><\/p>\n<h2><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><b><span data-contrast=\"none\">What\u2019s\u00a0new at Build 2026<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">All capabilities are part of Microsoft Foundry observability. 
Preview status reflects state at Build.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Whats-New-at-Build-2026-chart.webp\"><img decoding=\"async\" class=\"wp-image-2611 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Whats-New-at-Build-2026-chart.webp\" alt=\"What\u2019s New at Build 2026 chart image\" width=\"683\" height=\"531\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Whats-New-at-Build-2026-chart.webp 651w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/Whats-New-at-Build-2026-chart-300x233.webp 300w\" sizes=\"(max-width: 683px) 100vw, 683px\" \/><\/a><\/p>\n<h2><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:300}\">\u00a0<\/span><b><span data-contrast=\"none\">Why observability is the foundation for trustworthy agents<\/span><\/b><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Traditional software is deterministic: Same input, same output, same code path. Agents\u00a0aren\u2019t. The same prompt can take three different tool routes today and a fourth one tomorrow when the model or prompt is updated. That makes the standard reliability stack \u2014 logs, metrics, error rates \u2014 insufficient on its own. 
You also need to know what the agent\u00a0<\/span><i><span data-contrast=\"auto\">decided<\/span><\/i><span data-contrast=\"auto\">, whether that decision was\u00a0<\/span><i><span data-contrast=\"auto\">good<\/span><\/i><span data-contrast=\"auto\">, and whether\u00a0it\u2019s\u00a0<\/span><i><span data-contrast=\"auto\">getting better or worse<\/span><\/i><span data-contrast=\"auto\">\u00a0over time.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Foundry observability is built around four capabilities you use continuously across the agent lifecycle:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"3\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Trace\u00a0<\/span><\/b><span data-contrast=\"auto\">\u2014 end-to-end telemetry for every step (prompt, model call, tool invocation, sub-agent\u00a0hop)<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:288}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" 
data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"4\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Evaluate\u00a0<\/span><\/b><span data-contrast=\"auto\">\u2014 quality, safety, and task-completion scoring at single-turn and multi-turn granularity<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:288}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"5\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Monitor\u00a0<\/span><\/b><span data-contrast=\"auto\">\u2014 real-time issue detection with alerts and dashboards via Azure Monitor<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:288}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"6\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Optimize\u00a0<\/span><\/b><span data-contrast=\"auto\">\u2014 turn production signal into ranked, evidence-backed agent improvements<\/span><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:288}\">\u00a0<\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2><b><span data-contrast=\"none\">1. Interoperability: Observe any agent, on any framework<\/span><\/b><span data-ccp-props=\"{&quot;335559739&quot;:180}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Agents are no longer built in one stack. A single production system might use Microsoft Agent Framework for orchestration,\u00a0LangChain\u00a0for retrieval, the OpenAI SDK for a side workflow, and a hosted Foundry agent for a long-running routine. Developers\u00a0shouldn\u2019t\u00a0have to choose between the framework they want and the observability they need.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Tracing &amp; evaluations \u2014 any agent framework<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> | <\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"auto\">Foundry\u2019s production-grade tracing and evals now extend to agents built on\u00a0<\/span><b><span data-contrast=\"auto\">LangChain,\u00a0LangGraph, OpenAI SDK, Microsoft Agent Framework, and any custom framework<\/span><\/b><span data-contrast=\"auto\">\u00a0via\u00a0<\/span><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/foundry\/observability\/how-to\/trace-agent-setup\"><span data-contrast=\"none\">OpenTelemetry<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0Every tool\u00a0call, LLM invocation, and handoff lands in one trace view \u2014 regardless of which framework produced it. 
Run structured evaluations against those traces and get consistent quality signals across your entire agent fleet.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span style=\"color: #008000;\"><em>\u00a0<b>Developer tip:\u00a0<\/b>If your agents already emit\u00a0OpenTelemetry\u00a0spans,\u00a0you\u2019re\u00a0most of the way there. Point your\u00a0OTel\u00a0exporter at Foundry, and tracing and evals light up across the framework you already use.\u00a0<\/em><\/span><\/p>\n<hr \/>\n<h2><b><span data-contrast=\"none\">2. Observability: Spanning the full Agent DevOps lifecycle<\/span><\/b><\/h2>\n<p><span data-contrast=\"none\">Agents fail differently than traditional software \u2014 context drifts, reasoning\u00a0wanders,\u00a0quality erodes over a conversation rather than crashing on a single call. Catching that requires observability as one continuous loop: tracing, evaluation, and diagnosis living\u00a0<\/span><i><span data-contrast=\"none\">inside<\/span><\/i><span data-contrast=\"none\">\u00a0your workflow \u2014 the editor and CLI you already use, not a separate dashboard.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559738&quot;:180,&quot;335559739&quot;:180}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">This release threads that loop end to end \u2014 a built-in dev experience, tools to pressure-test and score agents before they ship, continuous monitoring in production, visual debugging, and a path that turns real production behavior back into your tests.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335551550&quot;:1,&quot;335551620&quot;:1,&quot;335559738&quot;:180,&quot;335559739&quot;:180}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">AZD observability dev experience<\/span><\/b><span 
data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Tracing, logging, and eval insights are now integrated directly into the\u00a0<\/span><b><span data-contrast=\"none\">Azure Developer CLI (azd)<\/span><\/b><span data-contrast=\"none\">. Spin up a new hosted agent, turn on observability in the guided experience, and you can trace your first run, view evaluation results\u00a0inline, and\u00a0diagnose failures without leaving your terminal or VS Code.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">User simulation<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Generating realistic test conversations by hand\u00a0doesn\u2019t\u00a0scale.\u00a0<\/span><b><span data-contrast=\"none\">User simulation<\/span><\/b><span data-contrast=\"none\">\u00a0automatically produces multi-turn conversations and edge-case\u00a0scenarios\u00a0so you can pressure-test your agent before it sees a real user.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Multi-turn evaluation<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Single-turn\u00a0evals miss the failure modes that only show up when context\u00a0accumulates:\u00a0tone drift, lost goals, contradictions, safety regressions across long conversations.\u00a0<\/span><b><span data-contrast=\"none\">Multi-turn evaluation<\/span><\/b><span 
data-contrast=\"none\">\u00a0scores agent quality across full multi-step conversations \u2014 capturing context carryover, reasoning consistency, and end-to-end task success.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Rubric evaluator<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">\u201cGood\u201d is different for a vendor history agent,\u00a0a customer\u00a0support agent, and\u00a0a compliance\u00a0reviewer.\u00a0<\/span><b><span data-contrast=\"none\">Rubric<\/span><\/b><span data-contrast=\"none\">\u00a0is a new evaluator type that\u00a0<\/span><b><span data-contrast=\"none\">generates context-aware evaluation criteria from your agent\u2019s intended behavior<\/span><\/b><span data-contrast=\"none\">\u00a0\u2014 weighted across task success, tone, safety, cost, and latency \u2014 then\u00a0runs them\u00a0alongside Foundry\u2019s built-in safety and quality evaluators. 
The result is a unified scorecard you can run before deployment and continuously in production.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/rubric.webp\"><img decoding=\"async\" class=\"wp-image-2610 size-full aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/rubric.webp\" alt=\"rubric image\" width=\"682\" height=\"337\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/rubric.webp 682w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/rubric-300x148.webp 300w\" sizes=\"(max-width: 682px) 100vw, 682px\" \/><\/a> <a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/RUBRIC-EVALUATOR.mp4\">RUBRIC EVALUATOR<\/a><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><b><span data-contrast=\"none\">Evaluations with intelligent\u00a0trace\u00a0sampling<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Evaluating every production trace is wasteful; evaluating none is risky.\u00a0<\/span><b><span data-contrast=\"none\">Intelligent trace sampling<\/span><\/b><span data-contrast=\"none\">\u00a0automatically runs evaluations against a curated sample of live production traces, using smart filtering to surface the most signal-rich interactions. 
Continuous quality monitoring without the bill for evaluating every request.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Trace replay and visualization<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Step through any agent execution trace visually \u2014 prompt, decision, tool call, model output \u2014 and replay it to understand exactly how an outcome was produced. Debugging multi-step\u00a0agents\u00a0is now much simpler.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Traces to dataset<\/span><\/b><span data-ccp-props=\"{&quot;335559738&quot;:300,&quot;335559739&quot;:140}\"> | <\/span><b><span data-contrast=\"none\">Public Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">Production traces are the highest-fidelity test data you have.\u00a0<\/span><b><span data-contrast=\"none\">Traces to dataset<\/span><\/b><span data-contrast=\"none\">\u00a0converts them into structured evaluation datasets you can use offline \u2014 closing the loop between what users\u00a0actually do\u00a0and what your evals cover.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p style=\"text-align: left;\"><span style=\"color: #008000;\"><em><b>Developer tip:\u00a0<\/b>Wire trace replay into your incident review process. The fastest way to fix an\u00a0agent\u00a0regression is to replay the exact trace that broke and re-run it against the candidate fix \u2014 not to reproduce the failure from scratch.\u00a0<\/em><\/span><\/p>\n<hr \/>\n<h2><b><span data-contrast=\"none\">3. 
Optimization:\u00a0Turn\u00a0evals and\u00a0traces into action<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Traces tell you\u00a0<\/span><i><span data-contrast=\"auto\">what happened<\/span><\/i><span data-contrast=\"auto\">. Evaluations tell you\u00a0<\/span><i><span data-contrast=\"auto\">whether it was good<\/span><\/i><span data-contrast=\"auto\">.\u00a0Optimization tells you\u00a0<\/span><i><span data-contrast=\"auto\">what to change next<\/span><\/i><span data-contrast=\"auto\">\u00a0\u2014 and proves the change actually helped.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">Agent optimizer in Foundry Agent Service<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:300,&quot;335559739&quot;:140,&quot;335559740&quot;:240}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Private Preview\u00a0(public preview coming this month).\u00a0<\/span><\/b><span data-contrast=\"none\">Improving an agent today is guess-and-check. Ship a tweak, watch users hit failures, try another prompt, hope it sticks.\u00a0<\/span><b><span data-contrast=\"none\">Agent optimizer<\/span><\/b><span data-contrast=\"none\">\u00a0replaces that cycle with a governed, evidence-backed loop. 
It reads the agent\u2019s current prompts and skills, searches for configurations that increase quality on your scenarios and constraints, and surfaces ranked candidates with full diffs, lineage, audit trail, and rollback.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Every candidate is evaluated against your rubric and shown side by side \u2014 exactly what improved, what regressed, and why. Promote the winner; new traces feed back into evaluation.\u00a0That\u2019s\u00a0the closed\u00a0observe\u00a0\u2192 evaluate \u2192\u00a0optimize\u00a0\u2192 deploy loop, and it runs continuously.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/foundry-agent-optimizer.gif\" alt=\"The image shows a screen capture of a Microsoft Foundry dashboard, displaying various tabs such as agents, projects, and a search bar. It also includes information about a build, including version number, duration, and various statuses like completed, failed, and running.\" \/><\/p>\n<hr \/>\n<h2><b><span data-contrast=\"none\">4. 
Prove the value: ROI for agents in Foundry<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Once teams can\u00a0observe, evaluate, and\u00a0optimize\u00a0agents continuously, the next question gets asked:\u00a0<\/span><b><i><span data-contrast=\"auto\">is this agent worth what it costs?<\/span><\/i><\/b><span data-contrast=\"auto\">\u00a0That question used to require a spreadsheet and a lot of intuition.\u00a0We\u2019re\u00a0replacing both.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><b><span data-contrast=\"none\">ROI for agents in Foundry<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:300,&quot;335559739&quot;:140,&quot;335559740&quot;:240}\"> |\u00a0<\/span><b><span data-contrast=\"none\">Private Preview.\u00a0<\/span><\/b><span data-contrast=\"none\">ROI for agents translates the cost of running an agent into the business value it creates \u2014\u00a0<\/span><b><span data-contrast=\"none\">task completion rates, time saved, and cost efficiency<\/span><\/b><span data-contrast=\"none\">\u00a0\u2014 and shows them side by side in the Foundry portal and via API. Compare across versions, track daily trends, and click\u00a0into\u00a0low-ROI traces to debug. 
Stakeholders finally get the data they need to justify investment and prioritize what to improve next.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/roi-agent.webp\"><img decoding=\"async\" class=\"wp-image-2652 size-full aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/roi-agent.webp\" alt=\"roi agent image\" width=\"690\" height=\"395\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/roi-agent.webp 690w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/06\/roi-agent-300x172.webp 300w\" sizes=\"(max-width: 690px) 100vw, 690px\" \/><\/a><\/p>\n<p><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><i><span data-contrast=\"none\">\u201cBy combining evaluations and tracing capabilities in\u00a0<\/span><\/i><b><i><span data-contrast=\"none\">Microsoft Foundry<\/span><\/i><\/b><i><span data-contrast=\"none\">\u00a0with Azure Monitor, we transform AI into an\u00a0<\/span><\/i><b><i><span data-contrast=\"none\">enterprise-grade, production<\/span><\/i><\/b><i><span data-contrast=\"none\">-ready system with\u00a0<\/span><\/i><b><i><span data-contrast=\"none\">built-in observability and continuous optimization<\/span><\/i><\/b><i><span data-contrast=\"none\">\u00a0\u2014 enabling ongoing evolution across the agent lifecycle and accelerating NTT DATA\u2019s Smart AI Agent\u00ae vision.\u201d<\/span><\/i><span data-ccp-props=\"{&quot;335559739&quot;:120}\">\u00a0<\/span><b><span data-contrast=\"none\">\u2014<\/span><\/b> Yuji Shono, Head of the Global AI Office, NTT DATA 
Group<\/p>\n<hr \/>\n<h2><b><span data-contrast=\"none\">The full loop, in one platform<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">Each of these announcements stands on its own. Connected, they form the continuous observability loop developers and operators have been asking for:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:120,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"7\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Interoperability\u00a0<\/span><\/b><span data-contrast=\"auto\">gives you freedom of framework and one place to see everything.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Segoe UI\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"8\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Observa<\/span><\/b><b><span data-contrast=\"auto\">bility: spanning the full Agent DevOps lifecycle\u00a0<\/span><\/b><span data-contrast=\"auto\">makes\u00a0tracing, multi-turn evals, rubric scoring, and intelligent sampling part of everyday development, so 
teams can build, test, and iterate faster.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"9\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Optimization\u00a0<\/span><\/b><span data-contrast=\"auto\">closes the loop from production signal to evidence-backed agent improvements.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"10\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">ROI\u00a0<\/span><\/b><span data-contrast=\"auto\">turns those improvements into a business case stakeholders can act on.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"auto\">All of it on the same Foundry control plane, with Azure Monitor for alerts and infrastructure signals, and\u00a0OpenTelemetry\u00a0as the common language underneath.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:200,&quot;335559740&quot;:300}\">\u00a0<\/span><\/p>\n<h2><b><span 
data-contrast=\"none\">Get started<\/span><\/b><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:180,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h2>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"11\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Start building in Microsoft Foundry:\u00a0<\/span><\/b><a href=\"https:\/\/ai.azure.com\/\"><span data-contrast=\"none\">ai.azure.com<\/span><\/a><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"12\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Get the BRK252 session code:\u00a0<\/span><\/b><a href=\"https:\/\/aka.ms\/build26-BRK252\"><span data-contrast=\"none\">aka.ms\/build26-BRK252<\/span><\/a><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" 
data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"13\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Read the docs:\u00a0<\/span><\/b><a href=\"https:\/\/learn.microsoft.com\/azure\/foundry\/observability\"><span data-contrast=\"none\">Foundry observability documentation<\/span><\/a><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"14\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Join the community:\u00a0<\/span><\/b><a href=\"https:\/\/aka.ms\/ai\/discord\"><span data-contrast=\"none\">aka.ms\/ai\/discord<\/span><\/a><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li aria-setsize=\"-1\" data-leveltext=\"\u2022\" data-font=\"Calibri\" data-listid=\"2\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559685&quot;:540,&quot;335559991&quot;:270,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\u2022&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" data-aria-posinset=\"15\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Hands-on lab:\u00a0<\/span><\/b><span data-contrast=\"auto\">LAB540\u00a0|\u00a0<\/span><a 
href=\"https:\/\/github.com\/microsoft\/Build26-LAB540-observe-optimize-and-protect-your-hosted-agents-in-microsoft-foundry\"><span data-contrast=\"none\">aka.ms\/build26-LAB540<\/span><\/a><span data-ccp-props=\"{&quot;134233279&quot;:false,&quot;201341983&quot;:0,&quot;335559739&quot;:80,&quot;335559740&quot;:290}\">\u00a0<\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>9 min read \u00b7 June 3, 2026 \u00b7 Sebastian Kohlmeier\u00a0 \u00a0 Shipping an AI agent is the easy part. Keeping it\u00a0accurate, safe, and accountable in production is where teams get stuck. Agents are\u00a0non-deterministic. Their behavior shifts as models update, tools change, and traffic patterns\u00a0evolve\u00a0and most of that drift happens silently, long after the demo.\u00a0End-to-end observability\u00a0covering [&hellip;]<\/p>\n","protected":false},"author":190399,"featured_media":2602,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[163,1],"tags":[25,102,66,33,34,2],"class_list":["post-2601","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-build","category-microsoft-foundry","tag-agents","tag-azure-ai-foundry","tag-evaluations","tag-foundry","tag-microsoft-build","tag-microsoft-foundry"],"acf":[],"blog_post_summary":"<p>9 min read \u00b7 June 3, 2026 \u00b7 Sebastian Kohlmeier\u00a0 \u00a0 Shipping an AI agent is the easy part. Keeping it\u00a0accurate, safe, and accountable in production is where teams get stuck. Agents are\u00a0non-deterministic. 
Their behavior shifts as models update, tools change, and traffic patterns\u00a0evolve\u00a0and most of that drift happens silently, long after the demo.\u00a0End-to-end observability\u00a0covering [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2601","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/190399"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=2601"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2601\/revisions"}],"predecessor-version":[{"id":2666,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2601\/revisions\/2666"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/2602"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=2601"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=2601"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=2601"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}