May 19th, 2025

Achieve End-to-End Observability in Azure AI Foundry

Today, we’re thrilled to launch the public preview of Azure AI Foundry Observability, the first unified solution for governance, evaluation, tracing, and monitoring — all built into your AI development loop. From model selection to real-time debugging, our observability capabilities empower teams to ship production-grade AI with confidence and speed. 

Figure 1: Observability aligned with the end-to-end AI application development workflow.

See Everything, From Prototype to Production 

Foundry Observability brings continuous visibility across your entire AI application lifecycle. Whether you are prototyping, developing with CI/CD pipelines, or operating in production, we provide the capabilities you need to assess, monitor, scale, and optimize your AI agents.

Phase 1: Kickstart Development

AI Governance, Streamlined 

We’re bringing responsible AI front and center with new governance integrations in Azure AI Foundry. You can now connect with Microsoft Purview Compliance Manager, Credo AI, and Saidot to define evaluation plans aligned with frameworks like the EU AI Act, and run them directly via the Azure AI Evaluation SDK. No guesswork, just streamlined, audit-ready governance built into your dev workflow.

To dive deeper into how these integrations work in practice, check out our AI governance blog post. 

Leaderboards That Lead the Way 

Choosing the right model just got easier. Azure AI Foundry’s new leaderboards let you compare foundation models by quality, cost, and performance — all backed by industry benchmarks. Visualize trade-offs, explore scenario-based rankings, and dive into quality, performance, and cost metrics to enhance your “model shopping” experience. Fast, confident model selection starts here. 

Figure 2: Model leaderboards UI in the Azure AI Foundry portal.

See Inside Your Agent: Evaluate and Debug with Traces in the Agents Playground

The Agents Playground now comes with built-in evaluation and tracing — so you can test, debug, and improve your agents in one place. Quality checks run by default, safety checks are just a toggle away, and every result is trace-linked for full visibility into tool calls, inputs, outputs, and metrics. 


 

Phase 2: Transition to Code 

Evaluate What Matters 

We’ve supercharged agent evaluation in Azure AI Foundry. You can now directly assess agent thread messages using built-in metrics like:

  • Intent Resolution: Measures how accurately the agent identifies and addresses user intentions.
  • Task Adherence: Measures how well the agent follows through on identified tasks.
  • Tool Call Accuracy: Measures how well the agent selects and calls the correct tools.
  • Response Completeness: Measures the extent to which the response is complete (not missing critical information) with respect to the ground truth.

No extra parsing needed — just plug in and go, even if you’re building outside Azure AI Agent Service. 

And with our new integrations with Azure OpenAI Graders, you get even more precision: 

  • Label Grader
  • String Checker
  • Text Similarity
  • Custom General Grader 
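To make two of these grader types concrete, here is a minimal Python sketch of what a string check and a text-similarity check compute. It is not the Azure OpenAI Graders API: the function names `string_check` and `text_similarity`, the `operation` values, and the use of the standard library's `difflib` are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def string_check(response: str, reference: str, operation: str = "eq") -> bool:
    """Hypothetical string check: a pass/fail comparison against a reference."""
    if operation == "eq":
        # Exact, case-sensitive match.
        return response == reference
    if operation == "ilike":
        # Case-insensitive containment.
        return reference.lower() in response.lower()
    raise ValueError(f"unsupported operation: {operation}")


def text_similarity(response: str, reference: str) -> float:
    """Hypothetical similarity score in [0, 1] from matching character runs."""
    return SequenceMatcher(None, response.lower(), reference.lower()).ratio()


if __name__ == "__main__":
    print(string_check("Paris", "Paris"))  # True
    print(string_check("The capital is Paris.", "paris", operation="ilike"))  # True
    print(text_similarity("The capital is Paris", "Paris is the capital"))
```

A string check suits answers with one correct form, while a similarity score is more forgiving when wording can vary. A production grader would likely use a stronger similarity measure than character matching.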

Together, these tools give you a full-spectrum view of agent quality and safety — from prototype to production. 


Scan for Vulnerabilities with AI Red Teaming Agent 

Meet the Azure AI Foundry AI Red Teaming Agent — your built-in defense against unsafe AI. Powered by Microsoft’s open-source PyRIT, it simulates adversarial attacks to uncover vulnerabilities before you ship. 

  • Scan for content safety risks automatically 
  • Measure exposure with metrics like Attack Success Rate (ASR) 
  • Generate detailed readiness reports 
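Attack Success Rate is the share of simulated attacks that elicit an unsafe response. The sketch below shows that arithmetic on a handful of made-up results. The `AttackResult` type, its field names, and the risk category labels are hypothetical and do not reflect the AI Red Teaming Agent's actual output schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttackResult:
    risk_category: str  # illustrative label, e.g. "violence"
    succeeded: bool     # True if the attack elicited an unsafe response


def attack_success_rate(results: list[AttackResult]) -> float:
    """ASR = successful attacks / total attacks, as a fraction in [0, 1]."""
    if not results:
        return 0.0
    return sum(r.succeeded for r in results) / len(results)


def asr_by_category(results: list[AttackResult]) -> dict[str, float]:
    """Break ASR down per risk category to show where exposure concentrates."""
    grouped: dict[str, list[AttackResult]] = {}
    for r in results:
        grouped.setdefault(r.risk_category, []).append(r)
    return {cat: attack_success_rate(rs) for cat, rs in grouped.items()}


if __name__ == "__main__":
    scan = [
        AttackResult("violence", True),
        AttackResult("violence", False),
        AttackResult("self_harm", False),
        AttackResult("self_harm", False),
    ]
    print(attack_success_rate(scan))  # 0.25
    print(asr_by_category(scan))      # {'violence': 0.5, 'self_harm': 0.0}
```

A lower ASR means fewer attacks got through. The per-category breakdown helps decide which mitigations to prioritize.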

No specialized expertise required. Just plug it into your workflow and build with confidence. 

For a deeper dive into the capabilities and implementation details of the AI Red Teaming Agent, check out our dedicated AI Red Teaming blog post. 

Figure: AI Red Teaming metric dashboard and detailed metrics results.

CI/CD-Ready from Day One 

Azure AI Foundry now plugs straight into your CI/CD workflows. With our GitHub Action and Azure DevOps Extension, you can: 

  • Auto-evaluate agents on every commit
  • Compare versions with built-in quality, performance, and safety metrics
  • Get confidence intervals and significance tests to back your decisions 
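To illustrate what a confidence interval adds when comparing two agent versions, here is a minimal sketch of a paired percentile bootstrap over per-example scores. This post does not describe the statistical method used by the GitHub Action or Azure DevOps Extension, so the technique, the function name `paired_bootstrap_ci`, and the sample scores are all assumptions.

```python
import random
from statistics import mean


def paired_bootstrap_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-example score difference.

    Returns (observed_mean_difference, lower_bound, upper_bound).
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty score lists of equal length")
    # Pair scores by test case so each difference compares like with like.
    diffs = [c - b for b, c in zip(baseline, candidate)]
    rng = random.Random(seed)  # seeded for reproducible results
    n = len(diffs)
    # Resample the differences with replacement and record each resample's mean.
    means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lower, upper


if __name__ == "__main__":
    # Made-up quality scores for the same eight test cases on two versions.
    baseline_scores = [3, 4, 3, 2, 4, 3, 3, 4]
    candidate_scores = [4, 4, 4, 3, 5, 3, 4, 5]
    diff, lower, upper = paired_bootstrap_ci(baseline_scores, candidate_scores)
    print(f"mean improvement {diff:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

If the interval excludes zero, the difference is unlikely to be noise at the chosen level. That is the kind of evidence a significance test gives you before promoting a new version.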

It’s evaluation, made continuous.

 

Phase 3: Operate in Production

Monitor in Production, Effortlessly 

Once your agent is live, Azure AI Foundry keeps watch and enables continuous monitoring. A unified dashboard tracks performance, quality, safety, and resource usage — all in real time. 

  • Run continuous evaluations on live traffic (e.g., 10 per hour) 
  • Set alerts in Azure Monitor to catch drift or regressions 
  • Link directly to Azure Monitor Application Insights for full-stack visibility 
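A cap such as "10 per hour" means only a bounded slice of live traffic is evaluated. The sketch below shows one simple way to enforce such a cap, using a fixed one-hour window that admits the first requests until the limit is reached. This post does not describe how Foundry selects traffic, so the `HourlyEvaluationSampler` class and its behavior are purely illustrative.

```python
class HourlyEvaluationSampler:
    """Hypothetical fixed-window limiter: at most `max_per_hour` evaluations per clock hour."""

    def __init__(self, max_per_hour: int = 10):
        self.max_per_hour = max_per_hour
        self._current_hour = None  # index of the hour window being counted
        self._count = 0            # evaluations admitted in that window

    def should_evaluate(self, timestamp: float) -> bool:
        """Return True if the request at `timestamp` (seconds) should be evaluated."""
        hour = int(timestamp // 3600)
        if hour != self._current_hour:
            # A new hour has started, so reset the budget.
            self._current_hour = hour
            self._count = 0
        if self._count < self.max_per_hour:
            self._count += 1
            return True
        return False


if __name__ == "__main__":
    sampler = HourlyEvaluationSampler(max_per_hour=10)
    # One request per minute for an hour: only the first 10 are evaluated.
    decisions = [sampler.should_evaluate(t) for t in range(0, 3600, 60)]
    print(sum(decisions))  # 10
```

A limiter like this keeps evaluation cost predictable, though it favors traffic early in each hour. Random sampling within the window would give a more representative slice.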

From metrics to traces, you’ve got everything you need to stay ahead of issues. 

Figure: Unified monitoring dashboard.

The unified dashboard above is powered by Azure Monitor Application Insights and Azure Workbooks, which allow you to monitor app performance in the broader context of your infrastructure. You can navigate seamlessly from Foundry Observability to Azure Monitor for advanced capabilities, such as customizing monitoring dashboards and setting up alerts for diagnostics and incident response.

Trace Every Evaluation 

With tracing enabled, every evaluation result is mapped to a trace — giving you full visibility into your agent’s execution flow. From LLM inference to tool calls, inputs, outputs, and metrics, you can debug regressions (like groundedness drops) with precision and speed. 
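As a rough illustration of why trace-linked results speed up debugging, the sketch below stores a trace identifier with each evaluation score and pulls out the traces where a metric fell below a threshold. The `EvaluationResult` type, its fields, and the sample scores are hypothetical and do not reflect Foundry's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationResult:
    trace_id: str  # identifier of the trace that produced the response
    metric: str    # e.g. "groundedness"
    score: float   # higher is better in this sketch


def traces_below_threshold(
    results: list[EvaluationResult], metric: str, threshold: float
) -> list[str]:
    """Return sorted trace IDs whose score for `metric` is below `threshold`."""
    return sorted(
        {r.trace_id for r in results if r.metric == metric and r.score < threshold}
    )


if __name__ == "__main__":
    results = [
        EvaluationResult("t1", "groundedness", 4.0),
        EvaluationResult("t2", "groundedness", 2.0),
        EvaluationResult("t3", "groundedness", 1.0),
        EvaluationResult("t2", "relevance", 5.0),
    ]
    print(traces_below_threshold(results, "groundedness", 3.0))  # ['t2', 't3']
```

With the trace IDs in hand, you can open exactly the runs that regressed and inspect their tool calls, inputs, and outputs.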

 

Pricing 

AI-assisted risk and safety evaluations and monitoring incur the following charges:

  • $20/1M input tokens
  • $60/1M output tokens

For all other evaluation metrics (NLP metrics), see compute costs.
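The token-based rates above translate into a simple estimate. Here is a small sketch of that arithmetic. The function name and the example token counts are illustrative, and real charges depend on your agreement with Microsoft.

```python
INPUT_USD_PER_MILLION = 20   # rate quoted above for input tokens
OUTPUT_USD_PER_MILLION = 60  # rate quoted above for output tokens


def estimate_evaluation_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for AI-assisted risk and safety evaluations."""
    total_micro_usd = (
        input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    # Example: 2.5M input tokens and 400K output tokens.
    print(estimate_evaluation_cost(2_500_000, 400_000))  # 74.0
```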

Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered with Microsoft, date of purchase, and the currency exchange rate. Prices are calculated based on US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the upcoming month. Sign in to the Azure pricing calculator to see pricing based on your current program/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.

Additional Resources 

Author

Sebastian Kohlmeier
Principal PM Manager
Mehrnoosh Sameki
Principal PM Manager
