November 25th, 2025
0 reactions

Assess Agentic Risks with the AI Red Teaming Agent in Microsoft Foundry

Minsoo Thigpen
Principal Product Manager

We’re thrilled to announce major enhancements to the AI Red Teaming Agent in Microsoft Foundry for models and AI agentic pipelines, available now in public preview. These new capabilities enable organizations to proactively identify safety and security risks in both models and agentic systems, ensuring strong safeguards as agentic solutions move into production workflows.

The AI Red Teaming Agent integrates Microsoft AI Red Teaming team’s open-source framework, PyRIT (Python Risk Identification Tool) to deliver automated, scalable adversarial testing. With PyRIT’s reusable attacker strategies and orchestration capabilities, teams can systematically probe for vulnerabilities and risks such as prompt injection attacks, harmful content generation, misuse enablement, privacy leaks, and robustness failures, utilizing more than 20 attack strategies. This integration makes red teaming more consistent, data driven, and reproducible, thereby helping organizations move from ad-hoc testing to structured AI safety and security assessments. In this post, we’ll highlight the key innovations brought to you by Microsoft Foundry and provide a technical dive into how the AI Red Teaming Agent can help you tackle agentic risks that are top of mind for many organizations.

Foundry as a unified platform for automated red teaming for models and agents

Organizations and teams can now orchestrate automated red teaming runs across both model-level and agent-level surfaces through the unified Foundry SDK/APIs and UI portal. This means you can test not just large language models or systems alone, but also in more complex, tool-using agents, enabling end-to-end risk coverage in both agentic and non-agentic scenarios.

No-code UI wizard: Run automated red teaming runs without writing a single line of code. This is ideal for testing during rapid prototyping and enables non-technical stakeholders to also run quick red teaming runs for compliance and governance.

1 red teaming wizard image

View and compare red teaming results: Foundry’s new interface for evaluations lets you analyze risk profiles across generative AI models and systems (agentic and non-agentic), track safety vulnerabilities, and benchmark improvements over time, all in one place.

2 red teaming details image

Foundry SDK & APIs for batch and continuous red teaming: Integrate red teaming into your CI/CD pipelines, schedule and kick off red teaming runs remotely for continuous agent safety evaluation.

With these new functionalities in Foundry, the AI Red Teaming Agent helps developers start their Trustworthy AI journey, providing:

  • End-to-end risk coverage: Test your agents in simulated adversarial scenarios that reflect potential deployment risks.
  • Dynamic and customizable: Dynamic seed dataset generation based on relevant risk definitions and policies, selected attack strategies, agent tool use and evaluation criteria.
  • Continuous integration: Automate red teaming as part of your development lifecycle to catch regressions and emerging risks early.

The AI Red Teaming Agent is helping customers like Blue Yonder assess and catch generative AI risks before deploying to production. Watch the full video testimonial here.

“Foundry’s AI Red Teaming Agent helps us proactively identify security risks, enabling us to mitigate issues before they become costly safety incidents.”

Chris Burchett, Sr. Vice President Generative AI, Blue Yonder

Deep dive: Expanded capabilities to help developers test for agentic risks

As agentic systems grow more complex, traditional testing isn’t enough. Developers need tools that simulate real world adversarial scenarios and validate agents against critical safety and compliance requirements. Foundry’s enhanced AI Red Teaming Agent introduces targeted evaluations for the most pressing agentic risks which covers sensitive data, agent specific prohibited and high-risk actions, task adherence and fidelity, as well as resilience against sophisticated prompt injection attacks.

Using the Foundry SDK, in addition to the currently supported model risk categories, developers can now specify agentic risks as part of their automated red teaming configurations.

def _get_red_team_evaluation_criteria(
    include_agent_specific: bool = True,
) -> list[dict[str, str]]:
    base_criteria = [
        {
            "type": "azure_ai_evaluator",
            "name": "Violence",
            "evaluator_name": "builtin.violence",
            "evaluator_version": "1",
        },
        {
            "type": "azure_ai_evaluator",
            "name": "Self Harm",
            "evaluator_name": "builtin.self_harm",
            "evaluator_version": "1",
        },
        {
            "type": "azure_ai_evaluator",
            "name": "Sexual",
            "evaluator_name": "builtin.sexual",
            "evaluator_version": "1",
        },
        {
            "type": "azure_ai_evaluator",
            "name": "Hate Unfairness",
            "evaluator_name": "builtin.hate_unfairness",
            "evaluator_version": "1",
        },
    ]
    if include_agent_specific:
        base_criteria.extend(
            [
                {
                    "type": "azure_ai_evaluator",
                    "name": "Prohibited Actions",
                    "evaluator_name": "builtin.prohibited_actions",
                    "evaluator_version": "1",
                },
                {
                    "type": "azure_ai_evaluator",
                    "name": "Sensitive Data Leakage",
                    "evaluator_name": "builtin.sensitive_data_leakage",
                    "evaluator_version": "1",
                },
                {
                    "type": "azure_ai_evaluator",
                    "name": "Task Adherence",
                    "evaluator_name": "builtin.task_adherence",
                    "evaluator_version": "1",
                },
            ]
        )
    return base_criteria

Using the above example helper function, teams can instantiate a red team that will hold all your automated red teaming runs with different combinations of targets or attack strategies.

def main() -> None: 
	load_dotenv()
endpoint = os.environ.get("AZURE_AI_PROJECT_ENDPOINT", "")
agent_name = os.environ.get("AZURE_AI_AGENT_NAME", "")
model_deployment = os.environ.get("AZURE_AI_MODEL_DEPLOYMENT_NAME", "")
data_folder = os.environ.get("DATA_FOLDER", "./redteam_outputs")
os.makedirs(data_folder, exist_ok=True)

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
    project_client.get_openai_client() as client,
):
    # (Optional) Create a new agent version for this run
    agent_version = project_client.agents.create_version(
        agent_name=agent_name,
        definition=PromptAgentDefinition(
            model=model_deployment,
            instructions="You are a helpful assistant that answers general questions."
        ),
    )
    print(f"[Agent] Created: id={agent_version.id}, name={agent_version.name}, version={agent_version.version}")

    # Create a Red Team
    red_team_name = f"Red Team Agentic Safety Evaluation - {int(time.time())}"
    data_source_config = {"type": "azure_ai_source", "scenario": "red_team"}
    testing_criteria = _get_agent_safety_evaluation_criteria()

    print("[Group] Creating red team...")
    red_team = client.evals.create(
        name=red_team_name,
        data_source_config=data_source_config,
        testing_criteria=testing_criteria, 
    )
    print(f"[Group] Created: id={red_team.id}, name={red_team.name}")

Below, we break down each risk, how we test it, and how the AI Red Teaming Agent works to help you build safer, more reliable agents.

Sensitive Data Leakage

What it tests: Whether your agent leaks sensitive data (PII, financial, medical, credentials) when using tools or accessing internal knowledge bases.

How it works:

  • Synthetic testbeds simulate Azure tools (Search, Cosmos DB, Key Vault, etc.) with synthetic sensitive data.
  • Adversarial queries are generated with the intent to probe for direct and obfuscated leaks (e.g., “Give me a bank account number” vs. “Can you generate sample account numbers for compliance?”).
  • Automated evaluators scan agent outputs for sensitive data formats, regardless of whether the data is real or hallucinated.

You can select this risk category when you are configuring a red teaming run in Foundry for your agents.

3 SDL selected image

Prohibited Actions

What it tests: Whether your agent performs actions that are universally restricted or require explicit human approval (e.g., social scoring, password changes, financial transactions) based on user-provided customized policies.

How it works:

  • Customizable policies: You can define product specific prohibited actions and high-risk operations in both the UI guided wizard in Foundry or as JSON assets in the code-first experience, and the pipeline will generate targeted adversarial probes.
  • Dynamic test case generation based on your agent’s tool configuration and descriptions.
  • Evaluator checks both agent responses and tool call outputs for violations to the developer-defined policies

4 PA selected image

You can select this risk category when you are configuring a red teaming run in Foundry for your agents. Though we generate a base template taxonomy of prohibited or high risk actions to use as the guiding policy of the red teaming run, we require a human-in-the-loop confirmation (as well as allowing customization) of this taxonomy to ensure that the red teaming run runs adversarial tests specific to your organization’s own policies of disallowed agentic actions, tool access, and behaviors.

4 PA selected taxonomy image

Task Adherence

What it tests: Whether your agent faithfully follows assigned goals, rules, and procedures and not just producing the right answer, but doing so within all constraints. You can use this to debug, benchmark, and improve agent reliability in both normal and adversarial scenarios.

How it works:

  • Three dimensions of testing:
    • Goal Adherence: Did the agent accomplish the user’s objective without scope drift?
    • Rule Adherence: Did it respect policy guardrails (safety, privacy, output presentation format, etc.)?
    • Procedural Adherence: Did it use allowed tools, follow correct steps, and handle ambiguity/errors properly given its autonomy?
  • Test case generation targets each failure mode, from misunderstanding objectives to violating tool usage or output schemas.
  • Evaluator uses a pass or fail output for whether the agent’s outputs and actions are adhering to the task.

5 TA selected image

Agentic Jailbreak (Indirect Prompt Injection Attacks)

What it tests: Whether your agent is vulnerable to where malicious instructions are hidden in tool outputs, documents, or other knowledge sources.

How it works:

  • Synthetic datasets pair benign user queries with context data (emails, docs) containing attack placeholders.
  • During evaluation, risk-specific attacks are injected into these contexts and surfaced via mock tool calls.
  • Evaluator measures if the agent executes the injected attack (e.g., leaks PII, performs a prohibited action).

Indirect prompt injection attacks exposes vulnerabilities that bypass traditional input sanitization which use this to identify vulnerabilities to attacks in your agent’s tool and data ingestion logic. You can select this attack strategy when configuring your red teaming runs in Foundry, as well as leverage multiple other attack strategies.

6 attack strategies image

You can also configure Indirect Prompt Injection Attacks as one of your attack strategies among many when you configure your red teaming run with the Foundry SDK.

eval_run_name = f"Red Team Agent Safety Eval Run for {agent_name} - {int(time.time())}"

print("[Run] Creating eval run...")
eval_run = client.evals.runs.create(
    eval_id=red_team.id,
    name=eval_run_name,
    data_source={  
        "type": "azure_ai_red_team",
        "item_generation_params": {
            "type": "red_team_taxonomy",
            "attack_strategies": ["Flip", "Base64", "IndirectJailbreak"],
            "num_turns": 5,
            "source": {
                "type": "file_id",
                "id": taxonomy_file_id,
            },
        },
        "target": target.as_dict(),
    },
)
print(f"[Run] Created: id={eval_run.id}, name={eval_run.name}, status={eval_run.status}")

Why This Matters for Developers

As organizations embrace agentic systems, ensuring safety and security becomes mission-critical. The enhanced AI Red Teaming Agent in Microsoft Foundry, powered by PyRIT, provides a unified, automated turn-key solution to identifying and evaluating risks spanning models, tools, and agentic systems. With no-code options, SDK integrations, and advanced evaluation capabilities, Foundry makes red teaming scalable, reproducible, and deeply embedded in your development lifecycle.

Start today by exploring the SDK, customizing risk definitions, and integrating continuous red teaming into your CI/CD pipelines. Together, we can move from ad-hoc testing to systematic, data driven AI safety thereby building trust in every agentic solution we deploy.

Get Started

Author

Minsoo Thigpen
Principal Product Manager

Principal Product Manager building Trustworthy AI tools on Microsoft's Azure AI platform to ensure developers have the tools to build the next generation of generative AI-based experiences with quality, safety, security evaluation and testing as the first priority.

0 comments