September 16th, 2025
0 reactions

Announcing Computer Use tool (Preview) in Azure AI Foundry Agent Service

Linda Li
Product Manager

Overview 

We are excited to announce Computer Use—are now available in preview in Azure AI Foundry Agent Service. It brings feature parity with the Azure OpenAI Responses API, but with the added advantage of seamless integration into the Foundry agent runtime and enterprise security. With this release, developers can create agents that not only reason over text, retrieve knowledge, or call APIs, but also directly interact with computer interfaces through natural language instructions. At launch, it is accessible through REST API and SDK, giving developers the flexibility to embed them directly into their applications, pipelines, and automation workflows. 

computer use preview image

By adding Computer Use, Foundry Agent Service expands the scope of what an agent can achieve in multi-step conversations. If you want to navigate complex enterprise applications, it opens the door to new, creative, and highly practical scenarios for enterprise AI. 

 

Use cases 

Computer Use addresses distinct, high-value scenarios.  Computer Use allows agents to interact with applications in the same way a human user would. It enables agents that not only reason and respond but also take meaningful actions across digital environments. 

Computer Use 

  • Web & desktop automation: Fill out forms, upload/download artifacts, or complete tasks in apps without APIs. 
  • Operational copilots: Help employees triage tickets or manage workflows across multiple enterprise dashboards. 
  • Legacy integration: Interact with older desktop apps by simulating clicks and keystrokes. 
  • Human-in-the-loop workflows: Require users to approve sensitive or high-risk steps before they are executed. 

How it works 

Computer Use operates as a continuous loop of suggested actions, execution, and screenshot feedback. 

Computer Use 

  • Action loop: The agent requests actions (e.g., click, type, screenshot) from the computer-use-preview model. 
  • Execution environment: Your code performs the action in a controlled environment (browser or desktop) and captures a screenshot. 
  • Screenshot feedback: The screenshot is sent back to the model to inform the next step. 
  • Pixel-based reasoning: Unlike Browser Automation, Computer Use interprets raw pixels, enabling adaptation to unfamiliar or dynamic UIs. 
  • Safety checks: Malicious instructions, irrelevant domains, or sensitive domains trigger warnings that require human acknowledgment before proceeding. 

Compared with Browser Automation, Computer Use offers richer visualization, broader cross-application support, and more human-like interaction patterns. However, because of its power, we strongly recommend using it only on low-privilege virtual machines that do not contain sensitive data or credentials. 

  Browser Automation   Computer Use 
Model support   Most models supported by Foundry Agent Service Computer-use-preview model  
Visualize whats happening   No   Yes  
How it understands the screen   Parse the HTML or XML pages into DOM documents   Raw pixel data from screenshots  
How it acts   A list of actions provided by the model   Virtual keyword and mouse  
Multi-step?   Yes    Yes   
Interfaces    Browser    Computer and browser  
Can I bring my own resource   BYO Playwright resource and store keys in connection   No resource required, we recommend using low-privilege virtual machines 

 

Security and responsible use 

WARNING: 

The Computer Use tool comes with significant security and privacy risks, including prompt injection attacks. Learn more about intended uses, capabilities, limitations, risks, and considerations when choosing a use case in the Azure OpenAI transparency note.  

 

Code samples 

# pylint: disable=line-too-long,useless-suppression
# ------------------------------------
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
# ------------------------------------

"""
DESCRIPTION:
    This sample demonstrates how to use agent operations with the Computer Use tool (preview)
    using a synchronous client. This sample uses fake screenshot to demonstrate how output actions work,
    but the actual implementation would involve mapping the output action types to their corresponding
    API calls in the user's preferred managed environment framework (e.g. Playwright or Docker).

    NOTE: Usage of the computer-use-preview model currently requires approval. Please see
    https://learn.microsoft.com/azure/ai-foundry/openai/how-to/computer-use for more information.

USAGE:
    python sample_agents_computer_use.py

    Before running the sample:

    pip install azure-ai-agents --pre
    pip install azure-ai-projects azure-identity

    Set these environment variables with your own values:
    1) PROJECT_ENDPOINT - The Azure AI Project endpoint, as found in the Overview
       page of your Azure AI Foundry portal.
    2) MODEL_DEPLOYMENT_NAME - The deployment name of the AI model, as found under the "Name" column in
       the "Models + endpoints" tab in your Azure AI Foundry project.

    Optional:
    - To target a specific environment, set COMPUTER_USE_ENVIRONMENT to one of: windows, mac, linux, browser
      Otherwise defaults to 'browser'.
"""

import os, time, base64
from typing import List
from azure.ai.agents.models._models import ComputerScreenshot, TypeAction
from azure.ai.projects import AIProjectClient
from azure.ai.agents.models import (
    MessageRole,
    RunStepToolCallDetails,
    RunStepComputerUseToolCall,
    ComputerUseTool,
    ComputerToolOutput,
    MessageInputContentBlock,
    MessageImageUrlParam,
    MessageInputTextBlock,
    MessageInputImageUrlBlock,
    RequiredComputerUseToolCall,
    SubmitToolOutputsAction,
)
from azure.identity import DefaultAzureCredential


def image_to_base64(image_path: str) -> str:
    """
    Convert an image file to a Base64-encoded string.

    :param image_path: The path to the image file (e.g. 'image_file.png')
    :return: A Base64-encoded string representing the image.
    :raises FileNotFoundError: If the provided file path does not exist.
    :raises OSError: If there's an error reading the file.
    """
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"File not found at: {image_path}")

    try:
        with open(image_path, "rb") as image_file:
            file_data = image_file.read()
        return base64.b64encode(file_data).decode("utf-8")
    except Exception as exc:
        raise OSError(f"Error reading file '{image_path}'") from exc


asset_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot.jpg"))
action_result_file_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "../assets/cua_screenshot_next.jpg"))
project_client = AIProjectClient(endpoint=os.environ["PROJECT_ENDPOINT"], credential=DefaultAzureCredential())

# Initialize Computer Use tool with a browser-sized viewport
environment = os.environ.get("COMPUTER_USE_ENVIRONMENT", "windows")
computer_use = ComputerUseTool(display_width=1026, display_height=769, environment=environment)

with project_client:

    agents_client = project_client.agents

    # Create a new Agent that has the Computer Use tool attached.
    agent = agents_client.create_agent(
        model=os.environ["MODEL_DEPLOYMENT_NAME"],
        name="my-agent-computer-use",
        instructions="""
            You are an computer automation assistant. 
            Use the computer_use_preview tool to interact with the screen when needed.
            """,
        tools=computer_use.definitions,
    )

    print(f"Created agent, ID: {agent.id}")

    # Create thread for communication
    thread = agents_client.threads.create()
    print(f"Created thread, ID: {thread.id}")

    input_message = (
        "I can see a web browser with bing.com open and the cursor in the search box."
        "Type 'movies near me' without pressing Enter or any other key. Only type 'movies near me'."
    )
    image_base64 = image_to_base64(asset_file_path)
    img_url = f"data:image/jpeg;base64,{image_base64}"
    url_param = MessageImageUrlParam(url=img_url, detail="high")
    content_blocks: List[MessageInputContentBlock] = [
        MessageInputTextBlock(text=input_message),
        MessageInputImageUrlBlock(image_url=url_param),
    ]
    # Create message to thread
    message = agents_client.messages.create(thread_id=thread.id, role=MessageRole.USER, content=content_blocks)
    print(f"Created message, ID: {message.id}")

    run = agents_client.runs.create(thread_id=thread.id, agent_id=agent.id)
    print(f"Created run, ID: {run.id}")

    # create a fake screenshot showing the text typed in
    result_image_base64 = image_to_base64(action_result_file_path)
    result_img_url = f"data:image/jpeg;base64,{result_image_base64}"
    computer_screenshot = ComputerScreenshot(image_url=result_img_url)

    while run.status in ["queued", "in_progress", "requires_action"]:
        time.sleep(1)
        run = agents_client.runs.get(thread_id=thread.id, run_id=run.id)

        if run.status == "requires_action" and isinstance(run.required_action, SubmitToolOutputsAction):
            print("Run requires action:")
            tool_calls = run.required_action.submit_tool_outputs.tool_calls
            if not tool_calls:
                print("No tool calls provided - cancelling run")
                agents_client.runs.cancel(thread_id=thread.id, run_id=run.id)
                break

            tool_outputs = []
            for tool_call in tool_calls:
                if isinstance(tool_call, RequiredComputerUseToolCall):
                    print(tool_call)
                    try:
                        action = tool_call.computer_use_preview.action
                        print(f"Executing computer use action: {action.type}")
                        if isinstance(action, TypeAction):
                            print(f"  Text to type: {action.text}")
                            # (add hook to input text in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                        if isinstance(action, ComputerScreenshot):
                            print(f"  Screenshot requested")
                            # (add hook to take screenshot in managed environment API here)

                            tool_outputs.append(
                                ComputerToolOutput(tool_call_id=tool_call.id, output=computer_screenshot)
                            )
                    except Exception as e:
                        print(f"Error executing tool_call {tool_call.id}: {e}")

            print(f"Tool outputs: {tool_outputs}")
            if tool_outputs:
                agents_client.runs.submit_tool_outputs(thread_id=thread.id, run_id=run.id, tool_outputs=tool_outputs)

        print(f"Current run status: {run.status}")

    print(f"Run completed with status: {run.status}")
    if run.status == "failed":
        print(f"Run failed: {run.last_error}")

    # Fetch run steps to get the details of the agent run
    run_steps = agents_client.run_steps.list(thread_id=thread.id, run_id=run.id)
    for step in run_steps:
        print(f"Step {step.id} status: {step.status}")
        print(step)

        if isinstance(step.step_details, RunStepToolCallDetails):
            print("  Tool calls:")
            run_step_tool_calls = step.step_details.tool_calls

            for call in run_step_tool_calls:
                print(f"    Tool call ID: {call.id}")
                print(f"    Tool call type: {call.type}")

                if isinstance(call, RunStepComputerUseToolCall):
                    details = call.computer_use_preview
                    print(f"    Computer use action type: {details.action.type}")

                print()  # extra newline between tool calls

        print()  # extra newline between run steps

    # Optional: Delete the agent once the run is finished.
    agents_client.delete_agent(agent.id)
    print("Deleted agent")

Getting started 

To start using Computer Use in Foundry Agent Service: 

  • Request access 
  • Create deployments in your Azure OpenAI resource once access is approved. 
  • Configure your agent through REST API or SDK: 
    • Computer Use: specify environment (browser, windows, mac, ubuntu) → implement the action loop (execute actions, send screenshots, continue until complete). 
  • Apply best practices 
    • Run Computer Use agents only on low-privilege, isolated machines.

Learn more 

With these steps and resources, you can quickly enable agents that combine text reasoning, image creation, and real-world computer interaction. 

 

Author

Linda Li
Product Manager

0 comments