Protecting against indirect prompt injection attacks in MCP

In this blog post, we will provide some guidelines on how to mitigate prompt injection attacks in Model Context Protocol (MCP) and share the steps Microsoft has taken to address emerging risks such as this one for our customers.

MCP is an open protocol spearheaded by Anthropic that defines a standardized interface for connecting Large Language Models (LLMs) with external data sources and tools. Rather than relying on different integration paths across platforms, MCP provides:

A consistent API contract for passing context between applications and LLMs
Structured methods for tool calling and data retrieval
A common format for handling both inputs to and outputs from LLMs

From an implementation perspective, MCP solves a key problem in the AI development workflow: the need to rewrite integration code when switching between different model providers or when updating the application’s context-fetching functions.

For developers building agentic AI systems, MCP reduces boilerplate code and creates clearer separation between the application logic and the AI reasoning layer. This makes systems more maintainable and allows for easier swapping of underlying LLMs as the technology evolves, while keeping the data layer consistent.

Indirect Prompt Injection Attacks

As the protocol evolves and gains traction, developers and security researchers have identified a few scenarios where malicious MCP servers can expose customer data in ways organizations don’t expect. While these vulnerabilities are not new in the industry, their applicability to MCP and AI-based workloads warrants a closer look. These include Prompt Injection as well as a subset of this class of attacks, known as Tool Poisoning.

An Indirect Prompt Injection vulnerability (also known as cross-domain prompt injection or XPIA) is a security exploit targeting generative AI systems where malicious instructions are embedded in external content, such as documents, web pages, or emails. When the AI system processes external content, it misinterprets the embedded instructions as valid commands from the user, leading to unintended actions, such as data exfiltration, generation of harmful or misleading content, or manipulating subsequent user interactions.

A type of Indirect Prompt Injection attack known as Tool Poisoning is a vulnerability where an attacker embeds malicious instructions within the descriptions of MCP tools. In a MCP server, every tool has some metadata associated with it, such as a name and a description. LLMs use this metadata to determine which tools to invoke based on user input. Compromised descriptions can manipulate the model into executing unintended tool calls, bypassing security controls designed to protect the system. Malicious instructions in tool metadata are invisible to users but can be interpreted by the AI model and its underlying systems. This is particularly dangerous in hosted MCP server scenarios, where tool definitions can be dynamically amended to include malicious content later (called a “rug pull” by some researchers). A user that previously approved a tool for use with their LLM may end up with a tool that changed since their approval happened. In that context, the tool can perform actions that were originally not declared, such as data exfiltration or manipulation.

To mitigate the risks of an Indirect Prompt Injection attacks in your AI system, we recommend two approaches: implementing AI prompt shields, which help by analyzing prompts and tool interactions to prevent unexpected behaviors, and establishing robust supply chain security mechanisms, ensuring that only approved packages and applications are used within an organization.

Using prompt shields to protect against Indirect Prompt Injection attacks

AI Prompt Shields are a solution developed by Microsoft to defend against both direct and indirect prompt injection attacks. They help through:

Detection and Filtering: Prompt Shields use advanced machine learning algorithms and natural language processing to detect and filter out malicious instructions embedded in external content, such as documents, web pages, or emails.
Spotlighting: This technique helps the AI system distinguish between valid system instructions and potentially untrustworthy external inputs. By transforming the input text in a way that makes it more relevant to the model, Spotlighting ensures that the AI can better identify and ignore malicious instructions.
Delimiters and Datamarking: Including delimiters in the system message explicitly outlines the location of the input text, helping the AI system recognize and separate user inputs from potentially harmful external content. Datamarking extends this concept by using special markers to highlight the boundaries of trusted and untrusted data.
Continuous Monitoring and Updates: Microsoft continuously monitors and updates Prompt Shields to address new and evolving threats. This proactive approach ensures that the defenses remain effective against the latest attack techniques.

You can read more about AI prompt shields in the Prompt Shields documentation.

Supply chain security

Supply chain security fundamentals persist in the AI era: verify all components before integration (including models, not just code packages), maintain secure deployment pipelines, and implement continuous application and security monitoring. The difference isn’t in the security principles but in what constitutes your supply chain. It now extends to foundation models, embeddings services, and context providers. They require the same rigorous verification as traditional dependencies.

GitHub Advanced Security provides several features to enhance supply chain security, including secret, dependency, and CodeQL scanning. These tools are integrated into Azure DevOps and Azure Repos, allowing teams to identify and mitigate security vulnerabilities in their code and dependencies.

Microsoft also implements extensive supply chain security practices internally for all our products. You can read more about this in The Journey to Secure the Software Supply Chain at Microsoft.

Returning to security fundamentals

It is critical to be mindful that any AI implementation inherits the existing security posture of your organization’s environment. When considering the security of your AI systems it is highly recommended that you improve the overall organizational security posture. Year after year, security research shows that getting security fundamentals right is the most effective way to protect against any kind of breaches. Research published in the Microsoft Digital Defence Report states that 98% of reported breaches would be prevented by robust security hygiene, such as enabling MFA, applying least privilege, keeping devices, infrastructure, and applications up to date, and otherwise protecting important data.

Whilst using AI potentially increases the attack surface of an application, using tried and tested application security principles, supply chain security, and security hygiene fundamentals alongside Microsoft Azure AI Foundry built-in platform protection provides a path to use AI and AI agents safely.