{"id":5255,"date":"2026-04-24T08:00:14","date_gmt":"2026-04-24T15:00:14","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/agent-framework\/?p=5255"},"modified":"2026-04-24T08:00:14","modified_gmt":"2026-04-24T15:00:14","slug":"chat-history-storage-patterns-in-microsoft-agent-framework","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/agent-framework\/chat-history-storage-patterns-in-microsoft-agent-framework\/","title":{"rendered":"Chat History Storage Patterns in Microsoft Agent Framework"},"content":{"rendered":"<p>When people talk about building AI agents, they usually focus on models, tools, and prompts. In practice, one of the most important architectural decisions is much simpler: <strong>where does the conversation history live?<\/strong><\/p>\n<p><span data-teams=\"true\">Imagine a user asks your agent a complex question, clicks \u201ctry again,\u201d explores two different answers in parallel, and then comes back tomorrow expecting the agent to remember everything. Whether that experience is possible depends on the answer to this question.<\/span><\/p>\n<p>Your choice affects cost, privacy, portability, and the kinds of user experiences you can build. It also determines whether your application treats a conversation as a simple thread, a branchable tree, or just a list of messages you resend on every call.<\/p>\n<p>This article explores the fundamental patterns for chat history storage, how different AI services implement them, and how Microsoft Agent Framework abstracts these differences to give you flexibility without complexity.<\/p>\n<h2>Why Chat History Storage Matters<\/h2>\n<p>Every time a user interacts with an AI agent, the model needs context from previous messages to provide coherent, contextual responses. Without this history, each interaction would be isolated. 
The agent couldn&#8217;t remember what was discussed moments ago.<\/p>\n<p>The storage strategy you choose affects:<\/p>\n<ul>\n<li><strong>User experience<\/strong>: Can users resume conversations? Branch into different directions? Undo and try again?<\/li>\n<li><strong>Compliance<\/strong>: Where does conversation data live? Who controls it?<\/li>\n<li><strong>Architecture<\/strong>: How tightly coupled is your application to a specific provider?<\/li>\n<\/ul>\n<h2>The Two Fundamental Patterns<\/h2>\n<p>At the highest level, there are two approaches to managing chat history:<\/p>\n<h3>Service-Managed Storage<\/h3>\n<p>The AI service stores conversation state on its servers. Agent Framework holds a reference (like a conversation_id or thread_id) in the AgentSession, and the service automatically includes relevant history when processing requests.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-service-managed.webp\"><img decoding=\"async\" class=\"size-full wp-image-5259 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-service-managed.webp\" alt=\"chat history service managed image\" width=\"960\" height=\"260\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-service-managed.webp 960w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-service-managed-300x81.webp 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-service-managed-768x208.webp 768w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><\/a><\/p>\n<p><strong>Benefits:<\/strong><\/p>\n<ul>\n<li>Simpler client implementation<\/li>\n<li>Service handles context window management and compaction automatically<\/li>\n<li>Built-in persistence across sessions<\/li>\n<li>Lower 
per-request payload size (just a reference ID, not full history)<\/li>\n<\/ul>\n<p><strong>Tradeoffs:<\/strong><\/p>\n<ul>\n<li>Data lives on provider&#8217;s servers<\/li>\n<li>Less control over what context is included<\/li>\n<li>No control over compaction strategy &#8211; you can&#8217;t customize what gets summarized, truncated, or dropped<\/li>\n<li>Provider lock-in for conversation state<\/li>\n<\/ul>\n<h3>Client-Managed Storage<\/h3>\n<p>Agent Framework maintains the full conversation history locally (in the AgentSession or associated history providers) and sends relevant messages with each request. The service is stateless. It processes the request and forgets.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-client-managed.webp\"><img decoding=\"async\" class=\"size-full wp-image-5262 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-client-managed.webp\" alt=\"chat history client managed image\" width=\"960\" height=\"280\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-client-managed.webp 960w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-client-managed-300x88.webp 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-client-managed-768x224.webp 768w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><\/a><\/p>\n<p><strong>Benefits:<\/strong><\/p>\n<ul>\n<li>Full control over data location and privacy<\/li>\n<li>Easy to switch providers (no state migration)<\/li>\n<li>Explicit control over what context is sent<\/li>\n<li>Full control over compaction strategies &#8211; truncation, summarization, sliding window, tool-call collapse<\/li>\n<li>Can implement custom context 
strategies<\/li>\n<\/ul>\n<p><strong>Tradeoffs:<\/strong><\/p>\n<ul>\n<li>Larger request payloads<\/li>\n<li>Client must handle context window limits<\/li>\n<li>Must implement and maintain compaction strategies as conversations grow<\/li>\n<li>More complex client-side logic<\/li>\n<\/ul>\n<h2>Service-Managed Storage Models<\/h2>\n<p>Not all service-managed storage is equal. There are two distinct models that affect what you can build:<\/p>\n<h3>Linear (Single-Threaded) Conversations<\/h3>\n<p>This is the traditional chat model: messages form an ordered sequence. Each new message appends to the thread, and you can&#8217;t branch or fork the conversation.<\/p>\n<p><strong>Examples:<\/strong><\/p>\n<ul>\n<li>Microsoft Foundry Prompt Agents (conversations)<\/li>\n<li>OpenAI Responses with Conversations API (conversations)<\/li>\n<li>[DEPRECATED] OpenAI Assistants API (threads)<\/li>\n<\/ul>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-linear.webp\"><img decoding=\"async\" class=\"size-full wp-image-5260 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-linear.webp\" alt=\"chat history linear image\" width=\"520\" height=\"380\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-linear.webp 520w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-linear-300x219.webp 300w\" sizes=\"(max-width: 520px) 100vw, 520px\" \/><\/a><\/p>\n<p><strong>Good for:<\/strong><\/p>\n<ul>\n<li>Chatbots and support agents<\/li>\n<li>Simple Q&amp;A flows<\/li>\n<li>Scenarios requiring strict audit trails<\/li>\n<\/ul>\n<p><strong>Limitations:<\/strong><\/p>\n<ul>\n<li>Can&#8217;t &#8220;go back&#8221; and try a different response<\/li>\n<li>No parallel exploration of different conversation 
paths<\/li>\n<\/ul>\n<h3>Forking-Capable Conversations<\/h3>\n<p>Modern Responses APIs introduce a more flexible model: each response has a unique ID, and new requests can reference any previous response as the conversation continuation point.<\/p>\n<p><strong>Examples:<\/strong><\/p>\n<ul>\n<li>Microsoft Foundry Responses endpoint<\/li>\n<li>Azure OpenAI Responses API<\/li>\n<li>OpenAI Responses API<\/li>\n<\/ul>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-forking.webp\"><img decoding=\"async\" class=\"size-full wp-image-5261 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-forking.webp\" alt=\"chat history forking image\" width=\"840\" height=\"400\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-forking.webp 840w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-forking-300x143.webp 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-forking-768x366.webp 768w\" sizes=\"(max-width: 840px) 100vw, 840px\" \/><\/a><\/p>\n<p><strong>Good for:<\/strong><\/p>\n<ul>\n<li>Exploration and brainstorming applications<\/li>\n<li>A\/B testing different response strategies<\/li>\n<li>&#8220;Undo&#8221; and &#8220;try again&#8221; functionality<\/li>\n<li>Building tree-structured conversation UIs<\/li>\n<li>Agentic workflows where multiple paths may be explored<\/li>\n<\/ul>\n<h2>Client-Managed Storage Patterns<\/h2>\n<p>When the AI service doesn&#8217;t store conversation state, your application takes full responsibility. 
This is the pattern used by many providers.<\/p>\n<p><strong>Providers using this model:<\/strong><\/p>\n<ul>\n<li>Azure OpenAI Chat Completions<\/li>\n<li>OpenAI Chat Completions<\/li>\n<li>Anthropic Claude<\/li>\n<li>Ollama<\/li>\n<li>Most open-source model APIs<\/li>\n<\/ul>\n<h3>Implementation Considerations<\/h3>\n<p><strong>Context Window Management:<\/strong>\nYou can&#8217;t send unlimited history. As conversations grow, you&#8217;ll need strategies like:<\/p>\n<ul>\n<li>Truncating older messages<\/li>\n<li>Summarizing earlier parts of the conversation<\/li>\n<li>Selective inclusion based on relevance<\/li>\n<\/ul>\n<p><strong>Persistence:<\/strong>\nIn-memory history works for demos and development, but production applications almost always need a durable store &#8211; a database, Redis, blob storage, or similar. This adds infrastructure and operational complexity that service-managed storage avoids entirely.<\/p>\n<p><strong>Privacy Control:<\/strong>\nThe upside: conversation data never leaves your control unless you explicitly send it. This can be crucial for sensitive applications.<\/p>\n<h3>Compaction: The Hidden Complexity<\/h3>\n<p>When the service manages history, it also manages compaction &#8211; keeping the conversation context within the model&#8217;s token limits. You don&#8217;t have to think about it, but you also can&#8217;t control it.<\/p>\n<p>With client-managed history, compaction becomes your responsibility. As conversations grow, you need explicit strategies to prevent context window overflows and control costs. 
Common approaches include:<\/p>\n<ul>\n<li><strong>Truncation<\/strong> &#8211; Drop the oldest messages beyond a threshold<\/li>\n<li><strong>Sliding window<\/strong> &#8211; Keep only the most recent N turns<\/li>\n<li><strong>Summarization<\/strong> &#8211; Replace older messages with an LLM-generated summary<\/li>\n<li><strong>Tool-call collapse<\/strong> &#8211; Replace verbose tool call\/result pairs with compact summaries<\/li>\n<\/ul>\n<p>Agent Framework provides built-in compaction strategies for all of these patterns, so you don&#8217;t have to build them from scratch. But you do need to choose, configure, and maintain the right strategy for your use case &#8211; a tradeoff that doesn&#8217;t exist with service-managed storage.<\/p>\n<h2>How Agent Framework Handles the Differences<\/h2>\n<p>Microsoft Agent Framework provides a unified programming model that works regardless of which storage pattern the underlying service uses. This abstraction lives in two key components:<\/p>\n<h3>AgentSession: The Unified Conversation Container<\/h3>\n<p>Every conversation in Agent Framework is represented by an AgentSession. This object:<\/p>\n<ul>\n<li>Contains any service-specific identifiers (thread IDs, response IDs)<\/li>\n<li>Holds local state (for client-managed history scenarios). 
This may include:\n<ul>\n<li>The actual chat history<\/li>\n<li>Storage identifiers for a custom database chat history store<\/li>\n<\/ul>\n<\/li>\n<li>Provides serialization for persistence across application restarts<\/li>\n<\/ul>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C#\r\n\/\/ Create a session - works the same regardless of provider\r\nAgentSession session = await agent.CreateSessionAsync();\r\n\r\n\/\/ Use the session across multiple turns\r\nvar first = await agent.RunAsync(\"My name is Alice.\", session);\r\nvar second = await agent.RunAsync(\"What is my name?\", session);\r\n\r\n\/\/ The session handles the details:\r\n\/\/ - If service-managed: tracks the conversation_id internally\r\n\/\/ - If client-managed: accumulates history locally<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python\r\n# Create a session - works the same regardless of provider\r\nsession = agent.create_session()\r\n\r\n# Use the session across multiple turns\r\nfirst = await agent.run(\"My name is Alice.\", session=session)\r\nsecond = await agent.run(\"What is my name?\", session=session)<\/code><\/pre>\n<h3>ChatHistoryProvider: Pluggable Storage Backends<\/h3>\n<p>When you need client-managed storage, history providers allow you to control where history lives and how it&#8217;s retrieved:<\/p>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C#\r\n\/\/ Built-in in-memory provider (simplest and default option)\r\nAIAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions\r\n{\r\n    ChatOptions = new() { Instructions = \"You are a helpful assistant.\" },\r\n    ChatHistoryProvider = new InMemoryChatHistoryProvider()\r\n});\r\n\r\n\/\/ Custom database-backed provider (you implement)\r\nAIAgent agent = chatClient.AsAIAgent(new ChatClientAgentOptions\r\n{\r\n    
ChatOptions = new() { Instructions = \"You are a helpful assistant.\" },\r\n    ChatHistoryProvider = new DatabaseChatHistoryProvider(dbConnection)\r\n});\r\n<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python\r\nfrom agent_framework import InMemoryHistoryProvider\r\nfrom agent_framework.openai import OpenAIChatCompletionClient\r\n\r\n# Built-in in-memory provider (simplest and default option)\r\nagent = OpenAIChatCompletionClient().as_agent(\r\n    name=\"Assistant\",\r\n    instructions=\"You are a helpful assistant.\",\r\n    context_providers=[InMemoryHistoryProvider(\"memory\", load_messages=True)],\r\n)\r\n\r\n# Custom database-backed provider (you implement)\r\nagent = OpenAIChatCompletionClient().as_agent(\r\n    name=\"Assistant\",\r\n    instructions=\"You are a helpful assistant.\",\r\n    context_providers=[DatabaseHistoryProvider(db_client)],\r\n)\r\n<\/code><\/pre>\n<p><strong>Key design principle:<\/strong> Your application code doesn&#8217;t change when switching between service-managed and client-managed storage. The abstraction handles the details.<\/p>\n<h3>Transparent Mode Switching<\/h3>\n<p>Consider this scenario: you start with OpenAI Chat Completions (client-managed) and later want to try the Responses API (service-managed with forking). 
Your agent invocation code stays the same:<\/p>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C#\r\n\/\/ Works with Chat Completions (client-managed)\r\nvar response = await agent.RunAsync(\"Hello!\", session);\r\n\r\n\/\/ Also works with Responses API (service-managed) - same call, variable reused\r\nresponse = await agent.RunAsync(\"Hello!\", session);\r\n<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python\r\n# Works with Chat Completions (client-managed)\r\nresponse = await agent.run(\"Hello!\", session=session)\r\n\r\n# Also works with Responses API (service-managed)\r\nresponse = await agent.run(\"Hello!\", session=session)<\/code><\/pre>\n<p>The session and provider handle the underlying differences. This decoupling is valuable for:<\/p>\n<ul>\n<li>Experimenting with different providers<\/li>\n<li>Migrating between services<\/li>\n<li>Building provider-agnostic applications<\/li>\n<\/ul>\n<h2>Provider Comparison<\/h2>\n<p>Most AI services have a fixed storage model &#8211; the service either stores history or it doesn&#8217;t. 
The Responses API is the notable exception: it&#8217;s configurable.<\/p>\n<h3>Fixed-Mode Providers<\/h3>\n<p>These providers operate in a single storage mode:<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Provider<\/strong><\/td>\n<td><strong>Storage Location<\/strong><\/td>\n<td><strong>Storage Model<\/strong><\/td>\n<td><strong>Compaction<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>OpenAI Chat Completions<\/strong><\/td>\n<td>Client<\/td>\n<td>N\/A<\/td>\n<td>Developer<\/td>\n<\/tr>\n<tr>\n<td><strong>Azure OpenAI Chat Completions<\/strong><\/td>\n<td>Client<\/td>\n<td>N\/A<\/td>\n<td>Developer<\/td>\n<\/tr>\n<tr>\n<td><strong>Foundry Agent Service<\/strong><\/td>\n<td>Service<\/td>\n<td>Linear (threads)<\/td>\n<td>Service<\/td>\n<\/tr>\n<tr>\n<td><strong>Anthropic Claude<\/strong><\/td>\n<td>Client<\/td>\n<td>N\/A<\/td>\n<td>Developer<\/td>\n<\/tr>\n<tr>\n<td><strong>Ollama<\/strong><\/td>\n<td>Client<\/td>\n<td>N\/A<\/td>\n<td>Developer<\/td>\n<\/tr>\n<tr>\n<td><strong>GitHub Copilot SDK<\/strong><\/td>\n<td>Service<\/td>\n<td>N\/A<\/td>\n<td>Service<\/td>\n<\/tr>\n<tr>\n<td><strong>[DEPRECATED] OpenAI Assistants<\/strong><\/td>\n<td>Service<\/td>\n<td>Linear (threads)<\/td>\n<td>Service<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Configurable: The Responses API<\/h3>\n<p>The Responses API (available from Microsoft Foundry, OpenAI, and Azure OpenAI) is a special case. 
It supports multiple storage modes controlled by configuration &#8211; primarily the store parameter:<\/p>\n<table>\n<tbody>\n<tr>\n<td><strong>Mode<\/strong><\/td>\n<td><strong>Configuration<\/strong><\/td>\n<td><strong>Storage Location<\/strong><\/td>\n<td><strong>Storage Model<\/strong><\/td>\n<td><strong>Compaction<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>Forking (default)<\/strong><\/td>\n<td>store=true<\/td>\n<td>Service<\/td>\n<td>Forking via response IDs<\/td>\n<td>Service<\/td>\n<\/tr>\n<tr>\n<td><strong>Client-managed<\/strong><\/td>\n<td>store=false<\/td>\n<td>Client<\/td>\n<td>N\/A<\/td>\n<td>Developer<\/td>\n<\/tr>\n<tr>\n<td><strong>Linear conversations<\/strong><\/td>\n<td>Conversations API<\/td>\n<td>Service<\/td>\n<td>Linear<\/td>\n<td>Service<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This makes the Responses API uniquely flexible:<\/p>\n<ul>\n<li><strong>store=true (default)<\/strong> &#8211; The service stores each response and its history. New requests can reference any prior response ID to continue from that point, enabling branching and forking. The service handles compaction.<\/li>\n<li><strong>store=false<\/strong> &#8211; The service is stateless. Agent Framework manages the full conversation history client-side using history providers &#8211; just like Chat Completions.<\/li>\n<li><strong>Conversations API<\/strong> &#8211; Built on top of Responses, this provides a linear thread model similar to Assistants. The service manages an ordered conversation and handles compaction. 
To enable this model, pass a conversation ID to the Responses request instead of a previous response ID.<\/li>\n<\/ul>\n<p><strong>Legend:<\/strong><\/p>\n<ul>\n<li><strong>Storage Location<\/strong>: Where the canonical conversation state lives &#8211; &#8220;Service&#8221; (on the provider&#8217;s servers) or &#8220;Client&#8221; (in Agent Framework&#8217;s session\/history providers).<\/li>\n<li><strong>Storage Model<\/strong>: For service-stored history, the shape &#8211; linear (thread) or forking (response IDs).<\/li>\n<li><strong>Compaction<\/strong>: Who keeps context within token limits. &#8220;Service&#8221; = automatic. &#8220;Developer&#8221; = you configure compaction strategies in Agent Framework.<\/li>\n<\/ul>\n<h3>Configuring Responses API Modes<\/h3>\n<p>Here&#8217;s how each mode looks in practice:<\/p>\n<h4>Mode 1: Forking with service storage (default)<\/h4>\n<p>This is the simplest setup &#8211; just create an agent from the Responses client. The service stores everything and supports forking via response IDs.<\/p>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C# - Responses API with store=true (default)\r\n\/\/ The service stores each response and its history.\r\n\/\/ Each response ID can be used as a fork point.\r\nAIAgent agent = new OpenAIClient(\"&lt;your_api_key&gt;\")\r\n    .GetResponseClient(\"gpt-5.4-mini\")\r\n    .AsAIAgent(\r\n        instructions: \"You are a helpful assistant.\",\r\n        name: \"ForkingAgent\");\r\n\r\nAgentSession session = await agent.CreateSessionAsync();\r\nvar response1 = await agent.RunAsync(\"What are three good vacation spots?\", session);\r\n\r\n\/\/ The session tracks the response ID internally.\r\n\/\/ A new session forked from response1 could explore a different branch.\r\n<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python - Responses API with store=true (default)\r\n# The service stores each response and its 
history.\r\n# Each response ID can be used as a fork point.\r\nfrom agent_framework import Agent\r\nfrom agent_framework.openai import OpenAIChatClient\r\n\r\nagent = Agent(\r\n    client=OpenAIChatClient(),\r\n    name=\"ForkingAgent\",\r\n    instructions=\"You are a helpful assistant.\",\r\n)\r\n\r\nsession = agent.create_session()\r\nresponse1 = await agent.run(\"What are three good vacation spots?\", session=session)\r\n\r\n# The session tracks the response ID internally.\r\n# A new session forked from response1 could explore a different branch.<\/code><\/pre>\n<h4>Mode 2: Client-managed with store=false<\/h4>\n<p>Here you use the same Responses client but disable service-side storage. Agent Framework manages history client-side, giving you full control over persistence and compaction.<\/p>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C# - Responses API with store=false\r\n\/\/ The service is stateless - Agent Framework manages history.\r\nAIAgent agent = new OpenAIClient(\"&lt;your_api_key&gt;\")\r\n    .GetResponseClient(\"gpt-5.4-mini\")\r\n    .AsIChatClientWithStoredOutputDisabled()\r\n    .AsAIAgent(new ChatClientAgentOptions\r\n    {\r\n        ChatOptions = new() { Instructions = \"You are a helpful assistant.\" },\r\n        ChatHistoryProvider = new InMemoryChatHistoryProvider()\r\n    });\r\n\r\nAgentSession session = await agent.CreateSessionAsync();\r\nvar response = await agent.RunAsync(\"Hello!\", session);\r\n\/\/ History lives in the InMemoryChatHistoryProvider,\r\n\/\/ not on the service. 
You control compaction.<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python - Responses API with store=false\r\n# The service is stateless - Agent Framework manages history.\r\nfrom agent_framework import Agent, InMemoryHistoryProvider\r\nfrom agent_framework.openai import OpenAIChatClient\r\n\r\nagent = Agent(\r\n    client=OpenAIChatClient(),\r\n    name=\"StatelessAgent\",\r\n    instructions=\"You are a helpful assistant.\",\r\n    default_options={\"store\": False},\r\n    context_providers=[InMemoryHistoryProvider(\"memory\", load_messages=True)],\r\n)\r\n\r\nsession = agent.create_session()\r\nresponse = await agent.run(\"Hello!\", session=session)\r\n# History lives in the InMemoryHistoryProvider,\r\n# not on the service. You control compaction.<\/code><\/pre>\n<h4>Mode 3: Linear conversations<\/h4>\n<p>The Conversations API builds on Responses to provide a linear thread model. You create a server-side conversation first, and then bootstrap your session with it. 
This gives you service-managed storage with a simple, ordered history &#8211; similar to the deprecated Assistants API.<\/p>\n<p>In C#, the FoundryAgent class provides a CreateConversationSessionAsync() convenience method that creates the server-side conversation and links it to a session in a single call:<\/p>\n<div>\n<pre class=\"prettyprint language-cs language-csharp\"><code class=\"language-cs language-csharp\">\/\/ C# \u2014 Responses API with Conversations (via Foundry)\r\n\/\/ CreateConversationSessionAsync() creates a server-side conversation\r\n\/\/ that persists on the Foundry service and is visible in the Foundry Project UI.\r\nAIProjectClient aiProjectClient = new(new Uri(endpoint), new DefaultAzureCredential());\r\n\r\nFoundryAgent agent = aiProjectClient\r\n    .AsAIAgent(\"gpt-5.4-mini\", instructions: \"You are a helpful assistant.\", name: \"ConversationAgent\");\r\n\r\n\/\/ One call creates the conversation and binds it to the session.\r\nChatClientAgentSession session = await agent.CreateConversationSessionAsync();\r\n\r\nConsole.WriteLine(await agent.RunAsync(\"What is the capital of France?\", session));\r\nConsole.WriteLine(await agent.RunAsync(\"What about Germany?\", session));\r\n\/\/ Both responses are part of the same linear conversation thread\r\n\/\/ managed by the service.<\/code><\/pre>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Python \u2014 Responses API with Conversations (via Foundry)\r\n# Use get_session with a conversation id from the conversation service to link to\r\n# a server-side conversation.\r\nfrom agent_framework.foundry import FoundryChatClient\r\nfrom azure.identity import AzureCliCredential\r\nfrom agent_framework import Agent\r\n\r\nfoundry_client = FoundryChatClient(credential=AzureCliCredential())\r\nagent = Agent(\r\n    client=foundry_client,\r\n    instructions=\"You are a helpful assistant.\"\r\n)\r\n\r\n# Create a session with a conversation id from the conversations 
service\r\nconversation_result = await foundry_client.client.conversations.create()\r\nsession = agent.get_session(service_session_id=conversation_result.id)\r\n\r\nresponse1 = await agent.run(\"What is the capital of France?\", session=session)\r\nresponse2 = await agent.run(\"What about Germany?\", session=session)\r\n# Both responses are part of the same linear conversation thread\r\n# managed by the service.<\/code><\/pre>\n<div>\n<h2>Decision Tree<\/h2>\n<p>This decision tree demonstrates some of the main options available when choosing a chat history storage mechanism.<\/p>\n<\/div>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree.webp\"><img decoding=\"async\" class=\"size-full wp-image-5364 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree.webp\" alt=\"chat history decision tree image\" width=\"1140\" height=\"820\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree.webp 1140w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree-300x216.webp 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree-1024x737.webp 1024w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2026\/04\/chat-history-decision-tree-768x552.webp 768w\" sizes=\"(max-width: 1140px) 100vw, 1140px\" \/><\/a><\/p>\n<div>\n<h2>Conclusion<\/h2>\n<p>Chat history storage might seem like an implementation detail, but it fundamentally shapes what your AI application can do. 
Understanding the tradeoffs between service-managed and client-managed patterns\u2014and between linear and forking models\u2014helps you make architectural decisions that align with your requirements.<\/p>\n<p>Microsoft Agent Framework&#8217;s session and provider abstractions give you the flexibility to start with one approach and evolve without rewriting your application logic. Whether you&#8217;re building a simple chatbot or a complex agentic system with branching conversations, the framework adapts to your chosen storage strategy.<\/p>\n<p>The key takeaway: choose based on your actual requirements (privacy, control, capabilities), not just what&#8217;s easiest to start with. The right storage pattern will make your application more capable and maintainable in the long run.<\/p>\n<p>For more details on implementing these patterns with Microsoft Agent Framework, see:<\/p>\n<ul>\n<li>The <a href=\"https:\/\/learn.microsoft.com\/agent-framework\/agents\/conversations\/\">Conversations &amp; Memory documentation<\/a><\/li>\n<li>Individual <a href=\"https:\/\/learn.microsoft.com\/agent-framework\/agents\/providers\/\">provider guides<\/a><\/li>\n<\/ul>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>When people talk about building AI agents, they usually focus on models, tools, and prompts. In practice, one of the most important architectural decisions is much simpler: where does the conversation history live? 
Imagine a user asks your agent a complex question, clicks \u201ctry again,\u201d explores two different answers in parallel, and then comes back [&hellip;]<\/p>\n","protected":false},"author":162052,"featured_media":5323,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143],"tags":[48,122,147],"class_list":["post-5255","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-agent-framework","tag-ai","tag-chat-history","tag-microsoft-agent-framework"],"acf":[],"blog_post_summary":"<p>When people talk about building AI agents, they usually focus on models, tools, and prompts. In practice, one of the most important architectural decisions is much simpler: where does the conversation history live? Imagine a user asks your agent a complex question, clicks \u201ctry again,\u201d explores two different answers in parallel, and then comes back [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/5255","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/users\/162052"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/comments?post=5255"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/5255\/revisions"}],"predecessor-version":[{"id":5365,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/5255\/revisions\/5365"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media\/
5323"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media?parent=5255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/categories?post=5255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/tags?post=5255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}