{"id":2122,"date":"2026-04-16T12:34:46","date_gmt":"2026-04-16T19:34:46","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=2122"},"modified":"2026-04-16T12:34:46","modified_gmt":"2026-04-16T19:34:46","slug":"whats-new-in-foundry-finetune-april-2026","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/whats-new-in-foundry-finetune-april-2026\/","title":{"rendered":"What&#8217;s New in Microsoft Foundry Fine-Tuning | April 2026"},"content":{"rendered":"<p>This month we&#8217;re shipping three updates that make Reinforcement Fine-Tuning (RFT) more accessible, more powerful, and easier to get right:<\/p>\n<ol>\n<li><strong>Global Training for o4-mini<\/strong> \u2014 train from 13+ Azure regions at lower per-token rates.<\/li>\n<li><strong>New model graders<\/strong> \u2014 GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano are now available as model graders, giving you more flexibility and cost control when scoring model outputs.<\/li>\n<li><strong><a href=\"https:\/\/github.com\/microsoft-foundry\/fine-tuning\/blob\/main\/Demos\/Agentic_RFT_PrivatePreview\/RFT_Best_Practice.md\">RFT best practices<\/a><\/strong> \u2014 a distilled guide to help you design graders, prepare data, and avoid common pitfalls.<\/li>\n<\/ol>\n<p>Read on for the details.<\/p>\n<hr \/>\n<h2>Global Training for o4-mini<\/h2>\n<p>Global Training expands the reach of model customization with the affordable pricing of our other Global offerings. 
With this update, <strong>o4-mini<\/strong> joins the list of models you can train globally:<\/p>\n<ul>\n<li><strong>Train from anywhere<\/strong> \u2014 launch fine-tuning jobs for o4-mini from <strong>13<\/strong> Azure regions today, expanding to all fine-tuning regions by the end of April.<\/li>\n<li><strong>Save on training costs<\/strong> \u2014 benefit from lower per-token training rates compared to Standard training.<\/li>\n<li><strong>Same quality, broader reach<\/strong> \u2014 identical training infrastructure and model quality regardless of the region you start from.<\/li>\n<\/ul>\n<p><strong>Currently available regions:<\/strong> East US 2, North Central US, West US 3, Australia East, France Central, Germany West Central, Switzerland North, Norway East, Poland Central, Spain Central, Italy North, Switzerland West, and Sweden Central.<\/p>\n<p>o4-mini is one of the most popular models for reasoning-intensive and agentic workloads. Adding Global Training support makes it significantly more cost-effective to customize at scale\u2014especially for teams spread across multiple geographies.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2026\/04\/ft-o4mini-global-training.webp\" alt=\"o4-mini Global training\" \/><\/p>\n<h3>Create an o4-mini Global Training Job via REST API<\/h3>\n<pre><code class=\"language-bash\">curl -X POST \"https:\/\/&lt;your-resource&gt;.openai.azure.com\/openai\/fine_tuning\/jobs?api-version=2025-04-01-preview\" \\\r\n  -H \"Content-Type: application\/json\" \\\r\n  -H \"api-key: $AZURE_OPENAI_API_KEY\" \\\r\n  -d '{\r\n    \"model\": \"o4-mini\",\r\n    \"training_file\": \"&lt;your-training-file-id&gt;\",\r\n    \"method\": {\r\n      \"type\": \"reinforcement\",\r\n      \"reinforcement\": {\r\n        \"grader\": {\r\n          \"type\": \"string_check\",\r\n          \"name\": \"answer-check\",\r\n          \"input\": \"{{sample.output_text}}\",\r\n          
\"reference\": \"{{item.reference_answer}}\",\r\n          \"operation\": \"eq\"\r\n        }\r\n      }\r\n    },\r\n    \"hyperparameters\": {\r\n      \"n_epochs\": 2,\r\n      \"compute_multiplier\": 1.0\r\n    },\r\n    \"trainingType\": \"globalstandard\"\r\n  }'<\/code><\/pre>\n<div class=\"d-flex\"><a class=\"cta_button_link btn-secondary\" href=\"https:\/\/learn.microsoft.com\/azure\/foundry\/openai\/how-to\/fine-tuning?tabs=oai-sdk&amp;pivots=programming-language-studio\" target=\"_blank\" rel=\"noopener\">Learn\u00a0more\u00a0about\u00a0fine-tuning<\/a><\/div>\n<hr \/>\n<h2>New Model Graders: GPT-4.1, GPT-4.1-mini, and GPT-4.1-nano<\/h2>\n<p>Graders are the engine of RFT\u2014they define the reward signal your model optimizes against. Until now, model-based graders were limited to a smaller set of models. Starting this month, three additional models are available as graders:<\/p>\n<ul>\n<li><strong>GPT-4.1<\/strong><\/li>\n<li><strong>GPT-4.1-mini<\/strong><\/li>\n<li><strong>GPT-4.1-nano<\/strong><\/li>\n<\/ul>\n<h3>When to Use Model Graders<\/h3>\n<p>Deterministic graders (string-match, Python, endpoint-based) should remain your default\u2014they are faster, cheaper, and more reproducible. 
Reach for model graders when:<\/p>\n<ul>\n<li>The task output is <strong>open-ended or subjective<\/strong> (e.g., summarization quality, tone adherence, multi-step reasoning coherence) and cannot be reduced to a simple string check.<\/li>\n<li>You need to score <strong>partial credit<\/strong> across multiple dimensions\u2014such as factual accuracy, completeness, and safety\u2014in a single grading pass.<\/li>\n<li>You are building an <strong>agentic workflow<\/strong> where tool-call correctness depends on semantic context that pattern matching cannot capture.<\/li>\n<\/ul>\n<h3>Choosing the Right Model Grader<\/h3>\n<ul>\n<li><strong>Start with GPT-4.1-nano<\/strong> for initial iterations\u2014its low cost lets you run more experiments with faster feedback loops.<\/li>\n<li><strong>Upgrade to GPT-4.1-mini<\/strong> once your grading rubric is stable and you need higher fidelity.<\/li>\n<li><strong>Reserve GPT-4.1<\/strong> for production grading or complex rubrics where every scoring decision counts.<\/li>\n<\/ul>\n<blockquote><p><strong>Tip:<\/strong> You can mix grader types within a single RFT job. For example, use a string-match grader for the &#8220;correct answer&#8221; dimension and a GPT-4.1-mini model grader for evaluating the &#8220;reasoning quality&#8221; dimension.<\/p><\/blockquote>\n<hr \/>\n<h2>Reinforcement Fine-Tuning Best Practices<\/h2>\n<p>Whether you are using the new model graders or deterministic ones, the following best practices will help you get the most out of RFT.<\/p>\n<h3>When to Use RFT<\/h3>\n<p>RFT improves reasoning accuracy and decision quality in tasks where outputs can be clearly evaluated and scored. 
It is especially effective in scenarios such as:<\/p>\n<ul>\n<li><strong>Tool-calling accuracy<\/strong> \u2014 the model must select and invoke the right tools with correct parameters.<\/li>\n<li><strong>Policy or rubric enforcement<\/strong> \u2014 outputs need to follow specific business rules that a grader can validate.<\/li>\n<li><strong>Structured data extraction<\/strong> \u2014 correctness is unambiguous and can be scored deterministically.<\/li>\n<\/ul>\n<blockquote><p><strong>Not a fit for style or tone.<\/strong> If you need formatting, voice, or stylistic adjustments, prefer prompt engineering, structured outputs, or supervised fine-tuning (SFT).<\/p><\/blockquote>\n<h3>Step 1: Define the Objective<\/h3>\n<p>Start by clearly stating the task and what success looks like. Then design a grader that reflects real task quality as reliably as possible. The grader is the primary driver of RFT success\u2014invest disproportionate effort here.<\/p>\n<h3>Step 2: Establish a Baseline<\/h3>\n<p>Before training, run a baseline evaluation on a small set of examples (10\u2013100 samples) so you understand starting performance and can measure real improvement. Evaluate using a base model (for example, o4-mini) and experiment with system prompts to reach the best possible performance before fine-tuning.<\/p>\n<div class=\"d-flex\"><a class=\"cta_button_link btn-secondary\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/foundry\/how-to\/evaluate-generative-ai-app\" target=\"_blank\" rel=\"noopener\">Learn\u00a0about\u00a0Foundry\u00a0Evaluation<\/a><\/div>\n<h3>Step 3: Design Effective Graders<\/h3>\n<p>The grader determines what the model optimizes for. 
Follow these principles:<\/p>\n<ul>\n<li><strong>Use the simplest grader that works.<\/strong> If validating an exact-match answer (a number, a multiple-choice letter), use a string-match grader rather than a model-based or Python grader.<\/li>\n<li><strong>Prefer deterministic checks.<\/strong> String validation, code\/Python-based graders, and endpoint-based graders are more reliable than model-based grading.<\/li>\n<li><strong>Aim for well-distributed rewards.<\/strong> Rewards that are too sparse or too uniform produce weak learning signals that limit model improvement.<\/li>\n<li><strong>Validate on diverse, real-world inputs.<\/strong> Test the grader against representative production examples rather than relying only on synthetic data.<\/li>\n<\/ul>\n<h3>Step 4: Start Small and Iterate<\/h3>\n<p>Begin with small datasets (10\u2013100 samples), simple graders, and low epoch counts. A practical workflow:<\/p>\n<ol>\n<li>Start with <strong>o4-mini RFT<\/strong> to validate the end-to-end setup and grader behavior.<\/li>\n<li>Graduate to larger models once the reward signal and training loop look healthy.<\/li>\n<li>Change one variable at a time so gains or regressions can be clearly attributed.<\/li>\n<\/ol>\n<h3>Step 5: Tune Training Parameters<\/h3>\n<p>Expect <code>n_epochs<\/code> and <code>compute_multiplier<\/code> to have the most impact on quality. Adjust one at a time and monitor the reward trend and variance throughout training.<\/p>\n<hr \/>\n<h2>RFT Data Format<\/h2>\n<p>RFT requires a different data format from SFT. 
The final message in each row must be a <strong>User<\/strong> or <strong>Developer<\/strong> role\u2014not Assistant.<\/p>\n<p><strong>SFT format<\/strong> (answer in the assistant message):<\/p>\n<pre><code class=\"language-json\">{\r\n  \"messages\": [\r\n    { \"role\": \"system\", \"content\": \"Reply to the user's question as accurately as possible.\" },\r\n    { \"role\": \"user\", \"content\": \"Question: What is the capital of France?\" },\r\n    { \"role\": \"assistant\", \"content\": \"Paris\" }\r\n  ]\r\n}<\/code><\/pre>\n<p><strong>RFT format<\/strong> (answer moved to a top-level key for the grader):<\/p>\n<pre><code class=\"language-json\">{\r\n  \"messages\": [\r\n    { \"role\": \"developer\", \"content\": \"Reply to the user's question as accurately as possible.\" },\r\n    { \"role\": \"user\", \"content\": \"Question: What is the capital of France?\" }\r\n  ],\r\n  \"reference_answer\": \"Paris\"\r\n}<\/code><\/pre>\n<p>The <code>reference_answer<\/code> (or any custom top-level key) can be referenced in the grader as <code>item.reference_answer<\/code>.<\/p>\n<hr \/>\n<h2>Common Pitfalls<\/h2>\n<h3>Data and Grader Mismatch<\/h3>\n<p>Every key referenced in your grader (e.g., <code>item.reference_answer<\/code>) must exist in <strong>all<\/strong> data rows. 
If your grader references <code>item.capital<\/code> but your data uses <code>reference_answer<\/code>, the job will fail silently or score incorrectly.<\/p>\n<p><strong>Example of a mismatched grader:<\/strong><\/p>\n<pre><code class=\"language-json\">{\r\n  \"type\": \"string_check\",\r\n  \"name\": \"answer-check\",\r\n  \"input\": \"{{sample.output_text}}\",\r\n  \"reference\": \"{{item.capital}}\",\r\n  \"operation\": \"eq\"\r\n}<\/code><\/pre>\n<p>If your data uses <code>reference_answer<\/code> instead of <code>capital<\/code>, update the grader reference to <code>{{item.reference_answer}}<\/code>.<\/p>\n<h3>Missing Response Format<\/h3>\n<p>To reference <code>sample.output_json<\/code> in your grader, you must provide a response format in the job definition. Without it, the model outputs free-form text and JSON grader references will fail.<\/p>\n<pre><code class=\"language-json\">{\r\n  \"type\": \"json_schema\",\r\n  \"json_schema\": {\r\n    \"name\": \"response\",\r\n    \"strict\": true,\r\n    \"schema\": {\r\n      \"properties\": {\r\n        \"capital\": { \"title\": \"Capital\", \"type\": \"string\" },\r\n        \"population\": { \"title\": \"Population\", \"type\": \"string\" }\r\n      },\r\n      \"title\": \"CapitalData\",\r\n      \"type\": \"object\",\r\n      \"additionalProperties\": false\r\n    }\r\n  }\r\n}<\/code><\/pre>\n<hr \/>\n<h2>Advanced: Agentic RFT Scenarios<\/h2>\n<h3>Tool Design<\/h3>\n<p>Treat tools as part of the environment, not passive helpers. Build tools that reflect the <strong>full decision-making cycle<\/strong> your task requires\u2014not just the final action. For example, an automatic escalation workflow shouldn&#8217;t only have a tool to trigger escalation; it also needs a tool to check recipient availability first. 
Without that step, the model never learns <em>when<\/em> escalation is appropriate.<\/p>\n<p>Design for training-scale traffic: set timeouts and rate limits, add tracing (latency + error codes), and plan retry behavior so that slow calls don&#8217;t cascade into a retry storm.<\/p>\n<h3>MCP Server Integration<\/h3>\n<p>RFT supports tool use through function-calling, but MCP is the preferred approach for production agentic systems. Implement each tool once, then expose it two ways\u2014via an MCP interface for MCP-native clients and via a function-calling-compatible interface for fine-tuning. This lets you seamlessly integrate with Agents, Evaluations, and Reinforcement Fine-Tuning on the Foundry platform.<\/p>\n<div class=\"d-flex\"><a class=\"cta_button_link btn-secondary\" href=\"https:\/\/github.com\/microsoft-foundry\/fine-tuning\/tree\/main\/Demos\/Agentic_RFT_PrivatePreview\" target=\"_blank\" rel=\"noopener\">Explore\u00a0agentic\u00a0RFT\u00a0samples<\/a><\/div>\n<h3>Monitor for Reward Hacking<\/h3>\n<p>Don&#8217;t wait for final scores\u2014inspect outputs and evaluation metrics throughout training using the <strong>Metrics tab<\/strong> on the fine-tuning job detail page in Foundry.<\/p>\n<p>Signs of reward hacking:<\/p>\n<ul>\n<li>Eval scores improve while visible output quality degrades.<\/li>\n<li>The model produces responses that &#8220;match&#8221; the grader without performing the intended behavior (e.g., a semantically incorrect tool call that still passes pattern checks).<\/li>\n<\/ul>\n<p>Mitigations:<\/p>\n<ul>\n<li>Use <strong>held-out evaluation sets<\/strong> with diverse, real-world inputs.<\/li>\n<li>Give partial credit across multiple dimensions (outcome, tool use, safety).<\/li>\n<li>Explicitly require critical intermediate steps (e.g., lookups before writes).<\/li>\n<li>Keep grading deterministic so improvements reflect policy changes, not grader noise.<\/li>\n<\/ul>\n<hr \/>\n<h2>What&#8217;s Next<\/h2>\n<ul>\n<li>Read the full <a 
href=\"https:\/\/github.com\/microsoft-foundry\/fine-tuning\/blob\/main\/Demos\/Agentic_RFT_PrivatePreview\/RFT_Best_Practice.md\">RFT Best Practices guide<\/a> on GitHub.<\/li>\n<li>Explore the <a href=\"https:\/\/github.com\/microsoft-foundry\/fine-tuning\/tree\/main\/Demos\">fine-tuning code samples<\/a> for end-to-end workflows.<\/li>\n<li>Review the <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-foundry\/openai\/how-to\/reinforcement-fine-tuning\">Reinforcement Fine-Tuning how-to<\/a> in Microsoft Learn.<\/li>\n<\/ul>\n<div class=\"d-flex\"><a class=\"cta_button_link btn-secondary\" href=\"https:\/\/aka.ms\/foundrydevs\" target=\"_blank\" rel=\"noopener\">Join\u00a0the\u00a0Community<\/a><\/div>\n<h2>Conclusion<\/h2>\n<p>This month&#8217;s RFT updates work together: Global Training for o4-mini lowers your training costs across regions, new GPT-4.1 model graders give you richer reward signals for complex evaluation tasks, and the best practices guide helps you avoid common pitfalls from day one. 
Start small with a handful of scored examples and a simple grader, validate your setup, and scale from there.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>April 2026 brings three major Reinforcement Fine-Tuning updates: Global Training for o4-mini with lower per-token rates across 13+ regions, new GPT-4.1 model graders for richer reward signals, and a comprehensive RFT best practices guide to help you ship specialized models faster.<\/p>\n","protected":false},"author":210936,"featured_media":1563,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[71,125,2,127,124,123,126],"class_list":["post-2122","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-foundry","tag-fine-tuning","tag-global-training","tag-microsoft-foundry","tag-model-graders","tag-o4-mini","tag-reinforcement-fine-tuning","tag-rft"],"acf":[],"blog_post_summary":"<p>April 2026 brings three major Reinforcement Fine-Tuning updates: Global Training for o4-mini with lower per-token rates across 13+ regions, new GPT-4.1 model graders for richer reward signals, and a comprehensive RFT best practices guide to help you ship specialized models 
faster.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/210936"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=2122"}],"version-history":[{"count":1,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2122\/revisions"}],"predecessor-version":[{"id":2124,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/2122\/revisions\/2124"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/1563"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=2122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=2122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=2122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}