{"id":268,"date":"2025-11-20T08:00:21","date_gmt":"2025-11-20T16:00:21","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=268"},"modified":"2025-11-19T17:25:23","modified_gmt":"2025-11-20T01:25:23","slug":"azure-openai-gpt4o-audio-models-developer-guide","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/azure-openai-gpt4o-audio-models-developer-guide\/","title":{"rendered":"Introducing GPT-4o Audio Models in Microsoft Foundry: A Practical Guide for Developers"},"content":{"rendered":"<p>We&#8217;re excited to announce the availability of OpenAI&#8217;s latest GPT-4o audio models (<strong>GPT-4o-Transcribe<\/strong>, <strong>GPT-4o-Mini-Transcribe<\/strong>, and <strong>GPT-4o-Mini-TTS<\/strong>) in Microsoft Foundry Models. This guide walks developers through the key concepts and setup steps for using these audio capabilities in their applications.<\/p>\n<h2>What&#8217;s New in OpenAI&#8217;s Audio Models?<\/h2>\n<p>Azure OpenAI introduces three powerful new audio models:<\/p>\n<ul>\n<li><strong>GPT-4o-Transcribe<\/strong> and <strong>GPT-4o-Mini-Transcribe<\/strong>: Speech-to-text models that outperform previous-generation models on speech recognition benchmarks.<\/li>\n<li><strong>GPT-4o-Mini-TTS<\/strong>: A customizable text-to-speech model that accepts detailed instructions on how the speech should sound.<\/li>\n<\/ul>\n<h2>Model Comparison<\/h2>\n<table>\n<thead>\n<tr>\n<th>Feature<\/th>\n<th>GPT-4o-Transcribe<\/th>\n<th>GPT-4o-Mini-Transcribe<\/th>\n<th>GPT-4o-Mini-TTS<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Performance<\/strong><\/td>\n<td>Best Quality<\/td>\n<td>Great Quality<\/td>\n<td>Best Quality<\/td>\n<\/tr>\n<tr>\n<td><strong>Speed<\/strong><\/td>\n<td>Fast<\/td>\n<td>Fastest<\/td>\n<td>Fastest<\/td>\n<\/tr>\n<tr>\n<td><strong>Input<\/strong><\/td>\n<td>Text, Audio<\/td>\n<td>Text, 
Audio<\/td>\n<td>Text<\/td>\n<\/tr>\n<tr>\n<td><strong>Output<\/strong><\/td>\n<td>Text<\/td>\n<td>Text<\/td>\n<td>Audio<\/td>\n<\/tr>\n<tr>\n<td><strong>Streaming<\/strong><\/td>\n<td>\u2705<\/td>\n<td>\u2705<\/td>\n<td>\u2705<\/td>\n<\/tr>\n<tr>\n<td><strong>Ideal Use Cases<\/strong><\/td>\n<td>Accurate transcription in challenging settings, such as customer call centers, and for automated meeting notes<\/td>\n<td>Rapid transcription for live captioning, quick-response apps, and budget-sensitive scenarios<\/td>\n<td>Customizable interactive voice outputs for chatbots, virtual assistants, accessibility tools, and educational apps<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Technical Innovations<\/h2>\n<h3>Targeted Audio Pretraining<\/h3>\n<p>OpenAI&#8217;s GPT-4o audio models are pretrained extensively on specialized audio datasets, which significantly improves their understanding of speech nuances.<\/p>\n<h3>Advanced Distillation Techniques<\/h3>\n<p>Distillation transfers knowledge from larger models to smaller, more efficient ones while preserving high performance.<\/p>\n<h3>Reinforcement Learning<\/h3>\n<p>Reinforcement learning (RL) improves transcription accuracy and reduces misrecognition, achieving state-of-the-art performance on complex speech recognition tasks.<\/p>\n<h2>Getting Started Guide for Developers<\/h2>\n<h3>Step 1: Set Up Azure OpenAI Environment<\/h3>\n<ul>\n<li>Obtain your Azure OpenAI endpoint and API key.<\/li>\n<li>Authenticate with the Azure CLI:<\/li>\n<\/ul>\n<pre><code class=\"language-bash\">az login<\/code><\/pre>\n<h3>Step 2: Configure Project Environment<\/h3>\n<ul>\n<li>Create a <code>.env<\/code> file with your Azure credentials:<\/li>\n<\/ul>\n<pre><code class=\"language-bash\">AZURE_OPENAI_ENDPOINT=\"your-endpoint-url\"\r\nAZURE_OPENAI_API_KEY=\"your-api-key\"\r\nAZURE_OPENAI_API_VERSION=\"2025-04-14\"<\/code><\/pre>\n<h3>Step 3: Install Dependencies<\/h3>\n<ul>\n<li>Set up your virtual environment and 
install essential libraries:<\/li>\n<\/ul>\n<pre><code class=\"language-bash\">uv venv\r\nsource .venv\/bin\/activate  # macOS\/Linux\r\n.venv\\Scripts\\activate     # Windows\r\nuv pip install openai python-dotenv gradio aiohttp<\/code><\/pre>\n<h3>Step 4: Deploy and Test Using Gradio<\/h3>\n<ul>\n<li>Run your Gradio app to experiment with audio streaming:<\/li>\n<\/ul>\n<pre><code class=\"language-bash\">python your_gradio_app.py<\/code><\/pre>\n<h2>Developer Impact<\/h2>\n<p>Integrating Azure OpenAI GPT-4o audio models allows developers to:<\/p>\n<ul>\n<li>Easily incorporate advanced transcription and TTS functionality.<\/li>\n<li>Create highly interactive, intuitive voice-driven applications.<\/li>\n<li>Enhance the user experience with customizable and expressive audio interactions.<\/li>\n<\/ul>\n<h2>Further Exploration<\/h2>\n<ul>\n<li><a href=\"https:\/\/nick.fm\">Explore GPT-4o Audio Models on Nick.FM<\/a><\/li>\n<li><a href=\"https:\/\/azure.microsoft.com\/services\/openai\">Detailed Azure OpenAI Service Documentation<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/azure-ai-foundry\">Quickstart with Azure AI Foundry<\/a><\/li>\n<\/ul>\n<p>We encourage developers to try these audio models and share their insights and feedback!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to get started with Azure OpenAI&#8217;s next-generation GPT-4o audio models for transcription and text-to-speech applications.<\/p>\n","protected":false},"author":187629,"featured_media":1563,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[12,22,19,21,20],"class_list":["post-268","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-foundry","tag-azure-openai","tag-developer-guide","tag-gpt-4o","tag-transcription","tag-tts"],"acf":[],"blog_post_summary":"<p>How to get 
started with Azure OpenAI&#8217;s next-generation GPT-4o audio models for transcription and text-to-speech applications.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/268","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/187629"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=268"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/268\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/1563"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=268"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=268"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=268"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}