{"id":226,"date":"2025-04-16T09:00:56","date_gmt":"2025-04-16T16:00:56","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/foundry\/?p=226"},"modified":"2025-04-17T16:02:05","modified_gmt":"2025-04-17T23:02:05","slug":"get-started-azure-openai-advanced-audio-models","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/foundry\/get-started-azure-openai-advanced-audio-models\/","title":{"rendered":"Get started with the new Advanced Audio Models in Azure OpenAI Service"},"content":{"rendered":"<p><figure id=\"attachment_227\" aria-labelledby=\"figcaption_attachment_227\" class=\"wp-caption aligncenter\" ><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image.png\"><img decoding=\"async\" class=\"size-large wp-image-227\" src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image-1024x576.png\" alt=\"Public Preview of Azure OpenAI Audio models\" width=\"1024\" height=\"576\" srcset=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image-1024x576.png 1024w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image-300x169.png 300w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image-768x432.png 768w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image-1536x864.png 1536w, https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/image.png 1600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption id=\"figcaption_attachment_227\" class=\"wp-caption-text\">Announcing the public preview of Azure OpenAI Audio models<\/figcaption><\/figure><\/p>\n<p><audio class=\"wp-audio-shortcode\" id=\"audio-226-1\" preload=\"none\" style=\"width: 100%;\" controls=\"controls\"><source type=\"audio\/mpeg\" 
src=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/aoai-audio-tts-blog-sample.mp3?_=1\" \/><a href=\"https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/aoai-audio-tts-blog-sample.mp3\">https:\/\/devblogs.microsoft.com\/foundry\/wp-content\/uploads\/sites\/89\/2025\/04\/aoai-audio-tts-blog-sample.mp3<\/a><\/audio><\/p>\n<p>We&#8217;re excited to announce the preview availability of Azure OpenAI&#8217;s advanced audio models\u2014<strong>GPT-4o-Transcribe<\/strong>, <strong>GPT-4o-Mini-Transcribe<\/strong>, and <strong>GPT-4o-Mini-TTS<\/strong>. This guide walks developers through what each model offers and the steps to start using these audio capabilities in their applications.<\/p>\n<h2>What&#8217;s New in Azure OpenAI Audio Models?<\/h2>\n<p>Azure OpenAI introduces three powerful new audio models, available for deployment today in East US 2 on Azure AI Foundry.<\/p>\n<ul>\n<li><strong>GPT-4o-Transcribe<\/strong> and <strong>GPT-4o-Mini-Transcribe<\/strong>: Speech-to-text models that outperform previous models on transcription benchmarks.<\/li>\n<li><strong>GPT-4o-Mini-TTS<\/strong>: A customizable text-to-speech model that can be steered with detailed instructions on speech characteristics.<\/li>\n<\/ul>\n<h2>Model Comparison<\/h2>\n<table>\n<thead>\n<tr>\n<th>Feature<\/th>\n<th>GPT-4o-Transcribe<\/th>\n<th>GPT-4o-Mini-Transcribe<\/th>\n<th>GPT-4o-Mini-TTS<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>Performance<\/strong><\/td>\n<td>Best Quality<\/td>\n<td>Great Quality<\/td>\n<td>Best Quality<\/td>\n<\/tr>\n<tr>\n<td><strong>Speed<\/strong><\/td>\n<td>Fast<\/td>\n<td>Fastest<\/td>\n<td>Fastest<\/td>\n<\/tr>\n<tr>\n<td><strong>Input<\/strong><\/td>\n<td>Text, Audio<\/td>\n<td>Text, 
Audio<\/td>\n<td>Text<\/td>\n<\/tr>\n<tr>\n<td><strong>Output<\/strong><\/td>\n<td>Text<\/td>\n<td>Text<\/td>\n<td>Audio<\/td>\n<\/tr>\n<tr>\n<td><strong>Streaming<\/strong><\/td>\n<td>\u2705<\/td>\n<td>\u2705<\/td>\n<td>\u2705<\/td>\n<\/tr>\n<tr>\n<td><strong>Ideal Use Cases<\/strong><\/td>\n<td>Accurate transcription for challenging environments like customer call centers and automated meeting notes<\/td>\n<td>Rapid transcription for live captioning, quick-response apps, and budget-sensitive scenarios<\/td>\n<td>Customizable interactive voice outputs for chatbots, virtual assistants, accessibility tools, and educational apps<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Technical Innovations<\/h2>\n<ul>\n<li><strong>Targeted Audio Pretraining<\/strong>: OpenAI&#8217;s GPT-4o audio models are extensively pretrained on specialized audio datasets, significantly improving their understanding of speech nuances.<\/li>\n<li><strong>Advanced Distillation Techniques<\/strong>: Sophisticated distillation methods transfer knowledge from larger models to smaller, more efficient ones while preserving high performance.<\/li>\n<li><strong>Reinforcement Learning<\/strong>: Integrated RL techniques dramatically improve transcription accuracy and reduce misrecognitions, giving the speech-to-text models state-of-the-art performance on complex speech recognition tasks.<\/li>\n<\/ul>\n<h2>Getting Started Guide for Developers<\/h2>\n<p>Use the <a href=\"https:\/\/github.com\/Azure-Samples\/azure-openai-tts-demo\" target=\"_blank\" rel=\"noopener\">Azure OpenAI TTS Demo repository<\/a> to explore GPT\u20114o audio models through practical, hands\u2011on examples.<\/p>\n<p><div  class=\"d-flex justify-content-left\"><a class=\"cta_button_link btn-primary mb-24\" href=\"https:\/\/github.com\/Azure-Samples\/azure-openai-tts-demo\" target=\"_blank\">Get started<\/a><\/div><\/p>\n<h3>Step\u00a01: Clone the Repository<\/h3>\n<pre><code 
class=\"language-bash\">git clone https:\/\/github.com\/Azure-Samples\/azure-openai-tts-demo.git\r\ncd azure-openai-tts-demo\r\n<\/code><\/pre>\n<h3>Step\u00a02: Configure Your Environment<\/h3>\n<p>Create your virtual environment and install dependencies:<\/p>\n<pre><code class=\"language-bash\">python -m venv .venv\r\nsource .venv\/bin\/activate  # macOS\/Linux\r\n.venv\\Scripts\\activate     # Windows\r\npip install -r requirements.txt\r\n<\/code><\/pre>\n<p>Set up your Azure credentials by creating a <code>.env<\/code> file:<\/p>\n<pre><code class=\"language-bash\">cp .env.example .env\r\n# Edit .env with your Azure OpenAI endpoint and API key\r\n<\/code><\/pre>\n<p><strong>Example <code>.env<\/code>:<\/strong><\/p>\n<pre><code class=\"language-env\">AZURE_OPENAI_ENDPOINT=\"https:\/\/&lt;your-resource-name&gt;.openai.azure.com\/\"\r\nAZURE_OPENAI_API_KEY=\"your-azure-openai-api-key\"\r\nAZURE_OPENAI_API_VERSION=\"2025-03-01-preview\"\r\n<\/code><\/pre>\n<h3>Step\u00a03: Run the Interactive Gradio Soundboard<\/h3>\n<p>Launch the demo to experiment interactively:<\/p>\n<pre><code class=\"language-bash\">python soundboard.py\r\n<\/code><\/pre>\n<p>Select different voices and vibes, then listen to the generated speech.<\/p>\n<h3>Step\u00a04: Explore Additional Sample Scripts<\/h3>\n<p>Run sample scripts for specific audio tasks:<\/p>\n<ul>\n<li><strong>Streaming audio to a file<\/strong><\/li>\n<\/ul>\n<pre><code class=\"language-bash\">python streaming-tts-to-file-sample.py\r\n<\/code><\/pre>\n<ul>\n<li><strong>Asynchronous streaming and playback<\/strong><\/li>\n<\/ul>\n<pre><code class=\"language-bash\">python async-streaming-tts-sample.py\r\n<\/code><\/pre>\n<h2>Developer Impact<\/h2>\n<p>Integrating Azure OpenAI advanced audio models allows developers to:<\/p>\n<ul>\n<li>Easily incorporate advanced transcription and TTS functionality.<\/li>\n<li>Create highly interactive, intuitive voice-driven applications.<\/li>\n<li>Enhance user experience with customizable and 
expressive audio interactions.<\/li>\n<\/ul>\n<h2>Further Exploration<\/h2>\n<ul>\n<li><a href=\"https:\/\/github.com\/Azure-Samples\/azure-openai-tts-demo\" target=\"_blank\" rel=\"noopener\">Try the code samples<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/openai\/concepts\/models?tabs=global-standard%2Cstandard-chat-completions#audio-models\" target=\"_blank\" rel=\"noopener\">Azure OpenAI Audio models documentation<\/a><\/li>\n<li><a href=\"https:\/\/ai.azure.com\/explore\/models?selectedCollection=aoai\" target=\"_blank\" rel=\"noopener\">Explore the models in Azure AI Foundry<\/a><\/li>\n<\/ul>\n<p>We encourage developers to leverage these innovative audio models and share their insights and feedback!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How to get started with Azure OpenAI&#8217;s next-generation GPT-4o audio models for transcription and text-to-speech applications.<\/p>\n","protected":false},"author":187629,"featured_media":227,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[12,22,19,21,20],"class_list":["post-226","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-microsoft-foundry","tag-azure-openai","tag-developer-guide","tag-gpt-4o","tag-transcription","tag-tts"],"acf":[],"blog_post_summary":"<p>How to get started with Azure OpenAI&#8217;s next-generation GPT-4o audio models for transcription and text-to-speech 
applications.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/226","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/users\/187629"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/comments?post=226"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/posts\/226\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media\/227"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/media?parent=226"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/categories?post=226"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/foundry\/wp-json\/wp\/v2\/tags?post=226"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}