{"id":1394,"date":"2025-10-03T06:47:00","date_gmt":"2025-10-03T06:47:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/all-things-azure\/?p=1394"},"modified":"2025-10-03T06:47:00","modified_gmt":"2025-10-03T06:47:00","slug":"from-lab-to-live-a-blueprint-for-a-voice-powered-ai-sales-coach","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/all-things-azure\/from-lab-to-live-a-blueprint-for-a-voice-powered-ai-sales-coach\/","title":{"rendered":"From Lab to Live: A Blueprint for a Voice-Powered AI Sales Coach"},"content":{"rendered":"<div class=\"markdown-heading\" dir=\"auto\">\n<p>Until recently, building real-time voice AI for production was challenging. Developers faced hurdles such as managing audio streams and ensuring low-latency performance.<\/p>\n<\/div>\n<p dir=\"auto\">That landscape is changing. The <a href=\"https:\/\/techcommunity.microsoft.com\/blog\/azure-ai-foundry-blog\/upgrade-your-voice-agent-with-azure-ai-voice-live-api\/4458247\">general availability of the Azure Voice Live API<\/a> marks a turning point, providing a unified abstraction layer that simplifies the development of real-time voice and avatar experiences. This shift inspired me to build a reference implementation called the AI Sales Coach to demonstrate how these new capabilities can be applied to solve a practical business challenge: skill development.<\/p>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\">A Practical Application: The AI Sales Coach<\/h3>\n<\/div>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/prepre.png\"><img decoding=\"async\" class=\"wp-image-1422 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/prepre.png\" alt=\"prepre image\" width=\"775\" height=\"436\" srcset=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/prepre.png 1017w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/prepre-300x169.png 300w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/prepre-768x432.png 768w\" sizes=\"(max-width: 775px) 100vw, 775px\" \/><\/a><\/p>\n<p dir=\"auto\">Sales training is a universal need, making it an ideal use case. The AI Sales Coach application simulates sales conversations, allowing a user to select a scenario and engage with an AI-powered virtual customer, complete with a voice and a lifelike avatar. A sales professional can practice their pitch, handle objections, and navigate a realistic dialogue. Once the simulation ends, the application provides a performance analysis, turning the experience into a powerful learning tool.<\/p>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"article-editor-heading\">Session Configuration<\/h3>\n<p class=\"article-editor-paragraph\">When the connection to Voice Live API is opened, the backend constructs a <strong>session configuration message<\/strong>. 
This sets up modalities, voice, avatar, and audio settings.<\/p>\n<pre class=\"article-editor-code-block\"><code>def _build_session_config(self) -&gt; Dict[str, Any]:\r\n    \"\"\"Build the base session configuration.\"\"\"\r\n    return {\r\n        \"type\": \"session.update\",\r\n        \"session\": {\r\n            \"modalities\": [\"text\", \"audio\"],\r\n            \"turn_detection\": {\"type\": \"azure_semantic_vad\"},\r\n            \"input_audio_noise_reduction\": {\"type\": \"azure_deep_noise_suppression\"},\r\n            \"input_audio_echo_cancellation\": {\"type\": \"server_echo_cancellation\"},\r\n            \"avatar\": {\r\n                \"character\": \"lisa\",\r\n                \"style\": \"casual-sitting\",\r\n            },\r\n            \"voice\": {\r\n                \"name\": config[\"azure_voice_name\"],\r\n                \"type\": config[\"azure_voice_type\"],\r\n            },\r\n        },\r\n    }<\/code><\/pre>\n<p class=\"article-editor-paragraph\">This message is sent as the first step in establishing the conversation; a sketch of that first exchange follows the list below.<\/p>\n<ul class=\"article-editor-bullet-list\">\n<li class=\"article-editor-list-item\">\n<p class=\"article-editor-paragraph\"><strong>modalities<\/strong> define which response channels are active (here, both text and audio).<\/p>\n<\/li>\n<li class=\"article-editor-list-item\">\n<p class=\"article-editor-paragraph\"><strong>turn detection<\/strong> specifies how the system decides when the speaker has finished.<\/p>\n<\/li>\n<li class=\"article-editor-list-item\">\n<p class=\"article-editor-paragraph\"><strong>noise reduction<\/strong> and <strong>echo cancellation<\/strong> improve audio quality.<\/p>\n<\/li>\n<li class=\"article-editor-list-item\">\n<p class=\"article-editor-paragraph\"><strong>avatar<\/strong> and <strong>voice<\/strong> personalize the agent\u2019s presence.<\/p>\n<\/li>\n<\/ul>
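<p class=\"article-editor-paragraph\">To make that first step concrete, here is a minimal sketch of opening the WebSocket and sending the configuration from Python. It assumes the <code>websockets<\/code> package; the endpoint URL, api-version, and auth header are illustrative placeholders to verify against the Voice Live documentation, not values taken from the repository.<\/p>\n<pre class=\"article-editor-code-block\"><code>import asyncio\r\nimport json\r\n\r\nimport websockets  # pip install websockets\r\n\r\n# Placeholder endpoint; confirm the host, path, and api-version in the docs.\r\nVOICE_LIVE_URL = (\r\n    \"wss:\/\/&lt;your-resource&gt;.cognitiveservices.azure.com\/voice-live\/realtime\"\r\n    \"?api-version=2025-05-01-preview&amp;model=gpt-4o\"\r\n)\r\n\r\nasync def open_session(session_config: dict, api_key: str) -&gt; None:\r\n    # Header name is an assumption; older websockets releases use extra_headers.\r\n    async with websockets.connect(\r\n        VOICE_LIVE_URL, additional_headers={\"api-key\": api_key}\r\n    ) as ws:\r\n        # The session.update payload is the first message on the wire.\r\n        await ws.send(json.dumps(session_config))\r\n        async for raw in ws:\r\n            print(json.loads(raw).get(\"type\"))  # e.g. session.updated\r\n\r\n# asyncio.run(open_session(config_message, api_key))<\/code><\/pre>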
<h3 class=\"article-editor-heading\">Adding Local Agent Behavior<\/h3>\n<p class=\"article-editor-paragraph\">The <strong>behavior<\/strong> of the agent is not controlled by session settings alone. Instead, the backend injects the <strong>model, instructions, temperature, and max tokens<\/strong> into the session.<\/p>\n<pre class=\"article-editor-code-block\"><code>def _add_local_agent_config(self, config_message: Dict[str, Any], agent_config: Dict[str, Any]) -&gt; None:\r\n    \"\"\"Add local agent configuration to session config.\"\"\"\r\n    session = config_message[\"session\"]\r\n    session[\"model\"] = agent_config.get(\"model\", config[\"model_deployment_name\"])\r\n    session[\"instructions\"] = agent_config[\"instructions\"]\r\n    session[\"temperature\"] = agent_config[\"temperature\"]\r\n    session[\"max_response_output_tokens\"] = agent_config[\"max_tokens\"]<\/code><\/pre>\n<p class=\"article-editor-paragraph\">With this, we can dynamically swap out models (e.g., GPT-4o vs. GPT-5), adjust creativity with <strong>temperature<\/strong>, and add <strong>prompt instructions<\/strong> that define how the AI should behave.<\/p>\n<h3 class=\"article-editor-heading\">The Instructions<\/h3>\n<p class=\"article-editor-paragraph\">The instructions enforce conversational realism, ensuring the AI responds like a real person in a role-play setting.<\/p>\n<pre class=\"article-editor-code-block\"><code>BASE_INSTRUCTIONS = \"\"\"\r\nCRITICAL INTERACTION GUIDELINES:\r\n- Keep responses SHORT and conversational (3 sentences max, as if speaking on phone)\r\n- ALWAYS stay in character, never break role\r\n- Simulate natural human speech patterns with pauses, um, well, occasional hesitation\r\n- Respond as a real person would in this business context\r\n- Use natural phone conversation style, direct but personable\r\n- Show genuine emotions and reactions appropriate to the situation\r\n- Ask follow-up questions to keep the conversation flowing naturally\r\n- Avoid overly formal or robotic language\r\n\"\"\"<\/code><\/pre>\n<p class=\"article-editor-paragraph\">These are appended to the role-play scenario instructions to guide the AI toward natural, in-character responses.<\/p>
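<p class=\"article-editor-paragraph\">Putting the pieces together, here is a short sketch of how a scenario could be wired up. The scenario text and <code>agent_config<\/code> values are invented for illustration (the repository ships its own scenarios), and <code>handler<\/code> stands in for the backend object that owns the two methods above.<\/p>\n<pre class=\"article-editor-code-block\"><code># Hypothetical scenario prompt; the real app loads one per selected scenario.\r\nSCENARIO_INSTRUCTIONS = \"You are a skeptical IT director listening to a CRM pitch.\"\r\n\r\nagent_config = {\r\n    \"model\": \"gpt-4o\",  # swap the reasoning engine without touching the app\r\n    \"instructions\": SCENARIO_INSTRUCTIONS + BASE_INSTRUCTIONS,\r\n    \"temperature\": 0.8,  # higher values give more varied replies\r\n    \"max_tokens\": 300,  # keeps spoken turns short\r\n}\r\n\r\nconfig_message = handler._build_session_config()\r\nhandler._add_local_agent_config(config_message, agent_config)\r\n# config_message is now a complete session.update payload for the API.<\/code><\/pre>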
<\/div>\n<div class=\"markdown-heading\" dir=\"auto\">\n<hr class=\"article-editor-horizontal-rule\" \/>\n<h3 class=\"heading-element\" dir=\"auto\">Feedback System<\/h3>\n<\/div>\n<p dir=\"auto\">At the end of the conversation, the full transcript is sent to a GPT-4o model for evaluation. This &#8220;LLM-as-judge&#8221; pattern allows us to display a detailed scorecard covering key competencies.<\/p>\n<p dir=\"auto\"><a href=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy.png\"><img decoding=\"async\" class=\"alignnone wp-image-1421\" src=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy-1024x747.png\" alt=\"feedy image\" width=\"721\" height=\"526\" srcset=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy-1024x747.png 1024w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy-300x219.png 300w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy-768x561.png 768w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/feedy.png 1229w\" sizes=\"(max-width: 721px) 100vw, 721px\" \/><\/a><\/p>
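<p dir=\"auto\">The full rubric lives in the repository; the sketch below only illustrates the pattern, using the <code>AzureOpenAI<\/code> client from the <code>openai<\/code> package. The deployment name, API version, and competency list are placeholders rather than the project&#8217;s actual values.<\/p>\n<pre class=\"article-editor-code-block\"><code>from openai import AzureOpenAI  # pip install openai\r\n\r\nclient = AzureOpenAI(\r\n    api_key=\"&lt;key&gt;\",\r\n    api_version=\"2024-06-01\",  # placeholder; match your resource\r\n    azure_endpoint=\"https:\/\/&lt;your-resource&gt;.openai.azure.com\",\r\n)\r\n\r\nJUDGE_PROMPT = \"\"\"You are a sales-coaching evaluator.\r\nScore the salesperson from 1 to 5 on discovery, objection handling,\r\nclarity, and closing. Return JSON: {\"scores\": {...}, \"summary\": \"...\"}\"\"\"\r\n\r\ndef evaluate(transcript: str) -&gt; str:\r\n    # One non-streaming call over the finished transcript.\r\n    response = client.chat.completions.create(\r\n        model=\"gpt-4o\",  # your Azure OpenAI deployment name\r\n        messages=[\r\n            {\"role\": \"system\", \"content\": JUDGE_PROMPT},\r\n            {\"role\": \"user\", \"content\": transcript},\r\n        ],\r\n        response_format={\"type\": \"json_object\"},\r\n    )\r\n    return response.choices[0].message.content<\/code><\/pre>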
<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\">The Manifestation of AI<\/h3>\n<\/div>\n<p dir=\"auto\">A key feature of the sample implementation is the avatar. This is not a gimmick; it addresses a fundamental design question we must now ask about AI: <strong>How should it manifest?<\/strong><\/p>\n<p dir=\"auto\">In a sales simulation, giving the AI a face makes the interaction more personal and realistic, improving the training&#8217;s effectiveness. However, an avatar is not always the right choice. In a high-stress customer support scenario, it might be distracting or inappropriate. This project highlights the need to be intentional about the &#8220;body&#8221; we give our AI, tailoring its manifestation to the specific use case and the user&#8217;s emotional state.<\/p>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\">The Technical Architecture<\/h3>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/archi.png\"><img decoding=\"async\" class=\"alignnone wp-image-1423\" src=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/archi.png\" alt=\"archi image\" width=\"532\" height=\"571\" srcset=\"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/archi.png 931w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/archi-279x300.png 279w, https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-content\/uploads\/sites\/83\/2025\/09\/archi-768x825.png 768w\" sizes=\"(max-width: 532px) 100vw, 532px\" \/><\/a><\/p>\n<\/div>\n<p dir=\"auto\">The Azure Voice Live API is the core of the system, handling the real-time, speech-to-speech conversation and avatar simulation. It serves as an abstraction layer, allowing different voice and language models to be used as the underlying reasoning engine without rewriting the application.<\/p>\n<hr class=\"article-editor-horizontal-rule\" \/>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\">Final Thoughts<\/h3>\n<\/div>\n<p dir=\"auto\">We have moved from text-based assistants to fully interactive, voice-driven experiences that can collaborate in increasingly human-like ways. The technology to build these systems is no longer a future concept; it&#8217;s here and ready to be deployed.<\/p>\n<p dir=\"auto\">This project is a demonstration of what is now possible. I encourage you to explore the repository, envision how these capabilities could be used in your own organization, and start building. For a deeper dive into the Azure Voice Live API, check the <a class=\"decorated-link\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/speech-service\/voice-live\" target=\"_blank\" rel=\"noopener\">official documentation<\/a>.<\/p>\n<p dir=\"auto\">The full code for this technology demonstrator is available on GitHub. You can deploy it to your own Azure subscription in minutes using the Azure Developer CLI (<code>azd up<\/code>).<\/p>\n<p dir=\"auto\"><strong>GitHub Repository:<\/strong>\u00a0<a href=\"https:\/\/github.com\/Azure-Samples\/voicelive-api-salescoach\">https:\/\/github.com\/Azure-Samples\/voicelive-api-salescoach<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Until recently, building real-time voice AI for production was challenging. Developers faced hurdles such as managing audio streams and ensuring low-latency performance. That landscape is changing. The general availability of the Azure Voice Live API marks a turning point, providing a unified abstraction layer that simplifies the development of real-time voice and avatar experiences. This [&hellip;]<\/p>\n","protected":false},"author":199347,"featured_media":1743,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1394","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure"],"acf":[],"blog_post_summary":"<p>Until recently, building real-time voice AI for production was challenging. Developers faced hurdles such as managing audio streams and ensuring low-latency performance. That landscape is changing. The general availability of the Azure Voice Live API marks a turning point, providing a unified abstraction layer that simplifies the development of real-time voice and avatar experiences. 
This [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/1394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/users\/199347"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/comments?post=1394"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/1394\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media\/1743"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media?parent=1394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/categories?post=1394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/tags?post=1394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}