{"id":2803,"date":"2023-08-31T13:27:49","date_gmt":"2023-08-31T20:27:49","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sdk\/?p=2803"},"modified":"2023-08-31T13:27:49","modified_gmt":"2023-08-31T20:27:49","slug":"transcribing-and-translating-with-azure-sdks","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sdk\/transcribing-and-translating-with-azure-sdks\/","title":{"rendered":"Audio Alchemy: Transcribing and Translating with Azure SDKs"},"content":{"rendered":"<p>Azure Cognitive Services offers a wide range of AI-powered services that can be utilized to enhance applications and services. In this article and associated demo project, I&#8217;m using Azure SDK client libraries to transcribe and translate audio files. By applying Azure Cognitive Services, you can easily convert audio files in one language to text in another language.<\/p>\n<h3>Prerequisites<\/h3>\n<p>Before diving into the tutorial, ensure that you have the following prerequisites:<\/p>\n<ol>\n<li>Python 3.6 or higher installed on your system.<\/li>\n<li>An Azure account with access to the Speech Service and Translator Service.<\/li>\n<li>The Azure Cognitive Services Speech library, Azure Cognitive Services Translator library, and <code>python-dotenv<\/code> package installed. You can install them using the following command:<\/li>\n<\/ol>\n<pre><code class=\"language-bash\">   pip install azure-cognitiveservices-speech azure-ai-translation-text python-dotenv<\/code><\/pre>\n<ol start=\"4\">\n<li>Get the demo project with the full script. 
You can clone the repo to your local machine using Git:\n<pre><code class=\"language-bash\">git clone https:\/\/github.com\/mario-guerra\/azure-speech-translator.git<\/code><\/pre>\n<\/li>\n<\/ol>\n<h3>Set up the environment<\/h3>\n<p>To set up your environment, you need the key and region for the Speech Service and the key and endpoint for the Translator Service.<\/p>\n<p>To locate these values in your Azure account, follow these steps:<\/p>\n<ol>\n<li><strong>Sign in to the Azure portal<\/strong>: Visit the <a href=\"https:\/\/portal.azure.com\/\">Azure portal<\/a> and sign in with your Azure account credentials.<\/li>\n<li><strong>Access the Speech Service<\/strong>:\n<ul>\n<li>In the left-hand menu, select &#8220;All services.&#8221;<\/li>\n<li>In the search box, type &#8220;Speech&#8221; and select &#8220;Speech&#8221; from the results.<\/li>\n<li>Choose the Speech Service you have created or create a new one if you haven&#8217;t already.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Retrieve the Speech Service key and region<\/strong>:\n<ul>\n<li>Once you have accessed your Speech Service, navigate to &#8220;Keys and Endpoint&#8221; under the &#8220;Resource Management&#8221; section in the left-hand menu.<\/li>\n<li>Copy the <code>Key1<\/code> or <code>Key2<\/code> value (both are valid) and the &#8220;Location\/Region&#8221; value. 
These values are used as your <code>AZURE_SPEECH_KEY<\/code> and <code>AZURE_SERVICE_REGION<\/code>, respectively.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Access the Translator Service<\/strong>:\n<ul>\n<li>Go back to the &#8220;All services&#8221; menu and search for &#8220;Translator.&#8221;<\/li>\n<li>Select &#8220;Translator&#8221; from the results.<\/li>\n<li>Choose the Translator Service you have created or create a new one if you haven&#8217;t already.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Retrieve the Translator Service key and endpoint<\/strong>:\n<ul>\n<li>Once you have accessed your Translator Service, navigate to &#8220;Keys and Endpoint&#8221; under the &#8220;Resource Management&#8221; section in the left-hand menu.<\/li>\n<li>Copy the <code>Key1<\/code> or <code>Key2<\/code> value (both are valid) and the &#8220;Text Translation&#8221; endpoint value under the &#8220;Web API&#8221; tab. These values are used as your <code>AZURE_TRANSLATOR_KEY<\/code> and <code>AZURE_TRANSLATOR_ENDPOINT<\/code>, respectively.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>Once you have your keys, region, and endpoint, create a <code>.env<\/code> file in the same directory as your script and add the following environment variables:<\/p>\n<pre><code>AZURE_SPEECH_KEY=&lt;your_speech_service_key&gt;\r\nAZURE_SERVICE_REGION=&lt;your_speech_service_region&gt;\r\nAZURE_TRANSLATOR_KEY=&lt;your_translator_service_key&gt;\r\nAZURE_TRANSLATOR_ENDPOINT=&lt;your_translator_service_endpoint&gt;<\/code><\/pre>\n<p>Replace the placeholder values with the appropriate keys and endpoints from your Azure account.<\/p>\n<h3>Audio translation script overview<\/h3>\n<p>The demo project features an audio translation script that processes input audio files in WAV format. 
It transcribes the audio using Azure Speech Service and translates the resulting text into the desired target language with Azure Translator Service.<\/p>\n<p>This powerful tool can be invaluable for various applications, such as language learning, content localization, and accessibility services.<\/p>\n<h3>Customize the audio translation script<\/h3>\n<p>The provided audio translation script can be customized to suit specific requirements. For example, you can modify the script to:<\/p>\n<ul>\n<li>Add support for more languages by updating the <a href=\"https:\/\/learn.microsoft.com\/azure\/cognitive-services\/speech-service\/language-support#speech-to-text\"><code>language_codes<\/code><\/a> and <a href=\"https:\/\/learn.microsoft.com\/azure\/ai-services\/translator\/language-support\"><code>translator_language_codes<\/code><\/a> dictionaries.<\/li>\n<li>Adjust the timeout settings for speech recognition by <a href=\"https:\/\/learn.microsoft.com\/dotnet\/api\/microsoft.cognitiveservices.speech.propertyid?view=azure-dotnet\">modifying the values<\/a> of the <code>SpeechServiceConnection_InitialSilenceTimeoutMs<\/code>, <code>SpeechServiceConnection_EndSilenceTimeoutMs<\/code>, and <code>Speech_SegmentationSilenceTimeoutMs<\/code> properties.<\/li>\n<li>Implement extra error handling and logging to improve the script&#8217;s robustness and maintainability.<\/li>\n<\/ul>\n<h3>Transcribe with continuous recognition vs. one-shot recognition<\/h3>\n<p>In speech recognition, two main approaches can be used: continuous recognition and one-shot recognition. Each has its own use cases and benefits. In my demo audio translation script, I chose continuous recognition for real-time translation, better handling of pauses, and greater flexibility.<\/p>\n<h4>Continuous recognition<\/h4>\n<p>Continuous recognition is a speech recognition approach that processes audio input in real-time and continuously recognizes speech as it is spoken. 
This method is useful when dealing with long audio files or live audio streams, as it provides real-time feedback and can handle pauses or interruptions in speech.<\/p>\n<p>In continuous recognition, the speech recognizer listens for speech and generates results as it recognizes words and phrases. It can also raise events when it recognizes speech, allowing you to perform actions, such as translating the recognized text, as demonstrated in our script.<\/p>\n<p>Here&#8217;s how I set up continuous recognition using the Azure Cognitive Services Speech library in my script:<\/p>\n<ol>\n<li>Configure the Speech Service by creating a <code>SpeechConfig<\/code> object and setting the speech recognition language and other properties:\n<pre><code class=\"language-python\">speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)\r\nspeech_config.speech_recognition_language = speech_recognition_language\r\nspeech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, \"15000\")\r\nspeech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, \"10000\")\r\nspeech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, \"5000\")<\/code><\/pre>\n<\/li>\n<li>Define event handlers for the <code>recognized<\/code> and <code>session_stopped<\/code> events:\n<pre><code class=\"language-python\">def on_recognized(recognition_args, in_lang, out_lang):\r\n    source_text = recognition_args.result.text\r\n    print(f\"Transcribed text: {source_text}\")\r\n\r\n    # Write the transcribed text to the transcription output file if specified\r\n    if cmd_line_args.transcription:\r\n        with open(cmd_line_args.transcription, 'a', encoding='utf-8') as f:\r\n            f.write(f\"{source_text}\\n\")\r\n\r\n    # Translate the transcribed text using the Azure Translator SDK\r\n    try:\r\n        source_language = translator_language_codes[in_lang]\r\n        # Translator 
service supports translation to multiple languages in one pass,\r\n        # so it expects a bracketed list even when translating to only one language.\r\n        target_languages = [translator_language_codes[out_lang]]\r\n        input_text_elements = [ InputTextItem(text = source_text) ]\r\n        response = text_translator.translate(content = input_text_elements, to = target_languages, from_parameter = source_language)\r\n        translation = response[0] if response else None\r\n\r\n        if translation:\r\n            for translated_text in translation.translations:\r\n                print(f\"Translated text: {translated_text.text}\")\r\n                # Write the translated text (not the response object) to the output file\r\n                with open(cmd_line_args.output_file, 'a', encoding='utf-8') as f:\r\n                    f.write(f\"{translated_text.text}\\n\")\r\n\r\n    except HttpResponseError as exception:\r\n        print(f\"Error Code: {exception.error.code}\")\r\n        print(f\"Message: {exception.error.message}\")\r\n\r\ndef on_session_stopped(args):\r\n    print(\"Continuous recognition session stopped.\")\r\n    global session_stopped\r\n    session_stopped = True<\/code><\/pre>\n<\/li>\n<li>Create a <code>SpeechRecognizer<\/code> object using the <code>SpeechConfig<\/code> object and an <code>AudioConfig<\/code> object that specifies the input audio file:\n<pre><code class=\"language-python\">audio_input = speechsdk.audio.AudioConfig(filename=input_audio_file)\r\nspeech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_input)<\/code><\/pre>\n<\/li>\n<li>Connect the event handlers to the corresponding events of the <code>SpeechRecognizer<\/code> object and start the continuous recognition process asynchronously:\n<pre><code class=\"language-python\">speech_recognizer.recognized.connect(lambda recognition_args: on_recognized(recognition_args, cmd_line_args.in_lang, 
cmd_line_args.out_lang))\r\nspeech_recognizer.session_stopped.connect(on_session_stopped)\r\nspeech_recognizer.start_continuous_recognition_async().get()<\/code><\/pre>\n<\/li>\n<li>Wait for the <code>session_stopped<\/code> event to be triggered before proceeding to the next audio file or terminating the script:\n<pre><code class=\"language-python\">while not session_stopped:\r\n    time.sleep(0.5)<\/code><\/pre>\n<\/li>\n<\/ol>\n<p>By following these steps, I&#8217;ve set up continuous recognition using the Azure Cognitive Services Speech library. This approach allows the script to process audio input in real time, handle pauses or interruptions in speech, and perform actions, such as translation, as soon as speech is recognized.<\/p>\n<h4>One-shot recognition<\/h4>\n<p>One-shot recognition, also known as single-utterance recognition, recognizes a single utterance and returns the result once that utterance ends, which the service detects from trailing silence or a maximum audio duration. Any speech after the first utterance isn&#8217;t processed, so this approach is suitable for short audio clips or situations where real-time feedback isn&#8217;t necessary.<\/p>\n<p>To perform one-shot recognition, you would create a <code>SpeechRecognizer<\/code> object, just like in continuous recognition, and then call the <code>recognize_once_async()<\/code> method:<\/p>\n<pre><code class=\"language-python\">result = speech_recognizer.recognize_once_async().get()<\/code><\/pre>\n<p>The recognized text can be accessed using the <code>result.text<\/code> property. One-shot recognition is easier to implement, as it requires only a single function call, but it lacks the real-time feedback and flexibility of continuous recognition.<\/p>\n<h3>Translate the transcriptions<\/h3>\n<p>As each utterance is transcribed, the script translates it into the desired output language using the Azure Translator library. 
In the <code>on_recognized<\/code> event handler, the script performs the translation as follows:<\/p>\n<ol>\n<li>Retrieve the source and target language codes from the <code>translator_language_codes<\/code> dictionary:\n<pre><code class=\"language-python\">source_language = translator_language_codes[in_lang]\r\ntarget_languages = [translator_language_codes[out_lang]]<\/code><\/pre>\n<\/li>\n<li>Create a list of <code>InputTextItem<\/code> objects containing the transcribed text:\n<pre><code class=\"language-python\">input_text_elements = [ InputTextItem(text = source_text) ]<\/code><\/pre>\n<\/li>\n<li>Call the <code>translate<\/code> method of the <code>TextTranslationClient<\/code> object, passing the input text elements, target languages, and source language:\n<pre><code class=\"language-python\">response = text_translator.translate(content = input_text_elements, to = target_languages, from_parameter = source_language)<\/code><\/pre>\n<\/li>\n<li>Process the translation response and write the translated text to the output file:\n<pre><code class=\"language-python\">translation = response[0] if response else None\r\n\r\nif translation:\r\n   for translated_text in translation.translations:\r\n       print(f\"Translated text: {translated_text.text}\")\r\n       # Write the translated text (not the response object) to the output file\r\n       with open(cmd_line_args.output_file, 'a', encoding='utf-8') as f:\r\n           f.write(f\"{translated_text.text}\\n\")<\/code><\/pre>\n<\/li>\n<\/ol>\n<h3>Run the audio translation script<\/h3>\n<p>To run the script, use the following command:<\/p>\n<pre><code class=\"language-bash\">python audio_translation.py --in-lang &lt;input_language&gt; --out-lang &lt;output_language&gt; &lt;input_audio_pattern&gt; &lt;output_file&gt; [--transcription &lt;transcription_output_file&gt;]<\/code><\/pre>\n<p>Replace the placeholders with the appropriate values:<\/p>\n<ul>\n<li><code>&lt;input_language&gt;<\/code>: The input language (currently supported: <code>english, 
spanish, estonian, french, italian, german<\/code>)<\/li>\n<li><code>&lt;output_language&gt;<\/code>: The output language (currently supported: <code>english, spanish, estonian, french, italian, german<\/code>)<\/li>\n<li><code>&lt;input_audio_pattern&gt;<\/code>: The path to the input audio files with a wildcard pattern (for example, .\/*.wav)<\/li>\n<li><code>&lt;output_file&gt;<\/code>: The path to the output file containing the translations<\/li>\n<li><code>&lt;transcription_output_file&gt;<\/code> (optional): The path to the output file containing the transcriptions<\/li>\n<\/ul>\n<p>For example:<\/p>\n<pre><code class=\"language-bash\">python audio_translation.py --in-lang english --out-lang spanish .\/input_audio\/*.wav output.txt --transcription transcription.txt<\/code><\/pre>\n<p>This command transcribes and translates all <code>.wav<\/code> files in the <code>input_audio<\/code> directory from English to Spanish. The translations are saved in <code>output.txt<\/code>, and the transcriptions are saved in <code>transcription.txt<\/code>.<\/p>\n<p>Output:<\/p>\n<pre><code class=\"language-bash\">python .\\azure_translator.py --in-lang spanish --out-lang english '.\\Spanish test.wav' .\\translation.txt\r\nProcessing audio file: .\\Spanish test.wav\r\nTranscribed text: Esta es una prueba del sistema de transmisi\u00f3n de emergencia. Solo es una prueba si esto fuera una emergencia real, estar\u00eda corriendo para salvar mi vida.\r\nTranslated text: This is a test of the emergency transmission system. It's just a test if this was a real emergency, I would be running for my life.\r\nContinuous speech recognition session stopped.<\/code><\/pre>\n<h3>Conclusion<\/h3>\n<p>By following this tutorial and using the provided audio translation script, you can efficiently transcribe and translate audio files into different languages. 
This powerful tool, powered by Azure Cognitive Services, opens up numerous possibilities for language learning, content localization, and accessibility services. With the added customization options and a choice between continuous recognition and one-shot recognition, the script becomes an even more versatile solution for your audio translation needs. I encourage you to explore further and experiment with the script to discover its full potential and adapt it to various use cases.<\/p>\n<p>Happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discover the power of Azure Cognitive Services libraries for audio transcription and translation, enhancing multilingual experiences.<\/p>\n","protected":false},"author":63526,"featured_media":2806,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[915,918,914,917,919,916],"class_list":["post-2803","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sdk","tag-cognitive-services","tag-speech-transcription","tag-speech-translation","tag-text-transcription","tag-transcribing-text","tag-translation"],"acf":[],"blog_post_summary":"<p>Discover the power of Azure Cognitive Services libraries for audio transcription and translation, enhancing multilingual 
experiences.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/2803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/users\/63526"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/comments?post=2803"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/2803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media\/2806"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media?parent=2803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/categories?post=2803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/tags?post=2803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}