{"id":1511,"date":"2021-09-01T09:45:59","date_gmt":"2021-09-01T16:45:59","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sdk\/?p=1511"},"modified":"2021-09-01T09:45:59","modified_gmt":"2021-09-01T16:45:59","slug":"extractive-summarization-preview","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sdk\/extractive-summarization-preview\/","title":{"rendered":"Text Analytics for Extractive Summarization"},"content":{"rendered":"<p><!-- TIPS: - Use `SDK` when talking about all of the client libraries. - Use `Client libraries\/ry` when talking about individual libraries. - Make sure all links do not have Locale, i.e remove `en-us` from all links. - All image links need to start with `.\/images\/posts\/*.png` and need to match exact case of the file. - Avoid using `here` for link text. Use the title of the link\/file. - Please include summary at the end. --><\/p>\n<p>We&#8217;re delighted to announce that Text Analytics now supports extractive summarization!\nIn general, there are two approaches for automatic text summarization: extractive and abstractive.\nThe Text Analytics API provides extractive summarization starting in version 3.2-preview.1<\/p>\n<h2>Extractive Summarization Analysis<\/h2>\n<p>Text Analytics for Extractive Summarization is a feature in Azure Text Analytics that produces a text\nsummary by extracting sentences that collectively represent the most important or relevant information\nwithin the original content. This feature is designed to shorten content that could be considered too\nlong to read. Extractive summarization condenses articles, papers, or documents to key sentences.<\/p>\n<p>Text Analytics for Extractive Summarization supports the following features:<\/p>\n<ul>\n<li>Extracted sentences: These sentences collectively convey the main idea of the document. They are\noriginal sentences extracted from the input document&#8217;s content. 
Each of these extracted sentences has\na rank score, an offset (the position where the sentence starts in the input document), and a length.<\/li>\n<li>Rank score: A rank score indicates how relevant a sentence is to the main\nidea of a document. The model assigns each sentence a score between 0 and 1 (inclusive) and returns\nthe highest-scored sentences per request. For example, if you request a three-sentence summary, the\nservice returns the three highest-scored sentences, provided that the input document has three or\nmore sentences.<\/li>\n<li>Maximum sentences: The maximum number of sentences to return. The default value is three, which means\nthe three sentences with the highest rank scores are returned as the extractive summarization\nanalysis result.<\/li>\n<li>Sorting algorithm: The extracted sentences can be sorted by offset or by rank score. The default\nis to sort by offset. Sorting is applied after the maximum sentence count.\nThat is, the service first selects the sentences with the highest rank scores and then sorts\nthem.<\/li>\n<\/ul>\n<p>Next, we will walk through a sample usage of extractive summarization.<\/p>\n<h3>An example<\/h3>\n<p>You can find complete samples for <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-java\/blob\/main\/sdk\/textanalytics\/azure-ai-textanalytics\/src\/samples\/java\/com\/azure\/ai\/textanalytics\/lro\/AnalyzeExtractiveSummarization.java\">Java<\/a>,\n<a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-net\/blob\/main\/sdk\/textanalytics\/Azure.AI.TextAnalytics\/samples\/Sample8_ExtractSummary.md\">C#<\/a>, and <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-python\/blob\/main\/sdk\/textanalytics\/azure-ai-textanalytics\/samples\/sample_extract_summary.py\">Python<\/a> on GitHub.<\/p>\n<p>In this section, we will show how to use extractive summarization in Java.\nTo use Text Analytics for Extractive 
Summarization, start by creating a Text Analytics client,\nand then use the client to send the input documents to the Text Analytics service, which\nreturns the analyzed output that includes the features described above.<\/p>\n<p>Create a Text Analytics client:<\/p>\n<pre><code class=\"language-java\">TextAnalyticsClient client = new TextAnalyticsClientBuilder()\r\n                                 .credential(new AzureKeyCredential(\"{key}\"))\r\n                                 .endpoint(\"{endpoint}\")\r\n                                 .buildClient();<\/code><\/pre>\n<p>Prepare a batch of documents as input:<\/p>\n<pre><code class=\"language-java\">List&lt;String&gt; documents = Arrays.asList(\r\n    \"&lt;first document input string you want to analyze&gt;\",\r\n    \"&lt;second document input string you want to analyze&gt;\");<\/code><\/pre>\n<p>Next, let&#8217;s create an extractive summarization action and pass it in a call to <code>beginAnalyzeActions<\/code>:<\/p>\n<pre><code class=\"language-java\">SyncPoller&lt;AnalyzeActionsOperationDetail, AnalyzeActionsResultPagedIterable&gt; syncPoller =\r\n  client.beginAnalyzeActions(documents,\r\n    new TextAnalyticsActions().setExtractSummaryActions(new ExtractSummaryAction()),\r\n    \"en\",\r\n    null);\r\n\r\nsyncPoller.waitForCompletion();<\/code><\/pre>\n<p>Because this operation is long-running, we call <code>getFinalResult()<\/code> on the poller to get the results once the wait\nhas completed:<\/p>\n<pre><code class=\"language-java\">syncPoller.getFinalResult().forEach(actionsResult -&gt; {\r\n  System.out.println(\"Extractive Summarization action results:\");\r\n  for (ExtractSummaryActionResult actionResult : actionsResult.getExtractSummaryResults()) {\r\n    for (ExtractSummaryResult documentResult : actionResult.getDocumentsResults()) {\r\n      System.out.println(\"\\tExtracted summary sentences:\");\r\n      for (SummarySentence summarySentence : 
documentResult.getSentences()) {\r\n        System.out.printf(\"\\t\\t Sentence text: %s, length: %d, offset: %d, rank score: %f.%n\",\r\n          summarySentence.getText(), summarySentence.getLength(), summarySentence.getOffset(), summarySentence.getRankScore());\r\n      }\r\n    }\r\n  }\r\n});<\/code><\/pre>\n<p>For example, given the following article as the input document:<\/p>\n<p><em>&#8220;At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic,\nhuman-centric approach to learning and understanding. As Chief Technology Officer of Azure AI Cognitive\nServices, I have been working with a team of amazing scientists and engineers to turn this quest into a\nreality. In my role, I enjoy a unique perspective in viewing the relationship among three attributes of\nhuman cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z). At the\nintersection of all three, there\u2019s magic\u2014what we call XYZ-code as illustrated in Figure 1\u2014a joint\nrepresentation to create more powerful AI that can speak, hear, see, and understand humans better. We\nbelieve XYZ-code will enable us to fulfill our long-term vision: cross-domain transfer learning, spanning\nmodalities and languages. The goal is to have pretrained models that can jointly learn representations to\nsupport a broad range of downstream AI tasks, much in the way humans do today. Over the past five years,\nwe have achieved human performance on benchmarks in conversational speech recognition, machine translation,\nconversational question answering, machine reading comprehension, and image captioning. These five\nbreakthroughs provided us with strong signals toward our more ambitious aspiration to produce a leap in AI\ncapabilities, achieving multisensory and multilingual learning that is closer in line with how humans learn\nand understand. 
I believe the joint XYZ-code is a foundational component of this aspiration, if grounded\nwith external knowledge sources in the downstream AI tasks.&#8221;<\/em><\/p>\n<p>The extracted summary sentences display as,<\/p>\n<pre><code>Extractive Summarization action results:\r\n  Extracted summary sentences:\r\n    Sentence text: At Microsoft, we have been on a quest to advance AI beyond existing techniques, by taking a more holistic, human-centric approach to learning and understanding., length: 160, offset: 0, rank score: 1.000000.\r\n    Sentence text: In my role, I enjoy a unique perspective in viewing the relationship among three attributes of human cognition: monolingual text (X), audio or visual sensory signals, (Y) and multilingual (Z)., length: 192, offset: 324, rank score: 0.958233.\r\n    Sentence text: At the intersection of all three, there\u2019s magic\u2014what we call XYZ-code as illustrated in Figure 1\u2014a joint representation to create more powerful AI that can speak, hear, see, and understand humans better., length: 203, offset: 517, rank score: 0.929475.<\/code><\/pre>\n<h2>Summary<\/h2>\n<p>Text Analytics for Extractive Summarization is a new feature in the Azure Text Analytics service that\nproduces a text summary by extracting sentences that collectively represent the most important or\nrelevant information within the original document. 
Furthermore, we have shown how to use it in Java by\ncalling <code>beginAnalyzeActions<\/code>.<\/p>\n<p>This article introduced the Text Analytics library features for <a href=\"https:\/\/docs.microsoft.com\/azure\/cognitive-services\/text-analytics\/how-tos\/extractive-summarization\">extractive summarization analysis<\/a>.<\/p>\n<p>For more information about each language from this article, see the following resources:<\/p>\n<ul>\n<li>.NET: <a href=\"https:\/\/docs.microsoft.com\/dotnet\/api\/azure.ai.textanalytics?view=azure-dotnet-preview\">Document Reference<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-net\/blob\/master\/sdk\/textanalytics\/Azure.AI.TextAnalytics\/README.md\">README<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-net\/blob\/master\/sdk\/textanalytics\/Azure.AI.TextAnalytics\/samples\/README.md\">Samples<\/a><\/li>\n<li>Java: <a href=\"https:\/\/docs.microsoft.com\/java\/api\/overview\/azure\/ai-textanalytics-readme?view=azure-java-preview\">Document Reference<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-java\/tree\/master\/sdk\/textanalytics\/azure-ai-textanalytics\/README.md\">README<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-java\/tree\/master\/sdk\/textanalytics\/azure-ai-textanalytics\/src\/samples\">Samples<\/a><\/li>\n<li>JavaScript: <a href=\"https:\/\/docs.microsoft.com\/javascript\/api\/@azure\/ai-text-analytics\/?view=azure-node-preview\">Document Reference<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-js\/tree\/master\/sdk\/textanalytics\/ai-text-analytics\/README.md\">README<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-js\/blob\/master\/sdk\/textanalytics\/ai-text-analytics\/samples\/v5\/javascript\/README.md\">Samples<\/a><\/li>\n<li>Python: <a href=\"https:\/\/docs.microsoft.com\/python\/api\/azure-ai-textanalytics\/azure.ai.textanalytics?view=azure-python-preview\">Document Reference<\/a> | <a 
href=\"https:\/\/github.com\/Azure\/azure-sdk-for-python\/tree\/master\/sdk\/textanalytics\/azure-ai-textanalytics\/README.md\">README<\/a> | <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-python\/blob\/master\/sdk\/textanalytics\/azure-ai-textanalytics\/samples\/README.md\">Samples<\/a><\/li>\n<\/ul>\n<p><!-- FOOTER: DO NOT EDIT OR REMOVE --><\/p>\n<p><div  class=\"d-flex justify-content-center\"><a class=\"cta_button_link btn-primary mb-24\" href=\"https:\/\/aka.ms\/azsdk\/releases\" target=\"_blank\">Azure SDK Releases<\/a><\/div><\/p>\n<h2>Azure SDK Blog Contributions<\/h2>\n<p>Thank you for reading this Azure SDK blog post!\nWe hope that you learned something new and welcome you to share this post.\nWe are open to Azure SDK blog contributions.\nPlease contact us at <a href=\"mailto:azsdkblog@microsoft.com\">azsdkblog@microsoft.com<\/a> with your topic and we&#8217;ll get you set up as a guest blogger.<\/p>\n<h2>Azure SDK Links<\/h2>\n<ul>\n<li>Azure SDK Website: <a href=\"https:\/\/aka.ms\/azsdk\">aka.ms\/azsdk<\/a><\/li>\n<li>Azure SDK Intro (3 minute video): <a href=\"https:\/\/aka.ms\/azsdk\/intro\">aka.ms\/azsdk\/intro<\/a><\/li>\n<li>Azure SDK Intro Deck (PowerPoint deck): <a href=\"https:\/\/aka.ms\/azsdk\/intro\/deck\">aka.ms\/azsdk\/intro\/deck<\/a><\/li>\n<li>Azure SDK Releases: <a href=\"https:\/\/aka.ms\/azsdk\/releases\">aka.ms\/azsdk\/releases<\/a><\/li>\n<li>Azure SDK Blog: <a href=\"https:\/\/aka.ms\/azsdk\/blog\">aka.ms\/azsdk\/blog<\/a><\/li>\n<li>Azure SDK Twitter: <a href=\"https:\/\/twitter.com\/AzureSDK\">twitter.com\/AzureSDK<\/a><\/li>\n<li>Azure SDK Design Guidelines: <a href=\"https:\/\/aka.ms\/azsdk\/guide\">aka.ms\/azsdk\/guide<\/a><\/li>\n<li>Azure SDKs &amp; Tools: <a href=\"https:\/\/azure.microsoft.com\/downloads\">azure.microsoft.com\/downloads<\/a><\/li>\n<li>Azure SDK Central Repository: <a href=\"https:\/\/github.com\/azure\/azure-sdk#azure-sdk\">github.com\/azure\/azure-sdk<\/a><\/li>\n<li>Azure SDK for .NET: <a 
href=\"https:\/\/github.com\/azure\/azure-sdk-for-net\">github.com\/azure\/azure-sdk-for-net<\/a><\/li>\n<li>Azure SDK for Java: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-java\">github.com\/azure\/azure-sdk-for-java<\/a><\/li>\n<li>Azure SDK for Python: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-python\">github.com\/azure\/azure-sdk-for-python<\/a><\/li>\n<li>Azure SDK for JavaScript\/TypeScript: <a href=\"https:\/\/github.com\/azure\/azure-sdk-for-js\">github.com\/azure\/azure-sdk-for-js<\/a><\/li>\n<li>Azure SDK for Android: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-android\">github.com\/Azure\/azure-sdk-for-android<\/a><\/li>\n<li>Azure SDK for iOS: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-ios\">github.com\/Azure\/azure-sdk-for-ios<\/a><\/li>\n<li>Azure SDK for Go: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-go\">github.com\/Azure\/azure-sdk-for-go<\/a><\/li>\n<li>Azure SDK for C: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-c\">github.com\/Azure\/azure-sdk-for-c<\/a><\/li>\n<li>Azure SDK for C++: <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-cpp\">github.com\/Azure\/azure-sdk-for-cpp<\/a><\/li>\n<\/ul>\n<p><!-- FOOTER: DO NOT EDIT OR REMOVE --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article introduces the new Text Analytics feature for extractive summarization.<\/p>\n","protected":false},"author":56388,"featured_media":1513,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[160,793,792],"class_list":["post-1511","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sdk","tag-java","tag-preview","tag-text-analytics"],"acf":[],"blog_post_summary":"<p>This article introduces the new Text Analytics feature for extractive 
summarization.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/1511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/users\/56388"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/comments?post=1511"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/1511\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media\/1513"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media?parent=1511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/categories?post=1511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/tags?post=1511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}