{"id":38410,"date":"2020-02-15T06:00:43","date_gmt":"2020-02-15T13:00:43","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/premier-developer\/?p=38410"},"modified":"2020-02-21T12:08:20","modified_gmt":"2020-02-21T19:08:20","slug":"creating-words-cloud-for-sentiment-analysis-with-azure-cognitive-services-text-analytics","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/creating-words-cloud-for-sentiment-analysis-with-azure-cognitive-services-text-analytics\/","title":{"rendered":"Creating Words Cloud For Sentiment Analysis With Azure Cognitive Services Text Analytics"},"content":{"rendered":"<p>In a previous blog, <a href=\"https:\/\/devblogs.microsoft.com\/premier-developer\/using-azure-cognitive-services-text-analytics-api-version-3-preview-for-sentiment-analysis\/\">Using Azure Cognitive Services Text Analytics API Version 3 Preview for Sentiment Analysis<\/a>, App Dev Manager Fidelis Ekezue demonstrated how to use the Text Analytics API Version 3 to analyze the sentiment expressed in the Public Comments of the 2016 <a href=\"https:\/\/public.medicaid.gov\/connect.ti\/public.comments\/questionnaireResults?qid=1886531\">North Carolina\u2019s Medicaid Reform<\/a>. In this blog, I will expand on how the Text Analytics API Version 3 Preview of Microsoft Azure Cognitive Services can be used to extract further information, such as key phrases, from the comments.<\/p>\n<hr \/>\n<p>The previous blog used REST APIs to extract the sentiments from comments. This blog, however, will use the recently released SDKs, which greatly simplify the code and hide much of the complexity seen in the previous blog. Using the SDK, I will demonstrate how to use four of the six functions of Text Analytics to further analyze large unstructured data. 
As in the previous blog, I will be using the public comments from the North Carolina Medicaid Reform of 2016 to create a visualization in the form of <a href=\"https:\/\/www.boostlabs.com\/what-are-word-clouds-value-simple-visualizations\/\">word clouds<\/a> to highlight the common words used by those with positive, negative or neutral sentiments as analyzed by Text Analytics. A word cloud extracts the most used words in a document and presents them in different sizes based on their frequency of occurrence: the larger a word appears, the more often it occurs in the document. Figure 1 below shows an example of a word cloud.<\/p>\n<p><img decoding=\"async\" width=\"2500\" height=\"1562\" class=\"wp-image-38452\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud.png\" alt=\"Image of an example of a word cloud\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud.png 2500w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud-300x187.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud-1024x640.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud-768x480.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud-1536x960.png 1536w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-an-example-of-a-word-cloud-2048x1280.png 2048w\" sizes=\"(max-width: 2500px) 100vw, 2500px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 1: Example of a word cloud<\/span><\/em><\/p>\n<p>Given that the Text Analytics 
service does not produce word clouds on its own, I wrote a small Python program in a Jupyter notebook to do the following:<\/p>\n<ul>\n<li>Read the CSV file into a Pandas data frame.<\/li>\n<li>Cleanse the data to eliminate blank rows, obvious duplicates and irrelevant data like attachment-only comments and non-English language comments. This step leverages the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/text-analytics\/quickstarts\/text-analytics-sdk?tabs=version-3&amp;pivots=programming-language-python#language-detection\">Language Detection<\/a> function of the Text Analytics SDK.<\/li>\n<li>Using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/text-analytics\/quickstarts\/text-analytics-sdk?tabs=version-3&amp;pivots=programming-language-python#sentiment-analysis\">Sentiment Analysis<\/a> function of the Text Analytics SDK, analyze the cleaned data to retrieve the sentiment expressed by each comment in the data frame.<\/li>\n<li>Generate stop words \u2013 These are words that will be excluded from the visualizations. The base STOPWORDS list can come from either the <a href=\"https:\/\/www.nltk.org\/index.html\">NLTK<\/a> STOPWORDS or the <a href=\"http:\/\/members.unine.ch\/jacques.savoy\/clef\/englishST.txt\">Unine.ch EnglishST<\/a> STOPWORDS. For this blog, I will be using the latter. Additionally, since the data for this blog is from North Carolina and related to the Public Comments for Medicaid Reform, domain-specific terms such as city names, businesses and persons that appear in the comments are also added to the STOPWORDS. The <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/text-analytics\/quickstarts\/text-analytics-sdk?tabs=version-3&amp;pivots=programming-language-python#named-entity-recognition-ner\">Named Entity Recognition<\/a> function of the SDK is used to create a list of the entities that are an organization, location or person. 
This list, together with words such as Medicaid, Healthcare and Reform, is added to the generated STOPWORDS list. This ensures that the final word list, generated with the SDK <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cognitive-services\/text-analytics\/quickstarts\/text-analytics-sdk?tabs=version-3&amp;pivots=programming-language-python#key-phrase-extraction\">Key Phrases Extraction<\/a> function and used for the word cloud visualization, excludes these entities.<\/li>\n<\/ul>\n<p>The Python code does not perform any error checking and is not tuned for performance, but it is available as a Jupyter notebook on my <a href=\"https:\/\/github.com\/fekezue\/text-analytics-word-cloud\">GitHub page<\/a>.<\/p>\n<h2>Create Azure Resource Group<\/h2>\n<p><img decoding=\"async\" width=\"721\" height=\"357\" class=\"wp-image-38454\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-screen-to-create-a-resource-group-in-azur.png\" alt=\"Image of screen to create a resource group in Azure\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-screen-to-create-a-resource-group-in-azur.png 721w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-screen-to-create-a-resource-group-in-azur-300x149.png 300w\" sizes=\"(max-width: 721px) 100vw, 721px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 2: Create Azure Resource Group<\/span><\/em><\/p>\n<h2>Create Azure Cognitive Services<\/h2>\n<p><img decoding=\"async\" width=\"989\" height=\"443\" class=\"wp-image-38455\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-how-to-create-azure-cognitive-services.png\" alt=\"Image of how to create Azure Cognitive Services\" 
srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-how-to-create-azure-cognitive-services.png 989w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-how-to-create-azure-cognitive-services-300x134.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-how-to-create-azure-cognitive-services-768x344.png 768w\" sizes=\"(max-width: 989px) 100vw, 989px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 3: Create Azure Cognitive Services<\/span><\/em><\/p>\n<p>Supply the name, subscription, location, service plan and resource group. For the resource group, use the one created in Figure 2, and then click Create.<\/p>\n<p><img decoding=\"async\" width=\"1826\" height=\"574\" class=\"wp-image-38456\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location.png\" alt=\"Image to supply the name, subscription, location, service plan and resource group\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location.png 1826w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location-300x94.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location-1024x322.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location-768x241.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-to-supply-the-name-subscription-location-1536x483.png 1536w\" sizes=\"(max-width: 1826px) 100vw, 1826px\" 
\/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 4: Provide name, subscription, location, service plan and resource group<\/span><\/em><\/p>\n<p>After the resource is created, click Go to resource.<\/p>\n<p><img decoding=\"async\" width=\"813\" height=\"223\" class=\"wp-image-38457\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-showing-your-deployment-of-the-azure-cogniti.png\" alt=\"Image showing your deployment of the Azure Cognitive Services is complete\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-showing-your-deployment-of-the-azure-cogniti.png 813w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-showing-your-deployment-of-the-azure-cogniti-300x82.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-showing-your-deployment-of-the-azure-cogniti-768x211.png 768w\" sizes=\"(max-width: 813px) 100vw, 813px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 5: Deployment complete screen<\/span><\/em><\/p>\n<h2>Make Note of the Key and Endpoint<\/h2>\n<p><img decoding=\"async\" width=\"611\" height=\"262\" class=\"wp-image-38458\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-the-azure-cognitive-services-key-and-endp.png\" alt=\"Image of the Azure Cognitive Services key and endpoint information.\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-the-azure-cognitive-services-key-and-endp.png 611w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/image-of-the-azure-cognitive-services-key-and-endp-300x129.png 300w\" sizes=\"(max-width: 611px) 100vw, 611px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 6: Make note of the key and 
endpoint<\/span><\/em><\/p>\n<p>Also, note that the Text Analytics API offers six different endpoints (as of the time of this writing in February 2020). These endpoints are:<\/p>\n<ul>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/Languages\">Detect Language<\/a> \u2013 Detects the language of a document and returns a score between 0 and 1, with 1 being 100% certainty.<\/li>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/EntitiesRecognitionGeneral\">Entities<\/a> \u2013 Identifies known entities, for example, Person, PersonType, Organization, Location, Currency, Datetime, etc., in a document.<\/li>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/EntitiesLinking\">Entity Linking<\/a> \u2013 Returns a list of recognized entities with links to a well-known knowledge base.<\/li>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/EntitiesRecognitionPii\">EntityPII<\/a> \u2013 Returns known PII entities, for example, Credit Card Number, Driver\u2019s License, etc., in each document.<\/li>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/KeyPhrases\">Key Phrases<\/a> \u2013 Returns the list of key phrases in a document.<\/li>\n<li><a href=\"https:\/\/westus2.dev.cognitive.microsoft.com\/docs\/services\/TextAnalytics-v3-0-Preview-1\/operations\/Sentiment\">Sentiment<\/a> \u2013 Returns the overall predicted sentiment of the given document. Additionally, the API predicts the sentiment of individual sentences in the document. At the sentence level, the prediction is either positive, negative or neutral. 
At the document level, the sentiment prediction can be one of the following:\n<ul>\n<li>Mixed \u2013 The document has multiple sentences, with at least one sentence of positive sentiment and at least one sentence of negative sentiment. For example, if a document has two sentences, one negative and the other positive, the document sentiment is Mixed.<\/li>\n<li>Positive \u2013 The entire document has positive sentiment<\/li>\n<li>Negative \u2013 The entire document has negative sentiment<\/li>\n<li>Neutral \u2013 The sentiment expressed is neither negative nor positive<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2>Load All the Necessary Libraries in Jupyter Notebook<\/h2>\n<p>The code is written in Jupyter Notebook from <a href=\"https:\/\/www.anaconda.com\/distribution\/\">Anaconda<\/a> and the entire notebook is available on my GitHub page. To use the SDK, you must install the library using the following command from the console:<\/p>\n<pre class=\"lang:default decode:true \">pip install azure-ai-textanalytics<\/pre>\n<p>After this, the next important step is to set up the variables needed, using the information from Figure 6 above:<\/p>\n<pre class=\"lang:default decode:true\">key = \"&lt;paste-your-text-analytics-key-here&gt;\"\r\nendpoint = \"&lt;paste-your-text-analytics-endpoint-here&gt;\"<\/pre>\n<h2>Helper functions<\/h2>\n<p>The following functions were created in Python to help with the various aspects of loading the comment files into a Pandas data frame and cleaning up the data used to generate the word cloud. 
You can download the full listing of the Python code at my <a href=\"https:\/\/github.com\/fekezue\/sentiment-analysis-with-text-analytics-api-version3-preview\">GitHub page<\/a>.<\/p>\n<ul>\n<li>get_text_analytics_client \u2013 Creates a client for the Text Analytics SDK.<\/li>\n<li>get_comment_sentiment \u2013 Analyzes comments and returns the overall sentiment.<\/li>\n<li>language_detection \u2013 Uses the SDK to identify the language of the comments.<\/li>\n<li>get_comment_keyphrases \u2013 Extracts key phrases from the comments.<\/li>\n<li>get_comment_entities \u2013 Extracts well-known entities like name, organization, location, etc. from the comments.<\/li>\n<li>get_stopwords \u2013 Loads external STOP WORDS from this site: <a href=\"http:\/\/members.unine.ch\/jacques.savoy\/clef\/englishST.txt\">http:\/\/members.unine.ch\/jacques.savoy\/clef\/englishST.txt<\/a>, and then adds the domain-specific STOP WORDS from the get_comment_entities function above.<\/li>\n<li>generate_commentSTwords \u2013 Generates specific stop words that pertain to the comments in the CSV file.<\/li>\n<li>create_word_cloud \u2013 Generates a word cloud visualization and optionally writes it to a .png file, if specified.<\/li>\n<\/ul>\n<h2>Positive Sentiment Word Cloud<\/h2>\n<p><img decoding=\"async\" width=\"2024\" height=\"1092\" class=\"wp-image-38459\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments.png\" alt=\"Word cloud image of the positive sentiments\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments.png 2024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments-300x162.png 300w, 
https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments-1024x552.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments-768x414.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-positive-sentiments-1536x829.png 1536w\" sizes=\"(max-width: 2024px) 100vw, 2024px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 7: Word cloud of the positive sentiments<\/span><\/em><\/p>\n<h2>Negative Sentiment Word Cloud<\/h2>\n<p><img decoding=\"async\" width=\"2018\" height=\"1100\" class=\"wp-image-38460\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments.png\" alt=\"Word cloud image of the negative sentiments\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments.png 2018w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments-300x164.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments-1024x558.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments-768x419.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2020\/02\/word-cloud-image-of-the-negative-sentiments-1536x837.png 1536w\" sizes=\"(max-width: 2018px) 100vw, 2018px\" \/><\/p>\n<p><em><span style=\"font-size: 10pt;\">Figure 8: Word cloud of the negative sentiments<\/span><\/em><\/p>\n<p>Download the source code <a 
href=\"https:\/\/github.com\/fekezue\/text-analytics-word-cloud\">here<\/a>.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, App Dev Manager Fidelis Ekezue explores Azure Cognitive Services Text Analytics.<\/p>\n","protected":false},"author":582,"featured_media":38419,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[25,1],"tags":[24,72,3],"class_list":["post-38410","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","category-permierdev","tag-azure","tag-cognitive-services","tag-team"],"acf":[],"blog_post_summary":"<p>In this post, App Dev Manager Fidelis Ekezue explores Azure Cognitive Services Text Analytics.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/38410","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/582"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=38410"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/38410\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/38419"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=38410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=38410"},{"taxonomy":"post_tag","embeddable":true,
"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=38410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}