Building a solution on Azure Government Part II: HDInsight
In the previous post, I went over how to use Cognitive Services to get a document translated. Now let’s do some simple text analytics on that document. Specifically, we’ll compute word counts so we can see how frequently the key words are used.
We can easily provision HDInsight clusters in the Microsoft Azure Government portal. This walkthrough uses a 4-node Spark cluster provisioned on Linux.
When an HDInsight cluster is provisioned, an Azure storage account is provisioned for the underlying storage. Let’s connect to that storage with the Microsoft Azure Storage Explorer so that we can upload our translated text document. The Azure Storage Explorer is an easy-to-use tool that lets developers work with Azure Storage resources such as blobs, queues, and tables. HDInsight creates an “HdiSamples” directory. Let’s create our own sub-directory inside it called “TheArtOfWar”, where we’ll upload our text file.
Now that our document is ready, let’s go back to the Azure portal and click “Jupyter Notebook” to open the Jupyter dashboard. From there, we can click “New – PySpark” to open a new notebook. In the notebook, we want to take the following steps:
- Load the data from the text file in storage
- Get a complete list of words
- Get a count of each distinct word
- Keep only the top results (i.e., words that occur at least 10 times)
- Create a schema to hold our data structure
- Create a table in HDInsight with the data based on the schema
The complete logic for those steps can be seen here:
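The notebook cell itself is not reproduced in this text. As a minimal sketch of what the six steps could look like in a PySpark notebook, here is one possible version; it is not the original code. The file name `TheArtOfWar.txt`, the table name `wordcounts`, the column names `word` and `occurrences`, and the helper functions are all assumptions chosen for illustration. `spark` is assumed to be the SparkSession the notebook provides; depending on the Spark version, the notebook may expose `sc` and `sqlContext` instead. `local_word_counts` is a plain-Python version of steps 2 to 4 that can be tried on a small sample without a cluster.

```python
import re
from collections import Counter

# Hypothetical path: the file name is an assumption, only the
# HdiSamples/TheArtOfWar directory comes from the walkthrough.
SOURCE_PATH = "wasb:///HdiSamples/TheArtOfWar/TheArtOfWar.txt"
MIN_COUNT = 10  # keep words that occur at least 10 times


def tokenize(line):
    """Split one line of text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower())


def local_word_counts(lines, min_count=MIN_COUNT):
    """Plain-Python reference for steps 2-4 (word list, counts, filter)."""
    counts = Counter(word for line in lines for word in tokenize(line))
    frequent = [(w, c) for w, c in counts.items() if c >= min_count]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(frequent, key=lambda pair: (-pair[1], pair[0]))


def build_word_table(spark, path=SOURCE_PATH, table_name="wordcounts",
                     min_count=MIN_COUNT):
    """Spark sketch of the six steps; `spark` is assumed to be a SparkSession."""
    from pyspark.sql.types import (IntegerType, StringType,
                                   StructField, StructType)

    lines = spark.sparkContext.textFile(path)            # 1. load the text
    words = lines.flatMap(tokenize)                      # 2. list of all words
    counts = (words.map(lambda w: (w, 1))                # 3. count each word
                   .reduceByKey(lambda a, b: a + b))
    frequent = counts.filter(lambda p: p[1] >= min_count)  # 4. top results
    schema = StructType([                                # 5. schema
        StructField("word", StringType(), False),
        StructField("occurrences", IntegerType(), False),
    ])
    df = spark.createDataFrame(frequent, schema)         # 6. table
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

In a notebook cell, calling `build_word_table(spark)` would run all six steps. The threshold and the tokenizing rule are the two choices most likely to need adjusting for a different document.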
Now that we have the data in an easy to use table, we can run SQL queries on the data and visualize results in HDInsight:
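The query itself is not reproduced in this text. As a rough sketch, assuming the table is named `wordcounts` with columns `word` and `occurrences` (hypothetical names, not confirmed by the original), a query for the most frequent words might look like the one below. In the HDInsight notebook the same statement would typically go in a `%%sql` cell. So that the SQL can be tried without a cluster, the helper here runs it against an in-memory SQLite table as a local stand-in, which is not what HDInsight uses.

```python
import sqlite3

# Assumed table and column names; adjust to match the table you created.
TOP_WORDS_SQL = """
SELECT word, occurrences
FROM wordcounts
ORDER BY occurrences DESC, word ASC
LIMIT 5
"""


def top_words_locally(rows):
    """Run TOP_WORDS_SQL against an in-memory SQLite copy of the table.

    `rows` is an iterable of (word, occurrences) pairs. SQLite is only a
    local stand-in here for trying out the query.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE wordcounts ("
            "word TEXT NOT NULL, occurrences INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO wordcounts VALUES (?, ?)", rows)
        return conn.execute(TOP_WORDS_SQL).fetchall()
    finally:
        conn.close()
```

Sorting by `occurrences` in descending order with a `LIMIT` keeps the result small enough to chart directly in the notebook.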
With our data parsed in HDInsight, the next and final part will cover how to use Power BI to build additional visualizations on the data.