{"id":77751,"date":"2016-04-12T00:01:15","date_gmt":"2016-04-12T07:01:15","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=77751"},"modified":"2019-02-18T09:10:50","modified_gmt":"2019-02-18T16:10:50","slug":"read-a-text-tile-and-do-frequency-analysis-using-powershell","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/read-a-text-tile-and-do-frequency-analysis-using-powershell\/","title":{"rendered":"Read a text file and do frequency analysis by using PowerShell"},"content":{"rendered":"<p><strong>Summary<\/strong>: Learn how to read a text file and do a letter-frequency analysis using Windows PowerShell in this article written by the Microsoft Scripting Guy, Ed Wilson.<\/p>\n<p>This is the third post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. To fully understand this post, you should read the entire series in order.<\/p>\n<p>Here are the posts in the series:<\/p>\n<ol>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/08\/letter-frequency-analysis-of-a-text-using-powershell\/\" target=\"_blank\">Letter frequency analysis of text by using PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/11\/how-to-skip-the-beginning-and-ending-of-a-file-with-powershell\/\" target=\"_blank\">How to skip the beginning and ending of a file with PowerShell<\/a><\/li>\n<li>Read a text file and do frequency analysis by using PowerShell<\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/13\/compare-the-letter-frequency-of-two-text-files-with-powershell\/\" target=\"_blank\">Compare the letter frequency of two text files by using PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/14\/calculate-percentage-character-frequencies-from-a-text-file-by-using-powershell\/\" target=\"_blank\">Calculate 
percentage character frequencies from a text file by using PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/15\/additional-resources-for-text-analysis-by-using-powershell\/\" target=\"_blank\">Additional resources for text analysis by using PowerShell<\/a><\/li>\n<\/ol>\n<p>Today I am going to put the script I wrote yesterday together with the script that I wrote on Friday. After I do that, I will be able to get a more accurate letter-frequency analysis of a text file. The code that I wrote the other day reads a text file by using the <strong>Get-Content<\/strong> cmdlet. Then I join the strings together so that I have a single string to parse. I then convert the string to all uppercase, get the enumerator, group my results, and sort my results.<\/p>\n<p>So, first of all, here is the basic letter-frequency analysis code that I wrote the other day:<\/p>\n<p style=\"padding-left: 30px\"><code>$a = Get-Content C:\\fso\\ATaleOfTwoCities.txt\n$a.Count\n$ajoined = $a -join \"`r\"\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() | group -NoElement | sort count -Descending<\/code><\/p>\n<h2>Put the script together<\/h2>\n<p>The first thing I do is copy the code to a blank page in my Windows PowerShell integrated scripting environment (ISE). This is shown here:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-01.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77753\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-01-1024x705.png\" alt=\"Screenshot of the basic letter-frequency analysis code in the Windows PowerShell ISE.\" width=\"879\" height=\"605\" \/><\/a><\/p>\n<p>Now I need to take the code that I wrote yesterday. 
This code removes the beginning and ending portions of the text file.<\/p>\n<p style=\"padding-left: 30px\"><code>$a= Get-Content 'C:\\fso\\MobyDick.txt'<\/code><\/p>\n<p style=\"padding-left: 30px\">$array = @()\nfor ($i = 0; $i -lt $a.Count; $i++)\n{\nIf ($a[$i] -cmatch &#8216;START&#8217;)\n{$array +=$i }\nIf ($a[$i] -like &#8220;End of *Project*&#8221;)\n{$array += $i }\n}<\/p>\n<p style=\"padding-left: 30px\">$start = $array[0] +7\n$end = $array[1] -1\n$a[$start .. $end]<\/p>\n<p>This script also reads the text file. It then creates an empty array, loops through the text, and looks for start and end strings. It then saves the line numbers that it finds so that I can use array notation to return a range of text from the file.<\/p>\n<p>I paste this code at the beginning of my new script page because I need to grab the\u00a0correct text BEFORE I convert it all to a single line of text, convert it to uppercase, and count the letters. So, at this point, my script appears as shown here:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-02.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77763\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-02-1024x705.png\" alt=\"Screenshot of yesterday\u2019s code pasted before the basic letter-frequency analysis code in the Windows PowerShell ISE.\" width=\"879\" height=\"605\" \/><\/a><\/p>\n<h2>Clean up the code<\/h2>\n<p>Well, there are some redundancies. The code as it stands is shown here:<\/p>\n<p style=\"padding-left: 30px\"><code>$a= Get-Content 'C:\\fso\\MobyDick.txt'<\/code><\/p>\n<p style=\"padding-left: 30px\">$array = @()\nfor ($i = 0; $i -lt $a.Count; $i++)\n{\nIf ($a[$i] -cmatch &#8216;START&#8217;)\n{$array +=$i }\nIf ($a[$i] -like &#8220;End of *Project*&#8221;)\n{$array += $i }\n}<\/p>\n<p style=\"padding-left: 30px\">$start = $array[0] +7\n$end = $array[1] -1\n$a[$start .. 
$end]<\/p>\n<p style=\"padding-left: 30px\">$a = Get-Content C:\\fso\\ATaleOfTwoCities.txt\n$a.Count\n$ajoined = $a -join &#8220;`r&#8221;\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() | group -NoElement | sort count -Descending<\/p>\n<p>So, the obvious duplication is the second <strong>Get-Content<\/strong> line. I delete it, and my script is shown here:<\/p>\n<p style=\"padding-left: 30px\"><code>$a= Get-Content 'C:\\fso\\MobyDick.txt'<\/code><\/p>\n<p style=\"padding-left: 30px\">$array = @()\nfor ($i = 0; $i -lt $a.Count; $i++)\n{\nIf ($a[$i] -cmatch &#8216;START&#8217;)\n{$array +=$i }\nIf ($a[$i] -like &#8220;End of *Project*&#8221;)\n{$array += $i }\n}<\/p>\n<p style=\"padding-left: 30px\">$start = $array[0] +7\n$end = $array[1] -1\n$a[$start .. $end]<\/p>\n<p style=\"padding-left: 30px\">$a.Count\n$ajoined = $a -join &#8220;`r&#8221;\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() | group -NoElement | sort count -Descending<\/p>\n<p>The next thing I need to do is to delete the <code>$a.Count<\/code> line because I do not need it either. The script is now shown here:<\/p>\n<p style=\"padding-left: 30px\"><code>$a= Get-Content 'C:\\fso\\MobyDick.txt'<\/code><\/p>\n<p style=\"padding-left: 30px\">$array = @()\nfor ($i = 0; $i -lt $a.Count; $i++)\n{\nIf ($a[$i] -cmatch &#8216;START&#8217;)\n{$array +=$i }\nIf ($a[$i] -like &#8220;End of *Project*&#8221;)\n{$array += $i }\n}<\/p>\n<p style=\"padding-left: 30px\">$start = $array[0] +7\n$end = $array[1] -1\n$a[$start .. $end]<\/p>\n<p style=\"padding-left: 30px\">$ajoined = $a -join &#8220;`r&#8221;\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() | group -NoElement | sort count -Descending<\/p>\n<p>The last thing I need to do is to store the text that the array notation returns. As written, the script only displays that range of lines, and the frequency code still counts the whole file. So that I do not need to modify my copied frequency code, I simply assign the result of <code>$a[$start .. $end]<\/code> back to the <code>$a<\/code> variable. 
This revised line is shown here:<\/p>\n<p style=\"padding-left: 30px\"><code>$a = $a[$start .. $end]<\/code><\/p>\n<p>The entire script is shown here:<\/p>\n<p style=\"padding-left: 30px\"><code>$a= Get-Content 'C:\\fso\\MobyDick.txt'<\/code><\/p>\n<p style=\"padding-left: 30px\">$array = @()\nfor ($i = 0; $i -lt $a.Count; $i++)\n{\nIf ($a[$i] -cmatch &#8216;START&#8217;)\n{$array +=$i }\nIf ($a[$i] -like &#8220;End of *Project*&#8221;)\n{$array += $i }\n}<\/p>\n<p style=\"padding-left: 30px\">$start = $array[0] +7\n$end = $array[1] -1\n$a = $a[$start .. $end]<\/p>\n<p style=\"padding-left: 30px\">$ajoined = $a -join &#8220;`r&#8221;\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() | group -NoElement | sort count -Descending<\/p>\n<p>The script is shown here in the ISE:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-03.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77773\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-03-1024x705.png\" alt=\"Screenshot of the entire edited script in the Windows PowerShell ISE.\" width=\"879\" height=\"605\" \/><\/a><\/p>\n<p>The output from this script is shown here:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-04.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77781\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-12-16-04-1024x726.png\" alt=\"Screenshot of output of the script.\" width=\"879\" height=\"623\" \/><\/a><\/p>\n<p>I invite you to follow me on <a href=\"http:\/\/bit.ly\/scriptingguystwitter\" target=\"_blank\">Twitter<\/a> and <a href=\"http:\/\/bit.ly\/scriptingguysfacebook\" target=\"_blank\">Facebook<\/a>. 
If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the <a href=\"http:\/\/bit.ly\/scriptingforum\" target=\"_blank\">Official Scripting Guys Forum<\/a>. Also check out my <a href=\"https:\/\/blogs.technet.microsoft.com\/msoms\/\" target=\"_blank\">Microsoft Operations Management Suite Blog<\/a>. See you tomorrow. Until then, peace.<\/p>\n<p><strong>Ed Wilson<\/strong>\nMicrosoft Scripting Guy<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Learn how to read a text file and do a letter-frequency analysis using Windows PowerShell in this article written by the Microsoft Scripting Guy, Ed Wilson. This is the third post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. To fully understand this post, [&hellip;]<\/p>\n","protected":false},"author":596,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568,683,641],"tags":[3,4,617,45],"class_list":["post-77751","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","category-text-files","category-windows-powershell","tag-scripting-guy","tag-scripting-techniques","tag-text","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Learn how to read a text file and do a letter-frequency analysis using Windows PowerShell in this article written by the Microsoft Scripting Guy, Ed Wilson. This is the third post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. 
To fully understand this post, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/77751","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=77751"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/77751\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=77751"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=77751"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=77751"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}