{"id":77811,"date":"2016-04-14T00:01:51","date_gmt":"2016-04-14T07:01:51","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/?p=77811"},"modified":"2019-02-18T09:10:49","modified_gmt":"2019-02-18T16:10:49","slug":"calculate-percentage-character-frequencies-from-a-text-file-by-using-powershell","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/calculate-percentage-character-frequencies-from-a-text-file-by-using-powershell\/","title":{"rendered":"Calculate percentage character frequencies from a text file by using PowerShell"},"content":{"rendered":"<p><strong>Summary<\/strong>: Learn how to use Windows PowerShell to calculate the percentage of how often a character appears in a text file.<\/p>\n<p>This is the fifth post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. To fully understand this post, you should read the entire series in order.<\/p>\n<p>Here are the posts in the series:<\/p>\n<ol>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/08\/letter-frequency-analysis-of-a-text-using-powershell\/\" target=\"_blank\">Letter frequency analysis of text by using PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/11\/how-to-skip-the-beginning-and-ending-of-a-file-with-powershell\/\" target=\"_blank\">How to skip the beginning and ending of a file with PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/12\/read-a-text-tile-and-do-frequency-analysis-using-powershell\/\" target=\"_blank\">Read a text file and do frequency analysis by using PowerShell<\/a><\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/13\/compare-the-letter-frequency-of-two-text-files-with-powershell\/\" target=\"_blank\">Compare the letter frequency of two text files by using PowerShell<\/a><\/li>\n<li>Calculate 
percentage character frequencies from a text file by using PowerShell<\/li>\n<li><a href=\"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2016\/04\/15\/additional-resources-for-text-analysis-by-using-powershell\/\" target=\"_blank\">Additional resources for text analysis by using PowerShell<\/a><\/li>\n<\/ol>\n<p>Okay, I will admit that I am just playing around, but I wanted to calculate the percentages of letter frequencies in a text file. For this example, I am using <em>A Tale of Two Cities<\/em> as a text file. You should refer to earlier blog articles about this topic so that what I write will have a chance of making sense.<\/p>\n<h2>Create a header for my report<\/h2>\n<p>The first thing I want to do is create a header for my report that I will display. To do this, I create an expanding <strong>here<\/strong> string. The basic <strong>here<\/strong> string is a bit fussy: it begins with <span style=\"color: #000000\">@\"<\/span> that is immediately followed by a return, and it ends on a new line that begins with <span style=\"color: #000000\">\"@<\/span>.<\/p>\n<p>Here&#8217;s the <strong>here<\/strong> string I use. It is the first thing I define in my script, at which point neither the value of <code>$path<\/code> nor the total number of characters has been determined. Keep in mind that an expanding <strong>here<\/strong> string expands its variables when the string is assigned, not when it is displayed, so the header shows real values only if <code>$path<\/code> and <code>$total<\/code> already hold values at that moment (for example, from an earlier run in the same session). 
With that in mind, here\u2019s the <strong>here<\/strong> string:<\/p>\n<p style=\"padding-left: 30px\"><span style=\"color: #000000\">$header = @&#8221;<\/span>\n<span style=\"color: #000000\"> ****************************************************************<\/span>\n<span style=\"color: #000000\"> |<\/span>\n<span style=\"color: #000000\"> | Letter Frequency Analysis<\/span>\n<span style=\"color: #000000\"> | of $path<\/span>\n<span style=\"color: #000000\"> | Analyzing $($total) characters &#8230;<\/span>\n<span style=\"color: #000000\"> |<\/span>\n<span style=\"color: #000000\"> ****************************************************************<\/span>\n<span style=\"color: #000000\"> &#8220;@<\/span><\/p>\n<h2>Read the contents of the file and count the characters<\/h2>\n<p>The next thing I need to do is to read the contents of the text file and count the characters in the file. I assign the path of my file to a variable that I name <span style=\"color: #000000\">$path<\/span>. I then use <strong>Get-Content<\/strong> to read the contents of the file, and I store the results in the <span style=\"color: #000000\">$a<\/span> variable. I now want to count how many characters are in the file. Because I have the entire contents of the text file in the <span style=\"color: #000000\">$a<\/span> variable, I can use that. Although the <span style=\"color: #800080\">count<\/span> property contains the number of lines in the file, it does not contain the number of characters. The easy way to obtain this information is to use the <strong>Measure-Object<\/strong> cmdlet with the <strong>-Character<\/strong> switch, which causes it to count characters. I then directly access the <strong>Characters<\/strong> property of the object returned by the <strong>Measure-Object<\/strong> cmdlet and store the number of characters in the <span style=\"color: #000000\">$total<\/span> variable. 
I then display the header <strong>here<\/strong> string that I previously stored in the <span style=\"color: #000000\">$header<\/span> variable. This code is shown here:<\/p>\n<p style=\"padding-left: 30px\"><span style=\"color: #000000\">$path = &#8216;C:\\fso\\ATaleOfTwoCities.txt&#8217;<\/span>\n<span style=\"color: #000000\"> $a = Get-Content $path<\/span>\n<span style=\"color: #000000\"> $total = ($a | measure -Character).characters<\/span>\n<span style=\"color: #000000\"> $header<\/span><\/p>\n<p>The code that I use to count the frequency of the characters was explained in an earlier article, so the code is shown here without additional comment:<\/p>\n<p style=\"padding-left: 30px\"><span style=\"color: #000000\">$ajoined = $a -join &#8220;`r&#8221;<\/span>\n<span style=\"color: #000000\"> $ajoinedUC = $ajoined.ToUpper()<\/span>\n<span style=\"color: #000000\"> $ajoinedUC.GetEnumerator() |<\/span>\n<span style=\"color: #000000\"> group -NoElement | sort count -Descending<\/span><\/p>\n<h2>Use custom properties in Select-Object to get percentages<\/h2>\n<p>So, now I use the <strong>Select-Object<\/strong> cmdlet, and I compute some custom properties to display. This is a great technique that works well with the pipeline. It takes the form of the following:<\/p>\n<p style=\"padding-left: 30px\"><code>@{ LABEL = STRING ; EXPRESSION = SCRIPTBLOCK}<\/code><\/p>\n<p>The first custom property simply displays the <strong>Name<\/strong> of the character. I add a column heading called \u201cCharacter\u201d, and under that column heading, I will display each character.<\/p>\n<p>For the second property, I use the <strong>Count<\/strong> property that comes from my <strong>Group-Object<\/strong> cmdlet that groups all of the characters together. My <strong>Sort-Object<\/strong> command sorts these from largest number to smallest number. 
I place the <strong>Count<\/strong> property below a column heading that I call \u201cFrequency\u201d.<\/p>\n<p>The last property is the most complex. I add a column heading called \u201cPercent\u201d. I calculate the percentage of representation by dividing the letter frequency by the total number of characters. I then use the built-in <strong>Percentage<\/strong> format specifier,\u00a0<strong>p<\/strong>, in conjunction with the <strong>-f<\/strong> format operator. I tell it to calculate percentage from my number and to display it to two decimal places of accuracy. The total <strong>Select<\/strong> statement is shown here:<\/p>\n<p style=\"padding-left: 30px\"><span style=\"color: #000000\">Select @{L = &#8216;Character&#8217;; E = {$_.Name} },<\/span>\n<span style=\"color: #000000\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 @{L = &#8216;Frequency&#8217; ; E = {$_.count} },<\/span>\n<span style=\"color: #000000\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 @{L = &#8216;Percent&#8217; ; E = {&#8220;{0:p2}&#8221; -f ($_.count \/ $total)}}<\/span><\/p>\n<h2>The complete script<\/h2>\n<p>The complete code is shown here:<\/p>\n<p style=\"padding-left: 30px\">$header = @&#8221;\n****************************************************************\n|\n| Letter Frequency Analysis\n| of $path\n| Analyzing $($total) characters &#8230;\n|\n****************************************************************\n&#8220;@\n$path = &#8216;C:\\fso\\ATaleOfTwoCities.txt&#8217;\n$a = Get-Content $path\n$total = ($a | measure -Character).characters\n$header\n$ajoined = $a -join &#8220;`r&#8221;\n$ajoinedUC = $ajoined.ToUpper()\n$ajoinedUC.GetEnumerator() |\ngroup -NoElement | sort count -Descending |\n<span style=\"color: #000000\">Select @{L = &#8216;Character&#8217;; E = {$_.Name} },<\/span>\n<span style=\"color: #000000\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 @{L = &#8216;Frequency&#8217; ; E = {$_.count} },<\/span>\n<span style=\"color: #000000\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 
\u00a0 @{L = &#8216;Percent&#8217; ; E = {&#8220;{0:p2}&#8221; -f ($_.count \/ $total)}}<\/span><\/p>\n<p>The script is in the figure here:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-14-16-01.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77813\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-14-16-01-1024x656.png\" alt=\"Screenshot of completed script in PowerShell ISE.\" width=\"879\" height=\"563\" \/><\/a><\/p>\n<p>When I run the script, the following output is shown:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-14-16-02.png\"><img decoding=\"async\" class=\"alignnone size-large wp-image-77821\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/HSG-4-14-16-02-1024x882.png\" alt=\"Screenshot of output of script.\" width=\"879\" height=\"757\" \/><\/a><\/p>\n<p>I invite you to follow me on <a href=\"http:\/\/bit.ly\/scriptingguystwitter\" target=\"_blank\">Twitter<\/a> and <a href=\"http:\/\/bit.ly\/scriptingguysfacebook\" target=\"_blank\">Facebook<\/a>. If you have any questions, send email to me at <a href=\"mailto:scripter@microsoft.com\" target=\"_blank\">scripter@microsoft.com<\/a>, or post your questions on the <a href=\"http:\/\/bit.ly\/scriptingforum\" target=\"_blank\">Official Scripting Guys Forum<\/a>. Also check out my <a href=\"https:\/\/blogs.technet.microsoft.com\/msoms\/\" target=\"_blank\">Microsoft Operations Management Suite Blog<\/a>. See you tomorrow. Until then, peace.<\/p>\n<p><strong>Ed Wilson<\/strong>\nMicrosoft Scripting Guy<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Learn how to use Windows PowerShell to calculate the percentage of how often a character appears in a text file. This is the fifth post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. 
To fully understand this post, you should read the entire series [&hellip;]<\/p>\n","protected":false},"author":596,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[568,684,678,641],"tags":[416,164,3,617,45],"class_list":["post-77811","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hey-scripting-guy","category-math","category-text","category-windows-powershell","tag-formatting-output","tag-math","tag-scripting-guy","tag-text","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Learn how to use Windows PowerShell to calculate the percentage of how often a character appears in a text file. This is the fifth post in a multi-part series of blog posts that deal with how to determine letter frequency in text files. To fully understand this post, you should read the entire series [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/77811","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=77811"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/77811\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=77811"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.mic
rosoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=77811"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=77811"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}