{"id":65423,"date":"2007-02-28T01:04:00","date_gmt":"2007-02-28T01:04:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2007\/02\/28\/how-can-i-tally-up-all-the-words-found-in-a-text-file\/"},"modified":"2007-02-28T01:04:00","modified_gmt":"2007-02-28T01:04:00","slug":"how-can-i-tally-up-all-the-words-found-in-a-text-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-tally-up-all-the-words-found-in-a-text-file\/","title":{"rendered":"How Can I Tally Up All the Words Found in a Text File?"},"content":{"rendered":"<p><H2><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"> <\/H2>\n<P>Hey, Scripting Guy! While browsing the Internet I found a script that showed me how to get a list of all the unique words in a text file. That\u2019s useful, but I\u2019d like to go one step further: how can I determine the number of times each of those words occurs??<BR><BR>&#8212; TZ<\/P><IMG height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\"><\/A> \n<P>Hey, TZ. As it turns out, that\u2019s a lot easier than you might think. 
In fact, all you have to do is \u2013 hang on a second, we just got an email. And not just any email: based on the subject line \u2013 <B>underpaid and not appreciated?<\/B> \u2013 this must be a legitimate email that\u2019s truly intended for the Scripting Guy who writes this column. Let\u2019s see what it says:<\/P><PRE class=\"codeSample\">I wanted to write and tlel you about a great new De!gree program, i tried it out and it worked!<\/p>\n<p>I got my M!asters in 2 weeks ;]. Call the folowing number, this program really works great i was very surprised!\n<\/PRE>\n<P>Now <I>that\u2019s<\/I> a good deal. Although you might find this hard to believe, the Scripting Guy who writes this column already <I>has<\/I> a Masters degree, and from the University of Washington to boot. (Further proof that the value of a college education is highly overrated.) But the Scripting Guy who writes this column didn\u2019t get his degree in just two weeks; instead, it took him almost two years. Furthermore, in <I>his<\/I> degree program he had to know spelling and grammar along with everything else. This new program sounds <I>way<\/I> better! And while we aren\u2019t positive that a M!asters degree is the same thing as a Masters degree, as long as it enables the Scripting Guy who writes this column to get the money and appreciation he deserves, well \u2026.<\/P>\n<P>Anyway, it looks like the Scripting Guy who writes this column will be going back to college, at least for two weeks anyway. That means he has a lot to do: buy some cinder blocks to build a bookcase; stock his cupboards with Top Ramen and boxed macaroni-and-cheese; and write home to his parents asking if they can send him some money. Oh: and show you how to tally up the words found in a text file. 
You know, by using a script like this one:<\/P><PRE class=\"codeSample\">Const ForReading = 1\n\nSet objDictionary = CreateObject(&quot;Scripting.Dictionary&quot;)\n\nSet objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;c:\\scripts\\test.txt&quot;, ForReading)\n\nstrText = objFile.ReadAll\nobjFile.Close\n\nstrText = Replace(strText, &quot;,&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;.&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;!&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;?&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&gt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&lt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&amp;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;*&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;=&quot;, &quot; &quot;)\nstrText = Replace(strText, vbCrLf, &quot; &quot;)\n\narrWords = Split(strText, &quot; &quot;)\n\nFor Each strWord in arrWords\n    If Len(strWord) &gt; 0 Then\n        If objDictionary.Exists(strWord) Then\n            objDictionary.Item(strWord) = objDictionary.Item(strWord) + 1\n        Else\n            objDictionary.Add strWord, 1\n        End If\n    End If\nNext\n\ncolKeys = objDictionary.Keys\n\nFor Each strKey in colKeys\n    Wscript.Echo strKey &amp; &quot; &#8212; &quot; &amp; objDictionary.Item(strKey)\nNext\n<\/PRE>\n<P>In case any of you are thinking that the Scripting Guy who writes this column is too old and too out-of-touch to go back to college, we can set your mind at ease by pointing out that he cheated in order to finish today\u2019s assignment: in particular, he copied an <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr06\/hey0403.mspx\"><B>existing script<\/B><\/A> from the Internet and then just modified it slightly to meet his 
needs. And even though he cheated, he still waited until the last possible minute to complete the assignment. If that doesn\u2019t sound like a college student, well, we don\u2019t know what does.<\/P>\n<P>Let\u2019s take a few minutes to discuss how this script works; that will be good practice when it comes time for the Scripting Guy who writes this column to defend his master\u2019s thesis. (And yes, we are a <I>little<\/I> concerned about having just two weeks to complete 45 credits of coursework <I>and<\/I> write a master\u2019s thesis. But no doubt this school knows what it\u2019s doing.)<\/P>\n<P>The script starts out by defining a constant named ForReading and setting the value to 1; we\u2019ll need this constant when we open our text file. Next we create an instance of the <B>Scripting.Dictionary<\/B> object. Why? We\u2019ll get to that in just a second. For now, let\u2019s forget about the Dictionary object and focus on the next two lines of code, which create an instance of the <B>Scripting.FileSystemObject<\/B> and use the <B>OpenTextFile<\/B> method to open the file C:\\Scripts\\Test.txt:<\/P><PRE class=\"codeSample\">Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;c:\\scripts\\test.txt&quot;, ForReading)\n<\/PRE>\n<P>Once the file is open we use the <B>ReadAll<\/B> method to read the entire file into memory and store it in a variable named strText:<\/P><PRE class=\"codeSample\">strText = objFile.ReadAll\n<\/PRE>\n<P>Now that the contents of Test.txt are stored in memory we call the <B>Close<\/B> method and close the file.<\/P>\n<P>Our next task is to identify the individual words in the file. In general that\u2019s fairly easy; as you\u2019ll see, we simply use the <B>Split<\/B> function to create an array from all the words in strText. And how will we identify the individual words? 
By splitting on the blank space (\u201c \u201d), acting under the assumption that all the words in the file are separated by blank spaces.<\/P>\n<P>For the most part, that will work pretty well. However, there is a potential problem here. For example, consider this sample text file:<\/P><PRE class=\"codeSample\">I saw the cat. The cat was black.\n<\/PRE>\n<P>How many times does the word <I>cat<\/I> appear in this file? We\u2019d agree: it appears twice. However, our script won\u2019t agree; instead, the script sees two different words that happen to include the letters c-a-t:<\/P>\n<TABLE class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<TBODY>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>cat.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>cat<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<P>See the problem? It\u2019s the period immediately following the first instance of <I>cat<\/I>. Because our script doesn\u2019t know anything about punctuation (which definitely makes it a candidate for a M!asters degree) it doesn\u2019t know to ignore the period at the end of the sentence. Punctuation \u2013 and carriage return-linefeeds \u2013 can create problems in this script. 
Therefore, we use a series of <B>Replace<\/B> commands to find these characters and replace them with blank spaces:<\/P><PRE class=\"codeSample\">strText = Replace(strText, &quot;,&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;.&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;!&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;?&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&gt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&lt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&amp;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;*&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;=&quot;, &quot; &quot;)\nstrText = Replace(strText, vbCrLf, &quot; &quot;)\n<\/PRE>\n<P>That turns our practice file into something that looks like this:<\/P><PRE class=\"codeSample\">I saw the cat  The cat was black\n<\/PRE>\n<P>In turn, our script now tells us that the word <I>cat<\/I> appears twice.<\/P>\n<TABLE class=\"dataTable\" id=\"E2F\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD class=\"\">\n<P class=\"lastInCell\"><B>Note<\/B>. For a somewhat more detailed discussion of this issue, see <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr06\/hey0403.mspx\"><B>this Hey, Scripting Guy! column<\/B><\/A>.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>After we\u2019ve cleaned up our text file, we then use the Split function to create an array consisting of the individual words found in the text file. For our simple example, that means the array arrWords will contain these items:<\/P><PRE class=\"codeSample\">I \nsaw \nthe \ncat  \nThe \ncat \nwas \nblack\n<\/PRE>\n<P>And now it\u2019s time to start tallying the number of times each individual word occurs. 
That\u2019s what this block of code, and the Dictionary object, is for:<\/P><PRE class=\"codeSample\">For Each strWord in arrWords\n    If Len(strWord) &gt; 0 Then\n        If objDictionary.Exists(strWord) Then\n            objDictionary.Item(strWord) = objDictionary.Item(strWord) + 1\n        Else\n            objDictionary.Add strWord, 1\n        End If\n    End If\nNext\n<\/PRE>\n<P>What are we doing here? Good question. The first thing we\u2019re doing is setting up a For Each loop that will loop through all the items in the array; in other words, through all the words in the text file. For each word we first verify that the length (<B>Len<\/B>) is greater than 0 characters. (Why? See our <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr06\/hey0403.mspx\"><B>previous column<\/B><\/A> on this topic for details.) Assuming that the length <I>is<\/I> greater than 0 we then use the following line of code to see if the word in question already exists in our Dictionary:<\/P><PRE class=\"codeSample\">If objDictionary.Exists(strWord) Then\n<\/PRE>\n<P>Let\u2019s assume that the word <I>can\u2019t<\/I> be found in the Dictionary. In that case, we use the <B>Add<\/B> method to add the word as a new Dictionary <B>Key<\/B>. At the same time, we set the value of the corresponding Dictionary <B>Item<\/B> to 1:<\/P><PRE class=\"codeSample\">objDictionary.Add strWord, 1\n<\/PRE>\n<P>Why 1? Because, so far, we\u2019ve found 1 occurrence of that particular word.<\/P>\n<P>If the word already exists in the Dictionary we don\u2019t try adding it a second time; that would cause an error. Instead, we simply increment the value of the Item property by 1:<\/P><PRE class=\"codeSample\">objDictionary.Item(strWord) = objDictionary.Item(strWord) + 1\n<\/PRE>\n<P>You probably don\u2019t need us to tell you this, but if the Item was equal to 1 then, after we execute this line of code, the Item will be equal to 2. 
You probably also don\u2019t need us to tell you why we chose to use the Dictionary object; unlike an array, it\u2019s easy to 1) locate a specified key; and 2) determine whether a key already exists. (See this <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/begin\/ss0906.mspx\"><B>Sesame Script<\/B><\/A> article for more on how the Dictionary object works.)<\/P>\n<P>From there all we do is loop around and repeat the process with the next word in the array.<\/P>\n<P>After we\u2019ve finished our For Each loop we use this block of code to report back all the Keys and Item values in the Dictionary:<BR><\/P><PRE class=\"codeSample\">colKeys = objDictionary.Keys\n\nFor Each strKey in colKeys\n    Wscript.Echo strKey &amp; &quot; &#8212; &quot; &amp; objDictionary.Item(strKey)\nNext\n<\/PRE>\n<P>That\u2019s going to give us a report similar to this:<\/P><PRE class=\"codeSample\">I &#8212; 1\nsaw &#8212; 1\nthe &#8212; 1\ncat &#8212; 2\nThe &#8212; 1\nwas &#8212; 1\nblack &#8212; 1\n<\/PRE>\n<P>And yes, it <I>would<\/I> be nice if those words were sorted alphabetically, wouldn\u2019t it? But that\u2019s a task for another day.<\/P>\n<P>As for going back to college, the Scripting Guy who writes this column is actually having second thoughts. Granted, the idea that you could get a Masters degree in two weeks is a bit suspicious; it\u2019s even more suspicious that the telephone number provided in the email is an unlisted number. But the big problem is that, as near as he can tell, the alleged school has neither a football team nor a basketball team. No football team or basketball team? Then why even <I>have<\/I> a college?<\/P>\n<P>Besides, there\u2019s no need for him to waste two weeks of his life getting a Masters degree. After all, according to another email he just received the Scripting Guy who writes this column can make $50,000 a month while working from home; that works out to $600,000 a year. 
Sure, that would be a bit of a pay cut, but it might be worth giving up a little money for the chance to work from home.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! While browsing the Internet I found a script that showed me how to get a list of all the unique words in a text file. That\u2019s useful, but I\u2019d like to go one step further: how can I determine the number of times each of those words occurs??&#8212; TZ Hey, TZ. As [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,14,5],"class_list":["post-65423","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! While browsing the Internet I found a script that showed me how to get a list of all the unique words in a text file. That\u2019s useful, but I\u2019d like to go one step further: how can I determine the number of times each of those words occurs??&#8212; TZ Hey, TZ. 
As [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/65423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=65423"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/65423\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=65423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=65423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=65423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}