{"id":67623,"date":"2006-04-03T21:11:00","date_gmt":"2006-04-03T21:11:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2006\/04\/03\/how-can-i-count-the-number-of-words-in-a-text-file\/"},"modified":"2006-04-03T21:11:00","modified_gmt":"2006-04-03T21:11:00","slug":"how-can-i-count-the-number-of-words-in-a-text-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-count-the-number-of-words-in-a-text-file\/","title":{"rendered":"How Can I Count the Number of Words in a Text File?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! How can I count the number of words in a text file?<BR><BR>&#8212; LA<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" border=\"0\" alt=\"Script Center\" align=\"right\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" height=\"288\"><\/A> \n<P>Hey, LA. You know, this is one of those questions where the Scripting Guys outsmarted themselves. (Not that outsmarting the Scripting Guys is particularly hard to do, mind you.) For one thing, we\u2019re writing this column on a Friday, and we <I>always<\/I> look for an easy way out on Fridays. 
For another, we were just involved in a discussion on word counts the other day, so the subject was already on our minds. This question sounded easy and we\u2019d already been thinking about word counts: add the two together and you have the perfect column for a Friday.<\/P>\n<P>Or so we thought.<\/P>\n<P>The first hint of trouble occurred right off the bat, when we sat down to figure out the answer to your question. After all, there are several different ways we could approach this problem. For example, it\u2019s easy to <A href=\"http:\/\/null\/technet\/scriptcenter\/resources\/officetips\/mar05\/tips0324.mspx\"><B>calculate word counts<\/B><\/A> using Microsoft Word, so our first thought was, \u201cLet\u2019s just use Microsoft Word.\u201d But that seemed like overkill, and we didn\u2019t want to imply that you couldn\u2019t count the number of words in a text file unless you went out and bought Microsoft Office. (Although if the Office team would give us a commission we\u2019d reconsider that position.) 
We then thought, \u201cYou know, this is probably the perfect scenario for using <A href=\"http:\/\/www.microsoft.com\/events\/EventDetails.aspx?CMTYSvcSource=MSCOMMedia&amp;Params=%7eCMTYDataSvcParams%5e%7earg+Name%3d%22ID%22+Value%3d%221032271679%22%2f%5e%7earg+Name%3d%22ProviderID%22+Value%3d%22A6B43178-497C-4225-BA42-DF595171F04C%22%2f%5e%7earg+Name%3d%22lang%22+Value%3d%22en%22%2f%5e%7earg+Name%3d%22cr%22+Value%3d%22US%22%2f%5e%7esParams%5e%7e%2fsParams%5e%7e%2fCMTYDataSvcParams%5e\" target=\"_blank\"><B>regular expressions<\/B><\/A>.\u201d But then we got a headache just thinking about regular expressions and so we abandoned that idea, too.<\/P>\n<P>We then came up with this simple and elegant solution:<\/P><PRE class=\"codeSample\">Const ForReading = 1<\/p>\n<p>Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;c:\\scripts\\test.txt&quot;, ForReading)<\/p>\n<p>strText = objFile.ReadAll\nobjFile.Close<\/p>\n<p>arrWords = Split(strText, &quot; &quot;)\nWscript.Echo Ubound(arrWords) + 1\n<\/PRE>\n<P>Simple and elegant indeed: all we did here was open the text file C:\\Scripts\\Test.txt and read the entire file into a variable named strText. We then used the <B>Split<\/B> function to split that text on blank spaces (figuring that the only time you would have a blank space would be between words). Having used the Split function to create an array named arrWords (an array in which each element represents a single word), all we had to do then was echo back the <B>Ubound<\/B> (upper bound) value of the array, plus 1. (Why plus 1? Because VBScript arrays start at 0, so the Ubound value of an array is always the number of items in the array minus 1.)<\/P>\n<P>That worked &#8211; sort of. 
As it turned out, though, the text file we used occasionally had extra blank spaces to align information:<\/P><PRE class=\"codeSample\">Name                                        Date\nKen Myer                                    3\/30\/2006\nPilar Ackerman                              3\/31\/2006\n<\/PRE>\n<P>That created a problem: each extra blank space produced an empty element in the array, and each of those empty elements was counted as a word. Thus our final word count was a little bit higher than it should have been.<\/P>\n<P>Back to the drawing board:<\/P><PRE class=\"codeSample\">Const ForReading = 1<\/p>\n<p>Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;c:\\scripts\\test.txt&quot;, ForReading)<\/p>\n<p>strText = objFile.ReadAll\nobjFile.Close<\/p>\n<p>arrWords = Split(strText, &quot; &quot;)<\/p>\n<p>For Each strWord in arrWords\n    If Len(strWord) &gt; 0 Then\n        i = i + 1\n    End If\nNext<\/p>\n<p>Wscript.Echo i\n<\/PRE>\n<P>As you can see, this time around we didn\u2019t echo back the Ubound value. Instead, we set up a For Each loop to loop through all the items in the array. Inside that loop we used the <B>Len<\/B> function to determine the number of characters in each individual item. If the length of the item was 0, that meant we had encountered one of the empty elements created by our excess blank spaces. In that case we simply skipped that item (because few words have 0 characters in them). If the length was greater than 0, then we incremented a counter variable by 1:<\/P><PRE class=\"codeSample\">i = i + 1\n<\/PRE>\n<P>After looping through the entire array we then echoed back the value of our counter variable:<\/P><PRE class=\"codeSample\">Wscript.Echo i\n<\/PRE>\n<P>This was much better, but the word count still seemed a little too high. After puzzling this over for a minute or two we realized why. 
Suppose our text file consisted of this sentence:<\/P><PRE class=\"codeSample\">Two plus two = four\n<\/PRE>\n<P>Most people would say that there are four words in this sentence; however, our script insisted that there were <I>five<\/I> words in the sentence:<\/P><PRE class=\"codeSample\">Two\nplus\ntwo\n=\nfour\n<\/PRE>\n<P>Why five words? Because the script was counting the equals sign (=) as a word. Likewise, we had other \u201cextraneous\u201d characters in the document: for example, this construction counted as three words all by itself:<\/P><PRE class=\"codeSample\">. . .\n<\/PRE>\n<P>Yuck.<\/P>\n<P>We didn\u2019t like that, and so we modified the script one final time, using a series of <B>Replace<\/B> functions to replace characters such as the equals sign and the period with blank spaces:<\/P><PRE class=\"codeSample\">Const ForReading = 1<\/p>\n<p>Set objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;c:\\scripts\\test.txt&quot;, ForReading)<\/p>\n<p>strText = objFile.ReadAll<\/p>\n<p>strText = Replace(strText, &quot;,&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;.&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;!&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;?&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&gt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&lt;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;&amp;&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;*&quot;, &quot; &quot;)\nstrText = Replace(strText, &quot;=&quot;, &quot; &quot;)<\/p>\n<p>strText = Replace(strText, vbCrLf, &quot; &quot;)<\/p>\n<p>objFile.Close<\/p>\n<p>arrWords = Split(strText, &quot; &quot;)<\/p>\n<p>For Each strWord in arrWords\n    If Len(strWord) &gt; 0 Then\n        i = i + 1\n    End If\nNext<\/p>\n<p>Wscript.Echo i\n<\/PRE>\n<P>This one we liked better. 
As with our previous scripts, we start this one off by defining a constant named ForReading; this constant tells the <B>FileSystemObject<\/B> that we want to read the text file (as opposed to writing or appending to it). Next we create an instance of the FileSystemObject and use the OpenTextFile method to open the file C:\\Scripts\\Test.txt. Once the file is open we then use the <B>ReadAll<\/B> method to read the entire file into a variable named strText:<\/P><PRE class=\"codeSample\">strText = objFile.ReadAll\n<\/PRE>\n<P>Following that we execute a series of Replace functions to replace characters in the variable strText. (Note that we aren\u2019t touching the actual file itself, just the copy of the file stored in memory.) For example, this line of code replaces each comma in strText with a blank space:<\/P><PRE class=\"codeSample\">strText = Replace(strText, &quot;,&quot;, &quot; &quot;)\n<\/PRE>\n<P>We\u2019ll leave it up to you to decide which characters &#8211; if any &#8211; you want to replace. If you\u2019re OK with the equals sign and the plus sign (+) being counted as individual words then you might not have to make any replacements at all.<\/P>\n<P>Wait, check that: there\u2019s one replacement that you <I>will<\/I> have to make. Suppose we have a text file that looks like this:<\/P><PRE class=\"codeSample\">A\nB\nC\nD\nE\n<\/PRE>\n<P>How many words in <I>this<\/I> text file? We would have said 5, too, but the script said there was just 1. Why? Well, we told the script to split the text on the blank space; however, this file doesn\u2019t <I>have<\/I> any blank spaces, just carriage return-linefeeds at the end of each line. Therefore, our array has only one item in it. Ouch.<\/P>\n<P>So how do we overcome <I>that<\/I> problem? 
That was actually pretty easy: we just replaced all the carriage return-linefeeds (vbCrLf) with blank spaces:<\/P><PRE class=\"codeSample\">strText = Replace(strText, vbCrLf, &quot; &quot;)\n<\/PRE>\n<P>Once we had blank spaces between each character (rather than carriage return-linefeeds between each character) the script correctly reported back 5 words for this sample text file.<\/P>\n<P>Now where were we? Oh, yeah. After we close the file we then call the Split function to split strText into an array. We then use the For Each loop we already showed you to count the number of words in the array (and hence the number of words in the text file), skipping over the empty items created by excess blank spaces. We then echo back the value of our counter variable and we\u2019re done.<\/P>\n<P>At least to our satisfaction. Whether or not the word count is 100% accurate is somewhat subjective. For example, suppose you have this line in the text file:<\/P><PRE class=\"codeSample\">2+2=4\n<\/PRE>\n<P>Do you have 5 words in this line (<B>2<\/B>, <B>+<\/B>, <B>2<\/B>, <B>=<\/B>, and <B>4<\/B>)? Maybe you have just three words: <B>2<\/B>, <B>2<\/B>, and <B>4<\/B>. Or maybe you just have one word: <B>2+2=4<\/B>. (Microsoft Word sees this as being a single word.) You\u2019ll have to make those decisions on your own. As for us, we\u2019ve decided that the next time we find an \u201ceasy\u201d question to answer we\u2019ll just skip that one and try something else!<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I count the number of words in a text file?&#8212; LA Hey, LA. You know, this is one of those questions where the Scripting Guys outsmarted themselves. (Not that outsmarting the Scripting Guys is particularly hard to do, mind you.) 
For one thing, we\u2019re writing this column on a Friday, [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[40,3,4,14,5],"class_list":["post-67623","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-filesystemobject","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! How can I count the number of words in a text file?&#8212; LA Hey, LA. You know, this is one of those questions where the Scripting Guys outsmarted themselves. (Not that outsmarting the Scripting Guys is particularly hard to do, mind you.) For one thing, we\u2019re writing this column on a Friday, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/67623","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=67623"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/67623\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=67623"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=67623"},{"taxonomy":
"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=67623"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}