{"id":67013,"date":"2006-06-28T18:45:00","date_gmt":"2006-06-28T18:45:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2006\/06\/28\/how-can-i-get-a-list-of-the-unique-words-used-in-a-microsoft-word-document\/"},"modified":"2006-06-28T18:45:00","modified_gmt":"2006-06-28T18:45:00","slug":"how-can-i-get-a-list-of-the-unique-words-used-in-a-microsoft-word-document","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-get-a-list-of-the-unique-words-used-in-a-microsoft-word-document\/","title":{"rendered":"How Can I Get a List of the Unique Words Used in a Microsoft Word Document?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! How can I get a list of the unique words used in a Microsoft Word document?<BR><BR>&#8212; RK<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"TechNet Script Center\" border=\"0\" alt=\"TechNet Script Center\" align=\"right\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" height=\"288\"><\/A> \n<P>Hey, RK. Funny you should mention unique words. Last Saturday the Scripting Coach\u2019s baseball team played in the city championship. 
Despite the importance of the game the team was missing two key players, and the Scripting Coach knew that meant that his infield would be a little weak defensively. What he didn\u2019t realize was just <I>how<\/I> weak it would be. In the bottom of the second inning the opposition\u2019s leadoff hitter singled. That would be the last ball that would leave the infield in that inning, yet the opposing team managed to score six runs. (How? By having seven consecutive batters reach base on an infield error.) As you might expect, the Scripting Coach\u2019s team never really recovered after that.<\/P>\n<P>Needless to say, a number of words ran through the Scripting Coach\u2019s head during that disastrous second inning, with many of those words being very \u2026 unique \u2026.<\/P>\n<P>Well, except to other baseball coaches, of course.<\/P>\n<P>Ah, but you don\u2019t want to hear about the city championship game, do you? (Which is good, because this Scripting Guy doesn\u2019t want to talk about it, either.) You\u2019d rather talk about a script that can produce a list of the unique words used in a Microsoft Word document. You know, a script similar to this:<\/P><PRE class=\"codeSample\">Set objDictionary = CreateObject(&quot;Scripting.Dictionary&quot;)\n\nSet objWord = CreateObject(&quot;Word.Application&quot;)\nobjWord.Visible = True\n\nSet objDoc = objWord.Documents.Open(&quot;C:\\Scripts\\Sample.doc&quot;)\n\nSet colWords = objDoc.Words\n\nFor Each strWord in colWords\n    strWord = LCase(strWord)\n    If objDictionary.Exists(strWord) Then\n    Else\n        objDictionary.Add strWord, strWord\n    End If\nNext\n\nSet objDoc2 = objWord.Documents.Add()\nSet objSelection = objWord.Selection\n\nFor Each strItem in objDictionary.Items\n    objSelection.TypeText strItem &amp; vbCrLf\nNext\n\nSet objRange = objDoc2.Range\nobjRange.Sort\n<\/PRE>\n<P>Let\u2019s see if we can figure out how this script works. 
As you can see, we start out simple enough: we just create an instance of the <B>Scripting.Dictionary<\/B> object. (Of course, everything <I>starts<\/I> simple, just like the second inning of the city championship game did.) In just a moment we\u2019ll use the Dictionary object as a place to store all the unique words in the document. We then use these three lines of code to create a visible instance of the <B>Word.Application<\/B> object and open the document C:\\Scripts\\Sample.doc:<\/P><PRE class=\"codeSample\">Set objWord = CreateObject(&quot;Word.Application&quot;)\nobjWord.Visible = True\n\nSet objDoc = objWord.Documents.Open(&quot;C:\\Scripts\\Sample.doc&quot;)\n<\/PRE>\n<P>That <I>was<\/I> easy, wasn\u2019t it? (Apparently much easier than catching a lazy little pop fly.) With our document open we\u2019re now ready to grab a list of the unique words. To do that, we first need to get a list of <I>all<\/I> the words. That sounds like a complicated procedure, but, fortunately, it\u2019s not; that\u2019s because all the words in a Microsoft Word document are stored in the document\u2019s <B>Words<\/B> collection. That\u2019s a collection we can retrieve using just one line of code:<\/P><PRE class=\"codeSample\">Set colWords = objDoc.Words\n<\/PRE>\n<P>Our next step is to weed out all duplicate words in the collection; that will leave us (and RK) with a list of unique words. 
For example, suppose our Word document contains the following words:<\/P><PRE class=\"codeSample\">these\nwords\nare\nthe\nwords\nin\nthe\ndocument\n<\/PRE>\n<P>When we weed out all the duplicates (like multiple instances of the words <I>words<\/I> and <I>the<\/I>) we\u2019re left with this:<\/P><PRE class=\"codeSample\">these\nwords\nare\nthe\nin\ndocument\n<\/PRE>\n<P>Which is the very thing RK is hoping to get.<\/P>\n<P>To weed out the duplicate words we use this block of code:<\/P><PRE class=\"codeSample\">For Each strWord in colWords\n    strWord = LCase(strWord)\n    If objDictionary.Exists(strWord) Then\n    Else\n        objDictionary.Add strWord, strWord\n    End If\nNext\n<\/PRE>\n<P>As you can see, we start by setting up a For Each loop to loop through the collection of words in the document. Inside the loop we examine each word individually, using the <B>LCase<\/B> function to convert the word to all lowercase letters. (Why? Well, that helps us avoid any problems like having the words <I>Cat<\/I>, <I>cat<\/I>, and <I>CAT<\/I> being marked as different words.)<\/P>\n<P>After the word has been converted to lowercase we then use the <B>Exists<\/B> method to determine whether or not the word is already in the Dictionary:<\/P><PRE class=\"codeSample\">If objDictionary.Exists(strWord) Then\n<\/PRE>\n<P>If the word is already in the Dictionary the script simply loops around and repeats this process with the next word in the collection. If the word is <I>not<\/I> in the Dictionary then we use this line of code to add the word (specifying the same value &#8211; the word itself &#8211; as both the Dictionary item and Dictionary key):<\/P><PRE class=\"codeSample\">objDictionary.Add strWord, strWord\n<\/PRE>\n<TABLE id=\"EAF\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P><B>Note<\/B>. 
Sorry; we thought that the Dictionary object was like the Heimlich Maneuver, something everyone already knows. If you aren\u2019t familiar with the Dictionary object and how to use it, take a peek at the <A href=\"http:\/\/null\/technet\/scriptcenter\/guide\/sas_scr_ildk.mspx\" target=\"_blank\"><B>Microsoft Windows 2000 Scripting Guide<\/B><\/A>.<\/P>\n<P>What do you mean you don\u2019t know the Heimlich Maneuver, either? Fine; we\u2019ll see if we can locate an email address for <I>Hey, Heimlich Maneuver Guy!<\/I><\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>After we\u2019re done with the loop the unique words in the document will be safely stashed in the Dictionary. If we wanted to, we could simply echo back those values; that requires no more code than this:<\/P><PRE class=\"codeSample\">For Each strItem in objDictionary.Items\n    Wscript.Echo strItem \nNext\n<\/PRE>\n<P>We thought we\u2019d go one better than that, however, and add these words &#8211; in alphabetical order &#8211; to a brand-new Word document. To do that we need to create a new document (notice the new object reference, objDoc2) and then create an instance of the Word <B>Selection<\/B> object, which simply positions the cursor at the beginning of the document:<\/P><PRE class=\"codeSample\">Set objDoc2 = objWord.Documents.Add()\nSet objSelection = objWord.Selection\n<\/PRE>\n<P>Once we\u2019ve done that we can then loop through the items in the Dictionary, using the <B>TypeText<\/B> method to add the word (plus a carriage return-linefeed) to the document:<\/P><PRE class=\"codeSample\">For Each strItem in objDictionary.Items\n    objSelection.TypeText strItem &amp; vbCrLf\nNext\n<\/PRE>\n<P>That gives us a Word document that looks something like this:<\/P><PRE class=\"codeSample\">these\nwords\nare\nthe\nin\ndocument\n<\/PRE>\n<P>What\u2019s that? Alphabetical order? 
No one ever said anything about &#8211; oh, that\u2019s right, we <I>did<\/I> say we\u2019d sort these words in alphabetical order, didn\u2019t we? OK, that\u2019s easy enough:<\/P><PRE class=\"codeSample\">Set objRange = objDoc2.Range\nobjRange.Sort\n<\/PRE>\n<P>That\u2019s all we have to do. We create a new instance of the <B>Range<\/B> object; because we provided no additional parameters the new range will, by default, encompass the entire document. And then we call the <B>Sort<\/B> method; when we call Sort without any parameters we get the items sorted in alphabetical order. Just like this:<\/P><PRE class=\"codeSample\">are\ndocument\nin\nthe\nthese\nwords\n<\/PRE>\n<P>Pretty slick, huh?<\/P>\n<P>We should point out that you might get a few anomalies in your list of unique words: that\u2019s because Microsoft Word considers some crazy things &#8211; like periods &#8211; to be words. If you don\u2019t want punctuation marks to be tagged as words you can add code to, say, weed out anything that doesn\u2019t start with a letter. 
We won\u2019t discuss this revised script; we\u2019ll just note that it uses the <A href=\"http:\/\/msdn.microsoft.com\/library\/en-us\/script56\/html\/c847a40b-9a73-4434-8f65-c52c0085b059.asp\" target=\"_blank\"><B>ASC function<\/B><\/A> to filter out anything that doesn\u2019t start with a letter:<\/P><PRE class=\"codeSample\">Set objDictionary = CreateObject(&quot;Scripting.Dictionary&quot;)\n\nSet objWord = CreateObject(&quot;Word.Application&quot;)\nobjWord.Visible = True\n\nSet objDoc = objWord.Documents.Open(&quot;C:\\Scripts\\Sample.doc&quot;)\n\nSet colWords = objDoc.Words\n\nFor Each strWord in colWords\n    strWord = LCase(strWord)\n    strLetter = Left(strWord, 1)\n    ' 97 and 122 are the character codes for lowercase a and z\n    If ASC(strLetter) &lt; 97 OR ASC(strLetter) &gt; 122 Then\n        ' Does not start with a letter, so skip it\n    Else\n        If objDictionary.Exists(strWord) Then\n        Else\n            objDictionary.Add strWord, strWord\n        End If\n    End If\nNext\n\nSet objDoc2 = objWord.Documents.Add()\nSet objSelection = objWord.Selection\n\nFor Each strItem in objDictionary.Items\n    objSelection.TypeText strItem &amp; vbCrLf\nNext\n\nSet objRange = objDoc2.Range\nobjRange.Sort\n<\/PRE>\n<P>As for baseball, after Saturday\u2019s debacle both the Scripting Coach and his Scripting Son vowed that they were through with the sport forever. Of course, that night another team called the Scripting Son and asked if he\u2019d be willing to join their squad for the remainder of the season. And, needless to say, the Scripting Coach said he\u2019d be willing to help out if the team needed his help, too. <\/P>\n<P>But other than that, both father and son are through with baseball. Forever.<\/P>\n<P>Or at least until Fall baseball starts up in August.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I get a list of the unique words used in a Microsoft Word document?&#8212; RK Hey, RK. Funny you should mention unique words. 
Last Saturday the Scripting Coach\u2019s baseball team played in the city championship. Despite the importance of the game the team was missing two key players, and the [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[84,49,3,5,395],"class_list":["post-67013","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-microsoft-word","tag-office","tag-scripting-guy","tag-vbscript","tag-word-application"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! How can I get a list of the unique words used in a Microsoft Word document?&#8212; RK Hey, RK. Funny you should mention unique words. Last Saturday the Scripting Coach\u2019s baseball team played in the city championship. Despite the importance of the game the team was missing two key players, and the 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/67013","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=67013"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/67013\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=67013"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=67013"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=67013"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}