{"id":66743,"date":"2006-08-07T11:22:00","date_gmt":"2006-08-07T11:22:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2006\/08\/07\/how-can-i-tell-whether-a-phrase-occurs-at-least-twice-in-a-text-file\/"},"modified":"2006-08-07T11:22:00","modified_gmt":"2006-08-07T11:22:00","slug":"how-can-i-tell-whether-a-phrase-occurs-at-least-twice-in-a-text-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-tell-whether-a-phrase-occurs-at-least-twice-in-a-text-file\/","title":{"rendered":"How Can I Tell Whether a Phrase Occurs At Least Twice in a Text File?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! How can I tell whether or not the phrase <I>226 transfer complete<\/I> occurs at least twice in a text file?<BR><BR>&#8212; JR<\/P><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, JR. You know, this was a tough one for us to answer; that\u2019s because the Scripting Guys are always content with just one of everything. 
Take the Scripting Guy who writes this column, for example. He has one son; he writes one column; he was right one time in his life. <I>(Editor\u2019s Note: When was that?)<\/I> A second piece of pie? No, thank you; one is plenty.<\/P>\n<P>Um, what kind of pie are we talking about? <\/P>\n<P>The point here (assuming that there <I>is<\/I> a point here) is that determining whether or not a particular phrase occurs more than once in a text file is something a Scripting Guy would never do; we\u2019re not the greedy type. On the other hand, though, if someone asked us for help it wouldn\u2019t be very polite to ignore them, would it? Tell you what: we\u2019ll see what we can do. <\/P>\n<P>Oh, and we\u2019ll take that second piece of pie, too. Just to be \u2026 polite.<\/P>\n<P>As you pointed out, JR, this task is actually a bit more complicated than it might first appear. Sure, you can use the <B>InStr<\/B> function to determine whether the string <I>226 transfer complete<\/I> appears in the file (although even there we have a problem, as we\u2019ll explain in a moment). However, InStr only reports the position of the first occurrence of the target phrase (or 0 if the phrase isn\u2019t there), which for our purposes amounts to a yes-no answer: yes, the target phrase was found, or no, the target phrase was not found. What InStr won\u2019t tell you is how many <I>times<\/I> the target phrase can be found.<\/P>\n<P>Of course, we <I>could<\/I> try using the <B>Split<\/B> function to split the contents of the file on the target phrase; that would give us an array that \u2013 after a little mathematical wizardry \u2013 would eventually tell us how many instances of the target phrase occurred. Except for one thing: as JR noted, there\u2019s no guarantee that all the words of the target phrase will appear on the same line of the text file. For example, suppose we had this very simple text file:<\/P><PRE class=\"codeSample\">226\ntransfer complete\n<\/PRE>\n<P>Does the phrase <I>226 transfer complete<\/I> appear in this file? 
Believe it or not, it doesn\u2019t: if you try splitting the file on the phrase <I>226 transfer complete<\/I>, Split won\u2019t find a single match. Why not? Because of the carriage return-linefeed that appears at the end of the first line. Technically, <I>this<\/I> is the string that makes up the contents of our sample file:<\/P><PRE class=\"codeSample\">\"226\" &amp; vbCrLf &amp; \"transfer complete\"\n<\/PRE>\n<P>That\u2019s a problem.<\/P>\n<P>Now, admittedly, there are a couple of clever ways that we could manipulate this file and then still use the Split function to count up the number of times the target phrase appears. But the Scripting Guys are too lazy to do anything clever. Because of that, we used this script instead:<\/P><PRE class=\"codeSample\">Const ForReading = 1\n\n' Read the entire file into memory, then close it\nSet objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n\nstrContents = objFile.ReadAll\n\nobjFile.Close\n\n' Case-insensitive search for every instance of the phrase;\n' \\W{1,} matches one or more non-word characters (blank spaces, CR\/LF, and so on)\nSet objRegEx = CreateObject(\"VBScript.RegExp\")\nobjRegEx.IgnoreCase = True\nobjRegEx.Global = True\nobjRegEx.Pattern = \"226\\W{1,}transfer\\W{1,}complete\"\n\n' Execute returns a collection containing every match\nSet colMatches = objRegEx.Execute(strContents)\n\nWscript.Echo \"Total Matches: \" &amp; colMatches.Count\n<\/PRE>\n<P>The secret to this approach is that we use a regular expression to search for the phrase <I>226 transfer complete<\/I>. Why do we use a regular expression? That\u2019s easy: the regular expression object\u2019s <B>Execute<\/B> method returns a collection of all the matches found. To determine how many instances of our target phrase occur in the file, all we have to do is count the items in that collection of matches.<\/P>\n<P>Of course, even with a regular expression we can\u2019t just search for the phrase <I>226 transfer complete<\/I>. Why not? 
Because we still face the problem of handling instances of the target phrase that get broken across lines:<\/P><PRE class=\"codeSample\">226\ntransfer complete\n<\/PRE>\n<P>Did we find a way to deal with that problem? To find out, just keep reading.<\/P>\n<TABLE id=\"EYE\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P class=\"lastInCell\"><B>Note<\/B>. Sorry, but we don\u2019t want to spoil the suspense.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>Let\u2019s see if we can figure out how the script works. To begin with, we define a constant named ForReading and set its value to 1; we\u2019ll use this constant in order to open the text file for, well, reading. Next we create an instance of the <B>Scripting.FileSystemObject<\/B>, then use this line of code to open the file:<\/P><PRE class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n<\/PRE>\n<P>As soon as we have the file open, we use the <B>ReadAll<\/B> method to read in the contents of the file and store that information in a variable named strContents. We need to do this because we can\u2019t actually search the file itself; instead we need to search a copy of the file stored in memory. And because we\u2019re searching that in-memory copy rather than the file itself, we can use the <B>Close<\/B> method to close the file as soon as we\u2019ve finished reading in the contents.<\/P>\n<P>Now the fun begins. We start out by creating an instance of the <B>VBScript.RegExp<\/B> object. We then configure two properties of the regular expression object:<\/P>\n<TABLE border=\"0\" cellSpacing=\"0\" cellPadding=\"0\">\n<TBODY>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>IgnoreCase<\/B>. We set this value to True, which means our search will not be case sensitive. 
(In other words, <I>226 Transfer Complete<\/I> and <I>226 transfer complete<\/I> will both register as matches.)<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>Global<\/B>. We set this value to True to ensure that we locate all instances of the target phrase. If Global were set to False, the regular expression object would find the first instance of the target phrase and then stop looking.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<P>That brings us to this line of code:<\/P><PRE class=\"codeSample\">objRegEx.Pattern = \"226\\W{1,}transfer\\W{1,}complete\"\n<\/PRE>\n<P>As you might have guessed, the <B>Pattern<\/B> property represents our target phrase. If you squint your eyes and hold your monitor up to the light, you can probably see the words <I>226<\/I>, <I>transfer<\/I>, and <I>complete<\/I> in the value we\u2019re assigning to the Pattern property. But what\u2019s the deal with those two instances of <B>\\W{1,}<\/B>?<\/P>\n<P>Good question. We\u2019ve already determined that we can\u2019t just search for the phrase <I>226 transfer complete<\/I>. Why not? Well, for one thing, the number <I>226<\/I> could be followed by a blank space; however, it could also be followed by a carriage return-linefeed. And that makes a big difference: <I>226 blank space<\/I> is definitely not the same thing as <I>226 carriage return-linefeed<\/I>.<\/P>\n<P>Fortunately, regular expressions are designed to deal with ambiguous situations like that. What does <B>\\W{1,}<\/B> mean? To begin with, the \\ tells VBScript that the next character in the string is a special character; in other words, we\u2019re saying, \u201cDon\u2019t look for a W. Instead, look for a \u2018non-word\u2019 character.\u201d In VBScript regular expressions, a word character is a letter, a digit, or the underscore; a non-word character is anything else (\\W is equivalent to [^A-Za-z0-9_]). A blank space, a carriage return, and a linefeed are all non-word characters, so \\W enables us to match <I>either<\/I> a blank space or a carriage return-linefeed. <\/P>\n<P>Cool, huh?<\/P>\n<P>So then what\u2019s the {1,} for? What we\u2019re doing here is specifying the number of non-word characters allowed to come after <I>226<\/I>. The <B>1<\/B> tells the script that there must be at least one non-word character after <I>226<\/I>. The comma followed by nothing tells the script that there could be more than one non-word character, with no upper limit; we\u2019re fine with that. (If you prefer, \\W+ means the same thing as \\W{1,}.) And that\u2019s good, because, technically, a carriage return-linefeed actually consists of <I>two<\/I> characters: the carriage return and the linefeed. That\u2019s why we can\u2019t settle for matching exactly one character: criteria like that would find the blank space but not the carriage return-linefeed. One side effect: because \\W also covers tabs and punctuation, a string such as <I>226 - transfer complete<\/I> will count as a match, too.<\/P>\n<TABLE id=\"EAAAC\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P class=\"lastInCell\"><B>Note<\/B>. So how do we know all this stuff? Well, to tell you the truth, we don\u2019t. What we <I>do<\/I> know, however, is how to look up regular expression syntax in the <A href=\"http:\/\/msdn.microsoft.com\/library\/en-us\/script56\/html\/ab0766e1-7037-45ed-aa23-706f58358c0e.asp\" target=\"_blank\"><B>VBScript Language Reference<\/B><\/A>.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>Of course, we need to put this same little construction \u2013 \\W{1,} \u2013 after the word <I>transfer<\/I>. That\u2019s because we could end up with either a blank space or a carriage return-linefeed in there, too.<\/P>\n<P>From here on out it\u2019s easy. 
With this line of code we call the <B>Execute<\/B> method; in turn, this causes the regular expression object to start searching through strContents, looking for the Pattern we set just a second ago:<\/P><PRE class=\"codeSample\">Set colMatches = objRegEx.Execute(strContents)\n<\/PRE>\n<P>As we mentioned earlier, the Execute method returns a collection of all the matches found (a collection we named colMatches). To determine how many instances of the target phrase appear in the text file, we simply echo back the value of the collection\u2019s <B>Count<\/B> property:<\/P><PRE class=\"codeSample\">Wscript.Echo \"Total Matches: \" &amp; colMatches.Count\n<\/PRE>\n<P>If that count is 2 or more, then <I>226 transfer complete<\/I> occurs at least twice in the file. That\u2019s all we have to do. <\/P>\n<P>Well, that and go ahead and take a third piece of pie. After all, we don\u2019t want you to think we left that last piece of pie because we didn\u2019t like it. Although we\u2019d just as soon <I>not<\/I> eat it, we don\u2019t want to hurt anyone\u2019s feelings.<\/P>\n<P>Could you pass the whipped cream, please?<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I tell whether or not the phrase 226 transfer complete occurs at least twice in a text file?&#8212; JR Hey, JR. You know, this was a tough one for us to answer; that\u2019s because the Scripting Guys are always content with just one of everything. Take the Scripting Guy who [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[174,3,4,14,5],"class_list":["post-66743","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-regular-expressions","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! 
How can I tell whether or not the phrase 226 transfer complete occurs at least twice in a text file?&#8212; JR Hey, JR. You know, this was a tough one for us to answer; that\u2019s because the Scripting Guys are always content with just one of everything. Take the Scripting Guy who [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66743","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=66743"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66743\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=66743"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=66743"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=66743"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}