{"id":65203,"date":"2007-03-29T03:08:00","date_gmt":"2007-03-29T03:08:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2007\/03\/29\/how-can-i-search-a-text-file-for-strings-meeting-a-specified-pattern\/"},"modified":"2007-03-29T03:08:00","modified_gmt":"2007-03-29T03:08:00","slug":"how-can-i-search-a-text-file-for-strings-meeting-a-specified-pattern","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-search-a-text-file-for-strings-meeting-a-specified-pattern\/","title":{"rendered":"How Can I Search a Text File for Strings Meeting a Specified Pattern?"},"content":{"rendered":"<p><H2><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"> <\/H2>\n<P>Hey, Scripting Guy! How can I search a text file of product IDs and retrieve just those lines that meet a specified pattern?<BR><BR>&#8212; WT<\/P><IMG height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\"><\/A> \n<P>Hey, WT. 
Before we get to today\u2019s question we were wondering if anyone else has seen the commercial for the car that features the all-new \u201cheartbeat sensor?\u201d The idea is that, before you open your car door, you check the sensor, which can detect the heartbeat of anyone who happens to be hiding in the car waiting to pounce on you. <\/P>\n<P>As a general rule the Scripting Guys are opposed to people hiding in cars waiting to pounce on unsuspecting drivers. And we don\u2019t doubt that this very thing has happened before: someone has opened their car door and been pounced upon. Nevertheless, we\u2019d be curious to know how <I>often<\/I> this sort of thing happens. It makes sense to have smoke alarms in houses; houses do occasionally catch on fire. But do we really need heartbeat sensors in cars? We\u2019re just not sure. To tell you the truth, no one <I>ever<\/I> hides in any of the Scripting Guys\u2019 cars.<\/P>\n<P>But, then again, that could simply be because no one <I>wants<\/I> to catch a Scripting Guy.<\/P>\n<P>The Scripting Guy who writes this column finds this all very interesting, in part because there is plenty of research to indicate that even though our lives continue to get better and better people continue to get unhappier and more depressed. That could be due to the fact that money and material goods truly <I>don\u2019t<\/I> buy happiness. Alternatively, it could be due to the fact that, just when things start looking up, someone invents a new menace to worry about, and provides a solution to a problem no one even knew existed.<\/P>\n<P>Just wondering.<\/P>\n<P>By contrast, the Scripting Guys only provide solutions to problems that <I>do<\/I> exist. (We\u2019re also responsible for <I>creating<\/I> many of those problems in the first place. But that\u2019s another story.) For example, some people need to be able to retrieve a list of specific products from a text file. How are they supposed to do <I>that<\/I>? 
Here\u2019s how:<\/P><PRE class=\"codeSample\">Const ForReading = 1\n\nSet objRegEx = CreateObject(\"VBScript.RegExp\")\nobjRegEx.Pattern = \"^[1-9]...GRP\"\n\nSet objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n\nDo Until objFile.AtEndOfStream\n    strSearchString = objFile.ReadLine\n    Set colMatches = objRegEx.Execute(strSearchString)\n    If colMatches.Count &gt; 0 Then\n        For Each strMatch in colMatches\n            Wscript.Echo strSearchString\n        Next\n    End If\nLoop\n\nobjFile.Close\n<\/PRE>\n<P>Before we explain how the script works we should note that, based on WT\u2019s description, we have a text file similar to this:<\/P><PRE class=\"codeSample\">1XXXGRPABCEFG\n2YYYGRPDEF\nAZZZGRPDEF\nRTRRABCGRPRTY\nYTHJABCPBCOP\n<\/PRE>\n<P>WT is looking only for those records (lines in the text file) that meet the following criteria:<\/P>\n<TABLE class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<TBODY>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>The first character is a number, 1 through 9. This character indicates a specific product type.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>The second, third, and fourth characters are \u2013 well, it doesn\u2019t matter. We don\u2019t care about these characters.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>The fifth, sixth, and seventh characters are GRP. 
These happen to indicate different product groups.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P>The remaining characters are \u2013 well, again, it doesn\u2019t matter.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<P>Based on these criteria the first two lines in the text file are the only two lines we\u2019re looking for: they both begin with a number and then have <B>GRP<\/B> in the fifth, sixth, and seventh character spots. Granted, line 3 has GRP in the designated spot; however, line 3 doesn\u2019t begin with a number. As for lines 4 and 5, well, the less said about them the better.<\/P>\n<P>So how do we go about finding the desired records? For us the easiest way to do that was to use a regular expression; after all, we\u2019re looking for a specific pattern (a number followed by three characters followed by GRP) and regular expressions are very adept at sniffing out patterns (as opposed to finding specific words and phrases). Will our regular expression work? Let\u2019s find out.<\/P>\n<P>The script starts out by defining a constant named ForReading and setting the value to 1; we\u2019ll need this constant when we open our text file for reading. We then use these two lines of code to create an instance of the <B>VBScript.RegExp<\/B> object and to specify a <B>Pattern<\/B> for our search:<\/P><PRE class=\"codeSample\">Set objRegEx = CreateObject(\"VBScript.RegExp\")\nobjRegEx.Pattern = \"^[1-9]...GRP\"\n<\/PRE>\n<P>Needless to say, the Pattern is the key to getting this script to work; because of that we should take a minute or two to explain the various components of this regular expression. To begin with, we have the <B>^<\/B> character. 
That simply tells the script that the pattern must be found at the beginning of the search text; that prevents a value like this from being incorrectly tagged as a match:<\/P><PRE class=\"codeSample\">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA1XXXGRP\n<\/PRE>\n<P>As you can see, the desired pattern is there, but it occurs at the very end of the string, not at the beginning. Therefore, it doesn\u2019t count, at least not with this script.<\/P>\n<P>Next we have this construction: <B>[1-9]<\/B>. This simply says that the next character in the pattern must be one of the digits 1 through 9 (the square brackets indicate a range of acceptable values). In other words, to be a match the string must start with one of the numbers 1 through 9.<\/P>\n<P>Pretty simple so far, right? Right.<\/P>\n<P>Next up we have three dots: <B>...<\/B>. What do the dots represent? In a regular expression a dot represents <I>any<\/I> character (except for the newline character). We use the three dots here simply to indicate that, to meet our pattern, the string must have three characters following the opening digit. What three characters? It doesn\u2019t matter.<\/P>\n<P>Finally, we have the product group code: <B>GRP<\/B>. After the opening digit we need to have three characters. And after those three characters we need to have the letters GRP, in that order. Any string that fits that complete pattern will be considered a match; any string that doesn\u2019t fit that complete pattern <I>won\u2019t<\/I> be considered a match.<\/P>\n<TABLE class=\"dataTable\" id=\"ETF\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD class=\"\">\n<P class=\"lastInCell\"><B>Note<\/B>. Does all that make sense? 
If not, you might want to take a look at <A href=\"http:\/\/www.microsoft.com\/events\/EventDetails.aspx?CMTYSvcSource=MSCOMMedia&amp;Params=%7eCMTYDataSvcParams%5e%7earg+Name%3d%22ID%22+Value%3d%221032271679%22%2f%5e%7earg+Name%3d%22ProviderID%22+Value%3d%22A6B43178-497C-4225-BA42-DF595171F04C%22%2f%5e%7earg+Name%3d%22lang%22+Value%3d%22en%22%2f%5e%7earg+Name%3d%22cr%22+Value%3d%22US%22%2f%5e%7esParams%5e%7e%2fsParams%5e%7e%2fCMTYDataSvcParams%5e\" target=\"_blank\"><B>String Theory for System Administrators<\/B><\/A>, Scripting Guy Dean Tsaltas\u2019 definitive explanation for how to use regular expressions in a script.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>After defining our pattern we next use these two lines of code to create an instance of the <B>Scripting.FileSystemObject<\/B> object and to open the text file C:\\Scripts\\Test.txt for reading:<\/P><PRE class=\"codeSample\">Set objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n<\/PRE>\n<P>At this point, the game is afoot. Our next step is to set up a Do Until loop that runs until we\u2019ve read each and every line in the text file (technically, until the <B>AtEndOfStream<\/B> property is True). Inside that loop we use the <B>ReadLine<\/B> method to read the next line in the text file and store it in a variable named strSearchString. That brings us to this line of code:<\/P><PRE class=\"codeSample\">Set colMatches = objRegEx.Execute(strSearchString)\n<\/PRE>\n<P>What we\u2019re doing here is using the <B>Execute<\/B> method to determine whether or not our regular expression pattern can be found in the value of strSearchString. If it <I>can<\/I>, that information will be returned as a collection named colMatches. 
If it can\u2019t, well, then colMatches will end up a collection consisting of 0 items.<\/P>\n<P>With that in mind all we have to do now is check to see if the collection colMatches has anything in it. If it does, we echo back the value of strSearchString. (Why? Because that\u2019s the value that meets our pattern.) If it doesn\u2019t, we simply loop around and repeat the process with the next line in the text file. All of that takes place in this block of code:<\/P><PRE class=\"codeSample\">If colMatches.Count &gt; 0 Then\n    For Each strMatch in colMatches\n        Wscript.Echo strSearchString\n    Next\nEnd If\n<\/PRE>\n<P>When we\u2019re all done we close the file Test.txt and then sit back and admire the results:<\/P><PRE class=\"codeSample\">1XXXGRPABCEFG\n2YYYGRPDEF\n<\/PRE>\n<P>Beautiful.<\/P>\n<P>Incidentally, the Scripting Guys couldn\u2019t resist: just a few minutes ago we all went out to test the new heartbeat sensor. Somewhat to our surprise it seemed to work pretty good, at least until the Scripting Editor hid in the car; with her in there the sensor failed to detect a heartbeat. Does that mean that, as many of us have long suspected, the Scripting Editor truly <I>is<\/I> heartless? We can\u2019t say that for sure; she wouldn\u2019t let us do an autopsy. However, we do know, for sure, that she at least has something in her head: emergency room staples. 
We don\u2019t know the details either, but you can find out more by reading her <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/topics\/mms2007\/tuesday.mspx\"><B>daily dispatch<\/B><\/A> from the Microsoft Management Summit.<\/P>\n<A class=\"\" title=\"ELH\" name=\"ELH\"><\/A>\n<H2>Searching for a String Pattern Using Windows PowerShell<\/H2>\n<P>Another way to solve this problem, courtesy of Microsoft\u2019s very own June Blender:<\/P>\n<P>This task is exceptionally easy to do in Windows PowerShell because the PowerShell expression parser is designed to interpret regular expressions.<\/P>\n<P>Here&#8217;s the same solution in Windows PowerShell. You can enter this command at the PowerShell command line or save it as a script file (.ps1).<\/P><PRE class=\"codeSample\">get-childitem file.txt | select-string -pattern ^[1-9]...GRP | foreach {$_.line}\n<\/PRE>\n<P>The first command uses the Get-ChildItem cmdlet (similar to <B>dir<\/B> or <B>ls<\/B>) to find the text file. The pipeline operator (|) sends the output to the next command.<\/P><PRE class=\"codeSample\">get-childitem file.txt\n<\/PRE>\n<P>The second command uses the Select-String cmdlet to search for the regular expression in the File.txt file. The Pattern parameter (-pattern) specifies the regular expression. 
Because both VBScript and PowerShell use standard regular expressions, the syntax of the regular expression in this PowerShell command is identical to the one that you use in the VBScript script.<\/P><PRE class=\"codeSample\">select-string -pattern ^[1-9]...GRP\n<\/PRE>\n<P>The third command uses the ForEach-Object cmdlet (alias = foreach) to select the Line property of each match object from the output. Without it, the command displays the file name and line number along with each matching line. With it, the command displays only the text of each matching line.<\/P><PRE class=\"codeSample\">foreach {$_.line}\n<\/PRE>\n<P>Both the VBScript and PowerShell solutions for this task really demonstrate the power of regular expressions. Although regular expression syntax isn&#8217;t easy, it&#8217;s so useful that it&#8217;s worth taking the time to learn it.<\/P>\n<P>In Windows PowerShell, start with the About_Regular_Expression topic: get-help about_regular_expression.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I search a text file of product IDs and retrieve just those lines that meet a specified pattern?&#8212; WT Hey, WT. Before we get to today\u2019s question we were wondering if anyone else has seen the commercial for the car that features the all-new \u201cheartbeat sensor?\u201d The idea is that, [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,14,5],"class_list":["post-65203","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! How can I search a text file of product IDs and retrieve just those lines that meet a specified pattern?&#8212; WT Hey, WT. 
Before we get to today\u2019s question we were wondering if anyone else has seen the commercial for the car that features the all-new \u201cheartbeat sensor?\u201d The idea is that, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/65203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=65203"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/65203\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=65203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=65203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=65203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}