{"id":66373,"date":"2006-09-28T12:10:00","date_gmt":"2006-09-28T12:10:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2006\/09\/28\/how-can-i-locate-strings-that-consist-of-a-series-of-numbers-followed-by-zip\/"},"modified":"2006-09-28T12:10:00","modified_gmt":"2006-09-28T12:10:00","slug":"how-can-i-locate-strings-that-consist-of-a-series-of-numbers-followed-by-zip","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-locate-strings-that-consist-of-a-series-of-numbers-followed-by-zip\/","title":{"rendered":"How Can I Locate Strings That Consist of a Series of Numbers Followed by .ZIP?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! I have a file which includes a number of file names. All of these names consist of a series of numbers followed by a .zip file extension; for example, 1234.zip, 5678.zip, etc. How can I write a script that locates all these file names and then saves just the names to a second file?<BR><BR>&#8212; RS<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! 
Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" border=\"0\" alt=\"Script Center\" align=\"right\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" height=\"288\"><\/A> \n<P>Hey, RS. You know, numbers are very important. Take the number 19, for example. Last night the Scripting Guy who writes this column and the Scripting Son were playing 21, a one-on-one basketball game in which you start off by trying to make a field goal (worth 2 points). After you make a field goal you then step to the foul line and shoot free throws, with each free throw worth 1 point. Furthermore, you are allowed to continue shooting free throws until you miss. The game continues in this fashion until someone reaches 21 points.<\/P>\n<P>Last night the Scripting Son got the ball first and quickly made a basket; he then made three straight free throws, putting him up 5-0. Unfortunately \u2013 for him anyway \u2013 he missed his next free throw; the Scripting Dad grabbed the rebound, made a field goal, and then ran the table, making 19 consecutive free throws to win the game. (Best of all, he made them despite the Scripting Son throwing the ball back hard to him, throwing it back soft to him, throwing it over his head, throwing it at his feet, and using all those time-honored strategies designed to upset the shooter\u2019s rhythm.) <\/P>\n<P>But we have to let you in on a secret: the Scripting Dad cheated. After all, he didn\u2019t bother to mention that, when he was in fourth grade, he was the school free throw shooting champion at Eastgate Elementary School in Kennewick, WA. 
The poor Scripting Son had no idea who he was up against.<\/P>\n<TABLE id=\"E4C\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P class=\"lastInCell\"><B>Note<\/B>. Want to know a truly sad story? In fourth grade the Scripting Guy who writes this column really <I>was<\/I> the free throw shooting champion at Eastgate Elementary School. Winning the school championship qualified him for the district championship and could have, in theory, led him all the way to the national championship. Except, of course, the school forgot to tell him about the district championship. He thus never even got a chance to compete for the national championship. Life has pretty much been downhill ever since.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>So will any of that help you locate your target file names? Probably not. But this will:<\/P><PRE class=\"codeSample\">Const ForReading = 1\n\nSet objFSO = CreateObject(&quot;Scripting.FileSystemObject&quot;)\nSet objFile = objFSO.OpenTextFile(&quot;C:\\Scripts\\Test.txt&quot;, ForReading)\n\nstrContents = objFile.ReadAll()\nobjFile.Close\n\nSet objRegEx = CreateObject(&quot;VBScript.RegExp&quot;)\nobjRegEx.Global = True\nobjRegEx.Pattern = &quot;\\d{1,}.zip&quot;\n\nSet colMatches = objRegEx.Execute(strContents)\n\nIf colMatches.Count &gt; 0 Then\n   Set objFile = objFSO.CreateTextFile(&quot;C:\\Scripts\\Zipfiles.txt&quot;)\n   For Each objMatch in colMatches\n       objFile.WriteLine objMatch.Value\n   Next\n   objFile.Close\nEnd If\n<\/PRE>\n<P>Before we go into the nitty-gritty details we should point out that we\u2019re assuming you have a text file similar to this:<\/P><PRE class=\"codeSample\">Here is our first file: 1234.zip.\nAnother file is 5678.zip.\n123456789.zip is the third file.\nThis file &#8212; 987654321.zip &#8212; is file 
number 4.\n<\/PRE>\n<P>As you can see, there are four file names scattered throughout the contents of this file:<\/P><PRE class=\"codeSample\">1234.zip\n5678.zip\n123456789.zip\n987654321.zip\n<\/PRE>\n<P>If all goes well our script will open the file C:\\Scripts\\Test.txt, locate all the file names, and then write those file names to a second text file (C:\\Scripts\\Zipfiles.txt). Granted, if we knew all the file names in advance this would be easy: we could just use a series of <B>InStr<\/B> commands to see if any of those names could be found in Test.txt. Unfortunately, though, we don\u2019t know the names of the files and we don\u2019t know how many files might be listed in Test.txt; we don\u2019t even know how many characters are in each file name. (For example, 1234.zip has 4 characters in the file name itself, while 123456789.zip has 9 characters in the file name.) Sounds hopeless, doesn\u2019t it?<\/P>\n<P>Well, maybe for some people. But not for a Scripting Guy who can make 19 consecutive free throws to defeat his son. <\/P>\n<P>So how does our script manage to overcome such a hopeless situation? Well, we start out simple enough, defining a constant named ForReading and setting the value to 1; we\u2019ll use this constant when we open and read the text file C:\\Scripts\\Test.txt. After defining the constant we create an instance of the <B>Scripting.FileSystemObject<\/B> and open the file Test.txt. With the file open we can then use the <B>ReadAll()<\/B> method to read the entire contents of the file into a variable named strContents:<\/P><PRE class=\"codeSample\">strContents = objFile.ReadAll()\n<\/PRE>\n<P>At that point we have no further need for Test.txt so we use the <B>Close<\/B> method to close the file.<\/P>\n<P>Got all that? All we\u2019ve done so far is open the file Test.txt and copy the contents to the variable strContents. 
What we\u2019ll do now is search for those target file names using the value of strContents rather than the actual text file itself. (Why? Because the FileSystemObject doesn\u2019t really provide a way for us to search a text file; we need to make a copy of that file in memory and do our searches on that copy.)<\/P>\n<P>That brings us to our secret ingredient for the day: regular expressions. We\u2019ve already noted that the InStr function \u2013 commonly used for locating string values within a text file \u2013 is of little use to us here. But that\u2019s OK; regular expressions, while admittedly a bit cryptic at times, are far more powerful than InStr, as well as far more adaptable to situations where all you have is a general idea of what you\u2019re looking for (a string of numbers followed by <I>.zip<\/I>). InStr, by contrast, works best when you know <I>exactly<\/I> what you\u2019re looking for (i.e., a file named <I>1234.zip<\/I>).<\/P>\n<TABLE id=\"ERE\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P class=\"lastInCell\"><B>Note<\/B>. Yes, we know: regular expressions might be new to a lot of you. But we Scripting Guys think of everything: we already have a <A href=\"http:\/\/www.microsoft.com\/events\/EventDetails.aspx?CMTYSvcSource=MSCOMMedia&amp;Params=%7eCMTYDataSvcParams%5e%7earg+Name%3d%22ID%22+Value%3d%221032271679%22%2f%5e%7earg+Name%3d%22ProviderID%22+Value%3d%22A6B43178-497C-4225-BA42-DF595171F04C%22%2f%5e%7earg+Name%3d%22lang%22+Value%3d%22en%22%2f%5e%7earg+Name%3d%22cr%22+Value%3d%22US%22%2f%5e%7esParams%5e%7e%2fsParams%5e%7e%2fCMTYDataSvcParams%5e\" target=\"_blank\"><B>webcast<\/B><\/A> that explains the fundamentals of regular expressions.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>Before we conduct the search itself we need to do a little preparation. First, we create an instance of the <B>VBScript.RegExp<\/B> object. 
Second, we execute these two lines of code:<\/P><PRE class=\"codeSample\">objRegEx.Global = True\nobjRegEx.Pattern = &quot;\\d{1,}.zip&quot;\n<\/PRE>\n<P>In line 1 we set the <B>Global<\/B> property of the regular expression object to True; that tells the script that we want to find every instance of the target string. (Had we set this to False the script would have stopped after finding the first file name in the file.) In line 2 we then set the <B>Pattern<\/B> property to the target string. Believe it or not, this is what we\u2019re searching for:<\/P><PRE class=\"codeSample\">\\d{1,}.zip\n<\/PRE>\n<P>Like we said, regular expressions can look a little cryptic at times. With that in mind let\u2019s break this pattern down to its constituent parts:<\/P>\n<TABLE border=\"0\" cellSpacing=\"0\" cellPadding=\"0\">\n<TBODY>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>\\d<\/B>. The <B>\\d<\/B> indicates that we only want to match digits (0-9). Letters, blank spaces, punctuation marks \u2013 we aren\u2019t interested in any of those things. Just numbers.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>{1,}<\/B>. This odd-looking construction tells the script how many consecutive numbers qualify as a match. The 1 simply says that the target string must have at least 1 number in it; the comma followed by nothing means that there is no limit to the total number of digits in the target string. In other words, a 1-digit number is a match; so is a 4-digit number, and a 10-digit number, and a 7,585-digit number. This is perhaps easier explained by posing a different scenario: what if our target string had to consist of a number with at least 3 digits but with no more than 7 digits? In that case we\u2019d use this syntax: <B>{3,7}<\/B>. 
Make sense?<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>.zip<\/B>. Finding a series of consecutive numbers is great, but those numbers then have to be followed by <B>.zip<\/B>. That\u2019s the reason for adding <I>.zip<\/I> to the pattern. (Strictly speaking, an unescaped period in a regular expression matches any single character except a line break, so this pattern would also match a string like <I>1234xzip<\/I>; to match only a literal period you can escape it: <B>\\.zip<\/B>.)<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<P>In other words, we\u2019re asking the script to search for a number or consecutive set of numbers (it doesn\u2019t matter how many numbers) immediately followed by a <I>.zip<\/I>. You know, strings such as these:<\/P><PRE class=\"codeSample\">1234.zip\n5678.zip\n123456789.zip\n987654321.zip\n<\/PRE>\n<P>After defining the Pattern we then call the <B>Execute<\/B> method and actually search the value of strContents:<\/P><PRE class=\"codeSample\">Set colMatches = objRegEx.Execute(strContents)\n<\/PRE>\n<P>Any time you call the Execute method all the instances of the target string that are discovered are stored in the Matches collection (in our script, we use the object reference colMatches to refer to that collection). To determine which file names (if any) can be found in strContents we simply need to set up a For Each loop to loop through all the items in the collection:<\/P><PRE class=\"codeSample\">If colMatches.Count &gt; 0 Then\n   Set objFile = objFSO.CreateTextFile(&quot;C:\\Scripts\\Zipfiles.txt&quot;)\n   For Each objMatch in colMatches\n       objFile.WriteLine objMatch.Value\n   Next\n   objFile.Close\nEnd If\n<\/PRE>\n<P>Oh, right: first we check to see if the value of the <B>Count<\/B> property is greater than 0. If it is, that means at least one instance of the target string was found. In that case, we <I>then<\/I> go ahead and use the <B>CreateTextFile<\/B> method to create a new text file, C:\\Scripts\\Zipfiles.txt.<\/P>\n<P>With Zipfiles.txt created and ready for business we next set up our For Each loop. 
Inside that loop we simply use the <B>WriteLine<\/B> method to write the <B>Value<\/B> property to Zipfiles.txt. As you can probably guess, the Value property will correspond to the value of the matching string: if we found the string <I>1234.zip<\/I> then the Value for that match will be, well, <I>1234.zip<\/I>.<\/P>\n<P>After we\u2019ve looped through the entire collection we call the <B>Close<\/B> method to close Zipfiles.txt. And guess what we\u2019ll see the next time we open Zipfiles.txt:<\/P><PRE class=\"codeSample\">1234.zip\n5678.zip\n123456789.zip\n987654321.zip\n<\/PRE>\n<P>Cool, huh?<\/P>\n<P>OK, maybe not as cool as hitting 19 consecutive free throws in order to defeat your overly-competitive son. (And no, we don\u2019t have any idea where he gets that from.) But it\u2019ll do for now.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have a file which includes a number of file names. All of these names consist of a series of numbers followed by a .zip file extension; for example, 1234.zip, 5678.zip, etc. How can I write a script that locates all these file names and then saves just the names to a [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[174,3,4,5],"class_list":["post-66373","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-regular-expressions","tag-scripting-guy","tag-scripting-techniques","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I have a file which includes a number of file names. All of these names consist of a series of numbers followed by a .zip file extension; for example, 1234.zip, 5678.zip, etc. 
How can I write a script that locates all these file names and then saves just the names to a [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66373","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=66373"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66373\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=66373"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=66373"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=66373"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}