How Can I Locate Strings That Consist of a Series of Numbers Followed by .ZIP?

ScriptingGuy1


Hey, Scripting Guy! I have a file which includes a number of file names. All of these names consist of a series of numbers followed by a .zip file extension; for example, 1234.zip, 5678.zip, etc. How can I write a script that locates all these file names and then saves just the names to a second file?

— RS


Hey, RS. You know, numbers are very important. Take the number 19, for example. Last night the Scripting Guy who writes this column and the Scripting Son were playing 21, a one-on-one basketball game in which you start off by trying to make a field goal (worth 2 points). After you make a field goal you then step to the foul line and shoot free throws, with each free throw worth 1 point. Furthermore, you are allowed to continue shooting free throws until you miss. The game continues in this fashion until someone reaches 21 points.

Last night the Scripting Son got the ball first and quickly made a basket; he then made three straight free throws, putting him up 5-0. Unfortunately – for him anyway – he missed his next free throw; the Scripting Dad grabbed the rebound, made a field goal, and then ran the table, making 19 consecutive free throws to win the game. (Best of all, he made them despite the Scripting Son throwing the ball back hard to him, throwing it back soft to him, throwing it over his head, throwing it at his feet, and doing all those time-honored strategies designed to upset the shooter’s rhythm.)

But we have to let you in on a secret: the Scripting Dad cheated. After all, he didn’t bother to mention that, when he was in fourth grade, he was the school free throw shooting champion at Eastgate Elementary School in Kennewick, WA. The poor Scripting Son had no idea who he was up against.

Note. Want to know a truly sad story? In fourth grade the Scripting Guy who writes this column really was the free throw shooting champion at Eastgate Elementary School. Winning the school championship qualified him for the district championship and could have, in theory, led him all the way to the national championship. Except, of course, the school forgot to tell him about the district championship. He thus never even got a chance to compete for the national championship. Life has pretty much been downhill ever since.

So will any of that help you locate your target file names? Probably not. But this will:

Const ForReading = 1

' Open the source file and read its entire contents into memory
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

strContents = objFile.ReadAll()
objFile.Close

' Find every run of digits followed by .zip
' Note: the unescaped . matches any character; "\d{1,}\.zip" would match only a literal period
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\d{1,}.zip"

Set colMatches = objRegEx.Execute(strContents)

' Write each match to the output file, one name per line
If colMatches.Count > 0 Then
   Set objFile = objFSO.CreateTextFile("C:\Scripts\Zipfiles.txt")
   For Each objMatch in colMatches
       objFile.WriteLine objMatch.Value
   Next
   objFile.Close
End If

Before we go into the nitty-gritty details we should point out that we’re assuming you have a text file similar to this:

Here is our first file: 1234.zip.
Another file is 5678.zip.
123456789.zip is the third file.
This file — 987654321.zip — is file number 4.

As you can see, there are four file names scattered throughout the contents of this file:

1234.zip
5678.zip
123456789.zip
987654321.zip

If all goes well our script will open the file C:\Scripts\Test.txt, locate all the file names, and then write those file names to a second text file (C:\Scripts\Zipfiles.txt). Granted, if we knew all the file names in advance this would be easy: we could just use a series of InStr commands to see if any of those names could be found in Test.txt. Unfortunately, though, we don’t know the names of the files and we don’t know how many files might be listed in Test.txt; we don’t even know how many characters are in each file name. (For example, 1234.zip has 4 characters in the file name itself, while 123456789.zip has 9 characters in the file name.) Sounds hopeless, doesn’t it?

Well, maybe for some people. But not for a Scripting Guy who can make 19 consecutive free throws to defeat his son.

So how does our script manage to overcome such a hopeless situation? Well, we start out simple enough, defining a constant named ForReading and setting the value to 1; we’ll use this constant when we open and read the text file C:\Scripts\Test.txt. After defining the constant we create an instance of the Scripting.FileSystemObject and open the file Test.txt. With the file open we can then use the ReadAll() method to read the entire contents of the file into a variable named strContents:

strContents = objFile.ReadAll()

At that point we have no further need for Test.txt so we use the Close method to close the file.

Got all that? All we’ve done so far is open the file Test.txt and copy the contents to the variable strContents. What we’ll do now is search for those target file names using the value strContents rather than the actual text file itself. (Why? Because the FileSystemObject doesn’t really provide a way for us to search a text file; we need to make a copy of that file in memory and do our searches on that copy.)
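One practical wrinkle with reading a file this way: ReadAll raises an error if the file is empty, and OpenTextFile raises one if the file doesn't exist. Here is a minimal sketch of a more defensive read, assuming you'd rather end up with an empty string than a script error. The strPath variable and the "File not found" message are our own additions for illustration, not part of the original script:

```vbscript
Const ForReading = 1

' Hypothetical path; substitute the file you actually want to search.
strPath = "C:\Scripts\Test.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")
strContents = ""

If objFSO.FileExists(strPath) Then
    Set objFile = objFSO.OpenTextFile(strPath, ForReading)
    ' ReadAll fails on a zero-length file, so check for content first.
    If Not objFile.AtEndOfStream Then
        strContents = objFile.ReadAll()
    End If
    objFile.Close
Else
    Wscript.Echo "File not found: " & strPath
End If
```

Either way, strContents ends up holding whatever text there was to read, and the rest of the script can carry on as before.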

That brings us to our secret ingredient for the day: regular expressions. We’ve already noted that the InStr function – commonly used for locating string values within a text file – is of little use to us here. But that’s OK; regular expressions, while admittedly a bit cryptic at times, are far more powerful than InStr, as well as far more adaptable to situations where all you have is a general idea of what you’re looking for (a string of numbers followed by .zip). InStr, by contrast, works best when you know exactly what you’re looking for (i.e., a file named 1234.zip).

Note. Yes, we know: regular expressions might be new to a lot of you. But we Scripting Guys think of everything: we already have a webcast that explains the fundamentals of regular expressions.

Before we conduct the search itself we need to do a little preparation. First, we create an instance of the VBScript.RegExp object. Second, we execute these two lines of code:

objRegEx.Global = True
objRegEx.Pattern = "\d{1,}.zip"

In line 1 we set the Global property of the regular expressions object to True; that tells the script that we want to find every instance of the target string. (Had we set this to False the script would have stopped after finding the first file name in the file.) In line 2 we then set the Pattern property to the target string. Believe it or not, this is what we’re searching for:

\d{1,}.zip

Like we said, regular expressions can look a little cryptic at times. With that in mind let’s break this pattern down to its constituent parts:

• \d. The \d indicates that we only want to match digits (0-9). Letters, blank spaces, punctuation marks – we aren’t interested in any of those things. Just numbers.

• {1,}. This odd-looking construction tells the script how many consecutive digits qualify as a match. The 1 simply says that the target string must have at least 1 digit in it; the comma followed by nothing means that there is no limit to the total number of digits in the target string. In other words, a 1-digit number is a match; so is a 4-digit number, and a 10-digit number, and a 7,585-digit number. This is perhaps easier explained by posing a different scenario: what if our target string had to consist of a number with at least 3 digits but with no more than 7 digits? In that case we’d use this syntax: {3,7}. Make sense?

• .zip. Finding a series of consecutive numbers is great, but those numbers then have to be followed by .zip. That’s the reason for adding .zip to the pattern. One caveat: in a regular expression an unescaped period matches any single character, so the pattern as written would also match a string such as 1234xzip. To match only a literal period, write the pattern as \d{1,}\.zip.

In other words, we’re asking the script to search for a number or consecutive set of numbers (it doesn’t matter how many numbers) immediately followed by a .zip. You know, strings such as this:

1234.zip
5678.zip
123456789.zip
987654321.zip
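If you'd like to see for yourself what a pattern does and doesn't match, the Test method of the regular expressions object returns True or False for a given string. The following is just a sketch: the sample strings are made up for illustration, and the second pattern is a stricter variant of the one used in this column, with the period escaped so that it matches only a literal period:

```vbscript
Set objRegEx = CreateObject("VBScript.RegExp")

' Sample strings are made up for illustration.
arrSamples = Array("1234.zip", "7.zip", "abcd.zip", "1234.txt", "1234xzip")

' First the pattern from this column, then a variant with the period escaped.
arrPatterns = Array("\d{1,}.zip", "\d{1,}\.zip")

For Each strPattern in arrPatterns
    objRegEx.Pattern = strPattern
    Wscript.Echo "Pattern: " & strPattern
    For Each strSample in arrSamples
        ' Test returns True if the pattern is found anywhere in the string.
        Wscript.Echo "   " & strSample & " -> " & objRegEx.Test(strSample)
    Next
Next
```

Both patterns match 1234.zip and 7.zip, and both reject abcd.zip and 1234.txt. The only difference is 1234xzip: the original pattern accepts it (the unescaped period matches the x), while the escaped version does not.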

After defining the Pattern we then call the Execute method and actually search the value of strContents:

Set colMatches = objRegEx.Execute(strContents)

Any time you call the Execute method all the instances of the target string that are discovered are stored in the Matches collection (in our script, we use the object reference colMatches to refer to that collection). To determine which file names (if any) can be found in strContents we simply need to set up a For Each loop to loop through all the items in the collection:

If colMatches.Count > 0 Then
   Set objFile = objFSO.CreateTextFile("C:\Scripts\Zipfiles.txt")
   For Each objMatch in colMatches   
       objFile.WriteLine objMatch.Value
   Next
   objFile.Close
End If

Oh, right: first we check to see if the value of the Count property is greater than 0. If it is, that means at least one instance of the target string was found. In that case, we then go ahead and use the CreateTextFile method to create a new text file, C:\Scripts\Zipfiles.txt.

With Zipfiles.txt created and ready for business we next set up our For Each loop. Inside that loop we simply use the WriteLine method to write the Value property to Zipfiles.txt. As you can probably guess, the Value property will correspond to the value of the matching string: if we found the string 1234.zip then the Value for that match will be, well, 1234.zip.

After we’ve looped through the entire collection we call the Close method to close Zipfiles.txt. And guess what we’ll see the next time we open Zipfiles.txt:

1234.zip
5678.zip
123456789.zip
987654321.zip
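One thing to keep in mind: by default the search is case-sensitive, so a name like 1234.ZIP would be skipped. Here is a hedged sketch of one way to handle that, using the IgnoreCase property of the regular expressions object. The sample text is made up, and the pattern is a variation on the one in this column (\d+ means the same thing as \d{1,}, and \. matches only a literal period). To keep things short the matches are echoed rather than written to a file:

```vbscript
' Sample text is made up for illustration.
strContents = "Archive 1234.ZIP and 5678.zip, plus 42.Zip."

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.IgnoreCase = True       ' match .zip, .ZIP, .Zip, and so on
objRegEx.Pattern = "\d+\.zip"    ' \d+ is equivalent to \d{1,}

Set colMatches = objRegEx.Execute(strContents)

For Each objMatch in colMatches
    ' FirstIndex is the zero-based position of the match within strContents.
    Wscript.Echo objMatch.Value & " (position " & objMatch.FirstIndex & ")"
Next
```

With IgnoreCase set to True all three names in the sample text are returned; the Value property preserves whatever case appeared in the original text.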

Cool, huh?

OK, maybe not as cool as hitting 19 consecutive free throws in order to defeat your overly-competitive son. (And no, we don’t have any idea where he gets that from.) But it’ll do for now.
