April 15th, 2005

How Can I Search a Text File for 9-Digit Numbers?


Hey, Scripting Guy! How can I search for user names in a text file? In my case, the user names all consist of 9 digits, although they can be any 9 digits.

— GG


Hey, GG. Interesting question. You want to search for 9-digit numbers (like 123456789) in a text file; the only problem is that these could be any 9-digit numbers. And that definitely is a problem. If you were looking for a specific 9-digit number you could use VBScript’s InStr function:

intMatch = InStr(strText, "123456789")

If intMatch is greater than 0 that means the value 123456789 was found in the search string strText; if intMatch is 0, then the value couldn’t be found. Simple.

But that won’t work here, not unless you want to create separate InStr calls for each possible 9-digit value. (In case you’re wondering, that would be around 1 billion lines of code.) So if InStr can’t do the job for us, what can?

Did someone say regular expressions? Well, you should have, because that’s the answer. Regular expressions – which are actually built into VBScript – provide a very flexible and very powerful way to search through text data. Regular expressions are used quite a bit in the Unix world simply because so much of Unix administration is built around the use of text files. But as this question illustrates, regular expressions have their use in Windows as well. InStr can’t find any 9-digit number, regardless of the 9 digits; regular expressions can.

Note. Actually, regular expressions can do almost anything. (Hey, we said almost.) We can’t even begin to detail all the uses of regular expressions in one little column. If you’re interested in learning more about this very powerful technology, view the Scripting Guys webcast String Theory for System Administrators: An Introduction to Regular Expressions.

Let’s take a look at a script that can locate any 9-digit numbers in a search string. For this simple example we’ll search a hard-coded value. After we’ve explained how the script works we’ll then modify it to search a text file.

Here’s the script:

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\d{9}"

strSearchString = "aaaaaaa123456789qqqqqqqqqqqq234567891aaaa12345678"

Set colMatches = objRegEx.Execute(strSearchString)

If colMatches.Count > 0 Then
    strMessage = "The following user accounts were found:" & vbCrLf
    For Each strMatch in colMatches
        strMessage = strMessage & strMatch.Value & " (character position " & _
            strMatch.FirstIndex & ")" & vbCrLf
    Next
End If

Wscript.Echo strMessage

If you run this script under Cscript you should get back information that looks like this; in other words, the script should have located (and reported the position of) the two 9-digit values in the search string:

The following user accounts were found:
123456789 (character position 7)
234567891 (character position 28)

So how did we do this? We used regular expressions, remember? Sheesh.

Oh, right: sorry. Actually, we began by creating an instance of VBScript’s Regular Expressions object (RegExp); that’s what this line of code does:

Set objRegEx = CreateObject("VBScript.RegExp")

We then set two properties of the Regular Expressions object. By setting the Global property to True we’re telling the script to go ahead and search through the entire string and locate all matches; by default, the Regular Expressions object locates only the first match and then stops searching.
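To see what Global actually changes, here's a minimal sketch (it isn't part of the original script) that runs the same pattern twice against the sample string, once with Global set to False and once with it set to True, and echoes the number of matches each time. The message text is our own invention:

```vbscript
' Sketch: compare the match count with Global turned off and on.
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Pattern = "\d{9}"

strSearchString = "aaaaaaa123456789qqqqqqqqqqqq234567891aaaa12345678"

' With Global = False the search stops after the first match.
objRegEx.Global = False
Set colMatches = objRegEx.Execute(strSearchString)
Wscript.Echo "Global = False: " & colMatches.Count & " match(es)"

' With Global = True the search continues through the whole string.
objRegEx.Global = True
Set colMatches = objRegEx.Execute(strSearchString)
Wscript.Echo "Global = True: " & colMatches.Count & " match(es)"
```

Run under Cscript, the first line should report 1 match and the second should report 2.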

The Pattern property represents the item we’re searching for. Note that, in this case at least, we really are specifying a pattern to search for. The \d means “Match any single digit (0 through 9).” The {9} means “Exactly nine of those in a row.” Taken together we’re saying, “Look for any 9 digits in a row.” 123456789 will be a match; that’s because there are 9 digits in a row. 12345678 9 won’t be a match; that’s because we have 8 digits and then a blank space. Close, but no cigar.
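One thing to watch for: \d{9} is perfectly happy with the first nine digits of a longer run, so a 10-digit phone number would be reported as a user name. If that matters in your file, one possible approach (a sketch, not the only way to do it) is to require a non-digit or the start of the string before the nine digits, and use a negative lookahead to require that no digit follows. The sample string and the capture-group layout below are our own assumptions:

```vbscript
' Sketch: match runs of exactly nine digits, not nine digits inside a longer run.
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True

' (^|\D)   start of string or one non-digit character
' (\d{9})  the nine digits we want (second capture group)
' (?!\d)   negative lookahead: the next character must not be a digit
objRegEx.Pattern = "(^|\D)(\d{9})(?!\d)"

strSearchString = "id 123456789, phone 1234567890, id 987654321"

Set colMatches = objRegEx.Execute(strSearchString)

For Each objMatch In colMatches
    ' SubMatches is zero-based, so index 1 is the second group.
    Wscript.Echo objMatch.SubMatches(1)
Next
```

This should echo 123456789 and 987654321 and skip the 10-digit value. Keep in mind that FirstIndex now points at the start of the whole match, which can include the one non-digit character in front of the number.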

After specifying the pattern we then assign our search string to a variable named strSearchString. That happens here:

strSearchString = "aaaaaaa123456789qqqqqqqqqqqq234567891aaaa12345678"

And now we’re ready to go. This line of code fires off the search and returns a collection of all the matches:

Set colMatches = objRegEx.Execute(strSearchString)

How do we know if our search actually found any 9-digit numbers? The easiest way is to simply check the Count property of the collection; if the count is greater than 0 then at least one match was found. If that’s the case we then loop through the collection and grab the Value and FirstIndex values for each match. (FirstIndex is simply the character position in the string where the match begins, counting from 0 rather than 1.) We store this information in a variable named strMessage; at the end of the script we then echo the value of this variable. As we noted earlier, that should look like this:

The following user accounts were found:
123456789 (character position 7)
234567891 (character position 28)
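FirstIndex counts from 0, but VBScript string functions such as Mid count from 1. If you ever need to pull a match back out of the original string by position, a sketch like the following adds 1 to FirstIndex and uses the Length property of the match. The variable names objMatch and strCopy are ours:

```vbscript
' Sketch: use FirstIndex (zero-based) with Mid (one-based).
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\d{9}"

strSearchString = "aaaaaaa123456789qqqqqqqqqqqq234567891aaaa12345678"

For Each objMatch In objRegEx.Execute(strSearchString)
    ' Add 1 to convert the zero-based FirstIndex to Mid's one-based position.
    strCopy = Mid(strSearchString, objMatch.FirstIndex + 1, objMatch.Length)
    Wscript.Echo objMatch.Value & " = " & strCopy
Next
```

Each echoed line should show the same 9-digit value on both sides of the equals sign.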

Is that cool or what? Of course, the preceding example only searches a string hard-coded into the script. To search a text file you need to use the FileSystemObject to open the file, read the contents into the variable strSearchString, and then execute the Regular Expressions search. Here’s a modified script that searches the file C:\Scripts\Test.txt:

Const ForReading = 1

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\d{9}"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)
strSearchString = objFile.ReadAll
objFile.Close

Set colMatches = objRegEx.Execute(strSearchString)

If colMatches.Count > 0 Then
    strMessage = "The following user accounts were found:" & vbCrLf
    For Each strMatch in colMatches
        strMessage = strMessage & strMatch.Value & " (character position " & _
            strMatch.FirstIndex & ")" & vbCrLf
    Next
End If

Wscript.Echo strMessage
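The script above assumes the file exists and has something in it. As a hedged variation, here's a sketch that checks both conditions first (ReadAll raises an error on a zero-byte file) and says so when nothing is found. The strPath variable, the exit codes, and the message wording are our own additions:

```vbscript
' Sketch: the same search with a few safety checks added.
Const ForReading = 1
strPath = "C:\Scripts\Test.txt"

Set objFSO = CreateObject("Scripting.FileSystemObject")

If Not objFSO.FileExists(strPath) Then
    Wscript.Echo "File not found: " & strPath
    Wscript.Quit 1
End If

' ReadAll fails on an empty file, so check the size first.
If objFSO.GetFile(strPath).Size = 0 Then
    Wscript.Echo "The file is empty."
    Wscript.Quit 0
End If

Set objFile = objFSO.OpenTextFile(strPath, ForReading)
strSearchString = objFile.ReadAll
objFile.Close

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\d{9}"

Set colMatches = objRegEx.Execute(strSearchString)

If colMatches.Count > 0 Then
    strMessage = "The following user accounts were found:" & vbCrLf
    For Each objMatch In colMatches
        strMessage = strMessage & objMatch.Value & " (character position " & _
            objMatch.FirstIndex & ")" & vbCrLf
    Next
Else
    strMessage = "No 9-digit numbers were found."
End If

Wscript.Echo strMessage
```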

Like we said, is that cool or what? To learn even more, don’t forget about the regular expressions webcast coming April 26th.

