May 6th, 2008

Hey, Scripting Guy! How Can I Search for a Word in a Text File and Return the Entire Line Where That Word Was Found?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a scenario in which I need to read a list of names from a text file. I then need to search for each of those names in a second text file; if a name is found I need to extract the entire line in which that name appears. I’ve figured out how to search for a name in a text file, but I can’t figure out how to extract the entire line in which that name appears. Can you help me?
— DS

SpacerHey, Scripting Guy! AnswerScript Center

Hey, DS. Today is May 5th, Cinco de Mayo. Cinco de Mayo is actually the Scripting Guy who writes this column’s favorite holiday; that’s because this holiday is primarily celebrated in America, and yet the average American has no idea what it is he or she is celebrating. Most Americans think that Cinco de Mayo is Mexico’s Independence Day; it’s not. (And why would the average American be so excited about celebrating Mexico Independence Day anyway? Well, we’re not sure, but we’re willing to bet that Mexican beer has something to do with it.)

Instead, Cinco de Mayo actually pays homage to the 1862 Battle of Puebla. Back then, due to a severe financial crisis, Mexico stopped payment on its foreign debts, a very high percentage of which were owed to France. The French weren’t too happy about this, so they sent an army across the ocean and invaded Mexico. An outmanned and outgunned Mexican army defeated the French at the Battle of Puebla, and the country was saved.

Well, for awhile anyway: a year later the French launched a second invasion, overran the capital city, and installed the Emperor Maximilian of Hapsburg as the new ruler of Mexico. It would be another 5 years before the Mexicans were able to regain their independence.

In other words, Mexico won the Battle of Puebla but lost the war. But hey, that’s still one more victory than the Scripting Guys have ever had.

As we noted, Cinco de Mayo is primarily an American holiday, one designed to celebrate Mexican culture. And how do Americans celebrate Mexican culture? Well, a large number of them seem to celebrate Mexican culture by drinking beer at an Irish pub; an Internet search for Cinco de Mayo and Irish pub returned 143,000 hits. That’s about the same number of hits returned for Cinco de Mayo and Mexican restaurant.

Sadly enough, though, a search for Cinco de Mayo and system administration scripting returned zero hits; for some reason, people don’t associate Cinco de Mayo with system administration scripting. But don’t despair; after all, that’s a problem the Scripting Guys can rectify, and right away:

Const ForReading = 1

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

strTargetText = objFile.ReadAll
objFile.Close

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True  

Set objFile = objFSO.OpenTextFile("C:\Scripts\Names.txt", ForReading)

Do Until objFile.AtEndOfStream
    strName = objFile.ReadLine
    objRegEx.Pattern = ".{0,}" & strName & ".{0,}\n"
    Set colMatches = objRegEx.Execute(strTargetText)  

    If colMatches.Count > 0 Then
       For Each strMatch in colMatches   
           strText = strText & strMatch.Value 
       Next
    End If
Loop

Wscript.Echo strText

Before we talk about the script and how it works we should explain the scenario in a little more detail. DS has a text file (C:\Scripts\Test.txt) that includes lines similar to these:

192.168.0.0, this is sql users, tom, testmachine 
192.168.1.2, this is mysql users, john, testmachine2 
192.168.0.0, this is sql users, joseph, testmachine5 
192.168.0.3, this is sql users, maria, testmachine6 
192.168.0.5, this is sql users, donald, testmachine7 
192.168.0.6, this is sql users, tom, testmachine11

In addition, he has a second text file (C:\Scripts\Names.txt) that includes user names like the following:

tom
peter
joseph

What DS needs to do is search Test.txt for each of these names, reporting back any lines where a given name is found. For example, when the script searches for joseph it needs to return this entire line:

192.168.0.0, this is sql users, joseph, testmachine5

Sounds like an impossible task? Let’s put it this way: that’s what the naysayers said about the Battle of Puebla.

Note. In case you’re wondering, the Babelfish Web site tells us that the Spanish word for naysayer is, well, naysayer.

We start our script off by defining a constant named ForReading and setting the value to 1; we’ll need this constant when we open our text files for reading. After defining the constant we create an instance of the Scripting.FileSystemObject, then use this line of code to open the file Test.txt:

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

And what do we do with that file once it’s open? To tell you the truth, not much; in fact, all we’re going to do is use the ReadAll method to read in the entire contents of the file, storing that information in a variable named strTargetText:

strTargetText = objFile.ReadAll

At that point we call the Close method and never touch Test.txt again. Why not? Because we don’t need it anymore; instead, we’ll conduct all our searches on the virtual copy of Test.txt stored in the variable strTargetText.

Next we create an instance of the VBScript.RegExp object, the object that enables us to do regular expression searches within a VBScript script:

Set objRegEx = CreateObject("VBScript.RegExp")

We set the Global property of our regular expressions object to True (to make sure that we find all the possible matches in the target text), then reuse the FileSystemObject and open the file Names.txt:

Set objFile = objFSO.OpenTextFile("C:\Scripts\Names.txt", ForReading)

Now we’re ready to actually do something useful. To begin with, we set up a Do Until loop that runs until the file’s AtEndOfStream property is True; that simply means that we’re going to keep reading the file until there’s nothing left to read. Inside this loop we use the ReadLine method to read in the first line (the first name) from the text file, storing that value in the variable strName:

strName = objFile.ReadLine

That brings us to this line of code:

objRegEx.Pattern = ".{0,}" & strName & ".{0,}\n"

What we’re doing here is telling the script exactly what text we’re searching for. And what text are we searching for? Well, for starters, the construction .{0,} means we’re searching for zero or more instances of any character (that’s what the . represents) except for the carriage return linefeed character. In other words, we want some characters (any characters except the carriage return-linefeed characters) followed by the name we just read in from the text file. That name must then be followed by 0 or more characters, followed – at long last – by a carriage return-linefeed (\n).

So why do search for all that even though we’re just looking for a name like tom or joseph? One reason and one reason only: that’s how we can extract an entire line from the file. After all, we’re no longer searching for the name tom; we’re now searching for a line of text that happens to include the name tom. It looks a little crazy (as do many regular expressions), but it works.

After we assign the value to the Pattern property we then use this line of code to search strTargetText for that pattern:

Set colMatches = objRegEx.Execute(strTargetText)

Any instances of the pattern we find will be placed in a collection we named colMatches. How do we know if we actually found any instances of the pattern? That’s easy; we simply check the value of the collection’s Count property:

If colMatches.Count > 0 Then

If the Count is equal to 0 that means that the pattern could not be found; in that case we simply go back to the top of the loop and try again with the next name in the text file. If the Count is greater than 0 that means at least one match was found. With that in mind, we use a For Each loop to loop through all the matches in the collection, adding the value of each match (in effect, the line from the text file) to a variable named strText:

strText = strText & strMatch.Value

After we’ve run similar searches for all the names in the text file we then echo back the value of strText. If all goes according to plan, that should give us output like this:

192.168.0.0, this is sql users, tom, testmachine
192.168.0.6, this is sql users, tom, testmachine11
192.168.0.0, this is sql users, joseph, testmachine5

Perfect! Or, as they say in Spanish, perfecto!

Speaking of Spanish, the term Cinco de Mayo translates to the fifth of May. The Scripting Guy who writes this column grew up in the Tri-Cities, WA. Try to guess when the Tri-Cities held its Cinco de Mayo celebration this year. You’re right: May 3rd.

See? It’s not his fault that the Scripting Guy who writes this column turned out the way he did.

Well, not entirely, anyway.

Author

0 comments

Discussion are closed.