February 11th, 2008

Hey, Scripting Guy! How Can I Remove Extraneous Spaces From Fields in a Text File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a text file in which each field has a bunch of extra spaces tacked on to the end. How can I remove all those extra spaces?

— GC

SpacerHey, Scripting Guy! AnswerScript Center

Hey, GC. Well, today is Monday, February 11th, which can mean only one thing: the 2008 Winter Scripting Games are just a few days away. But we didn’t need to tell you that, did we? After all, if you’re anything like the Scripting Guys, you can feel the excitement in the air, can’t you?

Oh, wait, never mind: that’s just more Seattle rain. We’ve been getting a lot of that lately.

At any rate, if you’re still waffling about whether or not you want to enter the Scripting Games you might want to check out the Scripting Games home page, where we’ve added information about our User Group, International, Windows PowerShell 2.0, and Sudden Death challenges. What do all those things mean to you? Well, in a nutshell, that means that not only will the 2008 Games be even more fun than last year’s Games, but, on top of that, you’ll also have all sorts of ways to win a prize:

Score at least 60 points in any one division (e.g., Perl Beginners) and you’ll earn a Scripting Games Certificate of Excellence.

Enter at least one event and you’ll be eligible to win – via random drawing – one of the many outstanding prizes shown on our Prizes page. Best of all, you’re eligible for the drawing even if your script fails. How many other times in your life have you scored a 0 on a test and been given a prize for that? Well, except for those of you who went to Washington State University.

Just kidding, Cougar fans.

Be a member of the user group that (on a percentage basis) has the most members participate in the Games. Do that and you’ll win … well, something. (We’re still working on that.)

Be a representative of the country that (on a proportional basis) has the most people participate in the Games. Do that and you’ll win … well, nothing. But your country will be featured throughout the month of April in the Script Center. And that is definitely the highest honor any nation could ever hope to receive.

At least from the Scripting Guys.

Use a Windows PowerShell 2.0 feature in one of your solutions and get yourself a limited-edition PowerShell T-shirt.

Enter the Sudden Death Challenge and win … well, we haven’t quite nailed that down, either. But we will have prizes for the Sudden Death Challenge; you can count on that.

At any rate, we could go on and on except for one thing: that’s about it, at least for the moment. But who knows what will happen between now and Friday? For some reason, this year’s edition of the Scripting Games seems to have a life of its own.

Of course, that’s all well and good; however, it doesn’t help GC much with his problem, does it? Fortunately, though the following bit of code should:

Const ForReading = 1Const ForWriting = 2Set objFSO = CreateObject("Scripting.FileSystemObject")Set objRegEx = CreateObject("VBScript.RegExp")objRegEx.Global = True   objRegEx.Pattern = " {2,}"Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)Do Until objFile.AtEndOfStream    strSearchString = objFile.ReadLine    strNewString = objRegEx.Replace(strSearchString,"")    strNewContents = strNewContents & strNewString & vbCrLfLoopobjFile.CloseSet objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting)objFile.Write strNewContentsobjFile.Close

Before we go any further we should take a moment and explain the scenario. GC has a bunch of text files containing data similar to this:

Emailaddr                            |Firstname                        |Lastname"kmyer@fabrikam.com          "|"Ken                     "|"Myer            "|"aabudayah@fabrikam.com          "|"Ahmad          "|"Abu Dayah                "|"htandersen@fabrikam.com          "|"Henriette Thaulow              "|"Andersen"|

What he would like to do is eliminate all the blank spaces that come at the end of each field. In other words, he’d like his text files to look like this:

Emailaddr|Firstname|Lastname"kmyer@fabrikam.com"|"Ken"|"Myer"|"aabudayah@fabrikam.com"|"Ahmad"|"Abu Dayah"|"htandersen@fabrikam.com"|"Henriette Thaulow"|"Andersen"|

Interestingly enough, that was our first thought, too: why don’t we just use the Replace function to remove all the blank spaces? (That is, replace each blank space with nothing?) That almost works, except for one thing: if we remove all the blank spaces then a name like Henriette Thaulow would end up looking like this:

HenrietteThaulow

Not exactly what we wanted.

If you take a careful look at the text file, however, you’ll see that what we really want to do is remove any instances of two or more blank spaces in succession. We can’t remove single spaces; that messes up poor old Henriette Thaulow. If we stipulate that there must be at least two blank spaces in a row (and any number of blanks spaces after that), well, then we can delete all those blank spaces while making allowances for Henriette Thaulow and Ahmad Abu Dayah. Did we just hear someone say, “That sounds like a job for regular expressions”? Good call; this is a job for regular expressions. As it turns out, we’re going to use a regular expression to search for – and replace – any instances of two or more blank spaces in succession. That’s going to let us eliminate all the extraneous blank spaces tacked on the end of each field, all the while preserving the blank spaces between two names like Henriette and Thaulow.

To that end, we start our script off by defining a pair of constants, ForReading and ForWriting; we’ll use these constants to open our text file. After creating an instance of the Scripting.FileSystemObject (also needed for working with the text file), we then create an instance of VBScript’s RegExp object, the COM object that enables VBScript to use regular expressions:

Set objRegEx = CreateObject("VBScript.RegExp")

Once we have an instance of the RegExp object we then assign values to two of the object’s properties. To begin with, we set the Global property to True; this tells the regular expressions object that we want to search for all instances of our target text, not just the first such instance. We then define that target text by assigning the following value to the Pattern property:

" {2,}"

As you can see, we have a blank space here followed by this construction: {2,}. That simply means that we’re looking for at least two consecutive instances of the preceding character (the blank space). The comma followed by nothing means that the target text needs to have at least 2 consecutive blank spaces, but can have as many as an infinite number of consecutive blank spaces. (Although in that case the script would probably take a while to finish running.) In other words, this pattern will find 2 consecutive blank spaces. It will also find 3 consecutive blanks spaces, 11 consecutive blank spaces, 147 consecutive blank spaces, etc., etc.

After configuring the regular expressions object we use this line of code to open the file C:\Scripts\Test.txt for reading:

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

Our next task is to set up a Do Until loop that runs until that file’s AtEndOfStream property is True; that simply means we’re going to keep reading the file until there’s nothing left to read. Inside that loop the first thing we do is use the ReadLine method to read the initial line in the file and store it in a variable named strSearchString:

strSearchString = objFile.ReadLine

That brings us to this line of code:

strNewString = objRegEx.Replace(strSearchString,"")

What we’re doing here is using the regular expression object’s Replace method to search for the target text and replace it with, well, nothing (“”). What’s that going to do? That’s going to take a line of text like this:

Emailaddr                            |Firstname                        |Lastname

And turn it into a line of text that looks like this:

Emailaddr|Firstname|Lastname

In the next line, we add this new value to a variable named strNewContents. Each time we remove the extraneous spaces from a line in the text file we’ll append that modified line to strNewContents. When we’ve finished modifying all the lines in the file we’ll then replace the existing contents of Test.txt with the value of strNewContents.

We should note that there’s one potential flaw in this approach. Suppose a field looked like this:

"kmyer@fabrikam.com "|"Ken       "|"Myer     "

As you can see, there’s a single space between fabrikam.com and the closing double quote mark. That space probably should be removed; however, because it’s just one space our regular expression won’t find it. We could try to write a more complicated regular expression, but it might be easier just to run our simple regular expression and then execute this line of code:

strNewString = Replace(strNewString, " " & Chr(34), Chr(34))

All this line does is search for any instance of a blank space followed by a double quote mark (Chr(34)) and then replaces that with just a double quote mark. That should do the trick.

At any rate, after we’ve read (and modified) all the lines in the text file we use the Close method to close Test.txt. We then use this line of code to immediately reopen the file, this time for writing:

Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting)

All we have to do now is use the Write method (and the variable strNewContents) to overwrite the existing contents of Test.txt. After that we call the Close method to close the file one final time, and then are free to concentrate on more important things.

Like, say, the 2008 Winter Scripting Games. Remember, the Games start this Friday, February 15th. And while you don’t have to show up on the first day, keep in mind that the deadline for submitting events 1 and 2 is 8:00 AM (Redmond time) on Wednesday, February 20th. If you’re aiming for a perfect score in this year’s Games make sure you leave yourself plenty of time to get events 1 and 2 submitted before the deadline.

And what happens if you don’t get a perfect score? Will the Scripting Guys make fun of you? You bet we will. (But don’t worry; the Scripting Editor will probably edit all that stuff out anyway.)

Note. OK, if you want to know the truth, we have no desire or intention to make fun of anyone just because they don’t get a perfect score in the Games. (And we can assure you that most people don’t get a perfect score in the Games; it actually means something to get a perfect score in the Scripting Games.) The truth is, the Games are all about having some fun and maybe learning a little something about scripting along the way. And the Scripting Guys would never make fun of anyone who wants to have some fun and learn a little something about scripting.

Now, if you’re talking about people who don’t participate in the Games, well, that’s a different story ….

Author

0 comments

Discussion are closed.