Hey, Scripting Guy! How can I parse a tab-delimited file, and then save the resulting information as a comma-separated values file?
— DM
Hey, DM. You know, yesterday morning the Scripting Guy who writes this column had to scrape the ice off his windshield; today it’s much warmer – 44 at last report – although it is raining. Of course, you know what that means: the Seattle-area baseball season is in full swing!
We mention that simply because it provides a sneaky way for us to also mention that the Script Son took the mound for his high school team yesterday afternoon. The results? Five shutout innings, one hit (on a checked-swing), no walks, eight strikeouts. Not bad for his first outing of the season, although it’s a shame that he gave up that one hit; after all, that’s one more hit than the Scripting Dad gave up during his entire high school career.
Note. What’s that? How many times did the Scripting Dad actually pitch during his high school career? Well, that’s hard to say; after all, that was a long time ago, and pitching records from that era are spotty at best. Off the top of our heads, we’d have to say the number of times he pitched was definitely greater than – or, at least, equal to – zero. |
But, hey, this is supposed to be about the Scripting Son. Let’s let him have his moment and not concern ourselves with the Scripting Dad.
At any rate, it’s always a treat to watch your son mow down the opposition. He blew his fast ball right by several batters, and completely froze several more with his curve.
Although, come to think of it, maybe the curveball didn’t freeze the hitters as much as the weather did. The Scripting Dad, for example, was wearing long underwear, a long-sleeved shirt, a hooded sweatshirt, a coat, and a pair of gloves. If that doesn’t scream “baseball season” we don’t know what does.
Of course, in years past the start of baseball season meant that the Scripting Guy who writes this column disappears and isn’t heard from again until August; that’s because he finds baseball far more interesting than work. (Hard to believe, but he does.) Right now, however, he has to admit that he finds work far warmer than he finds baseball. Which means that, rather than disappearing, he’s going to sit in his nice warm office and see if he can figure out how to parse a tab-delimited file, and then save the resulting information as a comma-separated values (CSV) file.
At least until it’s time to leave work early and head for the next game.
To begin with, DM has a text file that looks something like this, with fields separated by tabs:
Cre Rec Name=Jack Address=5 XYZ Drive Phone=555-4567 Cre Rec Name=Jill Address=7 XYZ Drive Phone=555-6547 Cre Rec Name=Jake Address=9 XYZ Drive Phone=555-9876
And sure, that is a very nice text file. But it’s nowhere near as nice as this CSV file, which is the file DM would like to have:
Name,Address,Phone “Jack”,”5 XYZ Drive”,”555-4567″ “Jill”,”7 XYZ Drive”,”555-6547″ “Jake”,”9 XYZ Drive”,”555-9876″
The question for today is this: how can DM get from point A (the tab-delimited file) to point B (the CSV file)? Here’s how:
Const ForReading = 1Set objFSO = CreateObject(“Scripting.FileSystemObject”) Set objFile = objFSO.OpenTextFile(“C:\Scripts\Test.txt”, ForReading)
Do Until objFile.AtEndOfStream strLine = objFile.ReadLine arrFields = Split(strLine, vbTab) arrName = Split(arrFields(1), “=”) strName = arrName(1) arrAddress = Split(arrFields(2), “=”) strAddress = arrAddress(1) arrPhone = Split(arrFields(3), “=”) strPhone = arrPhone(1) strNewContent = strNewContent & Chr(34) & strName & Chr(34) & “,” & Chr(34) & strAddress & Chr(34) & _ “,” & Chr(34) & strPhone & Chr(34) & vbCrLf Loop
objFile.Close
Set objFile = objFSO.CreateTextFile(“C:\Scripts\Test.csv”)
objFile.WriteLine “Name,Address,Phone” objFile.Write strNewContent
objFile.Close
As you can see, we start out by defining a constant named ForReading and setting the value to 1; we’ll use this constant when we set out to open our tab-delimited file. We then use these two lines of code to create an instance of the Scripting.FileSystemObject and to open the file C:\Scripts\Test.txt:
Set objFSO = CreateObject(“Scripting.FileSystemObject”) Set objFile = objFSO.OpenTextFile(“C:\Scripts\Test.txt”, ForReading)
So now what do we do?
Oh, right: that’s our job to figure that out, isn’t it? OK, well, here’s one thing we can do: we can set up a Do Until loop that reads the tab-delimited file line-by-line. In other words, a Do Until loop that runs until the file’s AtEndOfStream property is True:
Do Until objFile.AtEndOfStream
Inside that loop we use this line of code to read the first line in the file and store it in a variable named strLine:
strLine = objFile.ReadLine
What does that mean? That means that strLine is equal to this:
Cre RecName=JackAddress=5 XYZ DrivePhone=555-4567
Of course, that doesn’t do us much good; somehow we need to tease out the individual values for Name, Address, and Phone. Is that too much to ask for? Of course it is. But let’s give it a try anyway.
To begin with, we use the Split function within the following line of code to split the value of strLine into an array:
arrFields = Split(strLine, vbTab)
By splitting the value on the tab character (using the VBScript constant vbTab) we end up with an array named arrFields that consists of the following items:
• |
Cre Rec |
• |
Name=Jack |
• |
Address=5 XYZ Drive |
• |
Phone=555-4567 |
OK, now we’re getting somewhere. As you can see, the user’s name (Jack) is part of the second item in the array. (And, remember, because the first item in an array has an index number 0 that means the second item in the array has an index number of 1). The only problem, of course, is that the name is prefaced with Name=. So why don’t we just get rid of that prefix:
arrName = Split(arrFields(1), “=”) strName = arrName(1)
See what we’re doing here? In the first line, we’re again using the Split function to create an array; this time we’re splitting the value of array item 1 (Name=Jack) on the equals sign (=). What does that give us? That gives us a little two-item array named arrName, an array that looks like this:
• |
Name |
• |
Jack |
Well, what do you know: the second item in our mini-array (index number 1) just happens to be the user name. That means we can grab the name simply by assigning the value of array item 1 to the variable strName, which is what we do in our second line of code.
Get the idea? We then repeat the process with index numbers 2 (address) and 3 (phone number), like so:
arrAddress = Split(arrFields(2), “=”) strAddress = arrAddress(1) arrPhone = Split(arrFields(3), “=”) strPhone = arrPhone(1)
We now have three variables – strName, strAddress, and strPhone – that contain information parsed from the text file. That means that our next step is to begin constructing the CSV file. That’s what this monstrosity is for:
strNewContent = strNewContent & Chr(34) & strName & Chr(34) & “,” & Chr(34) & strAddress & Chr(34) & _ “,” & Chr(34) & strPhone & Chr(34) & vbCrLf
Don’t let this line of code deter you: its bark is far worse than its bite. All we’re doing here is assigning a value to a variable named strNewContent. What value are we assigning to strNewContent? Well, we’re assigning the existing value of the variable plus the following:
• |
Double quotes (that’s what the Chr(34) is for). |
• |
The value of the variable strName. |
• |
More double quotes. |
• |
A comma (“,”). |
• |
Double quotes. |
• |
The value of the variable strAddress. |
• |
Double quotes. |
• |
Another comma. |
• |
Double quotes. |
• |
The value of the variable strPhone. |
• |
Double quotes. |
• |
A carriage return-linefeed character (vbCrLf). |
Why such a complicated construction? What we’re trying to do is guard against any items that might include a comma. For example, suppose a user has the name Ken Myer, Jr. If we simply write that name as-is to a CSV file, the CSV file will assume that we have two values: Ken Myer and Jr. That’s because the comma is used as the delimiter. In turn, a simple thing like that can cause our CSV file (and any scripts/applications that need to read that file) to go completely haywire. To prevent that from happening we enclose each value in double quotes, like so:
“Ken Myer, Jr.”
In a CSV file, items enclosed in double quotes are treated as a single value, even if the value includes commas.
See? We’re doing this for your own good!
After that we then loop around and repeat the process with the next line in the tab-delimited file.
Once we’ve read and processed the entire file we’re ready to create the CVS file. To do that we first close the file Test.txt, then use this line of code to create a new text file named C:\Scripts\Test.csv:
Set objFile = objFSO.CreateTextFile(“C:\Scripts\Test.csv”)
Because DM needed a header line the first thing we do to our new file is use this line of code to write a header to the thing:
objFile.WriteLine “Name,Address,Phone”
And then all we have to do is use the Write method to write the value of strNewContent to the file. (And then use the Close method to close Test.csv.) At that point we – just like the hitters that had to go up against the Scripting Son – are finished.
Hope that helps. DM. It’s a little complicated, but not if you take a deep breath and work your way through it. And we know that a lot of people are faced with similar tasks. We hope this helps all of you as well.
Meanwhile, we – what’s that? Oh, right: you noticed that small patch of blue sky off in the distance. If the weather’s improving, does that mean that baseball season is already over?
No, far from it: if the weather is improving that simply means that the Scripting Son doesn’t have a game tonight. But don’t worry: the forecast for tomorrow’s game includes a temperature of 49 degrees and rain. Now that’s baseball weather.
0 comments