April 6th, 2005

How Can I Convert a Comma-Separated Values File to a Tab-Separated Values File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! How can I convert a comma-separated values file to a tab-separated values file?

— RV

SpacerHey, Scripting Guy! AnswerScript Center

Hey, RV. Depending on the nature of your comma-separated values (CSV) file this is either very easy or a little bit tricky. Let’s follow the Scripting Guys Philosophy of Life and look at the easy one first.

Note. Of course, in our real Philosophy of Life we’d take the easy one first and then hope that there wouldn’t be time to do the hard one as well. Today, however, we’ll make an exception and tackle both the easy one and the hard one.

Let’s assume we have a very simple CSV file, one that looks like this:

a,b,c,d,e,f
g,h,i,j,k,l
m,n,o,p,q,r
s,t,u,v,w,x
y,z

How can we convert this to a tab-separated values file (TSV)? Like we said, this is pretty easy. We’ll begin by opening the file and reading the entire contents into a variable. Once the file contents are in memory we’ll replace all the commas with tabs and then re-save the file. Just like magic we’ll have turned a CSV file into a TSV file. Here’s the code that does the trick:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject(“Scripting.FileSystemObject”)

Set objFile = objFSO.OpenTextFile(“C:\scripts\test.txt”, ForReading) Do Until objFile.AtEndOfStream strLine = objFile.ReadLine strLine = Replace(strLine, “,”, vbTab) strNewText = strNewText & strLine & vbCrLF Loop

objFile.Close

Set objFile = objFSO.OpenTextFile(“C:\scripts\test.txt”, ForWriting) objFile.WriteLine strNewText

objFile.Close

We begin by defining two constants – ForReading and ForWriting – that we use when opening and then re-opening our file (C:\Scripts\Test.txt). And, yes, we said re-opening: when using the FileSystemObject to work with text files you can open a file for reading or you can open a file for writing; unfortunately, you can’t do both at the same time. Because of that we’ll need to open the file, read the contents into memory, close the file, and then re-open that same file for writing in order to save the modified data.

After defining our constants we create an instance of the FileSystemObject and use the OpenTextFile method to open the file for reading. We then create a Do Loop in which we’ll read in the existing CSV file and then construct our new TSV file.

How do we do that? We begin by using the ReadLine method to read the first line of the file and store it in a variable named strLine. Next we use the VBScript Replace method to replace all the commas in the variable with tab characters; that’s what happens here:

strLine = Replace(strLine, “,”, vbTab)

Note that we use the VBScript constant vbTab to indicate the tab character; had we used something like “tab” we would have replaced all the commas with the word tab. Not exactly what we had in mind. Likewise, don’t try typing a set of double quotes, setting the cursor inside those double quotes and pressing the TAB key (“ “). That just won’t work.

After we replace the commas with tabs we can start putting together our revised file. In this line of code we assign the variable strNewText the current value of strNewText (which, the first time through the loop, will be nothing) plus the new, tab-delimited strLine plus a carriage-return linefeed (vbCrLf):

strNewText = strNewText & strLine & vbCrLF

The first time through the loop strNewText will look like this, with tabs replacing the commas:

a     b     c     d     e     f

We then loop around and repeat the process with line 2. By the time we’ve finished reading in all the lines of the file the variable strNewText will hold this value:

a     b     c     d     e     f
g     h     i     j     k     l
m     n     o     p     q     r
s     t     u     v     w     x
y     z

As you can see, we’ve removed all the commas and replaced them with tabs. All we have to do now is close the file and then re-open it for writing. With the file re-opened we can use the WriteLine method to replace the existing contents of the file with the value of strNewText. And that’s it: our CSV file is now a TSV file. And we did it all PDQ ASAP.

See, that was pretty easy, wasn’t it? Now, what about that harder scenario? Well, suppose we have a CSV file that looks like this:

“a”,”b,c,d,e”,”f”
“g”,”h”,”i”
“j”,”k”,”l,m,n,o,p,q,r,s,t,u,v,w,x,y,z”

This is still a CSV file; the difference is that individual items are enclosed in double quote marks. That’s because some of the items include commas themselves. For example, the first line in the file is actually composed of these three items:

a     b,c,d,e     f

Why do we care about that? Well, suppose we use our previous script and simply replace all the commas with tabs. If we do that, our first line will be divvied up like this:

“a”     “b     c     d     e”     “f”

Oops; that’s not even close to being right. The problem (aside from the double quotes) lies with the commas found inside the double quote marks; those commas should not be converted to tabs. How are we going to deal with that?

Well, to be honest, the best way to deal with this issue would be to use regular expressions. However, explaining how regular expressions work lies way beyond the scope of this column; to learn more about that we encourage you to view the Scripting Guys webcast on the subject. In the meantime, we’ll show you a solution that, while not 100% foolproof, should work in most situations:

Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject(“Scripting.FileSystemObject”)

Set objFile = objFSO.OpenTextFile(“C:\scripts\test.txt”, ForReading) Do Until objFile.AtEndOfStream strLine = objFile.ReadLine strReplacer = Chr(34) & “,” & Chr(34) strLine = Replace(strLine, strReplacer, vbTab) strLine = Replace(strLine, chr(34), “”) strNewText = strNewText & strLine & vbCrLF Loop

objFile.Close

Set objFile = objFSO.OpenTextFile(“C:\scripts\test.txt”, ForWriting) objFile.WriteLine strNewText

objFile.Close

The difference between this script and the first one we showed you occurs when we go about replacing commas in a line. We now do that using these three lines of code:

strReplacer = Chr(34) & “,” & Chr(34)
strLine = Replace(strLine, strReplacer, vbTab)
strLine = Replace(strLine, chr(34), “”)

In the first line we’re assigning the value Chr(34) & “,” & Chr(34) to a variable named strReplacer; that means strReplacer will now be equal to this:

“,”

What was the point of that? Well, as we’ve already seen, we can’t search for and replace all commas. Therefore, what we’re going to do in line 2 is search for double quotes followed by a comma followed by another set of double quotes. Sounds crazy but if you take a look at line 1 in our text file you’ll see that this is actually the pattern used to separate items:

“a”,”b,c,d,e”,”f”

In other words, replacing all instances of “,” will yield this:

“a     b,c,d,e     f”

As you can see, that’s pretty close to what we want; we just need to do something about those extraneous double quotes. That’s what we do with our third line of code, which replaces any remaining double quote marks with nothing:

strLine = Replace(strLine, chr(34), “”)

The net result? Just what the doctor ordered:

a     b,c,d,e     f

Again, not a foolproof method, but it will work in most cases. If you find that it doesn’t work in your case, then view the regular expressions webcast.

Author

0 comments

Discussion are closed.

Feedback