{"id":53973,"date":"2009-04-14T23:13:00","date_gmt":"2009-04-14T23:13:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2009\/04\/14\/hey-scripting-guy-how-can-i-convert-a-tab-delimited-file-to-a-comma-separated-value-file\/"},"modified":"2009-04-14T23:13:00","modified_gmt":"2009-04-14T23:13:00","slug":"hey-scripting-guy-how-can-i-convert-a-tab-delimited-file-to-a-comma-separated-value-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-convert-a-tab-delimited-file-to-a-comma-separated-value-file\/","title":{"rendered":"Hey, Scripting Guy! How Can I Convert a Tab-Delimited File to a Comma-Separated Value File?"},"content":{"rendered":"<p><H2><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> <\/H2>\n<P>Hey, Scripting Guy! I have this tab-delimited file that was produced by a report from an old database application. All it knows how to do is make a tab-delimited file. I hate this format, and our other applications do too. I have been opening the thing in Notepad and trying to find and replace tabs with commas, but Notepad does not seem to do a good job cleaning up tab-delimited files either. I am wondering if you can do something with Windows PowerShell?<BR><BR>&#8211; OG<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hi OG,<\/P>\n<P>It is cloudy, gray, and overcast outside. Where we grew up, when the the clock said it was noon, we had every right to expect sun. I miss dependable sun. Perhaps I could bring a few rays of figurative sunshine to your day, OG. You need to be able to convert a tab-delimited file to a comma-separated value file. To do that, we will need to use regular expressions.<\/P>\n<TABLE id=\"EXC\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P class=\"lastInCell\">This week we are focusing on regular expressions. There are some <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/scripts\/misc\/regexp\/default.mspx?mfr=true\" target=\"_blank\">VBScript examples in the Script Center<\/A>. Here is <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/funzone\/games\/tips08\/gtip0201.mspx\" target=\"_blank\">a good introduction<\/A> from the 2008 Winter Scripting Games (by the way, in the 2009 Summer Scripting Games, I can pretty much guarantee you will need to be able to do something with regular expressions for one of the events). The Regex .NET Framework class from the System.Text.RegularExpressions namespace is <A href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/system.text.regularexpressions.regex.aspx\" target=\"_blank\">documented on MSDN<\/A>. This is one of the main classes we use in Windows PowerShell when working with regular expressions. You also will find some information about regular expressions in the Microsoft Press book, <A href=\"http:\/\/www.microsoft.com\/learning\/en\/us\/Books\/9541.aspx\" target=\"_blank\">Windows PowerShell Scripting Guide<\/A>. Here is a very good article about <A href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms974570.aspx\" target=\"_blank\">regular expression use in VBScript<\/A>. In this week&#8217;s articles, we are using Windows PowerShell for our samples. Please refer to the <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/hubs\/msh.mspx\" target=\"_blank\">Windows PowerShell Scripting Hub<\/A> for more information about this exciting new technology.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>You can also review <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr05\/hey0406.mspx\" target=\"_blank\">a VBScript version of this script<\/A>. Actually, OG, using Windows PowerShell I can solve your problem with a single line of&nbsp;code:<\/P><PRE class=\"codeSample\">Get-Content \u2013path C:\\fso\\tabDelimited.txt | ForEach-Object {$_ -replace &#8220;\\s&#8221;,&#8221;,&#8221; } |  \nOut-File -filepath C:\\fso\\tabDelimited.csv\n<\/PRE>\n<P>As seen in the following image, each column is separated by a tab value, which results in an even amount of white space between each column of data:<\/P><IMG border=\"0\" alt=\"Image of the columns separated by an even amount of white space\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/hsg\/2009\/april\/hey0414\/hsg-04-14-09-01.jpg\" width=\"413\" height=\"232\"> \n<P>&nbsp;<\/P>\n<P>To read from a text file using Windows PowerShell, we use the <B>Get-Content<\/B> cmdlet and supply the path to the file. When we do this, the result is a stream of text that goes wherever we have directed it. If we simply read the file, the output will be the console as seen here:<\/P><PRE class=\"codeSample\">PS C:\\&gt; Get-Content -Path C:\\fso\\TabDelimited.txt\nhead1   head2   head3   head4\ndata1   data21  data31  data41\ndata2   data22  data32  data42\ndata3   data23  data33  data43\ndata4   data24  data34  data44\ndata5   data25  data35  data45\ndata6   data26  data36  data46\ndata7   data27  data37  data47\ndata8   data28  data38  data48\ndata9   data29  data39  data49\n<\/PRE>\n<P>Many times, displaying text output on the screen is all that interests us. At other times, however, we need to parse the information or use it in other ways. There are two ways we can further refine data we receive from <B>Get-Content<\/B>. The first way to do this is to read the content of the file, and then store the results in a variable. This is the way things were done in VBScript and is illustratedhere:<\/P><PRE class=\"codeSample\">PS C:\\&gt; $tabFileContents = Get-Content -Path C:\\fso\\TabDelimited.txt\nPS C:\\&gt; $tabFileContents\nhead1   head2   head3   head4\ndata1   data21  data31  data41\ndata2   data22  data32  data42\ndata3   data23  data33  data43\ndata4   data24  data34  data44\ndata5   data25  data35  data45\ndata6   data26  data36  data46\ndata7   data27  data37  data47\ndata8   data28  data38  data48\ndata9   data29  data39  data49\n<\/PRE>\n<P>The variable <B>$tabFileContents<\/B> contains an array of text items. We can now use standard array techniques to work with the text. We can, for example, index directly into the array and retrieve a specific line of text. This is helpful if we are interested in obtaining information that is always stored on a specific line within a text file. This is seen&nbsp;here:<\/P><PRE class=\"codeSample\">PS C:\\&gt; $tabFileContents[0]\nhead1   head2   head3   head4\n<\/PRE>\n<P>We can also use the <B>ForEach<\/B> statement and walk through the items in the <B>$tabFileContents<\/B> variable, but I generally do not do things like that unless I am working in a script. I think the code gets kind of jumbled and is hard to read. Here is what it would look&nbsp;like:<\/P><PRE class=\"codeSample\">PS C:\\&gt; $tabFileContents = Get-Content -Path C:\\fso\\TabDelimited.txt\nPS C:\\&gt; ForEach ($line in $tabFileContents) { $line }\nhead1   head2   head3   head4\ndata1   data21  data31  data41\ndata2   data22  data32  data42\ndata3   data23  data33  data43\ndata4   data24  data34  data44\ndata5   data25  data35  data45\ndata6   data26  data36  data46\ndata7   data27  data37  data47\ndata8   data28  data38  data48\ndata9   data29  data39  data49\n<\/PRE>\n<P>This could have been written on a single line by using the semicolon to separate the two logical lines of code, but it would hurt readability. One of the mistakes people make when moving from VBScript to Windows PowerShell is not taking advantage of the pipeline. With small files such as the one we are working on here, there is no performance impact. However, with larger files there can be a significant performance impact both in terms of memory utilization and speed of the operation. This is because reading a text file and storing the content into a variable is a linear process that must be completed before we can begin to work on the data. With a pipeline, the data from the file begins to stream over the pipeline nearly immediately. This causes the memory consumption to be less, but it also results in faster processing time, because the reading and processing are taking place at the same time. <\/P>\n<P>After we have obtained the content of the tab-delimited file, we pipeline the resulting text to the <B>ForEach-Object<\/B> cmdlet. The <B>ForEach-Object<\/B> cmdlet is how we are able to work with individual lines of text as they come through the pipeline. We use the <B>replace<\/B> operator to replace items that match the regular expression pattern with other text. This can be as simple as substituting one literal string for another one as seen&nbsp;here:<\/P><PRE class=\"codeSample\">PS C:\\&gt; &#8220;this is an string&#8221; -replace &#8220;an&#8221;,&#8221;a&#8221;\nthis is a string\n<\/PRE>\n<P>For an introduction to using regular expression patterns, refer to <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr09\/hey0413.mspx\" target=\"_blank\">yesterday\u2019s \u201cHey, Scripting Guy!\u201d article<\/A>. The pattern we are using today is simple: an <B>&#8220;\\s&#8221;<\/B> pattern to look for white space in the text file. All white space, regardless of the number of spaces, is identified by our pattern. When white space is found, it is changed to a comma. This line of code is seen here:<\/P><PRE class=\"codeSample\">ForEach-Object {$_ -replace &#8220;\\s&#8221;,&#8221;,&#8221; }<\/PRE>\n<P>After the tab values have been replaced with commas, the file is saved as a .csv file. To do this, we pipeline the results to the <B>Out-File<\/B> cmdlet and specify the file path for the new .csv file. This is seen here: <\/P><PRE class=\"codeSample\">Out-File -filepath C:\\fso\\tabDelimited.csv<\/PRE>\n<P>After the new file is created, you can open it in Notepad or Microsoft Excel, or use it as an import source for your database program. The .csv file is seen here:<\/P><IMG border=\"0\" alt=\"Image of the .csv file that is created\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/hsg\/2009\/april\/hey0414\/hsg-04-14-09-02.jpg\" width=\"413\" height=\"232\"> \n<P>&nbsp;<\/P>\n<P>Well OG, we could have written a script. If we did, it would have looked like <B>ConvertTabFileToCSV.ps1<\/B>. As you will no doubt notice, we added a variable to hold the path, and used replace to change the file name from &#8220;.txt&#8221; to &#8220;.csv&#8221;. We then substituted variables in the body of the code. Other than that, the script and our one-liner command are the same. Here&#8217;s the script:<\/P>\n<P>body of the code. Other than that, the script and our one-liner command are the same. Here&#8217;s the script:<\/P>\n<P><B>ConvertTabFileToCSV.ps1<\/B><\/P><PRE class=\"codeSample\">$path = &#8220;C:\\fso\\tabDelimited.txt&#8221;\n$outPath = $path -replace &#8220;.txt&#8221;,&#8221;.csv&#8221;\nGet-Content -path $path | \nForEach-Object {$_ -replace &#8220;\\s&#8221;,&#8221;,&#8221; } |  \nOut-File -filepath $outPath\n<\/PRE>\n<P>OG,&nbsp;I hope you have enjoyed our discussion about replacing text. In the end, we did write a script, didn&#8217;t we? But we did not have to write the script to accomplish our goals. This is one of the cool things about Windows PowerShell. I believe a famous person once said something like this: &#8220;To script or not to script? That is the question.&#8221; In the end we could not make up our mind, and we did both. Hope you will join us tomorrow as Regular Expressions Week continues. Until then, peace.<\/P>\n<P>&nbsp;<\/P>\n<P><B>Ed Wilson and Craig Liebendorfer, Scripting Guys<\/B><\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have this tab-delimited file that was produced by a report from an old database application. All it knows how to do is make a tab-delimited file. I hate this format, and our other applications do too. I have been opening the thing in Notepad and trying to find and replace tabs [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[169,174,3,4,45],"class_list":["post-53973","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-csv-and-other-delimited-files","tag-regular-expressions","tag-scripting-guy","tag-scripting-techniques","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I have this tab-delimited file that was produced by a report from an old database application. All it knows how to do is make a tab-delimited file. I hate this format, and our other applications do too. I have been opening the thing in Notepad and trying to find and replace tabs [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/53973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=53973"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/53973\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=53973"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=53973"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=53973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}