November 2nd, 2011

Remove Unwanted Quotation Marks from CSV Files by Using PowerShell

Doctor Scripto
Scripter

Summary: Learn how to remove unwanted quotation marks from a CSV file by using Windows PowerShell.

 

Microsoft Scripting Guy Ed Wilson here. The fall is rapidly falling down here in Charlotte, North Carolina, in the United States. It is cooling off here, and is around 60 degrees Fahrenheit (15.5 degrees Celsius, according to my conversion module). The neighbors have all been wracked with anguish over whether to tough it out for a few more days, or to go ahead and turn on the heater. It is, evidently, such a convoluted decision that they have made numerous Facebook postings about it. But I guess I would rather read about anguished decisions about to turn on the heater or not, than to read about the latest nonferrous material that their pet consumed. 

The Internet is cool. For example, I really enjoy comments that are posted on the Hey, Scripting Guy! Blog. In fact, I have subscribed to a RSS feed that alerts me any time that a new comment is made, regardless of how old the article may be. Just this morning, I replied to two comments from readers on postings that were more than five years old; that’s 35 in doggy years, and more like 50 in Internet years.

Anyway, the other day, I wrote a pretty cool article named, Use PowerShell to Append CSV files Easily.In the comments about that article, Jeffrey S. Patton from Kansas (yes Toto, he is from Kansas) posted a asked why I used Notepad to remove commas instead of programmatically replacing the commas in the output with Windows PowerShell. I answered that I did it because opening the file in Notepad, clicking Edit, and then clicking Replace was faster and easier than writing the extra code to replace the quotation marks. In a one-off scenario, that is probably true. But what if this is something that needs to be scheduled on a regular basis? If it is an automation scenario, opening the file in Notepad is off the table; as cool as Notepad is, it does not have an automation interface, so it is not a scriptable solution.

What exactly is the problem? When using the Export-CSV cmdlet, it automatically adds quotation marks around everything. In some cases, when reading this file, the quotation marks could be interpreted as part of the data, which of course would be a problem. It is rare, but it most certainly could happen. In fact, this very problem is one reason I quit using Microsoft Excel to manipulate CSV files: it has the same “feature.” This happens because if you have a space in a column, some programs are not smart enough to respect the space between commas. Therefore, to help prevent an error with a column that has a space in it, quotation marks are added everywhere. The output file produced from our command from yesterday (shown here) is shown in the figure following this code:

dir c:\fso -Filter *.csv | ? {$_.basename -like ‘users?’} | Import-Csv | sort lname,fname | Export-Csv -Path c:\fso\UsersConsolidated.csv –NoTypeInformation

Image of output file with quotation marks

Jeffrey was absolutely correct when he said I could use replace to remove the quotation marks from the output file. It is actually pretty simple.

Here are the steps I need to perform:

  1. Use the Get-Content cmdlet to read the text of the usersconsolidated.csv file.
  2. Use the Foreach-Object cmdlet (% is an alias) to read each line as it comes from the file.
  3. Inside the script block for the Foreach-Object command, use the $_ automatic variable to reference the current line and the replace operator to replace a quotation mark with nothing.
  4. Use the Out-File cmdlet to overwrite the existing usersconsolidated.csv file with the newly changed content that no longer contains the quotation marks.
  5. Use the force switched parameter to tell the Out-File cmdlet to overwrite existing content.
  6. Use the encoding parameter to specify ASCII encoding to maintain compatibility with legacy applications.

The command line I use is much simpler than the six steps above would make it seem. Here is the command:

(Get-Content C:\fso\UsersConsolidated.csv) | % {$_ -replace ‘”‘, “”} | out-file -FilePath C:\fso\UsersConsolidated.csv -Force -Encoding ascii

The preceding command is a single-line command that has wrapped. No line continuation marks are used in the command. The parentheses around the Get-Content cmdlet are required (otherwise, it creates a blank file). I can actually shorten this command by searching for additional aliases. I use the Get-Alias cmdlet to search for aliases. The cool thing is that it will accept an array for input, and therefore I can look for aliases for both Get-Content and Out-File at the same time. Here is the command:

Get-Alias -Definition get-content, out-file

The bad thing about the Get-Alias cmdlet is that it returns an error when no match appears. This behavior is shown in the following figure.

Image showing error when no match appears

The shortened version of the command uses the gc alias for the Get-Content cmdlet. It also uses positional parameters and partial parameter names.

(gc C:\fso\UsersConsolidated.csv) | % {$_ -replace ‘”‘, “”} | out-file C:\fso\UsersConsolidated.csv -Fo -En ascii

As shown in the following figure, there are no more quotation marks in the file.

Image of output file with no quotation marks

If you want to do this all in a single command, it is easier to switch to using the ConvertTo-CSV cmdlet instead of the Export-CSV cmdlet. The reason is that ConvertTo-CSV converts to CSV format, but does not export to a file. This allows time to replace the quotation marks prior to writing to file. The revised command is shown here:

dir c:\fso -Filter *.csv | ? {$_.basename -like ‘users?’} | Import-Csv |  sort lname,fname | convertto-csv -NoTypeInformation | % { $_ -replace ‘”‘, “”} | out-file c:\fso\usersconsolidated.csv -fo -en ascii

There is no alias for the ConvertTo-CSV cmdlet, but there is an alias for Import-CSV, which is ipcsv. I can also use ls as an alias for Get-ChildItem, instead of the lengthier dir alias. I can also shorten filter to fi (I cannot use the single letter f because there is also a parameter force defined for the Get-ChildItem cmdlet). Here is the shortened command:

ls c:\fso -Fi *.csv | ? {$_.basename -like ‘users?’} | ipcsv | sort lname,fname | convertto-csv -NoTypeInformation | % { $_ -replace ‘”‘, “”} | out-file c:\fso\usersconsolidated.csv -fo -en ascii

Well, that is about all there is to removing quotation marks from a CSV file. Join me tomorrow for more Windows PowerShell fun.

 

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

 

 

Author

The "Scripting Guys" is a historical title passed from scripter to scripter. The current revision has morphed into our good friend Doctor Scripto who has been with us since the very beginning.

1 comment

Discussion is closed. Login to edit/delete existing comments.

  • Byron Louise Hudkin Festin

    What if the data inside a field contains a single ‘double quotes’ ex. “Samsung”Note 9”
    or has a comma plus single ‘double quotes’ “Samsung”Note 9,Black” ?