{"id":55513,"date":"2008-05-22T01:46:00","date_gmt":"2008-05-22T01:46:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/05\/22\/hey-scripting-guy-how-can-i-remove-delimiters-and-fix-apostrophes-in-a-csv-file\/"},"modified":"2008-05-22T01:46:00","modified_gmt":"2008-05-22T01:46:00","slug":"hey-scripting-guy-how-can-i-remove-delimiters-and-fix-apostrophes-in-a-csv-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-remove-delimiters-and-fix-apostrophes-in-a-csv-file\/","title":{"rendered":"Hey, Scripting Guy! How Can I Remove Delimiters, and Fix Apostrophes, in a CSV File?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\" \/> <\/p>\n<p>Hey, Scripting Guy! I have a comma-separated values file (CSV) that uses double quote marks as the field delimiter; unfortunately, this file also uses a double quote mark any time it should use an apostrophe (like in the name O\u2019Malleys). Somehow I need to remove all the delimiters <i>and<\/i> convert the appropriate double quote marks to an apostrophe. How can I do that?<br \/>&#8212; MP<\/p>\n<p><img decoding=\"async\" border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\" \/><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\" \/><a href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><img decoding=\"async\" class=\"farGraphic\" title=\"Script Center\" border=\"0\" alt=\"Script Center\" align=\"right\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" height=\"288\" \/><\/a> <\/p>\n<p>Hey, MP. Well, the Scripting Guys can finally breathe a sigh of relief: things are back to normal today. Last Thursday afternoon something very strange and totally unexpected occurred: the clouds parted, and the sun actually came out. Even more bizarre, the temperature actually went <i>up<\/i> rather than down. (Which was good; after all, the Scripting Guy who writes this column had pretty much decided he\u2019d been ripped off and been sold a one-way thermometer instead of one in which the temperature could go up as well as down.)<\/p>\n<p>Thursday afternoon was nice: the temperature actually crept above 70 degrees Fahrenheit. Friday and Saturday were even better: temperatures were in the 80s and, depending on your exact location, even topped 90 degrees. Sunday was a little bit cooler, but still nice, at least for the better part of the day. By Sunday night, however, the rain had returned and, as of Monday afternoon (when this column was written) the sky was once again gloomy and overcast, and the temperature a rousing 58 degrees. And not a moment too soon. Temperatures in the 80s? No human being could possibly withstand prolonged exposure to warm, sunny weather.<\/p>\n<p>Actually the Scripting Guys were perfectly content with the warm sunshine. That wasn\u2019t necessarily true of our fellow Seattleites, however. When the Scripting Guy who writes this column drove home Friday afternoon he didn\u2019t pass a single car with the windows open; instead, everyone had the windows tightly rolled-up and the air conditioner blasting away. (And yes, people in Seattle <i>do<\/i> buy air conditioners. And we turn them on any time the temperature gets above 50 degrees.)<\/p>\n<p>In other words, two or three times a year.<\/p>\n<p>In addition to that, a few minutes ago this same Scripting Guy heard two of his colleagues standing in the hallway discussing the weather. \u201cIt was just way too hot on Saturday,\u201d said the one Microsoft employee. \u201cYou\u2019re telling me,\u201d said the other. \u201cI hope that\u2019s it for hot weather this year.\u201d<\/p>\n<p>So do we. Because it\u2019s always best to get summer over with by May 20<sup>th<\/sup>, if at all possible.<\/p>\n<p>At any rate, today the weather is back to Seattle gray, and everyone is breathing a sigh of relief. Well, everyone except the Scripting Guys, that is. The Scripting Guys both spent most of the past weekend outdoors; in fact, the Scripting Guy who writes this column even managed to get a little color into his skin. (OK, true: he already <i>had<\/i> a little color in his skin, assuming you count Pasty Pale White as a color. The Scripting Editor <i>never<\/i> gets ay color into her skin, because she uses sun block. Which <i>does<\/i> seem a bit redundant when you live in Seattle, doesn\u2019t it?)<\/p>\n<p>Needless to say, though, neither of the Scripting Guys will be spending much time outside today. OK, granted, the Scripting Guy who writes this column might, if Microsoft finally reaches the breaking point and tosses him out on his rear end. But, barring that, he\u2019ll be inside all day today.<\/p>\n<p>Which means that he might as well try to write a script that can remove delimiters from a text file and, just for good measure, substitute apostrophes for incorrectly-place double quote marks as well. Before we do that, however, let\u2019s take a peek at the CSV file that MP sent along:<\/p>\n<pre class=\"codeSample\">\"1\",\"HOUSE\",\"JOE J\",\"SMITH\",\"01\/01\/2005\"\n\"2\",\"BUSINESS\",\"BAKERY\",\"SMITH2\",\"02\/01\/2005\"\n\"3\",\"BUSINESS\",\"O\"MALLEYS TIRE AND AUTO\",\"SMITH3\",\"03\/01\/2005\"\n\"4\",\"HOUSE\",\"JASON\",\"SMITH4\",\"04\/01\/2005\"\n<\/pre>\n<p>As you can see, this is actually a fairly standard little CSV file, with individual fields enclosed in double quote marks and separated by commas. Line 1, for example, contains the following field values:<\/p>\n<table border=\"0\" cellSpacing=\"0\" cellPadding=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;1&#8221;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;HOUSE&#8221;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;JOE J&#8221;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;SMITH&#8221;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;01\/01\/2005&#8221;<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>There is one bit of goofiness, however: this particular file (or, more correctly, the application that generates it) uses a double quote mark instead of an apostrophe. Take a peek at the third field in line 3:<\/p>\n<pre class=\"codeSample\">\"O\"MALLEYS TIRE AND AUTO\"\n<\/pre>\n<p>Needless to say, that\u2019s <i>supposed<\/i> to look like this:<\/p>\n<pre class=\"codeSample\">\"O'MALLEYS TIRE AND AUTO\"\n<\/pre>\n<p>In the end, that means there are two things we need to do. First, we need to get rid of the double quote marks; MP has no use for these delimiters. Second, we need to substitute an apostrophe (a single quote mark) as needed. That, by the way, is the sticking point. If all we had to do was get rid of the double quote marks that would be easy: we could simply replace each double quote mark with nothing. But we can\u2019t do that; if we did, then O\u2019Malleys Tire and Auto would look like this:<\/p>\n<pre class=\"codeSample\">OMALLEYS TIRE AND AUTO\n<\/pre>\n<p>That\u2019s not what we want, not what we want at all.<\/p>\n<p>So how are we going to tackle these two problems? To begin with, we might note that MP sent along a sample script that <i>almost<\/i> did the trick; that script gave him output that looked like this:<\/p>\n<pre class=\"codeSample\">\"1,HOUSE,JOE J,SMITH,01\/01\/2005\" \n\"2,BUSINESS,BAKERY,SMITH2,02\/01\/2005\" \n\"3,BUSINESS,O\"MALLEYS TIRE AND AUTO,SMITH3,03\/01\/2005\" \n\"4,HOUSE,JASON,SMITH4,04\/01\/2005\"\n<\/pre>\n<p>What MP did was search for each instance of a comma enclosed in double quote marks (<b>&#8220;,&#8221;<\/b>) and replace each instance with just a comma. Like we said, that <i>almost<\/i> worked; unfortunately, though, it left a double quote mark at the beginning and at the end of each line, and it did nothing to address the apostrophe issue. With any luck, however, the following script should take care of all of MP\u2019s problems:<\/p>\n<pre class=\"codeSample\">Const ForReading = 1\nConst ForWriting = 2\n\nSet objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n\nDo Until objFile.AtEndOfStream\n    strLine = objFile.ReadLine\n    intLength = Len(strLine)\n    strLine = Mid(strLine, 2, intLength - 2)\n    strText = Chr(34) &amp; \",\" &amp; Chr(34)\n    strLine = Replace(strLine, strText, \",\")\n    strLine = Replace(strLine, Chr(34), \"'\")\n    strNewText = strNewText &amp; strLine &amp; vbCrLf\nLoop\n\nobjFile.Close\n\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForWriting)\n\nobjFile.Write strNewText\nobjFile.Close\n<\/pre>\n<p>OK, <i>now<\/i> we\u2019re ready to launch into a discussion of the code and how it works. To begin with, we create a pair of constants, ForReading and ForWriting; we\u2019ll need these constants in order to open our CSV file in the correct mode (that is, either for reading or for writing). After defining the constants we create an instance of the <b>Scripting.FileSystemObject<\/b>, then use the <b>OpenTextFile<\/b> method to open the file C:\\Scripts\\Test.txt for reading:<\/p>\n<pre class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n<\/pre>\n<p>After the file is open we set up a Do Until loop that will enable us to read the file, line-by-line, until there aren\u2019t any lines left to read. (Or, to put a more technical spin on it, until the file\u2019s <b>AtEndOfStream<\/b> property is True.) Inside that loop the first thing we do is use the <b>ReadLine<\/b> method to read the first line in the file and store it in a variable named strLine:<\/p>\n<pre class=\"codeSample\">strLine = objFile.ReadLine\n<\/pre>\n<p>And yes, now the fun begins. We kick off the festivities by using the <b>Len<\/b> function to determine the number of characters in the variable strLine. Why do we do that? Well, as you recall, one of the problems MP ran into was this: his search-and-replace operation left a double quote mark at the beginning and at the end of each line. One easy way to get around that problem is to simply grab all the characters in each line <i>except<\/i> for the beginning and ending character. In the case of line 1, that gives us this following value:<\/p>\n<pre class=\"codeSample\">1\",\"HOUSE\",\"JOE J\",\"SMITH\",\"01\/01\/2005\n<\/pre>\n<p>What good does that do us? Actually, it does us quite a bit of good; with the two endpoints gone we <i>can<\/i> replace each instance of <b>&#8220;,&#8221;<\/b> with a comma. In turn, that makes line 1 equal to this:<\/p>\n<pre class=\"codeSample\">1,HOUSE,JOE J,SMITH,01\/01\/2005\n<\/pre>\n<p>Which, of course, is just exactly what we <i>want<\/i> it to be equal to.<\/p>\n<p>But more on that in just a moment. Instead, let\u2019s try doing first things first, and explain how we get to that point. In order to extract all the characters except the first and last characters from strLine we need to know the total number of characters in that string. Once we know that value we can use the <b>Mid<\/b> function to grab all the characters in strLine except the first and last characters:<\/p>\n<pre class=\"codeSample\">strLine = Mid(strLine, 2, intLength - 2)\n<\/pre>\n<p>There\u2019s really nothing fancy going on here: we\u2019re simply asking Mid to start at character position 2 and then grab <i>x<\/i> number of characters (with x being equal to the length of the string minus 2). By starting at character position 2 we skip over the first character (the beginning double quote mark) altogether. And by grabbing the next <i>x<\/i> characters (with x, again, being equal to the length of the string minus 2) that lets us grab everything else except for the very last character.<\/p>\n<p>For example, suppose strLine happened to be a string six characters long:<\/p>\n<pre class=\"codeSample\">\"Test\"\n<\/pre>\n<p>If we start at character 2 we\u2019ll start on the <i>T<\/i> in <i>Test<\/i>. And suppose we grab the total number of characters minus 2? Well, the total number of characters is 6, and 6 minus 2 is 4. If we start on the <i>T<\/i> and take a total of 4 characters that results in us grabbing the following value:<\/p>\n<pre class=\"codeSample\">Test\n<\/pre>\n<p>Pretty clever, if we do say so ourselves.<\/p>\n<table id=\"EMH\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td>\n<p class=\"lastInCell\"><b>Note<\/b>. And we <i>do<\/i> say so ourselves, because we assume that no one else will.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>Our next chore is to replace each instance of a comma enclosed in double quote marks with just a comma. In order to do that, we first run this line of code:<\/p>\n<pre class=\"codeSample\">strText = Chr(34) &amp; \",\" &amp; Chr(34)\n<\/pre>\n<p>All we\u2019re doing here is defining the text that we want to search for: a double quote mark (<b>Chr(34)<\/b>) followed by a comma followed by another double quote mark. After we\u2019ve defined the search text we can replace each instance of that text with a plain old comma by using this line of code:<\/p>\n<pre class=\"codeSample\">strLine = Replace(strLine, strText, \",\")\n<\/pre>\n<p>That\u2019s great, but it still leaves us with a problem. As-is, our script will correctly remove all the delimiters from lines 1 and 2; it will even remove all the <i>delimiters<\/i> from lines 3 and 4 as well. When we reach line 3, however, we run smack-dab into the apostrophe issue:<\/p>\n<pre class=\"codeSample\">O\"MALLEYS TIRE AND AUTO\n<\/pre>\n<p>Great. What do we do about <i>that<\/i>?<\/p>\n<p>Well, the one thing that we <i>shouldn\u2019t<\/i> do is panic. At this point in time we\u2019ve eliminated the double quote marks that appear at the beginning and at the end of each line; we\u2019ve also removed all the double quote marks that were used as field delimiters. If there\u2019s a double quote mark left in the string (and there is) then this is undoubtedly a double quote mark that\u2019s supposed to be an apostrophe. And if that\u2019s the case (and it is) we can substitute an apostrophe for the double quote mark by using this line of code:<\/p>\n<pre class=\"codeSample\">strLine = Replace(strLine, Chr(34), \"'\")\n<\/pre>\n<p>Once we\u2019ve done that we tack the new value of strLine plus a carriage return-linefeed character (<b>vbCrLf<\/b>) onto the end of a string variable named strNewText:<\/p>\n<pre class=\"codeSample\">strNewText = strNewText &amp; strLine &amp; vbCrLf\n<\/pre>\n<p>And then it\u2019s back to the top of the loop, where we repeat the process with the next line in the text file. After we\u2019ve processed each and every line in the file we close the CSV file, then immediately reopen it, this time for writing:<\/p>\n<pre class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForWriting)\n<\/pre>\n<p>From here we use the <b>Write<\/b> method to replace the existing contents of Test.txt with the value of strNewText. And once we\u2019ve done <i>that<\/i> we simply call the <b>Close<\/b> method to close the file for good.<\/p>\n<p>That should do it, MP; let us know if it doesn\u2019t. In the meantime, here\u2019s the current weather forecast for the Seattle area:<\/p>\n<p><img decoding=\"async\" border=\"0\" alt=\"Spacer\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/weather.jpg\" width=\"222\" height=\"97\" \/> <\/p>\n<p>Springtime in Seattle \u2026 it just doesn\u2019t get any better than this, does it?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have a comma-separated values file (CSV) that uses double quote marks as the field delimiter; unfortunately, this file also uses a double quote mark any time it should use an apostrophe (like in the name O\u2019Malleys). Somehow I need to remove all the delimiters and convert the appropriate double quote marks [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[169,3,4,5],"class_list":["post-55513","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-csv-and-other-delimited-files","tag-scripting-guy","tag-scripting-techniques","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I have a comma-separated values file (CSV) that uses double quote marks as the field delimiter; unfortunately, this file also uses a double quote mark any time it should use an apostrophe (like in the name O\u2019Malleys). Somehow I need to remove all the delimiters and convert the appropriate double quote marks [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55513","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=55513"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55513\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=55513"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=55513"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=55513"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}