{"id":55673,"date":"2008-04-30T01:21:00","date_gmt":"2008-04-30T01:21:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/04\/30\/hey-scripting-guy-how-can-i-read-a-text-file-and-extract-all-the-text-enclosed-in-double-quote-marks\/"},"modified":"2008-04-30T01:21:00","modified_gmt":"2008-04-30T01:21:00","slug":"hey-scripting-guy-how-can-i-read-a-text-file-and-extract-all-the-text-enclosed-in-double-quote-marks","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-read-a-text-file-and-extract-all-the-text-enclosed-in-double-quote-marks\/","title":{"rendered":"Hey, Scripting Guy! How Can I Read a Text File and Extract All the Text Enclosed in Double Quote Marks?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/> <\/p>\n<p>Hey, Scripting Guy! I\u2019m trying to write a script that can extract all the values found between a set of double quote marks. I know how to read the text file, and I know how to output any information that I find. However, I can\u2019t figure out how to extract all the characters between a set of double quote marks. Can you help?<br \/>&#8212; TM<\/p>\n<p><img decoding=\"async\" height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\" \/><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! 
Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/><a href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><img decoding=\"async\" class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\" \/><\/a> <\/p>\n<p>Hey, TM. Before we get started today we need to update everyone on the status of the Dr. Scripto bobblehead dolls that were given out as part of the <a href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/funzone\/games\/games08.mspx\"><b>2008 Winter Scripting Games<\/b><\/a>: unfortunately, there <i>is<\/i> no status to update. Without making any excuses (well, other than the excuses we\u2019re about to make) this past month has been an extremely \u2026 interesting \u2026 one for the Scripting Guys. That\u2019s due, in part, to: 1) carry-over from the Scripting Games; 2) a never-ending series of meetings regarding potential changes to TechNet, changes that could dramatically affect the Script Center and how we do our work; and, 3) the need to start putting together an instructor-led lab for <a href=\"http:\/\/www.microsoft.com\/events\/teched2008\/itpro\/default.mspx\" target=\"_blank\"><b>TechEd 2008<\/b><\/a>. <\/p>\n<table class=\"dataTable\" id=\"EKD\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p class=\"lastInCell\"><b>Note<\/b>. So how are we doing on that lab, which has to be completed in the next 10 days or so? Well, so far we have a single slide labeled <i>Agenda<\/i>. All we have to do now is actually come up with an agenda, and we\u2019ll be able to finish off that first slide in no time. 
So at least things are looking up when it comes to TechEd 2008.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>Anyway, because of all that the Scripting Guy who writes this column agreed to something that he typically would <i>not<\/i> agree to: a few weeks ago he agreed to let someone else mail out the bobbleheads for us.<\/p>\n<p>Ah, good question: why <i>doesn\u2019t<\/i> the Scripting Guy who writes this column typically agree to let people help out? (After all, if anyone could use some help it\u2019s him.) Let\u2019s put it this way: in the past three weeks, how many bobbleheads do you suppose have been sent out? That\u2019s right: zero. And how many shipping boxes have been ordered and received, boxes that are needed before the bobbleheads can be packed up and shipped out? Right again: zero. And how many \u2013 well, you get the idea. We\u2019ve sent out hundreds of Certificates of Excellence, we\u2019ve sent out 50 copies of Windows Vista, we\u2019ve sent out T-shirts and books and assorted software. But how many bobbleheads have been sent out so far? Like we said: zero. <\/p>\n<p>Not to mention zilch, nada, and zip. As well as nil, naught, aught, and a big goose egg. And \u2013 well, you probably get the idea here, too.<\/p>\n<p>Just to make things even <i>more<\/i> interesting, the Scripting Guy who writes this column also recently discovered that a <i>ton<\/i> of the email he sent out over the past few weeks never got delivered, most likely because his email account was moved to three different servers during that period. When it rains, it pours.<\/p>\n<p>Speaking of which, it\u2019s also pouring down rain right now. And yes, as a matter of fact the Scripting Son <i>does<\/i> have a baseball game this afternoon. This is what life is like when you\u2019re a Scripting Guy.<\/p>\n<p>Anyway, we apologize for the delay, and we are going to try to take care of this as quickly as we can. 
(Which means we\u2019re going to send out the bobbleheads ourselves, just like we should have done in the first place.) It\u2019s still going to be a week or more before the first bobbleheads go out; after all, we don\u2019t even have any boxes to pack the things in yet. But we\u2019ll start getting bobbleheads shipped out as quickly as we can. Promise.<\/p>\n<p>Fortunately, there <i>is<\/i> one thing that the Scripting Guys would never outsource, something which we always take care to do ourselves: eat lunch. Oh, wait; there\u2019s a second thing, too: we <i>always<\/i> write our own scripts that can extract all the text found between double quote marks in a file. Let\u2019s explain that scenario in a little more detail, then show you how we attacked the problem.<\/p>\n<p>And yes, we really <i>will<\/i> show you how we attacked this problem. And with any luck we\u2019ll show you today, not three or four weeks from today.<\/p>\n<p>TM has a text file that looks something like this (we simplified his file a little to make sure it would fit across the page without needing any line breaks):<\/p>\n<pre class=\"codeSample\">192.168.112.88 \"CN=Ken Myer,CN=Users,DC=fabrikam,DC=com\" 141 \"15\/Apr\/2008\" v5 connect 13552\n192.168.112.89 \"CN=Pilar Ackerman,CN=Users,DC=fabrikam,DC=com\" 142 \"16\/Apr\/2008\" v5 connect 13631\n192.168.112.90 \"CN=Jonathan Haas,CN=Users,DC=fabrikam,DC=com\" 143 \"17\/Apr\/2008\" v5 connect 13987\n<\/pre>\n<p>What TM needs to do is search through this file and find all the text that\u2019s contained between double quote marks. 
In the first line of the file, that\u2019s going to be the following two pieces of information:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>CN=Ken Myer,CN=Users,DC=fabrikam,DC=com<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>15\/Apr\/2008<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>That\u2019s nice, but how do we actually <i>get<\/i> this information? To tell you the truth, our first thought was to use a regular expression; unfortunately, though, this problem calls for a fairly tricky regular expression. Why? Well, if you look closely at line 1 you\u2019ll see that \u2013 as far as the regular expression is concerned \u2013 we\u2019re likely to have <i>four<\/i> items that are enclosed in double quote marks:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>\u201cCN=Ken Myer,CN=Users,DC=fabrikam,DC=com\u201d<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>\u201c15\/Apr\/2008\u201d<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8220;CN=Ken Myer,CN=Users,DC=fabrikam,DC=com&#8221; 141 &#8220;15\/Apr\/2008&#8221;<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>&#8221; 141 &#8220;<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And yes, <i>we<\/i> know that\u2019s not the way it\u2019s supposed to work, but the computer doesn\u2019t know that, at least not without us giving the machine a hand by writing a pretty complicated little regular expression. (How complicated? Like we said, pretty complicated. 
See <a href=\"http:\/\/haacked.com\/archive\/2004\/10\/25\/usingregularexpressionstomatchhtml.aspx\" target=\"_blank\"><b>this Web page<\/b><\/a> for a sample regular expression that retrieves the text found between HTML tags, a challenge very similar to our task, which requires us to retrieve the text found between double quote marks.)<\/p>\n<p>In other words, our first thought didn\u2019t pan out. And that was a problem; after all, what are the odds that the Scripting Guys would have a <i>second<\/i> thought? But hey, stranger things have happened, right? (Not that we can think of any, mind you, but we\u2019re sure they\u2019ve happened.) Here\u2019s the Scripting Guys\u2019 Plan B, an approach that is much simpler than the regular expression we\u2019d have to write (and, as a bonus, actually does what it\u2019s supposed to do):<\/p>\n<pre class=\"codeSample\">Const ForReading = 1\n\nSet objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n\nDo Until objFile.AtEndOfStream\n    strText = \"\"\n    strCharacter = objFile.Read(1)\n    If strCharacter = Chr(34) Then\n        Do Until objFile.AtEndOfStream\n           strNewCharacter = objFile.Read(1)\n           If strNewCharacter = Chr(34) Then\n               Exit Do\n           End If\n           strText = strText &amp; strNewCharacter\n        Loop\n        If strText &lt;&gt; \"\" Then\n            Wscript.Echo strText\n        End If\n    End If\nLoop\n\nobjFile.Close\n<\/pre>\n<p>As you can see, we kick things off by defining a constant named ForReading and setting the value to 1; we\u2019ll need to use this constant when we open our text file for reading. 
After defining the constant we create an instance of the <b>Scripting.FileSystemObject<\/b>, then use the following line of code to open the file C:\\Scripts\\Test.txt for \u2013 that\u2019s right, for reading:<\/p>\n<pre class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n<\/pre>\n<p>Now the fun begins. What we\u2019re going to do is parse the text file character-by-character. (Why are we going to parse the text file character-by-character? That should become clear in just a moment.) In order to parse the file we set up a Do Until loop that runs until we\u2019ve read every last character in the file. (Or, if you\u2019re a stickler for technical accuracy, until the file\u2019s <b>AtEndOfStream<\/b> property is True.) Inside this loop, we set the value of a variable named strText to an empty string (\u201c\u201d), then use the <b>Read<\/b> method to read a single character from the text file, storing that value in a variable named strCharacter:<\/p>\n<pre class=\"codeSample\">strCharacter = objFile.Read(1)\n<\/pre>\n<p>Our next step is to determine whether or not this character is a double quote mark; that\u2019s something we can do by checking to see if the character has an ASCII value equal to 34, the value assigned to the double quote mark:<\/p>\n<pre class=\"codeSample\">If strCharacter = Chr(34) Then\n<\/pre>\n<p>Suppose the character <i>isn\u2019t<\/i> a double quote mark. That\u2019s no big deal; in that case we simply go back to the top of the loop and use the Read method to read the next character in the text file. But suppose that character <i>is<\/i> a double quote mark; what then? We\u2019re glad you asked that question.<\/p>\n<p>If it turns out that we do have a double quote mark that\u2019s the \u201csignal\u201d that we need to start grabbing text; after all, our job is to grab all the text found inside a pair of double quote marks, and we just found the first of the two quote marks that make up a pair. 
With that in mind, the next thing we do is set up a second Do Until loop, this one also designed to run until we reach the end of the file.<\/p>\n<p>Of course you might be thinking, \u201cBut if it runs all the way to the end of the file won\u2019t that mess up our script?\u201d And you\u2019re right: if that happened it <i>would<\/i> mess up our script. But don\u2019t panic; we\u2019ll make sure that won\u2019t happen.<\/p>\n<p>Promise. <\/p>\n<p>Inside this second loop we use the Read method to read the next character in the text file and store it in a variable named strNewCharacter:<\/p>\n<pre class=\"codeSample\">strNewCharacter = objFile.Read(1)\n<\/pre>\n<p>No sooner do we grab hold of that character than we check to see if that character happens to be a double quote mark:<\/p>\n<pre class=\"codeSample\">If strNewCharacter = Chr(34) Then\n<\/pre>\n<p>What if this character <i>is<\/i> a double quote mark? In that case, we\u2019ve found the second half of our pair and we need to exit the inner loop, something we do by calling the <b>Exit Do<\/b> statement:<\/p>\n<pre class=\"codeSample\">Exit Do\n<\/pre>\n<p>But what happens if the character <i>isn\u2019t<\/i> a double quote mark? Well, that means that this is a character we want to hang onto; after all, it\u2019s a piece of text that\u2019s nestled between two double quote marks. With that in mind, we tack the character onto the end of the variable strText:<\/p>\n<pre class=\"codeSample\">strText = strText &amp; strNewCharacter\n<\/pre>\n<p>And then we go back to the top of the inner loop and repeat the process with the next character in the text file.<\/p>\n<p>What does all that mean? Well, when we start reading Test.txt the first character we encounter is a 1; consequently, we ignore this character and try again with the next character in the text file: 9. Because this second character isn\u2019t a double quote mark we skip it as well, and then we try, try again. 
This process continues until, at long last, we hit a double quote mark.<\/p>\n<p>As soon as we hit that double quote mark we drop into our second Do Until loop. In that loop we read in the next character in the text file: C. Is C a double quote mark? Not as far as we know. Therefore, we add this character to the variable strText. We then loop around and read the next character in the text file: N. This character also gets tacked onto the end of strText. Eventually, strText will be equal to this:<\/p>\n<pre class=\"codeSample\">CN=Ken Myer,CN=Users,DC=fabrikam,DC=com\n<\/pre>\n<p>As it turns out, the next character following the <i>m<\/i> in com <i>is<\/i> a double quote mark. Because of that we don\u2019t append that character to strText; instead, we drop out of the inner loop and echo back the value of strText (assuming that strText actually <i>has<\/i> a value, that is):<\/p>\n<pre class=\"codeSample\">If strText &lt;&gt; \"\" Then\n    Wscript.Echo strText\nEnd If\n<\/pre>\n<p>And what happens after that? You got it: we go back to the top of the original loop, reset the value of strText to an empty string, then begin searching for the next pair of double quotes. By the time the script is finished we should see the following information echoed back to the screen:<\/p>\n<pre class=\"codeSample\">CN=Ken Myer,CN=Users,DC=fabrikam,DC=com\n15\/Apr\/2008\nCN=Pilar Ackerman,CN=Users,DC=fabrikam,DC=com\n16\/Apr\/2008\nCN=Jonathan Haas,CN=Users,DC=fabrikam,DC=com\n17\/Apr\/2008\n<\/pre>\n<p>And guess what? That just happens to be all the information that was contained between the double quote marks. Success!<\/p>\n<p>That should do it, TM; let us know if you have any additional questions about this. Again, we apologize to everyone for the delay in getting the bobbleheads sent out; we kind of dropped the ball on this one, but we\u2019ll get things squared away as quickly as we can. On the bright side \u2013 well, never mind. 
After all, the Scripting Guys are technical writers at Microsoft; it\u2019s been so long since we\u2019ve seen the bright side we probably wouldn\u2019t recognize it anymore anyway.<\/p>\n<p>And did we mention the fact that it\u2019s pouring down rain again?<\/p>\n<p>See you all tomorrow.<\/p>\n<table class=\"dataTable\" id=\"EUAAC\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p><b>Editor\u2019s Note.<\/b> Update: The boxes have arrived! Does that mean the bobbleheads are on the way? Well, not just yet. We still need to get everything over to the people who promised to ship them for us. That\u2019s assuming the Scripting Editor can keep the Scripting Guy who writes this column from stopping all his work (you know, little things like writing this column every day and creating a lab for TechEd), packing each box himself, and hand-delivering them to each of the 250 winners (since he doesn\u2019t actually trust anyone involved in the shipping process to help either).<\/p>\n<p>Oh, and the sun is out now.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I\u2019m trying to write a script that can extract all the values found between a set of double quote marks. I know how to read the text file, and I know how to output any information that I find. However, I can\u2019t figure out how to extract all the characters between a [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,14,5],"class_list":["post-55673","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! 
I\u2019m trying to write a script that can extract all the values found between a set of double quote marks. I know how to read the text file, and I know how to output any information that I find. However, I can\u2019t figure out how to extract all the characters between a [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55673","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=55673"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55673\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=55673"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=55673"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=55673"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}