{"id":55663,"date":"2008-05-01T01:21:00","date_gmt":"2008-05-01T01:21:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/05\/01\/hey-scripting-guy-how-can-i-sort-a-text-file-based-on-a-specific-column-in-that-file\/"},"modified":"2008-05-01T01:21:00","modified_gmt":"2008-05-01T01:21:00","slug":"hey-scripting-guy-how-can-i-sort-a-text-file-based-on-a-specific-column-in-that-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-sort-a-text-file-based-on-a-specific-column-in-that-file\/","title":{"rendered":"Hey, Scripting Guy! How Can I Sort a Text File Based on a Specific Column in That File?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"> \n<P>Hey, Scripting Guy! I have a text file composed of many different lines; each line has a number of fields, with the fields separated from one another by a tab character. I need to sort this text file based on the value of one of these fields. How can I do that?<BR>&#8212; IR<\/P><IMG height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\"><\/A> \n<P>Hey, IR. You know, that\u2019s an interesting question, if only because \u2013 oops, hold on a second, we just received an email with the Subject line <I>Message From the Treasury Department<\/I>. We better check this out; after all, that <I>could<\/I> mean that Scripting Guy Peter Costantini has started to write counterfeit VBScript scripts again.<\/P>\n<TABLE class=\"dataTable\" id=\"EBD\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD class=\"\">\n<P class=\"lastInCell\"><B>Note<\/B>. Actually, counterfeiting falls under the province of the United States Secret Service (which obviously isn\u2019t all-that-secret), which is merely a division of the Treasury Department. The Treasury Department itself exists primarily to act in the role of \u201cthe steward of U.S. economic and financial systems, and as an influential participant in the global economy.\u201d In addition, the Treasury Department works to \u201cpredict and prevent economic and financial crises.\u201d You know that <A href=\"http:\/\/en.wikipedia.org\/wiki\/Subprime_mortgage_crisis\" target=\"_blank\"><B>subprime mortgage<\/B><\/A> thing that\u2019s currently wreaking havoc on the US economy? Well, don\u2019t feel bad; apparently the Treasury Department didn\u2019t know about it, either.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>Anyway, please excuse us for a second while we deal with this important email. According to the message we received we need to log onto our account with the Treasury Department and verify that our personal information is correct; as it turns out, that\u2019s the only way that the Treasury Department can protect us against fraud. We better do that right away.<\/P>\n<P>And yes, we know: many of you are thinking, \u201cDon\u2019t do it, Scripting Guy who writes that column; this sounds like a scam to us.\u201d And we have to admit that, at first, we had the same thought ourselves. However, take a look at just one portion of the email:<\/P>\n<TABLE class=\"dataTable\" id=\"EUD\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD class=\"\">\n<P class=\"lastInCell\">With respect to the email automatically submitted to you from our online banking system in order to assure the security of our client, we have to inform you that the references received were not in compliance with our database system. Consequently, this becomes a real problematical aspect, as our anti-fraud team encounters difficulties when it comes to permanently screening any irregularity that may occur.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>See? Two sentences, neither of which makes a lick of sense. This <I>has<\/I> to be a message from a government agency!<\/P>\n<P>Or technical documentation from Microsoft.<\/P>\n<P>At any rate, we\u2019re trying to log onto the Treasury Department Web site right now. (We tried to call the toll-free number included in the email, but it turns out that they didn\u2019t provide us with a valid phone number. Can\u2019t even get their own phone number right? <I>Now<\/I> try telling us that this email didn\u2019t come from a government agency!) <\/P>\n<P>While we wait (the servers seem to be a bit slow his morning, probably because the Treasury Department is busy advising the President on economic and financial issues, encouraging sustainable economic growth, and fostering improved governance in financial institutions) we thought we\u2019d take a moment to answer IR\u2019s question. Might as well, eh?<\/P>\n<P>To begin with, IR has a tab-delimited text file; that is, a text file in which the individual fields are separated using tabs. Here\u2019s a version of the text file, as provided by IR himself:<\/P><PRE class=\"codeSample\">Wer     red     ring    cic     Mswf\nwin     der     jhg     ppp     Iwol\nswe     ffo     ccc     aqs     Ihte\ndde     uyt     jhg     der     Cabc\n<\/PRE>\n<P>What we\u2019re interested in is column 5, the next-to-last column in the file. (IR is actually interested in column 33 in his real text file, but we\u2019re simplifying things a bit for the purposes of this column.) What IR needs to do is sort the text file on the value found in column 5; in other words, he needs output that looks like this:<\/P><PRE class=\"codeSample\">dde     uyt     jhg     der     Cabc\nswe     ffo     ccc     aqs     Ihte\nwin     der     jhg     ppp     Iwol\nwer     red     ring    cic     Mswf\n<\/PRE>\n<P>How\u2019s he going to do that, and, by extension, how are <I>we<\/I> going to do that? Why, by running the following script, of course:<\/P><PRE class=\"codeSample\">Const adVarChar = 200\nConst MaxCharacters = 255\nConst adFldIsNullable = 32\nConst ForReading = 1<\/p>\n<p>Set DataList = CreateObject(&#8220;ADOR.Recordset&#8221;)\nDataList.Fields.Append &#8220;LineText&#8221;, adVarChar, MaxCharacters, adFldIsNullable\nDataList.Fields.Append &#8220;SortCharacter&#8221;, adVarChar, MaxCharacters, adFldIsNullable\nDataList.Open<\/p>\n<p>Set objFSO = CreateObject(&#8220;Scripting.FileSystemObject&#8221;)\nSet objFile = objFSO.OpenTextFile(&#8220;C:\\Scripts\\Test.txt&#8221;, ForReading)<\/p>\n<p>Do Until objFile.AtEndOfStream\n    strLine = objFile.ReadLine\n    arrFields = Split(strLine, vbTab)\n    strCharacter = arrFields(4)<\/p>\n<p>    DataList.AddNew\n    DataList(&#8220;LineText&#8221;) = strLine\n    DataList(&#8220;SortCharacter&#8221;) = strCharacter\n    DataList.Update\nLoop<\/p>\n<p>objFile.Close<\/p>\n<p>DataList.Sort = &#8220;SortCharacter, LineText&#8221;<\/p>\n<p>Do Until DataList.EOF\n    Wscript.Echo DataList.Fields.Item(&#8220;LineText&#8221;)\n    DataList.MoveNext\nLoop\n<\/PRE>\n<P>As most of you know by now, sorting a text file isn\u2019t exactly VBScript\u2019s strong point; that\u2019s especially true when it comes to sorting a file by the value found in the fifth column in a set of tab-separated values. Therefore, we decided to cheat a little: we aren\u2019t <I>going<\/I> to sort the text file, at least not directly. Instead, we\u2019re going to grab the contents of the file, stash that information in a disconnected recordset, and then sort the disconnected recordset. <\/P>\n<P>And yes, you <I>would<\/I> think that this is the sort of thing that the Treasury Department would take care of, wouldn\u2019t you? Somehow or another, though, this task seems to have fallen through the governmental cracks. Which means that, as usual, it\u2019s up to the Scripting Guys.<\/P>\n<TABLE class=\"dataTable\" id=\"E3E\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD class=\"\">\n<P class=\"lastInCell\"><B>Note<\/B>. Incidentally, if you aren\u2019t familiar with the term, a \u201cdisconnected recordset\u201d is simply a database table that exists only in memory; that is, the data is tied to an actual table in an actual database. We aren\u2019t going to discuss the whys and wherefores of disconnected recordsets in any detail today; if you need a little background information on the subject you might check out <A href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/guide\/sas_ent_piij.mspx\"><B>this section<\/B><\/A> of the <I>Microsoft Windows 2000 Scripting Guide<\/I>.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>As for our script, we begin by defining a bunch of constants. The first three constants are needed to add fields to our disconnected recordset; in particular:<\/P>\n<TABLE class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<TBODY>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>adVarChar<\/B> tells the script that we want these new fields to have the variant data type.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>MaxCharacters<\/B> tells the script that the field can contain a maximum of 255 characters.<\/P><\/TD><\/TR>\n<TR>\n<TD class=\"listBullet\" vAlign=\"top\">\u2022<\/TD>\n<TD class=\"listItem\">\n<P><B>adFldIsNullable<\/B> tells the script that it\u2019s OK to have Null values in the field. We won\u2019t actually <I>have<\/I> Null values in either of our fields, but this covers us in case we run into a line of text where column 5 is blank.<\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<P>And then we have one of our old favorites, the constant ForReading, which we\u2019ll use when we open the text file C:\\Scripts\\Test.txt. Good to see you, ForReading; glad you could make it.<\/P>\n<P>After we define all the constants we create an instance of the <B>ADOR.Recordset<\/B> object, the object that enables us to construct a disconnected recordset. We then use these two lines of code to add two fields (LineText and SortCharacter) to the recordset:<\/P><PRE class=\"codeSample\">DataList.Fields.Append &#8220;LineText&#8221;, adVarChar, MaxCharacters, adFldIsNullable\nDataList.Fields.Append &#8220;SortCharacter&#8221;, adVarChar, MaxCharacters, adFldIsNullable\n<\/PRE>\n<P>Once we\u2019ve done that we use the <B>Open<\/B> method to open the recordset and prepare for data entry:<\/P><PRE class=\"codeSample\">DataList.Open\n<\/PRE>\n<P>Next we create an instance of the <B>Scripting.FileSystemObject<\/B> object, then use the <B>OpenTextFile<\/B> method to open the file Test.txt for reading:<\/P><PRE class=\"codeSample\">Set objFile = objFSO.OpenTextFile(&#8220;C:\\Scripts\\Test.txt&#8221;, ForReading)\n<\/PRE>\n<P>And then we\u2019re done. See you all tomorrow!<\/P>\n<P>Oh, wait; apparently we <I>aren\u2019t<\/I> done, at least not yet. With that in mind, we next set up a Do Until loop designed to run until we\u2019ve read all the lines in the text file (that is, until the file\u2019s <B>AtEndOfStream<\/B> property is True):<\/P><PRE class=\"codeSample\">Do Until objFile.AtEndOfStream\n<\/PRE>\n<P>What do we do inside this loop? Well, for starters, we use the <B>ReadLine<\/B> method to read the first line in the text file and store that information in a variable named strLine:<\/P><PRE class=\"codeSample\">strLine = objFile.ReadLine\n<\/PRE>\n<P>At that point we execute these two lines of code:<\/P><PRE class=\"codeSample\">arrFields = Split(strLine, vbTab)\nstrCharacter = arrFields(4)\n<\/PRE>\n<P>What are we doing here? Well, as you might recall, in addition to the production of coin and currency, the disbursement of payments to the American public, revenue collection, and the borrowing of funds necessary to run the federal government, our script also needs to sort the lines in the text file based on the value of column 5. That\u2019s fine, except for this: how in the world are we supposed to know what the value of column 5 is? <\/P>\n<P>Well, one easy way to get at that information is to use VBScript\u2019s <B>Split<\/B> method to split the line of text into an array (splitting, of course, on the tab character, which is represented by the VBScript constant <B>vbTab<\/B>). When we do that we\u2019ll get an array that looks like this:<\/P><PRE class=\"codeSample\">wer     \nred     \nring     \ncic     \nMswf\n<\/PRE>\n<P>As you can see, the last item in the array, item 4 (remember, the first item in an array is always item 0), just happens to be column 5. <I>That\u2019s<\/I> how we can determine the value of column 5, a value we stash in a variable named strCharacter.<\/P>\n<P>We now have the complete text of the line stored in the variable strLine, and the value of column 5 stored in the variable strCharacter. And because we <I>do<\/I> have both those pieces of information we can use this block of code to add the text and the value of column 5 to our database:<\/P><PRE class=\"codeSample\">DataList.AddNew\nDataList(&#8220;LineText&#8221;) = strLine\nDataList(&#8220;SortCharacter&#8221;) = strCharacter\nDataList.Update\n<\/PRE>\n<P>As you can see, we store the complete text of the line in the LineText field and the value of column 5 in the SortCharacter field. By storing these two values in separate fields it\u2019s going to be very easy to sort the lines in the text file by the value in column 5.<\/P>\n<P>As you\u2019ll find out in just a second.<\/P>\n<P>After we add the information from line 1 to the disconnected recordset we zip back to the top of the loop and repeat the process with the next line in the text file. Once we\u2019ve processed each line in the file we close Test.txt, then use the <B>Sort<\/B> method to sort these lines of text, first by column 5 (SortCharacter) and then by the actual line of text itself:<\/P><PRE class=\"codeSample\">DataList.Sort = &#8220;SortCharacter, LineText&#8221;\n<\/PRE>\n<P>After that we set up a Do Until loop designed to loop through all the records in the recordset (that is, until the recordset\u2019s <B>EOF<\/B> \u2013 end-of-file \u2013 property is True):<\/P><PRE class=\"codeSample\">Do Until DataList.EOF\n<\/PRE>\n<P>Inside that loop we simply echo back the value of the LineText property, then use the <B>MoveNext<\/B> method to move on to the next record in the recordset:<\/P><PRE class=\"codeSample\">Wscript.Echo DataList.Fields.Item(&#8220;LineText&#8221;)\nDataList.MoveNext\n<\/PRE>\n<P>And what will that give us? That will give us output that looks like this:<\/P><PRE class=\"codeSample\">dde     uyt     jhg     der     Cabc\nswe     ffo     ccc     aqs     Ihte\nwin     der     jhg     ppp     Iwol\nwer     red     ring    cic     Mswf\n<\/PRE>\n<P>In case you\u2019re wondering, that\u2019s the text from Test.txt, sorted by the value in column 5. Mission accomplished. And now we <I>are<\/I> done.<\/P>\n<P>Well, we\u2019re done with IR\u2019s problem, that is. As for the problem with the Scripting Guy\u2019s account at the Treasury Department, well, we\u2019re still working on that (must be a lot of people trying to log onto the server and make sure that they are in compliance with the database). In the meantime, the Scripting Guy who writes this column <I>did<\/I> stumble upon an <A href=\"http:\/\/www.treasurydirect.gov\/NP\/NPGateway\" target=\"_blank\"><B>interesting Web site<\/B><\/A> that lists the current national debt for the USA. In case you\u2019re wondering, as of this writing the US owes a total of $ 9,334,965,822,890.37. <\/P>\n<P>But don\u2019t worry; turns out that you can write a check (made out to the Bureau of Public Debt) and help pay down that balance of $9 trillion. Just send your check here:<\/P>\n<P>Attn Dept G<BR><BR>Bureau Of the Public Debt<BR><BR>P. O. Box 2188<BR><BR>Parkersburg, WV 26106-2188<\/P>\n<P>And that\u2019s a good question: wouldn\u2019t the Scripting Guys be doing everyone a public service by just going ahead and paying off the national debt? Yes, we would, and that\u2019s something we\u2019ll seriously consider. Granted, what with the Scripting Son starting college this fall it\u2019s not going to be easy to come up with an additional $9 trillion. But we\u2019ll see what we can do.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have a text file composed of many different lines; each line has a number of fields, with the fields separated from one another by a tab character. I need to sort this text file based on the value of one of these fields. How can I do that?&#8212; IR Hey, IR. [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,14,5],"class_list":["post-55663","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I have a text file composed of many different lines; each line has a number of fields, with the fields separated from one another by a tab character. I need to sort this text file based on the value of one of these fields. How can I do that?&#8212; IR Hey, IR. [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55663","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=55663"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55663\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=55663"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=55663"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=55663"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}