Hey, Scripting Guy! How Can I Sort the Contents of a Text File in Numerical Order?
Hey, Scripting Guy! I have a text file in which each line begins with a number. How can I sort the contents of that file by those numbers?
Hey, KT. Tell you what: we’ll go ahead and answer your question. However, in return, you have to promise to forget everything we tell you. (Which is usually not a problem for most people.) What do you say: do we have a deal?
Admittedly, that’s kind of an odd bargain we’re trying to strike: we agree to answer someone’s question on the condition that they immediately forget the answer. Why would we try to make a deal like that? Well, to tell you the truth, our first thought upon reading this question was, “Wow. That would make a good Scripting Games event. In fact, we should use this in the 2009 Games.” The only problem, of course, is that we can’t provide the answer today, then try and use this question as a Scripting Games event tomorrow. Talk about a quandary. On the one hand, we need everyone to read today’s column; after all, if nobody reads the column then there’s not much use for the Scripting Guy who writes this column, is there?
Note. Yes, there are plenty of people – including his fellow Scripting Guys – who wonder if there’s ever any need for the Scripting Guy who writes this column.
On the other hand, we need you to forget both the question and the answer; that way we can surprise everyone by using this query as an event in the 2009 Scripting Games. (The events have already been finalized for the 2008 Games, February 15th through March 3rd right here in the Script Center.)
To make a long story short, that’s why we need everyone to – you know what? Never mind; go ahead and remember both the question and the answer if you want to. After all, the odds are pretty good that even if you upheld your end of the bargain the Scripting Guys would forget all about it and wouldn’t use it as an event in the 2009 Scripting Games anyway.
Sadly, forgetfulness is a part of growing old. For some reason, that’s something we can never remember to tell the other Scripting Guys.
Fortunately, though, we did remember to wear pants this morning (we’d rather not talk about that) and to write a script that can sort the lines in a text file based on a number that appears at the beginning of each line. As a matter of fact, we remembered to write this script:
Const ForReading = 1 Const ForWriting =2 Const adVarChar = 200 Const MaxCharacters = 255 Const adFldIsNullable = 32 Const adInteger = 3 Set DataList = CreateObject("ADOR.Recordset") DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable DataList.Open Set objFSO = CreateObject("Scripting.FileSystemObject") Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading) Do Until objFile.AtEndOfStream strLine = objFile.ReadLine intLength = Len(strLine) strValue = "" For i = 1 to intLength If IsNumeric(Mid(strLine, i, 1)) Then strValue = strValue & Mid(strLine, i, 1) Else Exit For End If Next DataList.AddNew DataList("TextLine") = strLine DataList("NumberLine") = strValue DataList.Update Loop objFile.Close Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting) DataList.Sort = "NumberLine ASC" DataList.MoveFirst Do Until DataList.EOF objFile.WriteLine DataList.Fields.Item("TextLine") DataList.MoveNext Loop objFile.Close
Before we go much further we should note that this is actually a slightly-more complicated script than KT originally asked for. Why did we give him more than he asked for? As most of
you know, sorting data in VBScript is never much fun; that’s because VBScript has no built-in mechanism for sorting data. On top of that, even when you can sort data you run into the
problem of whether VBScript sees your data as being composed of numbers or letters. For example, suppose we have the following values:2 BBB 1 AAA 10 DDD 3 CCC
If VBScript sees these values as being numbers, then it will sort the values like this:1 AAA 2 BBB 3 CCC 10 DDD
However, if VBScript thinks it’s dealing with string values, then we’re going to get back a list like this one:1 AAA 10 DDD 2 BBB 3 CCC
In KT’s case this isn’t a problem; that’s because the numbers in his text file only go up to 9. As he noted, he needs to sort the lines in the file “based on
the value of the first character in the line, which is always a number.” If you’re only sorting on the first character, well, there aren’t many single-digit numbers
greater than 9.
Well, OK, math fans: there aren’t many base-10 single-digit numbers greater than 9.
We have a feeling, however, that many of you have text files that include line numbers greater than 9; in that case, sorting on just the first digit of a number
like 147 doesn’t do you a lot of good. Therefore, we came up with a script that’s a tiny bit more complicated, but way more flexible: it should work in any
situation and with any numbers.
But don’t worry; if do you want a script that only works with values up to 9, well, we’ve tacked that on to the end of today’s column.
As for the fancier and slightly-more complicated script, we start out by defining a bunch of constants. The first two – ForReading and ForWriting – are needed
when we read from and write to the text file. The other four fill the following roles:
adVarChar. This constant enables us to create a recordset field that uses the Variant data type. This field will hold the actual text of each line in the text
MaxCharacters. This constant enables us to put as many as 255 characters in that Variant field.
adFldIsNullable. This constant allows us to have Null values in a field. We don’t really need it for today’s script, but it’s a handy thing to know about. (And
putting it in doesn’t hurt anything.)
adInteger. This constant lets us create a recordset field that uses the Integer data type. We’ll use this field to keep track of the numbers that appear at
the start of each line.
Our next task is to create a “disconnected recordset,” which is – for all intents and purposes – a database table that exists only in memory (that is, it’s not tied
to a physical database). We aren’t going to discuss disconnected recordsets in any detail today; if you’d like more information, you might take a look at
this section of the Microsoft Windows 2000 Scripting Guide. For now, all we’ll do is point out that the following chunk of code creates a new disconnected
recordset, adds a variant field named TextLine and an integer field named NumberLine, and then opens the recordset for use:
Set DataList = CreateObject("ADOR.Recordset") DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable DataList.Fields.Append "NumberLine", adInteger, , adFldIsNullable DataList.Open
As soon as we have our recordset we can go ahead open the text file C:\Scripts\Test.txt for reading; that’s what these two lines of code are for:
Set objFSO = CreateObject("Scripting.FileSystemObject") Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)
Once the file is open, we then set up a Do loop designed to run until we’ve read every line in the file (in other words, until the file’s AtEndOfStream property
Do Until objFile.AtEndOfStream
As you might expect, all the excitement (well, most of it, anyway) takes place inside this Do loop. To begin with, we use the ReadLine method to read the first
line in the text file, storing that value in a variable named strLine. We then use VBScript’s Len function to determine the total number of characters in that
intLength = Len(strLine)
Why do we need to know that? Well, we know that the first x characters in the line are numbers; however, we have no idea what x is equal to. Therefore,
we’re going to examine each line character-by-character, not stopping until we encounter a non-numeric value. For example, suppose the first line in our text
file looks like this:
124This is a line in the text file.
We’ll grab the 1, the 2, and the 4, and won’t stop grabbing characters until we reach the T, the first non-numeric value.
So how do we do all that? Well, for starters, we assign an empty string to a variable named strValue:
strValue = ""
We then execute the following For Next loop:
For i = 1 to intLength If IsNumeric(Mid(strLine, i, 1)) Then strValue = strValue & Mid(strLine, i, 1) Else Exit For End If Next
What we’re doing here is looping from 1 to the total number of characters in the line (intLength). Inside this loop, we use the IsNumeric function to determine
whether or not the first character in the line is a number. If it is, we append that character to the variable strValue:
strValue = strValue & Mid(strLine, i, 1)
In other words, the first time through the loop, strValue will be equal to 1. We then repeat the process with the next character in the line. If this, too is a
number (and it is) we append that value to the variable strValue. That makes strValue equal to this: 12. This continues until we find a non-numeric character.
As soon as we encounter a non-numeric character we call the Exit For statement and exit the For Next loop.
So what did that do for us? That enabled us to extract the number that appears at the beginning of each line. Now that we know that number we can use this
block of code to add a new record to our recordset, a record consisting of the entire text of the line (TextLine) and the number that appears at the beginning
of the line:
DataList.AddNew DataList("TextLine") = strLine DataList("NumberLine") = strValue DataList.Update
In other words, our recordset now contains the following record:
124This is a line in the text file.
And then it’s on to the next line in the text file.
After we’ve read all the lines from the text file (and added each line to our recordset) we close the file and then immediately reopen it, this time for writing. All
it takes now is one little line of code to sort the recordset:
DataList.Sort = "NumberLine ASC"
As you can see, we’re sorting by the field NumberLine; that’s the field that contains the number that appears at the beginning of each line. We’ve also chosen
to sort the data in ascending (1 to 100) order; that’s what the ASC is for. Would you rather sort the recordset in descending order (100 to 1)? Then use this
line of code instead:
DataList.Sort = "NumberLine DESC"
All we have to do now is replace the existing contents of our text file with these new, sorted contents. To do that, we use the MoveFirst method to move to
the first record in our recordset. We then use this block of code to loop through all the items in that recordset, using the WriteLine method to add the values
of the TextLine field to the file:
Do Until DataList.EOF objFile.WriteLine DataList.Fields.Item("TextLine") DataList.MoveNext Loop
Will that really work? Well, suppose we have the following text file:14tttt 2aaaa 8ffff 6vvvv 4cccc 12xxxx 11zzzz 9qqqq
Here’s what the same file looks like after we run our script:2aaaa 4cccc 6vvvv 8ffff 9qqqq 11zzzz 12xxxx 14tttt
Well, what do you know: it did work!
That should take care of your problem, KT. In a way, though, that’s a shame: like we said, that would have been a pretty good event in the 2009 Scripting
Games. After all, it’s not impossible to solve, but it’s also not immediately obvious what the solution might be. That, of course, is what the Scripting Games are all about.
Speaking of which, have we mentioned that the 2008 Winter Scripting Games begin February 15th? You don’t want to miss it; after all, we expect this
year’s Games to be the biggest and the best competition ever.
OK, admittedly, we’ve only had two prior editions of the Scripting Games, and the first one was tossed together in about a week. Nevertheless, this year’s
Games – which includes a Perl division as well as the VBScript and Windows PowerShell divisions – should be a lot of fun: the events are a bit more challenging
(but still doable); we have some “bonus” activities planned for those of you who run through all 10 events in a single day (unlike the Scripting Guys, who
typically take 10 days to run through a single event); and we have a lot of cool prizes to give away. And remember, we have a Beginners division as well as
an Advanced division. That means there’s a little something for everyone. Even you.
OK, OK, maybe not him, and maybe not her. But there’s definitely something for you.
In other words, we’d like as many people as possible to participate in the Scripting Games. Mark your calendar now. Or, even better, wait till tomorrow; in
tomorrow’s column, we’ll give you a script that can mark your calendar for you.
Hmmm …. You know, that might be a good Scripting Games event, too ….
At any rate, that should do it for today – oh, right. As promised, here’s a simpler version of the script, one that works only with values from 0 to 9:Const ForReading = 1 Const ForWriting =2 Const adVarChar = 200 Const MaxCharacters = 255 Const adFldIsNullable = 32 Set DataList = CreateObject("ADOR.Recordset") DataList.Fields.Append "TextLine", adVarChar, MaxCharacters, adFldIsNullable DataList.Open Set objFSO = CreateObject("Scripting.FileSystemObject") Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading) Do Until objFile.AtEndOfStream strLine = objFile.ReadLine DataList.AddNew DataList("TextLine") = strLine DataList.Update Loop objFile.Close Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForWriting) DataList.Sort = "TextLine ASC" DataList.MoveFirst Do Until DataList.EOF objFile.WriteLine DataList.Fields.Item("TextLine") DataList.MoveNext Loop objFile.Close
And that should do it for today. See you all tomorrow.