June 7th, 2005

How Can I Tally Items in a Text File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! How can I tally items in a text file?

— JA

SpacerHey, Scripting Guy! AnswerScript Center

Hey, JA. If we understand you correctly you have a text file that consists of a number of lines, each line representing a particular item. Just to keep things simple, let’s say your text file consists of city names:

City
Seattle
Seattle
Los Angeles
Chicago
Seattle
Los Angeles
Pittsburgh

What you’re looking for is a script that can read through the file and report the number of occurrences for each item; in other words, a script that can return a report similar to this:

Pittsburgh 1
Chicago 1
Los Angeles 2
Seattle 3

There are several different ways you can do this, but we decided to show you a very powerful and flexible method for dealing with text files as databases (which is essentially what you want to do). It’s not well-known, but you can actually use ADO (ActiveX Data Objects) to run database queries against a text file. And that’s exactly what we do with this script:

On Error Resume Next

Const adOpenStatic = 3 Const adLockOptimistic = 3 Const adCmdText = &H0001

Set objConnection = CreateObject(“ADODB.Connection”) Set objRecordset = CreateObject(“ADODB.Recordset”)

strPathtoTextFile = “C:\Scripts”

objConnection.Open “Provider=Microsoft.Jet.OLEDB.4.0;” & _ “Data Source=” & strPathtoTextFile & “;” & _ “Extended Properties=””text;HDR=Yes;FMT=Delimited”””

strFile = “City_Names.txt”

objRecordset.Open “Select City, Count(City) AS CountOfCity FROM ” & strFile & _ ” GROUP BY City”, objConnection, adOpenStatic, adLockOptimistic, adCmdText

Do Until objRecordset.EOF Wscript.Echo objRecordset.Fields.Item(“City”), objRecordset.Fields.Item(“CountOfCity”) objRecordset.MoveNext Loop

As you might expect, most of this script is devoted to creating ADO Connection and Recordset objects and then connecting to the text file as though it was a database. We won’t discuss those steps in any detail today; if you’re interested you might take a look at this Scripting Clinic column on the topic. What we’ll focus on today is the query we use to retrieve data from the text file, as well as the lines of code that display the returned information.

Let’s start with the SQL query we use to retrieve data:

objRecordset.Open “Select City, Count(City) AS CountOfCity FROM ” & strFile & _
    ” GROUP BY City”, objConnection, adOpenStatic, adLockOptimistic, adCmdText

At first glance you might be confused by this query. As you probably recall, our text file consists of only a single field: each line in the file contains a city name and nothing else. And yet, even though we have only one field in our “database,” our SQL query specifies two separate items: City and Count(City) AS CountOfCity. What gives?

Actually we aren’t selecting two different fields; instead we’re selecting one field (City) and one “calculated field” we can use to tally up the number of times each city appears in the database. (A calculated field is a field that we calculate based on data found in other fields; the field itself does not actually exist in the database.) The item Count(City) AS CountOfCity (in conjunction with the clause GROUP BY City) can be read like this: Count up the number of times each city appears in the file and then store that number in a calculated field named CountOfCity.

In other words, the query will generate a recordset that looks like this, with Pittsburgh and Chicago as instances of the City field and 1 and 1 instances of the CountOfCity calculated field:

Pittsburgh 1
Chicago 1

The nice thing is that we get back this recordset with the city totals calculated for us. Alternatively we could have opened the text file, read each line, and used an array or a Dictionary object to keep track of the number of times each city name appeared in the file. But ADO makes this process far easier.

You might have noticed this as well: instead of selecting from a database table (because we have neither a database nor any tables), we specify the name of our text file in our SQL query. Thus, we select data from the variable strFile, which just happens to represent the name of our text file (City_Names.txt).

Once we get a recordset back we simply use this code to loop through all the records, echoing back the name of the city and the number of times each city appears in the file:

Do Until objRecordset.EOF
    Wscript.Echo objRecordset.Fields.Item(“City”), objRecordset.Fields.Item(“CountOfCity”)
    objRecordset.MoveNext
Loop

As you can see, we do that by echoing the values of two items: City and CountOfCity. Even though CountOfCity is a calculated field, it’s treated as if it was a real field actually stored in the database. All in all, a very easy way to retrieve information (as opposed to raw data) from a text file

Author

0 comments

Discussion are closed.

Feedback