February 12th, 2008

Hey, Scripting Guy! How Can I Use Windows PowerShell to Sort a CSV File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! have a comma-separated values file that includes several different fields. I’d like to be able to read this file from Windows PowerShell and then sort the information on one (or maybe more) of these fields. However, I can’t seem to get this to work. Any suggestions?

— DW

SpacerHey, Scripting Guy! AnswerScript Center

Hey, DW. Before we answer your question we have an observation that we just had to pass along. This past Saturday the Scripting Guy who writes this column was driving along making a doughnut run. (And yes, it is amazing how much of his life revolves around doughnuts, isn’t it?) As he drove past a Domino’s Pizza he noticed a big banner hanging outside the place:

Now Serving Domino’s Pizza

You read that right: as hard as it might be to believe, there’s at least one Domino’s Pizza place in the Seattle area that now serves Domino’s pizza. What will they think of next?!?

Although he was tempted to stop and ask about the banner the Scripting Guy who writes this column kept driving. (As the old saying goes, “Eat doughnuts first, ask questions later.”) Still, he couldn’t help but wonder what caused the owner of the place to hang up that banner in the first place:

“You know, Bob, I was thinking maybe we should start serving Domino’s pizza here.”

“What, are you nuts? You can’t serve Domino’s pizza at a Domino’s Pizza!”

“I don’t know, Bob. I just really think it would help business if we started serving Domino’s pizza.”

“Well, maybe. But how are people even going to know we started serving Domino’s pizza? I mean, who’s going to drive by and think, ‘Oh, there’s a Domino’s Pizza. I bet I could get some Domino’s pizza there.’”

“Hmmm, I hadn’t thought of that …. Wait, I know: we’ll hang up a big banner that says Now Serving Domino’s Pizza.”

“You know, that’s just crazy enough to work!”

Before you ask, no, we don’t know what they used to serve before they started serving Domino’s pizza at the Domino’s Pizza place. We’ll have to look into that for you.

One thing we do know, however, is that they didn’t serve up Windows PowerShell commands that can sort the contents of a CSV. (Well, OK, maybe we don’t know that for sure. But we’re willing to bet on it.) If you’re looking for information on how to use Windows PowerShell to sort a comma-separated values file, well, there’s only one place to get it.

No, not Pizza Hut; right here, in the Hey, Scripting Guy! column.

In fact, today we’re going to show you how to sort a CSV file. But not right now; after lunch. All this talk of pizza and doughnuts is making us hungry.

OK, that’s better. Let’s start out by taking a look at DW’s text file:

FirstName,LastName,Department,ScoreAlice,Ciccu,Human Resources,222Ken,Myer,Finance,151Pilar,Ackerman,Finance,514Jonathan,Haas,Administration,17Syed,Abbas,Human Resources,67Terri,Chudzik,Finance,188

As you can see, this is a relatively simple little CSV (comma-separated values) file, a file containing four fields: FirstName; LastName; Department; and Score. DW would like a script (or command) that can read the file and then sort the data by one of those four fields. Unfortunately, his first crack at the problem didn’t go so well; after issuing the command Get-Content C:\Scripts\Test.txt | Sort-Object he ended up with output that looked like this:

Alice,Ciccu,Human Resources,222FirstName,LastName,Department,ScoreJonathan,Haas,Administration,17Ken,Myer,Finance,151Pilar,Ackerman,Finance,514Syed,Abbas,Human Resources,67Terri,Chudzik,Finance,188

Interesting. But not exactly what DW had in mind.

So what went wrong here? Well, for one thing, the Get-Content cmdlet is typically not the cmdlet to use when working with a CSV file. Why not? Because Get-Content doesn’t understand that a CSV file is made up of individual fields; instead, it simply sees each line in the text file as one big, monolithic entity. If you take a look at the output DW got you’ll notice that Get-Content did read in the text file and the Sort-Object cmdlet did sort the contents of that file. The problem is that his command simply sorted the lines (including the header line) by the first character in the line. Definitely not what he had in mind.

And no, asking Sort-Object to sort on a particular field won’t help, either. When we tried the command Get-Content C:\Scripts\Test.txt | Sort-Object Department we got back this:

Jonathan,Haas,Administration,17Syed,Abbas,Human Resources,67Terri,Chudzik,Finance,188Pilar,Ackerman,Finance,514FirstName,LastName,Department,ScoreAlice,Ciccu,Human Resources,222Ken,Myer,Finance,151

What is our data being sorted on here? To tell you the truth, we have no idea. But it’s definitely not sorting our data on the Department field. That’s because Get-Content isn’t passing along any field information to Sort-Object. (Try passing the output to the Get-Member cmdlet and you’ll see what we mean.)

OK, so if Get-Content isn’t the answer then what is the answer? As it turns out, any time you’re working with a CSV file you should use the Import-CSV cmdlet. For example, take a look at the following line of code, which imports the file C:\Scripts\Test.txt:

Import-CSV C:\Scripts\Test.txt

Now look at the output we get when we run that command:

FirstName                     LastName                      Department                    Score---------                     --------                      ----------                    -----Alice                         Ciccu                         Human Resources               222Ken                           Myer                          Finance                       151Pilar                         Ackerman                      Finance                       514Jonathan                      Haas                          Administration                17Syed                          Abbas                         Human Resources               67Terri                         Chudzik                       Finance                       188

Notice that the header row is separated from the data rows, and notice that the individual fields have been placed in columns. What does that mean? That means that the Import-CSV cmdlet understands how a comma-separated values file works, and it understands that individual fields should be treated as separate and distinct properties of the text file. To better demonstrate that fact, the following command imports the contents of Test.txt and then pipes that data to the Get-Member cmdlet:

Import-CSV C:\Scripts\Test.txt | Get-Member

And here’s what Get-Member reports back:

Name        MemberType   Definition----        ----------   ----------Equals      Method       System.Boolean Equals(Object obj)GetHashCode Method       System.Int32 GetHashCode()GetType     Method       System.Type GetType()ToString    Method       System.String ToString()Department  NoteProperty System.String Department=Human ResourcesFirstName   NoteProperty System.String FirstName=AliceLastName    NoteProperty System.String LastName=CiccuScore       NoteProperty System.String Score=222

Take a peek at the last four lines in the output. When you do you’ll see that our four fields – Department, FirstName, LastName, and Score – are treated as properties of the text file. (Or, more correctly, as NoteProperties. In PowerShell, a NoteProperty is a name-value pair, like FirstName = Alice.) Import-CSV truly understands what a CSV file is all about.

So does it even matter that Import-CSV treats CVS fields as properties? You bet it does. After all, now we can specify a field name (that is, a property name) when we pipe the contents of the file to Sort-Object:

Import-CSV C:\Scripts\Test.txt | Sort-Object Department

Take a look at the output that’s going to give us:

FirstName                     LastName                      Department                    Score---------                     --------                      ----------                    -----Jonathan                      Haas                          Administration                17Terri                         Chudzik                       Finance                       188Pilar                         Ackerman                      Finance                       514Ken                           Myer                          Finance                       151Alice                         Ciccu                         Human Resources               222Syed                          Abbas                         Human Resources               67

That’s what DW had in mind all along.

Does that mean that all our problems are solved? Well, almost. However, there is still one other little matter to take care of. Suppose we issue this command, one designed to sort the CSV file on the Score field:

Import-CSV C:\Scripts\Test.txt | Sort-Object Score

What’s that going to give us? This:

FirstName                     LastName                      Department                    Score---------                     --------                      ----------                    -----Ken                           Myer                          Finance                       151Jonathan                      Haas                          Administration                17Terri                         Chudzik                       Finance                       188Alice                         Ciccu                         Human Resources               222Pilar                         Ackerman                      Finance                       514Syed                          Abbas                         Human Resources               67

Uh-oh; since when does 151 come before 17? And why does 514 come before 67?

If you’re thinking, “Hmmm, I bet PowerShell is treating those values as strings rather than numbers,” well, you’re absolutely right. In this case PowerShell is treating those values as strings. And, in the devil-may-care world of string sorting, anything that starts with 15 (like 151) comes before anything starting with 17. That’s just the way things work.

So how do we fix this problem? Here’s one way:

Import-CSV C:\Scripts\Test.txt | Sort-Object {[int] $_.Score}

In this command we aren’t really sorting by a particular property name; instead, we’re sorting by a value derived from a scriptblock. (Commands enclosed in curly braces are, for all intents and purposes, little scripts embedded in other scripts/commands.) In PowerShell, the $_ variable represents the object currently in the pipeline; that means that we can use syntax like $_.Score to represent a given property of that object. Of course, we don’t want to sort on the Score property; as we just saw, that’s not going to give us the output we expected. Instead, we’re going to use [int] to convert the value of the Score property to an integer. In turn, that’s going to cause PowerShell to sort the scores as numbers. And that will give us output like this:

FirstName                     LastName                      Department                    Score---------                     --------                      ----------                    -----Jonathan                      Haas                          Administration                17Syed                          Abbas                         Human Resources               67Ken                           Myer                          Finance                       151Terri                         Chudzik                       Finance                       188Alice                         Ciccu                         Human Resources               222Pilar                         Ackerman                      Finance                       514

Much better.

Incidentally, you can use all the other parameters of Sort-Object when sorting a CSV file. Want to sort the data by score, but in descending order? Then just add the –descending parameter, like so:

Import-CSV C:\Scripts\Test.txt | Sort-Object {[int] $_.Score} -descending

Alternatively, maybe you’d like to sort by Department and then, within each department, sort by LastName. That’s fine:

Import-CSV C:\Scripts\Test.txt | Sort-Object Department, LastName

Etc.

That should do it, DW; you should now be able to sort your CSV file any way you want. That’s about all the time we have for today, although we should take a moment to remind everyone that the 2008 Winter Scripting Games start this Friday, February 15th. As we’ve said before, we expect this year’s Scripting Games to be bigger and better than ever. Why? Well, for one thing, taking a cue from the good folks at Domino’s Pizza, we’ve decided that this year the Winter Scripting Games will feature Winter Scripting Games events! That’s right: this year’s Winter Scripting Games will be chock-full of Winter Scripting Games events.

Sure, it’s a bit of a gamble, but we like to believe that putting Winter Scripting Games events in the Winter Scripting Games is just crazy enough to work. (And we don’t think it’s any crazier than serving pizza at a pizza place.) Did the Scripting Guys finally go too far? Looks like we’ll all find out on Friday, won’t we? See you then.

Author

0 comments

Discussion are closed.

Feedback