April 29th, 2008

Hey, Scripting Guy! How Can I Use Windows PowerShell to Determine the Number of Lines in a Text File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I’m using Windows PowerShell to determine the record count (i.e., the number of lines) in a text file I just created. If the file has 2 lines, no problem; my script reports back a value of 2. If the file only has 1 line, however, I get back what looks to be the number of characters in the file. In other words, when I run the script I might get back 41, but I have no idea what that 41 is: 41 records or 41 characters. I suspect free radicals, sunspots, or a feature I don’t know about. What am I missing here?
— RS

Hey, Scripting Guy! Answer

Hey, RS. Needless to say, this is an interesting problem, and you have some interesting theories as to why this problem even exists in the first place. We have to admit that we were intrigued by the notion that free radicals (atoms or molecules with unpaired electrons) might be responsible for PowerShell sometimes reporting back the number of lines in a file and sometimes reporting back the number of characters in a file; initially, that seemed like the answer.

However, it then occurred to us that the formation of radicals may involve the breaking of covalent bonds homolytically, a process that – as we all know – requires significant amounts of energy. On top of that, no one can argue with the fact that bond cleavage most often happens between two atoms of similar electronegativity. To the best of our knowledge PowerShell has been carefully constructed to ensure that all the atoms have dissimilar electronegativity, making free radicals a less likely culprit.

Note. Not that free radicals are entirely off the hook, mind you. According to one theory, free radicals in the human body are responsible for many diseases, maybe even for the aging process itself. According to this theory, the unpaired electron in a free radical essentially kidnaps an electron from another molecule, thus turning that molecule into a free radical. This new free radical then goes off in search of a new electron, starting a chain reaction that eventually leads to “biological breakdown.”

To tell you the truth, the Scripting Guy who writes this column has no idea if that’s how the aging process really works. (If you need detailed, first-hand information about aging and growing old, you need to talk to the Scripting Editor.) However, he does plan to use this theory, and to cite free radicals, in his year-end performance review, especially when he gets to the question “Why did you fail to meet your commitments this past year?”

Incidentally, if any of you are short an electron or two, well, just let us know. The Scripting Guys always have some spare electrons lying around the office; heck, Scripting Guy Peter Costantini has several boxes of electrons that have never even been opened!

OK, so much for free radicals. Sunspots are something we can’t so easily dismiss, however. That’s mainly because we live in the Seattle area and, because of that, haven’t seen the sun in years. However, while sunspots have been known to cause interference with radio signals, there’s not a lot of evidence to suggest that sunspots are responsible for problems in determining how many lines are in a text file. We can’t rule it out, but it seems a little far-fetched to us.

Of course, the Scripting Guy who writes this column is still planning to blame sunspot activity for his inability to meet his commitments over the past year.

Sadly, that is the best excuse he has.

That leaves us with RS’ final thought: that this unusual behavior is a feature of Windows PowerShell. Could this strange activity really be something that’s built into PowerShell? Let’s see if we can figure that out.

To begin with, suppose we have a text file consisting of the following 6 lines:

This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
This is line 6.

Let’s use a script similar to RS’ to retrieve the contents of this file, store that data in a variable named $a, then report back the Length of the variable:

$a = Get-Content C:\Scripts\Test.txt
$a.Length

What do we get back when we run this script? We get back the following:

6

So far so good, eh?

Now let’s remove the last 5 lines from the text file, leaving us with this:

This is line 1.

What do we get back when we run the script against a one-line text file? We get back this:

15

Eep. It looks like RS was right: when dealing with a one-line text file, PowerShell counts the number of characters in the file rather than the number of lines in the file. And that’s not good, not good at all.
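If you’d like to reproduce RS’ experiment without creating C:\Scripts\Test.txt by hand, here’s a minimal sketch that writes a throwaway file to your temp folder and checks the Length first with six lines and then with one. The file name LineTest.txt is our own invention; the sketch uses nothing beyond the built-in Set-Content and Get-Content cmdlets.

```powershell
# Build a path in the temp folder (LineTest.txt is a hypothetical file name)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'LineTest.txt'

# Six lines: "This is line 1." through "This is line 6."
Set-Content -Path $path -Value (1..6 | ForEach-Object { "This is line $_." })
$a = Get-Content -Path $path
$sixLength = $a.Length
"Six-line file: Length = $sixLength"

# Overwrite the file so that it holds just one line
Set-Content -Path $path -Value 'This is line 1.'
$a = Get-Content -Path $path
$oneLength = $a.Length
"One-line file: Length = $oneLength"
```

On our test runs that reports 6 for the six-line file and 15 for the one-line file, which is exactly RS’ mystery.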

So is this because of free radicals? Well, maybe. But before we start making wild accusations, let’s run this script instead, and let’s run it against our six-line text file:

$a = Get-Content C:\Scripts\Test.txt
$a.GetType()

What we’re doing here is using the Get-Content cmdlet to retrieve the contents of the file C:\Scripts\Test.txt, just like we did before. However, instead of echoing back the Length of the variable $a, we’re using the GetType method to gain some insight into the kind of object we’re dealing with:

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

As you can see, we’re working with an array here, with each line in the text file representing a single item in that array. That’s why the Length property returns 6; with an array, the Length indicates the number of items in that array.

Now let’s run this script against a one-line text file. Here’s what we get back when we do that:

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object

Well what do you know? In this case we’re dealing with a string rather than an array. That’s why we get back a 15 when we ask for the Length; with a string value, the Length is the number of characters in that string. It turns out that when Get-Content reads a file with more than one line, it returns an array of strings (one item per line); when it reads a file with just one line, it returns that line as a single string. That explains why the Length property sometimes represents the number of lines in the file and sometimes represents the number of characters in the file: sometimes we’re dealing with an array, and sometimes we’re dealing with a string.
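If you’d rather not squint at GetType() tables, the -is operator can tell the two cases apart for you. Here’s a minimal sketch, assuming a throwaway file in the temp folder (the name LineTest.txt is hypothetical), that captures both the six-line and the one-line result and reports the type of each:

```powershell
# Recreate both cases in a throwaway file (LineTest.txt is a hypothetical name)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'LineTest.txt'

Set-Content -Path $path -Value (1..6 | ForEach-Object { "This is line $_." })
$many = Get-Content -Path $path      # six lines come back as an Object[] array

Set-Content -Path $path -Value 'This is line 1.'
$one = Get-Content -Path $path       # one line comes back as a plain String

# -is [array] distinguishes the two cases without printing a GetType() table
"Six-line file: {0}, is array = {1}" -f $many.GetType().Name, ($many -is [array])
"One-line file: {0}, is array = {1}" -f $one.GetType().Name, ($one -is [array])
```

The first line reports Object[] and True; the second reports String and False.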

Admittedly, that’s pretty darn exciting, except for one thing: how does it help RS with his problem? To be honest, it doesn’t. But this should:

$a = (Get-Content C:\Scripts\Test.txt | Measure-Object)
$a.Count

So what are we doing here? Well, once again we’re using Get-Content to retrieve the contents of the file Test.txt. However, this time around we pipe that information to the Measure-Object cmdlet and ask Measure-Object to determine the number of lines in the file for us. After we’ve done that, we then echo back the value of the Count property. When we run this revised script against our six-line text file we get back the following:

6

And when we run this script against our one-line text file? We get back this:

1

Hallelujah! All we have to do is turn this problem over to Measure-Object and we’re home free.
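If you find yourself counting lines a lot, the Measure-Object trick wraps up nicely in a function. This is a minimal sketch: Get-LineCount is a hypothetical helper name (not a built-in cmdlet), and LineTest.txt is a made-up file in the temp folder. For comparison it also tries the @() array subexpression operator, an alternative we didn’t use in this column, which always hands back an array and therefore makes Count mean "number of lines" even for a one-line file.

```powershell
# Get-LineCount is a hypothetical helper, not a built-in cmdlet
function Get-LineCount {
    param([Parameter(Mandatory = $true)][string]$Path)
    # Get-Content sends one string per line down the pipeline;
    # Measure-Object counts those objects, whether there is one or many
    (Get-Content -Path $Path | Measure-Object).Count
}

$path = Join-Path ([System.IO.Path]::GetTempPath()) 'LineTest.txt'

Set-Content -Path $path -Value (1..6 | ForEach-Object { "This is line $_." })
$six = Get-LineCount -Path $path
$sixAlt = @(Get-Content -Path $path).Count   # @() always produces an array

Set-Content -Path $path -Value 'This is line 1.'
$one = Get-LineCount -Path $path
$oneAlt = @(Get-Content -Path $path).Count

"Six-line file: $six via Measure-Object, $sixAlt via @()"
"One-line file: $one via Measure-Object, $oneAlt via @()"
```

Both approaches report 6 and 1; Measure-Object has the advantage of also doing words and characters, as you’re about to see.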

Incidentally, the Measure-Object approach works because Get-Content sends each line of the file down the pipeline as a separate object, and – by default – Measure-Object simply counts the objects it receives, whether there’s one of them or a thousand. However, we can be more specific – and get back additional information about a text file – simply by adding the -Line, -Word, and/or -Character parameters. For example, suppose we run this command against our six-line text file:

Get-Content C:\Scripts\Test.txt | Measure-Object -Line -Word -Character

Here’s what we’ll get back:

                        Lines                         Words                    Characters Property
                        -----                         -----                    ---------- --------
                            6                            24                            90

Pretty cool, huh?
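And if you need those numbers in a script rather than on screen, the object that Measure-Object returns exposes them as the Lines, Words, and Characters properties (the same names as the column headings above). Here’s a minimal sketch, again assuming a hypothetical throwaway file named LineTest.txt in the temp folder:

```powershell
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'LineTest.txt'
Set-Content -Path $path -Value (1..6 | ForEach-Object { "This is line $_." })

# With -Line, -Word, and -Character, Measure-Object returns an object whose
# Lines, Words, and Characters properties hold the three totals
$stats = Get-Content -Path $path | Measure-Object -Line -Word -Character
$summary = '{0} lines, {1} words, {2} characters' -f $stats.Lines, $stats.Words, $stats.Characters
$summary
```

For our six-line file that prints 6 lines, 24 words, 90 characters: each line is 4 words and 15 characters, and Get-Content has already stripped off the line breaks, so they aren’t counted.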

That should do it, RS; if you have any questions please let us know. But please let us know before December 21, 2012; after all, according to some interpretations of the Mayan Calendar, that’s the day that the world is going to end. And why is the world supposed to end on that day? You got it: this time it is because of sunspots. According to some people, a burst of intense sunspot activity will “…flip the sun’s magnetic field, causing earthquakes and flooding on earth … and alter[ing] the endocrine production of the pineal gland.”

And yes, as a matter of fact those are the very same things the Scripting Guy who writes this column was blamed for on his last performance review. In his defense, however, it’s not like he meant to flip the sun’s magnetic field; sometimes these things just happen.
