November 6th, 2006

How Can I Extract the Text Between the Header and Footer in a Text File?

Hey, Scripting Guy! How can I extract the text between the header and footer in a text file?

— LL

Hey, LL. Before we answer your question we should point out that, depending on when you’re reading this, you might be missing Day 1 of Windows PowerShell Week. Even as we speak the lovely and talented Jean Ross could be presenting an introduction to Windows PowerShell, Microsoft’s new command shell/scripting technology.

Well, OK: right now the talented Jean Ross could be presenting an introduction to Windows PowerShell, Microsoft’s new command shell/scripting technology.

Fine: right now Jean Ross could be presenting an introduction to Windows PowerShell, Microsoft’s new command shell/scripting technology.

Is that better?

But don’t fret if you somehow missed today’s presentation; all the webcasts will be available in archive form within the next 2 or 3 days. Besides, we still have four more webcasts to go; for example, tomorrow the lovely and talented Dean Tsaltas will explain the whys and wherefores of Windows PowerShell Cmdlets.

….

Sorry. We expected somebody to protest the fact that we referred to Dean as being both lovely and talented, but everyone seems fine with that description. Good news, Dean: the scripting world loves you!

Of course, some of you – namely you, LL – might not care about any of that; you just want to know how to extract the text between the header and footer of a text file. Well, then why didn’t you say so:

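Const ForReading = 1
x = 0
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    ' A line that starts with four asterisks opens or closes the header
    If Left(strLine, 4) = "****" Then
        x = x + 1
    End If
    ' Second line of asterisks: the header is finished, so stop reading
    If x = 2 Then
        Exit Do
    End If
    ' Inside the header: keep every line that isn't a line of asterisks
    If Left(strLine, 4) <> "****" And x = 1 Then
        strText = strText & strLine & vbCrLf
    End If
Loop
' Release the file now that we're done reading
objFile.Close
Wscript.Echo strText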

A note of explanation before we begin. The text file that LL referred to uses a line of asterisks to mark the beginning and end of the file header. In other words, the text file looks something like this:

******************************************************************
This is line 1 of the header.
This is line 2 of the header.
******************************************************************
This is text that we don't care about.
This, too, is text we don’t care about.
As is this.

Our job is to pull out just the two lines wedged between the asterisks. You know, these two lines:

This is line 1 of the header.
This is line 2 of the header.

Will our script be able to pull off such a feat? Let’s find out.

As you can see, we start out by defining a constant named ForReading and setting the value to 1; we’ll need this constant when we open our text file. We then set the value of a mysterious variable named x to 0. What do we need this variable for? You’ll find out soon enough.

What do you mean that isn’t soon enough? We understand how exciting this is, but try to be patient. Remember: good things come to those who wait.

Our next step is to create an instance of the Scripting.FileSystemObject, then use the OpenTextFile method to open the file C:\Scripts\Test.txt for reading. That’s what we do with these two lines of code:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\Scripts\Test.txt", ForReading)

After the file has been opened we set up a Do Until loop that runs until we reach the end of the file (or, to make it sound like we know what we’re talking about, until the AtEndOfStream property is True). The first thing we do inside that loop is use the ReadLine method to read the next line of the text file (the first time through the loop, that’s the first line) and then store that value in a variable named strLine:

strLine = objFile.ReadLine

Most likely the first line in the text file is a row of asterisks that marks the beginning of the header. However, we don’t know that for sure. Therefore, we use this line of code to verify that the first four characters in the line are, indeed, asterisks:

If Left(strLine, 4) = "****" Then

Note. We’re guessing that you don’t have any other lines in the text file that begin with four asterisks. If you do, then you’ll have to adjust the preceding line of code to take that into account (for example, checking to see if the first 10 or the first 20 or the first whatever characters are all asterisks).
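If four characters turns out to be too loose a test for your file, one way to tighten things up is to compare a longer prefix against a string of asterisks built with the VBScript String function. What follows is only a minimal sketch, not part of the original script; the function name IsMarkerLine and the choice of 10 characters are our own assumptions, so use whatever length suits your files:

```vbscript
' Hypothetical helper (our own name, not part of the original script):
' returns True only when the first intCount characters of strLine
' are all asterisks.
Function IsMarkerLine(strLine, intCount)
    ' String(intCount, "*") builds a run of intCount asterisks
    IsMarkerLine = (Left(strLine, intCount) = String(intCount, "*"))
End Function

' This line begins with four asterisks but is not a real marker line
If IsMarkerLine("**** Important ****", 10) Then
    WScript.Echo "Marker line"
Else
    WScript.Echo "Ordinary line"   ' this branch runs
End If
```

With a helper like this in place, the two tests in the main script would become If IsMarkerLine(strLine, 10) Then and If Not IsMarkerLine(strLine, 10) And x = 1 Then.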

Let’s assume that the first four characters of the line are all asterisks. (If they aren’t, well, no big deal; the script is designed to handle that.) If we have encountered a line of asterisks that means that we’ve hit the line that signifies the beginning of the header. To help us mark this momentous occasion we increment the value of x by 1. In other words, when x equals 1 we know that we’ve found the header.

That brings us to this block of code:

If x = 2 Then
    Exit Do
End If

What’s this for? Well, when we reach the line in the text file that marks the beginning of the header we increment the value of x by 1. And guess what? We do the very same thing when we reach the line in the text file that marks the end of the header. (Why? Because that line also begins with four asterisks.) That means when we reach the end of the header x will be equal to 2; logically then, if x is equal to 2 we must have reached the end of the header. And because we have reached the end of the header there’s no point in going on; therefore, we use the Exit Do statement to exit the Do Until loop.

Make sense? The variable x is equal to 0 until we hit the line that marks the beginning of the header. At that point x gets set to 1 and remains at 1 until we reach the line marking the end of the header. The value of x will then get upped to 2, which is a signal to the script that it’s time to exit the loop.
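If you’d like to watch x change for yourself, here’s a small sketch that records the value of x after each line is examined. To keep it self-contained it works on an array of lines instead of a file; the function name TraceCounter and the sample array are our own inventions for illustration, not part of the original script:

```vbscript
' Hypothetical helper for illustration: returns the value of x after
' each line is examined, separated by spaces.
Function TraceCounter(arrLines)
    Dim x, strLine, strTrace
    x = 0
    strTrace = ""
    For Each strLine In arrLines
        If Left(strLine, 4) = "****" Then
            x = x + 1
        End If
        strTrace = strTrace & x & " "
        ' Same exit rule as the main script: stop at the second marker
        If x = 2 Then
            Exit For
        End If
    Next
    TraceCounter = Trim(strTrace)
End Function

' Stand-in for the sample file (shorter rows of asterisks for brevity)
arrSample = Array("********************", _
                  "This is line 1 of the header.", _
                  "This is line 2 of the header.", _
                  "********************", _
                  "This is text that we don't care about.")

WScript.Echo TraceCounter(arrSample)   ' 1 1 1 2
```

Notice that the trace has only four entries even though the array has five lines: once x reaches 2 the loop exits, so the last line is never examined.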

Of course, the first time through the loop x won’t be equal to 2; most likely it will be equal to 1. That brings us to this block of code:

If Left(strLine, 4) <> "****" And x = 1 Then
    strText = strText & strLine & vbCrLf
End If

Here we’re checking two conditions: that x is equal to 1 (which means we’re dealing with the header) and that the first four characters in the line are not equal to ****. If both of these conditions are true then we’re actually dealing with the header text, the very text we’re trying to extract. With that in mind we grab the current line (strLine) and append that value plus a carriage return-linefeed (the VBScript constant vbCrLf) to a variable named strText. We then loop around and repeat the process with the next line in the text file. When we’re all done, we echo back the value of strText.

Admittedly, it’s a tad bit complicated, but it works. In fact, here’s what we get back when we run the script:

This is line 1 of the header.
This is line 2 of the header.

Like we hinted at, this script works even if the header doesn’t appear right at the very beginning of the document. To see what we mean, try running the script against the following text file and see what happens:

This line we don't care about.
******************************************************************
This is line 1 of the header.
This is line 2 of the header.
******************************************************************
This is text that we don't care about.
This, too, is text we don’t care about.
As is this.
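If you’d rather try that without creating a second text file, the same logic can be wrapped in a function that takes an array of lines. This is a sketch under our own assumptions: the function name ExtractHeader and the array arrTest are hypothetical, and the rows of asterisks are built with String(20, "*") rather than typed out:

```vbscript
' Hypothetical wrapper around the logic of the main script: takes an
' array of lines and returns the text between the two marker lines.
Function ExtractHeader(arrLines)
    Dim x, strLine, strText
    x = 0
    strText = ""
    For Each strLine In arrLines
        If Left(strLine, 4) = "****" Then
            x = x + 1
        End If
        If x = 2 Then
            Exit For
        End If
        If Left(strLine, 4) <> "****" And x = 1 Then
            strText = strText & strLine & vbCrLf
        End If
    Next
    ExtractHeader = strText
End Function

' Stand-in for the text file above, with a line before the header
arrTest = Array("This line we don't care about.", _
                String(20, "*"), _
                "This is line 1 of the header.", _
                "This is line 2 of the header.", _
                String(20, "*"), _
                "This is text that we don't care about.", _
                "This, too, is text we don't care about.", _
                "As is this.")

WScript.Echo ExtractHeader(arrTest)
```

Because x stays at 0 until the first row of asterisks turns up, the opening line is skipped and only the two header lines should be echoed back.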

And now, if you’ll excuse us, we need to get back to the Windows PowerShell Week webcasts. Could this be the day that the lovely and talented Peter Costantini breaks into song? You’ll find out soon enough.
