May 30th, 2008

Hey, Scripting Guy! How Can I Search For a Specific Style in a Folder Full of Microsoft Word Documents?


Hey, Scripting Guy! I have a folder with scores of Word documents in it. What I need to do is open each document, copy anything that uses the Heading 1 style, and then write that information (the file name and anything that uses the Heading 1 style) to a single text file. Is there a way to do this using a script?
— DN


Hey, DN. You know, there has been a lot of speculation lately about Windows 7, the next version of Windows. Microsoft has been uncharacteristically tight-lipped when it comes to Windows 7, which means that, all over the world, people have been asking the same question, “Just what features are going to be included in Windows 7?” Well, at the risk of losing our jobs the Scripting Guys have decided to lift the veil on Windows 7. Here, for the first time ever, are the key features you can expect to see in Windows 7:

Grilled cheese sandwiches. Yes, Windows 7 will be able to make you a grilled cheese sandwich on demand. To tell you the truth, we aren’t sure why Microsoft hasn’t made a bigger deal about this; after all, just try to make a grilled cheese sandwich on a Macintosh. (Go on, we dare you!) We suspect that this is due to the fact that, in its initial release, you’ll only be able to make a sandwich that uses Monterey Jack cheese; cheddar, American, and provolone probably won’t be available until Service Pack 1. As for Colby or Muenster, well, all we can say is this: the research goes on.

Legal disclaimer. Please don’t try to make a grilled cheese sandwich on your Macintosh. And no, we didn’t dare you to try and make a grilled cheese sandwich on your Macintosh; where did you get that idea?

Oh, right.

500 built-in television channels. All the hardware will be built into the box, and all the channels will be free. And at any given time 492 of them will be showing the same rerun of Everybody Loves Raymond.

And no, that wasn’t an oversight; there’s a reason why we didn’t say that the channels would be commercial-free.

Which reminds us: do you need a little something to wash down that grilled cheese sandwich? The Scripting Guys suggest Fabrikam Cola. Fabrikam: the soft drink that’s almost a real cola drink.

Touch screen capabilities. Admittedly, this is still somewhat up in the air. The touch screen technology is present in the early builds of the product, but every time you touch the screen the computer goes, “Whoa. What was that? Are you touching me? Are you touching me?” Needless to say, there are still a few bugs to work out.

And that’s not all, far from it. There is also an unsubstantiated rumor going around the Microsoft campus that Windows 7 will include a script that lets you open each Microsoft Word document in a folder, copy any text that uses the Heading 1 style, and then write that information to a text file. And you’re right, that would be a pretty compelling reason to upgrade to Windows 7, wouldn’t it? But guess what? You don’t have to wait until Windows 7 to get a script like that; after all, the Script Center is offering that very same capability today. Literally:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder("C:\Scripts")

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

For Each objFile in objFolder.Files
    strFilePath = objFile.Path 
    strExtension = objFSO.GetExtensionName(strFilePath)

    If strExtension = "doc" Then
        strText = strText & strFilePath & vbCrLf
        Set objDoc = objWord.Documents.Open(strFilePath)
        Set objSelection = objWord.Selection

        objSelection.Find.Forward = True
        objSelection.Find.Format = True
        objSelection.Find.Style = "Heading 1"

        Do While True
            objSelection.Find.Execute
            If objSelection.Find.Found Then
                strText = strText & objSelection.Text & vbCrLf
            Else
                Exit Do
            End If
        Loop

        objDoc.Close
        strText = strText & vbCrLf
    End If
Next

objWord.Quit

Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt")
objTextFile.Write strText
objTextFile.Close

As you can see, we start off by creating an instance of the Scripting.FileSystemObject, the COM object we’ll use to retrieve a collection of all the files in the folder C:\Scripts. Once we create this object we promptly put it to use, using the FileSystemObject and the GetFolder method to bind us to the Scripts folder:

Set objFolder = objFSO.GetFolder("C:\Scripts")

Granted, that’s nowhere near as exciting as an operating system that can make grilled cheese sandwiches. But be patient; we aren’t done yet.

At this point we take a slight detour in our code, using these two lines of code to create an instance of the Word.Application object and then make that instance of Word visible onscreen:

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

In case you’re wondering, you don’t have to make Word visible onscreen; the script works just fine if you have Word run in an invisible window. However, we like to make Word visible during our initial testing, and for two reasons: 1) it helps us verify what – if anything – is actually going on when we run the script; and, 2) if anything does go wrong we can quickly and easily close each instance of Word. That way we don’t run the risk of having a dozen or so “orphaned” copies of Word running in hidden windows and using up system resources.
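If you do decide to run Word in a hidden window, a little error handling can help keep those orphaned copies from piling up. Here’s a minimal sketch of the idea, not part of the script above: the file C:\Scripts\Test.doc is a made-up example, and opening the document read-only is our own assumption about what you’d want when all you’re doing is reading headings.

```vbscript
' Sketch only: C:\Scripts\Test.doc is a hypothetical file used for illustration.
' VBScript can't see Word's built-in constants, so we define the one we need.
Const wdDoNotSaveChanges = 0

Set objWord = CreateObject("Word.Application")
objWord.Visible = False

On Error Resume Next
' Arguments: file name, ConfirmConversions, ReadOnly
Set objDoc = objWord.Documents.Open("C:\Scripts\Test.doc", False, True)

If Err.Number <> 0 Then
    WScript.Echo "Could not open the document: " & Err.Description
    Err.Clear
Else
    ' ... search the document here ...
    objDoc.Close wdDoNotSaveChanges
End If
On Error GoTo 0

' Runs whether or not the document opened, so no hidden copy of Word is left behind.
objWord.Quit
```

The point of the design is simply that objWord.Quit sits outside the If block, so it gets called even when a document refuses to open.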

Next we set up a For Each loop to loop through all the files in the Scripts folder:

For Each objFile in objFolder.Files

And that’s a good point: we don’t have much use for all the files in C:\Scripts, do we? In fact, all we care about are the Microsoft Word documents, the .DOC files. So how do we separate the wheat (all the .DOC files) from the chaff (everything that isn’t a .DOC file)? Well, these two lines of code help:

strFilePath = objFile.Path 
strExtension = objFSO.GetExtensionName(strFilePath)

In the first line we’re grabbing the value of the Path property and storing it in a variable named strFilePath; as you might expect, the Path property contains the complete path to the file in question. In line 2, we use the GetExtensionName method to extract just the file extension (not including the dot) from that file path. Once we’ve done that we can use this line of code to determine whether or not we are working with a .DOC file:

If strExtension = "doc" Then
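One thing to keep in mind about that comparison: VBScript’s = operator compares strings case-sensitively, so a file named Report.DOC would slip right past a check for "doc". Here’s a minimal sketch of a case-insensitive version. IsWordDoc is a helper name we made up for illustration, and the file paths are hypothetical (GetExtensionName only parses the string, so the files don’t need to exist):

```vbscript
' Sketch only: IsWordDoc is a hypothetical helper, not part of the original script.
Function IsWordDoc(strPath)
    Dim objFSO
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    ' LCase folds the extension to lowercase before comparing,
    ' so "DOC", "Doc", and "doc" are all treated the same.
    IsWordDoc = (LCase(objFSO.GetExtensionName(strPath)) = "doc")
End Function

' An uppercase extension still counts as a Word document.
WScript.Echo CStr(IsWordDoc("C:\Scripts\Report.DOC"))
' A text file does not.
WScript.Echo CStr(IsWordDoc("C:\Scripts\Notes.txt"))
```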

And what if we aren’t working with a .DOC file? That’s fine; in that case we simply go back to the top of the loop and repeat the process with the next file in the collection. On the other hand, if we are working with a .DOC file our next step is to grab that file path and append it (and a carriage return-linefeed character, represented by the VBScript constant vbCrLf) to a variable named strText:

strText = strText & strFilePath & vbCrLf

From there we use these two lines of code to open the .DOC and create an instance of Word’s Selection object:

Set objDoc = objWord.Documents.Open(strFilePath)
Set objSelection = objWord.Selection

Why do we need an instance of Word’s Selection object? Well, as it turns out, the Find object (the object used to, well, find things) happens to be a child object of the Selection object. In a minute or so we’re going to use the Find object to search for all the text that uses the Heading 1 style. Before we do that, however, we need to assign values to a few properties of the Find object:

objSelection.Find.Forward = True
objSelection.Find.Format = True
objSelection.Find.Style = "Heading 1"

Let’s see what we have here. Setting the Forward property to True simply tells Word to search forward through the document; because the cursor sits at the very beginning of a newly-opened document, that means we start searching at the beginning and continue until there’s nothing left to search. Setting the Format property to True tells the script that we’re searching for a particular type of formatting. And what type of format are we searching for? Why, the Heading 1 style, of course, which explains why we set the Style property to Heading 1.
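If you’d rather not rely on whatever settings the Find object happens to be carrying around, you can spell out a few more of them. The following is a sketch under stated assumptions rather than a required change: C:\Scripts\Test.doc is a hypothetical file, and the ClearFormatting, Text, and Wrap lines are our additions to the three properties shown above.

```vbscript
' Sketch only: C:\Scripts\Test.doc is a hypothetical file used for illustration.
' VBScript can't see Word's built-in constants, so we define the one we need.
Const wdFindStop = 0

Set objWord = CreateObject("Word.Application")
Set objDoc = objWord.Documents.Open("C:\Scripts\Test.doc")
Set objSelection = objWord.Selection

With objSelection.Find
    .ClearFormatting          ' drop any formatting left over from an earlier search
    .Text = ""                ' match on formatting alone, not on particular words
    .Forward = True           ' search toward the end of the document
    .Format = True            ' we are searching for formatting
    .Style = "Heading 1"      ' specifically, the Heading 1 style
    .Wrap = wdFindStop        ' stop at the end instead of wrapping back to the top
End With

' Execute returns True when a match is found.
If objSelection.Find.Execute Then
    WScript.Echo objSelection.Text
End If

objDoc.Close
objWord.Quit
```

Setting Wrap to wdFindStop is what guarantees that a search loop eventually runs out of matches rather than circling back to the first heading.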

That brings us to the following chunk of code:

Do While True
    objSelection.Find.Execute
    If objSelection.Find.Found Then
        strText = strText & objSelection.Text & vbCrLf
    Else
        Exit Do
    End If
Loop

As you probably figured out for yourself, this is the block of code where we search for any text that uses the Heading 1 style. (See, that’s why we need to bring you the inside scoop on Windows 7; you guys are more than capable of figuring out the scripting stuff all on your own.) To begin with, we set up a Do loop designed to run as long as True is equal to True. In case you’re wondering, that’s a loop designed to run forever and ever, or at least until we explicitly tell the script to exit the loop.

And you’re right: in this day and age it is hard to tell if True is still equal to True, isn’t it?

Inside the loop we call the Execute method and search for the first instance of the Heading 1 style. What if we can’t find any instance of the Heading 1 style? That’s fine; if the Found property comes back False we simply call the Exit Do statement and exit our not-so-endless loop:

Exit Do

If the Found property comes back True we run this line of code instead:

strText = strText & objSelection.Text & vbCrLf

As you can see, there isn’t anything too terribly fancy going on here. When you find a chunk of text in Microsoft Word that text automatically becomes selected. That makes it easy to grab the found text (in this case, any text that uses the Heading 1 style): all we have to do is reference the Selection object’s Text property. And what are we going to do with that selected text? Nothing much; we’re simply going to append it (and another carriage return-linefeed) to the variable strText.
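One wrinkle with that Text property: Word ends each paragraph with a lone carriage return (vbCr) rather than a carriage return-linefeed, and the found text can include those paragraph marks. In an editor such as Notepad, that can make headings look as though they run together on one line. If that bothers you, here’s a minimal sketch of a clean-up function; CleanHeadingText is a name we invented, not something built into Word or VBScript:

```vbscript
' Sketch only: CleanHeadingText is a hypothetical helper, not part of the original script.
Function CleanHeadingText(strFound)
    Dim strClean
    ' Collapse any existing CR+LF pairs to a lone CR so nothing gets doubled later.
    strClean = Replace(strFound, vbCrLf, vbCr)
    ' Strip trailing paragraph marks; the caller adds its own line break.
    Do While Right(strClean, 1) = vbCr
        strClean = Left(strClean, Len(strClean) - 1)
    Loop
    ' Turn the remaining paragraph marks into full CR+LF line breaks.
    CleanHeadingText = Replace(strClean, vbCr, vbCrLf)
End Function

' Two headings separated by Word-style paragraph marks.
WScript.Echo CleanHeadingText("First heading" & vbCr & "Second heading" & vbCr)
```

In the main script you would then append CleanHeadingText(objSelection.Text) to strText instead of objSelection.Text.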

From there it’s back to the top of the Do loop, where we again call the Execute method and look for the next instance of the Heading 1 style. After we’ve found all the Heading 1 styles in the first document we exit the loop (because the Found property comes back False), then call the Close method to dismiss the first Word document. We add a blank line to the end of strText, then hop back to the top of the For Each loop and try again with the next file in the collection.

Wow. That’s like riding a roller coaster, eh?

Eventually we will have searched all the .DOC files in C:\Scripts. When we’ve reached that point we’re almost done; in fact, all we have to do now is call the Quit method to terminate our instance of Microsoft Word, then execute these three lines of code:

Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt")
objTextFile.Write strText
objTextFile.Close

This is code you’re probably very familiar with. In the first line, we use the FileSystemObject and the CreateTextFile method to create a new text file named C:\Scripts\Test.txt. In the second line we use the Write method to write the contents of strText to this new file, a file we close in line 3. And what will Test.txt look like after we finish with it? With any luck it should look like this:

C:\scripts\abcd.doc
This is a Heading 1 headingSo is thisAnd so is this

C:\scripts\test.doc
Heading 1, No. 1Heading 1, No. 2Heading 1, No. 3
Heading 1, No. 4

C:\scripts\test2.doc
This is the only instance of Heading 1 in this file

In other words, we’ve copied all the text styled as Heading 1 from all the Word documents in the folder C:\Scripts and saved that information to our text file. Which is pretty cool, although we have to admit that it would be even cooler if the script made you a grilled cheese sandwich as well. If you want a grilled cheese sandwich, however, well, you’re just going to have to wait for Windows 7.
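One footnote on that CreateTextFile call: by default the method creates an ANSI text file, and the Write method can fail if a heading contains a character that can’t be represented in the computer’s ANSI code page. CreateTextFile accepts two optional parameters, overwrite and unicode, that address this. The following is a minimal sketch, assuming you’re happy with a Unicode output file; HeadingTest.txt is a hypothetical file name, and we write it to the temp folder purely for illustration:

```vbscript
' Sketch only: HeadingTest.txt is a hypothetical file name used for illustration.
Const TemporaryFolder = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
strPath = objFSO.BuildPath(objFSO.GetSpecialFolder(TemporaryFolder), "HeadingTest.txt")

' Arguments: file name, overwrite an existing file, create the file as Unicode
Set objTextFile = objFSO.CreateTextFile(strPath, True, True)

' ChrW(937) is the Greek capital omega, a character outside the Western ANSI code page.
objTextFile.Write "Heading with an omega: " & ChrW(937) & vbCrLf
objTextFile.Close

WScript.Echo "Wrote " & strPath
```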

Legal disclaimer. The Scripting Editor feels it would be a good idea if we mentioned that: 1) Windows 7 is highly unlikely to ever make you a grilled cheese sandwich; 2) there probably won’t be 500 TV channels embedded within Windows 7; and, 3) while there may be touch screen capability built into Windows 7 the odds are pretty good that the computer won’t get upset if you actually touch it. “We need to make that clear to everyone,” she noted. “After all, we don’t want to get fired.”

And that’s true, especially with the Scripting Son about to graduate from high school and then head off to college. On the other hand, if people really did get fired from Microsoft, well, then why would the Scripting Guy who writes this column still be here?

In other words, don’t give up on those grilled cheese sandwiches, at least not yet.

Editor’s Note: The Scripting Editor has been rethinking that whole “getting fired” thing. It might be nice to have the summer off…. Grilled cheese, anyone?
