May 31st, 2006

How Can I Search a Word Document for All the Words in Double Brackets?


Hey, Scripting Guy! How can I search a Word document for all the words in double brackets? For example, I need to find words like this: [[failed]]. I then need to save the words I find to a text file.

— PP


Hey, PP. You know, one thing that seems to be true of the Scripting Guys is a willingness to do something even when they have no idea what it is they’re doing. Put new molding in the family room? Come on, you just saw some wood and nail it to the wall, right? New tile in the kitchen? Piece of cake. Search a Word document for words in square brackets? Hey, how hard could something like that be?

In our defense, in most cases we come up with a finished product that looks reasonably good. New molding? Looks nice; just don’t look in the garbage can and count the number of pieces that were cut in the wrong place. Kitchen floor? That looks great … as long as you don’t take a look under the refrigerator, mind you. A script that searches a document for words in square brackets? As long as you’re willing to overlook all the false starts we ran into, well, then it wasn’t so bad after all:

' Start Word and open the document
Set objWord = CreateObject("Word.Application")
objWord.Visible = True

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.doc")
Set objSelection = objWord.Selection

' Search forward, with wildcards, for anything between [[ and ]]
objSelection.Find.Forward = True
objSelection.Find.MatchWildcards = True
objSelection.Find.Text = "\[\[*\]\]"

' Keep searching until no more matches are found
Do While True
    objSelection.Find.Execute
    If objSelection.Find.Found Then
        strWord = objSelection.Text
        strWord = Replace(strWord, "[[", "")
        strWord = Replace(strWord, "]]", "")
        Wscript.Echo strWord
    Else
        Exit Do
    End If
Loop

Admittedly, parts of this script are a little odd-looking, most notably the line where we specify our search text. With that in mind, let’s walk you through the code and see if we can explain how it works.

Note. That’s a good point: we do walk through the code in each and every Hey, Scripting Guy! column, don’t we? We just thought that saying we were going to do that would somehow make today’s column sound special.

The script starts out in fairly straightforward fashion: we simply create an instance of the Word.Application object and then set the Visible property to True. That gives us a running instance of Word that we can see on screen. We use the Open method to open the file C:\Scripts\Test.doc, then create an instance of the Word Selection object.

Note. You’re lost already? Don’t worry about it: if you’ve never written a script that interacts with Microsoft Word we wouldn’t expect that first paragraph to make much sense. For more information on scripting Microsoft Word take a peek at our Office Space Archive. And, while you’re there, you might want to check out the article that explains how to find and replace text in a Word document.

As it turns out, the Selection object (which, right after a document has been opened, is simply the cursor positioned at the very beginning of the document) has a child object named Find. As you might expect, the Find object is used for finding text in a Word document. Before we can use the Find object, however, we need to configure three important properties, something we do here:

objSelection.Find.Forward = True
objSelection.Find.MatchWildcards = True
objSelection.Find.Text = "\[\[*\]\]"

The first two properties, Forward and MatchWildcards, are relatively easy to figure out. The Forward property, when True, tells Word to search the document from the current cursor position to the end of the document. We want to do that because the cursor is currently positioned at the start of the document. If the cursor were at the end of the document we’d probably set the Forward property to False, which would cause Word to search the document backwards, from the current cursor position to the beginning of the document. MatchWildcards? That’s even easier: MatchWildcards simply tells Word that we’re going to use a wildcard character in our search.

Why are we going to use a wildcard character in our search? Well, we’re looking for anything that begins with a pair of square brackets ([[) and then ends with a pair of square brackets (]]). To do that we can use the asterisk to represent any character or set of characters. Looking for something that starts with two brackets, has some character or set of characters in the middle, and then ends with two brackets? Then you want to search for something like this:

[[*]]

Of course, what you want to do and what Word will let you do aren’t always the same thing. For example, when it comes to doing a wildcard search, the bracket symbols are reserved characters. If you try using [[*]] in a wildcard search you’ll get back an “invalid range” error.

Note. This is true even if you aren’t using a script. In Word click Edit and then click Find. When the Find and Replace dialog box appears, type [[*]] in the Find what box, then click More and select Use wildcards. Now click Find Next and see what happens.

Because the brackets are reserved characters, we need to “escape” them by preceding each one with a backslash (\). That’s why the value of the Text property (the text we want to search for) looks so odd:

objSelection.Find.Text = "\[\[*\]\]"

Believe it or not, that really is the text we’re searching for; we simply had to precede each of the four bracket symbols with a backslash. Strange but true.
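If you find yourself typing backslashes by hand for other wildcard searches, a small helper can do the escaping for you. What follows is a minimal sketch, not something from the original script: EscapeWildcards is a hypothetical function name, and the list of characters it escapes (\ [ ] ( ) { } < > ? * @ !) is our assumption about which characters Word treats as wildcard operators, so adjust it if your searches need something different.

```vbscript
' Hypothetical helper: precede every character that Word treats as special
' in a wildcard search with a backslash.
' Assumption: the reserved-character list below; it is not an official list.
Function EscapeWildcards(strText)
    Const strSpecial = "\[](){}<>?*@!"
    Dim i, strChar, strResult
    strResult = ""
    For i = 1 To Len(strText)
        strChar = Mid(strText, i, 1)
        If InStr(strSpecial, strChar) > 0 Then
            strResult = strResult & "\" & strChar   ' escape the reserved character
        Else
            strResult = strResult & strChar         ' ordinary character, copy as-is
        End If
    Next
    EscapeWildcards = strResult
End Function
```

With the helper in place, EscapeWildcards("[[") & "*" & EscapeWildcards("]]") produces the same \[\[*\]\] string the script assigns to the Text property. The asterisk is added outside the helper because it’s the one character we actually want Word to treat as a wildcard.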

Now we’re ready to start searching, a task we perform inside a Do While loop:

Do While True
    objSelection.Find.Execute
    If objSelection.Find.Found Then
        strWord = objSelection.Text
        strWord = Replace(strWord, "[[", "")
        strWord = Replace(strWord, "]]", "")
        Wscript.Echo strWord
    Else
        Exit Do
    End If
Loop

You might have noticed the syntax for the Do While loop: Do While True. In essence that means, “Continue looping as long as True equals True.” That sounds like a problem waiting to happen: with the possible exception of politics, doesn’t True always equal True? And, if so, doesn’t that mean this script will run forever without stopping? Yes, it does.

Well, unless we add a line of code that we’ll discuss in a moment.

Inside the loop we start out by calling the Execute method; that causes the script to search our document for the first instance of the target text. The script begins searching and stops the first time it finds an instance of the text. On top of that, if the search is successful then the value of the Found property will be True. That’s what we’re checking here:

If objSelection.Find.Found Then

But what if we’ve reached the end of the document and no instance of the target text can be found? In that case, the Found property will be False, and we use the Exit Do command to exit the Do While loop. That’s how we manage to get out of our loop: True is still equal to True, but the Exit Do command will automatically take you out of the loop. Period.
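Because the Execute method itself returns True when it finds a match (and False when it doesn’t), the Do While True / Exit Do combination can also be written so that Execute drives the loop directly. Here’s a sketch of that variation, wrapped in a hypothetical function named GetBracketedWords that returns the words as an array. Two things in it are our additions rather than part of the original script: HomeKey moves the cursor to the top of the document before searching, and the Wrap property is set explicitly to wdFindStop so Word stops at the end of the document instead of wrapping back to the beginning. VBScript doesn’t know Word’s named constants, so wdStory (6) and wdFindStop (0) are defined by hand.

```vbscript
Const wdStory = 6      ' WdUnits value meaning "the whole document"
Const wdFindStop = 0   ' WdFindWrap value: stop at the end instead of wrapping around

' Hypothetical helper: returns an array of the words found between [[ and ]]
' in the active document, in the order they appear.
Function GetBracketedWords(objWord)
    Dim objSelection, objWords, strWord
    Set objWords = CreateObject("Scripting.Dictionary")
    Set objSelection = objWord.Selection
    objSelection.HomeKey wdStory              ' start at the top of the document

    objSelection.Find.Forward = True
    objSelection.Find.MatchWildcards = True
    objSelection.Find.Wrap = wdFindStop       ' never wrap back to the start
    objSelection.Find.Text = "\[\[*\]\]"

    ' Execute returns True when it finds a match, so it can drive the loop
    Do While objSelection.Find.Execute()
        strWord = Replace(objSelection.Text, "[[", "")
        strWord = Replace(strWord, "]]", "")
        objWords.Add objWords.Count, strWord  ' keys 0, 1, 2, ... keep document order
    Loop

    GetBracketedWords = objWords.Items
End Function
```

After opening the document as in the original script, something like arrWords = GetBracketedWords(objWord) gives you every bracketed word in one array, which is handy if you plan to write them all to a file in one go.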

On the other hand, suppose we do find the target text? In that case, Word will automatically move the selection to the text that was just found. Because of that, we can get the value of the new selection just by checking the value of the Selection object’s Text property:

strWord = objSelection.Text

What does that mean? Let’s assume that Word has found this item:

[[failed]]

In that case, the word [[failed]] will (for our purposes) be highlighted, and the value of the Text property will be whatever happens to be highlighted. In other words, the Text property will equal this:

[[failed]]

Of course, ideally we’d like to get rid of the surrounding brackets and pick out just the word failed. Fortunately that’s easy to do. After storing the value of the Text property in the variable strWord we can use the VBScript Replace function to replace the four bracket symbols ([[ and ]]) with, well, nothing. That’s what we do with these two lines of code:

strWord = Replace(strWord, "[[", "")
strWord = Replace(strWord, "]]", "")
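Replace works fine here, but it removes every [[ and ]] in the string, not just the ones at the ends. If you’d rather trim exactly two characters from each end, the Mid function can do that. This is an alternative sketch rather than what the script above uses; StripBrackets is a hypothetical name, and the function assumes the string came from the \[\[*\]\] search, so it always starts with [[ and ends with ]].

```vbscript
' Hypothetical helper: drop the leading [[ and trailing ]] from a match.
' Assumes strFound starts with [[ and ends with ]] (guaranteed by the search).
Function StripBrackets(strFound)
    If Len(strFound) >= 4 Then
        ' Start at character 3 and keep everything except the last two characters
        StripBrackets = Mid(strFound, 3, Len(strFound) - 4)
    Else
        StripBrackets = strFound   ' too short to contain both bracket pairs
    End If
End Function
```

For example, StripBrackets("[[failed]]") returns failed. For ordinary matches the two approaches give the same result; they differ only in the odd case where a match contains a second [[ in the middle.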

To keep the script simple, we then do nothing more than echo back the value of strWord. Admittedly, PP, you wanted to write any words that were found to a text file. For more information on doing that, check out the Microsoft Windows 2000 Scripting Guide. (And, of course, the Hey, Scripting Guy! archive has plenty of sample scripts that show you how to write data to a text file.)
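For the record, writing the words to a text file takes only a few more lines, courtesy of the FileSystemObject. Here’s a minimal sketch; SaveWords is a hypothetical name, the ForWriting constant (2) is defined by hand because VBScript doesn’t know it, and opening the file for writing with the create flag set to True will overwrite any existing file of the same name.

```vbscript
Const ForWriting = 2   ' OpenTextFile I/O mode: write (overwrites existing contents)

' Hypothetical helper: writes each element of arrWords on its own line.
Sub SaveWords(strPath, arrWords)
    Dim objFSO, objFile, strWord
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    ' True = create the file if it doesn't already exist
    Set objFile = objFSO.OpenTextFile(strPath, ForWriting, True)
    For Each strWord In arrWords
        objFile.WriteLine strWord
    Next
    objFile.Close
End Sub
```

You could collect the words in an array and then call something like SaveWords "C:\Scripts\Words.txt", arrWords (the path is just an example). Alternatively, open the file once before the Do While loop and replace Wscript.Echo strWord with objFile.WriteLine strWord.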

As we noted, the script will find the first instance of the target text and then stop. But what if there are more words enclosed in square brackets? We’ve already thought of that: that’s why we’re using the Do While loop. After finding the first instance the script loops around, calls the Execute method and starts the whole process all over again. That continues until the script reaches the end of the document. At that point the Found property will be False, and we exit the loop.
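Incidentally, the Find loop isn’t the only way to tackle this. A different technique, not the one used in this column, is to grab the document’s entire text (objDoc.Content.Text) and run a VBScript regular expression over it. Here’s a sketch; ExtractBracketedWords is a hypothetical name, and the pattern relies on the lazy quantifier .*? (available in VBScript 5.5 and later) so that each match stops at the first ]] rather than the last one in the text.

```vbscript
' Hypothetical helper: returns an array of the text found between [[ and ]].
Function ExtractBracketedWords(strText)
    Dim objRegEx, objMatch, objWords
    Set objWords = CreateObject("Scripting.Dictionary")
    Set objRegEx = New RegExp
    objRegEx.Global = True                 ' return every match, not just the first
    objRegEx.Pattern = "\[\[(.*?)\]\]"     ' lazy .*? stops at the first ]]
    For Each objMatch In objRegEx.Execute(strText)
        ' SubMatches(0) holds only the text captured inside the parentheses
        objWords.Add objWords.Count, objMatch.SubMatches(0)
    Next
    ExtractBracketedWords = objWords.Items
End Function
```

After opening the document, arrWords = ExtractBracketedWords(objDoc.Content.Text) returns the words with the brackets already removed. The trade-off is that this approach works on plain text, so unlike the Find loop it doesn’t move the selection to each match in Word.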

As for the Scripting Guys, with another challenge successfully tackled it’s time to move on to the next project: putting a basketball court in the backyard. Have we ever actually poured concrete before? Do you even have to ask?
