June 30th, 2008

Hey, Scripting Guy! How Can I Copy Specified Sections of an .RTF File to a New Word Document?

Hey, Scripting Guy! I have large files output by a reservoir simulator, files I use to predict oil, water and gas movements in hydrocarbon reservoirs. These files are finite-difference numerical simulators, and work by timestep (i.e. a time interval, usually about 90 days); when they have finished calculating, they chuck out a bunch of data about the wells and sand regions we have defined. There’s a lot of information in each of these data-dumps, which means that finding the one piece of info from each timestep can be a pain; after all, there are many timesteps in a 30-year simulation. I’d like to write a script to go in and retrieve a certain piece of text from each one of these timesteps, and output the extracted data in a new file.

— MM

Hey, MM. Before we get started today we should probably note that there are likely to be some changes in store for the Script Center in the near future. (What kind of changes? To be honest, we don’t know for sure.) Because of that we thought it would be a good idea to audition some new writers, just in case one of them might some day be called upon to take over the Hey, Scripting Guy! column. (Perish the thought! After all, how could anyone but the Scripting Guy who writes this column be the Scripting Guy who writes this column? But it’s better to be safe than sorry, right?)

Needless to say, filling the shoes of the Scripting Guy who writes this column would be a monumental task for anyone. Because of that, we decided to bring in only the best and most famous writers in history; after all, no mere mortal could possibly take the place of the Scripting Guy who writes this column. Our first applicant? The immortal James Joyce, author of Ulysses and Finnegans Wake:

Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things I am here to script, FileSystemObject and InternetExplorer.Application, WMPlayer.OCX, that CScript. WMI, COM, PowerShell: coloured signs. Limits of the diaphane. But he adds: in Notepad. Then he was aware of them scripts before of them coloured. How? By knocking his code against them, sure. Go easy. Bald he was and a millionaire, maestro di colour che sanno for each objItem in colItems. Why in? If you can put your five fingers through it, it’s a gate, if not a door, if not x < 7 then wscript.echo x. Shut your eyes and see.

Thank you, James. We’ll get back to you ….

There you have it, folks: James Joyce’s first try at writing Hey, Scripting Guy! To be honest, the text was totally bizarre and incomprehensible, and was completely lacking in both logic and grammar. On rare occasions Joyce did toss in a scripting term or two, but those seemed to be thrown in mainly to give his article the look of a technical article on system administration scripting. Joyce was obviously trying to be funny but, just as obviously, the jokes all fell flat. All in all, this was a huge waste of both his time and your time.

Which, the more we think about it, would make him the ideal candidate to write this column on a full-time basis, wouldn’t it?

Of course, one potential drawback to hiring James Joyce to write Hey, Scripting Guy! is the fact that James Joyce died in January of 1941; that will definitely make it a little more complicated for us to get him a work visa. But don’t worry about that; the Scripting Guys will look into getting James Joyce a work visa. In the meantime, the Scripting Guy who writes the column will go ahead and write this column. Not only that, but he’ll also write a script that can extract specific portions of a Microsoft Word document and copy that data to another Word file:

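' Word constants; VBScript has to define these itself
Const wdExtend = 1
Const wdMove = 0
Const wdParagraph = 4
Const wdStory = 6

' Start a visible instance of Word
Set objWord = CreateObject("Word.Application")
objWord.Visible = True

' New, blank document that will receive the copied sections
Set objNewDoc = objWord.Documents.Add()
Set objNewSelection = objWord.Selection

' Source file to search
Set objDoc = objWord.Documents.Open("C:\Scripts\Test.rtf")
objDoc.Activate
Set objSelection = objWord.Selection

' Search forward for the section heading
objSelection.Find.Forward = True
objSelection.Find.Text = "Connection Rates and Cumulatives"

Do While True
    objSelection.Find.Execute
    If objSelection.Find.Found Then
        ' Extend the selection over the section, then copy it
        objSelection.MoveDown wdParagraph, 22, wdExtend
        objSelection.Copy

        ' Append the copied text to the end of the new document
        objNewDoc.Activate
        objNewSelection.EndKey wdStory, wdMove
        objNewSelection.TypeParagraph()
        objNewSelection.TypeParagraph()
        objNewSelection.Paste

        ' Return to the source file and move past the copied text
        objDoc.Activate
        objSelection.MoveDown
    Else
        ' No more matches: leave the loop
        Exit Do
    End If
Loop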

Before we go any further we should explain the scenario in a little more detail. MM works for a major oil company, and uses a simulation tool that generates huge .RTF files. (By the way, MM, you know those jokes we made not long ago about oil companies and gas prices? Those jokes actually came from James Joyce, and he has been severely reprimanded for them.) Each of these .RTF files includes numerous sections titled Connection Rates and Cumulatives. What MM needs to do is locate each of these sections, copy all the data found in each section, and then paste that data into a new Word document.

Can the Scripting Guys help MM do all that? Well, James Joyce probably put it best: “Well and what’s cheese? Corpse of milk.”

By which he means – well, we don’t have the slightest idea what he means. But yes, we can help MM with his problem.

So how are we going to help MM with his problem? Well, as you can see, we kick things off by defining four constants. Is that a record for the most constants ever defined at the beginning of a Hey, Scripting Guy! column? Well, we hadn’t really thought about that, but it just might be. Good news, everyone: today you might very well be a part of history!

Note. Although, to be honest, we think it only ties the record. And that’s no good: as the old saying goes, a tie is like kissing your sister.

Although we suppose if that means kissing your sister, well, maybe that’s not so bad. Do you have any pictures of her?

So what are we going to do with these four constants? Let’s see:

We’ll use wdExtend to extend our selection. When we do a search in Word, the target text is automatically selected. We need to extend that selection to cover not just the target text but the next 21 paragraphs as well.

We’ll use wdMove to move the selection as opposed to extending it.

We’ll use wdParagraph to indicate that we want to extend the selection to cover the next 21 paragraphs (as opposed to, say, the next 21 characters, lines, or sentences).

We’ll use wdStory to help us move the selection to the very end of the document. Why do we need to do that? We’ll explain why in just a minute.

After defining our constants we create an instance of the Word.Application object and then set the Visible property to True; that gives us a running instance of Microsoft Word that we can see onscreen. We then use these two lines of code to add a new, blank document to our instance of Word, and to create a Selection object at the very beginning of that document:

Set objNewDoc = objWord.Documents.Add()
Set objNewSelection = objWord.Selection

As you might expect, this new, blank document is the one that will eventually hold all the data we copy out of our .RTF file.

Speaking of .RTF files, the next three lines in the script enable us to open the file (C:\Scripts\Test.rtf); make this file the active document; and then create a second Selection object, this one for use within Test.rtf:

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.rtf")
objDoc.Activate
Set objSelection = objWord.Selection

That brings us to these two lines of code:

objSelection.Find.Forward = True
objSelection.Find.Text = "Connection Rates and Cumulatives"

As we noted, we’re going to search the .RTF file for each instance of the heading Connection Rates and Cumulatives. In order to do that we need to do two things. First, we need to assign the value True to the Find object’s Forward property; that tells the script that we want to search the document from front to back. (Because we’ll be starting our search at the very top of the document, that means we’ll end up searching the entire document.) Second, we need to define the text we’re searching for; that’s something we do by assigning the target phrase to the Text property.
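Forward and Text are the only two Find properties our script sets, but they aren’t the only ones available. As a minimal sketch (this is not part of the original script), here is how a couple more could be set: MatchCase to make the search case-sensitive, and Wrap with the value wdFindStop (0) so the search stops at the end of the document instead of circling back to the top. Whether MM’s files actually need either setting is an assumption on our part.

```vbscript
' Sketch only: optional Find settings, not part of the original script.
Const wdFindStop = 0   ' WdFindWrap value: stop when the end of the document is reached

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.rtf")
objDoc.Activate
Set objSelection = objWord.Selection

With objSelection.Find
    .ClearFormatting                  ' drop any formatting criteria left from an earlier search
    .Text = "Connection Rates and Cumulatives"
    .Forward = True                   ' search from front to back
    .MatchCase = True                 ' assumption: the heading is always capitalized this way
    .Wrap = wdFindStop                ' do not wrap around to the top of the document
End With
```

The With block is only a convenience; setting each property on its own line, as the original script does, works the same way.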

Once we’ve assigned our property values we’re ready to start searching. To that end, we set up a Do While loop that will run as long as True equals True. (And what if True always equals True? Does that mean we’ll be trapped in this loop forever? Yes. Well, unless we show you a way to break out of that loop. But what do you think the odds are of us doing that?)

Inside the loop, we use the Execute method to search for the first instance of the target text. And what if we actually find that target text; what then? Well, for starters, we execute these two lines of code:

objSelection.MoveDown wdParagraph, 22, wdExtend
objSelection.Copy

In the first line, we’re using the MoveDown method to extend the selection (note the constant wdExtend). How far are we planning to extend the selection? We need to extend the selection so it covers the “Connection Rates and Cumulatives” heading plus the next 21 paragraphs. (Note the constant wdParagraph and the value 22.) Why 22 paragraphs? Because, in MM’s .RTF file, that’s enough to encompass all the data in the Connection Rates and Cumulatives section.

As for line 2, well, that should be self-explanatory: in line 2 we’re simply using the Copy method to copy the selected text to the Clipboard.
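What if one of those sections sits so close to the end of the file that there aren’t 22 paragraphs left to select? MoveDown returns the number of units it actually moved, so one way to notice a short section is to compare that return value with the number requested. The following is a sketch under that assumption; the constant PARAGRAPHS_WANTED and the variable intMoved are names we made up for illustration, and the sketch checks only the first match.

```vbscript
' Sketch only: check how far MoveDown actually extended the selection.
Const wdExtend = 1
Const wdParagraph = 4
Const PARAGRAPHS_WANTED = 22   ' hypothetical name; the value comes from the original script

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

Set objDoc = objWord.Documents.Open("C:\Scripts\Test.rtf")
objDoc.Activate
Set objSelection = objWord.Selection

objSelection.Find.Forward = True
objSelection.Find.Text = "Connection Rates and Cumulatives"
objSelection.Find.Execute

If objSelection.Find.Found Then
    ' Parentheses are needed here because we are using the return value
    intMoved = objSelection.MoveDown(wdParagraph, PARAGRAPHS_WANTED, wdExtend)
    If intMoved < PARAGRAPHS_WANTED Then
        WScript.Echo "Only " & intMoved & " paragraphs were available."
    End If
End If
```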

That brings us to this block of code:

objNewDoc.Activate
objNewSelection.EndKey wdStory, wdMove
objNewSelection.TypeParagraph()
objNewSelection.TypeParagraph()
objNewSelection.Paste

What we’re doing here is using the Activate method to make our new document (the one we’re going to paste data into) the active document. We then use the EndKey method to move (wdMove) the cursor to the end of the document (that is, the end of the “story”). After calling the TypeParagraph method a couple of times (just to insert a blank line) we call the Paste method to paste the contents of the Clipboard to the end of the document. We move the cursor to the end of the document each time to ensure that any new data we paste in gets appended to the document, and doesn’t paste over any text already in the document.

After that we re-activate our .RTF file and call the MoveDown method with no arguments; by default, that collapses the selection and moves the cursor down one line. Why? Because, if we don’t, the script will search the selected text only, and will continue to search that same block of text over and over and over.

And over.

From there it’s back to the top of the loop, where we call the Execute method and search for the next instance of our target text. If the search comes up empty (that is, if we can’t find the value Connection Rates and Cumulatives) then we call the Exit Do statement to exit from our “endless” loop. (Because all good things must come to an end. Even endless loops.)
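For what it’s worth, the same loop can also be sketched with Range objects instead of Selection objects. This is an alternative we are offering for comparison, not the script described above. It skips the Clipboard and the Activate calls, but because it copies only the Text of each range, any formatting in the .RTF file is lost; we also assume that extending the range by 22 paragraph units covers the same span as the original MoveDown call. The constants wdCollapseEnd and wdFindStop are Word values the original script doesn’t use, and objRange is a name we introduced.

```vbscript
' Sketch only: a Range-based variant of the search loop (plain text, no Clipboard).
Const wdParagraph = 4
Const wdCollapseEnd = 0   ' collapse a range to its end point
Const wdFindStop = 0      ' stop searching at the end of the document

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

Set objNewDoc = objWord.Documents.Add()
Set objDoc = objWord.Documents.Open("C:\Scripts\Test.rtf")

' Start with a range that covers the whole source document
Set objRange = objDoc.Content

With objRange.Find
    .Text = "Connection Rates and Cumulatives"
    .Forward = True
    .Wrap = wdFindStop

    ' Execute returns True on a match and redefines objRange to the matched text
    Do While .Execute
        ' Stretch the end of the range over the section's paragraphs
        objRange.MoveEnd wdParagraph, 22

        ' Append a blank paragraph, then the section text, to the new document
        objNewDoc.Content.InsertParagraphAfter
        objNewDoc.Content.InsertAfter objRange.Text

        ' Collapse to the end of the section so the next search starts there
        objRange.Collapse wdCollapseEnd
    Loop
End With
```

The loop ends on its own when Execute returns False, so no Exit Do is needed. If the formatting matters, the Copy and Paste approach in the original script is the safer choice.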

That should do it, MM; let us know if you have any questions. In the meantime, we’ll continue with our auditions. Up next: Geoffrey Chaucer. Here’s a sneak peek:

Lordinges, right thus, as ye have understonde,
Bar I stifly myne olde scryptbondes on honde,
That thus they seyden in hir programmingnesse;
And A.Hidden was fals, and objItem.Writeable was tru,
On ADSI and on my nece also.
O lord, the peyne I dide recurse and the wo,
Ful giltelees, by goddes swete code!

Well, not bad. But nothing we haven’t all seen in this column before.
