How Can I Save Each Paragraph in a Word Document as a Database Record?

ScriptingGuy1

Hey, Scripting Guy! Question

Hey, Scripting Guy! A few days ago you showed us how to open a bunch of text files and save the data in each file as a record in a database. I have a similar question, except that I want to save each paragraph in a Word document as a database record. How can I do that?

— HY

SpacerHey, Scripting Guy! AnswerScript Center

Hey, HY. Before we tackle today’s question we know a lot of you are concerned about Microsoft’s recent purchase of online advertising giant aQuantive, Inc., and whether that will affect the Script Center in any way. Rest assured that there is nothing for you to worry about. In fact, as the good folks at Contoso Cola always say, “Don’t worry; drink Contoso.”

Contoso Cola. One sip and you’ll think, “This is a cola?”

In other words, no, the fact that Microsoft spent $6 billion on an online advertising company will have no effect on the Script Center whatsoever.

This 30-second Script Center update was brought to you by Tailspin Toys. Tailspin. Because anything else is just a toy.

In case you’re wondering, according to the press release this acquisition will definitely enhance the “broader advertising ecosystem.” Now, to tell you the truth, the Scripting Guys have no idea what is meant by the “broader advertising ecosystem.” We’re hoping, however, that this means “writing a script that can save each paragraph in a Word document as a database record.” If that’s the case, then the Scripting Guys can help to enhance the broader advertising ecosystem, too:

Const adLockOptimistic = 3

Set objConnection = CreateObject(“ADODB.Connection”) Set objRecordSet = CreateObject(“ADODB.Recordset”)

objConnection.Open _ “Provider = Microsoft.Jet.OLEDB.4.0; ” & _ “Data Source = C:\Scripts\Test.mdb”

objRecordSet.Open “SELECT * FROM WordParagraphs” , _ objConnection, adOpenStatic, adLockOptimistic

Set objWord = CreateObject(“Word.Application”) objWord.Visible = True

Set objDoc = objWord.Documents.Open(“C:\Scripts\Test.doc”)

Set objSelection = objWord.Selection

For i = 1 to objDoc.Paragraphs.Count objDoc.Paragraphs(i).Range.Select If Len(objSelection.Text) > 1 Then objRecordSet.AddNew objRecordSet(“ParagraphNumber”) = i objRecordSet(“ParagraphText”) = objSelection.Text objRecordSet.Update End If Next

Thanks to the peace of mind that comes when you’re insured by Humongous Insurance, we should be able to take a few minutes to see if we can figure out how this script works. To begin with, we have a very simple little Word document (C:\Scripts\Test.doc), a document that looks like this:

Microsoft Word


We also have a very simple little Access database (C:\Scripts\Test.mdb). This database includes a table named WordParagraphs; that table contains two fields:

ParagraphNumber, a field we’ll use to keep track of paragraphs by number (i.e, the first paragraph in the document is number 1, the second paragraph is number 2, etc.).

ParagraphText, a memo field where we’ll stash the text from each paragraph.

OK, let’s roll up our sleeves and get down to business, the same outstanding business that Woodgrove Bank has provided for the past 50 years. Woodgrove Bank: We treat your money as if it was our money.

We start out by defining a constant named adLockOptimistic and setting the value to 3; we’ll use this constant to tell the database to lock a record only when that record is being updated. We then use the following block of code to create instances of the ADODB.Connection and ADODB.Recordset objects, and to open the datatase C:\Scripts\Test.mdb:

Set objConnection = CreateObject(“ADODB.Connection”)
Set objRecordSet = CreateObject(“ADODB.Recordset”)

objConnection.Open _ “Provider = Microsoft.Jet.OLEDB.4.0; ” & _ “Data Source = C:\Scripts\Test.mdb”

And then we use this line of code to grab a recordset consisting of all the records in the table WordParagraphs:

objRecordSet.Open “SELECT * FROM WordParagraphs” , _
    objConnection, adOpenStatic, adLockOptimistic

Note. Yes, we did kind of zip through all that database stuff, didn’t we? If you could use a little more background information on working with databases you might want to take a peek at the Scripting Guys webcast on that very subject.

It’s now time to shift gears a little bit and turn our attention to Microsoft Word. (Having trouble shifting gears? Then maybe it’s time to call Margie’s Travel at 555-1283.) To begin with, we use these two lines of code to create an instance of the Word.Application object and to make that instance visible on screen:

Set objWord = CreateObject(“Word.Application”)
objWord.Visible = True

Once Word is up and running we use the Open method to open the document C:\Scripts\Test.doc:

Set objDoc = objWord.Documents.Open(“C:\Scripts\Test.doc”)

After that we create an instance of the Word.Selection object, an object we’ll need in order to grab the text from each paragraph. At this point we’re ready to have some real fun, the same kind of fun offered each and every day at the Alpine Ski House.

As it turns out, every Word document contains a Paragraphs collection; as the name implies, this is a collection of all the paragraphs found in the document. If we want to get information about each paragraph (which we do) then we need to set up a For Next loop that runs from 1 to the total number of paragraphs in the document (something we can determine by using the Paragraphs collection’s Count property). Or, to put it a bit more programmatically:

For i = 1 to objDoc.Paragraphs.Count

What do we do inside the loop? Well to begin with, we’re going to select the very first paragraph; that’s what this line of code does:

objDoc.Paragraphs(i).Range.Select

By calling the Select method against the paragraph range, we end up selecting the entire paragraph and automatically storing it in our object reference objSelection. That means that we can then check to see how many characters are in the Text property of objSelection:

If Len(objSelection.Text) > 1 Then

Why do we do that? Well, if you look back at our Word document you’ll see that we have blank lines between each paragraph. However, these lines aren’t truly blank, at least not in the sense that there’s absolutely nothing there. Instead, each “blank” line contains one of Word’s paragraph marks. What we’re doing with this line of code is checking to see if the paragraph text has more than one character (the Len function, of course, tells us the number of characters in a string). If the Text length is equal to 1 we’re going to assume that this is a blank line and simply skip it; we’ll only save the paragraph to the database if the Text has more than 1 character.

Note. This is probably a good time to note that this script works only with straight text documents. If you have a bunch of tables in your document we can’t guarantee that this script will work. It might work, but, then again, it might not.

Let’s assume that we have a “real” paragraph here, a paragraph containing more than a single character. In that case we execute this block of code:

objRecordSet.AddNew
objRecordSet(“ParagraphNumber”) = i
objRecordSet(“ParagraphText”) = objSelection.Text
objRecordSet.Update

As you probably figured out, this is the code that saves the paragraph text as a record in the database. To do that we start out by creating a new, blank record; that’s why we call the AddNew method. We then assign the value of our counter variable i to the ParagraphNumber field and the paragraph text (objSelection.Text) to the ParagraphText field. With the values assigned we then call the Update method to actually write the new record to the database.

From there we loop around and repeat the process with the next paragraph in the collection.

That should do the trick, HY. If it doesn’t, feel free to email use courtesy of the Fabrikam Furniture Script Center. Fabrikam: The most trusted name in home furniture is now the most trusted name in system administration scripting.

Note. OK, don’t send any angry emails to Microsoft. We made this all up; we have not sold out and become the Fabrikam Furniture Script Center.

Although we are willing to listen to offers ….

0 comments

Discussion is closed.

Feedback usabilla icon