April 8th, 2010

Hey, Scripting Guy! How Can I Read Words in a Text File and Make Those Same Words Italic in a Word Document?

 

Hey, Scripting Guy! Question

Hey, Scripting Guy! I came across this blog and it has since become a staple of mine. Now, I need to know how to accomplish something more than was originally presented in your article, Hey, Scripting Guy! How Can I Italicize Specific Words in a Microsoft Word Document? Suppose I wanted to create an array containing 305 words that would fill up several lines in my script and could prove to be a maintenance nightmare. It might be easier to simply list the words in a text file and then have the script read that text file line by line and then search for the words one at a time. I will leave it up to you. My problem is I need to be able to accomplish this, but I have no idea where to begin.

— TK

 

Hey, Scripting Guy! Answer

Hello TK,

Microsoft Scripting Guy Ed Wilson here. I can understand your hesitancy to modify a script that works when you are uncomfortable with the code. It is a good practice to tread lightly when making changes; in fact I prefer to make one change at a time and then test the change. It is kind of like when I was scuba diving in Boca Raton, Florida, and I ran across the green moray eel whose picture I took (see just below). Green moray eels are usually very friendly and smile for you when you take their picture, but you do not want to harass them. It is also a good idea to not get too close to them when you snap their picture, or they might snap at you. They enjoy their personal space. Do not mess with their personal space.

Photo Ed took of moray eel 

TK, the original ItalicWordsInWord.ps1 script is seen here.

ItalicWordsInWord.ps1

$application = New-Object -comobject word.application
$application.visible = $true
$document = $application.documents.open("C:\fso\Test.docx")
$selection = $application.Selection
$words = "exchange","sql"
$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$format = $true
$replace = 2
Foreach ($word in $words)
{
 $findText = $word
 $replaceWith = $word
 $selection.find.replacement.font.italic = $true
 $exeRTN = $selection.find.execute($findText,$matchCase,
 $matchWholeWord,$matchWildCards,$matchSoundsLike,
 $matchAllWordForms,$forward,$wrap,$format,$replaceWith,
 $replace)
}

In the original script, the array of words to search for in the Microsoft Word document is hardcoded. The Word document is opened, and a Foreach loop walks through the array of words and changes the font of each match to italic. Because the script already accepts an array, and because it already performs the search and replace, only one change is needed to have it read a text file for its input.
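The bare numbers assigned to $wrap and $replace in the script correspond to members of Word's WdFindWrap and WdReplace enumerations. As a small readability sketch, the values can be given names before they are used. The variable names $wdFindContinue and $wdReplaceAll are my own labels chosen to mirror the enumeration member names; they are not defined by Windows PowerShell or by the original script.

```powershell
# Hypothetical helper variables named after the Word enumeration members.
$wdFindContinue = 1   # WdFindWrap: continue the search when the end of the range is reached
$wdReplaceAll   = 2   # WdReplace: replace every occurrence rather than only the first

# Same values the script passes to find.execute, now self-describing.
$wrap    = $wdFindContinue
$replace = $wdReplaceAll
```

The behavior of the script is unchanged; the only difference is that a reader no longer has to look up what 1 and 2 mean.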

With Windows PowerShell, the Get-Content cmdlet reads a text file and creates an array from its contents, one element per line. As shown in the words.txt file (see following image), no special formatting is required. Each search term (element in the array) is placed on its own line. To make matters worse, I included terms that contain spaces: for example, Microsoft Word or, to make matters even more difficult, Windows Server 2008, which is three words. As you can see, it would be very easy to create a text file that includes the words that are specific to your industry or your company.

Image of words.txt file 
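To see what Get-Content hands back for a file like this, here is a minimal sketch that writes a small sample file and reads it again. The file name words-sample.txt, its location in the temp folder, and the sample terms are assumptions for illustration. The step that drops blank lines is also my own addition, on the assumption that an empty string is not a useful search term; the scripts in this article do not include it.

```powershell
# Hypothetical sample file in the temp folder; the article's script reads C:\fso1\words.txt.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'words-sample.txt'
Set-Content -Path $path -Value 'exchange', 'Microsoft Word', '', 'Windows Server 2008'

# Get-Content emits one string per line; @() keeps the result an array
# even if the file happens to contain a single line.
$allLines = @(Get-Content -Path $path)

# Assumption: blank lines are not wanted as search terms, so filter them out.
$words = @($allLines | Where-Object { $_.Trim() -ne '' })

"Lines read: $($allLines.Count); search terms kept: $($words.Count)"
$words | ForEach-Object { "[$_]" }

# Clean up the sample file.
Remove-Item -Path $path
```

Each line stays whole, so a multiword term such as Microsoft Word remains a single element of the array rather than being split at the space.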

The line that was changed in the script is this one:

$words = Get-Content C:\fso1\words.txt

Apart from pointing the script at the C:\fso1 folder, that is the only change that was made to the script. I created a new test document, Test.docx, which is seen in the following image. As you can see, there are several things that could go wrong with our scenario. For example, Test.docx contains the phrase “word document,” so the question is whether the search and replace will pick up “word” and modify it because the phrase “Microsoft Word” appears in the words.txt file. Before you reject the possibility out of hand, consider that if each word became a search term (instead of each line in the text file), “Microsoft” and “Word” would both be included instead of “Microsoft Word.” In addition, Test.docx includes a blank line between paragraphs. A blank line could cause the script to error out when it attempted to parse the line.

Image of Test.docx file

The completed ReadTextFileItalicWordsInWord.ps1 script is seen here.

ReadTextFileItalicWordsInWord.ps1

$application = New-Object -comobject word.application
$application.visible = $true
$document = $application.documents.open("C:\fso1\Test.docx")
$selection = $application.Selection
$words = Get-Content C:\fso1\words.txt
$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$format = $true
$replace = 2
Foreach ($word in $words)
{
 $findText = $word
 $replaceWith = $word
 $selection.find.replacement.font.italic = $true
 $exeRTN = $selection.find.execute($findText,$matchCase,
 $matchWholeWord,$matchWildCards,$matchSoundsLike,
 $matchAllWordForms,$forward,$wrap,$format,$replaceWith,
 $replace)
}

After the script has run, the Test.docx document appears, and all the words listed in the words.txt file are italicized.

Image of Test.docx with words in italic text

To make the script more useful, I modified the ReadTextFileItalicWordsInWord.ps1 script and added a couple of features. The first thing I did was add the ability for the script to create an array of all the documents in a folder. This uses the Get-ChildItem cmdlet to return all the .doc and .docx files in a folder. The returned array of objects is stored in the $docs variable. This line of code is shown here:

$docs = Get-ChildItem -Path c:\fso1 -Include *.doc,*.docx -Recurse
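The same Get-ChildItem pattern can be tried safely against a scratch folder before it is pointed at real documents. The sketch below is only an illustration: the docs-sample folder and the empty files in it are hypothetical. The extra filter on names that begin with ~$ is my own assumption; Word creates owner files with that prefix while a document is open, and they would otherwise match *.docx.

```powershell
# Hypothetical scratch folder so the example does not touch C:\fso1.
$root = Join-Path ([System.IO.Path]::GetTempPath()) 'docs-sample'
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null

# Create a few empty placeholder files, including one in a subfolder.
$files = (Join-Path $root 'a.docx'), (Join-Path $root 'b.doc'),
         (Join-Path $root 'notes.txt'), (Join-Path $sub 'c.docx')
foreach ($file in $files) { New-Item -ItemType File -Path $file -Force | Out-Null }

# -Include takes effect here because -Recurse is also specified.
$docs = @(Get-ChildItem -Path $root -Include *.doc, *.docx -Recurse |
          Where-Object { $_.Name -notlike '~$*' })   # assumption: skip Word owner files

$docs | ForEach-Object { $_.FullName }

# Clean up the scratch folder.
Remove-Item -Path $root -Recurse -Force
```

The text file is left out of the result, and the document in the subfolder is picked up because of -Recurse.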

Because we have an array of System.IO.FileInfo objects, I use the Foreach command to iterate through the collection. This is shown here:

Foreach ($doc in $docs)
{

I add the save method to save the changes, and the close method to close the document. These two commands go inside the newly created Foreach command that I wrap around the old script:

 $document.save()
 $document.close()
} #end foreach doc

After all the documents in the folder have been processed, it is time to exit the Microsoft Word application and to release the memory used by the objects. This section of the code is shown here:

$application.quit()
$application = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
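One possible variation, offered as a sketch rather than as part of the original script, is to wrap the work in try/finally so that Word is closed even if a document fails to open or save. It assumes Microsoft Word is installed and reuses the C:\fso1 folder from this article. The explicit call to the .NET Marshal.ReleaseComObject method is an extra step that the original cleanup code does not include.

```powershell
# Assumes Microsoft Word is installed; C:\fso1 is the folder used in this article.
$application = New-Object -ComObject Word.Application
$application.Visible = $false
try {
    $docs = Get-ChildItem -Path C:\fso1 -Include *.doc, *.docx -Recurse
    foreach ($doc in $docs) {
        $document = $application.Documents.Open($doc.FullName)
        # ... the find-and-replace loop from the script goes here ...
        $document.Save()
        $document.Close()
    }
}
finally {
    # Runs whether or not an error occurred inside the try block.
    $application.Quit()
    # Extra step (not in the original script): release the COM reference explicitly.
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($application)
    $application = $null
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
}
```

The design choice is simply that an invisible Word process left running after a failed script is hard to notice, so the cleanup is placed where it always runs.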

The complete ReadTextFileItalicWordsInAllWordDocsInFolder.ps1 script is shown here.

ReadTextFileItalicWordsInAllWordDocsInFolder.ps1

$application = New-Object -comobject word.application
$application.visible = $false
$words = Get-Content C:\fso1\words.txt
$docs = Get-ChildItem -Path c:\fso1 -Include *.doc,*.docx -Recurse
Foreach ($doc in $docs)
{
 $document = $application.documents.open($doc.FullName)
 $selection = $application.Selection
 $matchCase = $false
 $matchWholeWord = $true
 $matchWildCards = $false
 $matchSoundsLike = $false
 $matchAllWordForms = $false
 $forward = $true
 $wrap = 1
 $format = $true
 $replace = 2
 Foreach ($word in $words)
 {
  $findText = $word
  $replaceWith = $word
  $selection.find.replacement.font.italic = $true
  $exeRTN = $selection.find.execute($findText,$matchCase,
  $matchWholeWord,$matchWildCards,$matchSoundsLike,
  $matchAllWordForms,$forward,$wrap,$format,$replaceWith,
  $replace)
 } #end foreach word
 $document.save()
 $document.close()
} #end foreach doc
$application.quit()
$application = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()

 

TK, that is all there is to modifying an existing script so that it reads a text file and collects files from a directory. Join us tomorrow for Quick-Hits Friday, when we will talk about…wait a minute.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

 

Ed Wilson and Craig Liebendorfer, Scripting Guys

 
