October 22nd, 2009

Hey, Scripting Guy! Part 2: How Can I Update Many Office Word Documents at Once?

(Editor’s note: This is part 2 of a two-part article originally intended for TechNet Magazine. Part 1 was published yesterday.) 


Now we arrive at the entry point to the script. The first task is to create a collection of all the .doc and .docx files in the folder indicated by the $path variable. The Get-ChildItem cmdlet gathers the files. The Recurse parameter tells Get-ChildItem to search the folder and all of its subfolders, and the Include parameter limits the results to files whose names match the listed patterns. (Include takes effect only when Recurse is specified or when the path itself contains a wildcard.) This is seen here:

$files = Get-ChildItem -Path $path -Include *.doc,*.docx -Recurse
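As a quick way to see how Include and Recurse work together, the following sketch builds a throwaway folder tree and runs the same query against it. The folder and file names are made up for illustration and are not part of the original script.

```powershell
# Build a temporary folder tree (hypothetical names, for illustration only).
$demoPath = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ("WordDemo_" + [guid]::NewGuid())
$subPath  = Join-Path -Path $demoPath -ChildPath "Archive"
New-Item -Path $subPath -ItemType Directory -Force | Out-Null

# Empty placeholder files; only the extensions matter for the filter.
"report.doc", "notes.txt" | ForEach-Object {
  New-Item -Path (Join-Path -Path $demoPath -ChildPath $_) -ItemType File | Out-Null
}
New-Item -Path (Join-Path -Path $subPath -ChildPath "old.docx") -ItemType File | Out-Null

# Same pattern as the script: Recurse walks subfolders, Include keeps matching names.
$demoFiles = @(Get-ChildItem -Path $demoPath -Include *.doc,*.docx -Recurse)
$demoNames = @($demoFiles | Sort-Object -Property Name | ForEach-Object { $_.Name })
$demoNames

# Clean up the temporary tree.
Remove-Item -Path $demoPath -Recurse -Force
```

The text file is skipped, and the .docx file in the subfolder is found only because Recurse is present.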

The ReplaceWordsLogResults.ps1 script can be used to make nomenclature changes in a collection of documents. It can also be used to automatically correct misspelled words and to improve the grammar of documents. Such a candidate for improvement is seen here:


There is no way to completely automate the grammar checker and the spelling checker from within Microsoft Word. The find-and-replace method, however, can perform much of the same service for known problem words. To allow the script to work with multiple words, a hash table is created. Each element of the hash table is made up of a key and a value, and elements are separated from one another by semicolons. The hash table is stored in the $wordHash variable and is seen here:

$wordHash = @{"misspelled" = "spelled incorrectly" ; "done" = "finished"}
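For readers new to hash tables, here is a minimal sketch of how one behaves once it has been created. The variable $demoHash and the extra "teh" entry are illustrative only and do not appear in the original script.

```powershell
# Keys are the text to find; values are the replacement text.
$demoHash = @{ "misspelled" = "spelled incorrectly"; "done" = "finished" }

# Add another pair after creation (hypothetical entry).
$demoHash["teh"] = "the"

# Look up a value by key, using index notation or dotted notation.
$demoHash["done"]
$demoHash.misspelled

# Walk every key/value pair. A plain hash table does not guarantee key order.
foreach ($key in $demoHash.Keys) {
  "{0} -> {1}" -f $key, $demoHash[$key]
}
```

Adding a new word to fix is then a one-line change to the hash table, with no change to the rest of the script.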

The log file name is stored in the $logfile variable. The log will be a comma-separated value file so that it can easily be opened in Microsoft Excel. A counter variable, $i, will be used to keep track of the progress as the script walks through the collection of files. This is seen here:

$logfile = "ReplaceResults.csv"

$i=0

The Find object is used to perform the find-and-replace operation in the Word document. The operation begins when the Execute method of the Find object is called. The Execute method takes a large number of parameters. The easiest way to deal with a complicated method such as Execute is to create a collection of variables that contain the desired parameter values. This makes the method call easier to read and also makes the script easier to modify. The variables that hold the preferences for the Execute method are seen here:

$ReplaceAll = 2

$FindContinue = 1

$MatchCase = $False

$MatchWholeWord = $True

$MatchWildcards = $False

$MatchSoundsLike = $False

$MatchAllWordForms = $False

$Forward = $True

$Wrap = $FindContinue

$Format = $False
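Because Execute is called with positional arguments, the order of the values matters more than the variable names. The sketch below pairs each value with the position it occupies in the call. The parameter names follow the Word object model documentation for Find.Execute as far as I am aware; $findArgs and the sample find and replace strings are illustrative helpers, not part of the original script.

```powershell
# Each entry is a (parameter name, value) pair, listed in call order.
# 2 is wdReplaceAll (WdReplace) and 1 is wdFindContinue (WdFindWrap).
$findArgs = @(
  @("FindText",          "misspelled"),
  @("MatchCase",         $false),
  @("MatchWholeWord",    $true),
  @("MatchWildcards",    $false),
  @("MatchSoundsLike",   $false),
  @("MatchAllWordForms", $false),
  @("Forward",           $true),
  @("Wrap",              1),
  @("Format",            $false),
  @("ReplaceWith",       "spelled incorrectly"),
  @("Replace",           2)
)

# Print the position, name, and value of every argument.
$position = 0
foreach ($arg in $findArgs) {
  $position++
  "{0,2}. {1,-18} = {2}" -f $position, $arg[0], $arg[1]
}
```

Keeping the values in named variables, as the script does, means a preference such as case sensitivity can be changed in one place without recounting commas in the method call.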

The collection of Word documents that were obtained by the Get-ChildItem cmdlet is piped to the ForEach-Object cmdlet. This is seen here:

$files | ForEach-Object {

The $file variable is used to store the complete path to each Word document that comes across the pipeline. This is seen here:

  $file = $_.fullname

The $i counter variable is incremented by one as seen here:

  $i ++

Next the Write-Progress cmdlet is used to display a progress bar. The progress bar shown in the following image is the one displayed by the Windows PowerShell ISE on Windows PowerShell 2.0. The progress bar gives a visual indicator that the script is running and is continuing to process Word documents. This is the only visual indicator that the script is running or has completed:

Image of progress bar

The code that calls the Write-Progress cmdlet is seen here:

  Write-Progress -Activity "Searching For Word documents" -Status "Progress:" -PercentComplete ($i/$files.count*100)
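To try the progress bar without any Word documents on hand, a loop like the following can be used. This is only a sketch: the $demoItems list and the short sleep are stand-ins for the real collection of files and the real work.

```powershell
# Stand-in collection; in the script this is the list of Word documents.
$demoItems = "a.doc", "b.docx", "c.doc", "d.docx"
$i = 0
$percentSeen = @()

foreach ($item in $demoItems) {
  $i++
  # Fraction of items handled so far, scaled to the 0-100 range.
  $percent = $i / $demoItems.Count * 100
  $percentSeen += $percent
  Write-Progress -Activity "Searching For Word documents" -Status "Processing $item" -PercentComplete $percent
  Start-Sleep -Milliseconds 200   # simulated work so the bar is visible
}

# Remove the progress bar once the loop is done.
Write-Progress -Activity "Searching For Word documents" -Completed
```

Putting the current file name in the Status text is a small variation on the script's fixed "Progress:" label, and it makes it easier to tell which document is slow.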

The Get-WordSelection function is used to create the Word application object and the selection object. The function accepts two parameters: the path to the Word document, and a Boolean value that determines whether the Word application is visible while the files are processed. This is seen here:

 Get-WordSelection -file $_.fullname -visible $false

Now it is time to walk through the hash table that contains the words and their substitute values. The keys property is used to obtain a collection of all the keys in the hash table. This is seen here:

 foreach($FindText in $wordHash.keys)

 {

Armed with the collection of keys, it is time to search the Word document. The words that are sought are stored in the keys of the hash table, and the substitute for each word is the value associated with that key. The value is retrieved by querying the hash table with the key, which is what the expression $wordHash.$FindText does. Each key in the collection is processed in turn. This is seen here:

  $rtn = $Script:Selection.Find.Execute($FindText,$MatchCase,

   $MatchWholeWord,$MatchWildcards,$MatchSoundsLike,

   $MatchAllWordForms,$Forward,$Wrap,$Format,

   $wordHash.$FindText,$ReplaceAll)

 } #end foreach findtext
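The loop pattern itself does not depend on Word. As a rough illustration, the sketch below applies each key/value pair to an ordinary string with a whole-word, case-insensitive regular expression, which approximates the $MatchWholeWord = $True and $MatchCase = $False settings. It is a stand-in for experimenting with a word list, not a substitute for Find.Execute, and $sampleText is invented for the example.

```powershell
$wordHash = @{ "misspelled" = "spelled incorrectly"; "done" = "finished" }
$sampleText = "The word was misspelled, and the review is done. Abandoned drafts stay."

$result = $sampleText
foreach ($FindText in $wordHash.Keys) {
  # \b limits the match to whole words; Escape guards against regex metacharacters.
  $pattern = "\b" + [regex]::Escape($FindText) + "\b"
  # The -replace operator ignores case by default.
  $result = $result -replace $pattern, $wordHash[$FindText]
}
$result
```

Note that "Abandoned" is left alone even though it contains the letters "done", which is the behavior the whole-word setting is meant to give.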

The modified Word document is seen here:

Image of modified Word document

 

After the Execute method of the Find object has been called for every entry in the hash table, it is time to call the New-LogObject function. This function accepts two parameters: the path to the Word document and the Boolean return value that was captured from the Execute method. Because $rtn is overwritten on each pass through the foreach loop, the logged value reflects only the last word that was searched for. This is seen here:

 New-LogObject -document $file -replaced $rtn

One reason to pipe the collection of Word documents to the ForEach-Object cmdlet is that it makes it easy to create a CSV file from the logging objects returned by the New-LogObject function. A new logging object is created for each Word document that is processed. The Export-Csv cmdlet writes the property names on the first line of the CSV file and the values of each object on the lines that follow. In Windows PowerShell 2.0 there is no Append parameter for the Export-Csv cmdlet (one was added in later versions), so calling it once per document would overwrite the file each time. The Export-Csv cmdlet is designed to handle piped data, however, so all of the logging objects are piped to a single Export-Csv command, and the header line is written only once. This command is seen here:

} | Export-Csv -Path (join-path -Path $path -ChildPath $logfile) -NoTypeInformation

$Script:word.quit()
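The logging pattern can be tried on its own with made-up data. In this sketch the document paths and the Document and Replaced property names are assumptions chosen for illustration; the real objects come from the New-LogObject function, and its property names may differ.

```powershell
# Write to a uniquely named file in the temp folder so nothing is overwritten.
$demoLog = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ("ReplaceResults_" + [guid]::NewGuid() + ".csv")

# Hypothetical documents and results, one object per document.
"C:\docs\a.doc", "C:\docs\b.docx" | ForEach-Object {
  New-Object -TypeName PSObject -Property @{ Document = $_; Replaced = $true }
} | Export-Csv -Path $demoLog -NoTypeInformation

# The file holds one header line plus one line per object.
$csvLines = @(Get-Content -Path $demoLog)
$rows = @(Import-Csv -Path $demoLog)
$csvLines.Count
$rows[0].Document

# Clean up the temporary file.
Remove-Item -Path $demoLog
```

Because every object flows through one Export-Csv command, the header appears once no matter how many documents are processed.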

The completed log is seen here:

Image of completed log


Though I do not think this script will stand the test of time, it should make it easier for you to cope with many of the changes that take place on a daily basis in the workplace. Whether you use the ReplaceWordsLogResults.ps1 script to fix common typographical errors or standard letter transpositions, or to update documents with the latest and greatest naming conventions, you are sure to find a use for it.

For more tools, tips, and tricks to help you surf the tides of change, head over to the TechNet Script Center and check out our daily Hey, Scripting Guy! Blog posts.

Ed Wilson and Craig Liebendorfer, Scripting Guys

 
