Hey, Scripting Guy! I need to use Windows PowerShell 2.0 to convert Microsoft Word documents to the new document format. If a user opens a document with the old .doc file extension in Microsoft Word 2010 and makes changes, Word prompts them about saving the file in compatibility mode. This is confusing for our users, so we have decided to convert all of the .doc files on our file server to .docx before we roll out Office 2010. I looked around but could not find a utility or a script I could use. Is this even possible?
— SJ
Hello SJ,
Microsoft Scripting Guy Ed Wilson here. I was talking to one of my Microsoft friends in Australia the other day, and we were discussing his doing something similar to what you describe. As I read your email message, I was listening to The Avalanches on my Zune HD. The Avalanches are a group I fell in love with when I was in Melbourne, Australia, a few years ago. Because the Scripting Wife has gone to get her nails done, I got into the Tim Tams my buddy Brent brought me from Sydney, and I am drinking a cup of lapsang souchong tea (a tea that I have to put milk in when I drink it). I decided to look at some pictures I took while I was diving with the seals on my last trip down under. The following photo is one of my favorite seal pictures. One nice thing about Australia this time of year is that it is winter there. Today it was 96 degrees Fahrenheit in Charlotte with 65 percent humidity, for a heat index of 107 degrees Fahrenheit, while the high today in Sydney is 65 degrees Fahrenheit. A trip down under is a great way to beat the heat and humidity of Charlotte.
SJ, the SaveWordDocAsDocx.ps1 script searches one or more folders, which are listed in the $folderpath variable, and converts every .doc file it finds into a .docx file. The complete script is shown here.
SaveWordDocAsDocx.ps1
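[ref]$SaveFormat = "microsoft.office.interop.word.WdSaveFormat" -as [type]
$word = New-Object -ComObject word.application
$word.visible = $false
# Each path ends in \* so that -Include filters the contents of the folder
$folderpath = "c:\fso\*", "c:\fso1\*"
$fileType = "*doc"
Get-ChildItem -path $folderpath -include $fileType |
foreach-object {
  # Full path of the file without its extension, for example c:\fso\Test
  $path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
  "Converting $($_.FullName) to $path.docx ..."
  $doc = $word.documents.open($_.fullname)
  # wdFormatDocumentDefault is the Word 2007/2010 .docx format
  $doc.saveas([ref] $path, [ref]$SaveFormat::wdFormatDocumentDefault)
  $doc.close()
}
# Quit Word, drop the reference, and let the garbage collector clean up
$word.Quit()
$word = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()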
The first thing the SaveWordDocAsDocx.ps1 script does is obtain the WdSaveFormat enumeration. Strictly speaking, you do not need the enumeration; you could pass its underlying numeric value instead. However, using the enumeration makes the script more readable than a bare number, and it documents the name of the enumeration, which is very useful when you look it up on MSDN. For information about working with enumerations in Windows PowerShell, you may want to refer to the Weekend Scripter articles I recently wrote. The WdSaveFormat enumeration is stored in the $SaveFormat variable. The variable is cast to [ref] (a reference type) because the SaveAs method of the document object requires the WdSaveFormat value to be passed by reference. This line of code is shown here:
[ref]$SaveFormat = "microsoft.office.interop.word.WdSaveFormat" -as [type]
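One thing to watch: if the Word primary interop assembly is not loaded in the session, the -as operator returns $null instead of raising an error, so the enumeration lookup can fail silently. The following is a minimal sketch that resolves the enumeration when it is available and otherwise falls back to 16, which I am assuming here is the documented value of wdFormatDocumentDefault. The $wdSaveFormat and $format variable names are illustrative only.

```powershell
# Resolve the enumeration by name; -as returns $null (rather than throwing)
# when the Word interop assembly is not loaded in the current session.
$wdSaveFormat = "Microsoft.Office.Interop.Word.WdSaveFormat" -as [type]

if ($wdSaveFormat) {
    # Readable form: look the member up on the enumeration type
    $format = $wdSaveFormat::wdFormatDocumentDefault
}
else {
    # Fallback (assumption): 16 is the numeric value of wdFormatDocumentDefault
    $format = 16
}

"Save format value: $([int]$format)"
```

In the script, [ref]$format could then presumably take the place of [ref]$SaveFormat::wdFormatDocumentDefault in the SaveAs call.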
Next, the Microsoft Word application object is created. This was discussed in yesterday’s Hey, Scripting Guy! Blog post and will not be repeated today.
$word = New-Object -ComObject word.application
$word.visible = $false
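To search more than one folder, list each folder in the $folderpath variable of the SaveWordDocAsDocx.ps1 script. The folder paths are stored in an array, and each one ends with \* because the -Include parameter of Get-ChildItem is effective only when the path refers to the contents of a folder. The array is shown here: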
$folderpath = "c:\fso\*", "c:\fso1\*"
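To see the effect of the trailing \* and of the "*doc" pattern without touching a real file server, here is a self-contained sketch. It builds a throwaway folder under the system temp directory (the hsg-demo- prefix is an arbitrary, hypothetical name), creates two .doc files and one .docx file, and lists what Get-ChildItem returns.

```powershell
# Create a scratch folder with two .doc files and one .docx file
$root = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ("hsg-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $root | Out-Null
foreach ($name in "a.doc", "b.doc", "c.docx") {
    New-Item -ItemType File -Path (Join-Path -Path $root -ChildPath $name) | Out-Null
}

# The trailing * makes -Include filter the folder's contents.
# "*doc" matches names that end in "doc", so c.docx is not returned.
$found = @(Get-ChildItem -Path (Join-Path -Path $root -ChildPath "*") -Include "*doc" |
    Sort-Object -Property Name |
    ForEach-Object { $_.Name })

$found   # a.doc, b.doc

# Remove the scratch folder
Remove-Item -Path $root -Recurse -Force
```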
Now the .doc files are retrieved by using the Get-ChildItem cmdlet. This is similar to the code used yesterday and will not be discussed here.
$fileType = "*doc"
Get-ChildItem -path $folderpath -include $fileType |
foreach-object {
A file name can contain more than one dot, so the LastIndexOf method is used to find the position of the last dot in the full path. The Substring method then returns the path and file name without the file extension. A better way to create this path would be to use the Join-Path cmdlet together with properties of the file object that Get-ChildItem returns, which is an instance of the System.IO.FileInfo .NET Framework class. I talked about these properties in yesterday's Hey, Scripting Guy! post. To create the path to the file without the file extension, combine the Directory property of the FileInfo object with the BaseName property, which Windows PowerShell adds to file objects. This is illustrated here:
PS C:\> $file = Get-Item -Path C:\fso\Test.doc
PS C:\> Join-Path -Path $file.Directory -ChildPath $file.BaseName
C:\fso\Test
PS C:\>
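Inside the script's foreach-object block, the Join-Path approach might look like the following sketch. It creates a temporary file with a multi-dot name (the folder and file names are illustrative) and uses DirectoryName, the folder path as a string, together with BaseName. In the script itself, $_ would take the place of $file.

```powershell
# Scratch file whose name contains several dots
$dir = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath ("hsg-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
$file = New-Item -ItemType File -Path (Join-Path -Path $dir -ChildPath "this.is.a.file.withdots.doc")

# Folder plus the name without its extension; BaseName drops only the last extension
$path = Join-Path -Path $file.DirectoryName -ChildPath $file.BaseName

$path   # <temp folder>\this.is.a.file.withdots

Remove-Item -Path $dir -Recurse -Force
```

Unlike the string methods, this approach is not confused by a dot in a folder name, because BaseName looks only at the file name.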
In the script, a couple of string methods are used to obtain the path and the file name without the file extension. To understand exactly what is being done, suppose the string this.is.a.file.withdots.doc is stored in a variable named $test. The LastIndexOf string method returns the position of the last dot in the string; here it returns the integer 23 (positions are counted from 0). A second string method, Substring, retrieves a portion of the string. Its first parameter is the starting position, and its second parameter is the number of characters to return. Starting at position 0 and taking five characters (positions 0 through 4) returns "this.". Because counting starts at 0, the position of the last dot is also the number of characters that come before it, so starting at position 0 and taking that many characters returns the path and file name without the extension. This example is shown here:
PS C:\> $test = "this.is.a.file.withdots.doc"
PS C:\> $test.LastIndexOf(".")
23
PS C:\> $test.Substring(0,5)
this.
PS C:\> $test.Substring(0,$test.lastindexof("."))
this.is.a.file.withdots
PS C:\> $test.Substring(0,23)
this.is.a.file.withdots
PS C:\>
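One caveat with the string-method approach: if a name contains no dot at all, LastIndexOf returns -1 and Substring(0, -1) throws an exception. The "*doc" filter can match such a name (reportdoc, for example), so a guarded version is a little safer. The Remove-Extension function below is a hypothetical helper, sketched for a bare file name rather than a full path.

```powershell
# Hypothetical helper: remove everything from the last dot onward
function Remove-Extension {
    param([string]$Name)
    $dot = $Name.LastIndexOf(".")
    if ($dot -lt 0) {
        # No dot at all; Substring(0, -1) would throw, so return the name unchanged
        return $Name
    }
    # Substring(start, length): the index of the dot equals
    # the number of characters that come before it
    $Name.Substring(0, $dot)
}

Remove-Extension "this.is.a.file.withdots.doc"   # this.is.a.file.withdots
Remove-Extension "reportdoc"                     # reportdoc
```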
The command in the script uses the $_ automatic variable, which represents the current file object on the pipeline, and reads its FullName property instead of using a specific file name and path. Other than that, the command is the same one we just finished:
$path = ($_.fullname).substring(0,($_.FullName).lastindexOf("."))
Next, a status message that names the file being processed is written to the Windows PowerShell console. This command is shown here:
"Converting $($_.FullName) to $path.docx ..."
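The status message relies on two string-expansion rules: "$path.docx" expands $path and keeps .docx as literal text, but reading a property such as FullName inside double quotes needs the $( ) subexpression operator. The sketch below uses a hypothetical stand-in object instead of a real file so that it runs anywhere.

```powershell
# Stand-in for a file object on the pipeline (hypothetical values)
$item = New-Object -TypeName PSObject -Property @{ FullName = "C:\fso\Test.doc" }
$path = "C:\fso\Test"

# $(...) evaluates the property; "$item.FullName" would not expand it.
# In "$path.docx", expansion stops at the dot, so ".docx" stays literal.
$message = "Converting $($item.FullName) to $path.docx ..."
$message   # Converting C:\fso\Test.doc to C:\fso\Test.docx ...
```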
Now the full path to the current document is passed to the open method of the documents object. The returned document object is stored in the $doc variable as shown here:
$doc = $word.documents.open($_.fullname)
No other changes to the document are needed, so the SaveAs method of the document object is called next. Both the path and the save format must be passed by reference. This command is shown here:
$doc.saveas([ref]$path, [ref]$SaveFormat::wdFormatDocumentDefault)
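Passing by reference means the method receives a reference to the variable rather than a copy of its value. The following sketch illustrates [ref] in plain Windows PowerShell with a hypothetical function, Add-DocxExtension; it shows the general mechanism, not how Word uses the arguments it receives.

```powershell
# Hypothetical function that changes its caller's variable through [ref]
function Add-DocxExtension {
    param([ref]$Path)
    # .Value reads and writes the variable that the caller passed in
    $Path.Value = $Path.Value + ".docx"
}

$target = "C:\fso\Test"
Add-DocxExtension ([ref]$target)
$target   # C:\fso\Test.docx
```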
After the document has been saved in the Microsoft Word 2007 or Microsoft Word 2010 format, it is time to close the document. To do this, use the close method from the document object:
$doc.close()
}
To clean up after the script, call the Quit method of the application object and then set the $word variable to $null. Next, the Collect method of the garbage collector ([gc]) forces a garbage collection of all generations. The WaitForPendingFinalizers method then suspends the script until the finalizers of any objects awaiting cleanup have finished running. An object's Finalize method frees resources and performs other cleanup operations before the object is reclaimed by garbage collection. This section of the script is shown here:
$word.Quit()
$word = $null
[gc]::collect()
[gc]::WaitForPendingFinalizers()
SJ, that is all there is to using Windows PowerShell to automate Microsoft Word and create documents in the new file format. Microsoft Office Week will continue tomorrow.
If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson and Craig Liebendorfer, Scripting Guys