December 29th, 2009

Hey, Scripting Guy! How Can I List All the Properties of a Microsoft Word Document?


 

Hey, Scripting Guy! Question

Hey, Scripting Guy! I am trying to find a Windows PowerShell script that will list all the properties of a Microsoft Word document. I am not talking about the size and the date of the document; those are file properties. I am talking about stuff like the author, the subject, and those properties most people never know even exist for a document. Is this even possible?

— JM

 

Hey, Scripting Guy! Answer

Hello JM,

Microsoft Scripting Guy Ed Wilson here. On October 9, 2008, I wrote a similar article, How Can I Read Microsoft Excel Metadata?, that illustrates accessing the metadata of a Microsoft Excel spreadsheet. I was in Australia when I wrote it, and I have been looking through some of the pictures I took while I was in Sydney, such as the following one I took at the ANZAC War Memorial. The ANZAC War Memorial is a beautiful art deco structure, and looking at it causes me to reminisce just a bit. I absolutely love taking pictures in Sydney because of the striking architecture found there.

Image of the ANZAC War Memorial


JM, Get-WordProperties.ps1 opens a Microsoft Word document and inspects the built-in document properties. If a property contains a value, the value is displayed; otherwise, a note stating that the property does not contain a value is displayed in the Windows PowerShell console. The complete Get-WordProperties.ps1 script is seen here.

Get-WordProperties.ps1

$application = New-Object -ComObject word.application
$application.Visible = $false
$document = $application.documents.open("C:\data\ScriptingGuys\2009\HSG_12_28_09\Test.docx")
$binding = "System.Reflection.BindingFlags" -as [type]
$properties = $document.BuiltInDocumentProperties
foreach($property in $properties)
{
  $pn = [System.__ComObject].invokemember("name",$binding::GetProperty,$null,$property,$null)
  trap [system.exception]
  {
    write-host -foreground blue "Value not found for $pn"
    continue
  }
  "$pn`: " +
    [System.__ComObject].invokemember("value",$binding::GetProperty,$null,$property,$null)
}
$application.quit()

JM, if you read the Microsoft Excel metadata Hey, Scripting Guy! post I referred to earlier, you will recall that it took me two days to write the script to retrieve the metadata. The problem is that the process of retrieving the information from Windows PowerShell is not documented. Although the Microsoft Word automation model is documented, using its interfaces from Windows PowerShell was never intended, so we have to use some pretty tricky procedures to obtain the information. Luckily, the Microsoft Word automation model and the Microsoft Excel automation model are similar enough that I was able to leverage the two days of research, trial, and experimentation that led to the previous Hey, Scripting Guy! post. Because the two approaches are similar, I am not going to repeat all the background information from the previous post.

The basic Microsoft Word document properties are shown in the following image and are accessed by clicking the Office button in Office 2007, clicking Prepare, and then clicking Properties:

Image of Microsoft Word document properties


You can see the advanced document properties by opening the Document Properties drop-down list. The Document Properties dialog box is shown in the following image. As you can see, the Summary tab is completely filled out:

Image of Summary tab completely filled out

 

The Get-WordProperties.ps1 script begins by creating the application object. The application object is the main object that is used when automating Microsoft Word. To create the application object, use the New-Object cmdlet and the -ComObject parameter with the word.application program ID. Store the returned application object in the $application variable, as shown here:

$application = New-Object -ComObject word.application

There is no need for the Microsoft Word document to appear and capture the window focus, and the Get-WordProperties.ps1 script runs faster if Microsoft Word is not visible. To control this behavior, set the visible property of the application object to $false. When you make the application invisible, it is vital to call the quit method at the end of the script; otherwise, hidden instances of word.exe keep running. The line of code that makes the application invisible is shown here:

$application.Visible = $false

The document object is obtained by using the open method from the documents collection object that is returned by the documents property of the application object. Store the returned document object in the $document variable, as seen here:

$document = $application.documents.open("C:\data\ScriptingGuys\2009\HSG_12_28_09\Test.docx")
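If you only want to read the properties, you might prefer to open the document read-only and to check that the file exists first. The following is a minimal sketch, not part of the original script. The function name Open-WordDocumentReadOnly and its parameters are hypothetical; the second and third arguments passed to the open method are its ConfirmConversions and ReadOnly parameters.

```powershell
# Hypothetical helper (not part of the original script): opens a document
# read-only after confirming that the file exists.
function Open-WordDocumentReadOnly {
    param($Application, [string]$Path)

    if (-not (Test-Path -LiteralPath $Path)) {
        throw "Document not found: $Path"
    }

    # Word resolves relative paths against its own working folder,
    # so pass a fully qualified path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Arguments are FileName, ConfirmConversions, ReadOnly.
    $Application.Documents.Open($fullPath, $false, $true)
}
```

With the $application object from the script, the call would look like $document = Open-WordDocumentReadOnly -Application $application -Path $path, where $path holds the path to your own document.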

The invokemember method that is required to retrieve the document properties requires a bindingflags enumeration value. To provide access to the bindingflags enumeration values, use -as [type] and store the bindingflags enumeration in the $binding variable, as seen here:

$binding = "System.Reflection.BindingFlags" -as [type]
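To see what the -as [type] conversion gives you, the following short sketch compares the value reached through the $binding variable with the value reached through a type literal. It does not need Microsoft Word. The variable names $viaVariable and $viaLiteral are only for illustration.

```powershell
# Resolve the enumeration type from its name, as the script does.
$binding = "System.Reflection.BindingFlags" -as [type]

# The static member operator (::) reaches individual enumeration values.
$viaVariable = $binding::GetProperty
$viaLiteral  = [System.Reflection.BindingFlags]::GetProperty

# Both expressions yield the same enumeration value, so this displays True.
$viaVariable -eq $viaLiteral
```

Either form can be passed to the invokemember method; the script stores the type in a variable to keep the later lines shorter.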

You obtain the BuiltInDocumentProperties collection object by querying the BuiltInDocumentProperties property from the document object. Store the BuiltInDocumentProperties collection object in the $properties variable, as seen here:

$properties = $document.BuiltInDocumentProperties

Because the BuiltInDocumentProperties collection object is a collection, use the foreach statement to iterate through the collection, as seen here:

foreach($property in $properties)

{

Inside the foreach loop, use the invokemember method to retrieve the name of each document property from the collection of BuiltInDocumentProperties. This is shown here:

 $pn = [System.__ComObject].invokemember("name",$binding::GetProperty,$null,$property,$null)

It is entirely possible that a particular document property will be empty. If this is the case, an error will be generated. To keep the script from crashing, the trap statement is used to handle any system exception that might be generated. If an instance of the system.exception class is generated, the exception is trapped and the name of the empty document property is displayed in blue on the Windows PowerShell console. This is seen here:

  trap [system.exception]

   {

     write-host -foreground blue "Value not found for $pn"

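After the exception has been trapped, the continue statement suppresses the error and resumes the script at the statement that follows the one that failed. Because the failing statement is the last one in the foreach loop, the loop moves on to the next document property. The continue statement is seen here: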

    continue

   }

If no error occurs, the invokemember method retrieves the value of the document property, and the value is displayed after the property name. No additional error handling is needed on this line, because the trap statement shown earlier already handles the error that is raised when a property has no value. This is seen here:

  "$pn`: " +

   [System.__ComObject].invokemember("value",$binding::GetProperty,$null,$property,$null)

}
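If you are running Windows PowerShell 2.0 or later, the same loop can be written with try and catch instead of trap, which keeps the error handling next to the line that can fail. The following is a sketch under that assumption, not a replacement for the script above. The function names Get-ComProperty and Get-WordBuiltInProperty are hypothetical, and the document object is assumed to have been opened as shown earlier.

```powershell
# Hypothetical helper: reads one named property from a COM object by reflection,
# using the same invokemember call as the original script.
function Get-ComProperty {
    param($ComObject, [string]$Name)
    $binding = [System.Reflection.BindingFlags]::GetProperty
    [System.__ComObject].InvokeMember($Name, $binding, $null, $ComObject, $null)
}

# Hypothetical wrapper: lists each built-in property of an open Word document.
function Get-WordBuiltInProperty {
    param($Document)
    foreach ($property in $Document.BuiltInDocumentProperties) {
        $pn = Get-ComProperty -ComObject $property -Name "Name"
        try {
            # Reading the value raises an error when the property is empty.
            $value = Get-ComProperty -ComObject $property -Name "Value"
            "${pn}: $value"
        }
        catch {
            Write-Host -ForegroundColor Blue "Value not found for $pn"
        }
    }
}
```

After the document is open, calling Get-WordBuiltInProperty -Document $document should produce the same kind of listing as the original loop.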

After all the document properties and values have been displayed, exit Microsoft Word by calling the quit method of the application object, as shown here:

$application.quit()
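Because a hidden word.exe is easy to leave behind when a script stops early, one option is to put the quit call in a finally block so that it runs even when an error occurs. The following sketch assumes Windows PowerShell 2.0 or later and that Microsoft Word is installed. The function name Invoke-WithWord is hypothetical, and releasing the COM reference is an extra step that the original script does not perform.

```powershell
# Hypothetical wrapper: starts a hidden Word instance, runs the supplied
# script block against it, and always exits Word afterward.
function Invoke-WithWord {
    param([Parameter(Mandatory = $true)][scriptblock]$Action)

    $application = New-Object -ComObject word.application
    $application.Visible = $false
    try {
        # The script block receives the application object as its first argument.
        & $Action $application
    }
    finally {
        # Runs whether or not the script block failed.
        $application.Quit()
        # Release the COM reference (an addition to the original script).
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($application)
    }
}
```

For example, Invoke-WithWord { param($word) $word.Version } starts Word, returns its version, and exits Word again.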

When the Get-WordProperties.ps1 script runs, the output shown in the following image is seen in the Windows PowerShell ISE:

Image of script output in Windows PowerShell ISE

 

JM, that is all there is to using Windows PowerShell to get the document properties from a Microsoft Word document. Microsoft Word Week will continue tomorrow.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

 

Ed Wilson and Craig Liebendorfer, Scripting Guys

 
