August 2nd, 2012

Use PowerShell to Find Specific Word Built-in Properties

Doctor Scripto
Scripter

Summary: Microsoft Scripting Guy, Ed Wilson, talks about using Windows PowerShell to find specific built-in properties from Word documents.

Microsoft Scripting Guy, Ed Wilson, is here. Well, the script for today took a bit of work … actually, it took quite a bit of work. The script does the following:

  • Searches a specific folder for Word documents
  • Creates an array of specific Word document properties from the Word built-in document properties enumeration. The built-in Word properties are listed on MSDN.
  • Retrieves the specific built-in Word properties and their associated value
  • Creates a custom Windows PowerShell object with each of the specified properties, in addition to the full path to the Word document

Today’s script is similar to the Find All Word Documents that Contain a Specific Phrase script from yesterday, so reviewing that posting would be a good thing to do. This script also accomplishes a few of the things I wanted to do in yesterday’s script but did not get a chance to do. Namely, I return a custom object that contains the built-in properties I choose. This is a great benefit because it permits further analysis and processing of the data, and it would even permit export to a CSV file if I wish.

Working with Word Document properties

It is very difficult to work with Word document properties, and I have written several blogs about this. You should refer to those blogs for additional information. The first thing I do is create a couple of command-line parameters. This permits changing the path to search, as well as modifying the include filter that is used by the Get-ChildItem cmdlet. I also store the names of the built-in properties I want in the $AryProperties array. Next, I create the Word.Application object and set it to be invisible. Then I obtain the BindingFlags and WdSaveOptions types. The reason for WdSaveOptions is so I can close each document without saving changes, which keeps Word from modifying the last saved information on the Word files. Finally, I obtain a collection of FileInfo objects and store the returned objects in the $docs variable. This portion of the script is shown here.

Param(
  $path = "C:\fso",
  [array]$include = @("HSG*.docx","WES*.docx"))

# Names of the built-in Word properties to retrieve
$AryProperties = "Title","Author","Keywords","Number of words","Number of pages"

$application = New-Object -ComObject word.application
$application.Visible = $false

# Types used later for the late-bound COM calls and for closing without saving
$binding = "System.Reflection.BindingFlags" -as [type]
[ref]$SaveOption = "microsoft.office.interop.word.WdSaveOptions" -as [type]

$docs = Get-ChildItem -Path $path -Recurse -Include $include
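To see how the include filter behaves before pointing the script at real documents, here is a minimal sketch that builds a scratch folder of empty placeholder files. The folder and file names are made up for illustration. Only Get-ChildItem and its -Path, -Recurse, and -Include parameters come from the script.

```powershell
# Build a throwaway folder with a few empty placeholder files (hypothetical names).
$root    = Join-Path ([System.IO.Path]::GetTempPath()) ("fso_" + [guid]::NewGuid().ToString("N"))
$archive = Join-Path $root "archive"
New-Item -ItemType Directory -Path $archive -Force | Out-Null

$files = (Join-Path $root "HSG-7-23-12.docx"),
         (Join-Path $root "WES-7-28-12.docx"),
         (Join-Path $root "notes.docx"),
         (Join-Path $archive "HSG-8-1-11.docx")
foreach ($f in $files) { New-Item -ItemType File -Path $f -Force | Out-Null }

# Same filter as the script: -Include takes an array of wildcard patterns,
# and -Recurse applies them to subfolders as well.
$include = @("HSG*.docx", "WES*.docx")
$docs    = @(Get-ChildItem -Path $root -Recurse -Include $include)

# Keep just the names so the result is easy to inspect.
$found = @($docs | Select-Object -ExpandProperty Name)
$found

# Clean up the scratch folder.
Remove-Item -Path $root -Recurse -Force
```

The file notes.docx matches neither pattern, so it is left out, while the HSG file in the subfolder is picked up because of -Recurse.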

Now I need to walk through the collection of documents. I use the foreach statement. Inside the foreach loop, I open each document and return the BuiltInDocumentProperties collection. I also create a hash table that I will use to create the custom object later in the script. This portion of the code is shown here.

Foreach($doc in $docs)
 {
  $document = $application.documents.open($doc.fullname)
  $BuiltinProperties = $document.BuiltInDocumentProperties
  $objHash = @{"Path"=$doc.FullName}

It is time to work through the array of built-in properties that I selected earlier. To do this, once again I use a foreach statement. I use Try when attempting to access each built-in property because an error is generated if the property contains no value. I already know the name of the property that I want to obtain; therefore, I use it directly when obtaining the value of the property. Both the name and the value of the built-in document property are added to the hash table as a key/value pair. If an error occurs, I print a message via Write-Host that the value was not found. I use Write-Host for this so I can specify the color (blue). The code is shown here.

  foreach($p in $AryProperties)
    {Try
     {
      $pn = [System.__ComObject].invokemember("item",$binding::GetProperty,$null,$BuiltinProperties,$p)
      $value = [System.__ComObject].invokemember("value",$binding::GetProperty,$null,$pn,$null)
      $objHash.Add($p,$value) }
     Catch [system.exception]
      { write-host -foreground blue "Value not found for $p" } }
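The Try/Catch pattern can be tried without Word installed. The following is only a sketch: the $fakeProperties hash table and the Get-FakeProperty function are hypothetical stand-ins for the COM property collection, and the $missing array is an extra I added so that the skipped names can be inspected afterward.

```powershell
# Hypothetical stand-in for the Word property collection (not real COM data).
$fakeProperties = @{ "Author" = "edwils"; "Number of words" = 1398 }

function Get-FakeProperty($name) {
    # Throw for a missing entry, mimicking a built-in property that has no value.
    if (-not $fakeProperties.ContainsKey($name)) { throw "No value for $name" }
    $fakeProperties[$name]
}

$AryProperties = "Title", "Author", "Number of words"
$objHash = @{ "Path" = "C:\fso\HSG-7-23-12.docx" }
$missing = @()   # assumption: collected only for this illustration

foreach ($p in $AryProperties) {
    try {
        $value = Get-FakeProperty $p
        $objHash.Add($p, $value)
    }
    catch [System.Exception] {
        # The failed property is skipped; the loop moves on to the next name.
        $missing += $p
        Write-Host -ForegroundColor Blue "Value not found for $p"
    }
}

$objHash
```

Because the error is caught inside the loop, one property without a value does not stop the remaining properties from being collected.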

I then create a new custom PSObject and use the hash table for the properties of that object. I display that object, and close the Word document without saving any changes. Finally, I release the document object and the BuiltInProperties object, and I continue to loop through the collection of documents. This code is shown here.

   $docProperties = New-Object psobject -Property $objHash
   $docProperties
   $document.close([ref]$saveOption::wdDoNotSaveChanges)
   [System.Runtime.InteropServices.Marshal]::ReleaseComObject($BuiltinProperties) | Out-Null
   [System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
   Remove-Variable -Name document, BuiltinProperties
   }
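A regular hash table does not keep its keys in the order they were added, so the properties on the custom object can appear in a different order than the $AryProperties array. If the order matters, one option on Windows PowerShell 3.0 or later is an ordered dictionary together with the [pscustomobject] type accelerator. This is a sketch with hypothetical values, not a change to the script itself.

```powershell
# Hypothetical values standing in for what the script reads from a document.
$objHash = [ordered]@{
    "Path"            = "C:\fso\HSG-7-24-12.docx"
    "Title"           = ""
    "Author"          = "edwils"
    "Keywords"        = "guest blogger, powershell"
    "Number of words" = 1035
    "Number of pages" = 4
}

# Casting an ordered dictionary keeps the keys in insertion order.
$docProperties = [pscustomobject]$objHash

# List the property names in the order they appear on the object.
$names = @($docProperties.psobject.Properties | ForEach-Object { $_.Name })
$names
```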

When I have completed processing the collection of documents, I release the Word.Application COM object and call garbage collection. This code is shown here.

$application.quit()
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()

Using the returned objects

One reason for returning an object is that it allows for grouping, sorting, and for further processing. I could have written everything in a function, but it works just as well as a script. For example, when I run the script, it returns the following objects.

PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1

Path            : C:\fso\HSG-7-23-12.docx
Number of words : 1398
Number of pages : 4
Author          : edwils
Keywords        :
Title           :

Path            : C:\fso\HSG-7-24-12.docx
Number of words : 1035
Number of pages : 4
Author          : edwils
Keywords        : guest blogger, powershell
Title           :

Because the script returns objects, I can search the output and find only documents whose keywords contain the phrase “guest blogger” as shown here.

PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1 | where keywords -match "guest blogger"

Path            : C:\fso\HSG-7-24-12.docx
Number of words : 1035
Number of pages : 4
Author          : edwils
Keywords        : guest blogger, powershell
Title           :
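The same filter, and the CSV export mentioned earlier, can be sketched without Word by using sample objects. The two objects below are built by hand from the output shown above, and the CSV file name is a hypothetical choice. Note that the short where keywords -match form requires Windows PowerShell 3.0; the script block form used here also works on Windows PowerShell 2.0.

```powershell
# Sample objects built by hand from the output shown above (not read from Word).
$results = @(
    [pscustomobject]@{ Path = "C:\fso\HSG-7-23-12.docx"; "Number of words" = 1398; Author = "edwils"; Keywords = "" }
    [pscustomobject]@{ Path = "C:\fso\HSG-7-24-12.docx"; "Number of words" = 1035; Author = "edwils"; Keywords = "guest blogger, powershell" }
)

# Script block form of the keyword filter.
$guestPosts = @($results | Where-Object { $_.Keywords -match "guest blogger" })
$guestPosts

# Export every object to a CSV file (hypothetical file name), then read it back.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "DocumentProperties.csv"
$results | Export-Csv -Path $csvPath -NoTypeInformation
$roundTrip = @(Import-Csv -Path $csvPath)

# Remove the temporary file.
Remove-Item -Path $csvPath
```

Import-Csv returns every value as a string, so numeric columns such as Number of words need a cast (for example, [int]) before they sort numerically.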

It is even possible to modify the way the output appears and to display only the file name instead of the full path. This is shown here.

PS C:\> C:\data\ScriptingGuys\2012\HSG_7_30_12\Get-SpecificDocumentProperties.ps1 | sort "number of words" -Descending | select @{LABEL="file";EXPRESSION={split-path $_.path -Leaf}}, "number of words", author, keywords | ft -AutoSize

file             Number of words Author Keywords
----             --------------- ------ --------
HSG-7-23-12.docx            1398 edwils
HSG-7-27-12.docx            1208 edwils
HSG-8-2-11.docx             1206 edwils
hsg-9-28-11.docx            1131 edwils
HSG-7-24-12.docx            1035 edwils guest blogger, powershell
HSG-8-1-11.docx              963 edwils
HSG-7-25-12.docx             882 edwils
HSG-7-26-12.docx             848 edwils

PS C:\>

The complete Get-SpecificDocumentProperties.ps1 script is on the Scripting Guys Script Repository. 

Join me tomorrow when I will talk about programmatically assigning values to the Word documents.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

