October 13th, 2009

Hey, Scripting Guy! How Do I Count the Number of Pages in a Group of Office Word Documents?


Hey, Scripting Guy! Question

Hey, Scripting Guy! I am so mad right now, I could just about scream. In fact, if you don’t mind, I believe I will scream. We have this stupid printer at work that I have to use to print out a very important report. This printer is rather slow when printing out these reports; it prints out about 30 pages a minute. It has a hopper that holds 10 reams of paper, which is 5,000 sheets.

Why am I telling you all this? Well, when I print out the monthly report, it takes several hours to print. I generally fill the hopper with all the paper it will hold, go back to my desk, start the print job, and go home. When I come back the next day, the reports are ready if I am lucky. Tying up the printer during the day is not a real option, and I really do not want to have to stay late and babysit the machine. Why am I mad, you may ask? For two months in a row, the stupid printer has run out of paper. It seems that some of the partners have been adding extra stuff to their monthly reports (probably trying to explain their poor performance, but that is a different story). Is there a script I can use that will tell me the page count of all the Microsoft Word documents in a folder? I could then run that script and tell whether I need to reduce the number of copies of the report I am making.

— KR

Hey, Scripting Guy! Answer

Hello KR,

Microsoft Scripting Guy Ed Wilson here. Well, the sun is shining here in Charlotte, NC, in the United States. For the last couple of days, when I woke up to go running, I thought I was back in Edinburgh, Scotland. It may have been the cool, thick morning air, or maybe it is the time of year. The last time I was in Edinburgh, it was during Halloween (which is a cool time to visit that city, by the way). Anyway, KR, I am listening to Derek Trucks on my Zune, thinking about the last time I was in Edinburgh, and looking through your e-mail to scripter@microsoft.com. I was writing the Microsoft Press book, Windows PowerShell Step by Step, during my trip to Edinburgh and was reminded of the book-writing process by your e-mail. One of the things a writer is always concerned about is the number of pages that will be in the final document.

KR, I decided to write a Windows PowerShell script for you that will list the number of pages for each Word document in a folder. The complete GetPageCountOfWordDocs.ps1 script is seen here.

GetPageCountOfWordDocs.ps1

Function Set-Variables
{
 $folderpath = "C:\fso\*"
 $fileTypes = "*.docx","*.doc"
 $confirmConversion = $false
 $readOnly = $true
 $addToRecent = $false
 $passwordDocument = "password"
 $pageCountFile = "C:\fso\PageCount.csv"
 $numberOfPages = 0
 Set-OutputFile
} #end Set-Variables

Function Set-OutputFile
{
 if(Test-Path -path $pageCountFile)
   { Remove-Item -path $pageCountFile }
 "name,pageCount" >> $pageCountFile
 Get-WordDocuments
} #end Set-OutputFile

Function Get-WordDocuments
{
 "Counting pages in Word Docs in $folderPath"
 $word = New-Object -ComObject word.application
 $word.visible = $false
 Get-ChildItem -path $folderpath -include $fileTypes |
 foreach-object `
  {
   $doc = $word.documents.open($_.fullname, $confirmConversion, $readOnly,
   $addToRecent, $passwordDocument)
   $window = $doc.ActiveWindow
   $panes = $window.Panes
   $pane = $Panes.item(1)
   # Append one "name,pageCount" row for this document
   "$($_.name),$($pane.pages.count)" >> $pageCountFile
   $doc.close()
  } #end Foreach-Object
 $word.Quit()
 Get-pageCount
} #end Get-WordDocuments

Function Get-pageCount
{
 # @() forces an array, so .length also works when only one document was found
 $wdcsv = @(import-csv -path $pageCountFile)
 for ($i = 0 ; $i -le $wdcsv.length -1 ; $i++)
 {
  $numberOfPages += [int32]$wdcsv[$i].pageCount
 }
 $numberOfPages
} #end Get-pageCount

# *** Entry Point to Script ***

Set-Variables

In Microsoft Office Word 2007 and in Microsoft Word 2010, the number of pages in a document is displayed in the status bar in the lower-left corner of the window. The information can also be obtained from the Review tab by clicking the Word Count button.

The Word Count dialog box is shown here:

Image of Word Count dialog box

But, KR, as you pointed out, if you have several Microsoft Word documents, you will need to script the page count total. The GetPageCountOfWordDocs.ps1 script is based upon the CountWordsInWord.ps1 script from yesterday’s Hey, Scripting Guy! post.

Because the GetPageCountOfWordDocs.ps1 script performs the same basic steps, it uses the same functions. The first function is the Set-Variables function. The only change to this function is that a couple of variables were renamed. The first renamed variable is $pageCountFile, which contains the path to the CSV file that will be created to hold the page count of each Microsoft Word document. The second is $numberOfPages, which is used to hold the running total of pages across all of the Word documents. The Set-Variables function is shown here.

Function Set-Variables
{
 $folderpath = "C:\fso\*"
 $fileTypes = "*.docx","*.doc"
 $confirmConversion = $false
 $readOnly = $true
 $addToRecent = $false
 $passwordDocument = "password"
 $pageCountFile = "C:\fso\PageCount.csv"
 $numberOfPages = 0
 Set-OutputFile
} #end Set-Variables
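If you are wondering how Set-OutputFile can use $pageCountFile when that variable is created inside Set-Variables, the answer is PowerShell's dynamic scoping: a function can read the variables of the function that called it. The following minimal sketch shows the idea in isolation. Set-Demo, Show-Demo, and $demoFile are hypothetical names used only for this illustration; they are not part of the script.

```powershell
# Hypothetical demo functions illustrating dynamic scoping
Function Set-Demo
{
 $demoFile = "C:\fso\Demo.csv"   # created in Set-Demo's scope only
 Show-Demo                       # called from inside Set-Demo
} #end Set-Demo

Function Show-Demo
{
 # $demoFile is not defined here, so PowerShell finds it in the caller's scope
 "Show-Demo sees: $demoFile"
} #end Show-Demo

$result = Set-Demo
$result                          # Show-Demo sees: C:\fso\Demo.csv
"At script level: [$demoFile]"   # At script level: []
```

One consequence is that the $numberOfPages += line in Get-pageCount starts from the 0 assigned in Set-Variables; the assignment then creates a local copy inside Get-pageCount rather than changing the caller's variable.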

The Set-OutputFile function is the same as the Set-OutputFile function used in yesterday’s CountWordsInWord.ps1 script. The only changes are the variable name and the header names for the CSV file. The $pageCountFile variable holds the path to the CSV file that is created to hold the page count data, and the column headers in that file are name and pageCount. The Set-OutputFile function is seen here:

Function Set-OutputFile
{
 if(Test-Path -path $pageCountFile)
   { Remove-Item -path $pageCountFile }
 "name,pageCount" >> $pageCountFile
 Get-WordDocuments
} #end Set-OutputFile
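To see what the resulting file looks like to Import-Csv, the following sketch writes the same name,pageCount header plus two rows with the >> operator and reads them back. The file location under the temp folder and the sample document names and page counts are made up for illustration; the real script uses the path in $pageCountFile.

```powershell
# Hypothetical location; the script itself writes to the path in $pageCountFile
$pageCountFile = Join-Path ([System.IO.Path]::GetTempPath()) "PageCountDemo.csv"
if (Test-Path -Path $pageCountFile)
  { Remove-Item -Path $pageCountFile }

"name,pageCount" >> $pageCountFile    # header row, as in Set-OutputFile
"Report1.docx,12" >> $pageCountFile   # made-up sample rows, one per document
"Report2.doc,7" >> $pageCountFile

# Each line after the header becomes an object with name and pageCount properties
$rows = @(Import-Csv -Path $pageCountFile)
$rows | Format-Table -AutoSize

Remove-Item -Path $pageCountFile      # clean up the demo file
```

Because each row is appended as plain text, a document name that itself contains a comma would shift the columns; Export-Csv quotes such values for you if that ever becomes a problem.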

The Get-WordDocuments function begins by creating the Word.Application COM object, which was also discussed in yesterday’s Hey, Scripting Guy! post. The Get-ChildItem cmdlet returns a collection of the Microsoft Word documents in the folder and pipes the resulting objects to the ForEach-Object cmdlet. Each document is opened by using the Open method of the Documents collection. All of this was covered in yesterday’s post as well.
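Because $folderpath ends in the * wildcard, the -Include filter in $fileTypes takes effect. The following sketch shows which files a filter such as "*.docx","*.doc" picks up. It builds a throwaway folder under the temp directory (an assumption made so the example runs anywhere), and the file names are hypothetical.

```powershell
# Throwaway folder under the temp directory (stands in for C:\fso)
$demoFolder = Join-Path ([System.IO.Path]::GetTempPath()) "PageCountFilterDemo"
New-Item -ItemType Directory -Path $demoFolder -Force | Out-Null

# Hypothetical file names: two Word documents and two files that should be skipped
"Report1.docx", "Report2.doc", "Notes.txt", "Old.mydoc" | ForEach-Object {
 New-Item -ItemType File -Path (Join-Path $demoFolder $_) -Force | Out-Null
}

# -Include only takes effect here because the path ends in the * wildcard
$matched = Get-ChildItem -Path (Join-Path $demoFolder "*") -Include "*.docx", "*.doc" |
 Sort-Object -Property Name |
 Select-Object -ExpandProperty Name
$matched   # Report1.docx and Report2.doc

Remove-Item -Path $demoFolder -Recurse -Force
```

The Old.mydoc file shows why the pattern needs the period: a pattern of *doc (without the period) would match it too.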

The part of the Get-WordDocuments function that changes is the code that retrieves the page count. The first thing you need to do is retrieve a window object, which you obtain by accessing the ActiveWindow property of the document object. This is seen here:

$window = $doc.ActiveWindow

After you have a window object, you retrieve a panes collection by accessing the panes property of the window object. This is seen here:

$panes = $window.Panes

You use the item method from the panes collection to retrieve a pane object. This code is shown here:

$pane = $Panes.item(1)

After you have a pane object, you use the Pages property to retrieve a Pages collection. The Count property of the Pages collection tells you how many pages are in the document. The document name and its page count are then written as one comma-separated line and appended to the CSV file by using the >> redirection operator. This is seen here:

"$($_.name),$($pane.pages.count)" >> $pageCountFile
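The window, Panes, and Pages route shown above is the one this script uses. As an alternative, the Word Document object also exposes a ComputeStatistics method, and passing it 2 (the value of the WdStatistic constant wdStatisticPages) returns the page count. The following is a sketch under the assumption that Word 2007 or later is installed; Get-WordPageCount is a hypothetical function name, not something that ships with Word or PowerShell.

```powershell
# Hypothetical helper: page count via Document.ComputeStatistics
Function Get-WordPageCount
{
 param([Parameter(Mandatory = $true)][string]$Path)
 $wdStatisticPages = 2                  # WdStatistic constant for pages
 $word = New-Object -ComObject word.application
 $word.visible = $false
 try
 {
  # Open(FileName, ConfirmConversions, ReadOnly, AddToRecentFiles)
  $doc = $word.documents.open($Path, $false, $true, $false)
  $doc.ComputeStatistics($wdStatisticPages)   # emitted as the function's output
  $doc.close()
 }
 finally
 {
  $word.Quit()                          # always shut down the hidden Word instance
 }
} #end Get-WordPageCount
```

You would call it as Get-WordPageCount -Path "C:\fso\Report.docx" (the file name is hypothetical). Because this sketch starts and stops Word for every file, it is slower than the script's approach of reusing a single Word instance for the whole folder.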

The complete Get-WordDocuments function is seen here.

Function Get-WordDocuments
{
 "Counting pages in Word Docs in $folderPath"
 $word = New-Object -ComObject word.application
 $word.visible = $false
 Get-ChildItem -path $folderpath -include $fileTypes |
 foreach-object `
  {
   $doc = $word.documents.open($_.fullname, $confirmConversion, $readOnly,
   $addToRecent, $passwordDocument)
   $window = $doc.ActiveWindow
   $panes = $window.Panes
   $pane = $Panes.item(1)
   # Append one "name,pageCount" row for this document
   "$($_.name),$($pane.pages.count)" >> $pageCountFile
   $doc.close()
  } #end Foreach-Object
 $word.Quit()
 Get-pageCount
} #end Get-WordDocuments

The Get-pageCount function is basically the same as the Get-wordCount function from yesterday’s Hey, Scripting Guy! post. The only change is the variable names. The $pageCountFile variable refers to the CSV file that was created in the Get-WordDocuments function. The $numberOfPages variable is used to collect the total number of pages in the collection of Word documents. The Get-pageCount function is seen here:

Function Get-pageCount
{
 # @() forces an array, so .length also works when only one document was found
 $wdcsv = @(import-csv -path $pageCountFile)
 for ($i = 0 ; $i -le $wdcsv.length -1 ; $i++)
 {
  $numberOfPages += [int32]$wdcsv[$i].pageCount
 }
 $numberOfPages
} #end Get-pageCount
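For comparison, here is a sketch that totals the pageCount column with Measure-Object instead of an indexed for loop, and then applies the total to KR's printer: a 5,000-sheet hopper and about 30 pages a minute. The sample rows are made up, and the hopper math assumes single-sided printing (one page per sheet of paper).

```powershell
# Made-up rows in the same name,pageCount layout the script writes
$rows = "name,pageCount", "Report1.docx,12", "Report2.doc,7", "Report3.docx,31" |
 ConvertFrom-Csv

# Total the pageCount column; pipeline input means a one-row CSV works too
$pagesPerCopy = ($rows | ForEach-Object { [int]$_.pageCount } |
 Measure-Object -Sum).Sum

# Assumption: single-sided printing, so one page uses one sheet
$sheetsInHopper = 5000
$pagesPerMinute = 30
$maxCopies = [math]::Floor($sheetsInHopper / $pagesPerCopy)
$minutes = ($maxCopies * $pagesPerCopy) / $pagesPerMinute

"Pages per copy: $pagesPerCopy"
"Copies that fit in the hopper: $maxCopies"
"Approximate print time: $([math]::Round($minutes)) minutes"
```

With real data, you would replace the sample rows with Import-Csv -Path $pageCountFile and compare $maxCopies against the number of copies you plan to print.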

Well, KR, thanks for a cool question. As you can see, the technique for obtaining the page count from a collection of Word documents is similar to the technique for gathering the number of words. Unfortunately, it is not as straightforward as querying the value of a different property. Join us tomorrow as Word Week continues.

If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson and Craig Liebendorfer, Scripting Guys

 
