March 24th, 2013

Weekend Scripter: Convert Word Documents to PDF Files with PowerShell

Doctor Scripto
Scripter

Summary: Windows PowerShell MVP, Sean Kearney, talks about using Windows PowerShell to convert Word documents to PDF files en masse.

Microsoft Scripting Guy, Ed Wilson, is here. Today’s blog is brought to you by Windows PowerShell MVP and honorary Scripting Guy, Sean Kearney.

Previous blog posts by Sean Kearney

Take it away Sean…

My boss looks up at me today, and sighs, “I love the built-in SaveAs PDF in Word 2013. But I want to do multiple documents at the same time. Oh, if ONLY there was some way to do this in bulk.”

The wires and lights started blinking in my head. I’m about to leap out of my chair because I hear “Bulk.” Many to do, repeatable, and…

POWERSHELL!

So accessing a file in Microsoft Word programmatically is quite easy. We’ve been doing it for years.

$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION

$Doc=$Word.Documents.Open("C:\Foofile.docx")

And along the same lines, we could save this same file in the following manner.

$Doc.saveas([ref] "C:\Foofile.docx")

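If we would like to save it in an alternate format, such as PDF, things get a wee bit fancier. We have to speak a bit of .NET and pass a second argument: 17, the value Word uses for the PDF format (wdFormatPDF in its WdSaveFormat enumeration).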

$Doc.saveas([ref] "C:\Foofile.pdf", [ref] 17)
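A bare 17 is easy to mistype or forget. Here is a minimal sketch that gives it a name, assuming an illustrative variable called $wdFormatPDF (the name mirrors Word's own constant, but the variable is not something the COM object hands you):

```powershell
# Illustrative variable name; 17 is the WdSaveFormat value for PDF (wdFormatPDF)
$wdFormatPDF = 17

$Word = New-Object -ComObject Word.Application
$Doc  = $Word.Documents.Open("C:\Foofile.docx")

# Same SaveAs call, with the magic number replaced by a named value
$Doc.SaveAs([ref] "C:\Foofile.pdf", [ref] $wdFormatPDF)

# Tidy up so no hidden Word process is left running
$Doc.Close()
$Word.Quit()
```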

If your brain didn’t pop out just now, you’re OK. Let’s get a little fancier. What if I want Microsoft Word to save that PDF file with the same name as the parent without knowing the name?

Now we’re stepping into the land of fun. We can access the file name of that single document in the following way. Three of the available properties on the Word document object are the Name of the document, the Path to the document, and the FullName path. I poked about using the following cmdlet to…well, to be honest…to guess.

$Doc | get-member -membertype property *Name*

Image of command output

Note the two little nuggets. In poking about and using a similar search for Path, I found the property holding its path.

$Doc | get-member -membertype property *Path*

So I could do something like this: Open some file, get the file name information, swap out .docx with .pdf, and then save it.

Image of command output
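To see what those three properties actually hold, you can simply read them from an open document. This is a small sketch, assuming Word is installed and that C:\Foofile.docx exists:

```powershell
$Word = New-Object -ComObject Word.Application
$Doc  = $Word.Documents.Open("C:\Foofile.docx")

$Doc.Name       # file name only, without the folder
$Doc.Path       # folder that contains the document
$Doc.FullName   # folder and file name together

# Close the document and Word when finished exploring
$Doc.Close()
$Word.Quit()
```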

With these three bits of information, I don’t have to actually know the file name. Word will tell me based on the document. But let’s simplify this a bit. Let’s just take a known file name and resave it as a PDF file with the same file name. All we need to do is run a Replace() method on the provided file name, and swap .docx with .pdf.

Yes… we’re presuming it’s a .docx. :)

$File="C:\Foofolder\foofile.docx"

$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION

$Doc=$Word.Documents.Open($File)

$Doc.saveas([ref] (($File).replace(".docx",".pdf")), [ref] 17)

$Doc.close()
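One caveat with Replace(): it is case-sensitive and it swaps every match in the string, so a file named REPORT.DOCX, or a path that contains the same text somewhere else, can trip it up. A hedged alternative, if you are happy to lean on the .NET System.IO.Path class instead, is ChangeExtension, which touches only the extension:

```powershell
$File = "C:\Foofolder\foofile.docx"

# ChangeExtension replaces only the final extension, whatever its case
$PdfName = [System.IO.Path]::ChangeExtension($File, ".pdf")

$PdfName   # C:\Foofolder\foofile.pdf
```

The result can be handed to SaveAs in place of the Replace() expression.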

So that’s all fine and dandy. But what if we have a folder of DOCX files that we want to convert at once?

Let’s pretend we ran the following line in Windows PowerShell:

$Files=GET-CHILDITEM 'C:\FooFolder\*.DOCX'

Now we could cheat and access the full file name and path by accessing the FullName from the file system. But let’s have some fun with Word and ask it these questions.

Foreach ($File in $Files) {

    # open a Word document, filename from the directory

    $Doc=$Word.Documents.Open($File.fullname)

Now let’s ask Word what is the name of the file. And while we’re at it, let’s swap the .docx file extension with .pdf.

    $Name=($Doc.Fullname).replace(".docx",".pdf")

Then we’ll access that file within Microsoft Word and use the built-in SaveAs PDF option in Word 2010 or Word 2013 to produce a PDF file in the same folder as the original Word document. When done, we close the file.

    $Doc.saveas([ref] $Name, [ref] 17)

    $Doc.close()

}

So when done, our script will look like this:

# Create the Word application object

$Word=NEW-OBJECT -COMOBJECT WORD.APPLICATION

# Acquire a list of DOCX files in a folder

$Files=GET-CHILDITEM 'C:\FooFolder\*.DOCX'

Foreach ($File in $Files) {

    # open a Word document, filename from the directory

    $Doc=$Word.Documents.Open($File.fullname)

    # Swap out DOCX with PDF in the Filename

    $Name=($Doc.Fullname).replace(".docx",".pdf")

    # Save this File as a PDF in Word 2010/2013

    $Doc.saveas([ref] $Name, [ref] 17)

    $Doc.close()

}

# Close Word when all of the files are converted

$Word.Quit()
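If a document fails to open or save partway through, the loop can leave a hidden Word process behind. Here is a slightly more defensive sketch of the same idea. The function name Convert-DocxFolderToPdf and its -Folder parameter are hypothetical, made up for illustration, and the use of try/finally and ChangeExtension is a swapped-in choice rather than anything from the script above:

```powershell
# Hypothetical function name and parameter; not part of the original script
function Convert-DocxFolderToPdf {
    param([string]$Folder)

    $wdFormatPDF = 17   # WdSaveFormat value for PDF

    # Acquire the list first so Word is only started when there is work to do
    $Files = @(Get-ChildItem -Path $Folder -Filter '*.docx')
    if ($Files.Count -eq 0) { return }

    $Word = New-Object -ComObject Word.Application
    try {
        foreach ($File in $Files) {
            $Doc = $Word.Documents.Open($File.FullName)
            try {
                # Ask Word for the document name, then swap only the extension
                $Name = [System.IO.Path]::ChangeExtension($Doc.FullName, '.pdf')
                $Doc.SaveAs([ref] $Name, [ref] $wdFormatPDF)
            }
            finally {
                # Close the document even if the save failed
                $Doc.Close()
            }
        }
    }
    finally {
        # Quit Word so no hidden WINWORD.EXE process is left behind
        $Word.Quit()
    }
}

# Example call, assuming the folder used in this post:
# Convert-DocxFolderToPdf -Folder 'C:\FooFolder'
```

The finally blocks run whether or not an error occurs, which is why the cleanup lives there rather than at the end of the loop.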

So why the big secrecy about using the document names in Word itself? In the future, I’ll show you how to programmatically trigger a mail merge, and you’ll see where it’s needed there.

Cheers and remember to keep on scriptin’.

~Sean,
The Energized Tech

Thanks, Sean, for taking the time to share your scripting expertise with us today. Join me tomorrow when I have a guest blog post from Ingo Karstein about using Windows PowerShell with SharePoint. It is cool and you do not want to miss it.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy 


2 comments


  • Bộ Nguyễn

    I have a question,

    If I call the script multiple times with multiple files at the same time (about 10 seconds to generate one PDF from one DOCX), does the Word application run in parallel or synchronized?

  • 俊廷 劉

    May I ask how to automatically print multiple files (txt/html web page/jpg/docx/xlsx) to the “Microsoft Print to PDF” printer?
    I can’t automatically assign a file name to each file.
    Get-ChildItem -Path "D:\Desktop\Data\" -Recurse | ForEach-Object {
        # Default Printer is "Microsoft Print to PDF"
        Start-Process -FilePath $_.FullName -Verb Print
    }