{"id":54913,"date":"2008-11-12T11:19:00","date_gmt":"2008-11-12T11:19:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/11\/12\/hey-scripting-guy-how-can-i-convert-word-files-to-pdf-files\/"},"modified":"2008-11-12T11:19:00","modified_gmt":"2008-11-12T11:19:00","slug":"hey-scripting-guy-how-can-i-convert-word-files-to-pdf-files","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-convert-word-files-to-pdf-files\/","title":{"rendered":"Hey, Scripting Guy! How Can I Convert Word Files to PDF Files?"},"content":{"rendered":"<p><H2><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"> <\/H2>\n<P>Hey, Scripting Guy! I have a folder full of Word documents. The <A href=\"http:\/\/dictionary.reference.com\/browse\/pointy%20headed\">pointy-headed boss<\/A> (PHB) is being particularly <A href=\"http:\/\/dictionary.reference.com\/browse\/obtuse\">obtuse<\/A> this week, and has decided that he would like to have all the Word documents turned into PDF files so everyone can view them. This is despite the fact that a couple of months ago he had us deploy <A href=\"http:\/\/www.microsoft.com\/downloads\/details.aspx?FamilyID=3657ce88-7cfa-457a-9aec-f4f827f20cac&amp;displaylang=en\">Word viewer<\/A>, so everyone can view the documents. Geez, one of these days, one of these days. Luckily, we have deployed <A href=\"http:\/\/office.microsoft.com\/en-us\/downloads\/default.aspx?ofcresset=1\">Word 2007<\/A>, and have also deployed the <A href=\"http:\/\/www.microsoft.com\/downloads\/details.aspx?FamilyId=4D951911-3E7E-4AE6-B059-A2E79ED87041&amp;displaylang=en\">PDF add-in<\/A>. 
But still, there are thousands of files in there, and it will take at least two weeks to open, save, and close all those files. Any ideas? <BR><BR>&#8211; MS<\/P><IMG height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\"> \n<P>Hi MS,<\/P>\n<P>I have a great idea. Tell the PHB that it will take you at least two weeks to do this project. Bring him into your office, demonstrate creating and saving one file, and then tell him that due to the importance of the project you should not be disturbed. Go spend two weeks on the golf course. Return to work tanned and relaxed, and then run this&nbsp;script:<\/P><PRE class=\"codeSample\">$wdFormatPDF = 17\n$word = New-Object -ComObject word.application\n$word.visible = $false\n$folderpath = &quot;c:\\fso\\*&quot;\n$fileTypes = &quot;*.docx&quot;,&quot;*.doc&quot;\nGet-ChildItem -path $folderpath -include $fileTypes |\nforeach-object `\n{\n $path = ($_.fullname).substring(0,($_.FullName).lastindexOf(&quot;.&quot;))\n &quot;Converting $path to pdf ...&quot;\n $doc = $word.documents.open($_.fullname)\n $doc.saveas([ref] $path, [ref]$wdFormatPDF)\n $doc.close()\n}\n$word.Quit()<\/PRE>\n<P>To begin with, we have a folder such as the one seen in this image. 
The folder contains .doc files, .docx files, .txt files, and other files:<\/P><IMG height=\"399\" alt=\"Image of a folder with a variety of files\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/hsg\/hey1112\/hsg_pdf1.jpg\" width=\"450\" border=\"0\"> \n<P>&nbsp;<\/P>\n<P>The first thing we do in the script is create a variable that we will use to tell the <B>Word Document<\/B> object how to save our file. The value is <B>17<\/B> for a .pdf file. This is from the Word <A href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/bb238158.aspx\">enumeration value<\/A> called <B>wdSaveFormat<\/B>. The enumeration name for a .pdf file is <B>wdFormatPDF<\/B>, so we decided to use that as our variable name: <\/P><PRE class=\"codeSample\">$wdFormatPDF = 17<\/PRE>\n<P>When we have created our variable to hold the enumeration value, we need to create an instance of the <B>Word application<\/B> object. This is the main object and is one that we always need to create if we are working with Word. To create the <B>Word application<\/B> object, we use the <B>New-Object<\/B> cmdlet to specify that it is a COM object and save the resulting object into the <B>$word<\/B> variable. This is illustrated here: <\/P><PRE class=\"codeSample\">$word = New-Object -ComObject word.application<\/PRE>\n<P>We do not want to make the Word application visible because we will be iterating through a collection of many documents. To allow the Word application to be invisible, we set the <B>visible<\/B> property to <B>false<\/B>. This is seen here: <\/P><PRE class=\"codeSample\">$word.visible = $false<\/PRE>\n<P>The folder path that contains the .doc and .docx files we want to convert to .pdf files is assigned here as a string. The thing to keep in mind is that we need to include the trailing backslash and star: <B>\\*<\/B>. 
This is required by the <B>Get-ChildItem<\/B> cmdlet when the <B>-include<\/B> parameter is used without <B>-recurse<\/B>, and is seen here: <\/P><PRE class=\"codeSample\">$folderpath = &quot;c:\\fso\\*&quot;<\/PRE>\n<P>We now create an array of file extensions that we will pass to the <B>Get-ChildItem<\/B> cmdlet. We are interested in .docx and .doc files, and we enclose each of them in a set of quotation marks. We need to include the <B>*<\/B> wildcard character to tell <B>Get-ChildItem<\/B> we are interested in any .doc or .docx files. This is shown here:<\/P><PRE class=\"codeSample\">$fileTypes = &quot;*.docx&quot;,&quot;*.doc&quot;<\/PRE>\n<P>Next we call the <B>Get-ChildItem<\/B> cmdlet and tell it to find all the file types we stored in the <B>$fileTypes<\/B> variable, and to search the path we stored in the <B>$folderPath<\/B> variable. The <B>Get-ChildItem<\/B> cmdlet returns a collection of items. We take this collection of items (Word documents in this example) and pass them into the pipeline. The bar (<B>|<\/B>) character is the pipeline character. This code can be seen here: <\/P><PRE class=\"codeSample\">Get-ChildItem -path $folderpath -include $fileTypes |<\/PRE>\n<P>The <B>ForEach-Object<\/B> cmdlet is used to examine items as they come streaming through the pipeline. It hands us one item at a time, much as the <B>Foreach<\/B> statement does. The backtick character is used for line continuation. This is shown here: <\/P><PRE class=\"codeSample\">foreach-object `<\/PRE>\n<P>Now we want to retrieve the path to the file and the file name, but not the file extension. The reason we do not want the file extension is that we are going to save the file as a .pdf file. This code is a little <A href=\"http:\/\/dictionary.reference.com\/browse\/hairy\">hairy<\/A>. Let&#8217;s take it kind of slow. The <B>$_<\/B> is an automatic variable that refers to the current item on the pipeline. 
This is like your enumerator when you use the <B>Foreach<\/B> statement in either VBScript or in Windows PowerShell. We are interested in the <B>fullname<\/B> property because it includes both the path and the entire <B>filename<\/B>. Substring is a string method, and it takes two parameters. The first one is the position at which to start, and the second one is the number of characters to return. We use 0 for the first, which means to start at the beginning. We then use parentheses to group the next operation. We again use <B>$_<\/B>, which is the current item on the pipeline, and the <B>fullname<\/B> property, which includes both the path and the file name. Okay, that part is just like the previous part. We now use the <B>lastIndexOf<\/B> string method, which will return a number that represents the zero-based position of the last occurrence of a character. We are looking for the last occurrence of the period, which is used to set off the file extension. Because positions are counted from zero, that number is also the count of characters that come before the period, so we use it to tell the <B>substring<\/B> method how many characters to return from our string. This code is seen here: <\/P><PRE class=\"codeSample\">{ $path = ($_.fullname).substring(0,($_.FullName).lastindexOf(&quot;.&quot;))<\/PRE>\n<P>We print out a friendly message to let us know that the script is actually doing something. The results of the friendly message are seen in the image just below. And the code for creating the friendly message is seen here: <\/P><PRE class=\"codeSample\">&quot;Converting $path to pdf ...&quot;<\/PRE><IMG height=\"223\" alt=\"Image of the friendly message\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/hsg\/hey1112\/hsg_pdf2.jpg\" width=\"450\" border=\"0\"> \n<P>&nbsp;<\/P>\n<P>Next we need to open the Word document. To do this, we use the <B>documents<\/B> property to return a <B>documents<\/B> object, and use the <B>open<\/B> method from the <B>documents<\/B> object. 
It takes the path to the file; therefore, we give it the <B>$_<\/B> variable, which is the current item on the pipeline, and the <B>fullname<\/B> property, which includes both the file name and the path. We store the resulting <B>document<\/B> object in the <B>$doc<\/B> variable. This is shown here: <\/P><PRE class=\"codeSample\">$doc = $word.documents.open($_.fullname)<\/PRE>\n<P>Next we call the <B>saveas<\/B> method from the <B>document<\/B> object. The <B>saveas<\/B> method takes a large number of parameters. Luckily, we only need to use the first two. The path and the document format are all we need to supply. The <B>saveas<\/B> method requires the parameters to be passed as a <B>reference<\/B> type. We use the <B>[ref]<\/B> type to pass both our path string and our format number to the <B>saveas<\/B> method by reference. This is seen here: <\/P><PRE class=\"codeSample\">$doc.saveas([ref] $path, [ref]$wdFormatPDF)<\/PRE>\n<P>Because we have saved the document as a .pdf file, we now want to close the document. To do this, we use the <B>close<\/B> method from the <B>document<\/B> object that is stored in the <B>$doc<\/B> variable. After the document is closed and we are done working through the collection of files that were found by the <B>Get-ChildItem<\/B> cmdlet, it is time to close the Word application as well. To do this, we use the <B>quit<\/B> method from the <B>Word application<\/B> object that we stored in the <B>$word<\/B>&nbsp;variable: <\/P><PRE class=\"codeSample\">$doc.close()\n}\n$word.Quit()<\/PRE>\n<P>You can now see the folder in the image just below, which contains our newly created .pdf files. 
One .pdf file is created for each of the .doc or .docx files that resides in the folder:<\/P><IMG height=\"399\" alt=\"Image of the folder with newly created .pdf files\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/qanda\/hsg\/hey1112\/hsg_pdf3.jpg\" width=\"450\" border=\"0\"> \n<P>&nbsp;<\/P>\n<P>MS, that is all there is to creating .pdf files from both .doc and .docx files. I recommend you put your two weeks of free time to good use, such as with scuba diving or wood working. Or perhaps you could learn how to make the perfect <A href=\"http:\/\/en.wikipedia.org\/wiki\/Croque_monsieur\">croque monsieur<\/A>, travel to Paris, and meet the person of your dreams. Or not. See you tomorrow. <A href=\"http:\/\/en.wikipedia.org\/wiki\/Batman_%28TV_series%29\">Same bat time, same bat channel<\/A>.<\/P>\n<P><FONT class=\"Apple-style-span\" face=\"Verdana\" size=\"3\"><SPAN class=\"Apple-style-span\"><B><B>Ed Wilson and Craig Liebendorfer, Scripting Guys<\/B><\/B><\/SPAN><\/FONT><\/P><FONT class=\"Apple-style-span\" face=\"Verdana\" size=\"3\"><B><\/B><\/FONT><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have a folder full of Word documents. The pointy-headed boss (PHB) is being particularly obtuse this week, and has decided that he would like to have all the Word documents turned into PDF files so everyone can view them. This is despite the fact that a couple of months ago he [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[84,49,3,45],"class_list":["post-54913","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-microsoft-word","tag-office","tag-scripting-guy","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! 
I have a folder full of Word documents. The pointy-headed boss (PHB) is being particularly obtuse this week, and has decided that he would like to have all the Word documents turned into PDF files so everyone can view them. This is despite the fact that a couple of months ago he [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/54913","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=54913"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/54913\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=54913"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=54913"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=54913"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}