June 8th, 2007

How Can I Get a List of All the PDF Files in a Folder and Its Subfolders?

Hey, Scripting Guy! How can I get a list of all the .PDF files (only the file names; I don’t want the file path or file extension) in a folder and its subfolders, and then put that list into a text file?

— PS

Hey, PS. You know, PS, this is actually a hard question; far harder and far more complicated than it might sound. On top of that, the Scripting Guy who writes this column is on vacation; he doesn’t really have time to be bothered with stuff like this.

Well, OK, technically he’s not really on vacation; technically he’s attending – and working at – TechEd 2007. But, come on PS. The Scripting Guy who writes this column is in Orlando; you know, Disney World, Universal Studios, Sea World. The truth is, there’s no way he’s going to spend his time in Orlando writing scripts that can retrieve a list of all the PDF files in a folder and its subfolders. Sorry, but it’s just not going to happen.

Or at least we didn’t think it would happen. By an astonishing coincidence, however, Epcot Center just opened a brand-new attraction: Dr. Scripto’s Wild Ride. And guess what happens when you ride this new ride? That’s right: you get a queasy feeling in your stomach. However, you also get a script that can retrieve a list of all the PDF files in a folder and its subfolders (it’s kind of a weird ride):

strComputer = "."

Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

strFolderName = "C:\Test"

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt")

Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")

Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")

For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName
    End If
Next

For Each objFolder in colSubfolders
    GetSubFolders strFolderName
Next

Sub GetSubFolders(strFolderName)

    Set colSubfolders2 = objWMIService.ExecQuery _
        ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
            & "Where AssocClass = Win32_Subdirectory " _
                & "ResultRole = PartComponent")

    For Each objFolder2 in colSubfolders2
        strFolderName = objFolder2.Name

        Set colFiles = objWMIService.ExecQuery _
            ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
                & "ResultClass = CIM_DataFile")

        For Each objFile in colFiles
            If objFile.Extension = "pdf" Then
                objTextFile.WriteLine objFile.FileName
            End If
        Next

        GetSubFolders strFolderName
    Next
End Sub

We tried to warn you: this is a complicated procedure, far more complicated than it probably should be. That’s because neither WMI nor the FileSystemObject provides a simple, straightforward way to work with folders and subfolders. (This, by the way, is one area where Windows PowerShell represents a big improvement over previous technologies.) Because of that, we need to use a recursive function in order to get at the files in a folder, the files in that folder’s subfolders, and the files in the sub-subfolders of those subfolders. We don’t have time to explain recursion in any detail today. (Did we mention that we’re on vacation?) However, you can get an overview of recursion and recursive functions by taking a peek at the Microsoft Windows 2000 Scripting Guide.
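In the meantime, the basic shape of a recursive subroutine can be sketched in a few lines. This is only an illustration: the subroutine name ShowLevels and its two parameters are made up for this example and are not part of the PDF script. All it does is echo a level number and then call itself with the next level until it reaches the stopping point.

```vbscript
' Hypothetical example: ShowLevels is not part of the script in this column.
ShowLevels 1, 3

Sub ShowLevels(intLevel, intMaxLevel)
    ' Stopping condition: without one, the subroutine would call itself forever.
    If intLevel > intMaxLevel Then Exit Sub

    WScript.Echo "Now at level " & intLevel

    ' The recursive step: the subroutine calls itself with the next level.
    ShowLevels intLevel + 1, intMaxLevel
End Sub
```

The folder script follows the same pattern. The difference is that its stopping condition is implicit: when a folder has no subfolders, the For Each loop has nothing to loop through, so no further calls are made.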

All right, let’s get down to business and try to get through this before the amusement parks close for the day. We start out by connecting to the WMI service on the local computer. (Although by assigning a computer name to the variable strComputer we could just as easily run this script against a remote machine.) We then assign the parent folder name to the variable strFolderName. Because we want to look at the folder C:\Test (as well as all its subfolders) we assign strFolderName the value C:\Test:

strFolderName = "C:\Test"
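Going back to the strComputer variable for a moment, here is a rough sketch of what the connection might look like for a remote machine. The computer name atl-ws-01 is invented for this example; substitute the name of a computer you have administrative rights on.

```vbscript
' "atl-ws-01" is a made-up computer name used only for illustration.
strComputer = "atl-ws-01"

' Same connection string as before; only the computer name has changed.
Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
```

Keep in mind that in this case the folder path (C:\Test) is interpreted on the remote computer, while the text file created by the FileSystemObject is still written on the computer where the script runs.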

Makes sense, right? And because this is as good a time as any, we next create an instance of the Scripting.FileSystemObject and use the CreateTextFile method to create a text file named C:\Scripts\Test.txt:

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt")
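As a side note, here is a small sketch of the text file's full life cycle: create it, write a line, and close it. The script in this column never calls Close explicitly; adding that call, and passing True as the second parameter to ask for an existing file to be overwritten, are our own suggestions rather than part of the original script. The sketch assumes the folder C:\Scripts already exists.

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")

' The second parameter (True) explicitly allows an existing file to be overwritten.
' CreateTextFile does not create the folder, so C:\Scripts must already exist.
Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt", True)

' Sample value, just to have something to write.
objTextFile.WriteLine "MyFile"

' Close the file when finished so everything is flushed to disk.
objTextFile.Close
```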

That brings us to this line of code:

Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")

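What we’re doing here is using an Associators Of query to get a list of all the subfolders of the folder C:\Test. It might not look like it, but what our query is really saying is this: Get me a list of all the items associated with the directory C:\Test, provided that those items are linked to it through a subdirectory relationship (Where AssocClass = Win32_Subdirectory) and sit on the child side of that relationship (ResultRole = PartComponent). That second condition is what keeps the parent of C:\Test (the root folder C:\) out of the results.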

When all is said and done this query brings back a list of all the top-level subfolders in C:\Test; for example, C:\Test\Folder1 and C:\Test\Folder2. What it doesn’t bring back are any second-level folders; for example, we won’t get back a folder like C:\Test\Folder1\SubfolderA. To get at these sub-subfolders (that is, subfolders of a subfolder) we need to use a recursive query. That’s what the subroutine GetSubFolders is for. Programmatically, we pass this subroutine the name of each subfolder we find (for example, C:\Test\Folder1 and C:\Test\Folder2); in turn, the subroutine queries each of these subfolders for PDF files and for any sub-subfolders. If there are some sub-subfolders, the subroutine will automatically call itself and look for any sub-sub-subfolders.
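If you’d like to see for yourself what the subfolder query returns before any recursion gets involved, a minimal sketch like the following should do it. It reuses the query from the script and simply echoes the Name property of each folder that comes back; it assumes a folder named C:\Test exists on the local computer.

```vbscript
strComputer = "."
strFolderName = "C:\Test"

Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

' Same query as in the main script: top-level subfolders only.
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
        & "Where AssocClass = Win32_Subdirectory " _
            & "ResultRole = PartComponent")

' Echo the full path of each subfolder that was returned.
For Each objFolder in colSubfolders
    WScript.Echo objFolder.Name
Next
```

Run it under CScript.exe rather than WScript.exe unless you enjoy clicking OK in one message box per folder.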

Confused? To be honest, so are we. But don’t worry about it; just leave the code as-is and give it a try. To search a different folder (that is, a folder other than C:\Test) simply change the value of the variable strFolderName to the name of the folder you want to work with. For example, if you want to search C:\Windows then use this line of code:

strFolderName = "C:\Windows"

Whoa, wait a second; we aren’t done yet. Before we start calling subroutines we need to list any PDF files that happen to live in our parent folder, C:\Test. To do that, we use this line of code to return a collection of all the files (instances of the CIM_DataFile class) found in C:\Test (which is, as you recall, the value we assigned to the variable strFolderName):

Set colFiles = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='" & strFolderName & "'} Where " _
        & "ResultClass = CIM_DataFile")

Next we use this block of code to check each file and see if it has the file extension PDF (note that, in WMI, the dot is not part of the file extension):

For Each objFile in colFiles
    If objFile.Extension = "pdf" Then
        objTextFile.WriteLine objFile.FileName
    End If
Next

If we do have a PDF file then we use the WriteLine method to write the FileName to our text file:

objTextFile.WriteLine objFile.FileName

Needless to say, the FileName property is just the name of the file, without the path or the file extension. For example, if our first file is C:\Test\MyFile.pdf we’ll end up writing the following to the text file:

MyFile
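The same split between file name and file extension can be illustrated without WMI at all. The sketch below uses two FileSystemObject methods, GetBaseName and GetExtensionName, on a sample path; it is shown only to illustrate how the path breaks apart and is not how the script in this column gets its values. Both methods work on the path string itself, so the file doesn’t need to exist.

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")

' Sample path used only for illustration.
strPath = "C:\Test\MyFile.pdf"

' File name without the path or the extension.
WScript.Echo objFSO.GetBaseName(strPath)        ' MyFile

' Extension without the leading dot, the same convention WMI uses.
WScript.Echo objFSO.GetExtensionName(strPath)   ' pdf
```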

That takes care of all the files in the parent folder, C:\Test. Now, what about files in any subfolders of C:\Test? That’s what this block of code is for:

For Each objFolder in colSubfolders
    GetSubFolders strFolderName
Next

What we’re doing here is looping through the collection of subfolders; for each subfolder in C:\Test we’re calling the recursive function GetSubFolders, passing the folder path as the sole parameter:

GetSubFolders strFolderName

This is difficult to visualize, but GetSubFolders is going to do the exact same thing we just finished doing. Let’s say that our first subfolder is C:\Test\Folder1. The subroutine is going to start out by looking for all the PDF files in C:\Test\Folder1; if it finds any it’s going to write the names of those files to the text file.

After that, GetSubFolders is going to run a query and see if C:\Test\Folder1 has any subfolders of its own. Suppose it does; what happens then? Well, in that case the subroutine is going to call itself, this time passing the name of each sub-subfolder as the subroutine parameter. Like we said, this can give you a headache just thinking about it. But suppose we have a sub-subfolder named C:\Test\Folder1\SubfolderA. The subroutine will call itself and then look for PDF files in this sub-subfolder; after that, it will check to see if SubfolderA has any sub-sub-subfolders, and then, if necessary, the subroutine will call itself yet again. This will continue until it’s reached the end of the line and run out of folders.

Crazy, huh? Fortunately, VBScript does all the hard work here. All you need to do is call the recursive function (which, now that we think about it, is really a recursive subroutine, isn’t it?). VBScript will take care of the rest, keeping track of which folders you’ve looked at and which folders you haven’t looked at. Amazingly enough, by the time it finishes, this script will have looked at every single folder and subfolder in C:\Test (even if these subfolders go several levels deep) and will have written the names of any and all PDF files to the text file C:\Scripts\Test.txt. If you don’t believe us (and we don’t blame you for being skeptical) give it a try and see for yourself.
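For comparison, here is a minimal sketch of the same recursive idea written with nothing but the FileSystemObject. To be clear, this is not the approach used in this column, and the subroutine name ListPdfFiles is made up for the example. It assumes that C:\Test and C:\Scripts both exist and that the script has permission to read every subfolder; a folder it can’t open will stop the script with an error.

```vbscript
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.CreateTextFile("C:\Scripts\Test.txt", True)

' Start the walk at the parent folder.
ListPdfFiles objFSO.GetFolder("C:\Test")

objTextFile.Close

' Hypothetical helper: writes the PDF file names in one folder,
' then calls itself for each subfolder of that folder.
Sub ListPdfFiles(objFolder)
    ' Dim keeps these variables local to each call of the subroutine.
    Dim objFile, objSubfolder

    For Each objFile In objFolder.Files
        ' LCase lets MyFile.PDF and MyFile.pdf both count as PDF files.
        If LCase(objFSO.GetExtensionName(objFile.Name)) = "pdf" Then
            objTextFile.WriteLine objFSO.GetBaseName(objFile.Name)
        End If
    Next

    For Each objSubfolder In objFolder.SubFolders
        ListPdfFiles objSubfolder
    Next
End Sub
```

The trade-off is that this version has no strComputer variable to point at another machine, which is one reason a WMI-based script can still be worth the extra complexity.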

Before you start making vacation plans we should add that we were just kidding: Epcot Center does not have a new attraction called Dr. Scripto’s Wild Ride. Based on today’s column, that’s probably just as well: we doubt that even Disney could afford the liability insurance for a ride this wild.

Note. By the time most people read this, TechEd 2007 will be over and the Scripting Guys will be headed for the airport. But just because TechEd is over doesn’t mean the fun is over; be sure to check out our TechEd 2007 page for new articles, TechEd updates, and your chance to win a Dr. Scripto bobblehead doll. With TechEd 2007 the fun never ends!
