July 29th, 2008

Hey, Scripting Guy! How Can I Compare Files With the Same Name But Two Different Extensions?

Hey, Scripting Guy! Question

Hey, Scripting Guy! After doing several restores we have folders with files named name.doc AND name.do (without “C”). How can we identify such pairs of files and do something to them based on a comparison?

— TN

Hey, Scripting Guy! Answer

Hi TN,

Scripts like the one we’ll develop in this article are the best tools I know of for dealing with this sort of problem. Dealing with something like this manually just isn’t feasible. If the initial time investment isn’t costly enough, the resulting therapy bills will be. That said, be careful when you run these scripts. Make sure you test them on some representative samples of your ‘problem’ before you unleash them on entire servers. If you aren’t careful, you could easily worsen your situation.

With that disclaimer out of the way, here’s a script that provides a framework for taking an action on pairs of files with the same name but different extensions.

' Connect to the WMI service on the local computer.
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
' Retrieve every file in the target folder.
Set colFileList = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='C:\Scripts\HSG\July28\A'} Where " _
        & "ResultClass = CIM_DataFile")
NumberOfFiles = colFileList.Count
' Compare each file with its neighbor and hand matching pairs to DealWithDups.
For i=0 to NumberOfFiles-1
   If i+1 <= NumberOfFiles-1 Then
      If colFileList.ItemIndex(i).FileName = colFileList.ItemIndex(i+1).FileName Then
          DealWithDups colFileList.ItemIndex(i), colFileList.ItemIndex(i+1)
      End If
   End If
Next
' Placeholder action: display the full names of both files in the pair.
Sub DealWithDups (objFile1, objFile2)
      strFile1 = objFile1.FileName & "." & objFile1.Extension
      strFile2 = objFile2.FileName & "." & objFile2.Extension
      WScript.Echo strFile1 & ":" & strFile2
End Sub

The first section of the script is just the boilerplate code required to start working with WMI.

strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")

The next section of code uses the Win32_Directory WMI class. It is an association query that retrieves a collection of instances of CIM_DataFile corresponding to all the files in the specified directory: C:\Scripts\HSG\July28\A. That last sentence is a terrible mouthful. Basically, this section of code enables us to work on all the files in a directory. The resulting collection of file objects is stored in the variable named colFileList.

Set colFileList = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='C:\Scripts\HSG\July28\A'} Where " _
        & "ResultClass = CIM_DataFile")

It’s in the next section of code that we introduce the logic that identifies our pairs.

NumberOfFiles = colFileList.Count
For i=0 to NumberOfFiles-1
   If i+1 <= NumberOfFiles-1 Then
      If colFileList.ItemIndex(i).FileName = colFileList.ItemIndex(i+1).FileName Then
          DealWithDups colFileList.ItemIndex(i), colFileList.ItemIndex(i+1)
      End If
   End If
Next

Instead of using our tried and true For Each loop, we use a For loop. We do this so that we can use indexes i and i+1 to access two files in the colFileList collection each time through the loop. So, we end up stepping through each file in the folder, always accessing the file “to its right” as well. Since the files are returned in alphabetical order, any files that have the same name but differing extensions will be beside each other in the collection. (The script depends on that ordering, so it is worth confirming on your test folder.) For example, suppose you have these files in the collection:

(a.doc, a.do, b.doc, c.doc, d.doc, d.do, e.doc, f.doc)

Then (a.doc, a.do) will be compared, as will (a.do, b.doc), (b.doc, c.doc), (c.doc, d.doc), (d.doc, d.do), (d.do, e.doc) and (e.doc, f.doc). Only (a.doc, a.do) and (d.doc, d.do) share a file name, so only those two pairs get passed along for processing. We probably shouldn’t expect Professor Knuth to send us a check for $2.56 in celebration of the elegance of this algorithm. But it does seem to get the job done.
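If you want to watch the neighbor comparison work without touching any real files, here is a minimal stand-alone sketch. It is not part of the article's script: the array simply stands in for colFileList, and BaseName is a hypothetical helper written for this illustration. Note that the loop stops one short of the last index, which is an alternative to the bounds check used in the main script.

```vbscript
' Stand-alone sketch: an array of made-up names stands in for colFileList.
arrFiles = Array("a.doc", "a.do", "b.doc", "c.doc", "d.doc", "d.do", "e.doc", "f.doc")

' Stop one short of the last index so that i + 1 is always valid.
For i = 0 To UBound(arrFiles) - 1
    If BaseName(arrFiles(i)) = BaseName(arrFiles(i + 1)) Then
        WScript.Echo arrFiles(i) & ":" & arrFiles(i + 1)
    End If
Next

' Hypothetical helper: returns everything before the last period in a name.
Function BaseName(strName)
    intDot = InStrRev(strName, ".")
    If intDot > 0 Then
        BaseName = Left(strName, intDot - 1)
    Else
        BaseName = strName
    End If
End Function
```

Run under cscript, this should echo a.doc:a.do and d.doc:d.do, the only two neighbors in the sample list that share a name.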

This If statement ensures that we don’t try to test the last file in the collection against the one to the right of it (since there isn’t one to the right of it).

If i+1 <= NumberOfFiles-1 Then

Then, this line of code determines whether the names of the pair of files being examined are the same.

If colFileList.ItemIndex(i).FileName = colFileList.ItemIndex(i+1).FileName Then

Many scripters are familiar with WMI classes and how to look up properties of a class. In this case, we’re using CIM_DataFile (look back at the query involving Win32_Directory) and, once you know that, it’s straightforward to look up the properties of that class and find that FileName is the one you need. Less straightforward is how to figure out that you need to use ItemIndex to access individual files in the collection.

You have to first recognize that colFileList was populated as a result of a call to ExecQuery. Looking up the reference information for this method in the WMI Scripting API Objects reference, we find that it returns something called an SWbemObjectSet. And, looking at the documentation for SWbemObjectSet, you find out that ItemIndex is the method you need to call if you want to index into the collection using an integer index. Yes, it’s a little difficult to follow. You have WMI classes on one hand and the WMI Scripting Object library on the other; so there are two different sets of WMI reference documentation that you need to use. Of course, you might also need to look up VBScript information as well. Geesh, this scripting stuff is almost like hard work.
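To make the difference concrete, here is a rough sketch that reaches the same collection both ways. It reuses the folder path from the script above; objFirst is just a name chosen for this illustration. One thing worth knowing before you rely on it: the SWbemObjectSet documentation lists ItemIndex as available starting with Windows Vista, so on older systems only the For Each half will work.

```vbscript
' Sketch: two ways to reach the files in an SWbemObjectSet.
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colFileList = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='C:\Scripts\HSG\July28\A'} Where " _
        & "ResultClass = CIM_DataFile")

' For Each visits every item but offers no way to look at the "next" one.
For Each objFile In colFileList
    WScript.Echo objFile.FileName & "." & objFile.Extension
Next

' ItemIndex takes a zero-based integer, so neighbors can be reached with i and i+1.
If colFileList.Count > 0 Then
    Set objFirst = colFileList.ItemIndex(0)
    WScript.Echo "First file: " & objFirst.FileName & "." & objFirst.Extension
End If
```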

OK, so once you’ve done the comparison to see if you’re dealing with a pair of files that have the same name (they must have different extensions–you can’t have two files named exactly the same thing in a single folder), you can then take some action on them. You might want to delete the oldest or decide which to archive based on some other criteria. The Hey, Scripting Guy archives of issues dealing with files provide a good starting point to give you some idea of what’s possible.

In this article, we’re going to just set things up by creating a subroutine called DealWithDups where you can add whatever sort of processing you might want. Here’s the line of code that calls our subroutine, passing file objects that represent each member of the pair:

DealWithDups colFileList.ItemIndex(i), colFileList.ItemIndex(i+1)

And here’s the subroutine. In this case we are just rebuilding the complete name of each file and displaying them both on the screen, separated by a colon.

Sub DealWithDups (objFile1, objFile2)
      strFile1 = objFile1.FileName & "." & objFile1.Extension
      strFile2 = objFile2.FileName & "." & objFile2.Extension
      WScript.Echo strFile1 & ":" & strFile2
End Sub

This lets you test that the pairs are being correctly identified. It also forces you to follow the advice we started out with–to test things before running scripts of this sort. Running the above script against a test folder will show you which pairs of files objFile1 and objFile2 actually point to.

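Once you’re confident that the identified pairs are the ones you want to act on, you can add logic similar to the following to actually take action. This deletes the larger of the two files; if both are the same size, it leaves them alone.

Sub DealWithDups (objFile1, objFile2)
   ' FileSize is a 64-bit value that WMI hands to scripts as a string,
   ' so convert it to a number before comparing.
   If CDbl(objFile1.FileSize) > CDbl(objFile2.FileSize) Then
      objFile1.Delete
   ElseIf CDbl(objFile2.FileSize) > CDbl(objFile1.FileSize) Then
      objFile2.Delete
   End If
End Sub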
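Deleting the oldest file of a pair was mentioned earlier as another possibility. Here is a hedged sketch of how DealWithDups might pick the older file using the LastModified property of CIM_DataFile. ToVarDate is a hypothetical helper written for this example; it leans on the SWbemDateTime scripting object to turn WMI's datetime string into an ordinary VBScript date. The sketch only reports the older file, and the Delete call is left commented out until you have checked the output. It is meant to replace the DealWithDups subroutine in the main script rather than to run on its own.

```vbscript
' Sketch of an alternative DealWithDups: report the older file of a pair.
' If both timestamps are identical, objFile2 is reported.
Sub DealWithDups (objFile1, objFile2)
    If ToVarDate(objFile1.LastModified) < ToVarDate(objFile2.LastModified) Then
        Set objOlder = objFile1
    Else
        Set objOlder = objFile2
    End If
    WScript.Echo "Older: " & objOlder.FileName & "." & objOlder.Extension
    ' objOlder.Delete   ' uncomment only after checking the output
End Sub

' Hypothetical helper: converts a CIM datetime string to a VBScript date.
Function ToVarDate(strCimDate)
    Set objDateTime = CreateObject("WbemScripting.SWbemDateTime")
    objDateTime.Value = strCimDate
    ToVarDate = objDateTime.GetVarDate(True)   ' True = express as local time
End Function
```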

Now, this only works on pairs of duplicates. What if you have triplets? What if you don’t know ahead of time what you have? Hey, TN only asked for pairs. If you want to get fancy, you’ve come to the wrong place.

Maybe try Professor Knuth.
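If the professor isn't returning your calls, here is one rough way to cope with triplets and beyond. This is a sketch under assumptions, not a tested replacement for the script above: it groups files by name in a Scripting.Dictionary (dictGroups, strKey and strFull are names made up for this example) and echoes every group that has more than one member. A side benefit is that it does not care what order the files come back in.

```vbscript
' Sketch: group files by name so that sets of any size are reported once.
strComputer = "."
Set objWMIService = GetObject("winmgmts:" _
    & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
Set colFileList = objWMIService.ExecQuery _
    ("ASSOCIATORS OF {Win32_Directory.Name='C:\Scripts\HSG\July28\A'} Where " _
        & "ResultClass = CIM_DataFile")

Set dictGroups = CreateObject("Scripting.Dictionary")
dictGroups.CompareMode = vbTextCompare   ' treat Name and NAME as the same key

' Build one colon-separated list of full file names per shared name.
For Each objFile In colFileList
    strKey = objFile.FileName
    strFull = strKey & "." & objFile.Extension
    If dictGroups.Exists(strKey) Then
        dictGroups(strKey) = dictGroups(strKey) & ":" & strFull
    Else
        dictGroups.Add strKey, strFull
    End If
Next

' A colon can't appear in a file name, so its presence means two or more files.
For Each strName In dictGroups.Keys
    If InStr(dictGroups(strName), ":") > 0 Then
        WScript.Echo dictGroups(strName)
    End If
Next
```

As with the original script, this version only echoes what it finds; deciding what to do with each group is still up to you.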
