April 13th, 2005

How Can I Eliminate Duplicate Names in a Text File?


Hey, Scripting Guy! I have a text file that contains a bunch of names. How can I read through that text file and eliminate all the duplicate names?

— MW


Hey, MW. We’re assuming you have a text file that looks something like this:

Ken Myer
Dean Tsaltas
Jonathan Haas
Ken Myer
Dean Tsaltas
Syed Abbas
Gail Erickson
Carol Phillips
Dean Tsaltas
Dylan Miller
Kim Abercrombie
Dylan Miller

As you can see, there are a number of duplicate names in this list. For example, you have three Dean Tsaltases. We actually know Dean Tsaltas and – take it from us – one Dean Tsaltas is enough for any organization! But how can you get rid of all those duplicate names?

Well, to tell you the truth, there are several different ways to do this. We decided to use the Script Runtime’s Dictionary object because we thought that, all things considered, it was the simplest and easiest approach. What we’re going to do is read in the names from the text file, one-by-one. We’ll read in the first name – Ken Myer – and then store that name in the Dictionary. We’ll then read in the second name but – before we add it to the Dictionary – we’ll check to see if the name is already in the Dictionary. If it’s not, we’ll add it. If it is in the Dictionary, then we won’t add it and, instead, we’ll just go ahead and read in the third name. When we’re done, our Dictionary will contain a list of all the unique names, no duplicates.

Here’s a script that will echo back the unique names (we’ll modify this script in a minute so that it writes the unique names back to the text file):

Const ForReading = 1

Set objDictionary = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objFile = objFSO.OpenTextFile _
    ("c:\scripts\namelist.txt", ForReading)

Do Until objFile.AtEndOfStream
    strName = objFile.ReadLine
    If Not objDictionary.Exists(strName) Then
        objDictionary.Add strName, strName
    End If
Loop

objFile.Close

For Each strKey in objDictionary.Keys
    Wscript.Echo strKey
Next

We begin by defining a constant – ForReading – that we need when we open the text file. Next we create instances of two different objects: the Dictionary object and the FileSystemObject. Finally, we use the OpenTextFile method to open the file C:\Scripts\Namelist.txt for reading.

Now we’re ready to have some fun. What we do next is set up a Do Loop that enables us to read the text file line-by-line (and, by extension, name-by-name). Each time we read in a line, we store that value in a variable named strName. We then use this line of code to see if that particular name is already in the Dictionary:

If Not objDictionary.Exists(strName) Then

Yes, it’s kind of clumsy syntax, but this can be read as “If the name stored in the variable strName does not exist in the Dictionary, then do the following line of code.” In that following line of code, we simply add that name to the Dictionary, using the value of strName for both the item key and the item value. (For more information about the Dictionary object, see the Dictionary object section of the Microsoft Windows 2000 Scripting Guide.) If the name does exist in the Dictionary, then we simply loop around and read in the next line. This continues until we’ve processed each line in the text file.
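One thing to keep in mind: by default the Dictionary compares keys in binary mode, which means it’s case-sensitive, so Ken Myer and ken myer would both survive. If your list might mix capitalization, here’s a minimal sketch of a case-insensitive version, assuming you want to keep whichever spelling shows up first. It sets the Dictionary’s CompareMode property to vbTextCompare, which has to happen before you add anything. To keep the example short, the names come from a hypothetical in-memory array (arrNames) standing in for the lines read from the text file:

```vbscript
' Create the Dictionary and switch it to case-insensitive (text) comparison.
' CompareMode must be set while the Dictionary is still empty.
Set objDictionary = CreateObject("Scripting.Dictionary")
objDictionary.CompareMode = vbTextCompare

' Hypothetical sample data standing in for the lines in namelist.txt
arrNames = Array("Ken Myer", "ken myer", "Dean Tsaltas", "DEAN TSALTAS")

For Each strName in arrNames
    ' With vbTextCompare, "ken myer" matches the existing "Ken Myer" key
    If Not objDictionary.Exists(strName) Then
        objDictionary.Add strName, strName
    End If
Next

' Echoes Ken Myer and Dean Tsaltas (the first spelling of each name)
For Each strKey in objDictionary.Keys
    Wscript.Echo strKey
Next
```

In place of the array, you can drop the same CompareMode line into the original script right after the Dictionary is created.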

What does that give us? Well, it gives us a Dictionary that contains a set of keys, each key representing a unique name. If we want to see those names, we can simply loop through the Keys collection and echo the value of each Dictionary key:

For Each strKey in objDictionary.Keys
    Wscript.Echo strKey
Next

Child’s play.

Of course, you didn’t necessarily want to see the unique names; you wanted to delete duplicate names from your text file. That’s easy enough. Our Dictionary object now contains the unique names, with all the duplicates weeded out. To remove the duplicate names from the file itself, all we have to do is open that file (for writing this time) and replace the existing contents with our Dictionary keys. Because the Dictionary keys represent the unique names, writing those keys to the text file will give us a text file with unique names, no duplicates.

Here’s a modified script that writes the Dictionary keys back to the text file:

Const ForReading = 1
Const ForWriting = 2

Set objDictionary = CreateObject("Scripting.Dictionary")
Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objFile = objFSO.OpenTextFile _
    ("c:\scripts\namelist.txt", ForReading)

Do Until objFile.AtEndOfStream
    strName = objFile.ReadLine
    If Not objDictionary.Exists(strName) Then
        objDictionary.Add strName, strName
    End If
Loop

objFile.Close

Set objFile = objFSO.OpenTextFile _
    ("c:\scripts\namelist.txt", ForWriting)

For Each strKey in objDictionary.Keys
    objFile.WriteLine strKey
Next

objFile.Close

If you run the script and then open the text file, you should see something like this:

Ken Myer
Dean Tsaltas
Jonathan Haas
Syed Abbas
Gail Erickson
Carol Phillips
Dylan Miller
Kim Abercrombie

See? We’re down to one Dean Tsaltas. Now if we can just figure out a way to get rid of that Dean Tsaltas, then we’d have something. (Just kidding, Dean.)
