{"id":55803,"date":"2008-04-11T01:02:00","date_gmt":"2008-04-11T01:02:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/04\/11\/hey-scripting-guy-how-can-i-delete-duplicate-entries-in-a-tab-delimited-file\/"},"modified":"2008-04-11T01:02:00","modified_gmt":"2008-04-11T01:02:00","slug":"hey-scripting-guy-how-can-i-delete-duplicate-entries-in-a-tab-delimited-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-delete-duplicate-entries-in-a-tab-delimited-file\/","title":{"rendered":"Hey, Scripting Guy! How Can I Delete Duplicate Entries in a Tab-Delimited File?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/> <\/p>\n<p>Hey, Scripting Guy! How can I delete the duplicate entries in a tab-delimited file?<br \/>&#8212; ST<\/p>\n<p><img decoding=\"async\" height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\" \/><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/><a href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><img decoding=\"async\" class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\" \/><\/a> <\/p>\n<p>Hey, ST. Before we start, is anyone out there familiar with the book <a href=\"http:\/\/www.amazon.com\/gp\/product\/product-description\/0061374059\/ref=dp_proddesc_0?ie=UTF8&amp;n=283155&amp;s=books\" target=\"_blank\"><b>Not Quite What I Was Planning: Six-Word Memoirs by Writers Famous and Obscure<\/b><\/a>? Not Quite What I Was Planning is an interesting little book in which the editors of <a href=\"http:\/\/www.smithmag.net\/sixwords\/\" target=\"_blank\"><b>SMITH magazine<\/b><\/a> asked people to summarize their lives in six words. Examples from both the book and the magazine include the following:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Found true love, married someone else.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>I still make coffee for two.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Born good. Went bad. Good again.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Road less traveled, now know why<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Oh, and this one from Amy Sedaris:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Mushrooms. Clowns. Wands. Five. Wig. Thatched.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Well, OK; never mind about that last one. <\/p>\n<p>At any rate, the Scripting Guy who writes this column thought it might be cool to apply this idea to <i>Hey, Scripting Guy!<\/i> After all, if he could answer the day\u2019s question in just six words, well, that would save him a <i>heck<\/i> of a lot of time and effort. Unfortunately, though, he had to quickly give up on that idea; as it turns out, the Scripting Guy who writes this column can\u2019t even refer to <i>himself<\/i> using just six words:<\/p>\n<table class=\"numberedList\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>1.<\/p>\n<\/td>\n<td class=\"\">\n<p>The<\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>2.<\/p>\n<\/td>\n<td class=\"\">\n<p>Scripting<\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>3.<\/p>\n<\/td>\n<td class=\"\">\n<p>Guy<\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>4.<\/p>\n<\/td>\n<td class=\"\">\n<p>who<\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>5.<\/p>\n<\/td>\n<td class=\"\">\n<p>writes<\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>6.<\/p>\n<\/td>\n<td class=\"\">\n<p>this <\/p>\n<\/td>\n<\/tr>\n<tr vAlign=\"top\">\n<td class=\"listNumber\" noWrap align=\"right\">\n<p>7.<\/p>\n<\/td>\n<td class=\"\">\n<p>column<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Needless to say, he tends to be a little long-winded from time-to-time. <\/p>\n<table class=\"dataTable\" id=\"ETE\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p class=\"lastInCell\">Actually he did come up with <i>one<\/i> possibility: \u201cHere is the code. Good luck.\u201d However, he wasn\u2019t sure if anyone would count that as having sufficiently explained the script and how it works.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>Therefore, he decided to compromise a little: we\u2019ll do <i>part<\/i> of today\u2019s column in six-word blocks, enclosing those six-word explanations in square brackets. The rest of the column will be done in typical <i>Hey, Scripting Guy!<\/i> fashion. [Typical fashion. Why are you booing?]<\/p>\n<p>So let\u2019s get started. How can you delete duplicate entries from a tab-delimited file? [Beats us. Try reading the article.]<\/p>\n<table class=\"dataTable\" id=\"EIF\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p class=\"lastInCell\"><b>Note<\/b>. Say what you want, but it\u2019s still better than the one Amy Sedaris came up with. And she\u2019s a <i>real<\/i> writer and comedian.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>Here\u2019s a script that can delete duplicate entries from a tab-delimited file (although we\u2019re ashamed to admit that the script requires more than six words):<\/p>\n<pre class=\"codeSample\">Const ForReading = 1\nConst ForWriting = 2\n\nSet objDictionary = CreateObject(\"Scripting.Dictionary\")\n\nSet objFSO = CreateObject(\"Scripting.FileSystemObject\")\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n\nstrContents = objFile.ReadAll\nobjFile.Close\n\nstrContents = Replace(strContents, vbCrLf, vbTab)\narrContents = Split(strContents, vbTab)\n\nFor Each strItem in arrContents\n    If Not objDictionary.Exists(strItem) Then\n        objDictionary.Add strItem, strItem   \n    End If\nNext\n\ni = 1\n\nFor Each strKey in objDictionary.Keys\n    If i &lt; 3 Then\n        strNewContents = strNewContents &amp; strKey &amp; vbTab\n        i = i + 1\n    Else\n        strNewContents = strNewContents &amp; strKey &amp; vbCrLf\n        i = 1\n    End If\nNext\n\nSet objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForWriting)\nobjFile.Write strNewContents\nobjFile.Close\n<\/pre>\n<p>As you can see, our script starts out by creating a pair of constants, ForReading and ForWriting, a pair of constants we\u2019ll need when we open our text file. [Two constants. Must open file twice.] After we define the constants we create instances of the objects <b>Scripting.Dictionary<\/b> and <b>Scripting.FileSystemObject<\/b>, then use the following line of code to open the file C:\\Scripts\\Test.txt for reading:<\/p>\n<pre class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForReading)\n<\/pre>\n<p>As soon as the file is open we use the <b>ReadAll<\/b> method to read the entire contents of the file into a variable named strContents, then call the <b>Close<\/b> method to (temporarily) close Test.txt. [We\u2019ve got the file. Now what?]<\/p>\n<p>Before we do anything else let\u2019s take a look at the file we just opened (and, not coincidentally, the value of the variable strContents):<\/p>\n<pre class=\"codeSample\">Apple    Apple    Apple\nBanana   Cherry   Cherry\nCherry   Date     Fig\nFig      Lemon    Orange\nOrange   Peach    Peach\nPear     Pear     Pear\n<\/pre>\n<p>As you can see, this is a simple three-column list of various fruits. As you can also see, many of these fruits appear more than once; for example, we have three listings for Apple and two for Peach. [One date. Story of our lives.] What we want to do now is eliminate all the duplicate listings. [Originals only. Duplicates need not apply.] In other words, we want a three-column list that looks like this:<\/p>\n<pre class=\"codeSample\">Apple    Banana    Cherry\nDate     Fig       Lemon\nOrange   Peach     Pear\n<\/pre>\n<p>Good question: how <i>are<\/i> we going to do that? Well, for starters, we need a list of words that\u2019s a bit easier to work with. In fact, what we really need is a <i>one<\/i>-column list of words rather than this fancy three-column format. [One column\u2019s great; three\u2019s a crowd.] In order to combine our collection of words into a single list the first thing we do is replace any instances of the carriage return-linefeed character (<b>vbCrLf<\/b>) with a tab character (<b>vbTab<\/b>):<\/p>\n<pre class=\"codeSample\">strContents = Replace(strContents, vbCrLf, vbTab)\n<\/pre>\n<p>What this does is give us a single line of words, with each word in that line separated by a tab. In other words, something similar to this (except that this single line would continue to scroll on and on and on):<\/p>\n<pre class=\"codeSample\">Apple    Apple    Apple    Banana  Cherry   Cherry\n<\/pre>\n<p>[One lone banana. Life isn\u2019t fair.]<\/p>\n<p>Granted, that might not seem like we\u2019ve made much progress. Believe it or not, however, we have. [Holy smokes: Scripting Guys make progress!] Now that <i>all<\/i> our words are separated by tabs we can use the <b>Split<\/b> function to split strContents on the tab character:<\/p>\n<pre class=\"codeSample\">arrContents = Split(strContents, vbTab)\n<\/pre>\n<p>Why would we want to do that? Because that gives us an array named arrContents, an array consisting of the following elements:<\/p>\n<pre class=\"codeSample\">Apple \nApple \nApple\nBanana\nCherry\nCherry\nCherry\nDate  \nFig\nFig   \nLemon \nOrange\nOrange\nPeach \nPeach\nPear  \nPear  \nPear\n<\/pre>\n<p>Now that we have a nice little array like this one it\u2019s actually quite easy to eliminate the duplicate items. To that end, we first set up a For Each loop designed to loop through all the items in the array arrContents. [For Each: everyone gets a turn!]:<\/p>\n<pre class=\"codeSample\">For Each strItem in arrContents\n<\/pre>\n<p>Inside that loop we take the first item in the array, and use the <b>Exists<\/b> method to see if that item exists in the Dictionary [See? We <i>never<\/i> forget the Dictionary.]:<\/p>\n<pre class=\"codeSample\">If Not objDictionary.Exists(strItem) Then\n<\/pre>\n<p>If the item <i>does<\/i> exist in the Dictionary that means this is a duplicate item; in that case, we simply go back to the top of the loop and repeat the process with the next item in the array. If the item <i>doesn\u2019t<\/i> exist in the Dictionary then we use the following line of code to add item to the Dictionary, using the word itself as both the Dictionary key and item:<\/p>\n<pre class=\"codeSample\">objDictionary.Add strItem, strItem\n<\/pre>\n<p>By the time we\u2019re done our Dictionary should contain the following keys (and items):<\/p>\n<pre class=\"codeSample\">Apple \nBanana\nCherry\nDate  \nFig\nLemon \nOrange\nPeach \nPear\n<\/pre>\n<p>That\u2019s pretty good, except now we need to turn this one-column list back into a three-column list. [Sometimes three <i>is<\/i> better than one.] To do that we first assign the value 1 to a counter variable named i:<\/p>\n<pre class=\"codeSample\">i = 1\n<\/pre>\n<p>We then set up another For Each loop, this one designed to walk through all the keys in the Dictionary object [Can never have too many loops.]:<\/p>\n<pre class=\"codeSample\">For Each strKey in objDictionary.Keys\n<\/pre>\n<p>Because we want a three-column list, the first thing we do inside the loop is check to see if the value of our counter variable i is less than 3. If it is, we execute this block of code:<\/p>\n<pre class=\"codeSample\">strNewContents = strNewContents &amp; strKey &amp; vbTab\ni = i + 1\n<\/pre>\n<p>As you can see, there\u2019s nothing too complicated going on here. [Scripting Guys: Simple minds, simple scripts.] In line 1 we\u2019re assigning a value to a variable named strNewContents; to be a little more specific, we\u2019re assigning this variable the existing value of strContents <i>plus<\/i> the value of the Dictionary key <i>plus<\/i> a tab character. In line 2, we then increment the value of our counter variable by 1. What does all that mean? That means that, after three trips through the loop, strContents will be equal to this:<\/p>\n<pre class=\"codeSample\">Apple    Banana  Cherry\n<\/pre>\n<p>Now, what\u2019s going to happen the <i>next<\/i> time through the loop? Well, the next time through the loop the counter variable i will be equal to 3. That means we\u2019re going to execute these two lines of code instead:<\/p>\n<pre class=\"codeSample\">strNewContents = strNewContents &amp; strKey &amp; vbCrLf\ni = 1\n<\/pre>\n<p>Notice the differences? In this block of code, we\u2019re tacking a carriage return-linefeed character onto the end of the string instead of adding a tab character. On top of that, we also reset the value of i to 1. Those two things ensure that the next item in our list (the fourth item) will show up in the first column of row 2 rather than the fourth column of row 1.<\/p>\n<p>As soon as we\u2019ve reformatted strNewContents we reopen the file Test.txt, this time for writing:<\/p>\n<pre class=\"codeSample\">Set objFile = objFSO.OpenTextFile(\"C:\\Scripts\\Test.txt\", ForWriting)\n<\/pre>\n<p>We use the <b>Write<\/b> method to replace the existing contents of Test.txt with the value of strNewContents, then use the <b>Close<\/b> method to close the file one more time, this time for good. [File closed? Column must be over.]<\/p>\n<p>That should do it, SS. (Shoot, that\u2019s only five words. Double shoot: <i>that\u2019s<\/i> only five words, too. Dang; and that last sentence was <i>seven<\/i> words!) Let us know if you have any questions. Also, let us know if you have any six-word descriptions for the Script Center and the Scripting Guys; we\u2019ll publish those in a future column. For example, here\u2019s one we came up with to describe this column:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>OK \u2026. But why no scripting information?<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>And here\u2019s one that summarizes our beloved Scripting Editor:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Loves people. Had one for breakfast.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>See you all tomorrow.<\/p>\n<p>Wait; check that. How about this: <\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>New column tomorrow; see you then.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>That\u2019s better.<\/p>\n<p>Wait; check that. How about this: <\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>That\u2019s better; must use six words.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I delete the duplicate entries in a tab-delimited file?&#8212; ST Hey, ST. Before we start, is anyone out there familiar with the book Not Quite What I Was Planning: Six-Word Memoirs by Writers Famous and Obscure? Not Quite What I Was Planning is an interesting little book in which the [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[169,3,4,5],"class_list":["post-55803","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-csv-and-other-delimited-files","tag-scripting-guy","tag-scripting-techniques","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! How can I delete the duplicate entries in a tab-delimited file?&#8212; ST Hey, ST. Before we start, is anyone out there familiar with the book Not Quite What I Was Planning: Six-Word Memoirs by Writers Famous and Obscure? Not Quite What I Was Planning is an interesting little book in which the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=55803"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/55803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=55803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=55803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=55803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}