{"id":69133,"date":"2005-08-19T20:13:00","date_gmt":"2005-08-19T20:13:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2005\/08\/19\/how-can-i-remove-all-duplicate-lines-from-a-text-file\/"},"modified":"2005-08-19T20:13:00","modified_gmt":"2005-08-19T20:13:00","slug":"how-can-i-remove-all-duplicate-lines-from-a-text-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-remove-all-duplicate-lines-from-a-text-file\/","title":{"rendered":"How Can I Remove All Duplicate Lines From a Text File?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! How can I remove all duplicate lines from a text file?<BR><BR>&#8212; SW<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"><A href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><IMG class=\"farGraphic\" title=\"Script Center\" border=\"0\" alt=\"Script Center\" align=\"right\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" height=\"288\"><\/A> \n<P>Hey, SW. You know, to be a Scripting Guy means to embark on a never-ending quest to find the ultimate solution to a given problem. 
(Or at least that\u2019s what we tell our manager when he asks why we never actually seem to finish anything: \u201cBut boss, never-ending quests take <I>time<\/I>!\u201d) That\u2019s why we were glad to see your question. <A href=\"http:\/\/null\/technet\/scriptcenter\/resources\/qanda\/apr05\/hey0413.mspx\"><B>A while back<\/B><\/A> we answered a similar question, one about removing duplicate names from a text file. The solution we came up with was simple enough, and it worked just fine; we just weren\u2019t convinced that it was the <I>best<\/I> solution. And now, thanks to your question, we can take another shot at this. As to whether this is a better\/faster\/easier solution than the one we offered before, well, we\u2019ll leave that up to you.<\/P>\n<P>To begin with, we assume you have a text file where each line represents a separate record. It\u2019s unlikely, but maybe your file looks like this:<\/P><PRE class=\"codeSample\">This is one of the lines in the text file.\nThis is one of the lines in the text file.\nThis is another line in the text file.\nThis is one of the lines in the text file.\nThis is yet another line in the text file.\nThis is another line in the text file.\nThis is another line in the text file.\nThis is one of the lines in the text file.\n<\/PRE>\n<P>You want a script that can weed out all the duplicate lines and provide you with output similar to this:<\/P><PRE class=\"codeSample\">This is one of the lines in the text file.\nThis is another line in the text file.\nThis is yet another line in the text file.\n<\/PRE>\n<P>SW, you came to the right place:<\/P><PRE class=\"codeSample\">Const adOpenStatic = 3\nConst adLockOptimistic = 3\nConst adCmdText = &amp;H0001\n\nSet objConnection = CreateObject(&quot;ADODB.Connection&quot;)\nSet objRecordSet = CreateObject(&quot;ADODB.Recordset&quot;)\n\n' Folder that holds the text file, and the file to query\nstrPathToTextFile = &quot;C:\\Scripts\\&quot;\nstrFile = &quot;Test.txt&quot;\n\n' The folder acts as the database; HDR=NO means the first line is data, not a header\nobjConnection.Open &quot;Provider=Microsoft.Jet.OLEDB.4.0;&quot; &amp; _\n      &quot;Data Source=&quot; &amp; strPathToTextFile &amp; &quot;;&quot; &amp; _\n          &quot;Extended Properties=&quot;&quot;text;HDR=NO;FMT=Delimited&quot;&quot;&quot;\n\n' The file acts as the table; DISTINCT returns each unique line only once\nobjRecordSet.Open &quot;Select DISTINCT * FROM &quot; &amp; strFile, _\n    objConnection, adOpenStatic, adLockOptimistic, adCmdText\n\n' Echo the first field of each record in the recordset\nDo Until objRecordSet.EOF\n    Wscript.Echo objRecordSet.Fields.Item(0).Value\n    objRecordSet.MoveNext\nLoop\n<\/PRE>\n<P>We find this script kind of interesting because we\u2019re using ActiveX Data Objects (ADO) and treating this text file as if it were a database. We won\u2019t spend a lot of time detailing <I>how<\/I> you treat a text file as if it were a database; if you\u2019d like to learn more about that we have a <A href=\"http:\/\/msdn.microsoft.com\/library\/en-us\/dnclinic\/html\/scripting03092004.asp\" target=\"_blank\"><B>Scripting Clinic<\/B><\/A> column covering that topic in detail. For now, suffice it to say that we\u2019re working with the text file C:\\Scripts\\Test.txt, something we indicate by assigning the appropriate values to the variables strPathToTextFile and strFile:<\/P><PRE class=\"codeSample\">strPathToTextFile = &quot;C:\\Scripts\\&quot;\nstrFile = &quot;Test.txt&quot;\n<\/PRE>\n<P>So how does that enable us to eliminate duplicate lines? Well, there\u2019s a kind of database query known as <B>Select DISTINCT<\/B>; a Select DISTINCT query returns only the distinct (or unique) records in a table. 
Suppose you had a simple database table with these entries:<\/P><PRE class=\"codeSample\">Red\nRed\nBlue\nRed\n<\/PRE>\n<P>If you use a Select DISTINCT query, you\u2019ll get back a recordset consisting only of the unique records:<\/P><PRE class=\"codeSample\">Red\nBlue\n<\/PRE>\n<P>No doubt you\u2019re thinking, \u201cWow: getting back the unique records is pretty much the same thing as eliminating duplicate records.\u201d And we\u2019ll admit that &#8211; whoa, wait a second: you\u2019re absolutely right. Our text file is constructed like a database table, with each line in the text file representing a single field in a single record. (That holds as long as the lines don\u2019t contain commas; by default the Delimited format treats a comma as a field separator, and our script echoes only the first field.) If we run a Select DISTINCT query against this text file, we\u2019ll get back only the unique lines. In fact, we\u2019ll get back a recordset that looks like this:<\/P><PRE class=\"codeSample\">This is one of the lines in the text file.\nThis is another line in the text file.\nThis is yet another line in the text file.\n<\/PRE>\n<P>That\u2019s exactly the information we <I>wanted<\/I> to get back. Good thing you pointed that out to us!<\/P>\n<P>After retrieving our recordset we then use this code to echo the unique lines back to the screen:<\/P><PRE class=\"codeSample\">Do Until objRecordSet.EOF\n    Wscript.Echo objRecordSet.Fields.Item(0).Value\n    objRecordSet.MoveNext\nLoop\n<\/PRE>\n<P>If we wanted to, we could also use the FileSystemObject to open the text file and replace the existing contents with only the unique lines; that would have the effect of removing all duplicate lines from the text file. (It would be cool if we could use some sort of Update query to do that, but when it comes to text files the Jet text driver doesn\u2019t support Update or Delete queries.)<\/P>\n<P>So is this the last word on removing duplicate items &#8211; be they names or entire lines &#8211; from a text file? Hey, who knows: after all, never-ending quests take <I>time<\/I>! (Actually, we find that they take about 2-to-3 days. 
After that we get bored and move on to something else.)<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! How can I remove all duplicate lines from a text file?&#8212; SW Hey, SW. You know, to be a Scripting Guy means to embark on a never-ending quest to find the ultimate solution to a given problem. (Or at least that\u2019s what we tell our manager when he asks why we never [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,14,5],"class_list":["post-69133","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! How can I remove all duplicate lines from a text file?&#8212; SW Hey, SW. You know, to be a Scripting Guy means to embark on a never-ending quest to find the ultimate solution to a given problem. 
(Or at least that\u2019s what we tell our manager when he asks why we never [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/69133","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=69133"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/69133\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=69133"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=69133"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=69133"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}