{"id":12221,"date":"2011-11-01T00:01:00","date_gmt":"2011-11-01T00:01:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2011\/11\/01\/use-powershell-to-remove-duplicate-lines-from-a-csv-file\/"},"modified":"2011-11-01T00:01:00","modified_gmt":"2011-11-01T00:01:00","slug":"use-powershell-to-remove-duplicate-lines-from-a-csv-file","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/use-powershell-to-remove-duplicate-lines-from-a-csv-file\/","title":{"rendered":"Use PowerShell to Remove Duplicate Lines from a CSV File"},"content":{"rendered":"<p><span style=\"font-size: small\"><span style=\"font-family: Segoe\"><strong>Summary:<\/strong> Learn how to use Windows PowerShell to easily remove duplicates from a CSV file.<\/span><\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Segoe\"><span style=\"font-family: Segoe;font-size: small\"><span style=\"font-family: Segoe\"><img decoding=\"async\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\" \/><\/span><\/span>Hey, Scripting Guy! I have a comma-separated value (CSV) file that comes from a Microsoft Excel spreadsheet. We use this CSV file to create new users automatically in Active Directory. The problem is that every once in a while, duplicates end up in the CSV file. As you can well imagine, this creates havoc and the import fails. I began to think about how I could ensure that the data in the CSV file is unique. I was thinking of the <\/span><a href=\"http:\/\/blogs.technet.com\/search\/searchresults.aspx?q=%22dictionary%20object%22%20unique&amp;sections=7618\"><span style=\"font-family: Segoe\">old-fashioned dictionary object trick<\/span><\/a><span style=\"font-family: Segoe\"> I used back in the VBScript days, and I tried some things using the <\/span><a href=\"http:\/\/blogs.technet.com\/search\/searchresults.aspx?q=hashtable&amp;sections=7618\"><span style=\"font-family: Segoe\">Windows PowerShell hashtable<\/span><\/a><span style=\"font-family: Segoe\">. The problem is it quickly got rather complicated. Next, I decided to try a similar thing using an array, and once again, things went <\/span><a href=\"http:\/\/en.wikipedia.org\/wiki\/Pear-shaped\"><span style=\"font-family: Segoe\">pear-shaped<\/span><\/a><span style=\"font-family: Segoe\">. I looked on the Scripting Guys Script Repository and found a pretty cool <\/span><a href=\"http:\/\/gallery.technet.microsoft.com\/scriptcenter\/f07b23b8-d760-44f0-9ffc-839c2c14b911\"><span style=\"font-family: Segoe\">VBScript that uses a recordset and ADO<\/span><\/a><span style=\"font-family: Segoe\">. I am guessing I could modify it, but I am hoping for something easier. <\/span><\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">&mdash;JB<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\"><span style=\"font-family: Segoe;font-size: small\"><span style=\"font-family: Segoe\"><img decoding=\"async\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\" \/><\/span><\/span>Hello JB, <\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">Microsoft Scripting Guy Ed Wilson here. Some things are simple, such as today&rsquo;s date: 11-1-11. (That&rsquo;s how we present it in the United States anyway.) In fact, this year has been one of simple dates. It began with 1-1-11, and in 10 days, it will be 11-11-11. Other things appear complicated, and then a sudden revelation shows simplicity and elegance of design. <\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">One of the hardest things for people who formerly used VBScript as they move to Windows PowerShell is to quit thinking in terms of what they would have done in VBScript. The overriding principle in Windows PowerShell is that everything is an object. &ldquo;Oh yeah,&rdquo; you might say, &ldquo;I know that.&rdquo; But until you <i>really<\/i> know that, and until you use that as fundamental in your approach, you will not actually get it. After you really know that everything is an object, you begin to look for solutions in a new and different way&mdash;one that was not possible in VBScript or some other scripting language. <\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">A perfect case in point, JB, is your problem with needing to remove duplicates from a CSV file. First, if I am going to work with a CSV file, I need to import it. I then need to see which properties are available. To do this, I use the <b>Import-CSV<\/b> cmdlet and the <b>Get-Member<\/b> cmdlet. In the output that follows, I see four <b>noteproperties<\/b><i> <\/i>that correspond to the column headers from the CSV file. <\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">PS C:\\&gt; Import-Csv C:\\fso\\UsersConsolidated.csv | Get-Member<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">&nbsp;&nbsp; TypeName: System.Management.Automation.PSCustomObject<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\"><span style=\"text-decoration: underline\">Name<\/span>&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style=\"text-decoration: underline\">MemberType<\/span>&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style=\"text-decoration: underline\">Definition<\/span><\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Equals&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Method &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bool Equals(System.Object obj)<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">GetHashCode&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Method &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int GetHashCode()<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">GetType&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Method &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; type GetType()<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">ToString&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Method &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string ToString()<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Fname&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NoteProperty&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.String Fname=Ed<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Group&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NoteProperty&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.String Group=Engineering<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Lname&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NoteProperty&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.String Lname=Banti<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">OU&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; NoteProperty&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; System.String OU=Atlanta<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">I now need to decide which duplicates I want to remove. Obviously, I do not want to remove duplicate group names or OU names. I only want to remove duplicate first name and last name combinations. It is okay to have a duplicate first name, and it is okay to have a duplicate last name&mdash;just not a duplicate first name\/last name combination. I therefore need to sort the output by <b>lname<\/b> (last name) and <b>fname<\/b>(first name) columns. To do this, I use the <b>Sort-Object<\/b> cmdlet. This command is shown here (<b>sort<\/b> is actually an alias for <b>Sort-Object<\/b>):<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Import-Csv C:\\fso\\UsersConsolidated.csv | sort lname,fname<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">The command to sort an imported CSV file and the associated output is shown in the following figure.<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/2538.hsg-11-1-11-01.png\"><img decoding=\"async\" style=\"border: 0px\" title=\"Image of command and associated output\" alt=\"Image of command and associated output\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/2538.hsg-11-1-11-01.png\" \/><\/a><\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">After I have the contents of the CSV file sorted, I use the <b>unique<\/b><i> <\/i>switch to return only unique rows from the file:<\/span><\/p>\n<p style=\"padding-left: 30px\"><span style=\"font-family: Segoe;font-size: small\">Import-Csv C:\\fso\\UsersConsolidated.csv | sort lname,fname &ndash;Unique<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">The command and associated output are shown in the following figure.<\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\"><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/4113.hsg-11-1-11-02.png\"><img decoding=\"async\" style=\"border: 0px\" title=\"Image of command and associated output\" alt=\"Image of command and associated output\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/4113.hsg-11-1-11-02.png\" \/><\/a><\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">JB, that is all there is to using the <b>Sort-Object<\/b> cmdlet and the <b>Import-CSV<\/b> cmdlet to remove duplicates from a CSV file. Join me tomorrow when we continue to explore the cool things that Windows PowerShell offers.<\/span><\/p>\n<p><span style=\"font-size: small\"><span style=\"font-family: Segoe\">I invite you to follow me on <\/span><a href=\"http:\/\/bit.ly\/scriptingguystwitter\" target=\"_blank\"><span style=\"font-family: Segoe\">Twitter<\/span><\/a><span style=\"font-family: Segoe\"> and <\/span><a href=\"http:\/\/bit.ly\/scriptingguysfacebook\"><span style=\"font-family: Segoe\">Facebook<\/span><\/a><span style=\"font-family: Segoe\">. If you have any questions, send email to me at <\/span><a href=\"mailto:scripter@microsoft.com\" target=\"_blank\"><span style=\"font-family: Segoe\">scripter@microsoft.com<\/span><\/a><span style=\"font-family: Segoe\">, or post your questions on the <\/span><a href=\"http:\/\/bit.ly\/scriptingforum\" target=\"_blank\"><span style=\"font-family: Segoe\">Official Scripting Guys Forum<\/span><\/a><span style=\"font-family: Segoe\">. See you tomorrow. Until then, peace<\/span><\/span><\/p>\n<\/p>\n<p><span style=\"font-size: small\"><b>Ed Wilson, Microsoft Scripting Guy<\/b><\/span><\/p>\n<p><span style=\"font-family: Segoe;font-size: small\"><\/span>&nbsp;<\/p>\n<p><span style=\"font-family: Segoe;font-size: small\">&nbsp;<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Summary: Learn how to use Windows PowerShell to easily remove duplicates from a CSV file. &nbsp; Hey, Scripting Guy! I have a comma-separated value (CSV) file that comes from a Microsoft Excel spreadsheet. We use this CSV file to create new users automatically in Active Directory. The problem is that every once in a while, [&hellip;]<\/p>\n","protected":false},"author":596,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[169,3,4,45],"class_list":["post-12221","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-csv-and-other-delimited-files","tag-scripting-guy","tag-scripting-techniques","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>Summary: Learn how to use Windows PowerShell to easily remove duplicates from a CSV file. &nbsp; Hey, Scripting Guy! I have a comma-separated value (CSV) file that comes from a Microsoft Excel spreadsheet. We use this CSV file to create new users automatically in Active Directory. The problem is that every once in a while, [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/12221","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/596"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=12221"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/12221\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=12221"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=12221"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=12221"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}