{"id":17731,"date":"2010-07-17T00:01:00","date_gmt":"2010-07-17T00:01:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2010\/07\/17\/hey-scripting-guy-weekend-scripter-uniquifying-and-validating-link-lists-with-windows-powershell\/"},"modified":"2010-07-17T00:01:00","modified_gmt":"2010-07-17T00:01:00","slug":"hey-scripting-guy-weekend-scripter-uniquifying-and-validating-link-lists-with-windows-powershell","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-weekend-scripter-uniquifying-and-validating-link-lists-with-windows-powershell\/","title":{"rendered":"Hey, Scripting Guy! Weekend Scripter: Uniquifying and Validating Link Lists with Windows PowerShell"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>Microsoft Scripting Guy Ed Wilson here. It is a beautiful Saturday morning here in Charlotte, <a href=\"http:\/\/en.wikipedia.org\/wiki\/North_Carolina\">North Carolina<\/a>, in the United States. The sky is dark blue and there are few if any clouds visible. It almost makes one dizzy while standing in the front yard leaning back and looking upward because there is nothing to give any frame of reference for depth perception. I am reminded of a wall dive I did in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Maui\">Maui<\/a>, Hawaii. I snapped the following photo on that dive. <\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/5353.wes071710011_493BC3AB.jpg\"><img decoding=\"async\" style=\"border-bottom: 0px;border-left: 0px;border-top: 0px;border-right: 0px\" title=\"Photo Ed took while diving off Maui, Hawaii\" border=\"0\" alt=\"Photo Ed took while diving off Maui, Hawaii\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/6330.wes071710011_thumb_639BB6C2.jpg\" width=\"604\" height=\"579\" \/><\/a> <\/p>\n<p>I have a <a href=\"http:\/\/en.wikipedia.org\/wiki\/Comma-separated_values\">CSV<\/a> file that contains a number of hyperlinks in it. Unfortunately, there are many duplicates in the file. To make matters worse, some of the links are not really valid hyperlinks. I thought about opening the CSV file in Microsoft <a href=\"http:\/\/office.microsoft.com\/en-us\/excel\/\">Excel<\/a>, using the sort option, and then manually cleaning up the file, but Microsoft Excel has developed a bad habit of adding extra quotation marks to my CSV files. In addition, it seems that CSV files opened with Microsoft Excel pick up extra commas along the way. Both of these annoyances have caused me to quit using Microsoft Excel to work with CSV files. The CSV file is shown in Notepad in the following image.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/1856.wes071710021_02DE5D96.jpg\"><img decoding=\"async\" style=\"border-bottom: 0px;border-left: 0px;border-top: 0px;border-right: 0px\" title=\"Image of CSV file in Notepad\" border=\"0\" alt=\"Image of CSV file in Notepad\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/8802.wes071710021_thumb_7AE6BB33.jpg\" width=\"604\" height=\"439\" \/><\/a> <\/p>\n<p>Therefore, as I was mulling my dilemma over a bowl of shredded cardboard (I believe the box would have tasted better than the alleged cereal), I initially hit upon the idea of creating a hash table. After the hash table was created, I would then add the hyperlinks as keys in the hash table, handle the duplicate key errors, errors, and pass the newly created hash table to the <strong>Export-CSV<\/strong> <a href=\"http:\/\/technet.microsoft.com\/en-us\/scriptcenter\/powershell.aspx\">Windows PowerShell<\/a> cmdlet. Though that approach would accomplish what I need to do, it just seems like too much work for a Saturday morning. <\/p>\n<p>When I went upstairs to my office, I opened Windows PowerShell and said to myself, &ldquo;I wish I could use the <strong>Sort-Object<\/strong> cmdlet with the <strong>&ndash;unique<\/strong> parameter to provide me with a unique list of hyperlinks. While I am at it, it would be really cool if I could pass the sorted contents to <strong>Export-CSV<\/strong>.&rdquo; As a matter of fact I can, and here is the command I used (I typed this on a single line in my Windows PowerShell console; the line is broken up here for readability):<\/p>\n<p>&nbsp;<\/p>\n<div class=\"code\"><span style=\"color: #0000ff\">Import-Csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">.\\HSGlinks_RAW.csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #0000ff\">Sort-Object<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">-Property<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">text<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">-Unique<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\">&nbsp; <\/p>\n<p><\/span><span style=\"color: #0000ff\">Export-Csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">.\\sortedlinks.csv<\/span> <\/div>\n<p>&nbsp;<\/p>\n<p>After I had a sorted unique list of hyperlinks, I next decided to filter out invalid hyperlinks. I had noticed my file contained a number of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Universal_Naming_Convention\">Universal Naming Convention (UNC)<\/a> paths, and I am looking for only Internet addresses in this list. This time, I imported the CSV file, piped the results to the <strong>Where-Object<\/strong> cmdlet and used a very simple <a href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/qanda\/apr09\/hey0413.mspx\">regular expression<\/a> to look for the presence of &ldquo;http&rdquo; at the beginning of the string. If that match is found, the results are piped to the <strong>Export-CSV<\/strong> cmdlet. Here is the code that I used:<\/p>\n<p>&nbsp;<\/p>\n<div class=\"code\"><span style=\"color: #0000ff\">Import-csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">.\\sortedlinks.csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #0000ff\">where-object<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">{<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #2b91af\">$_<\/span><span style=\"color: #000000\">.url<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #0000ff\">-match<\/span><span style=\"color: #808080\"> &#8220;^http&#8221;<\/span><span style=\"color: #000000\">}<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\">&nbsp; <\/p>\n<p><\/span><span style=\"color: #0000ff\">Export-Csv<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">.\\sortedlinksCleaned.csv<\/span> <\/div>\n<p>&nbsp;<\/p>\n<p>Well, that is it&mdash;short and sweet. Now, I am going to head out to my woodworking shop. The <a href=\"http:\/\/blogs.technet.com\/b\/heyscriptingguy\/archive\/tags\/scripting+wife\/\">Scripting Wife<\/a> has been making fun of me because I have five projects in progress at the same time. It is time to finish one or two of them. I wonder if this would work:<\/p>\n<p>&nbsp;<\/p>\n<div class=\"code\"><span style=\"color: #000000\">Get-WoodworkingProject<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #0000ff\">where-object<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">{<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #2b91af\">$_<\/span><span style=\"color: #000000\">.status<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">-match<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">&ldquo;incomplete&rdquo;}<\/span><span style=\"color: #808080\">&nbsp;<\/span><span style=\"color: #000000\">|<\/span><span style=\"color: #808080\"> <\/p>\n<p><\/span><span style=\"color: #000000\">Complete-WoodworkingProject<\/span> <\/div>\n<p>&nbsp;<\/p>\n<p>You never know. The CSV project was easy to complete. I just need to find a woodworking provider. <\/p>\n<p>If you want to know exactly what we will be looking at tomorrow, follow us on <a href=\"https:\/\/twitter.com\/scriptingguys\/\">Twitter<\/a> or <a href=\"http:\/\/bit.ly\/scriptingguysfacebook\">Facebook<\/a>. If you have any questions, send e-mail to us at <a href=\"mailto:scripter@microsoft.com\">scripter@microsoft.com<\/a>, or post your questions on the <a href=\"http:\/\/bit.ly\/scriptingforum\">Official Scripting Guys Forum.<\/a>. See you tomorrow. Until then, peace.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Ed Wilson and Craig Liebendorfer, Scripting Guys<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Microsoft Scripting Guy Ed Wilson here. It is a beautiful Saturday morning here in Charlotte, North Carolina, in the United States. The sky is dark blue and there are few if any clouds visible. It almost makes one dizzy while standing in the front yard leaning back and looking upward because there is nothing [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[169,3,4,61,45],"class_list":["post-17731","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-csv-and-other-delimited-files","tag-scripting-guy","tag-scripting-techniques","tag-weekend-scripter","tag-windows-powershell"],"acf":[],"blog_post_summary":"<p>&nbsp; Microsoft Scripting Guy Ed Wilson here. It is a beautiful Saturday morning here in Charlotte, North Carolina, in the United States. The sky is dark blue and there are few if any clouds visible. It almost makes one dizzy while standing in the front yard leaning back and looking upward because there is nothing [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/17731","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=17731"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/17731\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=17731"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=17731"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=17731"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}