{"id":2147,"date":"2022-08-08T09:06:18","date_gmt":"2022-08-08T16:06:18","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sdk\/?p=2147"},"modified":"2022-08-08T09:06:18","modified_gmt":"2022-08-08T16:06:18","slug":"broken-link-detection-in-the-azure-sdk","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sdk\/broken-link-detection-in-the-azure-sdk\/","title":{"rendered":"Broken link detection in the Azure SDK"},"content":{"rendered":"<p>It&#8217;s frustrating when you select a reference link in an open-source project and receive an HTTP 404 response. This experience gives customers the impression that the project&#8217;s repository isn&#8217;t maintained.<\/p>\n<p>The Azure SDK team maintains 10 active GitHub repositories that have hundreds of READMEs in total. Before introducing the link checker, we relied on a product manager, content reviewer, or external customer to find broken links. Unfortunately, this approach eroded the trust of our customers. Hence, the Azure SDK Engineering Systems (EngSys) team designed the dead link checker and integrated it into the Continuous Integration (CI) pipelines. Now team members become aware of the errors before broken links are introduced into the repository&#8217;s <em>main<\/em> branch.<\/p>\n<h2>How link checker works<\/h2>\n<p>The link checker is implemented as a <a href=\"https:\/\/github.com\/Azure\/azure-sdk-tools\/blob\/main\/eng\/common\/scripts\/Verify-Links.ps1\">PowerShell script<\/a> that&#8217;s commonly used across our engineering systems. You can invoke the script as follows:<\/p>\n<pre><code class=\"language-powershell\">.\\Verify-Links -urls C:\\README.md -checkLinkGuidance $true<\/code><\/pre>\n<p>For the given README or other Markdown file, the link checker locates all links. A web request is created for each link. If an error response is received, the link check fails. 
The broken links are printed for the code owner&#8217;s information.<\/p>\n<p>We also maintain README <a href=\"https:\/\/github.com\/Azure\/azure-sdk\/blob\/main\/docs\/policies\/README-TEMPLATE.md#link-guidelines\">guidance<\/a>, and the link checker checks whether links violate it. By default, we enable the validation for this guidance. Privileged users with unique circumstances have the flexibility to disable the rule.<\/p>\n<p>Once an owner acknowledges the failed links, the links can be added to an allowlist. This allowlist file enables people to continue checking in their code.<\/p>\n<p>There are also cases in which a link points to a file being introduced in the same pull request (PR). As you probably guessed, the link will be valid only after merging the PR into the <em>main<\/em> branch. Therefore, two PRs would be required to avoid introducing a broken link to the new file. To address that problem, the link checker can substitute the current commit for the GitHub repository&#8217;s <em>main<\/em> branch in the link and predict the link&#8217;s validity after check-in.<\/p>\n<h2>Where we run link checker<\/h2>\n<p>Collectively, hundreds of PRs are merged into the Azure SDK repositories&#8217; <em>main<\/em> branches each day. To prevent the introduction of invalid links, we enabled check-in PR validation. This validation saves the EngSys team a significant amount of time that would otherwise be spent verifying links manually. For an example, see this <a href=\"https:\/\/dev.azure.com\/azure-sdk\/public\/_build\/results?buildId=1748084&amp;view=logs&amp;j=b70e5e73-bbb6-5567-0939-8415943fadb9&amp;t=2102385d-609d-5572-64d2-932661c7902f\">PR validation<\/a> run.<\/p>\n<p>There are cases in which links begin to fail as time passes. The EngSys team has two CI pipelines that check for this case:<\/p>\n<ol>\n<li>An aggregate report pipeline, which scans the entire repository on a nightly basis.<\/li>\n<li>The CI deployment runs the link checker on service directories each night and for each release. 
An example of a service directory is <a href=\"https:\/\/github.com\/Azure\/azure-sdk-for-net\/tree\/main\/sdk\/storage\"><em>storage<\/em><\/a>. The validation in these pipelines further prevents broken links from entering the SDK release pages and documentation sites.<\/li>\n<\/ol>\n<h2>Cache mechanism<\/h2>\n<p>As link checker usage increased, the EngSys team found that the high frequency with which some links were accessed resulted in throttling by commonly referenced websites, such as GitHub and npm. We decided to introduce caching to reduce the frequency of this issue.<\/p>\n<p>Frequently used links are stored in a cache file. Each time the link checker runs, the cache file&#8217;s links are treated as valid. Therefore, there&#8217;s no need to invoke a web request for each of those links. We also provide the flexibility to refresh the input cache file by scanning the repository and writing all valid links into it.<\/p>\n<p>Here&#8217;s the workflow:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-content\/uploads\/sites\/58\/2022\/08\/dead-link-cache.png\" alt=\"Workflow for link checker cache\" \/><\/p>\n<h2>Summary<\/h2>\n<p>To reduce frustration and improve efficiency, the Azure SDK team designed the dead link checker and integrated it into the Continuous Integration (CI) pipelines. 
Broken links are now detected before they&#8217;re introduced into the repository&#8217;s main branch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how the Azure SDK Engineering Systems team detects broken links in GitHub repositories.<\/p>\n","protected":false},"author":90059,"featured_media":2153,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[865,705],"class_list":["post-2147","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sdk","tag-eng-tools","tag-sdk"],"acf":[],"blog_post_summary":"<p>Learn how the Azure SDK Engineering Systems team detects broken links in GitHub repositories.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/2147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/users\/90059"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/comments?post=2147"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/posts\/2147\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media\/2153"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/media?parent=2147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/categories?post=2147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sdk\/wp-json\/wp\/v2\/tags?post=2147"}],"curies":[{"name":"wp","hr
ef":"https:\/\/api.w.org\/{rel}","templated":true}]}}