{"id":543,"date":"2012-01-21T12:25:58","date_gmt":"2012-01-21T12:25:58","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/buckh\/2012\/01\/21\/a-tool-to-find-duplicate-copies-in-a-build\/"},"modified":"2012-01-21T12:25:58","modified_gmt":"2012-01-21T12:25:58","slug":"a-tool-to-find-duplicate-copies-in-a-build","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/buckh\/a-tool-to-find-duplicate-copies-in-a-build\/","title":{"rendered":"A tool to find duplicate copies in a build"},"content":{"rendered":"<p>As part of our builds, quite a few projects copy files to the binaries directory or other locations.&#160; These can be anything from image files to test scripts.&#160; To have our builds complete more quickly, we use the multi-process option (<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms164311.aspx\">\/maxcpucount<\/a>) of msbuild to <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/bb651790.aspx\">build projects in parallel<\/a>.<\/p>\n<p>This all sounds normal, so what\u2019s the problem?&#160; In a large team, people will sometimes inadvertently add statements to different project files that copy files to the same destination.&#160; When those project files have no references to each other, directly or indirectly, msbuild may build them in parallel.&#160; If it does happen to run those projects in parallel on different nodes and the copies happen at the same time, the build breaks because one copy succeeds and one fails.&#160; Since the timing is not going to be the same on every build, the result is random build breaks.&#160; Build breaks suck.&#160; They drain the productivity of the team and are frustrating.<\/p>\n<p>Whether the build is continuous integration or gated checkin, these breaks may happen randomly.&#160; They are most likely to happen on incremental builds where the percentage of time spent during the build on doing copies is much higher than a clean build.&#160; Tracking them down as they happen is painful.<\/p>\n<p>So, I wrote a simple tool to find cases in the log where the destination is the same for more than one copy operation.&#160; The comment in the header explains what the code is looking for.&#160; Running this on the normal verbosity msbuild logs from a clean build ensures that all of the copies are in the log for analysis.&#160; We also build what we call partitions separately, resulting in the number of log files being a multiple of the number of partitions being built (a partition is a subset of the source and is typically a top-level directory in the branch).<\/p>\n<p>In our internal builds, we record multiple log files for our builds, including minimal, normal, and detailed.&#160; When there\u2019s a problem, you can start with the smaller build logs and increase to the more verbose logging as needed.<\/p>\n<p>I\u2019m posting this for any of you who might run into the same thing.&#160; Keep in mind that there are other things, such as antivirus software, that can interfere with the build process and result in errors for files being copied.<\/p>\n<blockquote>\n<pre class=\"csharpcode\"><span class=\"kwrd\">using<\/span> System;\n<span class=\"kwrd\">using<\/span> System.Collections.Generic;\n<span class=\"kwrd\">using<\/span> System.Linq;\n<span class=\"kwrd\">using<\/span> System.IO;\n<span class=\"kwrd\">using<\/span> System.Text;\n<span class=\"kwrd\">using<\/span> System.Text.RegularExpressions;\n<span class=\"kwrd\">using<\/span> System.Threading.Tasks;\n\n<span class=\"rem\">\/*<\/span>\n<span class=\"rem\"> * This tool finds cases where more than one file is copied to the same target.  This will cause<\/span>\n<span class=\"rem\"> * build breaks when msbuild executes the copies in parallel because the copies are independent<\/span>\n<span class=\"rem\"> * (there are no dependencies).  This typically occurs in incremental builds because incremental<\/span>\n<span class=\"rem\"> * builds do a lot less work (not nearly as much to build), resulting in the copies being a much<\/span>\n<span class=\"rem\"> * higher percentage of the build activities and more likely to collide.  Gated checkin,<\/span>\n<span class=\"rem\"> * continuous integration, and developer\/tester builds are almost always incremental, not clean.<\/span>\n<span class=\"rem\"> * These issues are still possible in regular clean builds, such as done nightly by the build lab.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> * These race conditions are difficult to debug manually.  Since msbuild records all of the copies<\/span>\n<span class=\"rem\"> * made via the copy task, we can use the log file to identify cases where the same destination<\/span>\n<span class=\"rem\"> * path is used in more than one copy.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> * Use the *.normal.* logs from a clean build with this tool.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> * The best thing to do is to ensure that each file copy to a particular destination is done by<\/span>\n<span class=\"rem\"> * one and only one project.  When that is the case, you are guaranteed not to have problems<\/span>\n<span class=\"rem\"> * with two copies colliding and breaking your build.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> * Here's example output from buildr.suitesrc.normal.log that shows a copy failure.  Here two<\/span>\n<span class=\"rem\"> * copies were executed in parallel and the second one failed, causing the build to fail.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\">    48&gt;Project &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\dirs.proj&quot; (48) is building &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\Azure\\Microsoft.TeamFoundation.Data.Azure.csproj&quot; (55) on node 8 (BuildLinked target(s)).<\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\">    55&gt;_CopyOutOfDateSourceItemsToOutputDirectory:<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\ReleaseManifest.xml&quot; to &quot;D:\\a1\\binaries.x86ret\\bin\\i386\\ReleaseManifest.xml&quot;.<\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\">    48&gt;Project &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\dirs.proj&quot; (48) is building &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\SqlServer\\Microsoft.TeamFoundation.Data.csproj&quot; (53) on node 4 (BuildLinked target(s)).<\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\">    53&gt;_CopyOutOfDateSourceItemsToOutputDirectory:<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\ReleaseManifest.xml&quot; to &quot;D:\\a1\\binaries.x86ret\\bin\\i386\\ReleaseManifest.xml&quot;.<\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\">    53&gt;D:\\a1\\dd\\tools\\x86\\managed\\v4.5\\Microsoft.Common.targets(3516,5): error MSB3021: Unable to copy file &quot;D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\ReleaseManifest.xml&quot; to &quot;D:\\a1\\binaries.x86ret\\bin\\i386\\ReleaseManifest.xml&quot;. Access to the path 'D:\\a1\\binaries.x86ret\\bin\\i386\\ReleaseManifest.xml' is denied. [D:\\a1\\dd\\alm\\tfs_core\\Admin\\Servicing\\Data\\SqlServer\\Microsoft.TeamFoundation.Data.csproj]<\/span>\n<span class=\"rem\">  <\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> * Note that there may be multiple copies in a sequence.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\">    291&gt;_CopyOutOfDateSourceItemsToOutputDirectoryAlways:<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\a1\\dd\\suitesrc\\TFS\\common\\deploytools\\httpcfg.exe&quot; to &quot;D:\\a1\\binaries.x86ret\\SuiteBin\\i386\\TFS\\Tests\\httpcfg.exe&quot;.<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\a1\\dd\\suitesrc\\TFS\\common\\deploytools\\makecert.exe&quot; to &quot;D:\\a1\\binaries.x86ret\\SuiteBin\\i386\\TFS\\Tests\\makecert.exe&quot;.<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\a1\\dd\\suitesrc\\TFS\\common\\deploytools\\winhttpcertcfg.exe&quot; to &quot;D:\\a1\\binaries.x86ret\\SuiteBin\\i386\\TFS\\Tests\\winhttpcertcfg.exe&quot;.<\/span>\n<span class=\"rem\">       CopyFilesToOutputDirectory:<\/span>\n<span class=\"rem\">         Copying file from &quot;D:\\int\\641\\194\\suitesrc\\tfshttpsconfig.csproj_80399372\\objr\\x86\\TfsHttpsConfig.exe&quot; to &quot;D:\\a1\\binaries.x86ret\\SuiteBin\\i386\\TFS\\Tests\\TfsHttpsConfig.exe&quot;.<\/span>\n<span class=\"rem\"><\/span>\n<span class=\"rem\"> * Nodes are reused by msbuild.  The result is that a given may process many projects, so it's not<\/span>\n<span class=\"rem\"> * possible to scan and pair up all of the nodes and project files at once.  In the code below, <\/span>\n<span class=\"rem\"> * you will see that it always tracks the most recent node for that reason.<\/span>\n<span class=\"rem\"> * <\/span>\n<span class=\"rem\"> *\/<\/span>\n\n<span class=\"kwrd\">namespace<\/span> FindBadCopies\n{\n    <span class=\"kwrd\">class<\/span> Program\n    {\n        <span class=\"kwrd\">static<\/span> <span class=\"kwrd\">void<\/span> Main(<span class=\"kwrd\">string<\/span>[] args)\n        {\n            <span class=\"kwrd\">if<\/span> (args.Length &lt; 1)\n            {\n                Console.WriteLine(<span class=\"str\">&quot;Usage: findbadcopies &lt;logfile&gt;\\r\\n&quot;<\/span>);\n                Console.WriteLine(\n<span class=\"str\">@&quot;This tool scans a build log, such as buildr.suitesrc.normal.log, and produces a\nlist of file paths that are the targets of more than one copy and shows which\nproject files initiated each copy.  These redundant file copies are prone to\nfail periodically in incremental builds, such as gated check ins and CI builds,\nbecause copies are a higher percentage of the operations in the build, making\nit more likely that two collide.&quot;<\/span>);\n\n                <span class=\"kwrd\">return<\/span>;\n            }\n\n            ProcessFile(args[0]);\n        }\n\n        <span class=\"kwrd\">private<\/span> <span class=\"kwrd\">static<\/span> <span class=\"kwrd\">void<\/span> ProcessFile(String fileName)\n        {\n            Dictionary&lt;<span class=\"kwrd\">int<\/span>, String&gt; nodeTable = <span class=\"kwrd\">new<\/span> Dictionary&lt;<span class=\"kwrd\">int<\/span>, String&gt;(1000);\n            Dictionary&lt;String, <span class=\"kwrd\">int<\/span>&gt; pathTable = <span class=\"kwrd\">new<\/span> Dictionary&lt;String, <span class=\"kwrd\">int<\/span>&gt;(1000, StringComparer.InvariantCultureIgnoreCase);\n            String previousLine;\n\n            <span class=\"kwrd\">string<\/span>[] text = File.ReadAllLines(fileName);\n\n            <span class=\"rem\">\/\/ Process all of the lines in the file, skipping the first line (we need the previous line,<\/span>\n            <span class=\"rem\">\/\/ and the first line in the file isn't important to this tool).<\/span>\n            <span class=\"kwrd\">int<\/span> lastNode = 0;\n            <span class=\"kwrd\">for<\/span> (<span class=\"kwrd\">int<\/span> i = 1; i &lt; text.Length; i++)\n            {\n                previousLine = text[i - 1];\n\n                <span class=\"rem\">\/\/ Record most recent node.  The text that appears with it can be different<\/span>\n                <span class=\"rem\">\/\/ (see sample log data).<\/span>\n                <span class=\"kwrd\">string<\/span> prevLinePattern = <span class=\"str\">@&quot;([0-9]+)[&gt;]&quot;<\/span>;\n                Match match = Regex.Match(previousLine, prevLinePattern, RegexOptions.IgnoreCase);\n                <span class=\"kwrd\">if<\/span> (match.Success)\n                {\n                    lastNode = Int32.Parse(match.Groups[1].Value);\n                }\n\n                <span class=\"rem\">\/\/ If the line is recording the start of a project, add it to the table.<\/span>\n                <span class=\"kwrd\">string<\/span> pattern = <span class=\"str\">@&quot;[0-9]+[&gt;]Project &quot;<\/span><span class=\"str\">&quot;[^&quot;<\/span><span class=\"str\">&quot;]+&quot;<\/span><span class=\"str\">&quot; \\([0-9]+\\) is building &quot;<\/span><span class=\"str\">&quot;([^&quot;<\/span><span class=\"str\">&quot;]+)&quot;<\/span><span class=\"str\">&quot; \\(([0-9]+)\\)&quot;<\/span>;\n                match = Regex.Match(text[i], pattern, RegexOptions.IgnoreCase);\n                <span class=\"kwrd\">if<\/span> (match.Success)\n                {\n                    <span class=\"kwrd\">int<\/span> node = Int32.Parse(match.Groups[2].Value);\n                    String projectPath = Path.GetFullPath(match.Groups[1].Value);\n\n                    <span class=\"rem\">\/\/ Because nodes are reused, we are only keeping the project path for the most recent use<\/span>\n                    <span class=\"rem\">\/\/ of a given node.<\/span>\n                    nodeTable[node] = projectPath;\n\n                    <span class=\"rem\">\/\/ If we matched a project line, it can't be a copy line.<\/span>\n                    <span class=\"kwrd\">continue<\/span>;\n                }\n\n                <span class=\"rem\">\/\/ If the line is one that records a copy, see if there was an earlier copy made to<\/span>\n                <span class=\"rem\">\/\/ the same target path.  First, try the output of a copying task.<\/span>\n                <span class=\"kwrd\">string<\/span> copyingPattern = <span class=\"str\">@&quot;Copying file from &quot;<\/span><span class=\"str\">&quot;[^&quot;<\/span><span class=\"str\">&quot;]+&quot;<\/span><span class=\"str\">&quot; to &quot;<\/span><span class=\"str\">&quot;([^&quot;<\/span><span class=\"str\">&quot;]+)&quot;<\/span><span class=\"str\">&quot;&quot;<\/span>;\n                match = Regex.Match(text[i], copyingPattern, RegexOptions.IgnoreCase);\n                <span class=\"kwrd\">if<\/span> (match.Success)\n                {\n                    String targetPath = <span class=\"kwrd\">null<\/span>;\n                    <span class=\"kwrd\">try<\/span>\n                    {\n                        targetPath = Path.GetFullPath(match.Groups[1].Value);\n                    }\n                    <span class=\"kwrd\">catch<\/span> (Exception e)\n                    {\n                        <span class=\"rem\">\/\/ There is a file in the test tree that uses non-English chars that causes<\/span>\n                        <span class=\"rem\">\/\/ GetFullPath() to throw (TODO: understand why), so we keep the raw text.<\/span>\n                        <span class=\"rem\">\/\/ Console.WriteLine(match.Groups[1].Value);<\/span>\n                        targetPath = match.Groups[1].Value;\n                    }\n\n                    <span class=\"rem\">\/\/ If we have already seen the target path, then we have a duplicate copy path<\/span>\n                    <span class=\"rem\">\/\/ target to report.<\/span>\n                    <span class=\"kwrd\">int<\/span> otherNode;\n                    <span class=\"kwrd\">if<\/span> (pathTable.TryGetValue(targetPath, <span class=\"kwrd\">out<\/span> otherNode))\n                    {\n                        Console.ForegroundColor = ConsoleColor.Cyan;\n                        Console.WriteLine(<span class=\"str\">&quot;{0}&quot;<\/span>, targetPath);\n                        Console.ResetColor();\n\n                        Console.WriteLine(<span class=\"str\">&quot;      {0}&quot;<\/span>, nodeTable[otherNode]);\n                        Console.WriteLine(<span class=\"str\">&quot;      {0}&quot;<\/span>, nodeTable[lastNode]);\n                        Console.WriteLine();\n                    }\n\n                    pathTable[targetPath] = lastNode;\n                }\n            }\n        }\n    }\n}<\/pre>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>As part of our builds, quite a few projects copy files to the binaries directory or other locations.&#160; These can be anything from image files to test scripts.&#160; To have our builds complete more quickly, we use the multi-process option (\/maxcpucount) of msbuild to build projects in parallel. This all sounds normal, so what\u2019s the [&hellip;]<\/p>\n","protected":false},"author":94,"featured_media":10268,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[4,7],"class_list":["post-543","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","tag-msbuild","tag-team-build"],"acf":[],"blog_post_summary":"<p>As part of our builds, quite a few projects copy files to the binaries directory or other locations.&#160; These can be anything from image files to test scripts.&#160; To have our builds complete more quickly, we use the multi-process option (\/maxcpucount) of msbuild to build projects in parallel. This all sounds normal, so what\u2019s the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/posts\/543","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/users\/94"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/comments?post=543"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/posts\/543\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/media\/10268"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/media?parent=543"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/categories?post=543"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/buckh\/wp-json\/wp\/v2\/tags?post=543"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}