As part of our builds, quite a few projects copy files to the binaries directory or other locations. These can be anything from image files to test scripts. To have our builds complete more quickly, we use the multi-process option (/maxcpucount) of msbuild to build projects in parallel.
This all sounds normal, so what’s the problem? In a large team, people will sometimes inadvertently add statements to different project files that copy files to the same destination. When those project files have no references to each other, directly or indirectly, msbuild may build them in parallel. If it does happen to run those projects in parallel on different nodes and the copies happen at the same time, the build breaks because one copy succeeds and one fails. Since the timing is not going to be the same on every build, the result is random build breaks. Build breaks suck. They drain the productivity of the team and are frustrating.
Whether the build is continuous integration or gated checkin, these breaks may happen randomly. They are most likely to happen on incremental builds where the percentage of time spent during the build on doing copies is much higher than a clean build. Tracking them down as they happen is painful.
So, I wrote a simple tool to find cases in the log where the destination is the same for more than one copy operation. The comment in the header explains what the code is looking for. Running this on the normal verbosity msbuild logs from a clean build ensures that all of the copies are in the log for analysis. We also build what we call partitions separately, resulting in the number of log files being a multiple of the number of partitions being built (a partition is a subset of the source and is typically a top-level directory in the branch).
In our internal builds, we record multiple log files for our builds, including minimal, normal, and detailed. When there’s a problem, you can start with the smaller build logs and increase to the more verbose logging as needed.
I’m posting this for any of you who might run into the same thing. Keep in mind that there are other things, such as antivirus software, that can interfere with the build process and result in errors for files being copied.
using System; using System.Collections.Generic; using System.Linq; using System.IO; using System.Text; using System.Text.RegularExpressions; using System.Threading.Tasks; /* * This tool finds cases where more than one file is copied to the same target. This will cause * build breaks when msbuild executes the copies in parallel because the copies are independent * (there are no dependencies). This typically occurs in incremental builds because incremental * builds do a lot less work (not nearly as much to build), resulting in the copies being a much * higher percentage of the build activities and more likely to collide. Gated checkin, * continuous integration, and developer/tester builds are almost always incremental, not clean. * These issues are still possible in regular clean builds, such as done nightly by the build lab. * * These race conditions are difficult to debug manually. Since msbuild records all of the copies * made via the copy task, we can use the log file to identify cases where the same destination * path is used in more than one copy. * * Use the *.normal.* logs from a clean build with this tool. * * The best thing to do is to ensure that each file copy to a particular destination is done by * one and only one project. When that is the case, you are guaranteed not to have problems * with two copies colliding and breaking your build. * * Here's example output from buildr.suitesrc.normal.log that shows a copy failure. Here two * copies were executed in parallel and the second one failed, causing the build to fail. * 48>Project "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\dirs.proj" (48) is building "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\Azure\Microsoft.TeamFoundation.Data.Azure.csproj" (55) on node 8 (BuildLinked target(s)). 55>_CopyOutOfDateSourceItemsToOutputDirectory: Copying file from "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\ReleaseManifest.xml" to "D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml". 48>Project "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\dirs.proj" (48) is building "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\SqlServer\Microsoft.TeamFoundation.Data.csproj" (53) on node 4 (BuildLinked target(s)). 53>_CopyOutOfDateSourceItemsToOutputDirectory: Copying file from "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\ReleaseManifest.xml" to "D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml". 53>D:\a1\dd\tools\x86\managed\v4.5\Microsoft.Common.targets(3516,5): error MSB3021: Unable to copy file "D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\ReleaseManifest.xml" to "D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml". Access to the path 'D:\a1\binaries.x86ret\bin\i386\ReleaseManifest.xml' is denied. [D:\a1\dd\alm\tfs_core\Admin\Servicing\Data\SqlServer\Microsoft.TeamFoundation.Data.csproj] * * Note that there may be multiple copies in a sequence. * 291>_CopyOutOfDateSourceItemsToOutputDirectoryAlways: Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\httpcfg.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\httpcfg.exe". Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\makecert.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\makecert.exe". Copying file from "D:\a1\dd\suitesrc\TFS\common\deploytools\winhttpcertcfg.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\winhttpcertcfg.exe". CopyFilesToOutputDirectory: Copying file from "D:\int\641\194\suitesrc\tfshttpsconfig.csproj_80399372\objr\x86\TfsHttpsConfig.exe" to "D:\a1\binaries.x86ret\SuiteBin\i386\TFS\Tests\TfsHttpsConfig.exe". * Nodes are reused by msbuild. The result is that a given may process many projects, so it's not * possible to scan and pair up all of the nodes and project files at once. In the code below, * you will see that it always tracks the most recent node for that reason. * */ namespace FindBadCopies { class Program { static void Main(string[] args) { if (args.Length < 1) { Console.WriteLine("Usage: findbadcopies <logfile>\r\n"); Console.WriteLine( @"This tool scans a build log, such as buildr.suitesrc.normal.log, and produces a list of file paths that are the targets of more than one copy and shows which project files initiated each copy. These redundant file copies are prone to fail periodically in incremental builds, such as gated check ins and CI builds, because copies are a higher percentage of the operations in the build, making it more likely that two collide."); return; } ProcessFile(args[0]); } private static void ProcessFile(String fileName) { Dictionary<int, String> nodeTable = new Dictionary<int, String>(1000); Dictionary<String, int> pathTable = new Dictionary<String, int>(1000, StringComparer.InvariantCultureIgnoreCase); String previousLine; string[] text = File.ReadAllLines(fileName); // Process all of the lines in the file, skipping the first line (we need the previous line, // and the first line in the file isn't important to this tool). int lastNode = 0; for (int i = 1; i < text.Length; i++) { previousLine = text[i - 1]; // Record most recent node. The text that appears with it can be different // (see sample log data). string prevLinePattern = @"([0-9]+)[>]"; Match match = Regex.Match(previousLine, prevLinePattern, RegexOptions.IgnoreCase); if (match.Success) { lastNode = Int32.Parse(match.Groups[1].Value); } // If the line is recording the start of a project, add it to the table. string pattern = @"[0-9]+[>]Project ""[^""]+"" \([0-9]+\) is building ""([^""]+)"" \(([0-9]+)\)"; match = Regex.Match(text[i], pattern, RegexOptions.IgnoreCase); if (match.Success) { int node = Int32.Parse(match.Groups[2].Value); String projectPath = Path.GetFullPath(match.Groups[1].Value); // Because nodes are reused, we are only keeping the project path for the most recent use // of a given node. nodeTable[node] = projectPath; // If we matched a project line, it can't be a copy line. continue; } // If the line is one that records a copy, see if there was an earlier copy made to // the same target path. First, try the output of a copying task. string copyingPattern = @"Copying file from ""[^""]+"" to ""([^""]+)"""; match = Regex.Match(text[i], copyingPattern, RegexOptions.IgnoreCase); if (match.Success) { String targetPath = null; try { targetPath = Path.GetFullPath(match.Groups[1].Value); } catch (Exception e) { // There is a file in the test tree that uses non-English chars that causes // GetFullPath() to throw (TODO: understand why), so we keep the raw text. // Console.WriteLine(match.Groups[1].Value); targetPath = match.Groups[1].Value; } // If we have already seen the target path, then we have a duplicate copy path // target to report. int otherNode; if (pathTable.TryGetValue(targetPath, out otherNode)) { Console.ForegroundColor = ConsoleColor.Cyan; Console.WriteLine("{0}", targetPath); Console.ResetColor(); Console.WriteLine(" {0}", nodeTable[otherNode]); Console.WriteLine(" {0}", nodeTable[lastNode]); Console.WriteLine(); } pathTable[targetPath] = lastNode; } } } } }
0 comments
Be the first to start the discussion.