Copy-on-Write performance and debugging

Erik Mavrinac

This is a follow-up to our previous coverage of Dev Drive and copy-on-write (CoW) linking. See our previous articles from May 24, 2023, October 13, 2023, and November 2, 2023.

Dev Drive was released in Windows 11 in October 2023 and will be part of Windows Server 2025 this fall, along with an enhancement to automatically use copy-on-write linking (CoW-in-Win32). Here, we’ll cover the results of several months of repo build performance testing for several large internal codebases, explain how to determine whether a file is a CoW link, and share a few tips we learned from adding Dev Drive to thousands of Dev Box VMs for daily developer use.

Repo build performance

Let’s start with the chart:

[Chart: Dev Drive + CoW wins across internal repos, ranging from the largest reduction (Large C#) to the smallest (Large C++)]

The highest win of 43% did not replicate for all the repos under test. However, many repos did get a reduction of 10% or more. Several patterns stand out when comparing the results to the underlying repo code:

  • Repos containing C# with deep project-to-project dependencies cause MSBuild to copy assemblies many times. These can get a significant benefit from CoW linking.
  • Repos that perform lots of additional copying to create microservice layouts as part of the build output also get a strong benefit.
  • Repos heavy in C++ showed only a small win except where they were copying files for microservice layouts. C++ builds in MSBuild do not by default copy output files over and over again, and MSVC tends to generate fewer, larger files, where Dev Drive’s reduced file I/O overhead is less effective.
  • Two repos with low benefit had a project dependency graph with a lot of initial parallelism, which became noticeably faster, followed by a near-linear chain of large projects at the end, which reduced the overall effect of speeding up I/O in each project. [1]

Test methodology notes: Tests for each repo were run on NTFS and Dev Drive partitions on the same Dev Box VM. NuGet and other package caches were placed in the same partition as the source code. All repos used the Microsoft.Build.CopyOnWrite SDK and, where applicable, an upgraded Microsoft.Build.Artifacts SDK. CoW-in-Win32 was not available at the time of testing, and may produce different results when released this fall. Five or more iterations were run per test case, with the first one dropped to avoid measuring a cold disk cache. Measurements were of the build phase only, with package restore and inline tests separated out. All builds were run with clean repo and output directories to ensure a full build. Typical build times per iteration were selected to be about 20 minutes, which for large repos usually meant building a specific subdirectory.
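
To make the "drop the first iteration" step concrete, here is a minimal Python sketch of how a per-repo reduction could be computed. The function names, the use of a mean as the aggregate, and the sample timings are all assumptions for illustration; they are not our actual test harness or measured data.

```python
from statistics import mean


def warm_mean(times_minutes):
    """Mean of the iterations after dropping the first (cold disk cache) run."""
    if len(times_minutes) < 2:
        raise ValueError("need at least two iterations to drop the first one")
    return mean(times_minutes[1:])


def percent_reduction(baseline_times, candidate_times):
    """Percent build-time reduction of the candidate relative to the baseline."""
    baseline = warm_mean(baseline_times)
    candidate = warm_mean(candidate_times)
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical timings in minutes, for illustration only.
ntfs_runs = [26.0, 20.0, 20.5, 19.5, 20.0]
dev_drive_runs = [22.0, 16.0, 16.5, 15.5, 16.0]
reduction = percent_reduction(ntfs_runs, dev_drive_runs)
print(f"Build time reduction: {reduction:.1f}%")
```

Dropping the first run from both series keeps the comparison symmetric, so neither file system is penalized for a cold cache.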

CoW links are also known as block clones, where blocks of data on disk are referenced from multiple file entries. fsutil contains subcommands that let us view files from a block clone point of view. Let’s take a look at a block clone of an assembly copied from a package to my MSBuild output directory:

> dir Azure.Core.dll
Volume in drive D is DevDrive
02/26/2024  08:24 AM           400,936 Azure.Core.dll

> fsutil file queryExtentsAndRefCounts Azure.Core.dll
VCN: 0x0        Clusters: 0x61       LCN: 0x1297cf7  Ref: 0x4
VCN: 0x61       Clusters: 0x1        LCN: 0x15337ac  Ref: 0x1

This shows that there are 97 clusters (0x61) corresponding to the main 400K body of the assembly. Note the Ref: 0x4, which means the underlying block at Logical Cluster Number 0x1297cf7 has 4 block clones on the disk volume, of which this file is one. The last cluster, with one reference, holds the block clone reference metadata, which means that for every cloned file there is one cluster actually used for tracking purposes.
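
If you want to check many files, the output format above is simple to parse. The following Python sketch treats a file as a CoW link when any extent has a reference count above 1. The function names are hypothetical, the regular expression only assumes the line layout shown above, and the 4 KiB cluster size is an assumption you should confirm for your own volume.

```python
import re

# Matches lines like: "VCN: 0x0  Clusters: 0x61  LCN: 0x1297cf7  Ref: 0x4"
EXTENT_RE = re.compile(
    r"VCN:\s*0x([0-9a-fA-F]+)\s+Clusters:\s*0x([0-9a-fA-F]+)\s+"
    r"LCN:\s*0x([0-9a-fA-F]+)\s+Ref:\s*0x([0-9a-fA-F]+)"
)

CLUSTER_BYTES = 4096  # Assumption: 4 KiB clusters; verify for your volume.


def parse_extents(text):
    """Parse queryExtentsAndRefCounts output into a list of dicts."""
    extents = []
    for match in EXTENT_RE.finditer(text):
        vcn, clusters, lcn, ref = (int(group, 16) for group in match.groups())
        extents.append({"vcn": vcn, "clusters": clusters, "lcn": lcn, "ref": ref})
    return extents


def shared_clusters(extents):
    """Count clusters that are referenced by more than one file."""
    return sum(e["clusters"] for e in extents if e["ref"] > 1)


def is_cow_link(extents):
    """True when at least one extent is shared with another file."""
    return shared_clusters(extents) > 0


sample = """\
VCN: 0x0        Clusters: 0x61       LCN: 0x1297cf7  Ref: 0x4
VCN: 0x61       Clusters: 0x1        LCN: 0x15337ac  Ref: 0x1
"""
extents = parse_extents(sample)
print(is_cow_link(extents), shared_clusters(extents) * CLUSTER_BYTES)
```

On a real machine you could feed this the captured standard output of the fsutil command shown above, for example via subprocess.run with capture_output=True.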

Using ProcMon with Dev Drive

ProcMon uses an included filter driver whose name, e.g. ProcMon24, changes over time. Add the filter driver to the Dev Drive allow list, then dismount the volume so the change takes effect:

fsutil devdrv query
# Take note of the current allow-list of filters
fsutil devdrv setfiltersallowed ProcMon24,<other filters comma delimited>
fsutil volume dismount <dev drive letter>:

You can generally leave ProcMon24 in the allow list, as it is only attached to the volume when ProcMon is in use. Our internal Dev Box images are generated with the filter always added.

Using Microsoft Performance Recorder (Xperf) with Dev Drive

Attach the FileInfo filter driver:

fsutil devdrv query
# Take note of the current allow-list of filters
fsutil devdrv setfiltersallowed FileInfo,<other filters comma delimited>
fsutil volume dismount <dev drive letter>:

Then measure. After measurement, it’s important to remove FileInfo from the allow list, because it is always attached to the Dev Drive when allowed, which slows performance.

fsutil devdrv setfiltersallowed <original filters comma delimited>
fsutil volume dismount <dev drive letter>:

Finding and fixing leaked CoW references

Dev Drive, which is based on ReFS, allows only 8176 clones of a data block. A file copy can fail with an error related to too many clones, e.g. MaxCloneFileLinksExceededException from the CoW library used in the Microsoft.Build.CopyOnWrite and Microsoft.Build.Artifacts SDKs, winerror ERROR_BLOCK_TOO_MANY_REFERENCES = 347, or NTSTATUS STATUS_BLOCK_TOO_MANY_REFERENCES = 0xC000048C. If you see one of these, you might have too many actual references, or you might need to clean up orphaned references. We ran into this problem on one machine that had run continuous CoW builds for weeks under a prerelease CoW-in-Win32 implementation, so we don’t expect this to appear in the wild very often.
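
If your own tooling creates CoW links, it can help to recognize this specific error and fall back to an ordinary copy. The Python sketch below shows one possible way to do that. The clone_file and copy_file callables are hypothetical placeholders for whatever clone and copy routines you use, and falling back to a full copy is an assumption about what you want, not a description of how the SDKs behave.

```python
ERROR_BLOCK_TOO_MANY_REFERENCES = 347          # Win32 error code
STATUS_BLOCK_TOO_MANY_REFERENCES = 0xC000048C  # NTSTATUS equivalent


def is_clone_limit_error(exc):
    """True when an OSError carries the too-many-references Win32 code."""
    # OSError only has a winerror attribute on Windows, hence getattr.
    return getattr(exc, "winerror", None) == ERROR_BLOCK_TOO_MANY_REFERENCES


def clone_or_copy(clone_file, copy_file, src, dst):
    """Try a CoW clone first; fall back to a full copy at the clone limit.

    clone_file and copy_file are caller-supplied callables taking
    (src, dst). Any other OSError is re-raised unchanged.
    """
    try:
        clone_file(src, dst)
        return "cloned"
    except OSError as exc:
        if not is_clone_limit_error(exc):
            raise
        copy_file(src, dst)
        return "copied"
```

For the fallback, shutil.copyfile from the standard library is a reasonable copy_file candidate. If the fallback fires often, that is a hint to run the leak scan described next.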

In an elevated console or PowerShell, run the following, where x: is the drive letter of your Dev Drive. In PowerShell, use $env:TEMP in place of %TEMP%.

refsutil leak x: /s %TEMP%\ReFSRepair.tmp

This will scan and fix dangling references. You can add the /d parameter to detect but not fix these references.

Example output from a volume with a significant number of orphaned references:

C:\temp>refsutil leak d: /s .\refs.tmp
Creating volume snapshot on drive \\?\Volume{7e38c41b-1cbc-4abd-8c4a-2c5ca0eed7c7}...
Creating the scratch file...
Beginning volume scan... This may take a while...
Begin leak verification pass 1 (Cluster leaks)...
End leak verification pass 1. Found 1270060 leaked clusters on the volume.

Begin leak verification pass 2 (Reference count leaks)...
End leak verification pass 2. Found 10822697373 leaked references on the volume.

Begin leak verification pass 3 (Compacted cluster leaks)...
End leak verification pass 3.

Begin leak verification pass 4 (Remaining cluster leaks)...
End leak verification pass 4. Fixed 10823967433 leaks during this pass.

Begin leak verification pass 5 (Hardlink leaks)...
End leak verification pass 5. Fixed 0 hardlinks, and 0 posix deleted files/dirs during this pass.

Finished.
Found leaked clusters: 1270060
Found reference leaks: 10822697373
Total cluster fixed  : 10823967433
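
If you run this scan across many machines, the summary lines at the end are easy to collect. Here is a minimal Python sketch that parses them. It assumes only the three summary labels shown in the output above, and the consistency check reflects what we see in this one sample (the total fixed equals the leaked clusters plus the leaked references), which may not hold for every run.

```python
import re

SUMMARY_RE = re.compile(
    r"^(Found leaked clusters|Found reference leaks|Total cluster fixed)"
    r"\s*:\s*(\d+)\s*$",
    re.MULTILINE,
)


def parse_leak_summary(output):
    """Extract the final summary counters from refsutil leak output."""
    return {label: int(value) for label, value in SUMMARY_RE.findall(output)}


def summary_is_consistent(summary):
    """Check that the total fixed equals clusters plus reference leaks."""
    found = summary["Found leaked clusters"] + summary["Found reference leaks"]
    return found == summary["Total cluster fixed"]


sample = """\
Finished.
Found leaked clusters: 1270060
Found reference leaks: 10822697373
Total cluster fixed  : 10823967433
"""
summary = parse_leak_summary(sample)
print(summary, summary_is_consistent(summary))
```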

Conclusion

Copy-on-write will be on by default for Dev Drive in the 24H2 Windows operating system release wave. Dev Drive and CoW will be available in the Server SKU for the first time starting in Server 2025 later this year. These releases will make many builds on Windows notably faster, particularly C# builds. CoW-in-Win32 will avoid the need to integrate the CoW SDKs or modify other build engines or tools.

In the intervening months, consider integrating the CopyOnWrite SDK into your MSBuild repo and creating a Dev Drive partition on your development machine.

We hope you find your build performance notably faster!


  1. We recommended that the repo owners try ReferenceTrimmer to see if any parallelism could be recovered by removing unneeded project dependencies. 
