Hacking the CAB for Smaller Patches



Working on 32- and 64-bit patch support for Developer Division projects like the .NET Framework 2.0 and Visual Studio 2005, I’ve been involved with many threads and in many meetings about the various scenarios and solutions for producing patches that target our supported platform architectures. .NET will ship in three different flavors for x86, x64, and ia64 architectures. Both 64-bit redistributables also ship the x86 binaries by merging the x86 MSM produced from the build lab. This ensures that both 32- and 64-bit managed applications and CCWs work on your shiny new 64-bit systems.

When creating patches, you create a PCP file that contains – among other things – paths to target MSIs and upgrade MSIs in the TargetImages and UpgradeImages tables, respectively. You can specify as many as you want. With my idea for a “mondo patch” (not to be confused with mondo.msm that ships with Visual Studio to facilitate dependencies for managed code in the Windows Installer projects), we list all 3 pairs of target and upgrade MSIs for all architectures. When run through PatchWiz, an MSP (patch) is produced that includes pairs of MSTs (transforms) and a CAB file that contains the files referenced in any of the MSTs.

In the .NET Framework redistributable, a majority of the assemblies are shipped as MSIL and are architecture-neutral. Currently in beta 2 only 13 pairs of assemblies out of 56 (plus Microsoft.Vsa.Vb.CodeDOMProcessor.dll, which ships only for x86 and runs under WOW64) are architecture-specific. In each of the MSIs for x86, x64, and ia64, the File table keys for the x86 files are the same, so PatchWiz will only include a single copy of those, but the x64 and ia64 File table keys have to be different. Makecab.exe, which PatchWiz invokes on a DDF file it creates, will then include 3 copies of the same MSIL assemblies, bloating the size of the patch unnecessarily in most cases. It’s for this reason that I’m still exploring our options and CPX is still discussing our plan for servicing the .NET Framework 2.0.

The DDF format and makecab.exe's interpretation of it do not allow us to have different file descriptors pointing to the same blob in the CAB, so I was looking at the APIs and raw CAB file format structures in the Microsoft Cabinet (CAB) SDK. The APIs don’t appear to help in this case, and the structures for a CAB file have optional fields: not just fields with NULL data in the file, but fields that don’t exist at all unless flags specified earlier say they do.
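To show what those optional fields mean in practice, here is a minimal sketch in Python that reads the CFHEADER and works out where it actually ends. The field layout and flag values follow the CAB format documentation in the SDK; the function name, the dictionary keys, and the choice of Python are my own assumptions for illustration and not part of any SDK API.

```python
import struct

# Flag bits in CFHEADER.flags, as documented in the CAB SDK.
CFHDR_PREV_CABINET = 0x0001
CFHDR_NEXT_CABINET = 0x0002
CFHDR_RESERVE_PRESENT = 0x0004

# The fixed part of CFHEADER is 36 bytes, little-endian.
_FIXED = struct.Struct("<4sIIIIIBBHHHHH")


def _skip_cstring(buf, pos):
    """Return the offset just past the NULL-terminated string at pos."""
    return buf.index(b"\x00", pos) + 1


def parse_cfheader(buf):
    """Parse CFHEADER, honoring the fields that exist only when flagged."""
    (sig, _r1, cb_cabinet, _r2, coff_files, _r3, ver_minor, ver_major,
     c_folders, c_files, flags, set_id, i_cabinet) = _FIXED.unpack_from(buf, 0)
    if sig != b"MSCF":
        raise ValueError("not a cabinet file")

    pos = _FIXED.size
    cb_folder_reserve = 0
    if flags & CFHDR_RESERVE_PRESENT:
        # cbCFHeader (2), cbCFFolder (1), cbCFData (1), then abReserve.
        cb_header_reserve, cb_folder_reserve, _cb_data = struct.unpack_from(
            "<HBB", buf, pos)
        pos += 4 + cb_header_reserve
    if flags & CFHDR_PREV_CABINET:
        pos = _skip_cstring(buf, pos)  # szCabinetPrev
        pos = _skip_cstring(buf, pos)  # szDiskPrev
    if flags & CFHDR_NEXT_CABINET:
        pos = _skip_cstring(buf, pos)  # szCabinetNext
        pos = _skip_cstring(buf, pos)  # szDiskNext

    return {
        "cb_cabinet": cb_cabinet,
        "coff_files": coff_files,        # offset of the first CFFILE entry
        "c_folders": c_folders,
        "c_files": c_files,
        "flags": flags,
        "cb_folder_reserve": cb_folder_reserve,
        "end_of_header": pos,            # the first CFFOLDER starts here
    }
```

The point of the sketch is that the first CFFOLDER cannot be found at a fixed offset: it sits at byte 36 only when none of the three flags is set.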

The development test led me, for now, to use the File table keys as the file names and to make 2 of those files 0 bytes long. For System.dll, the directory would hold three files named after the three File table keys: one full-size copy of the assembly and two 0-byte placeholders.

We list them in a DDF using LZX compression, as PatchWiz does, and run makecab.exe /f on that DDF file.
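As a rough sketch of that step, the Python below writes out a DDF with the two 0-byte placeholders listed ahead of the real file. The directives are standard MakeCAB ones, but the exact set PatchWiz emits is not shown here, so treat the selection as my assumption. The function name and the file names passed to it are hypothetical stand-ins for the real File table keys.

```python
def build_ddf(cabinet_name, placeholder_names, real_name):
    """Return DDF text listing the 0-byte placeholders before the real file.

    The directive set is an assumption for illustration, not a copy of
    what PatchWiz generates.
    """
    lines = [
        ".OPTION EXPLICIT",
        f".Set CabinetNameTemplate={cabinet_name}",
        ".Set DiskDirectory1=.",
        ".Set MaxDiskSize=CDROM",      # avoid splitting at the floppy-size default
        ".Set Cabinet=on",
        ".Set Compress=on",
        ".Set CompressionType=LZX",
    ]
    # Order matters: the empty files come first, so they carry no data of
    # their own and start at the same folder offset as the real file.
    lines += [f'"{name}"' for name in placeholder_names]
    lines.append(f'"{real_name}"')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical names; the real ones are the File table keys.
    text = build_ddf("test.cab", ["System.dll.x64", "System.dll.ia64"],
                     "System.dll.x86")
    with open("test.ddf", "w", newline="") as f:
        f.write(text)
```

Running makecab.exe /f test.ddf in the directory that holds the three files should then produce the cabinet.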

A file, test.cab, is produced. To modify it, the CAB file format must be known. Fortunately, this is documented in the CAB SDK. It’s easy to spot the CFFILE entries because of the NULL-terminated file names, so we track backward from each name and find that the offset into the single CFFOLDER entry is the same for all three. This is because the two 64-bit, 0-byte files are listed first, and since they are 0-byte files they do not have CFDATA entries (the actual compressed file data) of their own. We simply need to change the file sizes recorded for the two 0-byte entries to match the size of the real file.
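The manual edit above could be sketched in Python roughly as follows. It walks the CFFILE entries starting at the coffFiles offset from the header and copies the real file's size into each named placeholder that shares its folder and offset. The CFFILE layout comes from the CAB SDK documentation; the function names and the idea of passing placeholder names explicitly are my own assumptions, and this has not been tested against every cabinet makecab.exe can produce.

```python
import struct

# CFFILE fixed part: cbFile, uoffFolderStart, iFolder, date, time, attribs.
_CFFILE = struct.Struct("<IIHHHH")


def iter_cffile(cab):
    """Yield (entry_offset, cbFile, uoffFolderStart, iFolder, name)."""
    if bytes(cab[0:4]) != b"MSCF":
        raise ValueError("not a cabinet file")
    (coff_files,) = struct.unpack_from("<I", cab, 16)  # CFHEADER.coffFiles
    (c_files,) = struct.unpack_from("<H", cab, 28)     # CFHEADER.cFiles
    pos = coff_files
    for _ in range(c_files):
        cb_file, uoff, i_folder, _d, _t, _a = _CFFILE.unpack_from(cab, pos)
        name_start = pos + _CFFILE.size
        name_end = cab.index(b"\x00", name_start)
        name = bytes(cab[name_start:name_end]).decode("latin-1")
        yield pos, cb_file, uoff, i_folder, name
        pos = name_end + 1


def alias_zero_byte_files(cab, placeholder_names):
    """Give each named 0-byte entry the size of the real file it overlaps.

    cab must be a mutable bytearray; it is patched in place. Returns the
    names that were patched.
    """
    entries = list(iter_cffile(cab))
    # Real files, keyed by the folder and offset where their data begins.
    sizes = {(folder, off): size
             for _, size, off, folder, _ in entries if size > 0}
    patched = []
    for pos, size, off, folder, name in entries:
        if size == 0 and name in placeholder_names and (folder, off) in sizes:
            # cbFile is the first field of the CFFILE entry.
            struct.pack_into("<I", cab, pos, sizes[(folder, off)])
            patched.append(name)
    return patched
```

Only the cbFile fields change, so the cabinet's total length and the compressed CFDATA blocks are untouched. Passing the names explicitly keeps the sketch from rewriting a file that is legitimately empty.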

Now when you open the CAB you see 3 files but, in fact, only a single compressed blob exists for all three. The savings: a 66% reduction in size, from 2,317 KB to 788 KB. This needs to be automated, of course.

A smaller patch that carries transforms for each architecture means a single download, and administrators only have to push out a single MSP to client systems. There are other problems, however, since there is still the question of patch size when 3 separate copies are necessary for architecture-specific assemblies and, of course, for all the native binaries that make up the CLR. Hopefully this post gives you an idea of how to reduce patch sizes in your own managed applications if you run into similar scenarios, since the new managed compilers from Microsoft for the .NET Framework (including Visual Studio 2005) allow you to specify the CPU architecture, but most assemblies would still probably be MSIL.

Heath Stewart

Senior Software Engineer
