Hacking the CAB for Smaller Patches
Working on 32- and 64-bit patch support for Developer Division projects like the .NET Framework 2.0 and Visual Studio 2005, I’ve been involved in many threads and meetings about the scenarios and solutions for producing patches that target our supported platform architectures. .NET will ship in three flavors: x86, x64, and ia64. Both 64-bit redistributables also ship the x86 binaries by merging in the x86 MSM produced by the build lab. This ensures that both 32- and 64-bit managed applications and COM callable wrappers (CCWs) work on your shiny new 64-bit systems.
When creating patches, you create a PCP file that contains, among other things, paths to target MSIs and upgrade MSIs in the TargetImages and UpgradedImages tables, respectively. You can specify as many as you want. With my idea for a “mondo patch” (not to be confused with mondo.msm, which ships with Visual Studio to facilitate dependencies for managed code in Windows Installer projects), we list all three pairs of target and upgrade MSIs, one pair per architecture. When this is run through PatchWiz, it produces an MSP (patch) that includes pairs of MSTs (transforms) and a CAB file containing the files referenced in any of the MSTs.
In the .NET Framework redistributable, most of the assemblies ship as MSIL and are architecture-neutral. Currently, in beta 2, only 13 of the 56 pairs of assemblies are architecture-specific, plus Microsoft.Vsa.Vb.CodeDOMProcessor.dll, which ships only for x86 and runs under WOW64. Across the x86, x64, and ia64 MSIs, the File table keys for the x86 files are the same, so PatchWiz includes only a single copy of them. The x64 and ia64 File table keys, however, have to be different. Makecab.exe, which PatchWiz invokes on a DDF file it creates, will then include three copies of the same MSIL assemblies. In most cases this bloats the patch unnecessarily. It’s for this reason that I’m still exploring our options and CPX is still discussing our plan for servicing the .NET Framework 2.0.
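One way to see which payloads could share a single compressed blob is to group File table keys by a hash of the file contents. The following is only an illustrative Python sketch. The function name is hypothetical, the keys are shortened (the real ones carry a GUID suffix), and the native-binary keys and contents are made up.

```python
import hashlib
from collections import defaultdict


def group_identical_payloads(files: dict[str, bytes]) -> list[list[str]]:
    """Group File table keys whose payloads are byte-for-byte identical.

    `files` maps a File table key to the bytes of the file it names.
    Only groups with more than one key are returned; those are the
    candidates for sharing one compressed blob in the CAB.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for key in sorted(files):
        digest = hashlib.sha256(files[key]).hexdigest()
        by_digest[digest].append(key)
    return [keys for keys in by_digest.values() if len(keys) > 1]


if __name__ == "__main__":
    msil = b"pretend MSIL assembly"
    files = {
        "FL_System_dll_____X86": msil,   # hypothetical, shortened keys
        "FL_System_dll_____A64": msil,
        "FL_System_dll_____I64": msil,
        "FL_Native_dll_____X86": b"pretend x86 code",
        "FL_Native_dll_____A64": b"pretend x64 code",
    }
    for group in group_identical_payloads(files):
        print(group)
    # prints ['FL_System_dll_____A64', 'FL_System_dll_____I64', 'FL_System_dll_____X86']
```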
The DDF format, and makecab.exe’s interpretation of it, do not allow different file descriptors to point to the same blob in the CAB. So I was looking at the APIs and the raw CAB file format structures in the Microsoft Cabinet (CAB) SDK. The APIs don’t appear to help in this case, and the structures for a CAB file have optional fields. These are not just fields with NULL data in the file, but fields that don’t exist at all depending on flags specified earlier.
Development testing led me, for now, to use the File table keys as the file names and to make two of the three files 0-byte files. For System.dll, the directory listing would look like this:
3 File(s) 2,527,232 bytes
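Staging those files could be scripted roughly as follows. The helper is hypothetical. It assumes the real file is copied under one File table key and empty placeholders are created under the other keys. It returns the paths in the order they would go into the DDF, with the placeholders first.

```python
import shutil
import tempfile
from pathlib import Path


def stage_payloads(real_file: Path, real_key: str, alias_keys: list[str],
                   out_dir: Path) -> list[Path]:
    """Stage one real payload plus 0-byte placeholders, each named by a File table key.

    Returns the paths in DDF order: placeholders first, then the real file.
    Assuming makecab keeps them adjacent in one folder, all of them start
    at the same uncompressed offset.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for key in alias_keys:
        placeholder = out_dir / key
        placeholder.write_bytes(b"")  # empty stand-in with no data of its own
        staged.append(placeholder)
    real = out_dir / real_key
    shutil.copyfile(real_file, real)  # the only copy that is actually compressed
    staged.append(real)
    return staged


if __name__ == "__main__":
    # Hypothetical demo with a fake payload and shortened keys.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "System.dll"
        src.write_bytes(b"pretend MSIL")
        paths = stage_payloads(src, "FL_System_dll_____X86",
                               ["FL_System_dll_____A64", "FL_System_dll_____I64"],
                               Path(tmp) / "stage")
        for p in paths:
            print(p.name, p.stat().st_size)
```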
We list them in a DDF using LZX compression, as PatchWiz does, and run makecab.exe /f on that DDF file.
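A sketch of generating such a DDF follows. It uses standard makecab variables: CabinetNameTemplate, DiskDirectory1, CompressionType, Cabinet, Compress, and MaxDiskSize. This particular set is my assumption, not the file PatchWiz generates, so check it against the makecab documentation in the Cabinet SDK before relying on it. The function name is hypothetical.

```python
from pathlib import Path


def make_ddf(cab_name: str, files: list[Path]) -> str:
    """Return makecab directive-file text that packs `files`, in order, into one LZX cabinet."""
    lines = [
        ".OPTION EXPLICIT",
        f".Set CabinetNameTemplate={cab_name}",
        ".Set DiskDirectory1=.",      # assumed: write the cabinet to the current directory
        ".Set CompressionType=LZX",
        ".Set Cabinet=on",
        ".Set Compress=on",
        ".Set MaxDiskSize=0",         # assumed: no disk-size limit, so a single cabinet
    ]
    # One line per source file, in the given order (0-byte placeholders first).
    # Without an explicit destination name, makecab stores each file under its
    # base name, which here is the File table key.
    lines += [f'"{p}"' for p in files]
    return "\r\n".join(lines) + "\r\n"
```

Write the returned text to a .ddf file and pass it to makecab.exe /f.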
A file, test.cab, is produced. To modify it, you need to know the CAB file format. Fortunately, it is documented in the CAB SDK. The CFFILE entries are easy to spot because of their NULL-terminated file names. Tracking backward from each name shows that the offset into the single CFFOLDER entry (uoffFolderStart) is the same for all three files. That’s because the two 64-bit, 0-byte files are listed first, and since they are 0-byte files they have no CFDATA entries (the actual compressed file data) of their own. We simply need to change their file sizes. The dump below shows the CAB after that change: all three CFFILE entries have a cbFile of 0x00269000 (2,527,232 bytes), an offset of 0, and folder 0.
0000000: 4d53 4346 0000 0000 254e 0c00 0000 0000 MSCF....%N......
0000010: 2c00 0000 0000 0000 0301 0100 0300 0000 ,...............
0000020: b930 0000 0d01 0000 4e00 0312 0090 2600 .0......N.....&.
0000030: 0000 0000 0000 8e32 6376 2000 464c 5f53 .......2cv .FL_S
0000040: 7973 7465 6d5f 646c 6c5f 5f5f 5f5f 4136 ystem_dll_____A6
0000050: 342e 3336 3433 3233 3646 5f46 4337 305f 4.3643236F_FC70_
0000060: 3131 4433 5f41 3533 365f 3030 3930 3237 11D3_A536_009027
0000070: 3841 3142 4238 0000 9026 0000 0000 0000 8A1BB8...&......
0000080: 008e 3263 7620 0046 4c5f 5379 7374 656d ..2cv .FL_System
0000090: 5f64 6c6c 5f5f 5f5f 5f49 3634 2e33 3634 _dll_____I64.364
00000a0: 3332 3336 465f 4643 3730 5f31 3144 335f 3236F_FC70_11D3_
00000b0: 4135 3336 5f30 3039 3032 3738 4131 4242 A536_0090278A1BB
00000c0: 3800 0090 2600 0000 0000 0000 8e32 6376 8...&........2cv
00000d0: 2000 464c 5f53 7973 7465 6d5f 646c 6c5f .FL_System_dll_
00000e0: 5f5f 5f5f 5838 362e 3336 3433 3233 3646 ____X86.3643236F
00000f0: 5f46 4337 305f 3131 4433 5f41 3533 365f _FC70_11D3_A536_
0000100: 3030 3930 3237 3841 3142 4238 00ce 2617 0090278A1BB8..&.
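Walking the CFFILE entries can be written down as a short Python sketch. It assumes the CFFILE layout from the Cabinet SDK: cbFile, uoffFolderStart, iFolder, date, time, and attribs, followed by the NUL-terminated szName. coffFiles and cFiles sit in the fixed part of CFHEADER, so the sketch can jump straight to the first CFFILE without decoding any optional header fields. The helper name is hypothetical.

```python
import struct

# Fixed part of a CFFILE entry (16 bytes, little-endian): cbFile,
# uoffFolderStart, iFolder, date, time, attribs. szName follows it.
_CFFILE = struct.Struct("<IIHHHH")


def list_cffiles(cab: bytes) -> list[dict]:
    """Return the CFFILE entries of an in-memory cabinet image."""
    if cab[:4] != b"MSCF":
        raise ValueError("not a cabinet file")
    # coffFiles (offset 16) and cFiles (offset 28) are fixed CFHEADER fields.
    (coff_files,) = struct.unpack_from("<I", cab, 16)
    (c_files,) = struct.unpack_from("<H", cab, 28)
    entries, pos = [], coff_files
    for _ in range(c_files):
        cb_file, uoff, i_folder, _date, _time, _attribs = _CFFILE.unpack_from(cab, pos)
        name_start = pos + _CFFILE.size
        name_end = cab.index(b"\x00", name_start)
        entries.append({
            "offset": pos,              # file offset of this CFFILE entry
            "cbFile": cb_file,          # uncompressed size
            "uoffFolderStart": uoff,    # uncompressed offset within the folder
            "iFolder": i_folder,        # index of the CFFOLDER holding the data
            "name": cab[name_start:name_end].decode("latin-1"),
        })
        pos = name_end + 1              # the next entry starts after the NUL
    return entries
```

Run over the bytes in the dump above, the sketch reports three entries at file offsets 0x2c, 0x77, and 0xc2. All three are in folder 0, at uncompressed offset 0, with a cbFile of 2,527,232.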
Now when you open the CAB you see 3 files but, in fact, only a single compressed blob exists for all three. The savings: a 66% reduction in size, from 2,317 KB to 788 KB. This needs to be automated, of course.
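One way to automate the size change is sketched below, under the same layout assumptions. The function name is hypothetical, and you would supply the mapping from alias names to the real file's name yourself. The sketch refuses to touch an entry unless that entry is 0 bytes and already shares the target's folder and offset, because rewriting cbFile alone would otherwise point the entry at the wrong data. Only the fixed-width cbFile field changes, so cbCabinet stays the same, and the CFDATA checksums, which cover only the data blocks, are unaffected.

```python
import struct

# Fixed part of a CFFILE entry: cbFile, uoffFolderStart, iFolder, date, time, attribs.
_CFFILE = struct.Struct("<IIHHHH")


def _cffile_entries(cab):
    """Yield (entry_offset, cbFile, uoffFolderStart, iFolder, name) for each CFFILE."""
    (coff_files,) = struct.unpack_from("<I", cab, 16)  # CFHEADER.coffFiles
    (c_files,) = struct.unpack_from("<H", cab, 28)     # CFHEADER.cFiles
    pos = coff_files
    for _ in range(c_files):
        cb_file, uoff, i_folder, _d, _t, _a = _CFFILE.unpack_from(cab, pos)
        end = cab.index(b"\x00", pos + _CFFILE.size)
        yield pos, cb_file, uoff, i_folder, bytes(cab[pos + _CFFILE.size:end]).decode("latin-1")
        pos = end + 1


def alias_cffiles(cab: bytes, aliases: dict[str, str]) -> bytes:
    """Return a copy of `cab` in which each 0-byte alias reuses its target's data.

    `aliases` maps an alias file name to the real file's name. Only cbFile
    is rewritten, so every alias must already be a 0-byte entry in the same
    folder and at the same uncompressed offset as its target.
    """
    out = bytearray(cab)
    if out[:4] != b"MSCF":
        raise ValueError("not a cabinet file")
    table = {name: (pos, cb, uoff, fold)
             for pos, cb, uoff, fold, name in _cffile_entries(out)}
    for alias, target in aliases.items():
        a_pos, a_cb, a_uoff, a_fold = table[alias]
        _, t_cb, t_uoff, t_fold = table[target]
        if a_cb != 0 or (a_uoff, a_fold) != (t_uoff, t_fold):
            raise ValueError(f"{alias} is not a 0-byte entry at {target}'s offset")
        # cbFile is the first field of the CFFILE entry.
        struct.pack_into("<I", out, a_pos, t_cb)
    return bytes(out)
```

For test.cab, the A64 and I64 keys would both map to the X86 key.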
A smaller patch that carries transforms for each architecture means a single download, and administrators need to push out only a single MSP to client systems. Other problems remain, however. There is still the question of patch size when three separate copies are necessary for the architecture-specific assemblies and, of course, for all the native binaries that make up the CLR. Hopefully this post gives you an idea of how to reduce patch sizes for your own managed applications if you run into similar scenarios. The new managed compilers from Microsoft for the .NET Framework (including those in Visual Studio 2005) let you specify the CPU architecture, but most assemblies will probably still be MSIL.