March 7th, 2019

Making C++ Exception Handling Smaller On x64

Visual Studio 2019 Preview 3 introduces a new feature to reduce the binary size of C++ exception handling (try/catch and automatic destructors) on x64. For this feature, dubbed FH4 (for __CxxFrameHandler4, see below), I developed a new format and new processing for the data used for C++ exception handling. The new data is ~60% smaller than the existing implementation, resulting in an overall binary size reduction of up to 20% for programs that make heavy use of C++ exception handling.

*Update 5/25/2019*

Due to logistical issues we couldn’t get FH4 turned on by default in Update 1. Everything looks in place to have FH4 on by default in an early preview of Update 2 *fingers crossed*.

In addition, as Paul reported in the comments below, in the Visual Studio 2019 RTM release the new runtime wasn’t being properly installed into system32 with just the Visual Studio installation. That was a general bug in RTM that we’ve since fixed in Update 1. For RTM, please run “C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Redist\MSVC\14.20.27508\vc_redist.x64.exe” to install the runtime on any machine where you plan to run binaries that use FH4.

*Update 7/25/2019*

We discovered that hooks used for debugging were missing in the runtime for FH4. This causes debugger-only issues: a “step-into” from a throw no longer goes into the corresponding catch, and a “step-into” out of the catch no longer goes to the next line in the normal execution path (instead, both go to the next breakpoint or the end of the program). This has been fixed for 16.2, but due to UWP library turn-around the fix could not make it in time for the UWP runtimes in 16.2. Given that, we decided not to turn FH4 on by default in 16.2 with this known issue, but to wait for 16.3 where everything will line up. Preview 1 of 16.3, which GA’ed yesterday, does have FH4 on by default, with the plan to keep it on moving forward.

How Do I Turn This On?

FH4 is currently off by default because the runtime changes required for Store applications could not make it into the current release. To turn FH4 on for non-Store applications, pass the undocumented flag “/d2FH4” to the MSVC compiler in Visual Studio 2019 Preview 3 and beyond.

We plan on enabling FH4 by default once the Store runtime has been updated. We’re hoping to do this in Visual Studio 2019 Update 1 and will update this post once we know more.

Tools Changes

Any installation of Visual Studio 2019 Preview 3 and beyond will have the changes in the compiler and C++ runtime to support FH4. The compiler changes exist internally under the aforementioned “/d2FH4” flag. The C++ runtime sports a new DLL called vcruntime140_1.dll that is automatically installed by VCRedist. This is required to expose the new exception handler __CxxFrameHandler4 that replaces the older __CxxFrameHandler3 routine. Static linking and app-local deployment of the new C++ runtime are both supported as well.

Now onto the fun stuff! The rest of this post will cover the internal results from trialing FH4 on Windows, Office, and SQL, followed by more in-depth technical details behind this new technology.

Motivation and Results

About a year ago, our partners on the C++/WinRT project came to the Microsoft C++ team with a challenge: how much could we reduce the binary size of C++ exception handling for programs that heavily used it?

In the context of a program using C++/WinRT, they pointed us to a Windows component, Microsoft.UI.Xaml.dll, which was known to have a large binary footprint due to C++ exception handling. I confirmed that this was indeed the case and generated the breakdown of binary size with the existing __CxxFrameHandler3, shown below. The percentages on the right side of the chart are the percent of total binary size occupied by specific metadata tables and outlined code.

Size Breakdown of Microsoft.UI.Xaml.dll using __CxxFrameHandler3

I won’t discuss in this post what the specific structures on the right side of the chart do (see James McNellis’s talk on how stack unwinding works on Windows for more details). Looking at the total metadata and code however, a whopping 26.4% of the binary size was used by C++ exception handling. This is an enormous amount of space and was hampering adoption of C++/WinRT.

We’ve made changes in the past to reduce the size of C++ exception handling in the compiler without changing the runtime. These include dropping metadata for regions of code that cannot throw and folding logically identical states. However, we were reaching the end of what we could do in just the compiler and wouldn’t be able to make a significant dent in something this large. Analysis showed that there were significant wins to be had, but that they required fundamental changes in the data, code, and runtime. So we went ahead and did them.

With the new __CxxFrameHandler4 and its accompanying metadata, the size breakdown for Microsoft.UI.Xaml.dll is now the following:

Size Breakdown of Microsoft.UI.Xaml.dll using __CxxFrameHandler4

The binary size used by C++ exception handling drops by 64% leading to an overall binary size decrease of 18.6% on this binary. Every type of structure shrank in size by staggering degrees:

EH Data            | __CxxFrameHandler3 Size (Bytes) | __CxxFrameHandler4 Size (Bytes) | % Size Reduction
Pdata Entries      | 147,864                         | 118,260                         | 20.0%
Unwind Codes       | 224,284                         | 92,810                          | 58.6%
Function Infos     | 255,440                         | 27,755                          | 89.1%
IP2State Maps      | 186,944                         | 45,098                          | 75.9%
Unwind Maps        | 80,952                          | 69,757                          | 13.8%
Catch Handler Maps | 52,060                          | 6,147                           | 88.2%
Try Maps           | 51,960                          | 5,196                           | 90.0%
Dtor Funclets      | 54,570                          | 45,739                          | 16.2%
Catch Funclets     | 102,400                         | 4,301                           | 95.8%
Total              | 1,156,474                       | 415,063                         | 64.1%

Combined, switching to __CxxFrameHandler4 dropped the overall size of Microsoft.UI.Xaml.dll from 4.4 MB down to 3.6 MB.

Trialing FH4 on a representative set of Office binaries shows a ~10% size reduction in DLLs that use exceptions heavily. Even in Word and Excel, which are designed to minimize exception usage, there’s still a meaningful reduction in binary size.

Binary               | Old Size (MB) | New Size (MB) | % Size Reduction | Description
chart.dll            | 17.27         | 15.10         | 12.6%            | Support for interacting with charts and graphs
Csi.dll              | 9.78          | 8.66          | 11.4%            | Support for working with files that are stored in the cloud
Mso20Win32Client.dll | 6.07          | 5.41          | 11.0%            | Common code that’s shared between all Office apps
Mso30Win32Client.dll | 8.11          | 7.30          | 9.9%             | Common code that’s shared between all Office apps
oart.dll             | 18.21         | 16.20         | 11.0%            | Graphics features that are shared between Office apps
wwlib.dll            | 42.15         | 41.12         | 2.5%             | Microsoft Word’s main binary
excel.exe            | 52.86         | 50.29         | 4.9%             | Microsoft Excel’s main binary

Trialing FH4 on core SQL binaries shows a 4-21% reduction in size, primarily from metadata compression described in the next section:

Binary        | Old Size (MB) | New Size (MB) | % Size Reduction | Description
sqllang.dll   | 47.12         | 44.33         | 5.9%             | Top-level services: Language parser, binder, optimizer, and execution engine
sqlmin.dll    | 48.17         | 45.83         | 4.8%             | Low-level services: transactions and storage engine
qds.dll       | 1.42          | 1.33          | 6.3%             | Query store functionality
SqlDK.dll     | 3.19          | 3.05          | 4.4%             | SQL OS abstractions: memory, threads, scheduling, etc.
autoadmin.dll | 1.77          | 1.64          | 7.3%             | Database tuning advisor logic
xedetours.dll | 0.45          | 0.36          | 21.6%            | Flight data recorder for queries

The Tech

When analyzing what caused the C++ exception handling data to be so large in Microsoft.UI.Xaml.dll I found two primary culprits:

  1. The data structures themselves are large: metadata tables were fixed size with fields of image-relative offsets and integers each four bytes long. A function with a single try/catch and one or two automatic destructors had over 100 bytes of metadata.
  2. The data structures and code generated were not amenable to merging. The metadata tables contained image-relative offsets that prevented COMDAT folding (the process where the linker can fold together identical pieces of data to save space) unless the functions they represented were identical. In addition, catch funclets (outlined code from the program’s catch blocks) could not be folded even if they were code-identical because their metadata is contained in their parents.
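
To make the second point concrete, here is a toy model of why image-relative offsets get in the way of folding. The record layout, the names (ToyRecord, imageRelative, functionRelative, sameBytes) and the addresses are all made up for illustration; this is not the real metadata format. Two functions with the same shape produce different records when offsets are measured from the start of the image, but identical records when the same offsets are measured from the start of each function.

```cpp
#include <cstdint>

// Toy model only: one record says "the cleanup code starts at this
// offset and is this many bytes long". Not the real FH3/FH4 layout.
struct ToyRecord
{
    std::uint32_t cleanupStart;
    std::uint32_t cleanupSize;
};

// A folding linker can merge two records only if they are byte-identical.
constexpr bool sameBytes(const ToyRecord& a, const ToyRecord& b)
{
    return a.cleanupStart == b.cleanupStart && a.cleanupSize == b.cleanupSize;
}

// Image-relative: the offset is measured from the start of the binary image,
// so it depends on where the function happens to be placed.
constexpr ToyRecord imageRelative(std::uint32_t functionRva,
                                  std::uint32_t offsetInFunction,
                                  std::uint32_t size)
{
    return ToyRecord{functionRva + offsetInFunction, size};
}

// Function-relative: the offset is measured from the start of the owning
// function, so placement no longer matters.
constexpr ToyRecord functionRelative(std::uint32_t offsetInFunction,
                                     std::uint32_t size)
{
    return ToyRecord{offsetInFunction, size};
}

// Two same-shaped functions (think template instantiations) at made-up addresses.
constexpr std::uint32_t kRvaOfA = 0x1000;
constexpr std::uint32_t kRvaOfB = 0x2400;

static_assert(!sameBytes(imageRelative(kRvaOfA, 0x40, 0x10),
                         imageRelative(kRvaOfB, 0x40, 0x10)),
              "image-relative records differ, so they cannot be folded");
static_assert(sameBytes(functionRelative(0x40, 0x10),
                        functionRelative(0x40, 0x10)),
              "function-relative records match, so they can be folded");
```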

To address these issues, FH4 restructures the metadata and code such that:

  1. Previously fixed-size values have been compressed using a variable-length integer encoding that drops >90% of the metadata fields from four bytes down to one. Metadata tables are now also variable length, with a header that indicates whether certain fields are present, so no space is spent emitting empty fields.
  2. All image-relative offsets that can be function-relative have been made function-relative. This allows COMDAT folding between metadata of different functions with similar characteristics (think template instantiations) and allows these values to be compressed. Catch funclets have been redesigned to no longer have their metadata stored in their parents’ so that any code-identical catch funclets can now be folded to a single copy in the binary.
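
To give a feel for why a variable-length encoding helps, here is a minimal sketch of one. It uses a LEB128-style scheme (seven payload bits per byte plus a continuation bit), which is an assumption chosen for brevity and not the byte format FH4 actually emits. The function names encodeVarUInt and decodeVarUInt are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: a LEB128-style variable-length encoding.
// This is NOT the byte format used by __CxxFrameHandler4.

// Appends `value` to `out` using 7 payload bits per byte. The high bit
// of a byte is set when more bytes follow.
inline void encodeVarUInt(std::uint32_t value, std::vector<std::uint8_t>& out)
{
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decodes one value starting at `pos` and advances `pos` past it.
inline std::uint32_t decodeVarUInt(const std::vector<std::uint8_t>& in, std::size_t& pos)
{
    std::uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        assert(pos < in.size() && "truncated input");
        const std::uint8_t byte = in[pos++];
        result |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            break;  // continuation bit clear: this was the last byte
        }
    }
    return result;
}
```

With this scheme any value below 128 takes a single byte instead of four, which is the kind of saving the first point describes for small counts and offsets.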

To illustrate this, let’s look at the original definition for the Function Info metadata table used for __CxxFrameHandler3. This is the starting table for the runtime when processing EH and points to the other metadata tables. This code is available publicly in any VS installation; look for <VS install path>\VC\Tools\MSVC\<version>\include\ehdata.h:

typedef const struct _s_FuncInfo
{
    unsigned int        magicNumber:29;     // Identifies version of compiler
    unsigned int        bbtFlags:3;         // flags that may be set by BBT processing
    __ehstate_t         maxState;           // Highest state number plus one (thus
                                            // number of entries in unwind map)
    int                 dispUnwindMap;      // Image relative offset of the unwind map
    unsigned int        nTryBlocks;         // Number of 'try' blocks in this function
    int                 dispTryBlockMap;    // Image relative offset of the handler map
    unsigned int        nIPMapEntries;      // # entries in the IP-to-state map. NYI (reserved)
    int                 dispIPtoStateMap;   // Image relative offset of the IP to state map
    int                 dispUwindHelp;      // Displacement of unwind helpers from base
    int                 dispESTypeList;     // Image relative list of types for exception specifications
    int                 EHFlags;            // Flags for some features.
} FuncInfo;

This structure is fixed size, occupying ten 4-byte slots (the two bit-fields at the top share one). This means every function that needs C++ exception handling incurs 40 bytes of metadata by default.
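
The 40-byte figure can be checked with a local replica of the layout. This is only a sketch: the replica type names are hypothetical, it assumes a 4-byte int and that __ehstate_t is an int, and bit-field packing is implementation-defined (mainstream x64 compilers pack the two bit-fields into a single 4-byte unit).

```cpp
#include <cstddef>

// Local replica of the layout shown above, used for a size check only.
// Assumption: __ehstate_t is a 4-byte int.
using EhStateReplica = int;

struct FuncInfoReplica
{
    unsigned int   magicNumber : 29;
    unsigned int   bbtFlags    : 3;   // shares one 4-byte unit with magicNumber
    EhStateReplica maxState;
    int            dispUnwindMap;
    unsigned int   nTryBlocks;
    int            dispTryBlockMap;
    unsigned int   nIPMapEntries;
    int            dispIPtoStateMap;
    int            dispUwindHelp;
    int            dispESTypeList;
    int            EHFlags;
};

// Ten int-sized slots: the two bit-fields pack into the first one.
static_assert(sizeof(FuncInfoReplica) == 10 * sizeof(int),
              "expected ten int-sized slots");

constexpr std::size_t kFuncInfoReplicaSize = sizeof(FuncInfoReplica);
```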

Now to the new data structure (<VS install path>\VC\Tools\MSVC\<version>\include\ehdata4_export.h):

struct FuncInfoHeader
{
    union
    {
        struct
        {
            uint8_t isCatch     : 1;  // 1 if this represents a catch funclet, 0 otherwise
            uint8_t isSeparated : 1;  // 1 if this function has separated code segments, 0 otherwise
            uint8_t BBT         : 1;  // Flags set by Basic Block Transformations
            uint8_t UnwindMap   : 1;  // Existence of Unwind Map RVA
            uint8_t TryBlockMap : 1;  // Existence of Try Block Map RVA
            uint8_t EHs         : 1;  // EHs flag set
            uint8_t NoExcept    : 1;  // NoExcept flag set
            uint8_t reserved    : 1;
        };
        uint8_t value;
    };
};


struct FuncInfo4
{
    FuncInfoHeader header;
    uint32_t bbtFlags;         // flags that may be set by BBT processing


    int32_t  dispUnwindMap;    // Image relative offset of the unwind map
    int32_t  dispTryBlockMap;  // Image relative offset of the handler map
    int32_t  dispIPtoStateMap; // Image relative offset of the IP to state map
    uint32_t dispFrame;        // displacement of address of function frame wrt establisher frame, only used for catch funclets
};

Notice that:

  1. The magic number has been removed; emitting 0x19930522 every time becomes a problem when a program has thousands of these entries.
  2. EHFlags has been moved into the header while dispESTypeList has been phased out due to dropped support of dynamic exception specifications in C++17. The compiler will default to the older __CxxFrameHandler3 if dynamic exception specifications are used.
  3. The lengths of the other tables are no longer stored in “Function Info 4”. This allows COMDAT folding to fold more of the pointed-to tables even if the “Function Info 4” table itself cannot be folded.
  4. (Not explicitly shown) The dispFrame and bbtFlags fields are now variable-length integers. The high-level representation leaves each as a uint32_t for easy processing.
  5. bbtFlags, dispUnwindMap, dispTryBlockMap, and dispFrame can be omitted depending on the fields set in the header.
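
As a rough illustration of points 4 and 5, the sketch below serializes a simplified record so that optional fields are emitted only when their header bit is set. Everything here is an assumption made for illustration: the ToyFuncInfo4 type, the bit positions (taken from the declaration order of the bit-field above, which is compiler-dependent), the field order in the output, treating zero as "absent", and the LEB128-style stand-in for the variable-length integers. The isCatch/dispFrame case is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory form of a "Function Info 4"-like record.
// In this sketch a zero value means "not present".
struct ToyFuncInfo4
{
    std::uint32_t bbtFlags = 0;          // emitted as a variable-length integer
    std::int32_t  dispUnwindMap = 0;     // emitted as a 4-byte offset
    std::int32_t  dispTryBlockMap = 0;   // emitted as a 4-byte offset
    std::int32_t  dispIPtoStateMap = 0;  // emitted as a 4-byte offset, always present
};

// Assumed header bit positions (declaration order, least significant bit first).
constexpr std::uint8_t kBBT         = 1u << 2;
constexpr std::uint8_t kUnwindMap   = 1u << 3;
constexpr std::uint8_t kTryBlockMap = 1u << 4;

// Appends a 32-bit value as four little-endian bytes.
inline void appendInt32(std::int32_t v, std::vector<std::uint8_t>& out)
{
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
    }
}

// Stand-in variable-length encoding (LEB128-style), not the real FH4 one.
inline void appendVarUInt(std::uint32_t v, std::vector<std::uint8_t>& out)
{
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Writes the header byte, then only the fields the header says are present.
inline std::vector<std::uint8_t> serialize(const ToyFuncInfo4& info)
{
    std::uint8_t header = 0;
    if (info.bbtFlags != 0)        header |= kBBT;
    if (info.dispUnwindMap != 0)   header |= kUnwindMap;
    if (info.dispTryBlockMap != 0) header |= kTryBlockMap;

    std::vector<std::uint8_t> out;
    out.push_back(header);
    if (header & kBBT)         appendVarUInt(info.bbtFlags, out);
    if (header & kUnwindMap)   appendInt32(info.dispUnwindMap, out);
    if (header & kTryBlockMap) appendInt32(info.dispTryBlockMap, out);
    appendInt32(info.dispIPtoStateMap, out);

    // 1 header byte + at most 5 varint bytes + three 4-byte offsets.
    assert(out.size() <= 1 + 5 + 3 * 4);
    return out;
}
```

With all three tables present and no BBT flags this produces 1 + 3 × 4 = 13 bytes; dropping the try block map brings it down to 9.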

Taking all this into account, the average size of the new “Function Info 4” structure is now 13 bytes (a 1-byte header plus three 4-byte image-relative offsets to other tables), which can scale down even further if some tables are not needed. The lengths of the tables were moved out, but these values are now compressed, and 90% of them in Microsoft.UI.Xaml.dll were found to fit within a single byte. Putting that all together, the average size to represent the same functional data in the new handler is 16 bytes compared to the previous 40 bytes, quite a dramatic improvement!

For folding, let’s look at the number of unique tables and funclets with the old and new handler:

EH Data              | Count in __CxxFrameHandler3 | Count in __CxxFrameHandler4 | % Reduction
Pdata Entries        | 12,322                      | 9,855                       | 20.0%
Function Infos       | 6,386                       | 2,747                       | 57.0%
IP2State Map Entries | 6,363                       | 2,148                       | 66.2%
Unwind Map Entries   | 1,487                       | 1,464                       | 1.5%
Catch Handler Maps   | 2,603                       | 601                         | 76.9%
Try Maps             | 2,598                       | 648                         | 75.1%
Dtor Funclets        | 2,301                       | 1,527                       | 33.6%
Catch Funclets       | 2,603                       | 84                          | 96.8%
Total                | 36,663                      | 19,074                      | 48.0%

The number of unique EH data entries drops by 48%, thanks to the additional folding opportunities created by removing RVAs and redesigning catch funclets. I specifically want to call out the number of catch funclets: it drops from 2,603 down to only 84. This is a consequence of C++/WinRT translating HRESULTs to C++ exceptions, which generates plenty of code-identical catch funclets that can now be folded. Certainly a drop of this magnitude is on the high end of outcomes, but it nevertheless demonstrates the potential size savings folding can achieve when the data structures are designed with it in mind.
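
To see where so many code-identical catch funclets can come from, consider a wrapper shaped like the following. This is a hypothetical sketch and not actual C++/WinRT code: ToyHresult, the error constants, toyToHresult and toyAbiCall are made-up names. Every instantiation of the wrapper has a different try body, but the catch block is the same code each time.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <new>
#include <stdexcept>

using ToyHresult = std::int32_t;          // stand-in for HRESULT
constexpr ToyHresult kOk          = 0;    // made-up result codes
constexpr ToyHresult kFail        = -1;
constexpr ToyHresult kOutOfMemory = -2;
constexpr ToyHresult kInvalidArg  = -3;

// Translates the exception currently being handled into a result code.
// Must be called from inside a catch block.
inline ToyHresult toyToHresult() noexcept
{
    assert(std::current_exception() != nullptr);
    try {
        throw;  // rethrow the in-flight exception so it can be classified
    }
    catch (const std::bad_alloc&)        { return kOutOfMemory; }
    catch (const std::invalid_argument&) { return kInvalidArg; }
    catch (...)                          { return kFail; }
}

// Runs `body` and converts any escaping exception into a result code.
// The try body differs per instantiation; the catch body never does.
template <typename Body>
ToyHresult toyAbiCall(Body&& body) noexcept
{
    try {
        body();
        return kOk;
    }
    catch (...) {
        return toyToHresult();
    }
}
```

A library that routes every exported method through a wrapper like this ends up with one such catch block per method, which is the pattern that benefits from folding.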

Performance

With the design introducing compression and modifying runtime execution there was a concern of exception handling performance being impacted. The impact, however, is a positive one: exception handling performance improves with __CxxFrameHandler4 as opposed to __CxxFrameHandler3. I tested throughput using a benchmark program that unwinds through 100 stack frames each with a try/catch and 3 automatic objects to destruct. This was run 50,000 times to profile execution time, leading to overall execution times of:

               | __CxxFrameHandler3 | __CxxFrameHandler4
Execution Time | 4.84s              | 4.25s

Profiling showed decompression does introduce additional processing time but its cost is outweighed by fewer stores to thread-local storage in the new runtime design.
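
For readers who want to try something similar, here is a sketch of a benchmark with the shape described above. It is not the program used for the numbers in this post. The names (Tracked, BenchError, Unrelated, descend, runBenchmark) are hypothetical, and it assumes the try/catch in each intermediate frame has a handler that does not match the thrown type, so the exception unwinds all the way to the outermost caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Counts destructor calls so the unwinding work can be sanity-checked.
// Not thread-safe: this sketch is single-threaded.
static std::uint64_t g_destructorCalls = 0;

struct Tracked
{
    ~Tracked() { ++g_destructorCalls; }
};

struct BenchError {};  // the exception type that is thrown
struct Unrelated {};   // a type that is never thrown

// Recurses until `depth` reaches zero, then throws. Each frame owns three
// automatic objects and a try/catch whose handler does not match, so the
// exception keeps unwinding into the caller.
inline void descend(int depth)
{
    Tracked a, b, c;
    try {
        if (depth == 0) {
            throw BenchError{};
        }
        descend(depth - 1);
    }
    catch (const Unrelated&) {
        // never taken in this sketch
    }
}

// Runs the throw/unwind cycle `iterations` times through `frames` stack
// frames and returns the elapsed wall-clock time in seconds.
inline double runBenchmark(int frames, int iterations)
{
    assert(frames > 0 && iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        try {
            descend(frames - 1);
        }
        catch (const BenchError&) {
            // outermost handler: unwinding stops here
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling runBenchmark(100, 50000) matches the frame and iteration counts quoted above. Absolute timings will depend on the machine, compiler, and optimization settings.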

Future Plans

As mentioned in the title, FH4 is currently only enabled for x64 binaries. However, the techniques described are extensible to ARM32/ARM64 and to a lesser extent x86. We’re currently looking for good examples (like Microsoft.UI.Xaml.dll) to motivate extending this technology to other platforms—if you think you have a good use case let us know!

The process of integrating the runtime changes for Store applications to support FH4 is in flight. Once that’s done, the new handler will be enabled by default so that everyone can get these binary size savings with no additional effort.

Closing Remarks

For anybody who thinks their x64 binaries could do with some trimming down: try out FH4 (via ‘/d2FH4’) today! We’re excited to see what savings this can provide now that this feature is out in the wild. Of course, if you encounter any issues please let us know in the comments below, by e-mail (visualcpp@microsoft.com), or through Developer Community. You can also find us on Twitter (@VisualC).

Thanks to Kenny Kerr for directing us to Microsoft.UI.Xaml.dll, Ravi Pinjala for gathering the numbers on Office, and Robert Roessler for trialing this out on SQL.

 

Author

Dev on Microsoft C++ backend.

36 comments

  • Michael Grabelkovsky

    Thanks for the good feature, and even more for your detailed description here.
    It helped me fix a problem in my project, where I received a lot of linker errors:

    >
    cor.lib(XXX1.obj) : error LNK2001: unresolved external symbol __CxxFrameHandler4
    9>cor.lib(XXX2.obj) : error LNK2001: unresolved external symbol __CxxFrameHandler4
    9>cor.lib(XXX3.obj) : error LNK2001: unresolved external symbol __CxxFrameHandler4
    9>AAA.lib(XXX4.obj) : error LNK2001: unresolved external symbol __GSHandlerCheck_EH4
    9>cor.lib(XXX5.obj) : error LNK2001: unresolved external symbol __GSHandlerCheck_EH4
    9>cor.lib(XXX6.obj) : error LNK2001: unresolved...

    • Modi Mo (Microsoft employee, Author)

      Good to hear that you were able to resolve your issue! As an additional tool, you can look into what object files are pulling in the dependency on FH4 by adding "/verbose" to your link command. If you have a dependency on FH4 it'll give an output similar to the following:
      Searching C:\Program Files (x86)\Microsoft Visual Studio\2019\Preview\VC\Tools\MSVC\14.26.28720\lib\x64\libvcruntime.lib:
      Found _CxxThrowException
      ...

  • FearsomeKitten

    I'm trying to do exception handling in native mode (link /subsystem:native). __CxxFrameHandler4 uses functions from kernel32, which normally is quite reasonable. However kernel32 isn't available in native mode.

    My hope was that SEH would work, but /EHa also uses __CxxFrameHandler4. I see that there are a number of exported functions in NTDLL.dll with "Exception" in their name, but if they support stack unwinding, I don't know how to tie into them.

    There's a little...

    • Modi Mo (Microsoft employee, Author)

      There's a few layers to this question, so I'm going to try and address them individually.

      1. Destructors are a C++ entity and as such only __CxxFrameHandler3/4 can understand them and process them. Without those available they won't get called regardless of configuration.
      2. /EHa allows __CxxFrameHandler3/4 to interact with SEH exceptions but these functions still need to be present to perform that job. __CxxFrameHandler3/4 is what takes care of calling the destructors when unwinding an...

  • Dippy Aggarwal (Intel Americas Inc) (Microsoft employee)

    Hi. Are we supposed to use /d2FH4 in cl.exe OR -d2FH4-? Thank you.

    • Modi Mo (Microsoft employee, Author)

      With 16.3 and beyond FH4 is default on for AMD64 so you don’t need any additional flags to enable it. Otherwise:

      /d2FH4: enables FH4
      /d2FH4-: disables FH4

      Prefixing can use “/” or “-” interchangeably with no difference.

    • Ianier Munoz

      To Modi: does 16.3 RTM have a switch to turn off this feature? That would be an acceptable workaround.

      • Modi Mo (Microsoft employee, Author)

        Yes, the same switches as detailed above still work in 16.3 RTM. Add ‘-d2FH4-’ to your cl.exe command line and ‘-d2:-FH4-’ to the link line to revert to the previous handler.

      • Damien Lebrun

        Could you confirm that the “cl.exe” command line is simply the line in VS2019->Properties->C/C++->Command Line ?
        cl was for me short for “Compile and Link” so I didn’t understand how to add your switch.
        BR

      • Modi Mo (Microsoft employee, Author)

        Yeah, the cl.exe command line is as you described. For the linker it’s the same thing but under Linker. To summarize:

        VS2019->Properties->C/C++->Command Line: add ‘-d2FH4-’
        VS2019->Properties->Linker->Command Line: add ‘-d2:-FH4-’

      • Damien Lebrun

        Hi,
        same problem as this user: the 64-bit link fails with the same errors. We now use the release 16.3 version of VS2019. Would it be possible to specify what has to be done to switch this off, please? The indicated command -d2:-FH4- doesn’t work. Screenshots would be nice.
        Thank you for your help,
        BR.

    • Dirk Busse

      Since we upgraded yesterday to 16.3, all our 64-bit builds are failing.
      All our solutions and projects are from Visual Studio 2017, but our build servers have Visual Studio 2019 installed and have yesterday been upgraded to 16.3.
      Since the upgrade yesterday, all 64-bit builds are failing with the following error:

      > build 24-Sep-2019 18:53:59 PP_ppuProdPlusUtils.lib(static_mutex.obj) : error LNK2001: unresolved external symbol __CxxFrameHandler4 [D:\Bamboo\xml-data\build-dir\EDC-NIGHT-B64D\pplus\Source\Core\PP_cfgConfiguration\PP_cfgConfiguration.vcxproj]
      > build 24-Sep-2019 18:53:59 PP_ppuProdPlusUtils.lib(w32_regex_traits.obj) : error LNK2001: unresolved external symbol __CxxFrameHandler4 [D:\Bamboo\xml-data\build-dir\EDC-NIGHT-B64D\pplus\Source\Core\PP_cfgConfiguration\PP_cfgConfiguration.vcxproj]
      > build 24-Sep-2019 18:53:59 PP_ppuProdPlusUtils.lib(winstances.obj)...

      • Dirk Busse

        In our case, the problem was that we had a batch file to build Boost, and this batch file was using the latest available compiler version. This means that Boost was compiled with VS2019 while our projects were compiled with VS2017, and libraries compiled with VS2019 are no longer compatible with applications built with VS2017.

      • Nikolay Baklicharov

        I think the runtime is guaranteed to be only backward compatible, not forward compatible. I will give an example:
        If you build the boost libraries with VS 2017 and your main executable with VS 2019, everything should be OK, but not vice versa.

        That means that libraries and executables that are built with VS 2017 are expecting their dependencies to be built with VS 2017 or VS 2015. The same goes for VS 2019 - executables...

      • Modi Mo (Microsoft employee, Author)

        Being in the same DLL doesn't solve the issue. The DLL in 2017 would lack this feature while the one present in 2019 will now have this feature. In fact that's a major reason why it's not and could not be in the same DLL: an app-local deployment can load in the 2017 DLL that could end up being used by a dependent program that needs the 2019 DLL. Being the same name it would...

      • Cyriuz .

        Even if that is the case, it feels pretty unnecessary to break the current future compatibility all the way from VS 2015 for this feature? Why couldn’t this be part of the same dll at least?

      • Modi Mo (Microsoft employee, Author)

        Correct, this is backwards compatible not forwards compatible. @Dirk is your issue with store apps or non-store apps? Assuming this is non-store, upgrading everything to 2019 is the cleanest way around this.

  • Jan Ringoš

    Hello Modi. Was this feature removed? Recently I’ve noticed that my executables no longer link to vcruntime140_1.dll despite the option /d2FH4 (I’m on 16.2.5). Or was the option switch perhaps renamed?

    • Modi Mo (Microsoft employee, Author)

      This is now default on in 16.3. In 16.2.5 the switch should still work to turn this on.

      • Jan Ringoš

        EDIT: Never mind. I upgraded to 16.3 and now it works. Again, you did an awesome job on this. Thank you!

  • roger andrews

    So what happens for people with an older vcruntime140? Will a Windows update add this, so people can deploy using VS2019 without worrying about what the target system has installed, as long as at least vcruntime140 is there?

    • Modi Mo (Microsoft employee, Author)

      vcruntime140.dll remains fully binary compatible with all previous DLLs of that name. All the new functionality exists exclusively in the new vcruntime140_1.dll.
      With every VS update/redistributable run we'll place the latest version of vcruntime140.dll into system32 but all of them with the same name are by design ABI compatible with each other. So systems with older vcruntime140 will get updated to a functionally identical ABI compatible version and get the new vcruntime140_1.dll deployed alongside it...

  • Mark Harmer

    It was mentioned that this was expected to be turned on by default for Update 1. I just installed VS2019 Update 1 (Preview 1) and it doesn’t seem to be enabled – is this expected?

    • Modi Mo (Microsoft employee, Author)

      The Store was not updated in time to turn this on by default in Preview 1 of Update 1. That being said, the Store runtimes have now been updated and you can try out FH4 on Store applications for x64.

      • Mark Harmer

        I think I misunderstood the default settings after Update 1 – is this only turned on for store applications? I was originally asking about desktop applications, is there an expected time frame on when it will be turned on by default? Are there any issues with explicitly turning on the undocumented flag for production desktop application builds?

      • Modi Mo (Microsoft employee, Author)

        In 16.0 and 16.0 Update 1, everything is in place to turn FH4 on for desktop applications. Without the Store runtime available though, we couldn't enable it on by default because there's no way to tell if we're building something that is a Store application or part of one. My previous statement is saying that with the now GA Update 1 Store support is online alongside Desktop support. Sorry for the confusion.
        As far as...

  • Paul Cameron

    I just installed VS2019 RTM to give this a try. Our main exe size was reduced 11%, thanks for that. One issue is that it won’t run because vcruntime140_1.dll is missing. Is there a separate step from running Visual Studio installer to get this dll installed?

    • Modi Mo (Microsoft employee, Author)

      There shouldn’t be but I’m seeing that only the debug DLL is being installed with just running VS2019. The retail DLL can be manually installed by running: “C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Redist\MSVC\14.20.27508\vc_redist.x64.exe” which should place vcruntime140_1.dll into system32. Try that out and let me know if that resolves the issue.

  • Reuven Abliyev

    Just wondering, where does the magic 0x19930522 come from?
    Is it the date when exceptions were first implemented in the MSVC compiler?

    • Modi Mo (Microsoft employee, Author)

      Very likely something to do with implementation/shipping for C++ EH, though I don’t have definitive proof. 0x19930522 is actually the third magic number and used to indicate the information supports EH flags (/EHs and /EHa) so 0x19930520 would be the original value/date.

  • Runzhen Huang (Microsoft employee)

    Amazing work! I wonder if you have tried compiling PPT and pptlink_desktop with /D2FH4, and what the file size reduction is for ppcore.dll? Thanks.

    • Modi Mo (Microsoft employee, Author)

      I have not tried to compile PPT and its associated DLLs. However, if you want to test it out you can build through the latest VS or use nuget packages with the updated toolset. If you do, definitely let me know the results!