Improvements to Interop Marshaling in V4: IL Stubs Everywhere
When the CLR needs to transition between managed and native code – usually because of P\Invoke or COM interop – we need to generate marshaling stubs (little chunks of code) to handle that specific call and transform the data from managed to native format and back again. . These stubs are little pieces of code that are usually generated at runtime to do their work behind the scenes and, hopefully, without making developers or users aware of their presence. In .Net 4 we’ve improved our marshaling infrastructure substantially with a feature we call “IL Stubs Everywhere.” For those who have a deep interest in interop marshaling we get into the details of this feature below. For everyone else, here are a few key benefits of this feature:
· Faster interop marshaling: the more complex the signature the greater the speed-up
· x86 and x64 behavior matches: we’ve updated the x64 marshaling to behave exactly as x86 always has and mostly without impact to x64
· Better debugging: when something goes wrong in marshaling we now give you the ability, and specialized tools, to find the problem
The 1.0 and 1.1 versions of the CLR had several different techniques for creating and executing these stubs that were each designed for marshaling different types of signatures. These techniques ranged from directly generated x86 assembly instructions for simple signatures to generating specialized ML (an internal marshaling language)* and running them through an internal interpreter for the most complicated signatures. This system worked well enough – although not without difficulties – in 1.0 and 1.1 but presented us with a serious maintenance problem when 2.0, and its support for multiple processor architectures, came around.
We realized early in the process of adding 64 bit support to 2.0 that this approach was not sustainable across multiple architectures. Had we continued with the same strategy we would have had to create parallel marshaling infrastructures for each new architecture we supported (remember in 2.0 we introduced support for both x64 and IA64) which would, in addition to the initial cost, at least triple the cost of every new marshaling feature or bug fix. We needed one marshaling stub technology that would work on multiple processor architectures and could be efficiently executed on each one: enter IL stubs.
With x64 and IA64 bit support in v2 we introduced IL stubs to perform the marshaling on these platforms. With IL stubs we generate actual IL rather than assembly code or the specialized language in our ML stubs. On these platforms we only used the IL stubs and were thus able to have a single implementation, for all stubs, across both processor architectures. These IL stubs were much faster than the ML stubs but were also much slower than x86 stubs. Because of the performance regress and our resource constraints, we left the x86 platform with its original marshaling stubs implementation.
New in V4: IL Stubs Everywhere
One of the largest features the interop team worked on in the v4 product was the “IL Stubs Everywhere” feature. With this work we moved to uniformly using the same IL stubs infrastructure for all marshaling on all platforms. Sometimes when we spend a lot of time developing, or reworking, a feature there is a concern that it will be expensive for developers to start taking advantage of it: the good news with this one is that this will change will occur automatically and the majority of the benefits will happen just by running your app on v4.0.
A few of the benefits you will see with this feature are:
The original x86 stubs for the simplest signatures were pretty fast – often a simple copy operation — but as soon as you needed marshaling for more complex signatures you were forced onto the slow path with the ML interpreter. We heavily optimized our IL stub infrastructure and now IL stubs are faster across the board (even compared to x86 stubs) and are orders of magnitude faster than the interpreted stubs you used to get for the more interesting signatures. Additionally, for those that use NGEN you’ll find that we include most of your IL stubs in the NGEN images to improve this scenario even further.
One of the common problems people ran into in interop in the v2 days is that with different marshaling technologies used for 32 and 64 bit the behaviors didn’t line up exactly the same. There were a few places where the same code behaved differently based on platform. With every platform running off of IL stubs we now have the same behavior across all platforms.
Note: when we did this we had a choice between giving the 4.0 IL stubs the behavior of the v2.0 64 bit IL stubs or the v2.0 32 bit stubs. We chose to use the 2.0 32 bit behavior for the 4.0 because of the much larger percentage of interop apps that have been built on 32bit to date.
This means that you should not see any behavioral differences moving a 32 bit app to the v4.0 NetFX but you may see some for 64 bit interop apps. Because of the nature of these changes we expect the impact of them to 64 bit apps to be very minimal: most of the changes involve relaxing behavior that was stricter in 64 bit than 32 bit. The biggest change in reverting to 32 bit behavior was to stop clearing ‘out’ parameters during certain error conditions and should be completely non-breaking to 64 bit applications.
One of the other problems with the old 32 bit marshaling infrastructure was that it worked essentially as a black box that was very difficult to debug if something went wrong. It was very difficult to figure out how/what the runtime was trying to marshal when something went wrong. By moving to IL stubs it is actually possible to see the IL instructions that we’re executing and determine where your problem is.
Since Visual Studio does not support debugging IL, the move to IL stubs alone wouldn’t help debugging very much if that was all we did. Instead we updated our IL stubs infrastructure to fire an ETW event detailing each stub as they get generated and built a tool, which we recently released on codeplex, which lets you easily see all the stubs being generated and quickly search for the ones for the methods you are most interested in. In an upcoming post we will have more details and a walkthrough of the tool, but in the meantime you can find it here: http://clrinterop.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=29745
*ML Stubs: it is a little known fact that up until v4 the CLR, at runtime, converted complex signatures into an internal CLR language called ML (marshaling language) which it then ran through the CLR’s internal ML interpreter each time the method was invoked. This little interpreter is almost another separate runtime just for interop marshaling. You can imagine how replacing this with IL and thus getting all the advantages of the perf work that goes into NGEN and the JIT allowed us to massively speed up these scenarios. There was much rejoicing on our team when we removed this interpreter in v4.