The .NET Native Tool-Chain

This post was authored by Xy Ziemba, Program Manager on the .NET Native team.

At BUILD, we announced .NET Native Preview. .NET Native is a compilation technology and a small runtime that allow .NET applications to start up to 60% faster and have a smaller memory footprint. We’ve previously discussed at a high level how .NET Native provides these performance benefits. Here, we’ll talk about how the .NET Native tool-chain works.

.NET Native ships as a single SDK that lets you easily convert a .NET application from Microsoft Intermediate Language (MSIL) to native code. While we’ve made this experience just a few clicks, there are a number of steps involved in converting MSIL to native code. We’ll go through some of those steps and give an overview of how .NET Native converts your app into native code.

There are seven major steps in building a .NET Native application:

1. Building the MSIL application from source
2. Generating interop marshaling and serialization code
3. Merging the application
4. Reducing the application
5. Other MSIL transformations
6. Compiling from MSIL to Machine Dependent Intermediate Language (MDIL)
7. Binding from MDIL to native code

Step 1 is how applications are built today. .NET Native adds steps 2-7, and the IL Compiler (ILC) automates the process. These subsequent steps operate on the MSIL of an application.

Before we start, please note that .NET Native is under active development. This means that many of the details are changing, and some of these major steps might change too! Let’s look at each of these steps in detail.

1. Building the MSIL application from source

All .NET applications start as source code, including .NET Native apps. The source code is compiled to MSIL binaries (EXEs and/or DLLs) using a language compiler, such as the C# compiler. A typical build process also uses a few other tools, such as the one that packages an app into an APPX package. In the case of existing Windows Store apps, an APPX package containing MSIL binaries is the final app artifact that can be uploaded to the Windows Store.

The .NET Native tool-chain inserts multiple additional steps in between source code compilation and packaging. Those additional steps are described below.

2. Generating interop marshaling and serialization code

The .NET Native tool-chain starts with the IL Compiler, or ILC. ILC begins by pre-generating additional code needed to make the application run. On other .NET platforms, this code is generated as needed at runtime. ILC starts with two tools that generate code for marshaling and serialization.

The first of these tools is the Marshaling Code Generator, or MCG. As its name implies, MCG generates marshaling code for native code interoperability scenarios such as Windows Runtime calls, P/Invoke calls, and COM interop.

MCG scans the entire application looking for any call to native code and any possible entry point for native code to call managed code. For each callee, three things happen:

A function is created to handle marshaling the arguments.
Proxy objects called CCWs and RCWs are created as needed.
Calls are redirected to the new functions and proxy objects.

All the code for this is generated as C# in appname.interop.g.cs. This is compiled into appname.WinMDInterop.dll and is added to the application.

Second is the Serialization Generator, or SG. This tool generates code to assist .NET serializers such as DataContractSerializer or DataContractJsonSerializer. It scans the application to identify the types of objects an app will serialize. SG analyzes these types and produces the serialization and deserialization functions used at runtime.

Why does .NET Native do this? In short, these tools provide performance wins. Tools like SG and MCG move the analysis and generation of code from application users’ computers to the developer’s computer. This leads to simpler apps and a simpler runtime with fewer moving parts that perform better. We’ve used this design principle throughout the product. We refer to it as static compilation.
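To make the idea concrete, here is a minimal sketch in Python (not the actual SG implementation, and not C#) of moving serializer generation from run time to build time. The Order type and the function names are hypothetical and exist only for illustration. The reflective version inspects the type on every call, while the generated version is plain code produced once from the same analysis.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Order:  # hypothetical type that the app serializes
    id: int
    customer: str
    total: float


def serialize_reflective(obj) -> str:
    """Runtime approach: discover the fields by reflection on every call."""
    return json.dumps({f.name: getattr(obj, f.name) for f in fields(obj)})


def generate_serializer_source(cls) -> str:
    """Build-time approach: inspect the type once and emit plain source code."""
    body = ", ".join(f"{f.name!r}: obj.{f.name}" for f in fields(cls))
    return (
        f"def serialize_{cls.__name__}(obj):\n"
        f"    return json.dumps({{{body}}})\n"
    )


# Simulate the build step: generate the source and compile it once.
generated_source = generate_serializer_source(Order)
namespace = {"json": json}
exec(generated_source, namespace)
serialize_Order = namespace["serialize_Order"]

order = Order(id=7, customer="Contoso", total=19.5)
# Both approaches produce the same output; only the time of analysis differs.
assert serialize_Order(order) == serialize_reflective(order)
print(generated_source)
```

In this toy model the generated function contains no reflection at all, which is the property the tool-chain is after: the analysis happens on the developer's machine, and the user's machine only runs the result.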

3. Merging the application

ILC next ‘merges’ the application. It gathers almost all code the application needs to run: the application EXE itself, generated code from SG and MCG, referenced managed WinMDs and DLLs, and referenced parts of the .NET Framework. These MSIL binaries are all combined into a single EXE that contains all types and data for the application. For example, the new EXE contains its own copy of System.Object and System.String. This is analogous to static linking when you build a C++ application.

A couple of extra things happen to make everything work. First, references to external assemblies are rewritten to reference the current assembly. This makes the application one big self-contained unit. Second, prefixes are added to type names to identify the original assembly that contained the type. .NET Native internally understands these prefixes and maintains a lookup table to identify the originating assembly for any piece of code. The debugger and reflection also use the same lookup table so you don’t ever see these prefixes.
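The two rewrites can be sketched with a toy model in Python. This is an illustration only: assemblies are modeled as plain dictionaries, and the "Assembly::Type" prefix scheme is an assumption made for the sketch, not the format .NET Native actually uses.

```python
from typing import Dict, List, Tuple

# Toy model: assembly name -> {type name -> list of (assembly, type) references}.
Assembly = Dict[str, List[Tuple[str, str]]]


def prefixed(assembly: str, type_name: str) -> str:
    """Hypothetical prefix scheme; the real encoding is internal to .NET Native."""
    return f"{assembly}::{type_name}"


def merge(assemblies: Dict[str, Assembly]):
    """Combine all assemblies into one unit and record where each type came from."""
    merged: Dict[str, List[str]] = {}
    origin: Dict[str, Tuple[str, str]] = {}
    for assembly_name, types in assemblies.items():
        for type_name, references in types.items():
            new_name = prefixed(assembly_name, type_name)
            # External references become references inside the merged unit.
            merged[new_name] = [prefixed(a, t) for a, t in references]
            # Lookup table used to map a merged name back to its origin.
            origin[new_name] = (assembly_name, type_name)
    return merged, origin


assemblies = {
    "App": {
        "MainPage": [("System.Runtime", "System.Object"),
                     ("System.Runtime", "System.String")],
    },
    "System.Runtime": {
        "System.Object": [],
        "System.String": [("System.Runtime", "System.Object")],
    },
}

merged, origin = merge(assemblies)
print(merged["App::MainPage"])
print(origin["System.Runtime::System.String"])
```

A debugger or reflection layer built on this model would consult origin to display the original assembly and type name instead of the prefixed one.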

By unifying the application, the merge step ensures that all the subsequent tooling only has to deal with a single artifact.

4. Reducing the application

Merging all the framework DLLs with an application adds a lot of unnecessary code and size to an application. So, ILC performs ‘dependency reduction’ using an engine called the Dependency Reducer, or DR. (Compiler folks sometimes call this ‘tree-shaking’.)

The Dependency Reducer works by identifying all code that could execute and throwing out the rest. To do this, the Dependency Reducer performs a variety of tasks including:

Analyzing code referenced by your application’s entry points
Analyzing data bindings in XAML documents
Inspecting arguments passed to certain reflection methods
Following instructions provided by Runtime Directives

The last item is especially important. Visual Studio automatically adds a default Runtime Directive policy when you migrate an application to .NET Native. You can find this policy in default.rd.xml. By modifying the Runtime Directives, you can tune the behavior of the Dependency Reducer and often eliminate even more unused code from an application. This will make the application smaller and allow it to build faster.
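At its core, this kind of reduction is a reachability analysis. The following Python sketch is a toy model, not the Dependency Reducer itself: the call graph, the member names, and the idea of treating Runtime Directives as extra roots are simplifying assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List


def reduce_dependencies(call_graph: Dict[str, List[str]],
                        roots: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the members reachable from the given roots."""
    reachable = set()
    queue = deque(roots)
    while queue:
        member = queue.popleft()
        if member in reachable:
            continue
        reachable.add(member)
        queue.extend(call_graph.get(member, ()))
    return {m: callees for m, callees in call_graph.items() if m in reachable}


# Hypothetical merged application: member -> members it calls.
call_graph = {
    "App.Main": ["App.Render", "String.Format"],
    "App.Render": ["String.Concat"],
    "App.ViewModel.Title": [],       # only used through a XAML binding
    "String.Format": ["String.Concat"],
    "String.Concat": [],
    "Xml.Parse": ["String.Concat"],  # never referenced by the app
}

entry_points = ["App.Main"]
directive_roots = ["App.ViewModel.Title"]  # kept because a directive asks for it

kept = reduce_dependencies(call_graph, entry_points + directive_roots)
print(sorted(kept))
```

In this model, Xml.Parse is discarded because nothing reachable refers to it, while App.ViewModel.Title survives only because it was listed as a root. That mirrors why directives matter: static analysis alone cannot see members that are used only through reflection or data binding.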

5. Other MSIL transformations

Steps 2-4 were all ‘MSIL transformations’. We also perform dozens of smaller transformations in addition to those detailed above.

Many of these transformations move runtime work into ILC. For example, other .NET runtimes include the ability to generate GetHashCode and Equals implementations for value types. In .NET Native, these implementations are generated in advance of runtime. In another example, .NET Native generates implementations for calls to Delegate.Invoke at compile time whereas other .NET runtimes generate these implementations at runtime. By not doing this work at runtime, .NET Native gets multiple small performance wins that add up to big wins that your users will notice.

Other transformations exist to persist information that would otherwise be lost when an application is turned into native code. For example, type and member names are not present in native code, but reflection requires that information. These transforms collect the data that your application needs for reflection and encode it in a format that can persist through the compilation.

appname.ilexe is emitted after these transformations are all completed. This is the last step that directly modifies MSIL.

6. Compiling from MSIL to MDIL

At this point, there is a merged and reduced EXE containing MSIL that is ready to be compiled from high-level MSIL into machine code. ILC invokes a modified version of the Microsoft VC++ compiler called NUTC.

NUTC is modified to import MSIL and to understand the .NET type system. The optimizations and analysis are all powered by the C++ backend. This brings the best of C++ to .NET. This means that .NET Native applications take advantage of Microsoft Visual C++ inlining, dead code removal, and vectorization. This is the step that really provides the “native” in .NET Native.

NUTC doesn’t actually output a binary that’s ready to run. It outputs MDIL, or Machine Dependent Intermediate Language. MDIL isn’t a high-level language like MSIL. Instead, MDIL includes a given platform’s assembly language with some additional tokens to avoid hard-coding certain addresses and pointers. These additional tokens create a looser coupling between NUTC and the .NET Native runtime. They are resolved in the next step through a process called ‘binding’.

If you’re interested in MDIL, check out this Channel 9 talk.

7. Binding from MDIL to native code

The binder converts MDIL into machine code that a given architecture (e.g. ARM, x64) can run. This tool resolves the symbolic tokens in the MDIL file and hard-codes them to the .NET Native runtime. For example, the binder connects object and array allocations in the application to the garbage collector in the runtime. You can also think of this like linking a traditional C application.
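As a rough mental model, binding can be pictured as replacing symbolic tokens with concrete addresses. The Python sketch below is a toy: the instruction format, the token names, and the addresses are all invented for illustration and do not reflect the real MDIL encoding or runtime layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Token:
    """Symbolic placeholder for an address that is not known at compile time."""
    name: str


Instruction = Tuple[str, Union[int, Token]]


def bind(mdil: List[Instruction],
         runtime_symbols: Dict[str, int]) -> List[Tuple[str, int]]:
    """Replace every token with the address the runtime exports for it."""
    bound = []
    for opcode, operand in mdil:
        if isinstance(operand, Token):
            if operand.name not in runtime_symbols:
                raise KeyError(f"unresolved token: {operand.name}")
            operand = runtime_symbols[operand.name]
        bound.append((opcode, operand))
    return bound


# Hypothetical compiler output: machine-like instructions plus tokens.
mdil = [
    ("mov", 16),
    ("call", Token("AllocateObject")),
    ("call", Token("AllocateArray")),
]

# Hypothetical addresses exported by one particular runtime build.
runtime_symbols = {"AllocateObject": 0x4000, "AllocateArray": 0x4040}

native = bind(mdil, runtime_symbols)
print(native)
```

Because the compiler output refers to AllocateObject only by name, the same output could be bound against a runtime build with different addresses, which is the looser coupling that the tokens provide.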

The binder’s final output is an optimized DLL that contains the app code. Of course, an app can’t run from just a DLL. So, the binder also emits a small stub application to load the DLL and start the application’s execution.

Summary

As you can see, the current .NET Native tool-chain is composed of a lot of parts. However, this has all been done to move logic and complexity out of the runtime and into the tool-chain. This means that applications start faster, are simpler, and can be better optimized.

In the coming weeks, we’ll have additional blog posts to explore various parts of the tool-chain.