Conversation about crossgen2
Crossgen2 is an exciting new platform addition and part of the .NET 6 release. It is a new tool that enables both generating and optimizing code in a new way.
The crossgen2 project is a significant effort, and is the focus of multiple engineers. I thought it might be interesting to try a more conversational approach to exploring new features. I sent a set of questions to the team. Simon Nattress offered to tell us more about crossgen2. Let’s see what he said. I’ll provide my own thoughts, too.
What is crossgen for and when should it be used?
Simon: Crossgen is a tool that provides ahead-of-time (AOT) compilation for your code so that the need for JITing at runtime is reduced. When publishing your application, Crossgen runs the JIT over all assemblies and stores the JITted code in an extra section that can be quickly fetched at runtime. Crossgen should be used in scenarios where fast startup is important.
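As a quick orientation (my sketch, not part of Simon’s answer): ReadyToRun compilation is normally switched on at publish time with the standard `PublishReadyToRun` MSBuild property.

```xml
<!-- .csproj sketch: ask the SDK to run crossgen over the app at publish time -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
</PropertyGroup>
```

Publishing with a concrete runtime identifier, e.g. `dotnet publish -c Release -r linux-x64`, then emits assemblies that carry precompiled native code alongside the IL.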
Rich: You might see the terms crossgen and readytorun used interchangeably. Crossgen is a tool that generates native code in (at least today) the readytorun format. The readytorun format is primarily oriented toward compatibility across assemblies, offering the same compatibility guarantee as IL while delivering the performance benefits of ahead-of-time compiled code. Starting with crossgen2, it has some other modes with other characteristics.
Why are we making a new version of crossgen? What are our goals?
Simon: Crossgen’s pedigree comes from the early .NET Framework days. Its implementation is tightly coupled with the runtime (it essentially is just the runtime and JIT attached to a PE file emitter). We are building a new version of Crossgen – Crossgen 2 – which starts with a new code base architected to be a compiler that can perform analysis and optimizations not possible with the previous version.
Rich: As the .NET Core project became more mature and we saw usage grow across multiple application scenarios, we realized that crossgen’s limitation of only really being able to produce native code of one flavor with one set of characteristics was going to be a big problem. For example, we might want to generate code with different characteristics for Windows desktop on one hand and Linux containers on the other. The need for that level of code generation diversity is what motivated the project.
Is crossgen -> crossgen2 similar to the native code csc -> managed Roslyn transition? How long has it been worked on?
Simon: The Roslyn transition to managed was not just a rewrite in a different language. It defined an analysis platform for using CSC as an API. It can be used as a compiler and as a source code analyzer in an editor. Similarly, Crossgen2 is not simply a rewrite in managed code. The architecture uses a graph to drive analysis and compilation. This allows scanners, optimizers, and analyzers to all work off a common representation of the assembly being compiled. The project has been worked on for 2 years; the origins of the Crossgen2 compiler go back to a research project around 2016.
Rich: We have a lot of people on the team that primarily write C/C++ (even assembly), but most people like writing C# better and are more productive. Every release, more of the product gets moved to C# for this and other reasons.
What are the key benefits and also the drawbacks from writing crossgen in C#?
Simon: Writing in C# gives us access to a rich set of .NET APIs as well as memory safety guarantees provided by using a managed language. A drawback of using C# is increased processing time when using Crossgen2 on many small assemblies at once because of the overhead of starting the runtime many times. Fortunately, we can mitigate much of that by running Crossgen2 on itself!
Rich: It is also super helpful being on the same team as the folks adding new capabilities to C# and .NET libraries. There is a lot of shared thinking and collaboration on low-level scenarios to enable C# to be a high-performance language. The more challenges we run into to make low-level code fast, the more we add features to fix that. It’s a virtuous cycle.
Can you describe some of the projects that are planned that are made possible with crossgen2?
Simon: Crossgen2 (unlike native Crossgen) allows us to analyze and compile multiple assemblies at once as a single servicing unit with extra optimizations allowed within the compile set.
Rich: Version bubbles is the feature that Simon is referring to, and is one of my favorite new features. By default, readytorun code is versionable, and that’s a great characteristic. I work a lot on containers and they have a key characteristic of immutability, which makes versionability unimportant. Version bubbles trade versionability for performance. That’s perfect for scenarios like containers where you’d much prefer greater performance and don’t have to give anything up for it. I’m looking forward to offering more nuanced and opinionated code in scenarios where it makes sense.
Rich: Versionability is a big topic, but I feel the need to expand on it a little. Let’s start with the Book of the Runtime: “When changes to managed code are made, we have to make sure that all the artifacts in a native code image only depend on information in other modules that cannot change without breaking the compatibility rules. What is interesting about this problem is that the constraints only come into play when you cross module boundaries.” Inlining is the perfect example. Methods can be inlined within the same assembly (equivalent to “module”) because the method being inlined and the method it is being inlined into reside within the same compatibility boundary. You cannot update one without updating the other. If you inline across assembly boundaries, then the original code (that was inlined) could change, and a performance optimization would now exhibit functionally incorrect behavior. That’s very bad. Version bubbles enable redefining the version boundary, but it is up to you to maintain that contract, and it isn’t a .NET code generation bug if you don’t.
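To make the version-bubble trade-off concrete: in the .NET 6 SDK, composite ReadyToRun publishing widens the bubble to the whole published app. A minimal sketch, assuming the `PublishReadyToRun` and `PublishReadyToRunComposite` MSBuild properties from the .NET 6 SDK:

```xml
<!-- .csproj sketch: treat the whole published app as one version bubble -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- Composite mode compiles all rooted assemblies together, enabling
       cross-assembly inlining; the trade-off is that servicing any one
       assembly invalidates the precompiled code for the whole set. -->
  <PublishReadyToRunComposite>true</PublishReadyToRunComposite>
</PropertyGroup>
```

This fits immutable deployments such as containers, where the whole unit is always rebuilt and redeployed together.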
Rich: Cross-compilation is another really important feature. You’ll be able to produce native code for Arm64 on an x64 machine and vice versa. For example, when you want to generate Arm64 code on an x64 machine, the SDK will acquire the Arm64 RyuJIT compiled for x64 so that it will run on an x64 machine. Cross-compilation is a key tenet of the architecture.
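Assuming the normal publish flow, cross-compilation should not require any special commands: the runtime identifier selects the target, and the SDK fetches a crossgen2 built to run on the host architecture while emitting code for the target. A hedged CLI sketch:

```shell
# Sketch: on an x64 dev machine, publish ReadyToRun code for linux-arm64.
# The SDK pulls down an x64-hosted crossgen2 that emits Arm64 native code.
dotnet publish -c Release -r linux-arm64 --self-contained \
    -p:PublishReadyToRun=true
```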
Could crossgen2 ever be used to target a runtime other than CoreCLR? For example, to enable the native AOT form factor?
Simon: Yes – much of the current Crossgen2 code is shared with the NativeAOT project which targets a different runtime. The managed type system implementation has been designed with extension points to allow for this flexibility.
What’s with the name? What’s the name you would prefer and why?
Simon: Crossgen originally started life as a cross-architecture AOT code generator for Windows Phone.
Rich: At one point, I tried to rename the tool “genr2r”, like “generator” but “r2r” at the end for “ready-to-run” but no one else was keen on that idea. At this point, I’m hoping that we’ll revert to just calling the tool “crossgen” after we’ve dropped our use of the existing crossgen tool.
First, thanks Simon for taking some time to tell us all about crossgen2. We also appreciate all your efforts on crossgen2. Simon has since moved to the Cosmos DB team. They use .NET, too!
While many of you will not use crossgen2 directly, you will certainly benefit from the .NET platform being more optimized with this new tool. Going forward, crossgen2 will give us even more options for making higher-performance choices for the platform and for your code.
This post was the first one that I’ve posted in a conversational style. Did you like it? Should we do this again? If so, which topics should we have a conversation about next?
This is great news.
Does this benefit those of us who use .NET Core for building AWS Lambda functions? In the past, ReadyToRun was suggested as a way of optimizing performance on such setups. Can we expect an improvement if we were to use crossgen2 when AWS Lambda starts supporting .NET 6?
I’m on the AWS .NET team and I’m excited about this work.
This is the type of work in .NET Core that can really help Lambda cold starts. Lambda is a perfect use case for what Rich called a version bubble, as Lambda is an immutable environment. So anything crossgen2 can do, even going across module boundaries, I’m all for when it comes to Lambda.
Besides the performance aspects, the cross-compilation will really help the developer experience for Lambda users taking advantage of this feature. With that, we will be able to add a switch to the CLI or a checkbox in VS to easily enable it, making it really simple to use. Compare that to now, where you have to move your deployment experience to Linux to take advantage of R2R.
Our team is moving from running dockerized ASP.NET Core APIs on ECS to a serverless approach. It’s been a recurring discussion point whether .NET is the right platform to build for Lambdas; especially given the cold start delays.
We are excited to see you guys are actively looking into these aspects.
Cold start has been a weaker aspect of .NET. The features landing in .NET 6 and .NET 7 should help a lot.
That’s great to hear.
I guess, we’ll still not be able to get the benefits of .NET 7 on AWS Lambda as it’s not going to be LTS; just as we are stuck with 3.1 right now 🙁
Looking forward to 6.
While interesting, the juicy part was left out: what is on the roadmap for .NET Native? Is it going to be left unmaintained in its current state, and should we just move to desktop for anything beyond C# 7?
This project and .NET Native don’t have a lot of overlap. Assuming we’re talking about the same thing, .NET Native is used for UWP apps. We’re waiting on the Project Reunion plan to come to fruition before making any changes for UWP and related apps. In terms of .NET Native itself, it is maintained but has a low level of investment. We will not be enabling .NET Native with Project Reunion. Any new native AOT model would be based on the NativeAOT experiment, but there is nothing to share/announce on that.
Just curious, what’s wrong with .NET Native? It is already a stable, working technology that gives a great performance boost, sometimes up to 5-6x on modern hardware. Why did managers decide not to extend it to the whole of .NET?
Check out the form factors doc. A lot of the context is there.
Thanks for replying.
The lack of public information just confirms that after being burned with the XNA, Silverlight, and WinRT => UAP => UWP rewrites, C++/CX being dropped without a proper replacement (C++/WinRT with no VS tooling ain’t it), and now the .NET Native uncertainty, focusing on Win32 is the only safe bet to avoid the continuous rewrite stream coming out of some Microsoft teams.
My long term experience on Microsoft eco-system has taught me to understand such reply as it is dead, not officially, just kind of like VB 6 and Silverlight happen to be.
UWP is looking less attractive every day as Project Reunion just keeps taking tooling away from us. As per your sibling comment, there is no reason to believe that whatever has been sold to us since the Windows 8 introduction will still have a future. Reunion is on 0.5.5, with C++ tooling that is a joke versus C++/CX, .NET Native apparently only matters to 1% of .NET devs, and WinUI is miles away from even having feature parity with WPF, let alone being able to stand on its own.
Which is a pity, given that for me, .NET Native is what .NET version 1.0 should have been all along, as per Anders Hejlsberg experience with Delphi, and the VB 6.0 native compiler toolchain.
D, Go and Java are already quite ahead in AOT tooling, if we need to keep rewriting stuff, they start looking more like an alternative.
EDIT: Please don’t take this remark personally, rather as feedback that this continuous push for the next great thing that is left in Limbo and then we get to rewrite everything, and justify to customers why we advocated for said technology, is getting tiring and it would be great if some management levels at Microsoft would take notice of this.
This is actually a large problem and puts MAUI in a difficult situation. Their current samples use Project Reunion: https://github.com/dotnet/net6-mobile-samples/blob/main/HelloWinUI3/HelloWinUI3/HelloWinUI3.csproj
Maybe they could avoid Project Reunion and use WinUI with UWP (which is AOT-compatible I believe). But then they might have to make a separate WPF platform as Xamarin currently does.
I’m not sure I have all the details but there is no point in Project Reunion supporting UWP without AOT. Better to keep it as WIP until AOT is ready.
I posted this info to https://github.com/dotnet/maui/discussions/843
According to here
There’s no real roadmap or plans besides being experimental. As I mentioned to other people with the same problem as me, I think it is time to move on with Vala+GTK+Glade (for XML UI similar to what we have in XAML). There were quite a few nice apps made in Vala, which is very similar to C#. It translates the software to C and compiles it. You get two advantages there: real cross-platform support and real native compilation equal to C code. It is not so easy to get used to the toolchain, but it is definitely worth trying out. I am already developing my first software there 🙂
Thanks for the update; always nice to read what you are working on!
With regards to the format, I think it worked quite well for this and would like to see be used again 👍.
Thanks. I have two more already planned on related topics. Might as well stick with a similar theme.
What is the difference between crossgen and ngen?
ngen is a technology included with .NET Framework. It’s also a precompilation technique, but it runs on the target machine. Crossgen (and the newer crossgen2) are available in .NET Core and can be run at build time to generate native code for a given OS/architecture.
Thank you, now I get it. I perceive crossgenX as being similar to cross compilation on Linux. To build native code for a different platform/architecture than the host platform/architecture. Whereas ngen was/is a .NET Framework and Windows specific tool used after building apps. If I remember correctly the main purpose for ngen was to generate native code for the BCL of the .NET Framework and third-party libraries.
Here is the way I think of it. There are four key characteristics:
- Is the tool just a separate build of the runtime or a specialized tool? NGEN and Crossgen 1 are a separate build of the runtime; Crossgen 2 is a specialized tool.
- Can it cross-target to other OSes and architectures? NGEN and Crossgen 1 cannot. Crossgen 2 can.
- Can it generate native images in the build? NGEN cannot. Crossgen 1 and 2 can. This is less a function of the tool and more the format it generates and the way .NET Framework and .NET Core are installed on the machine. It’s complicated.
- Can it generate images in multiple flavors? NGEN cannot. Crossgen 1 used to be able to, but we dropped that support. Crossgen 2 can.
Hope that helps.
A few other important new capabilities of crossgen2 are:
1. Ability to build native composite binaries: with this, all precompiled code (including the framework libraries + application code) is placed in a single binary. This enables further optimizations like inlining across assembly boundaries and compiling code for generic instantiations. We have seen measurable startup gains on Linux with composite mode.
2. Ability to specify native instruction sets like AVX, AVX2.
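For the instruction-set point, the knob is crossgen2’s `--instruction-set` option, which in a project file can be threaded through the SDK’s extra-args property. The property name (`PublishReadyToRunCrossgen2ExtraArgs`) and the flag spelling below are assumptions based on the .NET 6 SDK, so treat this as a sketch:

```xml
<!-- .csproj sketch: let crossgen2 assume AVX2 on the deployment machines -->
<PropertyGroup>
  <PublishReadyToRun>true</PublishReadyToRun>
  <PublishReadyToRunCrossgen2ExtraArgs>--instruction-set:avx2</PublishReadyToRunCrossgen2ExtraArgs>
</PropertyGroup>
```

The precompiled code then assumes AVX2-capable hardware, so it should only be deployed to machines where that assumption holds.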
.NET Core is an amazing platform, and C# is an amazing language. Yet Microsoft chose Go to build Dapr. Is it for political reasons, or is .NET Core not as good as Go?
It is pretty simple, actually. This is all from my point of view, so the Dapr team might say something different, but I think they’d agree.
Dapr was intended to reach the hearts and minds of the CNCF community, where golang is popular. When you are working with another community, it is a good idea to reduce friction as much as possible. Writing Dapr in Go helps with that. Microsoft is seen as a leading cloud company but likely not seen (at least when Dapr was announced) as a leading innovator of technologies that are near and dear to the CNCF community. Producing Dapr in ways that are more CNCF-like was a good plan.
I see now that Microsoft joined the CNCF in 2017. I didn’t realize that. https://azure.microsoft.com/en-us/blog/announcing-cncf/
The Yarp project is a low-level cloud infra project written in C#. I hope to see more like this. https://github.com/microsoft/reverse-proxy
We also see a lot of interest in Go and Rust at Microsoft. That’s for two reasons. These languages have different characteristics than .NET/C# and may be more appropriate for some projects. That’s very much the case with Rust, less so with Go. Also, Microsoft has a lot of very smart and talented engineers. A lot of them love C# but they also want to learn new things. The best way to learn a new language is building a real project in it. We’ve seen some project choices based on just wanting to learn a new thing. That said, I’m seeing less of that particular trend more recently. That was much more common five years ago.
In addition to what Richard has said, Go is probably a reasonably good fit at this stage from a technical point of view too. Dapr is essentially a set of fairly lightweight components that run in sidecars, and for these, Go does a few things right: it produces small, self-contained binaries that start quickly.
Essentially, it’s just a good way to build lightweight components for sidecars. IMO C# is a more capable language in general, and often quite a bit faster, but you’d have pretty hefty sidecars that don’t start up quickly enough. Rust would be a fantastic option too, but maybe less popular choice within the CNCF. I can kind of see where they’re coming from.
There’s no better way to commit to a genuinely polyglot Dapr than to write it in something other than C#.
So this is a complicated quasi-AOT, where the complexity involves maintaining IL, which is needed for versioning and compatibility. Is the purpose large componentized systems like Visual Studio, where workloads need to be compatible with each other?
That would explain why Microsoft is very interested in this Crossgen technology (if it’s needed to build VS), but app developers want a true AOT technology (because they just want to compile an app to native code and specifically don’t want IL).
Great question and insight, Charles. Yes and no.
The key design scenario of readytorun isn’t componentized systems, although it lends itself really well to that. It is servicing. There is so much context to describe here. I’ll do my best. Unfortunately, some history is required, but I’ll try to be brief.
With .NET Framework, we had NGEN. NGEN images are always what we call “fragile”. It has no concept of version bubbles. Unlike readytorun where the default version bubble is the assembly, NGEN treats the entire machine as one version bubble, and that’s its only mode. If you service and re-NGEN an assembly, you need to re-NGEN all of the assemblies that depend on it. Many of the security fixes in .NET Framework are to low level components, and this causes re-NGENing most if not all NGEN’d assemblies on the machine. That’s really bad, but a consequence of the fragile, high-performance, design point. Context: https://devblogs.microsoft.com/dotnet/wondering-why-mscorsvw-exe-has-high-cpu-usage-you-can-speed-it-up/
As you’ve learned, readytorun was designed for compatibility, the opposite end of the spectrum, and we all pay a performance penalty for that. On one hand, this approach makes no sense because when we service, we always do it with the whole product, so why bother? Fragile would be perfect! In brief, there are a few scenarios, including servicing, that can result in an incorrect graph of dependencies that would break the fragile model. It’s too complicated to describe here, but the point is that we’re not yet ready to return to an all-fragile model even though it appears, at first glance, that we can and should.
A group of us are indeed trying to re-introduce the fragile, high-performance, model and make it more prevalent. We haven’t quite cracked the nut yet on exactly how to do that, but we will. It’s all about drawing lines and ensuring that certain things can or cannot happen, to ensure consistent and coherent execution.
Servicing and componentized systems have a lot of overlap. We choose to focus on the servicing scenario because (A) it is more basic, and (B) it is a very real product scenario that we deliver to users every month. The fragile model isn’t appropriate for most componentized systems.
I’m now thinking we should do a post on just the readytorun format. The crossgen2 part of this conversation is interesting, but it’s one level removed from the core issues of the execution format we use.
Thanks for the very thorough and thoughtful response!
Thank you for taking the time to write to us. Appreciate the useful info.