Five Myths about Managed Code

My name is Immo Landwerth and I was a Program Manager intern this year in the CLR team. In this blog post I am not going to showcase any of the fantastic features that will ship with .NET 4.0 – my colleagues in the CLR team know them much better and already did a fabulous job discussing them here, over there and on Channel 9.

Instead I want to discuss the following five myths about managed code and in particular about the CLR:

· Managed code is always JIT compiled

· Generic co- and contravariance are new in .NET 4.0

· Everything is an object

· .NET only supports statically typed languages

· Microsoft is not using Managed Code

Myth Five – Managed code is always JIT compiled

Having a JIT compiler has many advantages, because several things become much easier when one is available:

1. On-the-fly code generation (System.Reflection.Emit) is much easier because you only have to target one virtual machine architecture (IL) instead of all the processor architectures the runtime supports (such as x86 and x64).

2. To some degree it solves the fragile base class library problem. That means we can share class definitions across modules without the risk that changes such as adding fields or adding virtual methods crash dependent code.

3. The working set can be improved because the JIT only compiles methods that are actually executed.

4. Theoretically, you could take situational facts into consideration, such as which processor-architecture is actually used (e.g. is it SSE2 capable), the application usage patterns etc. and optimize differently for them.
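To make the first point concrete, here is a minimal sketch of on-the-fly code generation using DynamicMethod from System.Reflection.Emit. The class name EmitSketch and the method name "Add" are made up for illustration. The point is that the code only emits IL and leaves the translation to native code to the JIT compiler.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Builds a delegate equivalent to (a, b) => a + b by emitting IL at run time.
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod(
            "Add",                                // hypothetical name, used for diagnostics only
            typeof(int),                          // return type
            new[] { typeof(int), typeof(int) });  // parameter types

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // add the two values on the stack
        il.Emit(OpCodes.Ret);     // return the result

        // The JIT compiler translates this IL into native code for whatever
        // processor architecture the process happens to run on.
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdder();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

Nothing in this sketch mentions x86 or x64; the same four IL instructions work on every architecture the runtime supports.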

However, JIT compilation also has downsides such as:

1. It takes time. That means JIT compilation always has to trade off compilation time against code quality.

2. The code is stored on private pages so the compiled code is not shared across processes.

Therefore we created a tool called NGEN that allows you to pre-create native images during the setup. You could call this ahead-of-time compilation (as opposed to just-in-time). Certain special conditions left aside (such as some hosting scenarios or profiling), the runtime will now pick up the native images instead of JIT-compiling the code.

Why did we not allow you to pre-create the native images during build time and let you ship the native images directly? Well, because we would then run into the fragile base class library problem mentioned above. In that case, your native images would become invalid every time the .NET Framework is updated. Today we solve this problem by re-running NGEN on the customer’s machine when the framework is serviced. In .NET Framework 4 we ship a new feature called targeted patching, which allows us to minimize or even fully avoid recompilation when a change only affects method bodies. For more details about NGEN in general see here and for more details about NGEN in .NET Framework 4 see here.

Even if you are not using NGEN for your application code: for desktop CLR applications, the assemblies that are part of the .NET Framework itself are not JIT compiled; instead the runtime binds to their native images. So even in these cases only your application code is JIT compiled, and ahead-of-time and just-in-time compilation are used side by side. Thus, stating that all code is JITted is simply wrong.

Myth Four – Generic co- and contravariance are new in .NET 4.0

The short answer is ‘no’. The longer answer is ‘well, sort of’.

But I am getting ahead of myself. Let’s first see what co- and contravariance actually mean. Generic covariance allows you to call a method that takes an IEnumerable<Shape> with an IEnumerable<Circle> (if Circle is derived from Shape). This is useful if Shape contains, for example, a method that computes the area. This way you can write a method that computes the area for any collection of shapes. Contravariance, on the other hand, allows you to call a method that takes an IComparer<Circle> with an IComparer<Shape>. This is handy if someone wants to compare circles and you have already created a general comparer for any shape (this works because a comparer that knows how to compare two instances of Shape is certainly also able to compare two instances of Circle).

Support for co- and contravariance has been in the CLR ever since generics were introduced in the .NET Framework 2.0. However, as Rick Byers pointed out, you would have to use ILASM to create covariant and contravariant type definitions:
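The two conversions can be sketched as follows, assuming C# 4.0 or later. Shape, Circle, AreaComparer and VarianceSketch are hypothetical types written for this example; only IEnumerable<T>, IComparer<T> and List<T> come from the framework.

```csharp
using System;
using System.Collections.Generic;

public abstract class Shape
{
    public abstract double Area { get; }
}

public class Circle : Shape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public override double Area { get { return Math.PI * radius * radius; } }
}

// A general comparer that orders any two shapes by area.
public class AreaComparer : IComparer<Shape>
{
    public int Compare(Shape x, Shape y) { return x.Area.CompareTo(y.Area); }
}

public static class VarianceSketch
{
    // Works for any sequence of shapes.
    public static double TotalArea(IEnumerable<Shape> shapes)
    {
        double total = 0;
        foreach (Shape shape in shapes) total += shape.Area;
        return total;
    }

    public static void Main()
    {
        var circles = new List<Circle> { new Circle(2), new Circle(1) };

        // Covariance: an IEnumerable<Circle> is accepted as an IEnumerable<Shape>.
        double total = TotalArea(circles);

        // Contravariance: Sort expects an IComparer<Circle>,
        // and an IComparer<Shape> is accepted in its place.
        circles.Sort(new AreaComparer());

        Console.WriteLine(total);
        Console.WriteLine(circles[0].Area < circles[1].Area); // prints True
    }
}
```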

In IL, covariant type parameters are indicated by a ‘+’, and contravariant type parameters are indicated by a ‘-‘ (non-variant type parameters are the default, and can be used anywhere).

What has been added in the .NET 4.0 release is language support in C# and Visual Basic. For example, the following uses the C# syntax (in and out modifiers on the generic type parameters) to declare some covariant and contravariant types:

// Covariant parameters can be used as result types
interface IEnumerator<out T> {
    T Current { get; }
    bool MoveNext();
}

// Covariant parameters can be used in covariant result types
interface IEnumerable<out T> {
    IEnumerator<T> GetEnumerator();
}

// Contravariant parameters can be used as argument types
interface IComparer<in T> {
    int Compare(T x, T y);
}

Myth Three – Everything is an object

“Wait a minute – this is the number one programming promise everyone was making about .NET!” you might say now. Yes, and yet it is false. Many .NET or C# books make this mistake in one form or the other. “Everything is an object”. Although we believe there is a lot of value in simplifying things for didactic reasons (and hence many authors just claim it that way) we would like to take this opportunity to tell you “sorry, it is not completely true”.

Before we discuss this issue we should first define what the sentence “everything is an object” is supposed to mean. The interpretation we will use here is this:

Every type is derived from a single root (System.Object). This means that every value can be implicitly cast to System.Object. More precisely, this means that every value is representable as an instance of System.Object.

So why is this not true for the CLR? The counterexample is a whole class of types that are not derived from System.Object: pointers (such as int*). So you cannot pass a pointer to a method that takes an object. In addition, you cannot call the ToString or GetHashCode methods on a pointer.

We could also use a different interpretation of “everything is an object” such as:

Every type is derived from a single root (System.Object). This means that every value is an object at all times.

Why is this different? Simple values (i.e., values that have types derived from System.ValueType) are not objects by the definition of an object (they lack identity). But every value can be cast implicitly to System.Object (because System.ValueType is derived from System.Object). However, in that case an object instance that contains the value is created. This process is called boxing. The resulting object instance (the “box”, not to be confused with Don Box) does indeed have a notion of identity (which is in particular also true for Don Box).

As you can see, the CLR uses the first interpretation and yet it is still not completely true as pointers do not derive from System.Object.
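The identity that a box gains can be observed directly. The following is a small sketch (BoxingSketch and its helper methods are hypothetical names): boxing the same int twice yields two objects that are equal by value but are not the same object.

```csharp
using System;

public static class BoxingSketch
{
    // Boxes the same value twice and compares the two boxes by value.
    public static bool BoxesAreEqual(int value)
    {
        object first = value;  // boxing: allocates an object holding a copy of the value
        object second = value; // boxing again: a second, distinct object
        return first.Equals(second);
    }

    // Boxes the same value twice and compares the two boxes by identity.
    public static bool BoxesAreSameObject(int value)
    {
        object first = value;
        object second = value;
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreEqual(42));      // prints True
        Console.WriteLine(BoxesAreSameObject(42)); // prints False
    }
}
```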

Myth Two – .NET only supports statically typed languages

It is true that the CLR uses a static type system. But this does not necessarily mean that it is only suited for programming languages that use a static type system. In the end, a programming language is implemented on top of the CLR, but it is not identical to the CLR. So do not be fooled by the fact that the type system and mechanics of C# map almost directly to first-class CLR concepts. In fact, there are many concepts in C# that the CLR is not aware of:

1. Namespaces. As far as the CLR is concerned namespaces do not even exist. They are just implemented as type prefixes separated by dots (so instead of saying ‘the class Console is contained in the namespace System’ the CLR would just say ‘there is a class called System.Console’).

2. Iterators. The CLR does not provide any support for them. All the magic is done by the compiler (if you want to know, the compiler turns your method into a new type that internally uses a state machine to track the current point of execution. Details can be found here).

3. Lambdas. They are just syntactic sugar. For the runtime these are just delegates, which in turn can also be considered syntactic sugar. In fact, a delegate is nothing more than a class derived from System.MulticastDelegate that provides Invoke, BeginInvoke and EndInvoke methods with the appropriate parameters.
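A little reflection makes these three points visible. This is only a sketch (SugarSketch, DelegateBase and Count are hypothetical names), and the name of the iterator type is chosen by the compiler, so no particular output is claimed for that line.

```csharp
using System;
using System.Collections.Generic;

public static class SugarSketch
{
    // An iterator: the compiler rewrites this method into a generated class.
    public static IEnumerable<int> Count()
    {
        yield return 1;
        yield return 2;
    }

    // Returns the base type of the delegate type a lambda is converted to.
    public static Type DelegateBase()
    {
        Func<int, int> twice = x => x * 2;
        return twice.GetType().BaseType;
    }

    public static void Main()
    {
        // Namespaces: the runtime only knows the full, dotted type name.
        Console.WriteLine(typeof(Console).FullName); // prints System.Console

        // Iterators: the returned object is an instance of a compiler-generated type.
        Console.WriteLine(Count().GetType().Name);

        // Lambdas: a delegate instance whose type derives from MulticastDelegate
        // and exposes an Invoke method.
        Console.WriteLine(DelegateBase()); // prints System.MulticastDelegate
        Console.WriteLine(typeof(Func<int, int>).GetMethod("Invoke") != null); // prints True
    }
}
```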

Please note that this list is not complete. Instead it is only used to show you that even C# has to implement itself on top of the CLR and hence it is not a 1:1 mapping of the concepts the runtime provides. What does this have to do with static typing vs. dynamic typing? The answer is simply: you can implement a dynamically typed system on top of a statically typed system.

If you now see a huge business opportunity here, we have to disappoint you. Some smart people already had the same idea. This effort is called the Dynamic Language Runtime, or DLR for short. If you are like me then you immediately think of native code when someone mentions the term ‘runtime’. However, the DLR is completely implemented in C# and is just a class library that can be used by programming languages to implement dynamic systems on top of the CLR. The DLR shares the fundamental design principle of the CLR, i.e. it provides a platform for more than one language. That means you can share IronPython objects with IronRuby objects because they are implemented with the same underlying runtime (the DLR).

With .NET 4.0 the DLR ships in the box. So while .NET has first-class support for statically typed languages through the CLR, it also provides first-class support for dynamically typed languages through the DLR.
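Even C# can tap into this. As a small sketch, assuming C# 4.0 with a reference to the Microsoft.CSharp assembly (DynamicSketch and its methods are hypothetical names), the following uses the dynamic keyword together with ExpandoObject from System.Dynamic. Members are created and looked up at run time, and a missing member is reported at run time rather than by the compiler.

```csharp
using System;
using System.Dynamic;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicSketch
{
    // Creates members at run time and reads them back through dynamic dispatch.
    public static string Describe()
    {
        dynamic person = new ExpandoObject();
        person.Name = "Ada"; // the member is created on first assignment
        person.Age = 36;
        return person.Name + " is " + person.Age;
    }

    // Accessing a member that was never set fails at run time, not at compile time.
    public static bool MissingMemberThrows()
    {
        dynamic person = new ExpandoObject();
        try
        {
            object ignored = person.Missing;
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe());            // prints Ada is 36
        Console.WriteLine(MissingMemberThrows()); // prints True
    }
}
```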

Myth One – Microsoft is not using Managed Code

We often hear this (“Office and Windows are still not built on top of managed code!”) when customers ask about performance and future investments of Microsoft in managed code. The reasoning goes like this:

Since Microsoft is not implementing Windows and Office in managed code, managed code must be significantly flawed or run much slower than native code, and therefore Microsoft's long-term strategy will still be C++. This in turn means that we should not use managed code either.

In fact Microsoft has a huge investment in managed code (although it is still true that Office and Windows are not implemented in managed code). However, there are a bunch of products that are significantly (if not completely) implemented in managed code:

1. Windows components, such as

a. PowerShell

b. System Center

2. Office components, such as

a. Exchange

b. SharePoint/Office Server

3. Developer Tools, such as

a. Visual Studio and Visual Studio Team System

b. Expression

4. Dynamics

This list is by far not complete but it should be large enough to convince you that we are in fact ‘eating our own dog food’.

The reason that not all products are written in managed code is not only related to performance. Sometimes the wins of re-implementing working native code in managed code do not outweigh the costs. On the other hand, there are still scenarios in which managed code simply cannot be used today (such as building the CLR itself or the debugger).

However, we will not deny that there are scenarios in which we cannot compete with the performance of native code today. But this does not mean that we have given up on this. In fact, projects like Singularity should show you that we are really very ambitious about redefining the limits of the managed world.

The last thing to keep in mind is that manually optimized assembler code is also faster than plain C-code. But this does not mean that all operating systems are completely written in assembler.

Thus our vision is more like this: native code where it makes sense, managed code where it makes sense with the bigger portion being managed.