Most likely you’ve heard about the Law of Leaky Abstractions coined by Joel Spolsky. Even if you have never heard of it, you have definitely faced it in your day-to-day job. The “law” is pretty simple: “All non-trivial abstractions, to some degree, are leaky”. And this is 100% true. But sometimes even fairly simple abstractions can leak their internal details.
Let’s consider the following code snippet:
public class NodeFactory
{
    public static TNode CreateNode<TNode>() where TNode : Node, new()
    {
        return new TNode();
    }
}
Do you see any issues with it? Will it pass a thorough code review? Of course, you need to know the context. For instance, you need to know what the TNode types are, whether the constructor of those types can throw exceptions and whether the method can be called on a hot path of an app.
But first of all, you need to know what the compiler and the runtime will do with a method like this.
When a user calls CreateNode, the C# compiler checks that the given type has a default constructor and, if it does, emits a call to that constructor. Right? Not exactly. The compiler doesn’t know upfront which constructor to call, so it delegates all the work to a helper method: Activator.CreateInstance<T> (*).
(*) This statement is not entirely accurate. Different C# compilers emit different code for new T(). The C# compiler that ships with VS2015 and later emits a call to Activator.CreateInstance<T>(), but older versions are “smarter”: they return default(T) for value types and call Activator.CreateInstance<T>() only for reference types.
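A rough C# approximation of the two lowerings described in the footnote is shown below. It is a hand-written sketch of the idea, not the exact IL either compiler produces, and NewTLowering, CreateModern and CreateLegacy are hypothetical names.

```csharp
using System;

// Hypothetical helper: approximates how different C# compilers lower 'new T()'.
public static class NewTLowering
{
    // C# 6+ (VS2015 and later): always delegate to the Activator.
    public static T CreateModern<T>() where T : new()
    {
        return Activator.CreateInstance<T>();
    }

    // Older compilers: default(T) is non-null only for value types,
    // so value types skip the Activator entirely.
    public static T CreateLegacy<T>() where T : new()
    {
        return default(T) == null ? Activator.CreateInstance<T>() : default(T);
    }
}
```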
Ok, and what’s wrong with the Activator? Nothing, if you know how it’s implemented.
Implementation details of the Activator.CreateInstance
The non-generic version, Activator.CreateInstance(Type), was introduced in .NET Framework 1.0 and is based on reflection. The method looks up the default constructor of a given type and calls it to construct an instance. We can even implement a very naïve version of this method ourselves:
using System;
using System.Runtime.Serialization;

// Hypothetical class name: the declaration line is missing from the original snippet.
public static class NaiveActivator
{
    public static T CreateInstance<T>() where T : new()
    {
        return (T)CreateInstance(typeof(T));
    }

    public static object CreateInstance(Type type)
    {
        var constructor = type.GetConstructor(new Type[0]);
        if (constructor == null && !type.IsValueType)
        {
            throw new NotSupportedException($"Type '{type.FullName}' doesn't have a parameterless constructor");
        }

        // Allocate the object without running any constructor...
        var emptyInstance = FormatterServices.GetUninitializedObject(type);

        // ...then run the constructor on it. Invoking a constructor on an existing
        // instance returns null, hence the fallback to emptyInstance.
        return constructor?.Invoke(emptyInstance, new object[0]) ?? emptyInstance;
    }
}
As we’ll see shortly, the actual implementation of Activator.CreateInstance is a bit more complicated and relies on some internal CLR methods for creating an uninitialized instance. But the idea is the same: get a ConstructorInfo, create an uninitialized instance, and then call the constructor to initialize it, similar to the placement new concept in C++.
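The two-step “allocate, then construct” idea can be observed directly. This sketch assumes a modern .NET runtime and uses RuntimeHelpers.GetUninitializedObject (a newer alternative to the FormatterServices method used above), then runs the constructor on the already allocated object via ConstructorInfo.Invoke(object, object[]). Counter and PlacementNewDemo are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type: the field initializer runs as part of the constructor.
public class Counter
{
    public int Value = 42;
}

public static class PlacementNewDemo
{
    public static (int before, int after) Run()
    {
        // Step 1: allocate memory without running any constructor or field initializer.
        var instance = (Counter)RuntimeHelpers.GetUninitializedObject(typeof(Counter));
        int before = instance.Value; // still 0 here

        // Step 2: run the parameterless constructor on the existing instance.
        typeof(Counter).GetConstructor(Type.EmptyTypes).Invoke(instance, null);
        return (before, instance.Value);
    }
}
```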
But the generic version “knows” the type being created at compile time, so its implementation could be way more efficient, right? Nope. The generic version is just a façade that gets the type from its generic argument and calls the old reflection-based method, Activator.CreateInstance(Type).
You may wonder: “OK, for new T() the C# compiler calls Activator.CreateInstance<T>(), which calls Activator.CreateInstance(Type), which uses reflection to do its job. Is it a big deal?” Yes, it is!
Concern #1. Performance
Using reflection to create a frequently instantiated type can substantially affect the performance of your application. I currently work on a build system, and one of its components is responsible for parsing build specification files. The first implementation of the parser used a factory method that created every node using new TNode(), as shown above. The very first profiling session showed that the factory had a sizable impact on end-to-end performance. Just by switching to an expression-based implementation of the node factory, we gained a 10% performance improvement in one of our end-to-end scenarios.
To be more specific, let’s compare different ways of creating a Node instance: explicit construction, using Func<Node>, Activator.CreateInstance and a custom factory based on the new() constraint.
public static T Create<T>() where T : new() => new T();

public static Func<Node> NodeFactory => () => new Node();

// Benchmark 1: ActivatorCreateInstance
var node1 = System.Activator.CreateInstance<Node>();

// Benchmark 2: FactoryMethodWithNewConstraint
var node2 = Create<Node>();

// Benchmark 3: ConstructorCall
var node3 = new Node();

// Benchmark 4: FuncBasedFactory
var node4 = NodeFactory();
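The snippet above only lists the expressions being measured. If you just want a feel for the gap without setting up BenchmarkDotNet, a crude Stopwatch loop like the following sketch shows the order of magnitude. It is not how the numbers below were produced, and Node, CrudeTiming, MeasureNs and PrintAll are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the article's Node type.
public class Node { }

public static class CrudeTiming
{
    public static T Create<T>() where T : new() => new T();

    public static readonly Func<Node> NodeFactory = () => new Node();

    // Average cost of one call in nanoseconds. Very rough: no statistics, no outlier
    // handling, and the Func<Node> wrapper adds one delegate call to every variant.
    public static double MeasureNs(Func<Node> action, int iterations = 1_000_000)
    {
        for (int i = 0; i < 10_000; i++) action(); // warm-up, so JIT time is not measured

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void PrintAll()
    {
        Console.WriteLine($"Activator.CreateInstance: {MeasureNs(() => Activator.CreateInstance<Node>()):F1} ns");
        Console.WriteLine($"new() constraint:         {MeasureNs(() => Create<Node>()):F1} ns");
        Console.WriteLine($"Constructor call:         {MeasureNs(() => new Node()):F1} ns");
        Console.WriteLine($"Func-based factory:       {MeasureNs(() => NodeFactory()):F1} ns");
    }
}
```

Run CrudeTiming.PrintAll() from a Release build without a debugger attached; Debug builds distort the results.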
Here are the perf numbers obtained using BenchmarkDotNet:
Method                         | Mean        | StdDev    | Gen 0  |
------------------------------ |------------ |---------- |------- |
ActivatorCreateInstance        |  98.6628 ns | 3.0845 ns |      - |
FactoryMethodWithNewConstraint | 103.0030 ns | 4.2670 ns |      - |
ConstructorCall                |   2.4361 ns | 0.0430 ns | 0.0036 |
FuncBasedFactory               |   6.8369 ns | 0.0436 ns | 0.0034 |
As we can see, the difference is pretty drastic: the factory method based on the new() constraint is 15 times slower than the delegate-based solution and more than 40 times slower than manual construction. But performance is not the only concern.
Correctness
Reflection-based method invocation means that any exception thrown from the method will be wrapped in a TargetInvocationException:
class Node
{
    public Node()
    {
        throw new InvalidOperationException();
    }
}

public static T Create<T>() where T : new() => new T();

try
{
    var node = Create<Node>();
    Console.WriteLine("Node was created successfully");
}
catch (InvalidOperationException)
{
    // Handling the error!
    Console.WriteLine("Failed to create a node!");
}
Is it obvious to everyone that the code shown above is incorrect? The catch block never runs, because the exception that reaches it is not an InvalidOperationException. Reflection-based object construction “leaks” through the generics implementation. Now every developer needs to know how new T() is implemented and what consequences that has for exception handling: every exception thrown from the constructor will be wrapped in a TargetInvocationException!
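To see the leak without depending on exactly which exception type a given runtime surfaces, the sketch below catches everything and walks down the chain of TargetInvocationException wrappers to the root cause. ThrowingNode and LeakDemo are hypothetical names; on a runtime that behaves as described above, outer is TargetInvocationException while root is the original InvalidOperationException.

```csharp
using System;
using System.Reflection;

// Hypothetical type whose constructor always fails.
public class ThrowingNode
{
    public ThrowingNode() => throw new InvalidOperationException("ctor failed");
}

public static class LeakDemo
{
    public static T Create<T>() where T : new() => new T();

    // Returns the type of the exception the caller sees and the type of the root cause.
    public static (Type outer, Type root) Observe()
    {
        try
        {
            Create<ThrowingNode>();
            return (null, null);
        }
        catch (Exception e)
        {
            Exception root = e;
            while (root is TargetInvocationException && root.InnerException != null)
            {
                root = root.InnerException;
            }
            return (e.GetType(), root.GetType());
        }
    }
}
```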
You can fix the issue if you know that the type’s constructor may throw an exception. Starting with .NET 4.5, you can use the ExceptionDispatchInfo class to rethrow an arbitrary exception object (the inner exception in this case) without altering the exception’s stack trace:
public static T Create<T>() where T : new()
{
    try
    {
        return new T();
    }
    catch (TargetInvocationException e)
    {
        var edi = ExceptionDispatchInfo.Capture(e.InnerException);
        edi.Throw();

        // Never executed, but required: the compiler doesn't know that
        // edi.Throw() always throws, so without this line it would report
        // that not all code paths return a value.
        throw;
    }
}
This code solves one issue with Activator.CreateInstance, but as we’ll see in a moment, there are better solutions that fix correctness as well as performance issues.
Correctness (2)
Activator.CreateInstance is implemented in a more complicated way than I described before. It actually has a cache that holds constructor information for the last 16 instantiated types. This means that callers don’t pay the cost of getting the constructor info via reflection every time, although they still pay the cost of a slow reflection-based constructor invocation.
A more accurate description of the algorithm used by Activator.CreateInstance is as follows:
- Create a raw instance using RuntimeTypeHandle.Allocate(this)
- Get the ConstructorInfo for the given type’s parameterless constructor
  - If the constructor information is already in the cache, get it from there
  - If the constructor information is not in the cache, get a ConstructorInfo via reflection and put it into the cache
- Call the constructor on the newly created instance and return a fully constructed instance to the caller
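A simplified model of these steps might look like the sketch below. It is not the BCL code: the real implementation uses a fixed cache of 16 entries and internal helpers such as RuntimeTypeHandle.Allocate, whereas this hypothetical CachingActivator uses an unbounded ConcurrentDictionary and the public RuntimeHelpers.GetUninitializedObject. Like the original, it still pays for a reflection-based constructor call on every instantiation; only the lookup is cached.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical, simplified model of the steps listed above (reference types only).
public static class CachingActivator
{
    private static readonly ConcurrentDictionary<Type, ConstructorInfo> Cache =
        new ConcurrentDictionary<Type, ConstructorInfo>();

    // Counts reflection lookups, for illustration only.
    public static int Lookups;

    public static object CreateInstance(Type type)
    {
        // Step 1: create a raw, uninitialized instance.
        object instance = RuntimeHelpers.GetUninitializedObject(type);

        // Step 2: get the parameterless constructor, from the cache when possible.
        ConstructorInfo ctor = Cache.GetOrAdd(type, t =>
        {
            Interlocked.Increment(ref Lookups);
            return t.GetConstructor(Type.EmptyTypes)
                ?? throw new MissingMethodException($"'{t.FullName}' has no parameterless constructor");
        });

        // Step 3: run the constructor on the raw instance (still a reflection call).
        ctor.Invoke(instance, null);
        return instance;
    }
}
```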
Unfortunately, this optimization has an issue (reproducible in .NET 4.0 through 4.6.2): it doesn’t handle structs with a parameterless constructor properly. The current C# compiler doesn’t support custom default constructors for structs, but the CLR and some other languages do: you can create a struct with a default constructor using C++/CLI or IL directly. Moreover, this feature was added to C# 6 but was removed from the language 3 months before the official release, and the reason was this bug in Activator.CreateInstance. Today there is a heated discussion on GitHub about this feature, and it seems that even the language authors can’t agree on whether default constructors on structs are a good thing or not.
The issue is related to the caching logic in Activator.CreateInstance: if it gets the constructor information from the cache, it doesn’t call the constructor for structs, apparently assuming that they don’t have one (see the InitializeCacheEntry method). This means that if you have a struct with a default constructor and you create instances of that type multiple times, the constructor will be called only for the first instance.
We can’t easily fix the issues in Activator.CreateInstance and we definitely can’t change the existing behavior of new T() without breaking the world. But we can avoid using it and create our own generic factory that won’t suffer from the aforementioned issues.
Solution #1: using expression trees
Expression trees are a good tool for lightweight code generation. In our case, we can build an expression tree that creates a new instance of type T and then compile it to a delegate to avoid the reflection penalty.
Lambda expressions are special in C# because the compiler can convert them either to a delegate (DelegateType) or to an expression tree (Expression<DelegateType>). The compiler can convert an arbitrary lambda to a delegate, but only a limited set of language constructs can be converted to an expression tree. In our case the expression is very simple, so the compiler can cope with it:
using System;
using System.Linq.Expressions;

public static class FastActivator
{
    public static T CreateInstance<T>() where T : new()
    {
        return FastActivatorImpl<T>.NewFunction();
    }

    private static class FastActivatorImpl<T> where T : new()
    {
        // Compiler translates 'new T()' into Expression.New()
        private static readonly Expression<Func<T>> NewExpression = () => new T();

        // Compiling expression into the delegate
        public static readonly Func<T> NewFunction = NewExpression.Compile();
    }
}
FastActivator.CreateInstance is conceptually similar to Activator.CreateInstance, but it doesn’t have the two main issues: it doesn’t suffer from the exception-wrapping problem, and it doesn’t rely on reflection during execution (it does use reflection while the expression tree is constructed, but this happens only once per type).
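The lambda in FastActivatorImpl<T> can also be built by hand with Expression.New(Type), which makes it explicit that the tree contains a direct constructor call rather than an Activator call. The sketch below (HandBuiltFactory<T> and ThrowingNode are hypothetical names) also checks the correctness benefit: when the constructor throws, the compiled delegate lets the original exception through unwrapped.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper: builds the same tree as '() => new T()' by hand.
public static class HandBuiltFactory<T> where T : new()
{
    // Expression.New(typeof(T)) represents a call to T's parameterless constructor.
    // For a value type without one, it produces a zero-initialized value.
    private static readonly Expression<Func<T>> Tree =
        Expression.Lambda<Func<T>>(Expression.New(typeof(T)));

    // Compiled once per closed generic type; later calls are plain delegate invocations.
    public static readonly Func<T> Create = Tree.Compile();
}

// Hypothetical type whose constructor always fails.
public class ThrowingNode
{
    public ThrowingNode() => throw new InvalidOperationException("ctor failed");
}
```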
Let’s compare different solutions and see what we get:
Method                      | Mean       | StdDev    | Gen 0  |
--------------------------- |----------- |---------- |------- |
ActivatorCreateInstance     | 94.6173 ns | 0.5036 ns |      - |
FuncBasedFactory            |  6.5049 ns | 0.0551 ns | 0.0034 |
FastActivatorCreateInstance | 22.2258 ns | 0.2240 ns | 0.0020 |
FastActivator is more than 4 times faster than the default one, but still about 3.5 times slower than the func-based factory. I’ve intentionally removed the other cases we saw at the beginning: the func-based solution is our baseline, because no custom solution can beat an explicit constructor call for a known type.
The question is: why is the compiled delegate so much slower than a manually written delegate? Expression.Compile creates a DynamicMethod and associates it with an anonymous assembly to run it in a sandboxed environment. This makes it safe for a dynamic method to be emitted and executed by partially trusted code, but it adds some run-time overhead.
The overhead can be removed by using a DynamicMethod constructor that associates the method with a specific module. Unfortunately, Expression.Compile doesn’t allow us to customize the creation of the dynamic method, and the only other option is Expression.CompileToMethod, which compiles the expression into a given MethodBuilder instance. But this won’t work for our scenario, because a method created via MethodBuilder can’t access internal or private members of other assemblies, which would restrict our factory to public types only.
Instead of relying on Expression.Compile we can “compile” our simple factory manually:
using System;
using System.Linq.Expressions;
using System.Reflection.Emit;

public static class DynamicModuleLambdaCompiler
{
    public static Func<T> GenerateFactory<T>() where T : new()
    {
        Expression<Func<T>> expr = () => new T();
        NewExpression newExpr = (NewExpression)expr.Body;

        var method = new DynamicMethod(
            name: "lambda",
            returnType: newExpr.Type,
            parameterTypes: new Type[0],
            m: typeof(DynamicModuleLambdaCompiler).Module,
            skipVisibility: true);

        ILGenerator ilGen = method.GetILGenerator();
        // Constructor for value types could be null
        if (newExpr.Constructor != null)
        {
            ilGen.Emit(OpCodes.Newobj, newExpr.Constructor);
        }
        else
        {
            LocalBuilder temp = ilGen.DeclareLocal(newExpr.Type);
            ilGen.Emit(OpCodes.Ldloca, temp);
            ilGen.Emit(OpCodes.Initobj, newExpr.Type);
            ilGen.Emit(OpCodes.Ldloc, temp);
        }

        ilGen.Emit(OpCodes.Ret);

        return (Func<T>)method.CreateDelegate(typeof(Func<T>));
    }
}
The GenerateFactory method creates a DynamicMethod instance and associates it with a given module. This immediately gives the method access to all internal members of the current assembly. We also specify skipVisibility, because the factory method should be able to create internal or private types declared in other assemblies. The name “lambda” is never used and would be visible only while debugging.
This method creates an expression tree just to get the constructor information, even though we could get it manually. Note that the method checks newExpr.Constructor and uses different logic if the constructor is missing (i.e. for value types without a default constructor).
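To see why the null check is needed, compare the constructor the expression tree reports with a manual reflection lookup. For a class, both return the public parameterless constructor; for a struct that doesn’t declare one, both return null, which is exactly the case handled by the Initobj branch. RefNode, ValueNode and ConstructorLookup are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample types.
public class RefNode { }
public struct ValueNode { public int Value; }

public static class ConstructorLookup
{
    // The constructor recorded in the tree the compiler builds for '() => new T()'.
    public static ConstructorInfo FromExpression<T>() where T : new()
    {
        Expression<Func<T>> expr = () => new T();
        return ((NewExpression)expr.Body).Constructor;
    }

    // The same information obtained "manually" via reflection.
    public static ConstructorInfo FromReflection<T>() where T : new()
    {
        return typeof(T).GetConstructor(Type.EmptyTypes);
    }
}
```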
With the new helper method, FastActivator will be implemented in the following way:
public static class FastActivator
{
    public static T CreateInstance<T>() where T : new()
    {
        return FastActivatorImpl<T>.Create();
    }

    private static class FastActivatorImpl<T> where T : new()
    {
        public static readonly Func<T> Create =
            DynamicModuleLambdaCompiler.GenerateFactory<T>();
    }
}
Let’s compare the new implementation (FastActivatorCreateInstance) with the expression-based one (CompiledExpression):
Method                      | Mean       | StdDev    | Gen 0  |
--------------------------- |----------- |---------- |------- |
ActivatorCreateInstance     | 93.8858 ns | 1.2702 ns |      - |
FuncBasedFactory            |  6.4719 ns | 0.0640 ns | 0.0033 |
FastActivatorCreateInstance | 11.6035 ns | 0.0774 ns | 0.0030 |
CompiledExpression          | 22.7874 ns | 0.1509 ns | 0.0021 |
As we can see, the new version of the fast activator is two times faster than the old one, but still about two times slower than the func-based factory. Let’s explore why.
The reason lies in the implementation of generics in the CLR. A generic method that calls a method of a generic type will never be inlined, so we pay the overhead of an additional method call. But the more important reason is subtler. If a generic is instantiated with a value type, the CLR has no option but to generate a separate type for it. This means that List<int> and List<double> are completely independent from the CLR’s perspective. However, this is not the case for reference types: two generic instantiations like List<string> and List<object> share the same EEClass, which allows the CLR to reuse code between different instantiations and avoid code bloat. But this optimization trades speed for memory.
When one generic type (or method) calls another generic type (or method), the CLR needs to make sure that the actual types are compatible at runtime (**). To do that, the CLR performs a few lookups, which affect performance in the previous example and make our FastActivator slower than a delegate like () => new Node().
(**) The CLR implementation of generics is a very complicated topic and definitely out of scope for this blog post. If you want to understand the design of generics and the complexity of the problem better, I recommend the excellent paper by Andrew Kennedy and Don Syme, the designers of generics in .NET: Design and Implementation of Generics for the .NET Common Language Runtime. If you want to understand the current state of affairs, see Pro .NET Performance or a very good article by Alexandr Nikitin, .NET Generics under the hood.
To test this hypothesis, let’s use the same factories with Node declared as a value type:
Method                      | Mean       | StdDev    | Gen 0  | Allocated |
--------------------------- |----------- |---------- |------- |---------- |
ActivatorCreateInstance     | 86.4298 ns | 2.5527 ns | 0.0005 |      12 B |
FuncBasedFactory            |  4.7406 ns | 0.0254 ns |      - |       0 B |
FastActivatorCreateInstance |  4.3134 ns | 0.0159 ns |      - |       0 B |
CompiledExpression          |  3.1534 ns | 0.0210 ns |      - |       0 B |
As we can see, the current solution has no performance overhead when structs are involved.
To solve the issue with reference types, we can avoid the additional level of indirection by moving the nested FastActivatorImpl<T> out of the façade FastActivator type and using it directly:
public static class FastActivator<T> where T : new()
{
    /// <summary>
    /// Extremely fast generic factory method that returns an instance
    /// of the type <typeparamref name="T"/>.
    /// </summary>
    public static readonly Func<T> Create =
        DynamicModuleLambdaCompiler.GenerateFactory<T>();
}
And here are the final results, with FastActivator<T> introduced and used directly:
Method                  | Mean       | StdDev    | Gen 0  |
----------------------- |----------- |---------- |------- |
ActivatorCreateInstance | 95.0161 ns | 1.0861 ns | 0.0005 |
FuncBasedFactory        |  6.5741 ns | 0.0608 ns | 0.0034 |
FastActivator_T_Create  |  5.1715 ns | 0.0466 ns | 0.0034 |
As you can see, we’ve achieved the goal and created a generic factory method with the same performance characteristics as a plain delegate that instantiates a specific type!
Application-specific fix for the Activator.CreateInstance issue
The C# compiler uses “duck typing” for many language constructs. For example, LINQ syntax is pattern based: if the compiler is able to find Select, Where and other methods for a given variable (via extension methods or as instance methods) it will be able to compile queries using a query comprehension syntax.
The same is true for some other language features, like collection initializers, async/await, the foreach loop and others. But not everyone knows that there is a large list of “well-known members” that the user may potentially provide to change the runtime behavior. And one such well-known member is Activator.CreateInstance<T>.
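As a small illustration of this pattern-based binding, the compiler happily translates a query expression against any type that exposes a suitable Select method, with no LINQ interface involved. Box<T> is a hypothetical type made up for this example.

```csharp
using System;

// Hypothetical single-value container; it implements no LINQ interfaces.
public readonly struct Box<T>
{
    public Box(T value) => Value = value;

    public T Value { get; }

    // 'from x in box select f(x)' is rewritten by the compiler into 'box.Select(x => f(x))'.
    public Box<TResult> Select<TResult>(Func<T, TResult> selector) => new Box<TResult>(selector(Value));
}
```

With this type in scope, Box<int> result = from x in new Box<int>(20) select x * 2 + 2; compiles and result.Value is 42.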
This means that if the C# compiler is able to find another System.Activator type with a generic CreateInstance method, that method will be used instead of the one from mscorlib. This behavior is undocumented, and I would not recommend relying on it in a production environment without clear evidence from a profiler. And even if a profiler shows some benefit, I would prefer using FastActivator explicitly instead of relying on this hack.
namespace System
{
    /// <summary>
    /// Dirty hack that allows using a fast implementation
    /// of the activator.
    /// </summary>
    public static class Activator
    {
        public static T CreateInstance<T>() where T : new()
        {
#if DEBUG
            Console.WriteLine("Fast Activator was called");
#endif
            return ActivatorImpl<T>.Create();
        }

        private static class ActivatorImpl<T> where T : new()
        {
            public static readonly Func<T> Create =
                DynamicModuleLambdaCompiler.GenerateFactory<T>();
        }
    }
}
Now all methods that call new T() to create an instance of a type will use our custom implementation instead of relying on the default one.
Conclusion
This is a fairly long post, but we managed to cover many interesting details.
- The new() constraint in the C# language is extremely leaky: in order to use it correctly and efficiently the developer should understand the implementation details of the compiler and the BCL.
- We’ve figured out that the C# compiler calls Activator.CreateInstance<T> for creating an instance of a generic argument with a new() constraint (but remember, this is true only for C# 6+ compilers and the older versions emit the call only for reference types).
- We’ve discovered the implications of the Activator.CreateInstance from a developer’s point of view in terms of correctness and performance.
- We’ve come up with a few alternatives, starting with a very simple one that “unwraps” TargetInvocationException, to a fairly sophisticated solution based on code generation.
- We’ve discussed a few interesting aspects of the generics implementation in the CLR and their impact on performance (very minor, and likely negligible in the vast majority of cases).
- And finally, we’ve come up with a solution that solves the aforementioned issues with the new() constraint by providing a custom System.Activator.CreateInstance<T> implementation.
As a final conclusion, I’m not suggesting that anyone remove all calls to new T() from their codebase or define their own System.Activator class. You need to profile your application and make decisions based only on real evidence.
But to avoid shooting yourself in the foot, you need to know what the compiler and the runtime do for new T() and other widely used language constructs and what the implications are from correctness and performance perspectives.