{"id":1275,"date":"2018-07-17T12:19:42","date_gmt":"2018-07-17T04:19:42","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/seteplia\/?p=1275"},"modified":"2019-06-11T21:39:27","modified_gmt":"2019-06-12T04:39:27","slug":"performance-implications-of-default-struct-equality-in-c","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/performance-implications-of-default-struct-equality-in-c\/","title":{"rendered":"Performance implications of default struct equality in C#"},"content":{"rendered":"<p>If you&#8217;re familiar with C#, then you most likely heard that you should always override <code>Equals<\/code> and <code>GetHashCode<\/code> for custom structs for performance reasons. To better understand the importance and the rationale behind this advice we&#8217;re going to look at the default behavior to see why and where the performance hit comes from. Then we&#8217;ll look at a performance bug that occurred in my project and at the end we&#8217;ll discuss some tools that can help to avoid the issue altogether.<\/p>\n<h4>How important the issue is?<\/h4>\n<p>Not every <strong>potential<\/strong> performance issue affects an end-to-end time of your application. <code>Enum.HasFlag<\/code> is not very efficient (*), but unless it is used on a very hot path, it would not cause a severe issue for your product. The same is true for <a href=\"https:\/\/blogs.msdn.microsoft.com\/seteplia\/2018\/05\/03\/avoiding-struct-and-readonly-reference-performance-pitfalls-with-errorprone-net\/\">defensive copies<\/a> caused by non-readonly structs in the readonly contexts. The issues are real but they&#8217;re unlikely to be visible in regular applications.<\/p>\n<p>(*) The implementation was fixed in .NET Core 2.1 and as I&#8217;ve mentioned in <a href=\"https:\/\/blogs.msdn.microsoft.com\/seteplia\/2018\/06\/12\/dissecting-new-generics-constraints-in-c-7-3\/\">the previous post<\/a>, now you can mitigate the issue with a custom <code>HasFlag<\/code> implementation for the older runtimes.<\/p>\n<p>But the issue we&#8217;re talking about today is different. If a struct does not provide <code>Equals<\/code>and <code>GetHashCode<\/code>, then the default versions of these methods from <code>System.ValueType<\/code> are used. The versions that could easily affect an application&#8217;s end-to-end performance in a very significant way.<\/p>\n<h4>Why the default implementations are so slow?<\/h4>\n<p>The CLR authors tried their best to make the default implementations of <code>Equals<\/code> and <code>GetHashCode<\/code> for value types as efficient as possible. But there are a couple of reasons why they won&#8217;t be as efficient as a custom version written by hand (or generated by a compiler) for a specific type.<\/p>\n<ol>\n<li>Boxing allocation. The way the CLR is designed, every call to a member defined in <code>System.ValueType<\/code> or <code>System.Enum<\/code> types cause a boxing allocation (**).<\/li>\n<\/ol>\n<p>(**) Unless the method is a JIT intrinsic. For instance, in Core CLR 2.1 the JIT compiler knows about <code>Enum.HasFlag<\/code> and emits a very optimal code that causes no boxing allocations.<\/p>\n<ol start=\"2\">\n<li>Potential collisions of the default <code>GetHashCode<\/code> implementation. An implementer of a hash function faces a dilemma: make a good distribution of the hash function or to make it fast. In some cases, it&#8217;s possible to achieve them both, but it is hard to do this generically in <code>ValueType.GetHashCode<\/code>.<\/li>\n<\/ol>\n<p>The canonical hash function of a struct &#8220;combines&#8221; hash codes of all the fields. But the only way to get a hash code of a field in a <code>ValueType<\/code> method is to use reflection. So, the CLR authors decided to trade speed over the distribution and the default <code>GetHashCode<\/code> version just returns a hash code of a first non-null field and <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/src\/vm\/comutilnative.cpp#L2196\">&#8220;munges&#8221; it with a type id<\/a> (***) (for more details see <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/src\/vm\/comutilnative.cpp#L2063\"><code>RegularGetValueTypeHashCode<\/code><\/a> in coreclr repo at github).<\/p>\n<p>(***) Based on the comment in the CoreCLR repo, this behavior may change in the future.<\/p>\n<pre class=\"lang:default decode:true \">public readonly struct Location\r\n{\r\n    public string Path { get; }\r\n    public int Position { get; }\r\n    public Location(string path, int position) =&gt; (Path, Position) = (path, position);\r\n}\r\n\r\n\r\nvar hash1 = new Location(path: \"\", position: 42).GetHashCode();\r\nvar hash2 = new Location(path: \"\", position: 1).GetHashCode();\r\nvar hash3 = new Location(path: \"1\", position: 42).GetHashCode();\r\n\/\/ hash1 and hash2 are the same and hash1 is different from hash3<\/pre>\n<p>This is a reasonable behavior unless it&#8217;s not. For instance, if you&#8217;re unlucky enough and the first field of your struct has the same value for most instances, then a hash function will provide the same result all the time. And, as you may imagine, this will cause a drastic performance impact if these instances are stored in a hash set or a hash table.<\/p>\n<ol start=\"3\">\n<li>Reflection-based implementation is slow. Very slow. Reflection is a powerful tool when used correctly. But it is horrible if it&#8217;s used on an application&#8217;s hot path.<\/li>\n<\/ol>\n<p>Let&#8217;s see how a poor hash function that you can get because of (2) and the reflection-based implementation affects the performance:<\/p>\n<pre class=\"lang:default decode:true \">public readonly struct Location1\r\n{\r\n    public string Path { get; }\r\n    public int Position { get; }\r\n    public Location1(string path, int position) =&gt; (Path, Position) = (path, position);\r\n}\r\n\r\npublic readonly struct Location2\r\n{\r\n    \/\/ The order matters!\r\n    \/\/ The default GetHashCode version will get a hashcode of the first field\r\n    public int Position { get; }\r\n    public string Path { get; }\r\n    public Location2(string path, int position) =&gt; (Path, Position) = (path, position);\r\n}\r\n\r\npublic readonly struct Location3 : IEquatable&lt;Location3&gt;\r\n{\r\n    public string Path { get; }\r\n    public int Position { get; }\r\n    public Location3(string path, int position) =&gt; (Path, Position) = (path, position);\r\n\r\n    public override int GetHashCode() =&gt; (Path, Position).GetHashCode();\r\n    public override bool Equals(object other) =&gt; other is Location3 l &amp;&amp; Equals(l);\r\n    public bool Equals(Location3 other) =&gt; Path == other.Path &amp;&amp; Position == other.Position;\r\n}\r\n\r\n\r\nprivate HashSet&lt;Location1&gt; _locations1;\r\nprivate HashSet&lt;Location2&gt; _locations2;\r\nprivate HashSet&lt;Location3&gt; _locations3;\r\n\r\n\r\n[Params(1, 10, 1000)]\r\npublic int NumberOfElements { get; set; }\r\n\r\n[GlobalSetup]\r\npublic void Init()\r\n{\r\n    _locations1 = new HashSet&lt;Location1&gt;(Enumerable.Range(1, NumberOfElements).Select(n =&gt; new Location1(\"\", n)));\r\n    _locations2 = new HashSet&lt;Location2&gt;(Enumerable.Range(1, NumberOfElements).Select(n =&gt; new Location2(\"\", n)));\r\n    _locations3 = new HashSet&lt;Location3&gt;(Enumerable.Range(1, NumberOfElements).Select(n =&gt; new Location3(\"\", n)));\r\n    _locations4 = new HashSet&lt;Location4&gt;(Enumerable.Range(1, NumberOfElements).Select(n =&gt; new Location4(\"\", n)));\r\n}\r\n\r\n[Benchmark]\r\npublic bool Path_Position_DefaultEquality()\r\n{\r\n    var first = new Location1(\"\", 0);\r\n    return _locations1.Contains(first);\r\n}\r\n\r\n[Benchmark]\r\npublic bool Position_Path_DefaultEquality()\r\n{\r\n    var first = new Location2(\"\", 0);\r\n    return _locations2.Contains(first);\r\n}\r\n\r\n[Benchmark]\r\npublic bool Path_Position_OverridenEquality()\r\n{\r\n    var first = new Location3(\"\", 0);\r\n    return _locations3.Contains(first);\r\n}<\/pre>\n<pre class=\"lang:default decode:true \">Method | NumOfElements | Mean | Gen 0 | Allocated | \r\n-------------------------------- |-------------- |--------------:|--------:|----------:| \r\nPath_Position_DefaultEquality | 1 | 885.63 ns | 0.0286 | 92 B | \r\nPosition_Path_DefaultEquality | 1 | 127.80 ns | 0.0050 | 16 B | \r\nPath_Position_OverridenEquality | 1 | 47.99 ns | - | 0 B | \r\nPath_Position_DefaultEquality | 10 | 6,214.02 ns | 0.2441 | 776 B | \r\nPosition_Path_DefaultEquality | 10 | 130.04 ns | 0.0050 | 16 B | \r\nPath_Position_OverridenEquality | 10 | 47.67 ns | - | 0 B | \r\nPath_Position_DefaultEquality | 1000 | 589,014.52 ns | 23.4375 | 76025 B | \r\nPosition_Path_DefaultEquality | 1000 | 133.74 ns | 0.0050 | 16 B | \r\nPath_Position_OverridenEquality | 1000 | 48.51 ns | - | 0 B |<\/pre>\n<p>If the first field is always the same, the default hash function returns the same value for all the elements. This effectively transforms a hash set into a linked list with O(N) for insertion and lookup operations. And the operation that populates the collection becomes O(N^2) (N insertions with O(N) complexity per insertion). It means that the insertion of 1000 elements into a set will cause almost 500_000 calls to <code>ValueType.Equals<\/code>. A method, that uses reflection under the hood!<\/p>\n<p>And as you can see from the benchmark, the performance may be tolerable if you&#8217;re lucky and the first element of the struct is unique (<code>Position_Path_DefaultEquality<\/code>case). But the performance may be horrible if this is not the case.<\/p>\n<h4>Real-world issue<\/h4>\n<p>Now you can guess what kind of issue I&#8217;ve faced recently. A couple of weeks ago, I received a bug report that the end-to-end time for an app I&#8217;m working on increased from 10 seconds to 60. Luckily the report was very detailed and contained an ETW trace that immediately showed the bottleneck: <code>ValueType.Equals<\/code> was taking 50 seconds.<\/p>\n<p>After a very quick look at the code it was clear what was the problem:<\/p>\n<pre class=\"lang:default decode:true \">private readonly HashSet&lt;(ErrorLocation, int)&gt; _locationsWithHitCount;\r\nreadonly struct ErrorLocation\r\n{\r\n    \/\/ Empty almost all the time\r\n    public string OptionalDescription { get; }\r\n    public string Path { get; }\r\n    public int Position { get; }\r\n}<\/pre>\n<p>We used a tuple that contained a custom struct with default equality implementation. And unfortunately, the struct had an optional first field that was almost always equals to <code>string.Equals<\/code>. The performance was OK until the number of elements in the set increased significantly causing a real performance issue, taking minutes to initialize a collection with tens of thousands of items.<\/p>\n<h4>Is the default implementation of <code>ValueType.Equals\/GetHashCode<\/code> always slow?<\/h4>\n<p>Both <code>ValueType.Equals<\/code> and <code>ValueType.GetHashCode<\/code> have a special optimization. If a type does not have &#8220;pointers&#8221; and is properly packed (I&#8217;ll show an example in a minute) then more optimal versions are used: <code>GethashCode<\/code> iterates over an instance and XORs blocks of 4 bytes and <code>Equals<\/code> method compares two instances using <code>memcmp<\/code>.<\/p>\n<pre class=\"lang:default decode:true \">\/\/ Optimized ValueType.GetHashCode implementation\r\nstatic INT32 FastGetValueTypeHashCodeHelper(MethodTable *mt, void *pObjRef)\r\n{\r\n    INT32 hashCode = 0;\r\n    INT32 *pObj = (INT32*)pObjRef;\r\n\r\n   \/\/ this is a struct with no refs and no \"strange\" offsets, just go through the obj and xor the bits\r\n   INT32 size = mt-&gt;GetNumInstanceFieldBytes();\r\n   for (INT32 i = 0; i &lt; (INT32)(size \/ sizeof(INT32)); i++)\r\n       hashCode ^= *pObj++;\r\n\r\nreturn hashCode;\r\n}\r\n\r\n\/\/ Optimized ValueType.Equals implementation\r\nFCIMPL2(FC_BOOL_RET, ValueTypeHelper::FastEqualsCheck, Object* obj1, Object* obj2)\r\n{\r\n    TypeHandle pTh = obj1-&gt;GetTypeHandle();\r\n\r\n    FC_RETURN_BOOL(memcmp(obj1-&gt;GetData(), obj2-&gt;GetData(), pTh.GetSize()) == 0);\r\n}<\/pre>\n<p>The check itself is implemented in <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/2583ce936776a0eac31df904e41d5119840c203b\/src\/vm\/comutilnative.cpp#L2009\"><code>ValueTypeHelper::CanCompareBits<\/code><\/a> and it is called from both <a href=\"https:\/\/referencesource.microsoft.com\/#mscorlib\/system\/valuetype.cs,43\"><code>ValueType.Equals<\/code><\/a> and from <a href=\"https:\/\/github.com\/dotnet\/coreclr\/blob\/master\/src\/vm\/comutilnative.cpp#L2210\"><code>ValueType.GetHashCode<\/code><\/a>implementations.<\/p>\n<p>But the optimization is very tricky.<\/p>\n<p>First, it is hard to know when the optimization is enabled and even minor changes in the code can turn it &#8220;on&#8221; and &#8220;off&#8221;:<\/p>\n<pre class=\"lang:default decode:true \">public struct Case1\r\n{\r\n    \/\/ Optimization is \"on\", because the struct is properly \"packed\"\r\n    public int X { get; }\r\n    public byte Y { get; }\r\n}\r\n\r\npublic struct Case2\r\n{\r\n    \/\/ Optimization is \"off\", because struct has a padding between byte and int\r\n    public byte Y { get; }\r\n    public int X { get; }\r\n}<\/pre>\n<p>For more information about memory layout see my blogpost <a href=\"https:\/\/devblogs.microsoft.com\/premier-developer\/managed-object-internals-part-4-fields-layout\/\">&#8220;Managed object internals, Part 4. Fields layout&#8221;<\/a>.<\/p>\n<p>Second, a memory comparison will not necessarily give you the right results. Here is a simple example:<\/p>\n<pre class=\"lang:default decode:true \">public struct MyDouble\r\n{\r\n    public double Value { get; }\r\n    public MyDouble(double value) =&gt; Value = value;\r\n}\r\n\r\n\r\ndouble d1 = -0.0;\r\ndouble d2 = +0.0;\r\n\r\n\/\/ True\r\nbool b1 = d1.Equals(d2);\r\n\r\n\/\/ False!\r\nbool b2 = new MyDouble(d1).Equals(new MyDouble(d2));<\/pre>\n<p><code>-0.0<\/code> and <code>+0.0<\/code> are equal but have different binary representations. This means that <code>Double.Equals<\/code> returns <code>true<\/code>, but <code>MyDouble.Equals<\/code> returns <code>false<\/code>. For most cases the difference is non-substantial, but just imagine how many hours you may potentially spend trying to fix an issue caused by this difference.<\/p>\n<h4>And how can we avoid the issue in the future?<\/h4>\n<p>You may wonder, how the issue mentioned above may appear in the real world? One obvious way to enforce <code>Equals<\/code> and <code>GetHashCode<\/code> for structs is to use FxCop rule <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/ms182276.aspx\">CA1815<\/a>. But there is an issue with this approach: it is a bit too strict.<\/p>\n<p>A performance critical application may have hundreds of structs that are not necessarily designed to be used in hash sets or dictionaries. This may cause app developers to suppress the rule that can backfire ones a struct use cases changes.<\/p>\n<p>A more appropriate approach is to warn a developer when &#8220;inappropriate&#8221; struct with default equality members (defined in the app or in a third-party library) is stored in a hash set. Of course, I&#8217;m talking about <a href=\"https:\/\/github.com\/SergeyTeplyakov\/ErrorProne.NET\">ErrorProne.NET<\/a> and the rule that I&#8217;ve added there ones I&#8217;ve faced this issue:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/image83.png\"><img decoding=\"async\" style=\"padding-top: 0px; padding-left: 0px; padding-right: 0px; border-width: 0px;\" title=\"image\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/image_thumb78.png\" alt=\"image\" width=\"2197\" height=\"343\" border=\"0\" \/><\/a><\/p>\n<p>The <a href=\"http:\/\/ErrorProne.NET\">ErrorProne.NET<\/a> version is not perfect and will &#8220;blame&#8221; a valid code if a custom equality comparer is provided in a constructor:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/image84.png\"><img decoding=\"async\" style=\"padding-top: 0px; padding-left: 0px; padding-right: 0px; border: 0px;\" title=\"image\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/31\/2019\/06\/image_thumb79.png\" alt=\"image\" width=\"2012\" height=\"974\" border=\"0\" \/><\/a><\/p>\n<p>But I think it still valid to warn when a struct with default equality members is used by not when it is produced. For instance, when I tested my rule I&#8217;ve realized that <code>System.Collections.Generic.KeyValuePair&lt;TKey, TValue&gt;<\/code> defined in mscorlib, does not override <code>Equals<\/code> and <code>GetHashCode<\/code>. Today it is unlikely that someone will define a variable of type <code>HashSet&lt;KeyValuePair&lt;string, int&gt;&gt;<\/code> but my point is that even BCL can violate the rule and it is useful to catch it until its not too late.<\/p>\n<h3>Conclusion<\/h3>\n<ul>\n<li>The default equality implementation for structs may easily cause a severe performance impact for your application. The issue is real, not a theoretical one.<\/li>\n<li>The default equliaty members for value types are reflection-based.<\/li>\n<li>The default <code>GetHashCode<\/code> implementation may provide a very poor distribution if a first field of many instances is the same.<\/li>\n<li>There is an optimized default version for <code>Equals<\/code> and <code>GetHashCode<\/code> but you should never rely on it because you may stop hitting it with an innocent code change.<\/li>\n<li>You may rely on FxCop rule to make sure that every struct overrides equality members, but a better approach is to catch the issue when the &#8220;wrong&#8221; struct is stored in a hash set or in a hash table using an analyzer.<\/li>\n<\/ul>\n<h3>Additional resources<\/h3>\n<ul>\n<li><a href=\"https:\/\/github.com\/SergeyTeplyakov\/ErrorProne.NET\">ErrorProne.NET at github<\/a><\/li>\n<li><a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=SergeyTeplyakov.EPNS01\">ErrorProne.NET Structs at marketplace<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;re familiar with C#, then you most likely heard that you should always override Equals and GetHashCode for custom structs for performance reasons. To better understand the importance and the rationale behind this advice we&#8217;re going to look at the default behavior to see why and where the performance hit comes from. Then we&#8217;ll [&hellip;]<\/p>\n","protected":false},"author":4004,"featured_media":37840,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[6700,6699],"tags":[6695],"class_list":["post-1275","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-net-internals","category-c","tag-seteplia"],"acf":[],"blog_post_summary":"<p>If you&#8217;re familiar with C#, then you most likely heard that you should always override Equals and GetHashCode for custom structs for performance reasons. To better understand the importance and the rationale behind this advice we&#8217;re going to look at the default behavior to see why and where the performance hit comes from. Then we&#8217;ll [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/1275","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/4004"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=1275"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/1275\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/37840"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=1275"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=1275"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=1275"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}