{"id":108572,"date":"2023-08-08T07:00:00","date_gmt":"2023-08-08T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108572"},"modified":"2023-08-08T07:43:57","modified_gmt":"2023-08-08T14:43:57","slug":"20230808-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230808-00\/?p=108572","title":{"rendered":"Inside STL: The unordered_map, unordered_set, unordered_multimap, and unordered_multiset"},"content":{"rendered":"<p>The C++ standard library provides hash-based associative containers <code>unordered_<wbr \/>map<\/code>, <code>unordered_<wbr \/>set<\/code>, <code>unordered_<wbr \/>multimap<\/code>, and <code>unordered_<wbr \/>multiset<\/code>.<\/p>\n<p>All of these collections are hash tables with different payloads. The <code>unordered_<wbr \/>map<\/code> and the <code>unordered_<wbr \/>multimap<\/code> use a <code>std::pair&lt;Key, Value&gt;<\/code> as the payload, whereas the the <code>unordered_<wbr \/>set<\/code> and the <code>unordered_<wbr \/>multiset<\/code> use a <code>Key<\/code> as the payload.<\/p>\n<p>Conceptually, the hash table consists of a bunch of buckets, and each bucket contains a linked list of the nodes that fall into that bucket. (This design is known as <i>open hashing<\/i> or <i>separate chaining<\/i>.) However, that&#8217;s not how the information is structured internally, because iterating through a traditionally-structured hash table requires more state than a single pointer: When you reach the end of each hash chain, you need some other information to tell you which chain to enumerate next.<\/p>\n<p>C++ standard library implementations instead structure the hash table like this:<\/p>\n<pre>struct hashtable\r\n{\r\n    using hint = std::list&lt;payload&gt;::iterator;\r\n\r\n    std::list&lt;payload&gt; list;\r\n    std::vector&lt;hint&gt; buckets;\r\n};\r\n<\/pre>\n<p>The <code>list<\/code> is a linked list of payloads, sorted by bucket. The <code>buckets<\/code> is a vector of iterators (pointers) into the list that tells you where each bucket begins. Each bucket implicitly ends when the next bucket begins, or (for the last bucket) when the end of the list is reached.<\/p>\n<p>For example, suppose we have a hash table with four buckets, using the identity function as the hash function, and using &#8220;mod 4&#8221; as the bucket mapping function. Suppose that this hash table contains the values 2, 5, 6, 7. A traditional version of that hash table would look like a vector of lists:<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" title=\"A table of four buckets. Each bucket is the anchor of a singly-linked list of nodes. Bucket zero has a null pointer. Bucket 1 holds a pointer to a node whose value is 5. Bucket 2 holds a pointer to a node whose value is 2, which in turn points to a node whose value is 6. Bucket 3 holds a pointer to a node whose value is 7.\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td>&nbsp;<\/td>\n<td>buckets<\/td>\n<\/tr>\n<tr>\n<td>0<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">null<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">\u2022<\/td>\n<td>\u2192<\/td>\n<td>\n<div style=\"border: solid 1px gray; width: 3em;\">5<\/div>\n<\/td>\n<\/tr>\n<tr>\n<td>2<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">\u2022<\/td>\n<td>\u2192<\/td>\n<td>\n<div style=\"border: solid 1px gray; width: 3em;\">2<\/div>\n<\/td>\n<td>\u2192<\/td>\n<td>\n<div style=\"border: solid 1px gray; width: 3em;\">6<\/div>\n<\/td>\n<\/tr>\n<tr>\n<td>3<\/td>\n<td style=\"border: solid 1px gray;\">\u2022<\/td>\n<td>\u2192<\/td>\n<td>\n<div style=\"border: solid 1px gray; width: 3em;\">7<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>However, in reality, the hash table looks like this:<\/p>\n<table style=\"border-collapse: collapse; text-align: center;\" title=\"A singly-linked list of nodes 5, 2, 6, and 7. Buckets 0 and 1 point to node 5. Bucket 2 points to node 2. Bucket 3 points to node 7.\" border=\"0\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<td style=\"border: solid 1px gray; width: 3em;\">5<\/td>\n<td>\u2192<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">2<\/td>\n<td>\u2192<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">6<\/td>\n<td>\u2192<\/td>\n<td style=\"border: solid 1px gray; width: 3em;\">7<\/td>\n<\/tr>\n<tr>\n<td align=\"left\">\u2191<\/td>\n<td>&nbsp;<\/td>\n<td align=\"left\">\u2191<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td align=\"left\">\u2191<\/td>\n<\/tr>\n<tr>\n<td align=\"left\" valign=\"baseline\">[0]<br \/>\n[1]<\/td>\n<td>&nbsp;<\/td>\n<td align=\"left\" valign=\"baseline\">[2]<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td>&nbsp;<\/td>\n<td align=\"left\" valign=\"baseline\">[3]<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>In this diagram, we see that<\/p>\n<ul>\n<li>Bucket 0 consists of all the elements until we reach the start of bucket 1. But the start of bucket 1 is the same as the start of bucket 0, so bucket 0 is empty.<\/li>\n<li>Bucket 1 consists of all the elements until we reach the start of bucket 2, so that&#8217;s just <code>5<\/code>.<\/li>\n<li>Bucket 2 consists of all the elements until we reach the start of bucket 3, so that&#8217;s <code>2<\/code> and <code>6<\/code>.<\/li>\n<li>Bucket 3 consists of all the elements until we reach the end of the list, so that&#8217;s <code>7<\/code>.<\/li>\n<\/ul>\n<p>Another way of looking at this is that all of the bucket linked lists have been concatenated into a single linked list, and the <code>buckets<\/code> vector tells you how to break the giant list into individual <a href=\"https:\/\/slate.com\/culture\/2011\/11\/bucket-list-what-s-the-origin-of-the-term.html\"> bucket lists<\/a>.<\/p>\n<p>Internally, the standard library implementations often use bespoke versions of <code>list<\/code> and <code>vector<\/code> instead of the public ones. Furthermore, gcc and clang use a singly-linked list instead of the doubly-linked <code>std::list<\/code> I used here for expository purposes. The hash table-based collections do not support reverse iteration, so a forward list is sufficient.<\/p>\n<p>When you&#8217;re debugging, you&#8217;re not really interested in the buckets. You just want to see what&#8217;s in the hash table. To do that, dump the list. The Visual Studio debugger has a nice visualizer for these collections, but here&#8217;s how you dig in. (Again, the hash object, the equality object, and the allocator are typically empty classes, so they are often stored as compressed pairs with other stuff.)<\/p>\n<pre style=\"white-space: pre-wrap;\">0:000&gt; ?? s\r\nclass std::unordered_set&lt;int,std::hash&lt;int&gt;,std::equal_to&lt;int&gt;,std::allocator&lt;int&gt; &gt;\r\n   +0x000 _Traitsobj       : std::_Uset_traits&lt;int,std::_Uhash_compare&lt;int,std::hash&lt;int&gt;,std::equal_to&lt;int&gt; &gt;,std::allocator&lt;int&gt;,0&gt;\r\n   +0x008 _List            : std::list&lt;int,std::allocator&lt;int&gt; &gt;\r\n   +0x018 _Vec             : std::_Hash_vec&lt;std::allocator&lt;std::_List_unchecked_const_iterator&lt;std::_List_val&lt;std::_List_simple_types&lt;int&gt; &gt;,std::_Iterator_base0&gt; &gt; &gt;\r\n   +0x030 _Mask            : 7\r\n   +0x038 _Maxidx          : 8\r\n0:000&gt; ?? s._List\r\nclass std::list&lt;int,std::allocator&lt;int&gt; &gt;\r\n   +0x000 _Mypair          : std::_Compressed_pair&lt;std::allocator&lt;std::_List_node&lt;int,void *&gt; &gt;,std::_List_val&lt;std::_List_simple_types&lt;int&gt; &gt;,1&gt;\r\n0:000&gt; ?? s._List._Mypair\r\nclass std::_Compressed_pair&lt;std::allocator&lt;std::_List_node&lt;int,void *&gt; &gt;,std::_List_val&lt;std::_List_simple_types&lt;int&gt; &gt;,1&gt;\r\n   +0x000 _Myval2          : std::_List_val&lt;std::_List_simple_types&lt;int&gt; &gt;\r\n0:000&gt; ?? s._List._Mypair._Myval2\r\nclass std::_List_val&lt;std::_List_simple_types&lt;int&gt; &gt;\r\n   +0x000 _Myhead          : 0x000001b8`ad8a9280 std::_List_node&lt;int,void *&gt;\r\n   +0x008 _Mysize          : 4\r\n<\/pre>\n<p>Okay, we finally dug into the <code>_List<\/code> to the point where we found the list head. Now we can use the usual list dumping commands to inspect the linked list.<\/p>\n<pre style=\"white-space: pre-wrap;\">0:000&gt; dl 0x000001b8`ad8a9280\r\n000001b8`ad8a9280  000001b8`ad8a9640 000001b8`ad8a84c0\r\n000001b8`ad8a9290  baadf00d`baadf00d abababab`abababab \u2190 garbage sentinel node\r\n000001b8`ad8a9640  000001b8`ad8a92d0 000001b8`ad8a9280\r\n000001b8`ad8a9650  baadf00d`00000002 abababab`abababab \u2190 2\r\n000001b8`ad8a92d0  000001b8`ad8a8470 000001b8`ad8a9640\r\n000001b8`ad8a92e0  baadf00d`00000003 abababab`abababab \u2190 3\r\n000001b8`ad8a8470  000001b8`ad8a84c0 000001b8`ad8a92d0\r\n000001b8`ad8a8480  baadf00d`00000005 abababab`abababab \u2190 5\r\n000001b8`ad8a84c0  000001b8`ad8a9280 000001b8`ad8a8470\r\n000001b8`ad8a84d0  baadf00d`00000007 abababab`abababab \u2190 7\r\n<\/pre>\n<p>The Windows debugger just dumps all the nodes right next to each other, so it&#8217;s hard to see the boundaries between them. Here&#8217;s what it looks like after I&#8217;ve added some spaces and annotations:<\/p>\n<pre style=\"white-space: pre-wrap;\">0:000&gt; dl 0x000001b8`ad8a9280\r\n000001b8`ad8a9280  000001b8`ad8a9640 000001b8`ad8a84c0\r\n000001b8`ad8a9290  baadf00d`baadf00d abababab`abababab \u2190 garbage sentinel node\r\n                            ^garbage\r\n\r\n000001b8`ad8a9640  000001b8`ad8a92d0 000001b8`ad8a9280\r\n000001b8`ad8a9650  baadf00d`00000002 abababab`abababab \u2190 2\r\n                            ^value\r\n\r\n000001b8`ad8a92d0  000001b8`ad8a8470 000001b8`ad8a9640\r\n000001b8`ad8a92e0  baadf00d`00000003 abababab`abababab \u2190 3\r\n                            ^value\r\n\r\n000001b8`ad8a8470  000001b8`ad8a84c0 000001b8`ad8a92d0\r\n000001b8`ad8a8480  baadf00d`00000005 abababab`abababab \u2190 5\r\n                            ^value\r\n\r\n000001b8`ad8a84c0  000001b8`ad8a9280 000001b8`ad8a8470\r\n000001b8`ad8a84d0  baadf00d`00000007 abababab`abababab \u2190 7\r\n                            ^value\r\n<\/pre>\n<p>All of the techniques we used when debugging lists work here too, so I won&#8217;t repeat them.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A hash table.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108572","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>A hash table.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108572","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108572"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108572\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108572"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108572"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108572"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}