{"id":110428,"date":"2024-10-28T07:00:00","date_gmt":"2024-10-28T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=110428"},"modified":"2024-10-28T07:00:01","modified_gmt":"2024-10-28T14:00:01","slug":"20241028-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20241028-00\/?p=110428","title":{"rendered":"How useful is the hint passed to the <CODE>std::<WBR>unordered_&#8230;<\/CODE> collections?"},"content":{"rendered":"<p>A little while ago, we learned about <a title=\"Speeding up the insertion of a sorted (or mostly-sorted) key list into a std::map or other ordered associative container\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230522-00\/?p=108226\"> the value of the <code>hint<\/code> to the C++ ordered associative containers <code>std::<wbr \/>map<\/code>, <code>std::<wbr \/>multimap<\/code>, <code>std::<wbr \/>set<\/code>, and <code>std::<wbr \/>multiset<\/code><\/a>. But what about the unordered associative containers <code>std::<wbr \/>unordered_<wbr \/>map<\/code>, <code>std::<wbr \/>unordered_<wbr \/>multimap<\/code>, <code>std::<wbr \/>unordered_<wbr \/>set<\/code>, and <code>std::<wbr \/>unordered_<wbr \/>multiset<\/code>?<\/p>\n<p>The answer varies by implementation.<\/p>\n<p>For Microsoft&#8217;s STL, <a href=\"https:\/\/github.com\/microsoft\/STL\/blob\/a62109595b6d89e08172fdf4beb75a2670fe0cc9\/stl\/inc\/xhash#L1596\"> the hint is used by <code>_Find_<wbr \/>Hint<\/code><\/a>:\u00b9 If the hint&#8217;s key is equivalent to the inserted item&#8217;s key, then we keep the hint. Otherwise, we ignore it. For containers that do not permit duplicate keys, a matching hint provides the exact node to update, avoiding the need to do a search for it. The code can just update the value or do nothing, depending on the method being implemented: <code>insert<\/code> does nothing, whereas <code>insert_or_assign<\/code> updates the value. For containers that allow duplicate keys, a matching hint is used to insert the new element without performing a search.<\/p>\n<p>The clang libcxx implementations <a href=\"https:\/\/github.com\/llvm\/llvm-project\/blob\/387c49f693c82bdf8b9b0f1ef48a92f51bb781b4\/libcxx\/include\/unordered_map#L1268\"> outright ignores the hint for containers that do not permit duplicate keys<\/a>. For containers that permit duplicate keys, <a href=\"https:\/\/github.com\/llvm\/llvm-project\/blob\/387c49f693c82bdf8b9b0f1ef48a92f51bb781b4\/libcxx\/include\/__hash_table#L1981\"> a hint whose key is equivalent to the item being inserted<\/a> is used to insert the new element without performing a search.<\/p>\n<p>The gcc libstdc++ implementation is mostly the same as the clang implementation: It <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/8637aecd5aea70bb13c08b5b96d3cb24f5afcead\/libstdc%2B%2B-v3\/include\/bits\/hashtable.h#L915\"> ignores the hint for containers that do not permit duplicate keys<\/a>, and it uses the hint for containers that permit duplicate keys <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/8637aecd5aea70bb13c08b5b96d3cb24f5afcead\/libstdc%2B%2B-v3\/include\/bits\/hashtable.h#L2174\"> provided the hint&#8217;s key matches that of the inserted item<\/a>. As a special case, for containers that permit duplicate keys <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/8637aecd5aea70bb13c08b5b96d3cb24f5afcead\/libstdc%2B%2B-v3\/include\/bits\/hashtable.h#L2099\"> that are &#8220;small&#8221; and have &#8220;slow&#8221; hash functions<\/a>, it uses the hint as the starting point for a linear search through the hash table contents looking for a matching key. The idea is that if the hash function is &#8220;slow&#8221;, then it&#8217;s faster to just compare keys instead of going to effort of hashing them.<\/p>\n<p>Wait, what do &#8220;small&#8221; and &#8220;slow&#8221; mean? The gcc libstdc++ implementation considers a container to be &#8220;small&#8221; if it has <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/8637aecd5aea70bb13c08b5b96d3cb24f5afcead\/libstdc%2B%2B-v3\/include\/bits\/hashtable_policy.h#L298\"> at most 20 elements<\/a>, and it <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/8637aecd5aea70bb13c08b5b96d3cb24f5afcead\/libstdc%2B%2B-v3\/include\/bits\/functional_hash.h#L283\"> considers all hash functions by default to be &#8220;fast&#8221;<\/a>, except for <code>long double<\/code>.<\/p>\n<p>To summarize:<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th rowspan=\"2\">\u00a0<\/th>\n<th colspan=\"2\">Must keys be unique?<\/th>\n<\/tr>\n<tr>\n<th>Yes<\/th>\n<th>No<\/th>\n<\/tr>\n<tr>\n<th>msvc\/STL<\/th>\n<td>Hint used if matches<\/td>\n<td>Hint used if matches<\/td>\n<\/tr>\n<tr>\n<th>clang\/libcxx<\/th>\n<td>Hint ignored<\/td>\n<td>Hint used if matches<\/td>\n<\/tr>\n<tr>\n<th>gcc\/libstdc++ (large or fast)<\/th>\n<td>Hint ignored<\/td>\n<td>Hint used if matches<\/td>\n<\/tr>\n<tr>\n<th>gcc\/libstdc++ (small and slow)<\/th>\n<td>Hint ignored<\/td>\n<td>Hint used<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\u00b9 In the STL code, you should know that the <code>_Traitsobj::<wbr \/>operator()<\/code> <a href=\"https:\/\/github.com\/microsoft\/STL\/blob\/a62109595b6d89e08172fdf4beb75a2670fe0cc9\/stl\/inc\/xhash#L154\"> returns <code>true<\/code> if the keys are <i>unequal<\/i><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Only a little, or sometimes not at all.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-110428","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Only a little, or sometimes not at all.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110428","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=110428"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/110428\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=110428"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=110428"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=110428"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}