{"id":108107,"date":"2023-04-27T07:00:00","date_gmt":"2023-04-27T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=108107"},"modified":"2023-04-27T09:02:11","modified_gmt":"2023-04-27T16:02:11","slug":"20230427-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20230427-00\/?p=108107","title":{"rendered":"What&#8217;s up with this new <CODE>memory_<WBR>order_<WBR>consume<\/CODE> memory order?"},"content":{"rendered":"<p>C++11 introduced an atomic memory order that may still be new to you: <code>std::<wbr \/>memory_<wbr \/>order_<wbr \/>consume<\/code>, which C++20 also lets you spell <code>std::<wbr \/>memory_<wbr \/>order::<wbr \/>consume<\/code>. What is this guy?<\/p>\n<p>The <code>consume<\/code> memory order is a weaker form of <code>acquire<\/code>. Whereas <code>acquire<\/code> prevents <i>all<\/i> future memory accesses from being reordered ahead of the load, the <code>consume<\/code> order only prevents <i>dependent<\/i> future memory accesses from being reordered ahead of the load.<\/p>\n<p>In all the examples, let&#8217;s assume global variables declared and initialized as<\/p>\n<pre>int v1 = 1;\r\nint v2 = 2;\r\nstd::atomic&lt;int*&gt; p{ &amp;v1 };\r\n<\/pre>\n<p>Okay, let&#8217;s do some consuming.<\/p>\n<pre>auto sample_consume()\r\n{\r\n    auto q = p.load(std::memory_order_consume);\r\n    return *q + v2;\r\n}\r\n<\/pre>\n<p>The compiler is required to read the value from <code>p<\/code> into <code>q<\/code>, and any future calculations depending on that value must occur after the load.<\/p>\n<p>This reordering is allowed:<\/p>\n<pre>auto sample_consume_allowed()\r\n{\r\n    auto prefetch2 = v2;\r\n    auto q = p.load(std::memory_order_consume);\r\n    return *q + prefetch2;\r\n}\r\n<\/pre>\n<p>The value of <code>v2<\/code> is not dependent on what was loaded from <code>p<\/code>. Therefore, the compiler and processor are permitted to advance the fetch of <code>v2<\/code> ahead of the load of <code>p<\/code>. 
Note that an <code>acquire<\/code> load of <code>p<\/code> would have prohibited this reordering, since acquire loads block <i>all<\/i> future memory access, even if unrelated to the value being acquired.<\/p>\n<p>However, this reordering of the above code is not allowed:<\/p>\n<pre>auto sample_consume_disallowed()\r\n{\r\n    auto speculate1 = v1;\r\n    auto q = p.load(std::memory_order_consume);\r\n    if (q == &amp;v1) return speculate1 + v2;\r\n    return *q + v2;\r\n}\r\n<\/pre>\n<p>This speculation lets the code hide the memory latency of accessing <code>v1<\/code> behind the load of <code>p<\/code>, and a compiler might choose to take advantage of this based on profiling feedback, and a processor might do it unilaterally because processors like to do speculative things nowadays. This would be allowed if the load from <code>p<\/code> were <code>relaxed<\/code>.<\/p>\n<p>However, the <code>consume<\/code> memory order prohibits this transformation: The value loaded from <code>p<\/code> is dereferenced, and that dereference operation is dependent upon the value that was loaded, so the <code>consume<\/code> memory order requires that the dereference occur after the load.<\/p>\n<p>Here&#8217;s a table, because people like tables.<\/p>\n<table class=\"cp3\" style=\"border-collapse: collapse;\" border=\"1\" cellspacing=\"0\" cellpadding=\"3\">\n<tbody>\n<tr>\n<th>Ordering<\/th>\n<th>Relaxed<\/th>\n<th>Consume<\/th>\n<th>Acquire<\/th>\n<\/tr>\n<tr>\n<td>Load <code>v2<\/code> before <code>p<\/code><\/td>\n<td>Allowed<\/td>\n<td>Allowed<\/td>\n<td>Prohibited<\/td>\n<\/tr>\n<tr>\n<td>Dereference <code>p<\/code> before load<\/td>\n<td>Allowed<\/td>\n<td>Prohibited<\/td>\n<td>Prohibited<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The <code>consume<\/code> memory order is not used much. Atomic variables are typically tied to other variables in ways that don&#8217;t show up in expression dependency graphs, such as for use as mutual exclusion locks. 
The <code>acquire<\/code> memory order is much more commonly used than <code>consume<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A weaker variation of acquire.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-108107","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>A weaker variation of acquire.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108107","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=108107"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/108107\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=108107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=108107"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=108107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}