{"id":111046,"date":"2025-04-04T07:00:00","date_gmt":"2025-04-04T14:00:00","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/oldnewthing\/?p=111046"},"modified":"2025-04-04T09:43:50","modified_gmt":"2025-04-04T16:43:50","slug":"20250404-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250404-00\/?p=111046","title":{"rendered":"Adding delays to our task sequencer, part 3"},"content":{"rendered":"<p>Last time, <a title=\"Adding delays to our task sequencer, part 2\" href=\"https:\/\/devblogs.microsoft.com\/oldnewthing\/20250403-00\/?p=111043\"> we added task throttling to our <code>task_sequencer<\/code><\/a> by limiting requests to one per second. To do this, we used the <code>steady_<wbr \/>clock<\/code>. The standard leaves unspecified the resolution of the steady clock, but we can see how it is implemented on Windows by the three major libraries.<\/p>\n<p>On Windows, <a href=\"https:\/\/github.com\/microsoft\/STL\/blob\/7643c270e5bfb1cfad62f8b5ff4045c662bdaf81\/stl\/inc\/__msvc_chrono.hpp#L649\"> stl (msvc)<\/a> and <a href=\"https:\/\/github.com\/llvm\/llvm-project\/blob\/a222d00c667f5582194ba7e50b870312e4b4427b\/libcxx\/src\/chrono.cpp#L191\"> libcxx (clang)<\/a>, and <a href=\"https:\/\/github.com\/gcc-mirror\/gcc\/blob\/a6a15bc5b77c8703a95130f410a944f5408a5cc4\/libstdc%2B%2B-v3\/src\/c%2B%2B11\/chrono.cc#L79\"> libstdc++ (gcc)<\/a>\/<a href=\"https:\/\/github.com\/msys2-contrib\/mingw-w64\/blob\/cdb052f1d4056cd510cb83197b55868427b87476\/mingw-w64-libraries\/winpthreads\/src\/clock.c#L169\">mingw<\/a> use <code>Query\u00adPerformance\u00adCounter<\/code> as the source of the <code>steady_<wbr \/>clock<\/code>,\u00b9 reporting the time to nanosecond resolution. This is excellent resolution, but at a cost of significant arithmetic gymnastics.<\/p>\n<p>We can go cheaper.<\/p>\n<p>Our throttling delay of 1 second doesn&#8217;t have to be precise down to the nanosecond. In reality, it&#8217;ll only be as good as the hardware timer tick resolution, which is nowhere near 1 nanosecond. In practice, it&#8217;s closer to 10 milliseconds per tick, so an error of one or two milliseconds is well under measurement error.<\/p>\n<p>Therefore, we can choose to use <code>Get\u00adTick\u00adCount\u00ad64<\/code> as our steady clock source. The <code>Get\u00adTick\u00adCount\u00ad64<\/code> function is quite fast (just reading a 64-bit value from memory), and it reports at millisecond resolution, which means that it converts to <code>TimeSpan<\/code> with multiplication and not division.<\/p>\n<pre>struct task_sequencer\r\n{\r\n    \u27e6 ... \u27e7\r\n\r\n    struct completer\r\n    {\r\n        ~completer()\r\n        {\r\n            [](auto chain, auto delay) -&gt; winrt::fire_and_forget {\r\n                co_await winrt::resume_after(delay);\r\n                chain-&gt;complete();\r\n            }(std::move(chain),\r\n              <span style=\"border: solid 1px currentcolor; border-bottom: none;\">std::chrono::milliseconds(static_cast&lt;int64_t&gt;<\/span>\r\n              <span style=\"border: solid 1px currentcolor; border-top: none;\">  (earliest - GetTickCount64()))<\/span><span style=\"border-top: solid 1px currentcolor;\">);            <\/span>\r\n        }\r\n        std::shared_ptr&lt;chained_task&gt; chain;\r\n        <span style=\"border: solid 1px currentcolor;\">ULONGLONG earliest = GetTickCount64();<\/span>\r\n    };\r\n\r\n\r\npublic:\r\n    template&lt;typename Maker&gt;\r\n    auto QueueTaskAsync(Maker&amp;&amp; maker) -&gt;decltype(maker())\r\n    {\r\n        auto node = std::make_shared&lt;chained_task&gt;();\r\n\r\n        suspender suspend;\r\n\r\n        using Async = decltype(maker());\r\n        auto task = [&amp;]() -&gt; Async\r\n        {\r\n            completer completer{ current };\r\n            auto local_maker = std::forward&lt;Maker&gt;(maker);\r\n            auto context = winrt::apartment_context();\r\n\r\n            co_await suspend;\r\n            co_await context;\r\n            <span style=\"border: solid 1px currentcolor;\">completer.earliest = GetTickCount64() + 1000;<\/span>\r\n            co_return co_await local_maker();\r\n        }();\r\n\r\n        {\r\n            winrt::slim_lock_guard guard(m_mutex);\r\n            m_latest.swap(node);\r\n        }\r\n\r\n        node-&gt;continue_with(suspend.handle);\r\n\r\n        return task;\r\n    }\r\n\r\n    \u27e6 ... \u27e7\r\n};\r\n<\/pre>\n<p>If we really wanted to minimize the number of calls to <code>Get\u00adTick\u00adCount64()<\/code>, we could use a <code>std::optional<\/code>:<\/p>\n<pre>struct task_sequencer\r\n{\r\n    \u27e6 ... \u27e7\r\n\r\n    struct completer\r\n    {\r\n        ~completer()\r\n        {\r\n            [](auto chain, auto delay) -&gt; winrt::fire_and_forget {\r\n                co_await winrt::resume_after(delay);\r\n                chain-&gt;complete();\r\n            }(std::move(chain),\r\n              std::chrono::milliseconds(static_cast&lt;int64_t&gt;\r\n                (<span style=\"border: solid 1px currentcolor;\">earliest ? *earliest - GetTickCount64() : 0)<\/span>));\r\n        }\r\n        std::shared_ptr&lt;chained_task&gt; chain;\r\n        <span style=\"border: solid 1px currentcolor;\">std::optional&lt;ULONGLONG&gt;<\/span> earliest;\r\n    };\r\n\r\n    \u27e6 ... \u27e7\r\n};\r\n<\/pre>\n<p>Or we could reserve the sentinel value of 0 to mean &#8220;no delay&#8221;.<\/p>\n<pre>struct task_sequencer\r\n{\r\n    \u27e6 ... \u27e7\r\n\r\n    struct completer\r\n    {\r\n        ~completer()\r\n        {\r\n            [](auto chain, auto delay) -&gt; winrt::fire_and_forget {\r\n                co_await winrt::resume_after(delay);\r\n                chain-&gt;complete();\r\n            }(std::move(chain),\r\n              std::chrono::milliseconds(static_cast&lt;int64_t&gt;\r\n                (<span style=\"border: solid 1px currentcolor;\">earliest ? earliest - GetTickCount64() : 0)<\/span>));\r\n        }\r\n        std::shared_ptr&lt;chained_task&gt; chain;\r\n        ULONGLONG earliest = <span style=\"border: solid 1px currentcolor;\">0<\/span>;\r\n    };\r\n\r\n\r\npublic:\r\n    template&lt;typename Maker&gt;\r\n    auto QueueTaskAsync(Maker&amp;&amp; maker) -&gt;decltype(maker())\r\n    {\r\n        auto node = std::make_shared&lt;chained_task&gt;();\r\n\r\n        suspender suspend;\r\n\r\n        using Async = decltype(maker());\r\n        auto task = [&amp;]() -&gt; Async\r\n        {\r\n            completer completer{ current };\r\n            auto local_maker = std::forward&lt;Maker&gt;(maker);\r\n            auto context = winrt::apartment_context();\r\n\r\n            co_await suspend;\r\n            co_await context;\r\n            completer.earliest = (GetTickCount64() + 1000) <span style=\"border: solid 1px currentcolor;\">| 1<\/span>;\r\n            co_return co_await local_maker();\r\n        }();\r\n\r\n        {\r\n            winrt::slim_lock_guard guard(m_mutex);\r\n            m_latest.swap(node);\r\n        }\r\n\r\n        node-&gt;continue_with(suspend.handle);\r\n\r\n        return task;\r\n    }\r\n\r\n    \u27e6 ... \u27e7\r\n};\r\n<\/pre>\n<p>We force the bottom bit of <code>earliest<\/code> to 1 when recording the start time, so that the value is never zero. This introduces a potential error of 1 millisecond, but one millisecond error out of 1 second is not going to be noticeable in practice for this type of work.<\/p>\n<p><b>Bonus chatter<\/b>: You may have figured out that the point of this exercise, aside from actually adding a feature to the task scheduler, is just showing the process of studying the implementation of a chunk of code, getting an idea, following through the implementation, and then refining the implementation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Waiting more cheaply.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-111046","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Waiting more cheaply.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111046","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=111046"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/111046\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=111046"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=111046"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=111046"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}