October 22nd, 2025

What makes cheap_steady_clock faster than std::chrono::high_resolution_clock?

Some time ago, I noted that there is a std::chrono::high_resolution_clock, but no low_resolution_clock.

The Visual C++ library treats std::chrono::high_resolution_clock as an alias for std::chrono::steady_clock, which uses Query­Performance­Counter() to retrieve the current time, and then multiplies it by the reciprocal of Query­Performance­Frequency() to convert it to a clock tick count. So you are paying for a multiplication after Query­Performance­Counter() returns.

But there’s a lot going on inside Query­Performance­Counter() itself, too. It has to use a different algorithm depending on things like whether the timestamp counter (such as RDTSC on x86 or CNTVCT_EL0 on AArch64) is reliable, whether the system is running inside a virtual machine, whether the process is running under emulation, and various other conditions. In the worst case, it needs to make a system call into the kernel.

On the other hand, Get­Tick­Count64() merely reads two 64-bit values from memory and multiplies them. One is a raw value that is updated by the kernel at each system timer interrupt (worst case), and the other is a conversion factor calculated at system startup to convert the raw value into milliseconds.

These two 64-bit values come from a special page that is mapped into user mode from kernel mode that contains handy values, including the current tick count. As it turns out, this is significantly faster than going through all the logic of Query­Performance­Counter. It’s a great choice if you do not need high resolution.

And deciding how long to sleep a thread is a case where you do not need high resolution. Most of the functions for sleeping a thread already operate in milliseconds, so getting the value in milliseconds saves you a lot of conversions. Calculating values with sub-millisecond accuracy is pointless if the result is going to be converted to milliseconds anyway. And the accuracy of most (all?) of these sleep functions is only as good as the system timer anyway, so really they are good to only 10 or 15 milliseconds.¹

It’s like doing precise calculations to determine that you need to set your phone alarm to wake you in exactly 32 minutes, 21.1315 seconds. Your phone alarm can’t wake you to sub-second resolution, so all that work to calculate those extra .1315 seconds was wasted.

Now, I didn’t know all of these details when I originally wrote that article. But it stands to reason that Get­Tick­Count64 is a lot cheaper than Query­Performance­Counter because Get­Tick­Count64 asks for less. Even if Get­Tick­Count64 ends up not being faster than Query­Performance­Counter, it surely won’t be slower.

¹ Or one millisecond if your process has called timeBeginPeriod(1) to ask that the system timer be sped up to 1 millisecond.


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

4 comments

  • Shawn Van Ness

    Does original GetTickCount (32 not 64) do a multiplication? I always thought of it as a straight-up atomic read of a DWORD from the TEB ..

    I also (maybe mistakenly) thought it was updated by the kernel thread scheduler any time control was being given to a thread .. so eg. in response to waking a WaitForSingleObject() or ResumeThread() .. or returning from Sleep() or executing an APC or whatever else.

    I realize now I have so many questions .. assumptions I never tested. Do all threads in a process have a coherent view of the tick-count?...

  • Yexuan Xiao 10 hours ago

    I’ve looked into it before, and timeBeginPeriod doesn’t affect modern wait APIs like WaitOnAddress or SRWLock, which always maintain a wait precision between 10 and 15 milliseconds. Another significant drawback of timeBeginPeriod is that it is not thread-bound, so if multiple threads use it simultaneously, the results can lead to unexpected outcomes. Therefore, I believe it should no longer be used.

  • Dave Gzorple

    I’d never heard of timeBeginPeriod before so I looked it up. Alongside a general error it can also return TIMERR_NOCANDO which seems like a huge i18n fail to me. I mean, it’s not quite TIMERR_BOTTOM_OF_THE_NINTH_BASES_LOADED but how many non-US-English speakers are going to know what NOCANDO means? Why not just TIMERR_RESOLUTION_UNAVAILABLE?

    • GL

      So I looked this up and figured out that "no can do" is actually recognized slang rather than a pure grammar error. My take on this (as a non-US, non-English-native English speaker): if one is unaware of "no can do" as slang, one will recognize it as an erroneous form of "cannot do" and understand it; if one is aware of the slang, then of course one knows what it means.

      My confusion about multimedia API is why most of its flat API (function exports instead of COM) are camelCase instead of PascalCase as is often the case in Windows. The...
