October 22nd, 2025

What makes `cheap_steady_clock` faster than `std::chrono::high_resolution_clock`?

Raymond Chen

Some time ago, I noted that there is a std::chrono::high_resolution_clock, but no low_resolution_clock.

The Visual C++ library treats std::chrono::high_resolution_clock as an alias for std::chrono::steady_clock, which uses QueryPerformanceCounter() to retrieve the current time, and then multiplies it by the reciprocal of QueryPerformanceFrequency() to convert it to a clock tick count. So you are paying for a multiplication after QueryPerformanceCounter() returns.
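To make that conversion step concrete, here is a minimal sketch of turning a raw counter reading into nanoseconds. The name `qpc_ticks_to_ns` is hypothetical, and the sketch uses a plain integer divide with a whole-seconds/remainder split instead of a precomputed reciprocal. That choice is an assumption made for readability, not a claim about what the library does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any library): convert a raw
// performance-counter reading to nanoseconds, given the counter
// frequency in ticks per second.
//
// Computing (counter * 1'000'000'000) / frequency directly would overflow
// 64 bits once the counter gets large, so the whole seconds and the
// leftover ticks are converted separately.
inline std::int64_t qpc_ticks_to_ns(std::int64_t counter, std::int64_t frequency) noexcept
{
    assert(frequency > 0); // a zero or negative frequency makes no sense

    constexpr std::int64_t ns_per_second = 1'000'000'000;
    const std::int64_t whole = (counter / frequency) * ns_per_second;
    const std::int64_t part  = (counter % frequency) * ns_per_second / frequency;
    return whole + part;
}
```

On Windows, the two arguments would come from QueryPerformanceCounter() and QueryPerformanceFrequency() respectively.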

But there’s a lot going on inside QueryPerformanceCounter() itself, too. It has to use a different algorithm depending on things like whether the timestamp counter (such as RDTSC on x86 or CNTVCT_EL0 on AArch64) is reliable, whether the system is running inside a virtual machine, whether the process is running under emulation, and various other conditions. In the worst case, it needs to make a system call into the kernel.

On the other hand, GetTickCount64() merely reads two 64-bit values from memory and multiplies them. One is a raw value that is updated by the kernel at each system timer interrupt (worst case), and the other is a conversion factor calculated at system startup to convert the raw value into milliseconds.

These two 64-bit values come from a special page that is mapped into user mode from kernel mode that contains handy values, including the current tick count. As it turns out, this is significantly faster than going through all the logic of QueryPerformanceCounter. It’s a great choice if you do not need high resolution.
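For illustration, a millisecond-resolution clock built on GetTickCount64() might look something like the sketch below. This is a sketch under stated assumptions rather than a definitive implementation: the member layout is a guess at a reasonable shape, `elapsed_ms` is a hypothetical helper, and the non-Windows branch exists only so the sketch compiles elsewhere (it is not cheap).

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#ifdef _WIN32
#include <windows.h>
#endif

// Sketch of a steady clock that ticks in milliseconds.
struct cheap_steady_clock
{
    using rep        = long long;
    using period     = std::milli;
    using duration   = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<cheap_steady_clock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept
    {
#ifdef _WIN32
        // GetTickCount64 reports milliseconds since the system started.
        return time_point{ duration{ static_cast<rep>(::GetTickCount64()) } };
#else
        // Assumption: portability fallback only; truncates steady_clock to ms.
        return time_point{ std::chrono::duration_cast<duration>(
            std::chrono::steady_clock::now().time_since_epoch()) };
#endif
    }
};

// Hypothetical helper: milliseconds elapsed since an earlier reading.
inline long long elapsed_ms(cheap_steady_clock::time_point start) noexcept
{
    const auto delta = cheap_steady_clock::now() - start;
    assert(delta.count() >= 0); // a steady clock never goes backwards
    return delta.count();
}
```

Because the duration type is already in milliseconds, the result needs no further conversion before being handed to a millisecond-based wait function.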

And deciding how long to sleep a thread is a case where you do not need high resolution. Most of the functions for sleeping a thread already operate in milliseconds, so getting the value in milliseconds saves you a lot of conversions. Calculating values with sub-millisecond accuracy is pointless if the result is going to be converted to milliseconds anyway. And the accuracy of most (all?) of these sleep functions is only as good as the system timer anyway, so really they are good to only 10 or 50 milliseconds.¹
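As a small sketch of that kind of calculation, the helper below works out how long to sleep until a deadline when both values are already in milliseconds. The name `sleep_ms_until` is hypothetical. The clamp is there because the Win32 Sleep function treats 0xFFFFFFFF as INFINITE, so a finite wait has to stay below that.

```cpp
#include <cstdint>

// Hypothetical helper: whole milliseconds to sleep until deadline_ms,
// given the current time now_ms. Both are millisecond tick counts,
// such as values obtained from GetTickCount64.
inline std::uint32_t sleep_ms_until(std::uint64_t now_ms, std::uint64_t deadline_ms) noexcept
{
    // Deadline already reached or passed: do not sleep at all.
    if (deadline_ms <= now_ms) {
        return 0;
    }

    const std::uint64_t remaining = deadline_ms - now_ms;

    // Largest finite timeout: one less than 0xFFFFFFFF (INFINITE).
    constexpr std::uint64_t max_finite = 0xFFFFFFFEu;
    return static_cast<std::uint32_t>(remaining < max_finite ? remaining : max_finite);
}
```

No sub-millisecond arithmetic appears anywhere: the inputs, the subtraction, and the result are all in the same unit the sleep function expects.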

It’s like doing precise calculations to determine that you need to set your phone alarm to wake you in exactly 32 minutes, 21.1315 seconds. Your phone alarm can’t wake you to sub-second resolution, so all that work to calculate those extra .1315 seconds was wasted.

Now, I didn’t know all of these details when I originally wrote that article. But it stands to reason that GetTickCount64 is a lot cheaper than QueryPerformanceCounter because GetTickCount64 asks for less. Even if GetTickCount64 ends up not being faster than QueryPerformanceCounter, it surely won’t be slower.

¹ Or one millisecond if your process has called timeBeginPeriod(1) to ask that the system timer be sped up to 1 millisecond.

Author

Raymond Chen

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments

許恩嘉 November 1, 2025 · Edited

Based on my extremely inaccurate benchmark tests, on my computer, `GetTickCount64()` is approximately 16.6 times faster than `std::chrono::steady_clock::now()`, while `std::chrono::high_resolution_clock::now()` takes about 1.05 times longer than `QueryPerformanceCounter`.

The source code of MSVC’s std::chrono::steady_clock::now() contains the following comments:

// The compiler recognizes the constants for frequency and time period and uses shifts and
// multiplies instead of divides to calculate the nanosecond value.
constexpr long long _TenMHz        = 10'000'000;
constexpr long long _TwentyFourMHz = 24'000'000;
if (_Freq == _TenMHz) {
    // 10 MHz is a very common QPC frequency on modern x86/x64 PCs. Optimizing for
    // this specific frequency can double the performance of this function by
    // avoiding the expensive frequency conversion path.
    static_assert(period::den % _TenMHz == 0, "It should never fail.");
    constexpr long long _Multiplier = period::den / _TenMHz;
    return time_point(duration(_Ctr * _Multiplier));
} else if (_Freq == _TwentyFourMHz) {
    // 24 MHz is a common frequency on ARM/ARM64, including cases where it emulates x86/x64.
    const long long _Whole = (_Ctr / _TwentyFourMHz) * period::den;
    const long long _Part  = (_Ctr % _TwentyFourMHz) * period::den / _TwentyFourMHz;
    return time_point(duration(_Whole + _Part));
} else {
    // Instead of just having "(_Ctr * period::den) / _Freq",
    // the algorithm below prevents overflow when _Ctr is sufficiently large.
    // It assumes that _Freq * period::den does not overflow, which is currently true for nano period.
    // It is not realistic for _Ctr to accumulate to large values from zero with this assumption,
    // but the initial value of _Ctr could be large.
    const long long _Whole = (_Ctr / _Freq) * period::den;
    const long long _Part  = (_Ctr % _Freq) * period::den / _Freq;
    return time_point(duration(_Whole + _Part));
}

The QPC frequency on my computer is 10 MHz.

Shawn Van Ness October 23, 2025

Does original GetTickCount (32 not 64) do a multiplication? I always thought of it as a straight-up atomic read of a DWORD from the TEB ..

I also (maybe mistakenly) thought it was updated by the kernel thread scheduler any time control was being given to a thread .. so eg. in response to waking a WaitForSingleObject() or ResumeThread() .. or returning from Sleep() or executing an APC or whatever else.

I realize now I have so many questions .. assumptions I never tested. Do all threads in a process have a coherent view of the tick-count? I never relied on that.. but didn’t know one could.

- amoskevitz October 27, 2025
  
  GetTickCount (32bit) …
  that brings back memories… things stopping working after 49 days…
Yexuan Xiao October 23, 2025

I’ve looked into it before, and timeBeginPeriod doesn’t affect modern wait APIs like WaitOnAddress or SRWLock, which always maintain a wait precision between 10 and 15 milliseconds. Another significant drawback of timeBeginPeriod is that it is not thread-bound, so if multiple threads use it simultaneously, the results can lead to unexpected outcomes. Therefore, I believe it should no longer be used.
- Torsten Steitz October 24, 2025
  
  Are you sure that the wait precision of WaitOnAddress is not affected by timeBeginPeriod? That would be surprising and .. extremely unpleasant.
  timeBeginPeriod is still unavoidable if you need higher precision on wait functions.
Dave Gzorple October 22, 2025

I’d never heard of timeBeginPeriod before so I looked it up. Alongside a general error it can also return TIMERR_NOCANDO which seems like a huge i18n fail to me. I mean, it’s not quite TIMERR_BOTTOM_OF_THE_NINTH_BASES_LOADED but how many non-US-English speakers are going to know what NOCANDO means? Why not just TIMERR_RESOLUTION_UNAVAILABLE?
- GL October 22, 2025
  
  So I looked this up and figured out “no can do” is actually a recognized slang, instead of pure grammar error. My take on this (as a non-US non-English-native-speaker English speaker): if one is unaware of “no can do” as a slang, one will recognize it as an erroneous form of “cannot do” and understand it; if one is aware of the slang, then of course one knows what it means.
  
  My confusion about multimedia API is why most of its flat API (function exports instead of COM) are camelCase instead of PascalCase as is often the case in Windows. The only other one I can remember is winsock, but the reason there is clear — it mimics POSIX socket API.
  
