February 17th, 2020

Gotcha: A threadpool periodic timer will not wait for the previous tick to complete

Some time ago, we learned that if your WM_TIMER handler takes longer than the timer period, your queue will not fill up with WM_TIMER messages.

However, that behavior does not extend to thread pool timers.

This is called out in the documentation for Create­Timer­Queue­Timer:

The callback is called every time the period elapses, whether or not the previous callback has finished executing.

However, corresponding verbiage is missing from the documentation for Set­Threadpool­Timer and Set­Threadpool­Timer­Ex. But it applies there too.

This means that if your timer callback takes longer than the period, you may find multiple callbacks running at the same time. That leads to confusion, because those callbacks are probably trying to manipulate the same state, and they need to be careful not to interfere with each other. If your callback holds a lock that your main thread also needs, you’ve starved out your main thread while all this is going on.

If the “long callback” situation is temporary, you may be able to catch up, but if the situation lasts a long time, you’re in a Lucy in the chocolate factory situation, and things will spiral out of control.
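One way to keep overlapping callbacks from stepping on shared state is to have a late-arriving tick notice that an earlier one is still running and give up. Here is a minimal sketch in portable C++. It does not use the thread pool API, and the names `TickGuard` and `on_tick` are made up for illustration. It assumes that dropping a tick is acceptable, which is not true of every timer.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical guard: at most one tick body runs at a time. Overlapping
// ticks are dropped rather than queued.
class TickGuard {
public:
    // Returns true if the caller now owns the guard, false if another
    // tick is still inside its body.
    bool try_enter() noexcept { return !busy_.exchange(true); }

    // Must be called only by the tick that successfully entered.
    void leave() noexcept {
        assert(busy_.load());
        busy_.store(false);
    }

private:
    std::atomic<bool> busy_{false};
};

// Hypothetical callback wrapper. `work` stands in for whatever the timer
// callback does. Returns true if the work ran, false if the tick was skipped.
template <typename Work>
bool on_tick(TickGuard& guard, Work&& work) {
    if (!guard.try_enter()) {
        return false;  // previous tick still running: skip this one
    }
    struct Release {   // release the guard even if work() throws
        TickGuard& g;
        ~Release() { g.leave(); }
    } release{guard};
    work();
    return true;
}
```

A skipped tick returns right away, so the backlog never builds. The price is that the timer effectively runs slower than its nominal period whenever the work is slow.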

I encountered this phenomenon back in the very early days of what today goes by the name of Windows Presentation Foundation. I had a visual tree inside a scroller, and I observed that when the user clicked the down-arrow on the scrollbar, the contents scrolled by one line, and then froze for about ten seconds. During that time, the process spawned a dozen threads and pegged the CPU. Finally, the scroller repainted itself, having scrolled all the way to the bottom.

This is not how a scroller is supposed to behave.

What happened was that clicking the down-arrow did two things: The first thing was that it immediately scrolled the document by one line. That’s what resulted in the contents scrolling by one line on the screen.

The second thing was that it set up an autorepeat timer to perform autoscrolling for as long as the user held the mouse button down. The autorepeat timer callback also scrolled the document by another line. The problem was that scrolling by a line took about half a second, but the autorepeat timer was set to trigger every 100 ms. In the time it took to scroll by one line, four additional requests to autoscroll were generated. The callbacks very quickly fell behind, and the thread pool tried to catch up by throwing more threads at the problem.
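The arithmetic behind those numbers can be sketched as follows. This is an idealized model that assumes every callback takes exactly the same time and that the thread pool always has a free thread. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Extra periodic ticks that fire strictly while one callback is still
// running. The callback starts on a tick; later ticks land at +period,
// +2*period, ... and are counted if they arrive before the work finishes.
inline std::int64_t ticks_during_callback(std::int64_t period_ms,
                                          std::int64_t work_ms) {
    assert(period_ms > 0 && work_ms >= 0);
    return work_ms == 0 ? 0 : (work_ms - 1) / period_ms;
}

// Callbacks in flight just after a tick fires, once the pattern has
// settled: every tick from the last work_ms milliseconds is still running.
// This is ceil(work_ms / period_ms).
inline std::int64_t steady_state_overlap(std::int64_t period_ms,
                                         std::int64_t work_ms) {
    assert(period_ms > 0 && work_ms >= 0);
    return (work_ms + period_ms - 1) / period_ms;
}
```

With the figures from the story, `ticks_during_callback(100, 500)` is 4, matching the four additional requests, and `steady_state_overlap(100, 500)` is 5. That is the best case. Once the threads start competing for CPU, each callback takes longer and the overlap climbs.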

Now you have a dozen threads all trying to scroll the document by one line, and they’re all competing with each other for CPU, so the calculations are even slower than normal. The UI thread is so busy with the document updates that it hasn’t had a chance to notice that the user has released the mouse button, because input messages are relatively low priority.

Finally, the document scrolls all the way to the bottom, and the autoscroll callbacks start returning immediately with “Nope, no scrolling possible, nothing to do.” This stops the candy conveyor belt, and the existing backlog of callbacks gradually gets retired. The UI thread finally notices that the mouse button was released, so it cancels the autorepeat timer, and the madness finally ends.

But what the user observed was that everything froze for ten seconds, and then they ended up at the bottom of the document.

If you want only one instance of the callback at a time, you can switch from a periodic timer to a one-time timer, and have your callback schedule the next timer callback when it finishes.
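With the thread pool API, this would presumably mean passing a period of zero to SetThreadpoolTimer, which makes the timer fire once, and calling it again at the end of the callback. The sketch below shows only the shape of the pattern. It uses a made-up `VirtualTimerLoop` driven by virtual time, so that the result is deterministic. It is not the Windows API, and all of the names in it are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a timer service, driven by virtual time.
class VirtualTimerLoop {
public:
    using Callback = std::function<void()>;

    std::int64_t now() const { return now_ms_; }

    // One-shot: run `cb` once, `delay_ms` after the current virtual time.
    void arm_once(std::int64_t delay_ms, Callback cb) {
        assert(delay_ms >= 0);
        pending_.emplace(now_ms_ + delay_ms, std::move(cb));
    }

    // Simulates a callback spending `ms` milliseconds doing work.
    void spend(std::int64_t ms) { now_ms_ += ms; }

    // Runs pending callbacks in due-time order until none is due by until_ms.
    void run_until(std::int64_t until_ms) {
        while (!pending_.empty() && pending_.begin()->first <= until_ms) {
            auto first = pending_.begin();
            std::int64_t due = first->first;
            Callback cb = std::move(first->second);
            pending_.erase(first);
            now_ms_ = std::max(now_ms_, due);
            cb();
        }
    }

private:
    std::multimap<std::int64_t, Callback> pending_;
    std::int64_t now_ms_ = 0;
};

// The pattern itself: do the work, then schedule the next tick.
// Returns the virtual start time of each tick that ran.
std::vector<std::int64_t> run_rearming_ticks(std::int64_t delay_ms,
                                             std::int64_t work_ms,
                                             std::int64_t until_ms) {
    VirtualTimerLoop loop;
    std::vector<std::int64_t> starts;
    std::function<void()> tick = [&] {
        starts.push_back(loop.now());
        loop.spend(work_ms);            // the slow part, e.g. scrolling one line
        loop.arm_once(delay_ms, tick);  // re-arm only after the work is done
    };
    loop.arm_once(delay_ms, tick);
    loop.run_until(until_ms);
    return starts;
}
```

Calling `run_rearming_ticks(100, 500, 2000)` yields start times of 100, 700, 1300, and 1900 ms. Each tick begins 100 ms after the previous one finished, so there is never more than one in flight. The trade-off is that the effective rate drops to one tick per 600 ms instead of one per 100 ms.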

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

7 comments

  • Joe Beans

    You don't really run into this with async methods because framework "sleep" methods typically destroy the internal timer automatically after a single use. Nobody cares whether it's efficient or not because it composes better even for recurring timer ticks.

    I wrote a double-buffered graph control in the Win2K days that auto-scrolled to any new point placed beyond the limit of the graph. If any frame took more than N amount of time, I branched off into...

  • Neil Rashbrook

    How does a BitBlt and an InvalidateRect take up half a second?

    • Raymond Chen (Microsoft employee, Author)

      There was complex layout going on, and the layout took more than 1/2 second. (The region that scrolled into view needs to be painted for the first time, so you also have to generate the bitmap to BitBlt *from*.)

    • Brian MacKay

      Note that Raymond mentioned that it was “in the early days”. Folks at Microsoft use software that is very far from being ready to show to anyone else. “Early Days” software often has “interesting” glitches.

    • Alex Martin

      I think it’s an animated gradual scroll where the animation takes half a second.

  • cheong00

    This happens a lot in the web world, when you have a grid that contains more than a few hundred items to load but you don’t want paging for some reason (say, an auto-select feature for items that should be selected together but may fall on different pages), back when something like “infinite scroll” hadn’t been invented.

  • Richard Russell

    Faced with this scenario I will usually code my callback routine to detect a re-entrant call and return immediately in that case.