Limiting concurrency for faster and more responsive apps
When you have a set of highly parallelizeable work, executing it concurrently can be easy:
Of course you’d probably want to track the work at least so you know when it’s done:
Task.Run schedules the work to run on the .NET ThreadPool which is highly tuned and can likely get the work done as fast as you have CPUs to do the work as it tends to schedule at least as many threads as the machine has CPU cores.
The problem with the above code is that it floods the threadpool queue with items all at once. So if you have 10 items on your list, and only two cores, the queue and threads look like this:
Queue Active XXXXXXXX XX
Now suppose after you add all this work to the queue, your app adds one more item Y to the queue:
Queue Active YXXXXXXXX XX
Notice its placement is *behind* all that work you previously enqueued. When Y is more urgent than X, you just have to wait till all the X items are done anyway before Y gets time:
Queue Active YXXXX XX Queue Active YX XX Queue Active YX
That can be very bad for application responsiveness or perceived performance if the user is waiting for Y but doesn’t care (as much) how long X takes.
We can solve this by applying a simple throttle to the generator of X tasks, such that X items are progressively enqueued over time so that Y can get attention more quickly. If we assume that X has already started, the timeline on the queue may look like this:
Queue Active YXX XX Queue Active XXY XX Queue Active XXX XY Queue Active XXX XX
Do you see how the queue is never empty (guaranteeing all cores are still busy) but Y got through much more quickly? It’s a fairness scheme. And it ensures that X still runs very quickly in the absence of other work, and when other work exists, it shares the threadpool much more equitably.
Encoding this is actually quite simple if we use the ConcurrentExclusiveSchedulerPair class. Suppose our goal is to keep all cores busy while maintaining a short threadpool queue, no matter how many items of work you have to complete. Just a few extra lines is all we need to achieve our fairness scheme:
In the above, we’re using a concurrent TaskScheduler that comes with .NET that allows us to throttle the rate at which work is passed to the ThreadPool. By scheduling items for twice as many cores as the machine has, we will maximize CPU usage and have one item on the queue for each, ready and waiting to go as soon as each of the threadpool threads finishes work. As each work item completes, the TaskScheduler we’re using will add another item to the threadpool’s queue, ensuring that it stays busy. But importantly, it also keeps the threadpool queue short at any given time, ensuring that our Y task that comes in from another source can be executed in a reasonably short time.
The ConcurrentExclusiveSchedulerPair approach works great if you’re executing synchronous code, or async code that does not use
.ConfigureAwait(false) everywhere. Otherwise, throttling async code that uses
.ConfigureAwait(false) requires another technique like use of SemaphoreSlim.
To drive the point home that throttling your concurrency with this technique achieves better responsiveness without sacrificing throughput of your work, I wrote a little console app that schedules the same work both with and without throttling. Then I wrote it again to demonstrate an async work variant.
The output of the synchronous flavor on my machine is:
*** WITHOUT throttling *** Race has begun. Threadpool responded in: 00:00:05.7263928 Threadpool responded in: 00:00:00.0000353 FINISHED in 00:00:06.2941732 *** WITH throttling *** Race has begun. Threadpool responded in: 00:00:00.0000364 Threadpool responded in: 00:00:00.0000309 Threadpool responded in: 00:00:00.0000722 Threadpool responded in: 00:00:00.0000335 Threadpool responded in: 00:00:00.0000313 Threadpool responded in: 00:00:00.0000262 Threadpool responded in: 00:00:00.0000980 Threadpool responded in: 00:00:00.0000386 Threadpool responded in: 00:00:00.0000255 Threadpool responded in: 00:00:00.0000780 FINISHED in 00:00:04.6851826
The output of the async flavor on my machine is:
*** WITHOUT throttling *** Race has begun. Threadpool responded in: 00:00:04.2789838 Threadpool responded in: 00:00:01.6836769 Threadpool responded in: 00:00:00.0000324 FINISHED in 00:00:07.0103151 *** WITH throttling *** Race has begun. Threadpool responded in: 00:00:00.0000171 Threadpool responded in: 00:00:00.0000080 Threadpool responded in: 00:00:00.0000342 Threadpool responded in: 00:00:00.0000222 Threadpool responded in: 00:00:00.0000346 Threadpool responded in: 00:00:00.0000361 Threadpool responded in: 00:00:00.0000295 Threadpool responded in: 00:00:00.0000226 Threadpool responded in: 00:00:00.0000397 Threadpool responded in: 00:00:00.0000306 Threadpool responded in: 00:00:00.0000670 Threadpool responded in: 00:00:00.0000309 Threadpool responded in: 00:00:00.0000299 FINISHED in 00:00:06.2486978
I find it very interesting the throttled run is even faster in these tests. Your mileage may vary, but what is very clear and will be consistent is that you can see the threadpool is much more responsive to other work in the throttled run.
And that my friends, is the point I’d like you to take from this post.