tl;dr
In the next update to Windows, codenamed 19H1, D3D12 will allow drivers to use idle priority background CPU threads to dynamically recompile shader programs. This can improve GPU performance by specializing shader code to better match details of the hardware it is running on and/or the context in which it is being used. Developers don’t have to do anything to benefit from this feature – as drivers start to use it, existing shaders will automatically be tuned more efficiently. But developers who are profiling their code may wish to use the new SetBackgroundProcessingMode API to control how and when these optimizations take place.
How shader compilation is changing
Creating a D3D12 pipeline state object is a synchronous operation. The API call does not return until all shaders have been fully compiled into ready-to-execute GPU instructions. This approach is simple, provides deterministic performance, and gives sophisticated applications control over things like compiling shaders ahead of time or compiling several in parallel on different threads, but in other ways it is quite limiting.
Most D3D11 drivers, on the other hand, implement shader creation by automatically offloading compilation to a worker thread. This is transparent to the caller, and works well as long as the compilation has finished by the time the shader is needed. A sophisticated driver might do things like compiling the shader once quickly with minimal optimization so as to be ready for use as soon as possible, and then again using a lower priority thread with more aggressive (and hence time consuming) optimizations. Or the implementation might monitor how a shader is used, and over time recompile different versions of it, each one specialized to boost performance in a different situation. This kind of technique can improve GPU performance, but the lack of developer control isn’t ideal. It can be hard to schedule GPU work appropriately when you don’t know for sure when each shader is ready to use, and profiling gets tricky when drivers can swap the shader out from under you at any time! If you measure 10 times and get 10 different results, how can you be sure whether the change you are trying to measure was an improvement or not?
In the 19H1 update to Windows, D3D12 is adding support for background shader recompilation. Pipeline state creation remains synchronous, so (unlike with D3D11) you always know for sure exactly when a shader is ready to start rendering. But now, after the initial state object creation, drivers can submit background recompilation requests at any time. These run at idle thread priority so as not to interfere with the foreground application, and can be used to implement the same kinds of dynamic optimization that were possible with the D3D11 design. At the same time, we are adding an API to control this behavior during profiling, so D3D12 developers will still be able to measure just once and get one reliable result.
How to use it
- Have a recent build of Windows 19H1 (as of this writing, available through the Windows Insider Program)
- Have a driver that implements this feature
- That’s it, you’re done!
Surely there’s more to it?
Well ok. While profiling, you probably want to use SetBackgroundProcessingMode to make sure these dynamic optimizations get applied before you take timing measurements. For example:
    SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
        nullptr, nullptr);

    // prime the system by rendering some typical content, e.g. a level flythrough

    SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
        D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,
        nullptr, nullptr);

    // continue rendering, now with dynamic optimizations applied, and take your measurements
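The prime-then-commit sequence above could be wrapped in a small helper so that profiling code cannot forget the second call. What follows is only a sketch under stated assumptions: the two enums are stand-ins that reuse the names from this post so the snippet builds without the Windows SDK, and `PrimeThenCommit`, the `Device` template parameter, and `RecordingDevice` are hypothetical names that are not part of D3D12. In a real application you would drop the stand-in enums, include the SDK headers, and pass an `ID3D12Device6`.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Stand-in declarations (assumption): same names and order as the enums in
// this post, so the sketch compiles without d3d12.h. Remove them when
// building against the real Windows SDK.
enum D3D12_BACKGROUND_PROCESSING_MODE {
    D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
    D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_BACKGROUND_WORK,
    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,
};
enum D3D12_MEASUREMENTS_ACTION {
    D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,
    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,
    D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS,
};

// Hypothetical helper. Device is anything with a four-argument
// SetBackgroundProcessingMode member shaped like the one in this post.
// Returns the first failing result code, or 0 (S_OK) on success.
template <typename Device>
long PrimeThenCommit(Device& device, const std::function<void()>& renderWarmup)
{
    assert(renderWarmup && "a warm-up workload is needed to train the driver");

    // 1. Let the driver instrument aggressively while the warm-up runs.
    long hr = device.SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL, nullptr, nullptr);
    if (hr < 0) return hr;

    // 2. Render typical content, e.g. a level flythrough.
    renderWarmup();

    // 3. Return to the default mode and ask the driver to act on what it saw.
    return device.SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
        D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS, nullptr, nullptr);
}

// Hypothetical fake for trying the helper without a GPU: it only records
// which mode/action pairs were requested.
struct RecordingDevice {
    std::vector<std::pair<D3D12_BACKGROUND_PROCESSING_MODE,
                          D3D12_MEASUREMENTS_ACTION>> calls;

    long SetBackgroundProcessingMode(D3D12_BACKGROUND_PROCESSING_MODE mode,
                                     D3D12_MEASUREMENTS_ACTION action,
                                     void* /*hEventToSignalUponCompletion*/,
                                     int* /*FurtherMeasurementsDesired*/)
    {
        calls.emplace_back(mode, action);
        return 0; // S_OK
    }
};
```

Like the example above, this sketch passes no completion event, so it does not wait for the resulting recompiles to finish. A tool that needs that guarantee would pass an event handle to the second call and wait on it before measuring.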
API details
Dynamic optimization state is controlled by a single new API:
    HRESULT ID3D12Device6::SetBackgroundProcessingMode(
        D3D12_BACKGROUND_PROCESSING_MODE Mode,
        D3D12_MEASUREMENTS_ACTION MeasurementsAction,
        HANDLE hEventToSignalUponCompletion,
        _Out_opt_ BOOL* FurtherMeasurementsDesired);

    enum D3D12_BACKGROUND_PROCESSING_MODE
    {
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
        D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_BACKGROUND_WORK,
        D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,
    };

    enum D3D12_MEASUREMENTS_ACTION
    {
        D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
        D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,
        D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,
        D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS,
    };
The BACKGROUND_PROCESSING_MODE setting controls what level of dynamic optimization will apply to GPU work that is submitted in the future:
- ALLOWED is the default setting. The driver may instrument workloads and dynamically recompile shaders in a low overhead, non-intrusive manner which avoids glitching the foreground workload.
- ALLOW_INTRUSIVE_MEASUREMENTS indicates that the driver may instrument as aggressively as possible. Causing glitches is fine while in this mode, because the current work is being submitted specifically to train the system.
- DISABLE_BACKGROUND_WORK means stop it! No background shader recompiles that chew up CPU cycles, please.
- DISABLE_PROFILING_BY_SYSTEM means no, seriously, stop it for real! I’m doing an A/B performance comparison, and need the driver not to change ANYTHING that could mess up my results.
MEASUREMENTS_ACTION, on the other hand, indicates what should be done with the results of earlier workload instrumentation:
- KEEP_ALL – nothing to see here, just carry on as you are.
- COMMIT_RESULTS indicates that whatever the driver has measured so far is all the data it is ever going to see, so it should stop waiting for more and go ahead compiling optimized shaders. hEventToSignalUponCompletion will be signaled when all resulting compilations have finished.
- COMMIT_RESULTS_HIGH_PRIORITY is like COMMIT_RESULTS, but also indicates the app does not care about glitches, so the runtime should ignore the usual idle priority rules and go ahead using as many threads as possible to get shader recompiles done fast.
- DISCARD_PREVIOUS requests to reset the optimization state, hinting that whatever has previously been measured no longer applies.
Note that the DISABLE_BACKGROUND_WORK, DISABLE_PROFILING_BY_SYSTEM, and COMMIT_RESULTS_HIGH_PRIORITY options are only available in developer mode.
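Because three of the options are restricted to developer mode, a profiling tool might want to check a requested combination before issuing the call. The sketch below encodes only the rule stated in the note above. The enums are stand-ins that reuse this post's names so the snippet builds without the Windows SDK, and `RequiresDeveloperMode` and `AssertAllowed` are hypothetical helpers, not D3D12 APIs. How the application learns whether developer mode is enabled is left out and is an assumption of the caller.

```cpp
#include <cassert>

// Stand-in declarations (assumption): same names and order as the enums in
// this post. Remove them when building against the real Windows SDK.
enum D3D12_BACKGROUND_PROCESSING_MODE {
    D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,
    D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_BACKGROUND_WORK,
    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,
};
enum D3D12_MEASUREMENTS_ACTION {
    D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,
    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,
    D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS,
};

// Hypothetical helper: true when either argument is one of the three
// options the post lists as available only in developer mode.
constexpr bool RequiresDeveloperMode(D3D12_BACKGROUND_PROCESSING_MODE mode,
                                     D3D12_MEASUREMENTS_ACTION action)
{
    return mode == D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_BACKGROUND_WORK
        || mode == D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM
        || action == D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY;
}

// Hypothetical debug-time guard an application could call just before
// SetBackgroundProcessingMode. developerModeEnabled is supplied by the
// caller; detecting it is outside the scope of this sketch.
inline void AssertAllowed(D3D12_BACKGROUND_PROCESSING_MODE mode,
                          D3D12_MEASUREMENTS_ACTION action,
                          bool developerModeEnabled)
{
    assert(developerModeEnabled || !RequiresDeveloperMode(mode, action));
}
```

The default pairing (ALLOWED with KEEP_ALL) and the priming pairing (ALLOW_INTRUSIVE_MEASUREMENTS with COMMIT_RESULTS) both pass this check, so a shipping title that sticks to them never depends on developer mode.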
What about PIX?
PIX will automatically use SetBackgroundProcessingMode, first to prime the system and then to prevent any further changes from taking place in the middle of its analysis. It will wait on an event to make sure all background shader recompiles have finished before it starts taking measurements.
Since this will be handled automatically by PIX, the detail is only relevant if you’re building a similar tool of your own:
    // handle is an event object created beforehand, e.g. with CreateEvent
    BOOL wantMoreProfiling = TRUE;

    for (int pass = 0; wantMoreProfiling && pass < MaxPassesInCaseDriverDoesntConverge; ++pass)
    {
        // discard stale measurements on the first pass only, then keep accumulating
        SetBackgroundProcessingMode(
            D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,
            (pass == 0) ? D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS
                        : D3D12_MEASUREMENTS_ACTION_KEEP_ALL,
            nullptr, nullptr);

        // play back the frame that is being analyzed

        SetBackgroundProcessingMode(
            D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,
            D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,
            handle, &wantMoreProfiling);

        // block until all resulting shader recompiles have finished
        WaitForSingleObject(handle, INFINITE);
    }

    // play back the frame 1+ more times while collecting timing data,
    // recording GPU counters, doing A/B perf comparisons, etc.
Is this feature already available in a regular Windows SDK? It appears 10.0.17763.0 only goes up to ID3D12Device5 (and SetBackgroundProcessingMode requires ID3D12Device6), so I suspect this is still limited to Insider-only builds?
Thanks! Looking forward to giving this a try.
This shipped in the 19H1 update to Win10. 17763 is the older RS5 SDK.