{"id":2151,"date":"2019-04-16T08:23:05","date_gmt":"2019-04-16T15:23:05","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/directx\/?p=2151"},"modified":"2019-04-16T08:29:18","modified_gmt":"2019-04-16T15:29:18","slug":"background-shader-optimizations","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/background-shader-optimizations\/","title":{"rendered":"New in D3D12 \u2013 background shader optimizations"},"content":{"rendered":"<h2>tl;dr;<\/h2>\n<p>In the next update to Windows, codenamed 19H1, D3D12 will allow drivers to use idle priority background CPU threads to dynamically recompile shader programs. This can improve GPU performance by specializing shader code to better match details of the hardware it is running on and\/or the context in which it is being used. Developers don\u2019t have to do anything to benefit from this feature \u2013 as drivers start to use it, existing shaders will automatically be tuned more efficiently. But developers who are profiling their code may wish to use the new SetBackgroundProcessingMode API to control how and when these optimizations take place.<\/p>\n<h2>How shader compilation is changing<\/h2>\n<p>Creating a D3D12 pipeline state object is a synchronous operation. The API call does not return until all shaders have been fully compiled into ready-to-execute GPU instructions. This approach is simple, provides deterministic performance, and gives sophisticated applications control over things like compiling shaders ahead of time or compiling several in parallel on different threads, but in other ways it is quite limiting.<\/p>\n<p>Most D3D11 drivers, on the other hand, implement shader creation by automatically offloading compilation to a worker thread. This is transparent to the caller, and works well as long as the compilation has finished by the time the shader is needed. 
A sophisticated driver might do things like compiling the shader once quickly with minimal optimization so as to be ready for use as soon as possible, and then again using a lower priority thread with more aggressive (and hence time-consuming) optimizations. Or the implementation might monitor how a shader is used, and over time recompile different versions of it, each one specialized to boost performance in a different situation. This kind of technique can improve GPU performance, but the lack of developer control isn\u2019t ideal. It can be hard to schedule GPU work appropriately when you don\u2019t know for sure when each shader is ready to use, and profiling gets tricky when drivers can swap the shader out from under you at any time! If you measure 10 times and get 10 different results, how can you be sure whether the change you are trying to measure was an improvement or not?<\/p>\n<p>In the 19H1 update to Windows, D3D12 is adding support for background shader recompilation. Pipeline state creation remains synchronous, so (unlike with D3D11) you always know for sure exactly when a shader is ready to start rendering. But now, after the initial state object creation, drivers can submit background recompilation requests at any time. These run at idle thread priority so as not to interfere with the foreground application, and can be used to implement the same kinds of dynamic optimization that were possible with the D3D11 design. At the same time, we are adding an API to control this behavior during profiling, so D3D12 developers will still be able to measure just once and get one reliable result.<\/p>\n<h2>How to use it<\/h2>\n<ol>\n<li>Have a recent build of Windows 19H1 (as of this writing, available through the Windows Insider Program)<\/li>\n<li>Have a driver that implements this feature<\/li>\n<li>That\u2019s it, you\u2019re done!<\/li>\n<\/ol>\n<h2>Surely there\u2019s more to it?<\/h2>\n<p>Well ok. 
While profiling, you probably want to use SetBackgroundProcessingMode to make sure these dynamic optimizations get applied before you take timing measurements. For example:<\/p>\n<pre>SetBackgroundProcessingMode(\r\n    D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,\r\n    D3D12_MEASUREMENTS_ACTION_KEEP_ALL,\r\n    nullptr, nullptr);\r\n\r\n\/\/ prime the system by rendering some typical content, e.g. a level flythrough\r\n\r\nSetBackgroundProcessingMode(\r\n    D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,\r\n    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,\r\n    nullptr, nullptr);\r\n\r\n\/\/ continue rendering, now with dynamic optimizations applied, and take your measurements<\/pre>\n<h2>API details<\/h2>\n<p>Dynamic optimization state is controlled by a single new API:<\/p>\n<pre>HRESULT ID3D12Device6::SetBackgroundProcessingMode(D3D12_BACKGROUND_PROCESSING_MODE Mode,\r\n                                                   D3D12_MEASUREMENTS_ACTION MeasurementsAction,\r\n                                                   HANDLE hEventToSignalUponCompletion,\r\n                                                   _Out_opt_ BOOL* FurtherMeasurementsDesired);\r\n\r\nenum D3D12_BACKGROUND_PROCESSING_MODE\r\n{\r\n    D3D12_BACKGROUND_PROCESSING_MODE_ALLOWED,\r\n    D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,\r\n    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_BACKGROUND_WORK,\r\n    D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,\r\n};\r\n\r\nenum D3D12_MEASUREMENTS_ACTION\r\n{\r\n    D3D12_MEASUREMENTS_ACTION_KEEP_ALL,\r\n    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS,\r\n    D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,\r\n    D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS,\r\n};<\/pre>\n<p>The BACKGROUND_PROCESSING_MODE setting controls what level of dynamic optimization will apply to GPU work that is submitted in the future:<\/p>\n<ul>\n<li><strong>ALLOWED<\/strong> is the default setting. 
The driver may instrument workloads and dynamically recompile shaders in a low overhead, non-intrusive manner which avoids glitching the foreground workload.<\/li>\n<li><strong>ALLOW_INTRUSIVE_MEASUREMENTS<\/strong> indicates that the driver may instrument as aggressively as possible. Causing glitches is fine while in this mode, because the current work is being submitted specifically to train the system.<\/li>\n<li><strong>DISABLE_BACKGROUND_WORK<\/strong> means stop it! No background shader recompiles that chew up CPU cycles, please.<\/li>\n<li><strong>DISABLE_PROFILING_BY_SYSTEM<\/strong> means no, seriously, stop it for real! I\u2019m doing an A\/B performance comparison, and need the driver not to change ANYTHING that could mess up my results.<\/li>\n<\/ul>\n<p>MEASUREMENTS_ACTION, on the other hand, indicates what should be done with the results of earlier workload instrumentation:<\/p>\n<ul>\n<li><strong>KEEP_ALL<\/strong> &#8211; nothing to see here, just carry on as you are.<\/li>\n<li><strong>COMMIT_RESULTS<\/strong> indicates that whatever the driver has measured so far is all the data it is ever going to see, so it should stop waiting for more and go ahead compiling optimized shaders. 
hEventToSignalUponCompletion will be signaled when all resulting compilations have finished.<\/li>\n<li><strong>COMMIT_RESULTS_HIGH_PRIORITY<\/strong> is like COMMIT_RESULTS, but also indicates the app does not care about glitches, so the runtime should ignore the usual idle priority rules and go ahead using as many threads as possible to get shader recompiles done fast.<\/li>\n<li><strong>DISCARD_PREVIOUS<\/strong> requests a reset of the optimization state, hinting that whatever has previously been measured no longer applies.<\/li>\n<\/ul>\n<p>Note that the DISABLE_BACKGROUND_WORK, DISABLE_PROFILING_BY_SYSTEM, and COMMIT_RESULTS_HIGH_PRIORITY options are only available in developer mode.<\/p>\n<h2>What about PIX?<\/h2>\n<p>PIX will automatically use SetBackgroundProcessingMode, first to prime the system and then to prevent any further changes from taking place in the middle of its analysis. It will wait on an event to make sure all background shader recompiles have finished before it starts taking measurements.<\/p>\n<p>Since this will be handled automatically by PIX, the detail is only relevant if you\u2019re building a similar tool of your own:<\/p>\n<pre>BOOL wantMoreProfiling = true;\r\nint tries = 0;\r\n\r\nwhile (wantMoreProfiling &amp;&amp; ++tries &lt; MaxPassesInCaseDriverDoesntConverge)\r\n{\r\n    SetBackgroundProcessingMode(\r\n        D3D12_BACKGROUND_PROCESSING_MODE_ALLOW_INTRUSIVE_MEASUREMENTS,\r\n        (tries == 1) ? 
D3D12_MEASUREMENTS_ACTION_DISCARD_PREVIOUS : D3D12_MEASUREMENTS_ACTION_KEEP_ALL,\r\n        nullptr, nullptr);\r\n\r\n    \/\/ play back the frame that is being analyzed\r\n\r\n    SetBackgroundProcessingMode(\r\n        D3D12_BACKGROUND_PROCESSING_MODE_DISABLE_PROFILING_BY_SYSTEM,\r\n        D3D12_MEASUREMENTS_ACTION_COMMIT_RESULTS_HIGH_PRIORITY,\r\n        handle,\r\n        &amp;wantMoreProfiling);\r\n\r\n    WaitForSingleObject(handle, INFINITE);\r\n}\r\n\r\n\/\/ play back the frame 1+ more times while collecting timing data,\r\n\/\/ recording GPU counters, doing A\/B perf comparisons, etc.<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>tl;dr; In the next update to Windows, codenamed 19H1, D3D12 will allow drivers to use idle priority background CPU threads to dynamically recompile shader programs. This can improve GPU performance by specializing shader code to better match details of the hardware it is running on and\/or the context in which it is being used. Developers [&hellip;]<\/p>\n","protected":false},"author":1719,"featured_media":12651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2151","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>tl;dr; In the next update to Windows, codenamed 19H1, D3D12 will allow drivers to use idle priority background CPU threads to dynamically recompile shader programs. This can improve GPU performance by specializing shader code to better match details of the hardware it is running on and\/or the context in which it is being used. 
Developers [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2151","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/1719"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=2151"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2151\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/12651"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=2151"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=2151"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=2151"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}