{"id":12887,"date":"2026-02-26T09:55:15","date_gmt":"2026-02-26T17:55:15","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/directx\/?p=12887"},"modified":"2026-03-04T14:15:28","modified_gmt":"2026-03-04T22:15:28","slug":"shader-execution-reordering","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/shader-execution-reordering\/","title":{"rendered":"D3D12 Shader Execution Reordering"},"content":{"rendered":"<hr \/>\n<p>Now officially released, Shader Execution Reordering (SER) is an addition to DirectX Raytracing that enables application shader code to inform hardware how to find coherency across rays so they can be sorted to execute better in parallel. SER support is a required feature in Shader Model 6.9, meaning all drivers must accept shader code using SER. It&#8217;s up to individual devices to take advantage of it where possible.<\/p>\n<p>DXR 1.2, including SER, was announced at GDC 2025; you can see it discussed in the <a href=\"https:\/\/youtu.be\/CR-5FhfF5kQ?t=978\" target=\"_blank\" rel=\"noopener\">GDC DirectX State Of The Union YouTube Recording<\/a>. 
In the video, Remedy showed raytracing cost reduced by 1\/3 using a synergistic combination of Opacity Micromaps (OMMs) and Shader Execution Reordering in Alan Wake 2.<\/p>\n<p>The rest of this blog summarizes the feature, explains how to get the bits, and highlights some sample code to help you get started.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/directx\/shader-model-6-9-retail-and-more\/\" target=\"_blank\" rel=\"noopener\">Parent blog for all other features in this release.<\/a><\/p>\n<hr \/>\n<h3>Overview<\/h3>\n<p>Because of the stochastic nature of many raytracing workloads, DXR applications often suffer from divergent shader execution and divergent data access. Tackling the problem with application-side logic has many downsides, both in terms of achievable performance and developer effort. The existing DXR API allows implementations to dynamically schedule shading work triggered by TraceRay and CallShader, but does not offer any way for the application to control that scheduling. Shader Execution Reordering (SER) fills this gap by introducing HLSL primitives that enable application-controlled reordering of work across the GPU for improved execution and data coherence.<\/p>\n<p>Furthermore, the current TraceRay pipeline of traversal and ClosestHit\/Miss shading is not always flexible enough. First, common code, such as vertex fetch and interpolation, must be duplicated in all ClosestHit shaders. Second, simple visibility rays must unnecessarily execute hit shaders in order to access basic information about the hit. 
To address these problems, the concept of a HitObject decouples raytracing traversal (including AnyHit shading and Intersection shading) from ClosestHit and Miss shading. This enables arbitrary RayGeneration code to execute between traversal, execution reordering, and ClosestHit\/Miss handling, and allows ClosestHit\/Miss dispatch starting from hit information from sources other than traversal, such as RayQuery.<\/p>\n<p>The combination of HitObject and SER is particularly powerful and enables reordering for execution and data coherence using information in the HitObject and additional hints supplied by the user. The result is further improved coherence potential for hit\/miss processing.<\/p>\n<hr \/>\n<h3>Specification (Docs)<\/h3>\n<p>For full documentation see the <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#shader-execution-reordering\" target=\"_blank\" rel=\"noopener\">Shader Execution Reordering<\/a> section of the DXR spec.<\/p>\n<p>The DXR spec also has a <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#d3d12_raytracing_tier\" target=\"_blank\" rel=\"noopener\">section describing D3D12_RAYTRACING_TIER_1_2<\/a> including how SER fits in.<\/p>\n<hr \/>\n<h3>Availability<\/h3>\n<p>SER is a required part of Shader Model 6.9. 
This requires:<\/p>\n<ul>\n<li>AgilitySDK 1.619 available <a href=\"https:\/\/devblogs.microsoft.com\/directx\/directx12agility\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/li>\n<li>DXC with Shader Model 6.9 support available <a href=\"https:\/\/github.com\/microsoft\/DirectXShaderCompiler\/releases\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/li>\n<\/ul>\n<p><strong>Device support:<\/strong><\/p>\n<p>For device and driver support see: <a href=\"https:\/\/devblogs.microsoft.com\/directx\/shader-model-6-9-retail-and-more\/\">https:\/\/devblogs.microsoft.com\/directx\/shader-model-6-9-retail-and-more\/<\/a><\/p>\n<p>Make sure raytracing is supported by checking the raytracing tier (not shown here). A <code>D3D12_RAYTRACING_TIER_1_2<\/code> tier can also be queried, but it indicates that all the features in that tier are supported: both SER and Opacity Micromaps. If only SER is needed, just check for Shader Model 6.9 and <code>D3D12_RAYTRACING_TIER_1_0\/1_1<\/code> as needed.<\/p>\n<p>Also see <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#ser-device-support\">https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#ser-device-support<\/a>, which discusses a device query that reports whether the device actually attempts the thread sorting requested through SER, rather than treating the request as a no-op. This can be useful during development and when testing particular devices, or if an app wants to do its own manual sorting when SER isn&#8217;t going to actually sort.<\/p>\n<hr \/>\n<h3>PIX<\/h3>\n<p>As usual, SER comes with Day One PIX support. 
Please read the <a href=\"https:\/\/devblogs.microsoft.com\/pix\/pix-2602-25\" target=\"_blank\" rel=\"noopener\">PIX blog post<\/a> for more information.<\/p>\n<hr \/>\n<h3>NVIDIA Sample<\/h3>\n<p><a href=\"https:\/\/github.com\/NVIDIA-RTX\/RTXPT\" target=\"_blank\" rel=\"noopener\">RTX Path Tracing<\/a> is a code sample that strives to embody years of raytracing and neural graphics research and experience. It is intended as a starting point for a path tracer integration, as a reference for various integrated SDKs, and\/or for learning and experimentation. It now has a DXR path with SER. In the codebase, look for relevant code guarded with &#8220;USE_DX_HIT_OBJECT_EXTENSION&#8221;.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing.png\"><img decoding=\"async\" class=\"alignnone size-full wp-image-11837\" src=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing.png\" alt=\"RTXPathTracing image\" width=\"1280\" height=\"720\" srcset=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing.png 1280w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing-300x169.png 300w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing-1024x576.png 1024w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/RTXPathTracing-768x432.png 768w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><\/a><\/p>\n<hr \/>\n<h3>Simple Microsoft SER Sample<\/h3>\n<p>The <strong>D3D12RaytracingHelloShaderExecutionReordering<\/strong> sample modifies the original <strong>D3D12RaytracingHelloWorld<\/strong> sample to minimally demonstrate various uses of Shader Execution 
Reordering and to show the performance gains described below.<\/p>\n<p>[<strong>Edit, added this note based on some of the understandable press\/public response to the juicy numbers here<\/strong>: Keep in mind that perf numbers in a synthetic test like this shouldn&#8217;t be expected to translate to the same performance in games, where so many other factors are at play, such as how much of a game&#8217;s rendering workload happens to be a direct candidate for SER to help, and how much SER ends up being able to actually help with those workloads. It&#8217;s reasonable to expect efficiency gains as developers adopt the feature over time and device implementations improve, but expect less than the best-case numbers here.]<\/p>\n<p><strong>D3D12RaytracingHelloShaderExecutionReordering<\/strong> can be found in the DirectX-Graphics-Samples repo on GitHub <a href=\"https:\/\/github.com\/microsoft\/DirectX-Graphics-Samples\/tree\/master\/Samples\/Desktop\/D3D12Raytracing\/src\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>This sample simply draws a fullscreen quad with triangle barycentrics used as the pixel color. 
Each ray does some artificial work when shading, and some proportion of rays do a heavier artificial workload, rendered white (vertical stripes).\nThe Ray Generation Shader uses SER to tell the system which threads will be more expensive so it can try to sort similar threads together.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot.jpg\"><img decoding=\"async\" class=\"alignnone size-full wp-image-11986\" src=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot.jpg\" alt=\"HelloSERScreenshot image\" width=\"1598\" height=\"936\" srcset=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot.jpg 1598w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot-300x176.jpg 300w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot-1024x600.jpg 1024w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot-768x450.jpg 768w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/9999\/05\/HelloSERScreenshot-1536x900.jpg 1536w\" sizes=\"(max-width: 1598px) 100vw, 1598px\" \/><\/a><\/p>\n<p>The shader file, <code>Raytracing.hlsl<\/code>, contains some configuration options that can be tweaked before running the app, since the shader is compiled at launch. The options allow\ncomparing the performance of different ways of using SER, as well as not using SER at all. 
In fact, the mechanics of SER can be understood simply by playing with this shader file and running the app, ignoring the rest of the boilerplate C++ code in the sample.<\/p>\n<p>Using SER with the settings below running on an NVIDIA RTX 4090 showed a <strong>40%<\/strong> framerate increase versus not using SER, and a couple of configurations of Intel Arc B-Series GPUs each showed a <strong>90%<\/strong> framerate increase.<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\r\n\/\/*********************************************************\r\n\/\/ Configuration options\r\n\/\/*********************************************************\r\n\r\n\/\/ TraceRay the old fashioned way\r\n\/\/#define USE_ORIGINAL_TRACERAY_NO_SER\r\n\r\n\/\/ Call MaybeReorderThread(sortKey,1), sortKey is 1 bit \r\n\/\/ indicating if the thread has dummy work\r\n#define REQUEST_REORDER\r\n\r\n\/\/ Don't invoke ClosestHit or Miss shaders, use hitObject \r\n\/\/ properties in RayGen to shade\r\n\/\/#define SKIP_INVOKE_INSTEAD_SHADE_IN_RAYGEN\r\n\r\n\/\/ Rays do a loop of artificial work in the \r\n\/\/ Closest Hit shader.  This setting makes \r\n\/\/ some rays loop more than others (a sort candidate):\r\n#define USE_VARYING_ARTIFICIAL_WORK\r\n\r\n\/\/ Number of iterations in the heavy artificial work loop\r\n#define WORK_LOOP_ITERATIONS_HEAVY 5000\r\n\r\n\/\/ Number of iterations in the light artificial work loop\r\n#define WORK_LOOP_ITERATIONS_LIGHT 1000\r\n\r\n\/\/ N, where 1\/N is the proportion of rays that do the \r\n\/\/ heavy artificial work load\r\n#define RAYS_WITH_HEAVY_WORK_FRACTION 4\r\n\r\n\/\/ Put all the rays with dummy work on the left side\r\n\/\/ #define SPATIALLY_SORTED\r\n\r\n\/\/*********************************************************\r\n<\/code><\/pre>\n<p>Below is the sample&#8217;s Ray Generation Shader\nillustrating various basic uses of SER via the above options. 
Notice\nthat when SER is used, <code>TraceRay<\/code> returns a <code>HitObject<\/code>.<\/p>\n<p>Depending on the config, the shader can call <code>MaybeReorderThread()<\/code>, in this\ncase taking a shader-defined sort key, though there&#8217;s another variant not shown that takes\nthe hit object and sorts on its properties.<\/p>\n<p>Finally, depending on the config, the shader can call <code>HitObject::Invoke()<\/code> to\nrun the Closest Hit or Miss Shader on the hit, or not bother calling <code>Invoke()<\/code> at\nall and do shading locally based on hit object properties. In this case shading is based\non hit attributes (barycentrics) returned via <code>hit.GetAttributes()<\/code>.<\/p>\n<pre class=\"prettyprint language-cpp\"><code class=\"language-cpp\">\r\nusing namespace dx; \/\/ dx::HitObject and dx::MaybeReorderThread\r\n[shader(\"raygeneration\")]\r\nvoid MyRaygenShader()\r\n{\r\n    RayDesc ray = \r\n        SetupRay(DispatchRaysIndex(), DispatchRaysDimensions()); \r\n\r\n    uint iterations = WORK_LOOP_ITERATIONS_LIGHT;\r\n\r\n    #ifdef USE_VARYING_ARTIFICIAL_WORK\r\n\r\n        #ifdef SPATIALLY_SORTED\r\n            \/\/ Extra work is all on left side of screen\r\n            if((ray.Origin.x + 1)\/2.f &lt;= 1.f\/RAYS_WITH_HEAVY_WORK_FRACTION)\r\n            {\r\n                iterations = WORK_LOOP_ITERATIONS_HEAVY; \r\n            }\r\n        #else\r\n            \/\/ Extra work distributed in vertical bands\r\n            if( (DispatchRaysIndex().x) % RAYS_WITH_HEAVY_WORK_FRACTION == 0 )\r\n            {\r\n                iterations = WORK_LOOP_ITERATIONS_HEAVY; \r\n            }\r\n        #endif\r\n\r\n    #endif\r\n\r\n    RayPayload payload = { float4(0, 0, 0, 0), iterations };\r\n    float4 color = float4(1,1,1,1);\r\n\r\n    #ifdef USE_ORIGINAL_TRACERAY_NO_SER\r\n        TraceRay(Scene, RAY_FLAG_NONE, ~0, 0, 1, 0, ray, payload);\r\n        color = payload.color;\r\n    #else\r\n\r\n        HitObject hit = \r\n            
HitObject::TraceRay(Scene, RAY_FLAG_NONE, ~0, 0, 1, 0, \r\n                                ray, payload);\r\n\r\n        #ifdef REQUEST_REORDER\r\n            int sortKey = iterations != WORK_LOOP_ITERATIONS_LIGHT ? 1:0;\r\n            dx::MaybeReorderThread(sortKey, 1);\r\n\r\n            \/\/ There's currently a DXC bug that causes \"using namespace dx;\" \r\n            \/\/ (at the top) to generate bad DXIL for MaybeReorderThread, \r\n            \/\/ so it's explicitly scoped here. The namespace works fine for \r\n            \/\/ HitObject\r\n        #endif\r\n\r\n        #ifdef SKIP_INVOKE_INSTEAD_SHADE_IN_RAYGEN\r\n            if(hit.IsHit())\r\n            {\r\n                MyAttributes attr = hit.GetAttributes&lt;MyAttributes&gt;();\r\n                color = ClosestHitWorker(attr,iterations);\r\n            }\r\n            else\r\n            {\r\n                color = MissWorker();\r\n            }\r\n\r\n        #else\r\n            HitObject::Invoke(hit, payload);\r\n            color = payload.color;\r\n        #endif\r\n\r\n    #endif\r\n\r\n    \/\/ Write the raytraced color to the output texture.\r\n    RenderTarget[DispatchRaysIndex().xy] = color;\r\n}\r\n<\/code><\/pre>\n<hr \/>\n","protected":false},"excerpt":{"rendered":"<p>Now officially released, Shader Execution Reordering (SER) is an addition to DirectX Raytracing that enables application shader code to inform hardware how to find coherency across rays so they can be sorted to execute better in parallel.\u00a0 SER support is a required feature in Shader Model 6.9, meaning all drivers must accept shader code using SER.\u00a0 
[&hellip;]<\/p>\n","protected":false},"author":8584,"featured_media":12651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-12887","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>Now officially released, Shader Execution Reordering (SER) is an addition to DirectX Raytracing that enables application shader code to inform hardware how to find coherency across rays so they can be sorted to execute better in parallel.\u00a0 SER support is a required feature in Shader Model 6.9, meaning all drivers must accept shader code using SER.\u00a0 [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/12887","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/8584"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=12887"}],"version-history":[{"count":2,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/12887\/revisions"}],"predecessor-version":[{"id":13102,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/12887\/revisions\/13102"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/12651"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=12887"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=12887"},{"taxonomy":"post_tag","em
beddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=12887"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}