{"id":2523,"date":"2019-11-06T00:30:51","date_gmt":"2019-11-06T08:30:51","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/directx\/?p=2523"},"modified":"2020-04-16T12:24:25","modified_gmt":"2020-04-16T19:24:25","slug":"dxr-1-1","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/dxr-1-1\/","title":{"rendered":"DirectX Raytracing (DXR) Tier 1.1"},"content":{"rendered":"<hr \/>\n<p>Real-time raytracing is still in its very early days, so unsurprisingly there is plenty of room for the industry to move forward.\u00a0 Since the launch of DXR, the initial wave of feedback has resulted in a set of new features collectively named Tier 1.1.<\/p>\n<p>An earlier <a href=\"https:\/\/devblogs.microsoft.com\/directx\/dev-preview-of-new-directx-12-features\/\">blog post<\/a> concisely summarizes these raytracing features along with other DirectX features coming at the same time.<\/p>\n<p>This post discusses each new raytracing feature individually.\u00a0 The <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md\">DXR<\/a> spec has the full definitions, starting with its <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#D3D12_RAYTRACING_TIER\">Tier 1.1 summary<\/a>.<\/p>\n<hr \/>\n<h4>Topics<\/h4>\n<p><a href=\"#inline-raytracing\">Inline raytracing<\/a>\n<a href=\"#executeindirect\">DispatchRays() calls via ExecuteIndirect()<\/a>\n<a href=\"#addtostateobject\">Growing state objects via\u00a0AddToStateObject()<\/a>\n<a href=\"#additionalvertexformats\">Additional vertex formats for acceleration structure build<\/a>\n<a href=\"#geometryindex\">GeometryIndex()\u00a0in raytracing shaders<\/a>\n<a href=\"#flags\">Raytracing flags\/configuration tweaks<\/a>\n<a href=\"#support\">Support<\/a><\/p>\n<hr \/>\n<h3><a id=\"inline-raytracing\"><\/a>Inline raytracing<\/h3>\n<p>(<a 
href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#inline-raytracing\">link to spec<\/a>)<\/p>\n<p>Inline raytracing is an alternative form of raytracing that doesn&#8217;t use any separate dynamic shaders or shader tables.\u00a0 It is available in any shader stage, including compute shaders, pixel shaders etc. Both the dynamic-shading and inline forms of raytracing use the same opaque acceleration structures.<\/p>\n<p>Inline raytracing in shaders starts with instantiating a <code>RayQuery<\/code> object as a local variable, acting as a state machine for ray query with a relatively large state footprint.\u00a0 The shader interacts with the <code>RayQuery<\/code> object\u2019s methods to advance the query through an acceleration structure and query traversal information.<\/p>\n<p>The API hides access to the acceleration structure (e.g. data structure traversal, box, triangle intersection), leaving it to the hardware\/driver.\u00a0 All necessary app code surrounding these fixed-function acceleration structure accesses, for handling both enumerated candidate hits and the result of a query (e.g. 
hit vs miss), can be self-contained in the shader driving the <code>RayQuery<\/code>.<\/p>\n<p>The <code>RayQuery<\/code> object is instantiated with optional ray flags as a template parameter.\u00a0 For example in a simple shadow scenario, the shader may declare it only wants to visit opaque triangles and to stop traversing at the first hit.\u00a0 Here, the <code>RayQuery<\/code> would be declared as:<\/p>\n<pre class=\"\"><code>\r\n    RayQuery&lt;RAY_FLAG_CULL_NON_OPAQUE |\r\n             RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |\r\n             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH&gt; myQuery;\r\n<\/code><\/pre>\n<p>This sets up shared expectations:\u00a0It enables both the shader author and driver compiler to produce only necessary code and state.<\/p>\n<h4>Example<\/h4>\n<p>The spec contains some <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#tracerayinline-control-flow\">illustrative state diagrams<\/a> and <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#tracerayinline-examples\">pseudo-code examples<\/a>. The simplest of these examples is shown here:<\/p>\n<div>\n<div>\n<pre class=\"\"><code>\r\nRaytracingAccelerationStructure myAccelerationStructure : register(t3);\r\n\r\nfloat4 MyPixelShader(float2 uv : TEXCOORD) : SV_Target0\r\n{\r\n    ...\r\n    \/\/ Instantiate ray query object.\r\n    \/\/ Template parameter allows driver to generate a specialized\r\n    \/\/ implementation.\r\n    RayQuery&lt;RAY_FLAG_CULL_NON_OPAQUE |\r\n             RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |\r\n             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH&gt; q;\r\n\r\n    \/\/ Set up a trace.  
No work is done yet.\r\n    q.TraceRayInline(\r\n        myAccelerationStructure,\r\n        myRayFlags, \/\/ OR'd with flags above\r\n        myInstanceMask,\r\n        myRay);\r\n\r\n    \/\/ Proceed() below is where behind-the-scenes traversal happens,\r\n    \/\/ including the heaviest of any driver-inlined code.\r\n    \/\/ In this simplest of scenarios, Proceed() only needs\r\n    \/\/ to be called once rather than in a loop:\r\n    \/\/ Based on the template specialization above,\r\n    \/\/ traversal completion is guaranteed.\r\n    q.Proceed();\r\n\r\n    \/\/ Examine and act on the result of the traversal.\r\n    \/\/ Was a hit committed?\r\n    if(q.CommittedStatus() == COMMITTED_TRIANGLE_HIT)\r\n    {\r\n        ShadeMyTriangleHit(\r\n            q.CommittedInstanceIndex(),\r\n            q.CommittedPrimitiveIndex(),\r\n            q.CommittedGeometryIndex(),\r\n            q.CommittedRayT(),\r\n            q.CommittedTriangleBarycentrics(),\r\n            q.CommittedTriangleFrontFace() );\r\n    }\r\n    else \/\/ COMMITTED_NOTHING\r\n         \/\/ From template specialization,\r\n         \/\/ COMMITTED_PROCEDURAL_PRIMITIVE can't happen.\r\n    {\r\n        \/\/ Do miss shading\r\n        MyMissColorCalculation(\r\n            q.WorldRayOrigin(),\r\n            q.WorldRayDirection());\r\n    }\r\n    ...\r\n}\r\n<\/code><\/pre>\n<\/div>\n<h4>Motivation<\/h4>\n<\/div>\n<p>Inline raytracing gives developers the option to drive more of the raytracing process, as opposed to handing work scheduling entirely to the system.\u00a0 This could be useful for many reasons:<\/p>\n<ul>\n<li>Perhaps the developer knows their scenario is simple enough that the overhead of dynamic shader scheduling is not worthwhile. 
For example, a well-constrained way of calculating shadows.<\/li>\n<\/ul>\n<ul>\n<li>It could be convenient\/efficient to query an acceleration structure from a shader that doesn\u2019t support dynamic-shader-based rays, such as a compute shader.<\/li>\n<\/ul>\n<ul>\n<li>It might be helpful to combine dynamic-shader-based raytracing with the inline form. Some raytracing shader stages, like intersection shaders and any hit shaders, don\u2019t even support tracing rays via dynamic-shader-based raytracing.\u00a0 But the inline form is available everywhere.<\/li>\n<\/ul>\n<ul>\n<li>Another combination is to switch to the inline form for simple recursive rays.\u00a0 This enables the app to declare there is no recursion for the underlying raytracing pipeline, given inline raytracing is handling recursive rays.\u00a0 The simpler dynamic scheduling burden on the system might yield better efficiency.\u00a0 This trades off against the large state footprint in shaders that use inline raytracing.<\/li>\n<\/ul>\n<p>The basic assumption is that scenarios with many complex shaders will run better with dynamic-shader-based raytracing, as opposed to using massive inline raytracing uber-shaders. 
Conversely, scenarios with very minimal shading complexity and\/or very few shaders might run better with inline raytracing.<\/p>\n<p>Where to draw the line between the two isn\u2019t obvious in the face of varying implementations.\u00a0 Furthermore, this basic framing of extremes doesn\u2019t capture all factors that may be important, such as the impact of ray coherence.\u00a0 Developers need to test real content to find the right balance among tools, of which inline raytracing is simply one.<\/p>\n<hr \/>\n<h3><a id=\"executeindirect\"><\/a>DispatchRays() calls via\u00a0ExecuteIndirect()<\/h3>\n<p>(<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#executeindirect\">link to spec<\/a>)<\/p>\n<p>This enables shaders on the GPU to generate a list of <code>DispatchRays()<\/code> calls, including their individual parameters like thread counts, shader table settings and other root parameter settings.\u00a0 The list can then execute without an intervening round-trip back to the CPU.<\/p>\n<p>This could help with adaptive raytracing scenarios like shader-based culling \/ sorting \/ classification \/ refinement.\u00a0 Basically, these are scenarios that prepare raytracing work on the GPU and then immediately spawn it.<\/p>\n<hr \/>\n<h3><a id=\"addtostateobject\"><\/a>Growing state objects via\u00a0AddToStateObject()<\/h3>\n<p>(<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#incremental-additions-to-existing-state-objects\">link to spec<\/a>)<\/p>\n<p>Suppose a raytracing pipeline has 1000 shaders.\u00a0 As a result of world streaming, upcoming rendering needs to add more shaders periodically.\u00a0 Consider the task of just adding one shader to the 1000:\u00a0 Without <code>AddToStateObject()<\/code>, a new raytracing pipeline would have to be created with 1001 shaders, including the CPU overhead of the system parsing and validating 1001 shaders even though 1000 of them had been seen 
earlier.<\/p>\n<p>That\u2019s clearly wasteful, so it\u2019s more likely the app would just not bother streaming shaders.\u00a0 Instead it would create the worst-case fully populated raytracing pipeline, with a high up-front cost.\u00a0 Certainly, precompiled <a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#collection-state-object\">collection state objects<\/a> can help avoid much of the driver overhead of reusing existing shaders.\u00a0 But the D3D12 runtime still parses the full state object being created out of building blocks, mostly to verify its correctness.<\/p>\n<p>With <code>AddToStateObject()<\/code>, a new state object can be made by adding shaders to an existing state object with CPU overhead proportional only to what is being added.<\/p>\n<p>It was deemed not worth the effort or complexity to support incremental deletion, i.e.\u00a0<code>DeleteFromStateObject()<\/code>.\u00a0 The time pressure on a running app to shrink state objects is likely lower than the pressure to grow them quickly.\u00a0 After all, rendering can go on even with too many shaders lying around.\u00a0 This also assumes it is unlikely that having too many shaders becomes a memory footprint problem.<\/p>\n<p>Regardless, if an app finds it absolutely must shrink state objects, there are options.\u00a0 For one, it can keep some previously created smaller pipelines around to start growing again.\u00a0 Or it can create the desired smaller state object from scratch, perhaps using existing collections as building blocks.<\/p>\n<hr \/>\n<h3><a id=\"additionalvertexformats\"><\/a>Additional vertex formats for acceleration structure build<\/h3>\n<p>(<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#d3d12_raytracing_geometry_triangles_desc\">link to spec<\/a>)<\/p>\n<p>Acceleration structure builds support some additional input vertex formats:<\/p>\n<p><code>DXGI_FORMAT_R16G16B16A16_UNORM<\/code>\u00a0(A16 
component is ignored, other data can be packed there, such as setting vertex stride to 6 bytes)<\/p>\n<p><code>DXGI_FORMAT_R16G16_UNORM<\/code>\u00a0(third component assumed 0)<\/p>\n<p><code>DXGI_FORMAT_R10G10B10A2_UNORM<\/code>\u00a0(A2 component is ignored, stride must be 4 bytes)<\/p>\n<p><code>DXGI_FORMAT_R8G8B8A8_UNORM<\/code>\u00a0(A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes)<\/p>\n<p><code>DXGI_FORMAT_R8G8_UNORM<\/code>\u00a0(third component assumed 0)<\/p>\n<p><code>DXGI_FORMAT_R8G8B8A8_SNORM<\/code>\u00a0(A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes)<\/p>\n<p><code>DXGI_FORMAT_R8G8_SNORM<\/code>\u00a0(third component assumed 0)<\/p>\n<hr \/>\n<h3><a id=\"geometryindex\"><\/a>GeometryIndex()\u00a0in raytracing shaders<\/h3>\n<p>(<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#geometryindex\">link to spec<\/a>)<\/p>\n<p>The <code>GeometryIndex()<\/code> intrinsic is a convenience to allow shaders to distinguish geometries within bottom level acceleration structures.<\/p>\n<p>The other way geometries can be distinguished is by varying data in shader table records for each geometry.\u00a0 With <code>GeometryIndex()<\/code> the app is no longer forced to do this.<\/p>\n<p>In particular if all geometries share the same shader and the app doesn\u2019t want to put any per-geometry information in shader records, it can choose to set the <code>MultiplierForGeometryContributionToHitGroupIndex<\/code> parameter to <code>TraceRay()<\/code> to 0.<\/p>\n<p>This means that all geometries in a bottom level acceleration structure share the same shader record.\u00a0 In other words, the geometry index no longer factors into the fixed-function shader table indexing calculation.\u00a0 Then, if needed, shaders can use <code>GeometryIndex()<\/code> to index into the app&#8217;s own data structures.<\/p>\n<hr \/>\n<h3><a 
id=\"flags\"><\/a>Raytracing flags\/configuration tweaks<\/h3>\n<p>Added ray flags,\u00a0<code>RAY_FLAG_SKIP_TRIANGLES<\/code>\u00a0and\u00a0<code>RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES<\/code>. (<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#ray-flags\">link to spec<\/a>)<\/p>\n<p>These flags, in addition to being available to individual raytracing calls, can also be globally declared via raytracing pipeline configuration.\u00a0 This behaves like OR\u2019ing the flags into every <code>TraceRay()<\/code> call in the raytracing pipeline. (<a href=\"https:\/\/github.com\/microsoft\/DirectX-Specs\/blob\/master\/d3d\/Raytracing.md#raytracing-pipeline-config1\">link to spec<\/a>)<\/p>\n<p>Implementations might make pipeline optimizations knowing that one of the primitive types can be skipped everywhere.<\/p>\n<hr \/>\n<h3><a id=\"support\"><\/a>Support<\/h3>\n<p>None of these features specifically require new hardware.\u00a0 Existing DXR Tier 1.0 capable devices can support Tier 1.1 if the GPU vendor implements driver support.<\/p>\n<p>Reach out to GPU vendors for their timelines for hardware and drivers.<\/p>\n<p>OS support begins with the latest Windows 10 Insider Preview Build and SDK Preview Build for <a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-insider\/flight-hub\/#in-development-builds-of-windows-10-20h1\">Windows 10 (20H1)<\/a>\u00a0from the\u00a0<a href=\"https:\/\/insider.windows.com\/en-us\/\">Windows Insider Program<\/a>.\u00a0\u00a0The features that involve shaders require shader model 6.5 support which can be targeted by the latest\u00a0<a href=\"https:\/\/github.com\/microsoft\/DirectXShaderCompiler\">DirectX Shader Compiler<\/a>.\u00a0 Last but not least, PIX support for DXR Tier 1.1 is in the works.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An overview of features in DXR Tier 
1.1.<\/p>\n","protected":false},"author":8584,"featured_media":12651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2523","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>An overview of features in DXR Tier 1.1.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2523","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/8584"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=2523"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2523\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/12651"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=2523"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=2523"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=2523"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}