DirectX Raytracing (DXR) Tier 1.1

Amar Patel


Real-time raytracing is still in its very early days, so unsurprisingly there is plenty of room for the industry to move forward.  Since the launch of DXR, the initial wave of feedback has resulted in a set of new features collectively named Tier 1.1.

An earlier blog post concisely summarizes these raytracing features along with other DirectX features coming at the same time.

This post discusses each new raytracing feature individually.  The DXR spec has the full definitions, starting with its Tier 1.1 summary.


Topics

  • Inline raytracing
  • DispatchRays() calls via ExecuteIndirect()
  • Growing state objects via AddToStateObject()
  • Additional vertex formats for acceleration structure build
  • GeometryIndex() in raytracing shaders
  • Raytracing flags/configuration tweaks
  • Support


Inline raytracing

(link to spec)

Inline raytracing is an alternative form of raytracing that doesn’t use any separate dynamic shaders or shader tables.  It is available in any shader stage, including compute shaders, pixel shaders etc. Both the dynamic-shading and inline forms of raytracing use the same opaque acceleration structures.

Inline raytracing in shaders starts with instantiating a RayQuery object as a local variable, acting as a state machine for ray query with a relatively large state footprint.  The shader interacts with the RayQuery object’s methods to advance the query through an acceleration structure and query traversal information.

The API hides access to the acceleration structure (e.g. data structure traversal, box, triangle intersection), leaving it to the hardware/driver.  All necessary app code surrounding these fixed-function acceleration structure accesses, for handling both enumerated candidate hits and the result of a query (e.g. hit vs miss), can be self-contained in the shader driving the RayQuery.

The RayQuery object is instantiated with optional ray flags as a template parameter.  For example in a simple shadow scenario, the shader may declare it only wants to visit opaque triangles and to stop traversing at the first hit.  Here, the RayQuery would be declared as:


    RayQuery<RAY_FLAG_CULL_NON_OPAQUE |
             RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> myQuery;

This sets up shared expectations: it enables both the shader author and the driver compiler to produce only the necessary code and state.

Example

The spec contains some illustrative state diagrams and pseudo-code examples. The simplest of these examples is shown here:


RaytracingAccelerationStructure myAccelerationStructure : register(t3);

float4 MyPixelShader(float2 uv : TEXCOORD) : SV_Target0
{
    ...
    // Instantiate ray query object.
    // Template parameter allows driver to generate a specialized
    // implementation.
    RayQuery<RAY_FLAG_CULL_NON_OPAQUE |
             RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES |
             RAY_FLAG_ACCEPT_FIRST_HIT_AND_END_SEARCH> q;

    // Set up a trace.  No work is done yet.
    q.TraceRayInline(
        myAccelerationStructure,
        myRayFlags, // OR'd with flags above
        myInstanceMask,
        myRay);

    // Proceed() below is where behind-the-scenes traversal happens,
    // including the heaviest of any driver inlined code.
    // In this simplest of scenarios, Proceed() only needs
    // to be called once rather than a loop:
    // Based on the template specialization above,
    // traversal completion is guaranteed.
    q.Proceed();

    // Examine and act on the result of the traversal.
    // Was a hit committed?
    if(q.CommittedStatus() == COMMITTED_TRIANGLE_HIT)
    {
        ShadeMyTriangleHit(
            q.CommittedInstanceIndex(),
            q.CommittedPrimitiveIndex(),
            q.CommittedGeometryIndex(),
            q.CommittedRayT(),
            q.CommittedTriangleBarycentrics(),
            q.CommittedTriangleFrontFace() );
    }
    else // COMMITTED_NOTHING
         // From template specialization,
         // COMMITTED_PROCEDURAL_PRIMITIVE can't happen.
    {
        // Do miss shading
        MyMissColorCalculation(
            q.WorldRayOrigin(),
            q.WorldRayDirection());
    }
    ...
}

Motivation

Inline raytracing gives developers the option to drive more of the raytracing process themselves, as opposed to handing work scheduling entirely to the system.  This could be useful for many reasons:

  • Perhaps the developer knows their scenario is simple enough that the overhead of dynamic shader scheduling is not worthwhile, for example a well-constrained way of calculating shadows.
  • It could be convenient or efficient to query an acceleration structure from a shader stage that doesn’t support dynamic-shader-based rays, such as a compute shader.
  • It might be helpful to combine dynamic-shader-based raytracing with the inline form. Some raytracing shader stages, like intersection shaders and any hit shaders, don’t even support tracing rays via dynamic-shader-based raytracing.  But the inline form is available everywhere.
  • Another combination is to switch to the inline form for simple recursive rays.  This enables the app to declare there is no recursion for the underlying raytracing pipeline, given inline raytracing is handling recursive rays.  The simpler dynamic scheduling burden on the system might yield better efficiency.  This trades off against the large state footprint in shaders that use inline raytracing.

The basic assumption is that scenarios with many complex shaders will run better with dynamic-shader-based raytracing than with massive inline raytracing uber-shaders, while scenarios with very minimal shading complexity and/or very few shaders might run better with inline raytracing.

Where to draw the line between the two isn’t obvious in the face of varying implementations.  Furthermore, this basic framing of extremes doesn’t capture all factors that may be important, such as the impact of ray coherence.  Developers need to test real content to find the right balance among tools, of which inline raytracing is simply one.


DispatchRays() calls via ExecuteIndirect()

(link to spec)

This enables shaders on the GPU to generate a list of DispatchRays() calls, including their individual parameters like thread counts, shader table settings and other root parameter settings.  The list can then execute without an intervening round-trip back to the CPU.

This could help with adaptive raytracing scenarios like shader-based culling / sorting / classification / refinement.  Basically, scenarios that prepare raytracing work on the GPU and then immediately spawn it.
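As a rough sketch of the host-side pieces (assuming the app already owns a GPU-written buffer of D3D12_DISPATCH_RAYS_DESC structures plus an optional count buffer; the names below are placeholders and error handling is omitted), this might look like:

#include <d3d12.h>

// Created once at startup: a command signature whose single argument is
// a DispatchRays call.
ID3D12CommandSignature* CreateDispatchRaysSignature(ID3D12Device* device)
{
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DISPATCH_RAYS;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride = sizeof(D3D12_DISPATCH_RAYS_DESC);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs = &arg;

    ID3D12CommandSignature* signature = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&signature));
    return signature;
}

// Recorded after a compute pass has written an array of
// D3D12_DISPATCH_RAYS_DESC entries (thread counts plus shader table
// ranges) into argumentBuffer, and the number of valid entries into
// countBuffer.
void RecordIndirectDispatchRays(
    ID3D12GraphicsCommandList4* commandList,
    ID3D12CommandSignature* dispatchRaysSignature,
    ID3D12Resource* argumentBuffer,
    ID3D12Resource* countBuffer,
    UINT maxDispatches)
{
    commandList->ExecuteIndirect(
        dispatchRaysSignature, maxDispatches,
        argumentBuffer, 0,
        countBuffer, 0);
}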


Growing state objects via AddToStateObject()

(link to spec)

Suppose a raytracing pipeline has 1000 shaders.  As a result of world streaming, upcoming rendering needs to add more shaders periodically.  Consider the task of just adding one shader to the 1000:  Without AddToStateObject(), a new raytracing pipeline would have to be created with 1001 shaders, including the CPU overhead of the system parsing and validating 1001 shaders even though 1000 of them had been seen earlier.

That’s clearly wasteful, so it’s more likely the app would just not bother streaming shaders.  Instead, it would create the worst-case fully populated raytracing pipeline, with a high up-front cost.  Certainly, precompiled collection state objects can help avoid much of the driver overhead of reusing existing shaders.  But the D3D12 runtime still parses the full state object being created out of building blocks, mostly to verify its correctness.

With AddToStateObject(), a new state object can be made by adding shaders to an existing state object, with CPU overhead proportional only to what is being added.
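As a minimal sketch (assuming the original pipeline was created with D3D12_STATE_OBJECT_FLAG_ALLOW_STATE_OBJECT_ADDITIONS, and that the newly streamed-in DXIL library carries the extra shaders and their hit groups; names here are placeholders and error handling is omitted), growing the pipeline might look like:

#include <d3d12.h>

// Returns a new, larger state object; the existing one remains valid and
// the two can share driver-compiled code for the shaders they have in
// common.
ID3D12StateObject* GrowPipeline(
    ID3D12Device7* device,
    ID3D12StateObject* existingStateObject,
    const D3D12_SHADER_BYTECODE& newLibrary) // freshly streamed-in DXIL
{
    D3D12_DXIL_LIBRARY_DESC libDesc = {};
    libDesc.DXILLibrary = newLibrary;

    D3D12_STATE_SUBOBJECT subobjects[2] = {};
    subobjects[0].Type = D3D12_STATE_SUBOBJECT_TYPE_DXIL_LIBRARY;
    subobjects[0].pDesc = &libDesc;

    // The addition itself also opts in to allowing further additions.
    D3D12_STATE_OBJECT_CONFIG config = {};
    config.Flags = D3D12_STATE_OBJECT_FLAG_ALLOW_STATE_OBJECT_ADDITIONS;
    subobjects[1].Type = D3D12_STATE_SUBOBJECT_TYPE_STATE_OBJECT_CONFIG;
    subobjects[1].pDesc = &config;

    D3D12_STATE_OBJECT_DESC addition = {};
    addition.Type = D3D12_STATE_OBJECT_TYPE_RAYTRACING_PIPELINE;
    addition.NumSubobjects = 2;
    addition.pSubobjects = subobjects;

    ID3D12StateObject* grownStateObject = nullptr;
    device->AddToStateObject(
        &addition, existingStateObject, IID_PPV_ARGS(&grownStateObject));
    return grownStateObject;
}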

It was deemed not worth the effort or complexity to support incremental deletion, i.e. DeleteFromStateObject().  For a running app, the time pressure to shrink state objects is likely lower than the pressure to grow them quickly.  After all, rendering can go on even with too many shaders lying around.  This also assumes it is unlikely that having too many shaders becomes a memory footprint problem.

Regardless, if an app finds it absolutely must shrink state objects, there are options.  For one, it can keep some previously created smaller pipelines around to start growing again.  Or it can create the desired smaller state object from scratch, perhaps using existing collections as building blocks.


Additional vertex formats for acceleration structure build

(link to spec)

Acceleration structure builds support some additional input vertex formats:

  • DXGI_FORMAT_R16G16B16A16_UNORM (A16 component is ignored, other data can be packed there, such as setting vertex stride to 6 bytes)
  • DXGI_FORMAT_R16G16_UNORM (third component assumed 0)
  • DXGI_FORMAT_R10G10B10A2_UNORM (A2 component is ignored, stride must be 4 bytes)
  • DXGI_FORMAT_R8G8B8A8_UNORM (A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes)
  • DXGI_FORMAT_R8G8_UNORM (third component assumed 0)
  • DXGI_FORMAT_R8G8B8A8_SNORM (A8 component is ignored, other data can be packed there, such as setting vertex stride to 3 bytes)
  • DXGI_FORMAT_R8G8_SNORM (third component assumed 0)
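For example, a bottom level geometry description using the 16-bit UNORM format with a tightly packed 6-byte stride might be filled in as sketched below (buffer addresses and counts are placeholders; since UNORM positions land in the [0,1] range, the app would typically account for the quantization elsewhere, e.g. in instance transforms):

#include <d3d12.h>

D3D12_RAYTRACING_GEOMETRY_DESC MakePackedUnormGeometry(
    D3D12_GPU_VIRTUAL_ADDRESS vertexBufferVA, UINT vertexCount,
    D3D12_GPU_VIRTUAL_ADDRESS indexBufferVA, UINT indexCount)
{
    D3D12_RAYTRACING_GEOMETRY_DESC geometry = {};
    geometry.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geometry.Flags = D3D12_RAYTRACING_GEOMETRY_FLAG_OPAQUE;

    // Positions stored as 16-bit UNORM x, y, z.  The A16 component is
    // ignored by the build, so a 6-byte stride packs vertices tightly.
    geometry.Triangles.VertexFormat = DXGI_FORMAT_R16G16B16A16_UNORM;
    geometry.Triangles.VertexBuffer.StartAddress = vertexBufferVA;
    geometry.Triangles.VertexBuffer.StrideInBytes = 6;
    geometry.Triangles.VertexCount = vertexCount;

    geometry.Triangles.IndexFormat = DXGI_FORMAT_R16_UINT;
    geometry.Triangles.IndexBuffer = indexBufferVA;
    geometry.Triangles.IndexCount = indexCount;
    return geometry;
}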


GeometryIndex() in raytracing shaders

(link to spec)

The GeometryIndex() intrinsic is a convenience to allow shaders to distinguish geometries within bottom level acceleration structures.

The other way geometries can be distinguished is by varying data in shader table records for each geometry.  With GeometryIndex() the app is no longer forced to do this.

In particular, if all geometries share the same shader and the app doesn’t want to put any per-geometry information in shader records, it can set TraceRay()’s MultiplierForGeometryContributionToHitGroupIndex parameter to 0.

This means that all geometries in a bottom level acceleration structure share the same shader record.  In other words, the geometry index no longer factors into the fixed-function shader table indexing calculation.  Then, if needed, shaders can use GeometryIndex() to index into the app’s own data structures.
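For reference, the hit group record that fixed-function shader table indexing selects is determined by a simple sum, illustrated below (the parameter names simply mirror the TraceRay() arguments and the instance descriptor field; this is an illustration, not library code).  With the multiplier set to 0 the geometry term vanishes, which is exactly why all geometries in the instance then share one record:

#include <cstdint>

uint32_t HitGroupRecordIndex(
    uint32_t rayContributionToHitGroupIndex,      // TraceRay() argument
    uint32_t multiplierForGeometryContribution,   // TraceRay() argument
    uint32_t geometryIndexInBottomLevelAS,        // what GeometryIndex() returns
    uint32_t instanceContributionToHitGroupIndex) // from the instance desc
{
    return rayContributionToHitGroupIndex +
           multiplierForGeometryContribution * geometryIndexInBottomLevelAS +
           instanceContributionToHitGroupIndex;
}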


Raytracing flags/configuration tweaks

Two ray flags have been added: RAY_FLAG_SKIP_TRIANGLES and RAY_FLAG_SKIP_PROCEDURAL_PRIMITIVES. (link to spec)

These flags, in addition to being available to individual raytracing calls, can also be globally declared via raytracing pipeline configuration.  This behaves like OR’ing the flags into every TraceRay() call in the raytracing pipeline. (link to spec)

Implementations might make pipeline optimizations knowing that one of the primitive types can be skipped everywhere.
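As a sketch, declaring globally that procedural primitives can be skipped is just one extra subobject in the raytracing pipeline description.  The fragment below is assumed to sit inside whatever function assembles the pipeline’s subobjects, with the DXIL libraries, shader config and so on omitted:

// Pipeline-wide configuration: no procedural primitives anywhere.
D3D12_RAYTRACING_PIPELINE_CONFIG1 pipelineConfig = {};
pipelineConfig.MaxTraceRecursionDepth = 1;
pipelineConfig.Flags = D3D12_RAYTRACING_PIPELINE_FLAG_SKIP_PROCEDURAL_PRIMITIVES;

D3D12_STATE_SUBOBJECT configSubobject = {};
configSubobject.Type = D3D12_STATE_SUBOBJECT_TYPE_RAYTRACING_PIPELINE_CONFIG1;
configSubobject.pDesc = &pipelineConfig;
// configSubobject is then included in the D3D12_STATE_OBJECT_DESC passed
// to CreateStateObject(), alongside the pipeline's other subobjects.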


Support

None of these features specifically require new hardware.  Existing DXR Tier 1.0 capable devices can support Tier 1.1 if the GPU vendor implements driver support.
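A quick way to confirm this at runtime is the standard feature-support query, sketched here:

#include <d3d12.h>

bool SupportsDxrTier11(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
    if (FAILED(device->CheckFeatureSupport(
            D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5))))
    {
        return false;
    }
    return options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_1;
}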

Reach out to GPU vendors for their timelines for hardware and drivers.

OS support begins with the latest Windows 10 Insider Preview Build and SDK Preview Build for Windows 10 (20H1) from the Windows Insider Program.  The features that involve shaders require shader model 6.5 support which can be targeted by the latest DirectX Shader Compiler.  Last but not least, PIX support for DXR Tier 1.1 is in the works.
