October 22nd, 2019

A Look Inside D3D12 Resource State Barriers

Bill Kristiansen
Principal Developer

Many D3D12 developers have become accustomed to managing resource state transitions and read/write hazards themselves using the ResourceBarrier API. Prior to D3D12, such details were handled internally by the driver. However, D3D12 command lists cannot provide the same deterministic state tracking as D3D10 and D3D11 device contexts, so state transitions need to be scheduled during D3D12 command list recording. When barriers are used responsibly, applications can minimize GPU cache flushes and resource state changes. However, it can be tricky to use resource barriers correctly while also keeping performance penalties low.

There are many questions posted online about why D3D12 resource barriers are needed and when to use them. The D3D12 documentation contains a good API-level description of resource barriers, and PIX and the D3D12 Debug Layer help developers iron out some of the confusion. Despite this, proper resource barrier management is a complex art.

In this post, I would like to take a peek under the hood of the resource state transition barrier and explain why implicit promotion and decay exist.

State Transition Barriers

At a high level, a “resource state” is a description of how a GPU intends to access a resource. D3D12 developers can combine D3D12_RESOURCE_STATES flags with a bitwise OR to describe a given state, or combination of states. It is important to note that read-only states cannot be combined with write states. For example, the D3D12_RESOURCE_STATE_UNORDERED_ACCESS and D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state flags cannot be combined.
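
As a rough illustration of the combination rule, the check below accepts any mix of read-only bits but requires a write state to stand alone. It is only a sketch: the constants are local stand-ins that mirror a subset of the D3D12_RESOURCE_STATES values from d3d12.h so that the snippet compiles on any platform, and IsValidStateCombination is a hypothetical helper, not a D3D12 API.

```cpp
#include <cstdint>

// Local stand-ins mirroring a subset of D3D12_RESOURCE_STATES (d3d12.h).
// Real code would include <d3d12.h> and use the enum values directly.
constexpr std::uint32_t kStateRenderTarget           = 0x4;
constexpr std::uint32_t kStateUnorderedAccess        = 0x8;
constexpr std::uint32_t kStateDepthWrite             = 0x10;
constexpr std::uint32_t kStateNonPixelShaderResource = 0x40;
constexpr std::uint32_t kStatePixelShaderResource    = 0x80;
constexpr std::uint32_t kStateStreamOut              = 0x100;
constexpr std::uint32_t kStateCopyDest               = 0x400;
constexpr std::uint32_t kStateCopySource             = 0x800;
constexpr std::uint32_t kStateResolveDest            = 0x1000;

// Every bit that describes a write access.
constexpr std::uint32_t kWriteStates =
    kStateRenderTarget | kStateUnorderedAccess | kStateDepthWrite |
    kStateStreamOut | kStateCopyDest | kStateResolveDest;

// Hypothetical helper: true when 'states' is either read-only (or COMMON)
// or exactly one write bit with no read bits alongside it.
inline bool IsValidStateCombination(std::uint32_t states) {
    const std::uint32_t writeBits = states & kWriteStates;
    if (writeBits == 0) {
        return true;  // read-only bits may be freely combined
    }
    const bool singleWriteBit = (writeBits & (writeBits - 1)) == 0;
    return singleWriteBit && states == writeBits;
}
```

With these definitions, kStatePixelShaderResource | kStateCopySource passes the check, while kStateUnorderedAccess | kStatePixelShaderResource is rejected.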

When transitioning a resource from a write state to a read state (or even to another write state), the expectation is that all preceding write operations have completed and that subsequent reads of the resource data reflect what was previously written. In some cases this can mean flushing a data cache. Additionally, some devices write data using a compressed layout but can only read from decompressed resource data. Therefore, a transition from a write state to a read state may also force a decompress operation. Note that not all devices are the same. In some cases the cache flushes or decompress operations are not necessary. This is one reason why the D3D12 Debug Layer can produce resource state errors when stuff appears to render just fine (“on my machine”).

Regardless of hardware caching and compression differences, if an operation writes data to a resource and a later operation reads that data, there must be a transition barrier to prevent the scheduler from executing both operations concurrently on the GPU. In fact, the reason for having different read states such as D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE and D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE is to allow the transition to be scheduled later in the graphics pipeline. For example, a state transition from D3D12_RESOURCE_STATE_RENDER_TARGET to D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE will block all subsequent shader execution until the render target data is resolved and decompressed. On the other hand, transitioning to D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE will only block subsequent pixel shader execution, allowing the vertex processing pipeline to run concurrently with the render target resolve and decompress.

Resource State Promotion and Decay

This frequently-misunderstood feature exists to reduce unnecessary resource state transitions. Developers can completely ignore resource state promotion and decay, choosing instead to explicitly manage all resource state. However, doing so can have a significant impact on GPU scheduling.  So it may be worth taking the time to invest in promotion and decay in your resource state management system.

The official documentation on D3D12 Implicit State Transitions is a good place to start when trying to understand resource state promotion and decay, at least from an API level. What is important to understand is that these state transitions are truly *implicit*. In other words, neither the D3D12 runtime nor the drivers actively *do* anything to promote or decay a resource state. Promotion and decay are natural consequences of how GPU pipelines work in combination with resource layout.

Rules for D3D12_RESOURCE_STATE_COMMON

For any resource to be in the D3D12_RESOURCE_STATE_COMMON state it must:

  1) Have no pending write operations, cache flushes or layout changes.
  2) Have a layout that is intrinsically readable by any GPU operation.

Based on those rules, a resource in the D3D12_RESOURCE_STATE_COMMON state does not require a state transition to be read from. Any GPU reads effectively “promote” the resource to the relevant read state.

ExecuteCommandLists

The D3D12 specifications require that no work remains in flight once an ExecuteCommandLists call has completed, including cache flushes and resource layout changes. Note that this means there are behavioral differences between calling ExecuteCommandLists once per command list and making a single ExecuteCommandLists call with multiple command lists.

Since ExecuteCommandLists must have no outstanding resource writes or cache flushes, rule (1) above is fulfilled for *all* accessed resources once the ExecuteCommandLists operation has completed. Therefore, any resources that also meet rule (2) implicitly “decay” to D3D12_RESOURCE_STATE_COMMON.

Example

Say TextureA and TextureB are both in the D3D12_RESOURCE_STATE_COMMON state and are accessed in a pixel shader, promoting each texture to the D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state.

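// Pseudocode: InitDrawWithTexturesAAndB stands for binding both textures as pixel
// shader resources, and Draw() for any draw call such as DrawInstanced.
InitDrawWithTexturesAAndB(pCL);
pCL->Draw();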

Next, the developer wishes to start writing to TextureB as a UAV. Therefore, the developer must explicitly transition the state of TextureB from D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE to D3D12_RESOURCE_STATE_UNORDERED_ACCESS. This tells the scheduler to complete all preceding pixel shader operations before transitioning TextureB to the UNORDERED_ACCESS state, in which the texture may have a compressed layout.

TransitionResourceState(pCL, pTextureB, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_UNORDERED_ACCESS);
InitDispatchWithTextureB(pCL);
pCL->Dispatch();
pCL->Close();

ID3D12CommandList *ExecuteList[] = { pCL };
pQueue->ExecuteCommandLists(1, ExecuteList);

Upon completion of the ExecuteCommandLists workload, TextureA remains in the “common layout” and has no pending writes. Therefore, TextureA implicitly “decays” back to D3D12_RESOURCE_STATE_COMMON according to the rules above. However, the state of TextureB cannot decay because its layout is no longer common as a result of transitioning into the UNORDERED_ACCESS state.
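
The bookkeeping behind this example can be sketched as a small CPU-side model. This is a simplified illustration, not how the runtime or any driver is implemented: TrackedResource and the functions around it are hypothetical names, the constants are local stand-ins mirroring D3D12_RESOURCE_STATES values from d3d12.h, and only promotion to read states is modeled.

```cpp
#include <cstdint>

// Local stand-ins mirroring a few D3D12_RESOURCE_STATES values from d3d12.h.
constexpr std::uint32_t kCommon              = 0x0;
constexpr std::uint32_t kUnorderedAccess     = 0x8;
constexpr std::uint32_t kPixelShaderResource = 0x80;

// Hypothetical CPU-side bookkeeping for one resource; not a D3D12 type.
struct TrackedResource {
    std::uint32_t state = kCommon;
    bool implicitlyPromoted = false;  // state was reached by promotion, not by a barrier
    bool alwaysDecays = false;        // buffer or simultaneous-access texture
};

// A read of a COMMON resource needs no barrier; the state is simply recorded.
// Returns false when the caller must record an explicit transition instead.
inline bool TryPromoteForRead(TrackedResource& r, std::uint32_t readState) {
    if (r.state == kCommon) {
        r.state = readState;
        r.implicitlyPromoted = true;
        return true;
    }
    return (r.state & readState) == readState;  // already readable this way
}

// Mirrors an explicit ResourceBarrier transition recorded by the application.
inline void ExplicitTransition(TrackedResource& r, std::uint32_t after) {
    r.state = after;
    r.implicitlyPromoted = false;
}

// Called once the GPU has finished an ExecuteCommandLists workload.
inline void OnExecuteCommandListsComplete(TrackedResource& r) {
    if (r.alwaysDecays || r.implicitlyPromoted) {
        r.state = kCommon;
        r.implicitlyPromoted = false;
    }
}

// Replays the TextureA / TextureB example from this section.
inline void ReplayExample(TrackedResource& textureA, TrackedResource& textureB) {
    TryPromoteForRead(textureA, kPixelShaderResource);  // Draw reads A
    TryPromoteForRead(textureB, kPixelShaderResource);  // Draw reads B
    ExplicitTransition(textureB, kUnorderedAccess);     // Dispatch writes B as a UAV
    OnExecuteCommandListsComplete(textureA);
    OnExecuteCommandListsComplete(textureB);
}
```

After ReplayExample runs on two default-constructed resources, textureA.state is back to the COMMON value while textureB.state is still the UNORDERED_ACCESS value, matching the outcome described above. Setting alwaysDecays models a buffer or simultaneous-access texture, which returns to COMMON even after an explicit transition.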

Buffers and Simultaneous-Access Textures

Buffers and simultaneous-access textures allow resources to be read from by multiple command queues concurrently, while at the same time being written to by no more than one additional command queue. Some details on the D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS resource flag can be found in the D3D12_RESOURCE_FLAGS API documentation.

Since buffers and simultaneous-access textures must be readable by all GPU operations and write operations must not change the layout, the state of these resources always implicitly “decays” to D3D12_RESOURCE_STATE_COMMON when no GPU work using these resources is in flight. In other words, the state and layout of buffers and simultaneous-access textures always meet D3D12_RESOURCE_STATE_COMMON rule (2) above.

Some Best Practices

Take advantage of COMMON state promotion and decay

  • You know you hate to leave good performance lying on the table.
  • If you make a mistake, the debug layer has your back in most cases.

Use the AssertResourceState debug layer APIs

Avoid explicit transitions to D3D12_RESOURCE_STATE_COMMON

  • A transition to the COMMON state is always a pipeline stall and can often induce a cache flush and decompress operation.
  • If such a transition is necessary, do it as late as possible.

Consider using split-barriers

  • A split-barrier lets a driver optimize the scheduling of a resource transition between the specified begin and end points.

Batch ResourceBarrier calls

  • Reduces DDI (device driver interface) overhead.
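
One way to batch barriers is to queue them up and submit the whole set in a single call just before the work that depends on them. The sketch below is a hypothetical helper, not part of D3D12; the barrier type is a template parameter only so that the snippet compiles without d3d12.h, and in a real renderer it would be D3D12_RESOURCE_BARRIER.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collects barrier descriptions and submits them in one call.
template <typename Barrier>
class BarrierBatcher {
public:
    // Queues a barrier without submitting anything yet.
    void Add(const Barrier& barrier) { pending_.push_back(barrier); }

    // Calls submit(count, pointer) once for everything queued, then clears the queue.
    // Returns the number of barriers submitted; does nothing when the queue is empty.
    template <typename SubmitFn>
    std::size_t Flush(SubmitFn&& submit) {
        const std::size_t count = pending_.size();
        if (count != 0) {
            submit(static_cast<unsigned>(count), pending_.data());
            pending_.clear();
        }
        return count;
    }

private:
    std::vector<Barrier> pending_;
};
```

With D3D12, the submit callback would typically forward its two arguments to ID3D12GraphicsCommandList::ResourceBarrier, and Flush would be called right before the draw, dispatch, or copy that relies on the new states.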

Avoid transitioning from one read state to another

  • It is okay to logically combine read states into a single state value.
  • D3D12_RESOURCE_STATE_GENERIC_READ is literally a bitwise-or of other READ state bits.
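
To make the last two points concrete, the snippet below spells out how the GENERIC_READ value is composed and builds a narrower combined read state. The constants are local stand-ins mirroring the read-state bits of D3D12_RESOURCE_STATES from d3d12.h, and kSampledThenCopied is a hypothetical example of a texture that is sampled by several shader stages and later used as a copy source.

```cpp
#include <cstdint>

// Local stand-ins mirroring the read-state bits of D3D12_RESOURCE_STATES (d3d12.h).
constexpr std::uint32_t kReadVertexAndConstantBuffer = 0x1;
constexpr std::uint32_t kReadIndexBuffer             = 0x2;
constexpr std::uint32_t kReadNonPixelShaderResource  = 0x40;
constexpr std::uint32_t kReadPixelShaderResource     = 0x80;
constexpr std::uint32_t kReadIndirectArgument        = 0x200;
constexpr std::uint32_t kReadCopySource              = 0x800;

// D3D12_RESOURCE_STATE_GENERIC_READ is the OR of the six bits above.
constexpr std::uint32_t kGenericRead =
    kReadVertexAndConstantBuffer | kReadIndexBuffer | kReadNonPixelShaderResource |
    kReadPixelShaderResource | kReadIndirectArgument | kReadCopySource;
static_assert(kGenericRead == 0xAC3,
              "matches the value of D3D12_RESOURCE_STATE_GENERIC_READ");

// Hypothetical usage: one transition into this combined state covers sampling in
// any shader stage and a later copy, with no read-to-read barrier in between.
constexpr std::uint32_t kSampledThenCopied =
    kReadNonPixelShaderResource | kReadPixelShaderResource | kReadCopySource;
```

Requesting only the read bits a resource actually needs, as kSampledThenCopied does, is generally preferable to reaching for the full GENERIC_READ mask.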

 
