In Windows 10 1903, DRED 1.1 provided D3D12 developers with the ability to diagnose device removed events using GPU page fault data and automatic breadcrumbs. As a result, TDR debugging pain has been greatly reduced. Hooray! Unfortunately, developers still struggle to pinpoint which specific GPU workloads triggered the error. So we’ve made a few tweaks to DRED in the Windows 10 20H1 Release Preview. Specifically, DRED 1.2 adds ‘Context Data’ to auto-breadcrumbs by integrating PIX marker and event strings into the auto-breadcrumb data. With context data, developers can more precisely determine where a GPU fault occurred. For example, instead of observing that a TDR occurs after the 71st DrawInstanced call, the data can now indicate that the fault occurred after the second DrawInstanced following the “BeginFoliage” PIX begin-event.
DRED 1.2 APIs
New interfaces and data structures have been added to D3D12 to support DRED 1.2.
ID3D12DeviceRemovedExtendedDataSettings1
ID3D12DeviceRemovedExtendedDataSettings1 inherits from ID3D12DeviceRemovedExtendedDataSettings, adding a method for controlling DRED 1.2 breadcrumb context data.
void ID3D12DeviceRemovedExtendedDataSettings1::SetBreadcrumbContextEnablement(D3D12_DRED_ENABLEMENT Enablement);
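ID3D12DeviceRemovedExtendedData1
ID3D12DeviceRemovedExtendedData1 inherits from ID3D12DeviceRemovedExtendedData. Its GetAutoBreadcrumbsOutput1 method retrieves auto-breadcrumb data that includes DRED 1.2 breadcrumb context data.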
HRESULT ID3D12DeviceRemovedExtendedData1::GetAutoBreadcrumbsOutput1(D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT1 *pOutput);
D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT1
typedef struct D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT1
{
const D3D12_AUTO_BREADCRUMB_NODE1 *pHeadAutoBreadcrumbNode;
} D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT1;
pHeadAutoBreadcrumbNode
Points to the head of a linked list of D3D12_AUTO_BREADCRUMB_NODE1 structures.
D3D12_AUTO_BREADCRUMB_NODE1
Identical to D3D12_AUTO_BREADCRUMB_NODE, except that pNext points to a D3D12_AUTO_BREADCRUMB_NODE1 and two additional members describe DRED 1.2 breadcrumb context data.
typedef struct D3D12_AUTO_BREADCRUMB_NODE1
{
const char *pCommandListDebugNameA;
const wchar_t *pCommandListDebugNameW;
const char *pCommandQueueDebugNameA;
const wchar_t *pCommandQueueDebugNameW;
ID3D12GraphicsCommandList *pCommandList;
ID3D12CommandQueue *pCommandQueue;
UINT BreadcrumbCount;
const UINT *pLastBreadcrumbValue;
const D3D12_AUTO_BREADCRUMB_OP *pCommandHistory;
const struct D3D12_AUTO_BREADCRUMB_NODE1 *pNext;
UINT BreadcrumbContextsCount;
D3D12_DRED_BREADCRUMB_CONTEXT *pBreadcrumbContexts;
} D3D12_AUTO_BREADCRUMB_NODE1;
BreadcrumbContextsCount
Number of D3D12_DRED_BREADCRUMB_CONTEXT elements in the array pointed to by pBreadcrumbContexts.
pBreadcrumbContexts
Pointer to an array of D3D12_DRED_BREADCRUMB_CONTEXT structures.
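As a sketch of how this linked list might be consumed, the C++ below walks the D3D12_AUTO_BREADCRUMB_NODE1 list and returns the first command list that the GPU started but did not finish. It assumes that *pLastBreadcrumbValue counts completed operations, so 0 means the list never started and a value equal to BreadcrumbCount means it finished. NodeStatus, Classify, and FindFirstInProgress are hypothetical helpers, and the trimmed struct and enum definitions at the top are stand-ins so the sketch compiles without the Windows SDK; in real code, include <d3d12.h> instead.

```cpp
// Trimmed stand-ins for the d3d12.h declarations described above, so this
// sketch compiles on its own. In real code, include <d3d12.h> and drop these.
typedef unsigned int UINT;
enum D3D12_AUTO_BREADCRUMB_OP {
    D3D12_AUTO_BREADCRUMB_OP_SETMARKER,
    D3D12_AUTO_BREADCRUMB_OP_BEGINEVENT,
    D3D12_AUTO_BREADCRUMB_OP_ENDEVENT,
    D3D12_AUTO_BREADCRUMB_OP_DRAWINSTANCED,
};
struct D3D12_DRED_BREADCRUMB_CONTEXT {
    UINT BreadcrumbIndex;
    const wchar_t *pContextString;
};
struct D3D12_AUTO_BREADCRUMB_NODE1 {
    const wchar_t *pCommandListDebugNameW;
    UINT BreadcrumbCount;
    const UINT *pLastBreadcrumbValue;
    const D3D12_AUTO_BREADCRUMB_OP *pCommandHistory;
    const D3D12_AUTO_BREADCRUMB_NODE1 *pNext;
    UINT BreadcrumbContextsCount;
    D3D12_DRED_BREADCRUMB_CONTEXT *pBreadcrumbContexts;
};

// Hypothetical classification of a command list's progress on the GPU.
enum class NodeStatus { NotStarted, InProgress, Completed };

NodeStatus Classify(const D3D12_AUTO_BREADCRUMB_NODE1 &node) {
    // Assumption: *pLastBreadcrumbValue is the number of completed operations.
    const UINT done = node.pLastBreadcrumbValue ? *node.pLastBreadcrumbValue : 0;
    if (done == 0) return NodeStatus::NotStarted;
    if (done >= node.BreadcrumbCount) return NodeStatus::Completed;
    return NodeStatus::InProgress;
}

// Walks the list starting at pHeadAutoBreadcrumbNode and returns the first
// command list that began but did not finish, or nullptr if there is none.
const D3D12_AUTO_BREADCRUMB_NODE1 *FindFirstInProgress(
    const D3D12_AUTO_BREADCRUMB_NODE1 *head) {
    for (const D3D12_AUTO_BREADCRUMB_NODE1 *node = head; node; node = node->pNext) {
        if (Classify(*node) == NodeStatus::InProgress) return node;
    }
    return nullptr;
}
```

A partially executed command list is usually the most interesting place to start looking, which is why the helper stops at the first in-progress node rather than collecting all of them.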
D3D12_DRED_BREADCRUMB_CONTEXT
Provides access to the context string associated with a command list op breadcrumb.
typedef struct D3D12_DRED_BREADCRUMB_CONTEXT
{
UINT BreadcrumbIndex;
const wchar_t *pContextString;
} D3D12_DRED_BREADCRUMB_CONTEXT;
BreadcrumbIndex
Index of the command list operation in the command history of the associated command list. The command history is the array pointed to by the pCommandHistory member of the D3D12_AUTO_BREADCRUMB_NODE1 structure.
pContextString
Pointer to the null-terminated wide-character context string.
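Context strings make it possible to describe a faulting operation relative to a PIX event, as in the “second DrawInstanced following BeginFoliage” example at the top of this post. The C++ below is one simple heuristic, not the debugger extension’s logic: it picks the nearest context string at or before a given command-history index and counts operations of the same type since then. It ignores ENDEVENT scoping. OpLocation and LocateOp are hypothetical names, and the trimmed declarations at the top stand in for <d3d12.h>.

```cpp
// Trimmed stand-ins for the d3d12.h declarations described above, so this
// sketch compiles on its own. In real code, include <d3d12.h> and drop these.
typedef unsigned int UINT;
enum D3D12_AUTO_BREADCRUMB_OP {
    D3D12_AUTO_BREADCRUMB_OP_SETMARKER,
    D3D12_AUTO_BREADCRUMB_OP_BEGINEVENT,
    D3D12_AUTO_BREADCRUMB_OP_ENDEVENT,
    D3D12_AUTO_BREADCRUMB_OP_DRAWINSTANCED,
};
struct D3D12_DRED_BREADCRUMB_CONTEXT {
    UINT BreadcrumbIndex;
    const wchar_t *pContextString;
};
struct D3D12_AUTO_BREADCRUMB_NODE1 {
    const wchar_t *pCommandListDebugNameW;
    UINT BreadcrumbCount;
    const UINT *pLastBreadcrumbValue;
    const D3D12_AUTO_BREADCRUMB_OP *pCommandHistory;
    const D3D12_AUTO_BREADCRUMB_NODE1 *pNext;
    UINT BreadcrumbContextsCount;
    D3D12_DRED_BREADCRUMB_CONTEXT *pBreadcrumbContexts;
};

// Hypothetical result: where an operation sits relative to the nearest
// preceding operation that carries a context string.
struct OpLocation {
    bool hasAnchor = false;                  // true if such a context was found
    const wchar_t *pContextString = nullptr; // the context string, if found
    UINT anchorIndex = 0;                    // BreadcrumbIndex of that context
    UINT ordinal = 0;  // 1-based count of same-type ops after the anchor through
                       // opIndex (0 if the op at opIndex carries the context itself)
};

// Requires opIndex < node.BreadcrumbCount.
OpLocation LocateOp(const D3D12_AUTO_BREADCRUMB_NODE1 &node, UINT opIndex) {
    OpLocation loc;
    // Pick the context with the largest BreadcrumbIndex that is <= opIndex.
    for (UINT i = 0; i < node.BreadcrumbContextsCount; ++i) {
        const D3D12_DRED_BREADCRUMB_CONTEXT &ctx = node.pBreadcrumbContexts[i];
        if (ctx.BreadcrumbIndex > opIndex) continue;  // belongs to a later op
        if (!loc.hasAnchor || ctx.BreadcrumbIndex >= loc.anchorIndex) {
            loc.hasAnchor = true;
            loc.pContextString = ctx.pContextString;
            loc.anchorIndex = ctx.BreadcrumbIndex;
        }
    }
    // Count ops of the same type as the op at opIndex, from just after the
    // anchor (or the start of the list if there is none) through opIndex.
    const UINT start = loc.hasAnchor ? loc.anchorIndex + 1 : 0;
    const D3D12_AUTO_BREADCRUMB_OP target = node.pCommandHistory[opIndex];
    for (UINT i = start; i <= opIndex; ++i) {
        if (node.pCommandHistory[i] == target) ++loc.ordinal;
    }
    return loc;
}
```

Assuming *pLastBreadcrumbValue counts completed operations, passing that value (when it is less than BreadcrumbCount) locates the first operation that did not complete, yielding a result such as “DRAWINSTANCED #2 after BeginFoliage”.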
Accessing DRED 1.2 Context Data in Code
Use the ID3D12DeviceRemovedExtendedDataSettings1 interface to enable DRED before creating the device:
CComPtr<ID3D12DeviceRemovedExtendedDataSettings1> pDredSettings;
ThrowFailure(D3D12GetDebugInterface(IID_PPV_ARGS(&pDredSettings)));
// Record auto-breadcrumbs for every command list.
pDredSettings->SetAutoBreadcrumbsEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
// DRED 1.2: attach PIX marker and event strings to the breadcrumbs as context data.
pDredSettings->SetBreadcrumbContextEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
// Report GPU page fault data.
pDredSettings->SetPageFaultEnablement(D3D12_DRED_ENABLEMENT_FORCED_ON);
After a device removed event, use the ID3D12DeviceRemovedExtendedData1::GetAutoBreadcrumbsOutput1 method to access DRED 1.2 auto-breadcrumb data.
CComPtr<ID3D12DeviceRemovedExtendedData1> pDred;
ThrowFailure(m_pDevice->QueryInterface(&pDred));
D3D12_DRED_AUTO_BREADCRUMBS_OUTPUT1 AutoBreadcrumbsOutput;
ThrowFailure(pDred->GetAutoBreadcrumbsOutput1(&AutoBreadcrumbsOutput));
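Once GetAutoBreadcrumbsOutput1 succeeds, an application will often want to write the breadcrumbs to a log. The C++ below sketches one way to format every node: operations below *pLastBreadcrumbValue are marked as done (assuming that value counts completed operations), and each operation’s context string is appended. FormatBreadcrumbs and OpName are hypothetical helpers, and the declarations at the top are trimmed stand-ins for <d3d12.h>, whose D3D12_AUTO_BREADCRUMB_OP has many more values than the four mirrored here.

```cpp
#include <sstream>
#include <string>

// Trimmed stand-ins for the d3d12.h declarations described above, so this
// sketch compiles on its own. In real code, include <d3d12.h> and drop these.
typedef unsigned int UINT;
enum D3D12_AUTO_BREADCRUMB_OP {
    D3D12_AUTO_BREADCRUMB_OP_SETMARKER,
    D3D12_AUTO_BREADCRUMB_OP_BEGINEVENT,
    D3D12_AUTO_BREADCRUMB_OP_ENDEVENT,
    D3D12_AUTO_BREADCRUMB_OP_DRAWINSTANCED,
};
struct D3D12_DRED_BREADCRUMB_CONTEXT {
    UINT BreadcrumbIndex;
    const wchar_t *pContextString;
};
struct D3D12_AUTO_BREADCRUMB_NODE1 {
    const wchar_t *pCommandListDebugNameW;
    UINT BreadcrumbCount;
    const UINT *pLastBreadcrumbValue;
    const D3D12_AUTO_BREADCRUMB_OP *pCommandHistory;
    const D3D12_AUTO_BREADCRUMB_NODE1 *pNext;
    UINT BreadcrumbContextsCount;
    D3D12_DRED_BREADCRUMB_CONTEXT *pBreadcrumbContexts;
};

// Hypothetical name table; extend it to cover the full enum in d3d12.h.
const wchar_t *OpName(D3D12_AUTO_BREADCRUMB_OP op) {
    switch (op) {
    case D3D12_AUTO_BREADCRUMB_OP_SETMARKER: return L"SETMARKER";
    case D3D12_AUTO_BREADCRUMB_OP_BEGINEVENT: return L"BEGINEVENT";
    case D3D12_AUTO_BREADCRUMB_OP_ENDEVENT: return L"ENDEVENT";
    case D3D12_AUTO_BREADCRUMB_OP_DRAWINSTANCED: return L"DRAWINSTANCED";
    default: return L"OTHER";
    }
}

// Formats every node in the list, one line per operation.
std::wstring FormatBreadcrumbs(const D3D12_AUTO_BREADCRUMB_NODE1 *head) {
    std::wostringstream out;
    for (const D3D12_AUTO_BREADCRUMB_NODE1 *node = head; node; node = node->pNext) {
        // Assumption: *pLastBreadcrumbValue counts completed operations.
        const UINT done = node->pLastBreadcrumbValue ? *node->pLastBreadcrumbValue : 0;
        out << L"CommandList "
            << (node->pCommandListDebugNameW ? node->pCommandListDebugNameW : L"<unnamed>")
            << L": " << done << L"/" << node->BreadcrumbCount << L" ops completed\n";
        for (UINT i = 0; i < node->BreadcrumbCount; ++i) {
            out << (i < done ? L"  [done] " : L"  [    ] ") << i << L' '
                << OpName(node->pCommandHistory[i]);
            // Append the DRED 1.2 context string(s) recorded for this op, if any.
            for (UINT c = 0; c < node->BreadcrumbContextsCount; ++c) {
                const D3D12_DRED_BREADCRUMB_CONTEXT &ctx = node->pBreadcrumbContexts[c];
                if (ctx.BreadcrumbIndex == i && ctx.pContextString)
                    out << L" \"" << ctx.pContextString << L"\"";
            }
            out << L'\n';
        }
    }
    return out.str();
}
```

With the real headers, the call would be FormatBreadcrumbs(AutoBreadcrumbsOutput.pHeadAutoBreadcrumbNode). The inner scan over pBreadcrumbContexts is quadratic but keeps the sketch simple; breadcrumb counts per command list are typically small enough that this is unlikely to matter in a post-mortem path.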
Post-mortem Debugging
The DRED data can be accessed in a user-mode debugger without requiring the application to log the DRED output. To support this, we’ve implemented an open-source DRED debugger extension on GitHub. This extension has been updated to support DRED 1.2 breadcrumb context data.
More details on how to use the debugger extension can be found in the README.md in the GitHub repository.
Force Enabling/Disabling DRED
Developers are no longer required to instrument application code to take advantage of DRED. Instead, DRED can now be forced on or off using D3DConfig.exe.
D3DConfig.exe is a new console application in Windows 10 20H1 Release Preview that gives extended control over traditional DirectX Control Panel settings. More details about D3DConfig can be found here.
To make an application use d3dconfig/dxcpl settings, run:
> d3dconfig apps --add myd3d12app.exe
apps
----------------
myd3d12app.exe
Note that this is equivalent to opening the DirectX Control Panel and adding “myd3d12app.exe” to the executable list.
To view the current DRED settings, run:
> d3dconfig dred
dred
----------------
auto-breadcrumbs=system-controlled
breadcrumb-contexts=system-controlled
page-faults=system-controlled
watson-dumps=system-controlled
To force DRED page faults on, run:
> d3dconfig dred page-faults=forced-on
dred
----------------
page-faults=forced-on
It may be more useful to simply enable all DRED features:
> d3dconfig dred --force-on-all
dred
----------------
auto-breadcrumbs=forced-on
breadcrumb-contexts=forced-on
page-faults=forced-on
watson-dumps=forced-on
The End of Mysterious TDRs Forever?
No. Unfortunately, there are still many device removal bugs that DRED analysis may not help solve, including driver bugs or app bugs that cause GPU errors nondeterministically. For example, hardware might prefetch from an invalid data-static descriptor, triggering device removal at some point before the first operation that accesses that descriptor. While DRED would likely still produce auto-breadcrumb output, the location it points to could be misleading.
We plan to continue making TDR debugging improvements. As such, we would like to know if you’ve discovered a TDR-causing bug that was missed by the Debug Layer, GPU-Based Validation, PIX and DRED.