D3D12 Preview: Mesh Nodes in Work Graphs

Hi everyone,

Today we’re excited to announce the newest Preview Agility SDK v1.715.0, download here.

Since this is a preview SDK, the new features are not in their final state and are thus only available via Developer Mode within Windows (see the prerequisites section for more). This preview serves as a first look to explore what might be possible to ship in a future retail Agility SDK. We encourage developers to try out the feature and send feedback to askd3dteam@microsoft.com, which may influence the final design.

– DirectX team

Overview
Specification <—— the docs
Drivers and other prerequisites <—— get running on AMD/NVIDIA GPUs or WARP (CPU)
Programming guide
Samples
PIX

Overview

Work graphs was initially released (see here) with graph nodes limited to compute-like shaders. While certainly a useful way to perform GPU driven work as-is, some app scenarios could also benefit from the ability to drive the rasterizer directly from a work graph. This preview release introduces such a capability.

Outside of work graphs, D3D12 has always had ExecuteIndirect as a way to do GPU driven work as well. That has always supported driving graphics, albeit with the limitation on PC that there was no way to make GPU driven changes to what program (aka pipeline state object / PSO) is being used to rasterize. Extending ExecuteIndirect’s capabilities in significant ways like this has historically proven difficult though. It turned out that the first opportunity to expose GPU driven selection of graphics programs on PC is using work graphs.

This preview exposes the ability to define mesh nodes at the leaves of a work graph. A mesh node is a program containing a mesh launch node shader (described next), combined with all the rest of the optional states needed to define a graphics pipeline, such as pixel shader, rasterizer state, blend state etc. This way of bundling state, the equivalent of PSOs in state objects, already shipped at the same time as work graphs. These PSO equivalents are referred to as generic programs. Now, mesh launch node shaders can be part of generic programs, which can then be used as leaves in a work graph.

A mesh launch node shader is like a hybrid of a mesh shader and the existing broadcasting launch nodes in work graphs. Each input record to the node generates a grid of thread group invocations, where like a mesh shader each thread group can output a local mesh description, vertices, primitive data and indices, to drive rasterization, pixel shader invocations etc.

Additional commentary

Programs (aka PSOs) that start with a vertex shader, essentially what Draw*() calls normally invoke, are not supported in work graphs. It’s certainly possible to support them – in fact the spec proposes how they might look. For now this just isn’t practical to build.

Furthermore, mesh nodes appear to be a more natural fit for work graphs, not needing the complexity of input assembler state and vertex bindings. If there’s a potential benefit for apps in supporting draw-style pipelines in work graphs today, that value might also diminish over the time horizon when more apps can start to use work graphs at all (maturity of the feature, market size).

In contrast, mesh nodes should become increasingly useful over time, with standalone mesh shaders starting to gain traction with apps. Exposing mesh nodes could lead to eventually deconstructing rasterization into smaller parts addressable from shaders and work graphs.

Some food for thought: By way of comparison to ExecuteIndirect, it’s worth pointing out an odd work graph topology that might actually be of use. Imagine a work graph containing nothing but independent mesh nodes, where each node is a standalone graph entrypoint. The result is the app’s input buffer(s) driving this graph would look a bit like an ExecuteIndirect argument buffer. But with the added ability for inputs (or groups of inputs) to each select which mesh + rasterization pipeline to use from a palette of options.

By default, graph execution is unordered (unlike ExecuteIndirect), but for the preview there is the option to request ordered rasterization. In the spec see Graphics nodes with ordered rasterization. It’s too early to know the long-term potential for this kind of scenario (with or without ordering).

Certainly, a more obvious use of mesh nodes is to use them in a proper graph, with compute nodes to do tasks like culling and binning to directly feed into different mesh nodes for rasterization.
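To make the binning idea a little more concrete, here is a minimal CPU-side C++ sketch of what a binning node does conceptually: it routes each incoming record to one of several mesh node inputs based on a material index. The names `BinRecord` and `BinByMaterial` are hypothetical and not part of D3D12. In a real work graph this routing would happen on the GPU, in a compute node that writes records to a node output array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for a record arriving at a binning node.
struct BinRecord
{
    std::uint32_t materialID; // which mesh node in the array should draw this
    float position[4];
};

// Routes each record to the bucket for its material.
// Records with an out-of-range materialID are dropped; that policy is a
// choice made for this sketch, not something the work graphs spec dictates.
std::vector<std::vector<BinRecord>> BinByMaterial(
    const std::vector<BinRecord>& records, std::size_t numMaterials)
{
    std::vector<std::vector<BinRecord>> bins(numMaterials);
    for (const BinRecord& r : records)
    {
        if (r.materialID < numMaterials)
        {
            bins[r.materialID].push_back(r);
        }
    }
    return bins;
}
```

Each bucket corresponds to the input of one mesh node, so every bucket can rasterize with its own program.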

Overall, keep in mind this preview is just the start of a journey. Expect the process of driver and hardware improvement around performance to take quite some time, aided in part by learnings from your early use of the features. Based on bring-up testing so far, the current drivers with mesh nodes support already work reasonably well at least from a functional standpoint.

Specification

Within the Work graphs spec, the Mesh nodes section is the starting place for information, as well as Graphics nodes and subsequent sections, which cover the more general semantics of having graphics in a work graph.

The following sections show how to get started and skim through basic concepts in the form of a small programming guide.

Drivers and other prerequisites

Driver Support

AMD: AMD Software: Adrenalin Edition™ Knowledge Base 24.10.30.01 driver for Windows has support for the Mesh Nodes in Work Graphs on AMD Radeon™ RX 7000 Series graphics cards and can be downloaded here. Support for Mesh Nodes in Work Graphs in our public Adrenalin drivers will hit the shelves in Q3 2024. Visit the GPUOpen blog post here to get all the details and samples for this preview release of Mesh Nodes in Work Graphs.
NVIDIA: For NVIDIA drivers, please reach out to your developer engagement representative.
Intel: Developers interested in this feature should contact Intel Developer Relations for additional details.
Qualcomm: Future support is planned.

Prerequisites:

A PC with any OS that supports the AgilitySDK
App set up to use AgilitySDK 1.715 and the DirectX Shader Compiler Mesh Nodes Preview Release
- The sample project automatically downloads the nugets for these.
Developer mode enabled for Windows
A GPU with corresponding driver installed that is advertised to support work graphs. See Driver Support above.
Optional: Install the latest WARP preview build, which includes mesh nodes support. This software driver alternative could be handy for testing or development without needing a supported GPU.

Programming Guide

The original work graphs blog and the material linked there are a necessary baseline, describing both work graphs and generic programs; read them before continuing here to learn about mesh nodes. With that background, the following guide will make the most sense, along with the specifics about mesh nodes in the spec.

Here is a minimal walkthrough that highlights relevant code snippets from the basic D3D12HelloMeshNodes sample.

Authoring shaders
Enabling mesh nodes preview in an app
Compiling shaders
Creating a state object
Preparing the work graph for execution
Extra assistance for drivers
Executing the work graph on command list

Authoring shaders

Shown below is the mesh node shader, which is the new shader type introduced to make mesh nodes possible. Mesh node shaders must be compiled with shader target lib_6_9. This doesn’t imply that mesh nodes will necessarily be part of final shader model 6.9, but just that the “experimental” version is always one version past the latest release shader model (6.8).

See the sample for the rest of the shaders in the graph, including root node and pixel shaders, all of which are authored using existing mechanisms, nothing new there.

The mesh node shader here inputs a record, and launches a fixed [1,1,1] grid. The grid size can be overridden at graph construction at the API, or it can be a dynamic grid instead, by inputting SV_DispatchGrid in the input record, just like with broadcasting launch nodes.

Notice the similarities with mesh shaders here as well, such as selecting an output topology and the shader outputting vertex data, primitive data and indices (topology). The original mesh shader spec is a useful resource for the parts that aren’t related to work graphs, as defined in the mesh nodes sections in the work graphs spec.

Not applicable to this simple sample is a [NodeMaxInputRecordsPerGraphEntryRecord(uint count,bool sharedAcrossNodeArray)] declaration, required for complex graphs with mesh nodes, at least for now in preview. There are also similar declarations that are needed on the D3D API when using mesh nodes. See Extra assistance for drivers further below.

struct MeshNodeRecord
{
    float4 position;
    float redChannel;
};

struct MeshOutVert
{
    float4 position : SV_POSITION;
};

struct MeshOutPrim
{
    float redChannel : RED;
};

[Shader("node")]
[NodeLaunch("mesh")]
[NumThreads(1,1,1)]
[NodeDispatchGrid(1,1,1)]
[NodeIsProgramEntry] // allow mesh nodes to also act as direct program entry (for fun)
[OutputTopology("triangle")]
void Mesh(
    DispatchNodeInputRecord<MeshNodeRecord> input,
    out vertices MeshOutVert verts[3],
    out primitives MeshOutPrim prim[1],
    out indices uint3 idx[1]
)
{
    SetMeshOutputCounts(3, 1);
    float4 center = input.Get().position;
    verts[0].position = center + float4(0, 0.1f, 0, 0);
    verts[1].position = center + float4(0.1f, -0.1f, 0, 0);
    verts[2].position = center + float4(-0.1f, -0.1f, 0, 0);
    prim[0].redChannel = input.Get().redChannel;
    idx[0] = uint3(0, 1, 2);
}
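When debugging a graph like this, it can help to reproduce the shader's arithmetic on the CPU and compare it with what shows up on screen or in a capture. The following C++ sketch mirrors the vertex math of the Mesh shader above. `MeshNodeRecordCPU`, `Float4` and `ExpectedTriangle` are hypothetical helper names that do not come from the sample, and the size check assumes the record is tightly packed as a float4 followed by a float.

```cpp
#include <array>

// CPU-side mirror of the HLSL MeshNodeRecord (float4 + float).
struct MeshNodeRecordCPU
{
    float position[4];
    float redChannel;
};
static_assert(sizeof(MeshNodeRecordCPU) == 20, "expected a tightly packed record");

using Float4 = std::array<float, 4>;

// Reproduces the three vertex positions the Mesh shader writes for one record:
// the record's position offset by (0, 0.1), (0.1, -0.1) and (-0.1, -0.1) in x/y.
std::array<Float4, 3> ExpectedTriangle(const MeshNodeRecordCPU& rec)
{
    const float offsets[3][2] = { {0.0f, 0.1f}, {0.1f, -0.1f}, {-0.1f, -0.1f} };
    std::array<Float4, 3> verts{};
    for (int v = 0; v < 3; ++v)
    {
        verts[v] = Float4{ rec.position[0] + offsets[v][0],
                           rec.position[1] + offsets[v][1],
                           rec.position[2],
                           rec.position[3] };
    }
    return verts;
}
```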

Enabling mesh nodes preview in an app

Before creating a D3D device, enable two experimental features as shown from the sample:

UUID Features[2] =
        {D3D12ExperimentalShaderModels, D3D12StateObjectsExperiment};

HRESULT hr = D3D12EnableExperimentalFeatures(_countof(Features), Features,
                                             nullptr, nullptr);

Once a D3D device has been created, double check that mesh nodes are supported, via work graphs tier 1.1. From the sample:

D3D12_FEATURE_DATA_D3D12_OPTIONS21 Options = {};
ThrowIfFailed(m_device->CheckFeatureSupport(
    D3D12_FEATURE_D3D12_OPTIONS21, &Options, sizeof(Options)));

if (Options.WorkGraphsTier < D3D12_WORK_GRAPHS_TIER_1_1) {
    OutputDebugStringA(
        "Device does not report support for work graphs tier 1.1 (mesh nodes).");
    ThrowIfFailed(E_FAIL);
}

Compiling shaders

The sample code shown here illustrates compiling shaders at runtime, though the shaders could be precompiled instead of course.

Notice that node shaders, including mesh launch nodes, come from libraries compiled via lib_6_9. By contrast, pixel shaders are compiled via ps_6_* (non-lib target), as compiling these shaders in lib target isn’t supported for use in generic programs.

ComPtr<ID3DBlob> pixelShader;
ComPtr<ID3DBlob> pixelShader2;
ComPtr<ID3DBlob> libShaders;

// Compile shaders
        
// Pixel shaders must be compiled via non-lib shader target, e.g. ps_6_0:
ThrowIfFailed(CompileDxilLibraryFromFile(
   GetAssetFullPath(L"shaders.hlsl").c_str(), 
    L"PSMain", L"ps_6_0", nullptr, 0, nullptr, 0, &pixelShader));

ThrowIfFailed(CompileDxilLibraryFromFile(
    GetAssetFullPath(L"shaders.hlsl").c_str(), 
    L"PSMain2", L"ps_6_0", nullptr, 0, nullptr, 0, &pixelShader2));

LPCWSTR cDefines[] = { 
    L"-D LIB_TARGET", 
    L"-select-validator internal", 
    L"-enable-16bit-types"};

// Node shaders use lib target, lib_6_9 here for mesh node support:
ThrowIfFailed(CompileDxilLibraryFromFile(
    GetAssetFullPath(L"shaders.hlsl").c_str(), 
    nullptr, L"lib_6_9", cDefines, _countof(cDefines), nullptr, 0, &libShaders));

ThrowIfFailed(m_device->CreateRootSignatureFromSubobjectInLibrary(0,
    libShaders->GetBufferPointer(),
    libShaders->GetBufferSize(),
    L"MeshNodesGlobalRS",
    IID_PPV_ARGS(&m_globalRootSignature)));

Creating a state object

Basic configuration and adding shader libs

In this initial state object setup, notice the pixel shaders being imported as if they are dxil libraries, even though they technically aren’t libraries. From the state object point of view non-lib shaders simply appear as a library that happens to have just one entrypoint.

CD3DX12_STATE_OBJECT_DESC SODesc;
SODesc.SetStateObjectType(D3D12_STATE_OBJECT_TYPE_EXECUTABLE);

// Work graphs with mesh nodes need to use graphics global root arguments
// (as opposed to compute):
auto pSOConfig = SODesc.CreateSubobject<CD3DX12_STATE_OBJECT_CONFIG_SUBOBJECT>();
pSOConfig->SetFlags(
    D3D12_STATE_OBJECT_FLAG_WORK_GRAPHS_USE_GRAPHICS_STATE_FOR_GLOBAL_ROOT_SIGNATURE);

// Add global root signature
auto pGlobalRootSig = 
    SODesc.CreateSubobject<CD3DX12_GLOBAL_ROOT_SIGNATURE_SUBOBJECT>();
pGlobalRootSig->SetRootSignature(m_globalRootSignature.Get());

// Add DXIL library with node shaders and local root signature definition
auto pLib = SODesc.CreateSubobject<CD3DX12_DXIL_LIBRARY_SUBOBJECT>();
CD3DX12_SHADER_BYTECODE bcLib(libShaders.Get());
pLib->SetDXILLibrary(&bcLib);
pLib->DefineExport(L"Root");
pLib->DefineExport(L"Mesh");
pLib->DefineExport(L"MeshNodesLocalRS");

// Add pixel shaders
auto pPS = SODesc.CreateSubobject<CD3DX12_DXIL_LIBRARY_SUBOBJECT>();
CD3DX12_SHADER_BYTECODE bcPS(pixelShader.Get());
pPS->SetDXILLibrary(&bcPS); // by not listing exports, 
                            // just taking whatever is in the library

auto pPS2 = SODesc.CreateSubobject<CD3DX12_DXIL_LIBRARY_SUBOBJECT>();
CD3DX12_SHADER_BYTECODE bcPS2(pixelShader2.Get());
pPS2->SetDXILLibrary(&bcPS2);

Root signature associations

Root signatures are associated directly with shaders, as opposed to other types of subobjects (like primitive topology, rasterizer state etc.) shown in the next section, which are listed in the definition of a generic program.

// Not specifying associations for the global root signature,
// which simply makes the above global root signature act
// as a default that gets associated to all exports.

// Associate local root signature with pixel shaders 
// (the only shaders that reference it)
auto pLocalRSAssoc = 
    SODesc.CreateSubobject<CD3DX12_DXIL_SUBOBJECT_TO_EXPORTS_ASSOCIATION>();
pLocalRSAssoc->SetSubobjectNameToAssociate(L"MeshNodesLocalRS");
// Could omit the lines below and it would associate with all exports,
// which would be just fine in this sample
pLocalRSAssoc->AddExport(L"PSMain");
pLocalRSAssoc->AddExport(L"PSMain2");

Subobject building blocks

Define some subobjects to be used by the programs for each mesh node. Not all subobjects are defined here for simplicity, relying on defaults. See a full listing of subobject options here in the spec, including defaults for missing ones.

// Add necessary building block subobjects for the mesh nodes
auto pPrimitiveTopology = 
    SODesc.CreateSubobject<CD3DX12_PRIMITIVE_TOPOLOGY_SUBOBJECT>();
pPrimitiveTopology->SetPrimitiveTopologyType(
    D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE);
auto pRTFormats = 
    SODesc.CreateSubobject<CD3DX12_RENDER_TARGET_FORMATS_SUBOBJECT>();
pRTFormats->SetNumRenderTargets(1);
pRTFormats->SetRenderTargetFormat(0, DXGI_FORMAT_R8G8B8A8_UNORM);

Generic program definitions

Here a couple of program definitions are made using the shaders and subobjects described above.

// Define generic programs out of the building blocks:
// name, list of shaders, list of subobjects:
// (Can define multiple generic programs in the same state object, 
// each picking the building blocks it wants)
auto pGenericProgram = 
    SODesc.CreateSubobject<CD3DX12_GENERIC_PROGRAM_SUBOBJECT>();
pGenericProgram->SetProgramName(L"myMeshNode0");
pGenericProgram->AddExport(L"Mesh");
pGenericProgram->AddExport(L"PSMain");
pGenericProgram->AddSubobject(*pPrimitiveTopology);
pGenericProgram->AddSubobject(*pRTFormats);

// Notice the root signature isn't added to the list here.  
// Root signatures are associated with shader exports directly, not programs.  

// Second mesh node definition with just a different pixel shader
auto pGenericProgram2 = 
    SODesc.CreateSubobject<CD3DX12_GENERIC_PROGRAM_SUBOBJECT>();
pGenericProgram2->SetProgramName(L"myMeshNode1");
pGenericProgram2->AddExport(L"Mesh");
pGenericProgram2->AddExport(L"PSMain2");
pGenericProgram2->AddSubobject(*pPrimitiveTopology);
pGenericProgram2->AddSubobject(*pRTFormats);

Work graph definition and finally creating state object

// Define a work graph
auto pWorkGraph = 
    SODesc.CreateSubobject<CD3DX12_WORK_GRAPH_SUBOBJECT>();
pWorkGraph->SetProgramName(L"myWorkGraph");

// Add root node to work graph
auto pRootNode = pWorkGraph->CreateShaderNode(L"Root");

// Add array of 3 nodes: {"Materials",0/1/2}

auto pMeshNode0 = 
    pWorkGraph->CreateCommonProgramNodeOverrides(L"myMeshNode0");
pMeshNode0->NewName({ L"Materials",0 });
pMeshNode0->LocalRootArgumentsTableIndex(0);

// Second node uses a different program "myMeshNode1" 
// and different local root argument data
auto pMeshNode1 = 
    pWorkGraph->CreateCommonProgramNodeOverrides(L"myMeshNode1");
pMeshNode1->NewName({ L"Materials",1 });
pMeshNode1->LocalRootArgumentsTableIndex(1);

// Third node uses the same program as the previous, 
// only difference being local root argument data
auto pMeshNode2 = 
    pWorkGraph->CreateCommonProgramNodeOverrides(L"myMeshNode1");
pMeshNode2->NewName({ L"Materials",2 });
pMeshNode2->LocalRootArgumentsTableIndex(2);

ThrowIfFailed(
    m_device->CreateStateObject(SODesc, 
        IID_PPV_ARGS(&m_stateObject)));

Preparing the work graph for execution

This code from the sample shows initial preparation – retrieving information about the work graph, allocating backing memory and local root argument storage and stashing it all in a helper container.

It also illustrates that before calling GetWorkGraphMemoryRequirements() there is a call to SetMaximumInputRecords(). This extra call is required for work graphs with mesh nodes. See Extra assistance for drivers further below.

// Define local root argument data
struct MeshNodesLocalStruct {
    float blueChannel;
};

MeshNodesLocalStruct LocalRootArgs[3] = { 0.f, 0.5f, 1.f };

UINT MaxInputRecords = 10; // max input records per DispatchGraph call
UINT MaxNodeInputs = 3; // max node inputs per DispatchGraph call 
                        // (that the records are split across)
InitWorkGraphContext(
    &m_workGraphContext, 
    m_stateObject.Get(), 
    L"myWorkGraph", 
    LocalRootArgs, 
    _countof(LocalRootArgs)*sizeof(MeshNodesLocalStruct), 
    MaxInputRecords,
    MaxNodeInputs);

Here is the implementation of the InitWorkGraphContext helper in the sample:

struct WorkGraphContext
{
    ComPtr<ID3D12WorkGraphProperties1> spWGProps;
    ComPtr<ID3D12Resource> spBackingMemory;
    D3D12_GPU_VIRTUAL_ADDRESS_RANGE BackingMemory = {};
    D3D12_PROGRAM_IDENTIFIER hWorkGraph = {};
    D3D12_WORK_GRAPH_MEMORY_REQUIREMENTS MemReqs = {};

    ComPtr<ID3D12Resource> spLocalRootArgumentsTable;
    D3D12_GPU_VIRTUAL_ADDRESS_RANGE_AND_STRIDE LocalRootArgumentsTable = {};

    UINT NumEntrypoints = 0;
    UINT NumNodes = 0;
    UINT WorkGraphIndex = 0;
    UINT MaxInputRecords = 0;
    UINT MaxNodeInputs = 0;
};

...

void D3D12HelloMeshNodes::InitWorkGraphContext(
    WorkGraphContext* pCtx, 
    ID3D12StateObject* pSO, 
    LPCWSTR pWorkGraphName,
    void* pLocalRootArgumentsTable,
    UINT LocalRootArgumentsTableSizeInBytes,
    UINT MaxInputRecords,
    UINT MaxNodeInputs)
{
    ComPtr<ID3D12StateObjectProperties1> spSOProps;
    pSO->QueryInterface(IID_PPV_ARGS(&spSOProps));
    pCtx->hWorkGraph = spSOProps->GetProgramIdentifier(pWorkGraphName);
    pSO->QueryInterface(IID_PPV_ARGS(&pCtx->spWGProps));
    pCtx->WorkGraphIndex = pCtx->spWGProps->GetWorkGraphIndex(pWorkGraphName);

    // Work graphs with mesh nodes require the max number of input records that will 
    // be sent to a given DispatchGraph() call to be specified before retrieving 
    // backing memory
    pCtx->spWGProps->SetMaximumInputRecords(
        pCtx->WorkGraphIndex, MaxInputRecords, MaxNodeInputs);
    pCtx->MaxInputRecords = MaxInputRecords;
    pCtx->MaxNodeInputs = MaxNodeInputs;

    pCtx->spWGProps->GetWorkGraphMemoryRequirements(
        pCtx->WorkGraphIndex, 
        &pCtx->MemReqs);

    pCtx->BackingMemory.SizeInBytes = pCtx->MemReqs.MaxSizeInBytes;

    MakeBuffer(
        &pCtx->spBackingMemory, 
        pCtx->BackingMemory.SizeInBytes, 
        D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS);

    pCtx->BackingMemory.StartAddress = 
        pCtx->spBackingMemory->GetGPUVirtualAddress();
    pCtx->NumEntrypoints = 
        pCtx->spWGProps->GetNumEntrypoints(pCtx->WorkGraphIndex);
    pCtx->NumNodes = pCtx->spWGProps->GetNumNodes(pCtx->WorkGraphIndex);

    if (pLocalRootArgumentsTable && LocalRootArgumentsTableSizeInBytes)
    {
        pCtx->LocalRootArgumentsTable.SizeInBytes = 
            LocalRootArgumentsTableSizeInBytes;

        MakeBufferAndInitialize(
            &pCtx->spLocalRootArgumentsTable, 
            pLocalRootArgumentsTable,
            pCtx->LocalRootArgumentsTable.SizeInBytes);

        pCtx->LocalRootArgumentsTable.StartAddress = 
            pCtx->spLocalRootArgumentsTable->GetGPUVirtualAddress();
        pCtx->LocalRootArgumentsTable.StrideInBytes = sizeof(UINT);
    }
}
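The relationship between a node's LocalRootArgumentsTableIndex and the table set up above can be sketched on the CPU. This assumes, as with other strided tables in D3D12, that a node's entry starts at index times StrideInBytes from the table's start address. `LocalRootArgsOffset` and `ReadLocalRootArgs` are hypothetical helpers for illustration and operate on a plain byte copy of the table rather than on GPU memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct MeshNodesLocalStruct
{
    float blueChannel;
};

// Byte offset of a node's entry in the table: index * stride.
constexpr std::size_t LocalRootArgsOffset(std::uint32_t tableIndex,
                                          std::size_t strideInBytes)
{
    return static_cast<std::size_t>(tableIndex) * strideInBytes;
}

// Reads the entry for tableIndex out of a raw byte copy of the table.
// No bounds checking is done here; the caller must pass a valid index.
MeshNodesLocalStruct ReadLocalRootArgs(const std::vector<std::uint8_t>& table,
                                       std::uint32_t tableIndex,
                                       std::size_t strideInBytes)
{
    MeshNodesLocalStruct out{};
    std::memcpy(&out,
                table.data() + LocalRootArgsOffset(tableIndex, strideInBytes),
                sizeof(out));
    return out;
}
```

With the sample's three-entry table and a stride of sizeof(UINT), the node given table index 2 would read the third float, which is how nodes sharing the same program can still receive different local data.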

Extra assistance for drivers

See Helping mesh nodes work better on some hardware in the spec.

There are various declarations described there that may be required when using mesh nodes, some at the API and one as a shader annotation in mesh nodes. Which declarations the app needs to make depends on graph complexity. The simple sample shown here doesn’t happen to need most of them.

For complex graphs (definition of which is in the spec) the numbers being declared can be difficult for an application to define without being too conservative. This particularly applies to the [NodeMaxInputRecordsPerGraphEntryRecord(uint count,bool sharedAcrossNodeArray)] shader attribute, not needed in this simple sample, encoding maximum record counts that could arrive at mesh nodes.

Dynamic work expansion can result in extremely large worst-case numbers, which implies that an advanced mesh node implementation really shouldn’t care about them. The hope is that over the course of the mesh nodes preview, the driver implementations that depend on this assistance will mature beyond needing this help. Ideally, when mesh nodes exit preview into official release, these declarations can be removed, at least the most awkward ones like the HLSL attribute.

Executing the work graph on command list

This code binds state on the command list, initializes input records and kicks off the work graph.

It shows records being sent to the root of the graph as well as directly to the leaf mesh nodes (as the leaves were marked as entrypoints in the graph, for fun).

In this example the records were sent to the graph via CPU data, but the data could have alternatively been sent via GPU memory.

// Set necessary state.
m_commandList->SetGraphicsRootSignature(m_globalRootSignature.Get());
m_commandList->RSSetScissorRects(1, &m_scissorRect);

CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(
    m_rtvHeap->GetCPUDescriptorHandleForHeapStart(), 
    m_frameIndex, 
    m_rtvDescriptorSize);
m_commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, nullptr);

m_commandList->RSSetViewports(1, &m_viewport);

// Record commands.
const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f };
m_commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

float greenChannel = 0.5f;
m_commandList->SetGraphicsRoot32BitConstant(0, *(UINT*)&greenChannel, 0);

D3D12_SET_PROGRAM_DESC SP;
SP.Type = D3D12_PROGRAM_TYPE_WORK_GRAPH;
SP.WorkGraph.BackingMemory = m_workGraphContext.BackingMemory;
SP.WorkGraph.Flags = D3D12_SET_WORK_GRAPH_FLAG_INITIALIZE;
SP.WorkGraph.ProgramIdentifier = m_workGraphContext.hWorkGraph;
SP.WorkGraph.NodeLocalRootArgumentsTable = 
    m_workGraphContext.LocalRootArgumentsTable;
m_commandList->SetProgram(&SP);

struct BinningRecord
{
    uint16_t materialID;
    float position[4];
    float redChannel;
};

struct MeshNodeRecord
{
    float position[4];
    float redChannel;
};

std::vector<BinningRecord> rootInputRecord;

// Send some records to the root of the graph

rootInputRecord.push_back({0, {-0.5f, 0.2f, 0.0f, 1.0f}, 0.0f});
rootInputRecord.push_back({0, {-0.5f, 0.0f, 0.0f, 1.0f}, 0.5f});
rootInputRecord.push_back({0, {-0.5f, -0.2f, 0.0f, 1.0f}, 1.0f});
UINT totalRecords = (UINT)rootInputRecord.size();

std::vector<D3D12_NODE_CPU_INPUT> multiNodeInput(3);
assert(m_workGraphContext.spWGProps->GetEntrypointIndex(
    m_workGraphContext.WorkGraphIndex, 
    { L"Root",0 }) == 0);

multiNodeInput[0].EntrypointIndex = 0;
multiNodeInput[0].NumRecords = (UINT)rootInputRecord.size();
multiNodeInput[0].pRecords = rootInputRecord.data();
multiNodeInput[0].RecordStrideInBytes = sizeof(BinningRecord);

std::vector<MeshNodeRecord> meshInputRecord[2];

// Send some records directly to a couple of the mesh nodes

meshInputRecord[0].push_back({ {0.f, 0.2f, 0.0f, 1.0f}, 0.f });
meshInputRecord[0].push_back({ {0.f, 0.0f, 0.0f, 1.0f}, 0.5f });
meshInputRecord[0].push_back({ {0.f, -0.2f, 0.0f, 1.0f}, 1.f });
totalRecords += (UINT)meshInputRecord[0].size();

meshInputRecord[1].push_back({ {0.5f, 0.2f, 0.0f, 1.0f}, 0.f });
meshInputRecord[1].push_back({ {0.5f, 0.0f, 0.0f, 1.0f}, 0.5f });
meshInputRecord[1].push_back({ {0.5f, -0.2f, 0.0f, 1.0f}, 1.0f });
totalRecords += (UINT)meshInputRecord[1].size();

for (UINT i = 0; i < 2; i++)
{
    assert(m_workGraphContext.spWGProps->GetEntrypointIndex(
        m_workGraphContext.WorkGraphIndex, 
        { L"Materials",i+1 }) == i+2);
    multiNodeInput[i+1].EntrypointIndex = i+2;
    multiNodeInput[i+1].NumRecords = (UINT)meshInputRecord[i].size();
    multiNodeInput[i+1].pRecords = meshInputRecord[i].data();
    multiNodeInput[i+1].RecordStrideInBytes = sizeof(MeshNodeRecord);
}

assert(totalRecords <= m_workGraphContext.MaxInputRecords);

// Kick off work graph

D3D12_DISPATCH_GRAPH_DESC DG;
DG.Mode = D3D12_DISPATCH_MODE_MULTI_NODE_CPU_INPUT;
DG.MultiNodeCPUInput.NumNodeInputs = (UINT)multiNodeInput.size();
DG.MultiNodeCPUInput.pNodeInputs = multiNodeInput.data();
DG.MultiNodeCPUInput.NodeInputStrideInBytes = sizeof(D3D12_NODE_CPU_INPUT);

m_commandList->DispatchGraph(&DG);

ThrowIfFailed(m_commandList->Close());
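The listing above asserts that the total record count stays within the MaxInputRecords value that was declared through SetMaximumInputRecords(). A small helper can check both declared limits before a dispatch is built. The sketch below is plain C++ with no D3D12 dependency; `PlannedNodeInput` and `FitsDeclaredLimits` are hypothetical names and not part of the sample or the API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one node input in a multi-node dispatch.
struct PlannedNodeInput
{
    std::uint32_t entrypointIndex;
    std::uint32_t numRecords;
};

// Returns true when the planned dispatch stays within the limits declared
// up front: the number of node inputs and the total number of records.
bool FitsDeclaredLimits(const std::vector<PlannedNodeInput>& inputs,
                        std::uint32_t maxInputRecords,
                        std::uint32_t maxNodeInputs)
{
    if (inputs.size() > maxNodeInputs)
    {
        return false;
    }
    std::uint64_t totalRecords = 0; // 64-bit so the sum cannot wrap
    for (const PlannedNodeInput& in : inputs)
    {
        totalRecords += in.numRecords;
    }
    return totalRecords <= maxInputRecords;
}
```

For the sample's dispatch (three node inputs with three records each, against limits of 10 records and 3 node inputs), this check passes.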

Samples

Work graphs samples can be found at DirectX-Graphics-Samples on GitHub, under the Samples/Desktop/D3D12HelloWorld folder.

Relevant to this preview is the basic sample D3D12HelloMeshNodes that the above code is from: Samples/Desktop/D3D12HelloWorld/src/HelloMeshNodes

D3D12HelloMeshNodes sample

This sample is a tweak of the basic D3D12HelloTriangle sample that uses a work graph with mesh nodes to draw a few triangles. Each triangle’s color (in each color channel) is determined by various properties of a work graph execution, such as selecting different mesh nodes, some with different pixel shaders, varying data in the data record passed to the nodes, varying per-node local root arguments, as well as global root argument bindings.

AMD has also created some samples and an additional programming guide, which can be found here.

AMD Mesh Nodes samples

PIX

As usual, PIX has day one support for mesh nodes in work graphs. Please see this PIX blog post for more information.

PIX capturing AMD mesh nodes sample
