{"id":2612,"date":"2019-11-08T10:36:49","date_gmt":"2019-11-08T18:36:49","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/directx\/?p=2612"},"modified":"2019-11-08T10:36:49","modified_gmt":"2019-11-08T18:36:49","slug":"coming-to-directx-12-mesh-shaders-and-amplification-shaders-reinventing-the-geometry-pipeline","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/coming-to-directx-12-mesh-shaders-and-amplification-shaders-reinventing-the-geometry-pipeline\/","title":{"rendered":"Coming to DirectX 12\u2014 Mesh Shaders and Amplification Shaders:\u00a0Reinventing the\u00a0Geometry Pipeline\u00a0\u00a0"},"content":{"rendered":"<hr \/>\n<p><span data-contrast=\"auto\">D3D12 <\/span><span data-contrast=\"auto\">is<\/span><span data-contrast=\"auto\">\u00a0add<\/span><span data-contrast=\"auto\">ing<\/span><span data-contrast=\"auto\">\u00a0two new shader stages: the Mesh Shader and the Amplification Shader. These additions will streamline the rendering pipeline, while simultaneously boosting flexibility and efficiency.\u00a0 In\u00a0<\/span><span data-contrast=\"auto\">this\u00a0<\/span><span data-contrast=\"auto\">new and improved pre-rasterization pipeline, Mesh and Amplification Shaders\u00a0<\/span><span data-contrast=\"auto\">will\u00a0<\/span><span data-contrast=\"auto\">optionally<\/span><span data-contrast=\"auto\">\u00a0replace\u00a0<\/span><span data-contrast=\"auto\">the section of the pipeline consisting of\u00a0<\/span><span data-contrast=\"auto\">the Input Assembler<\/span><span data-contrast=\"auto\">\u00a0as well as<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">Vertex<\/span><span data-contrast=\"auto\">, Geometry, Domain, and Hull Shaders\u00a0<\/span><span data-contrast=\"auto\">with richer and more\u00a0<\/span><span data-contrast=\"auto\">general purpose<\/span><span data-contrast=\"auto\">\u00a0capabilities<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span 
data-contrast=\"none\">This is possible through a reimagination of how geometry is processed.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<hr \/>\n<h3>Topics<\/h3>\n<p><a href=\"#geo_pipe_now\">What does the geometry pipeline look like now?<\/a><\/p>\n<p><a href=\"#how_to_fix\">How can we fix it?<\/a><\/p>\n<p><a href=\"#MS_work\">How do Mesh Shaders work?<\/a><\/p>\n<p><a href=\"#AS_work\">What does an Amplification Shader do?<\/a><\/p>\n<p><a href=\"#what_is_meshlet\">What exactly is a meshlet?<\/a><\/p>\n<p><a href=\"#how_to_use_MS\">Now that I&#8217;m sold, how do I build a Mesh Shader?<\/a><\/p>\n<p><a href=\"#how_to_use_AS\">How to build an Amplification Shader<\/a><\/p>\n<p><a href=\"#calling_shaders\">Calling shaders in the runtime<\/a><\/p>\n<p><a href=\"#getting_started\">Getting Started<\/a><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"geo_pipe_now\"><\/a>What\u00a0<\/span><span data-contrast=\"none\">does the geometry pipeline look like now?<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">In current pipelines, geometry is processed whole. 
This means that for a mesh with hundreds of millions of triangles,\u00a0<\/span><span data-contrast=\"auto\">all the\u00a0<\/span><span data-contrast=\"auto\">v<\/span><span data-contrast=\"auto\">alues<\/span><span data-contrast=\"auto\">\u00a0in the index buffer<\/span><span data-contrast=\"auto\">\u00a0need to be processed\u00a0<\/span><span data-contrast=\"auto\">in order,<\/span><span data-contrast=\"auto\">\u00a0and\u00a0<\/span><span data-contrast=\"auto\">all the vertices of a triangle must be processed<\/span><span data-contrast=\"auto\">\u00a0before even culling can occur<\/span><span data-contrast=\"auto\">. Although not all geometry is that dense, we live in a world of increasing complexity, where users want more detail without sacrificing on speed. This means that a pipeline with a\u00a0<\/span><b><span data-contrast=\"auto\">linear bottleneck<\/span><\/b><span data-contrast=\"auto\">\u00a0like the index buffer is unsustainable.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Additionally, the process is rigid.\u00a0<\/span><span data-contrast=\"auto\">Because of the use of the index buffer, all index data must be 16 or 32 bits in size, and a single index value applies to all the vertex attributes at once.<\/span><span data-contrast=\"auto\">\u00a0Options for compressing geometry data are limited.\u00a0 Culling can be performed by software at the level of an entire draw call, or by hardware on a per-primitive basis only after all the vertices of a primitive have been shaded, but there are no in-between options. These are all\u00a0<\/span><span data-contrast=\"auto\">requirements that can limit how much a developer is able to do. 
For example,<\/span><span data-contrast=\"auto\">\u00a0what if you want to store separate bounding boxes for pieces of a larger mesh, then frustum cull each piece individually, or split up a mesh into groups of triangles that share similar normals, so an entire backfacing triangle group can be rejected up-front by a single test?\u00a0 How about moving per-triangle backface tests as early as possible in the geometry pipeline, which could allow skipping the cost of fetching vertex attributes for rejected triangles?\u00a0 Or implementing conservative animation-aware bounding box culling for small chunks of a mesh, which could run before the expensive skinning computations.\u00a0 With mesh shaders, these choices are entirely under you<\/span><span data-contrast=\"auto\">r<\/span><span data-contrast=\"auto\">\u00a0control.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"how_to_fix\"><\/a>How can we fix this?<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">In fact, we\u2019re not going to try. Mesh Shaders are not putting a band-aid onto a system that\u2019s\u00a0<\/span><span data-contrast=\"auto\">struggling to keep up<\/span><span data-contrast=\"auto\">.<\/span><span data-contrast=\"auto\">\u00a0Instead, they are reinventing the pipeline. 
By\u00a0<\/span><span data-contrast=\"auto\">using a compute\u00a0<\/span><span data-contrast=\"auto\">programming\u00a0<\/span><span data-contrast=\"auto\">model, the Mesh Shade<\/span><span data-contrast=\"auto\">r can process chunks of the mesh, which we call \u201cmeshlets\u201d, in parallel.\u00a0<\/span><span data-contrast=\"auto\">The threads that\u00a0<\/span><span data-contrast=\"auto\">process\u00a0<\/span><span data-contrast=\"auto\">each\u00a0<\/span><span data-contrast=\"auto\">meshlet can work together using groupshared memory to\u00a0<\/span><span data-contrast=\"auto\">read whatever format of input data\u00a0<\/span><span data-contrast=\"auto\">they choose\u00a0<\/span><span data-contrast=\"auto\">in whatever way they like<\/span><span data-contrast=\"auto\">, process the geometry, then output a small indexed primitive list<\/span><span data-contrast=\"auto\">. This means no\u00a0<\/span><span data-contrast=\"auto\">more linear\u00a0<\/span><span data-contrast=\"auto\">iterating through\u00a0<\/span><span data-contrast=\"auto\">the entire mesh, and\u00a0<\/span><span data-contrast=\"auto\">no limits imposed by the\u00a0<\/span><span data-contrast=\"auto\">more\u00a0<\/span><span data-contrast=\"auto\">rigid structure of previou<\/span><span data-contrast=\"auto\">s shader stages.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"MS_work\"><\/a>How do Mesh Shaders work?\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">A Mesh Shader begins its work by dispatching a\u00a0<\/span><span data-contrast=\"none\">set of\u00a0<\/span><span data-contrast=\"none\">threadgroup<\/span><span data-contrast=\"none\">s, each of which processes a 
subset of the larger mesh<\/span><span data-contrast=\"none\">. Each\u00a0<\/span><span data-contrast=\"none\">threadgroup<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"none\">ha<\/span><span data-contrast=\"none\">s<\/span><span data-contrast=\"none\">\u00a0access to groupshared memory like compute shaders<\/span><span data-contrast=\"none\">, but\u00a0<\/span><span data-contrast=\"none\">output<\/span><span data-contrast=\"none\">s<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"none\">vertices and primitives<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-contrast=\"none\">that<\/span><span data-contrast=\"none\">\u00a0do not have to correlate with a specific thread in the group. As long a<\/span><span data-contrast=\"none\">s the threadgroup processes all vertices associated with the primitives in the threadgroup, resources can be allocated in whatever way is most effici<\/span><span data-contrast=\"none\">e<\/span><span data-contrast=\"none\">nt. 
Additionally, t<\/span><span data-contrast=\"none\">he Mesh Shader outputs\u00a0<\/span><b><span data-contrast=\"none\">both\u00a0<\/span><\/b><span data-contrast=\"none\">per-vertex and per-primitive attributes, which allows the user to be more precise and space efficient.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"AS_work\"><\/a>What does an Amplification Shader do<\/span><span data-contrast=\"none\">?<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">While the Mesh Shader is a fairly flexible tool, it does not allow for all\u00a0<\/span><span data-contrast=\"none\">tessellation<\/span><span data-contrast=\"none\">\u00a0scenarios<\/span><span data-contrast=\"none\">\u00a0and is not always the most efficient way to implement per-instance culling<\/span><span data-contrast=\"none\">. For this we have the Amplification Shader. What it does is simple: dispatch\u00a0<\/span><span data-contrast=\"none\">threadgroups of\u00a0<\/span><span data-contrast=\"none\">Mesh Shaders.\u00a0<\/span><span data-contrast=\"none\">Each Mesh Shader has access to the data from the parent Amplification Shader and does not return anything. 
The Amplification Shader\u00a0<\/span><span data-contrast=\"none\">is optional, and a<\/span><span data-contrast=\"none\">lso has access to groupshared memory, making it a\u00a0<\/span><span data-contrast=\"none\">powerful tool to allow the Mesh Shader to replace any current pipeline scenario.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"what_is_meshlet\"><\/a>W<\/span><span data-contrast=\"none\">hat<\/span><span data-contrast=\"none\">\u00a0exactly<\/span><span data-contrast=\"none\">\u00a0is a Meshlet<\/span><span data-contrast=\"none\">?<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"auto\">A meshlet is a subset of a mesh created through an intentional partition of the geometry.\u00a0<\/span><span data-contrast=\"auto\">M<\/span><span data-contrast=\"auto\">eshlets\u00a0<\/span><span data-contrast=\"auto\">should be<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">somewhere in the range of 32 to around 200 vertices<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">\u00a0depending on the number of attributes<\/span><span data-contrast=\"auto\">,<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">and\u00a0<\/span><span data-contrast=\"auto\">will\u00a0<\/span><span data-contrast=\"auto\">have<\/span><span data-contrast=\"auto\">\u00a0as many shared vertices as possible<\/span><span data-contrast=\"auto\">\u00a0to allow for vertex re-use during rendering.<\/span><span data-contrast=\"auto\">\u00a0<\/span><span data-contrast=\"auto\">This partitioning will be\u00a0<\/span><span data-contrast=\"auto\">pre-computed and stored with the geometry to avoid 
computation at runtime<\/span><span data-contrast=\"auto\">,\u00a0<\/span><span data-contrast=\"auto\">unlike the current Input Assembler which must attempt to dynamically identify vertex reuse every time a mesh is drawn<\/span><span data-contrast=\"auto\">. Titles can convert meshlets into regular index buffers for vertex shader fallback if a device does not support Mesh Shaders.\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"how_to_use_MS\"><\/a>Now that I\u2019m sold, how do I use this feature?<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">Building a Mesh Shader is fairly simple.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">You must specify the number of threads in your thread group using\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">[ numthreads ( X, Y, Z ) ]<\/pre>\n<p><span data-contrast=\"none\">And the type of primitive being used with\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">[ outputtopology ( T ) ]<\/pre>\n<p><span data-contrast=\"none\">The Mesh Shader can take a number of system values as inputs, including <span class=\"lang:default decode:true crayon-inline \">SV_DispatchThreadID<\/span> , <span class=\"lang:default decode:true crayon-inline \">SV_GroupThreadID<\/span> , <span class=\"lang:default decode:true crayon-inline\">SV_ViewID<\/span>\u00a0<\/span><span data-contrast=\"none\">and more, but must output an array for vertices and one for primitives.\u00a0<\/span><span 
data-contrast=\"none\">These are the arrays that you will write to at the end of your computations.\u00a0<\/span><span data-contrast=\"none\">If the Mesh Shader is attached to an Amplification Shader, it must also have an input for the payload.\u00a0<\/span><span data-contrast=\"none\">The final requirement\u00a0<\/span><span data-contrast=\"none\">is\u00a0<\/span><span data-contrast=\"none\">that you must set the\u00a0<\/span><span data-contrast=\"none\">number of primitives and vertices that the Mesh Shader will export. You do this by calling\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">SetMeshOutputCounts (\u00a0uint numVertices,\u00a0uint numPrimitives\u00a0)<\/pre>\n<p><span data-contrast=\"none\">This function\u00a0<\/span><b><span data-contrast=\"none\">must<\/span><\/b><span data-contrast=\"none\">\u00a0be called\u00a0<\/span><b><span data-contrast=\"none\">exactly once<\/span><\/b><b><span data-contrast=\"none\">\u00a0<\/span><\/b><span data-contrast=\"none\">in the Mesh Shader before the output arrays are written to. If this does not happen, the Mesh Shader will not output any data.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Beyond these rules, there is so much flexibility in what you can do. 
Here is an example Mesh Shader; more information and examples can be found in the spec.<\/span><\/p>\n<pre title=\"Defining specs\" class=\"lang:default decode:true\">#define MAX_MESHLET_SIZE 128 \r\n#define GROUP_SIZE MAX_MESHLET_SIZE \r\n#define ROOT_SIG \"CBV(b0), \\ \r\n    CBV(b1), \\ \r\n    CBV(b2), \\ \r\n    SRV(t0), \\ \r\n    SRV(t1), \\ \r\n    SRV(t2), \\ \r\n    SRV(t3)\"<\/pre>\n<pre title=\"Setting up buffers\" class=\"lang:default decode:true\">struct Meshlet \r\n{ \r\n    uint32_t VertCount; \r\n    uint32_t VertOffset; \r\n    uint32_t PrimCount; \r\n    uint32_t PrimOffset; \r\n \r\n    float3 AABBMin; \r\n    float3 AABBMax; \r\n    float4 NormalCone; \r\n};\r\n \r\nstruct MeshInfo \r\n{ \r\n    uint32_t IndexBytes; \r\n    uint32_t MeshletCount; \r\n    uint32_t LastMeshletSize; \r\n}; \r\n \r\nConstantBuffer&lt;Constants&gt;   Constants : register(b0); \r\nConstantBuffer&lt;Instance&gt;    Instance : register(b1); \r\nConstantBuffer&lt;MeshInfo&gt;    MeshInfo : register(b2); \r\nStructuredBuffer&lt;Vertex&gt;    Vertices : register(t0); \r\nStructuredBuffer&lt;Meshlet&gt;   Meshlets : register(t1); \r\nByteAddressBuffer           UniqueVertexIndices : register(t2); \r\nStructuredBuffer&lt;uint&gt;      PrimitiveIndices : register(t3);<\/pre>\n<pre title=\"Helper functions\" class=\"lang:default decode:true\">uint3 GetPrimitive(Meshlet m, uint index) \r\n{ \r\n    \/\/ Three 10-bit meshlet-local vertex indices are packed into one uint. \r\n    uint primitiveIndex = PrimitiveIndices[m.PrimOffset + index]; \r\n    return uint3(primitiveIndex &amp; 0x3FF, (primitiveIndex &gt;&gt; 10) &amp; 0x3FF, (primitiveIndex &gt;&gt; 20) &amp; 0x3FF);  \r\n}\r\n \r\nuint GetVertexIndex(Meshlet m, uint localIndex) \r\n{ \r\n    localIndex = m.VertOffset + localIndex; \r\n    if (MeshInfo.IndexBytes == 4) \/\/ 32-bit Vertex Indices \r\n    { \r\n        return UniqueVertexIndices.Load(localIndex * 4); \r\n    } \r\n    else \/\/ 16-bit Vertex Indices \r\n    { \r\n        \/\/ Byte address must be 
4-byte aligned. \r\n        uint wordOffset = (localIndex &amp; 0x1); \r\n        uint byteOffset = (localIndex \/ 2) * 4; \r\n \r\n        \/\/ Grab the pair of 16-bit indices, shift &amp; mask off proper 16-bits. \r\n        uint indexPair = UniqueVertexIndices.Load(byteOffset); \r\n        uint index = (indexPair &gt;&gt; (wordOffset * 16)) &amp; 0xffff; \r\n \r\n        return index; \r\n    } \r\n} \r\n \r\nVertexOut GetVertexAttributes(uint meshletIndex, uint vertexIndex) \r\n{ \r\n    Vertex v = Vertices[vertexIndex]; \r\n\r\n    float4 positionWS = mul(float4(v.Position, 1), Instance.World); \r\n \r\n    VertexOut vout; \r\n    vout.PositionVS   = mul(positionWS, Constants.View).xyz; \r\n    vout.PositionHS   = mul(positionWS, Constants.ViewProj); \r\n    vout.Normal       = mul(float4(v.Normal, 0), Instance.WorldInvTrans).xyz; \r\n    vout.MeshletIndex = meshletIndex; \r\n \r\n    return vout; \r\n} \r\n<\/pre>\n<p>&nbsp;<\/p>\n<pre title=\"Example Mesh Shader\" class=\"lang:default decode:true\">[RootSignature(ROOT_SIG)] \r\n[NumThreads(GROUP_SIZE, 1, 1)] \r\n[OutputTopology(\"triangle\")] \r\nvoid main( \r\n    uint gtid : SV_GroupThreadID, \r\n    uint gid : SV_GroupID, \r\n    out indices uint3 tris[MAX_MESHLET_SIZE], \r\n    out vertices VertexOut verts[MAX_MESHLET_SIZE] \r\n) \r\n{ \r\n    Meshlet m = Meshlets[gid]; \r\n    SetMeshOutputCounts(m.VertCount, m.PrimCount); \r\n    if (gtid &lt; m.PrimCount) \r\n    { \r\n        tris[gtid] = GetPrimitive(m, gtid); \r\n    } \r\n \r\n    if (gtid &lt; m.VertCount) \r\n    { \r\n        uint vertexIndex = GetVertexIndex(m, gtid); \r\n        verts[gtid] = GetVertexAttributes(gid, vertexIndex); \r\n    } \r\n}<\/pre>\n<hr \/>\n<h4><span data-contrast=\"none\"><a id=\"how_to_use_AS\"><\/a>How to build an Amplification Shader<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">Amplification Shaders 
are similarly easy to start using.\u00a0<\/span><span data-contrast=\"none\">If you choose to use an Amplification Shader, you only have to specify the number of\u00a0<\/span><span data-contrast=\"none\">threads<\/span><span data-contrast=\"none\">\u00a0per group<\/span><span data-contrast=\"none\">, using\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">[ numthreads ( X, Y, Z ) ]<\/pre>\n<p><span data-contrast=\"none\">You\u00a0<\/span><span data-contrast=\"none\">m<\/span><span data-contrast=\"none\">ay issue 0 or 1 calls to<\/span><span data-contrast=\"none\">\u00a0dispatch your Mesh Shaders using<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">DispatchMesh ( ThreadGroupCountX, ThreadGroupCountY, ThreadGroupCountZ, MeshPayload )<\/pre>\n<p><span data-contrast=\"none\">Beyond this, you can choose to use groupshared memory; how you apply the feature is up to you and the needs of your project.<\/span><span data-contrast=\"none\">\u00a0Here is a simple example to get you started:\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">struct payloadStruct\r\n{ \r\n    uint myArbitraryData; \r\n}; \r\n \r\n[numthreads(1,1,1)] \r\nvoid AmplificationShaderExample(in uint3 groupID : SV_GroupID)    \r\n{ \r\n    payloadStruct p; \r\n    p.myArbitraryData = groupID.z; \r\n    DispatchMesh(1,1,1,p);\r\n}<\/pre>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"calling_shaders\"><\/a>Calling Shaders in the Runtime\u00a0<\/span><span 
data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">To use Mesh Shaders on the API side, make sure to call CheckFeatureSupport as follows\u00a0<\/span><span data-contrast=\"none\">to ensure that Mesh Shaders are available on your device<\/span><span data-contrast=\"none\">:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><span data-contrast=\"none\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true \">D3D12_FEATURE_DATA_D3D12_OPTIONS7 featureData = {};  \r\npDevice-&gt;CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7, &amp;featureData, sizeof(featureData)); \r\n \r\nif ( featureData.MeshShaderTier &gt;= D3D12_MESH_SHADER_TIER_1 ) { \r\n  \/\/ Mesh Shaders are supported on this device. \r\n}<\/pre>\n<p><span data-contrast=\"none\">Additionally,\u00a0<\/span><span data-contrast=\"none\">the Pipeline State Object must be compliant with the restrictions of Mesh Shaders, meaning that no incompatible shaders can be attached (Vertex, Geometry, Hull, or Domain), IA and streamout must be disabled,\u00a0<\/span><span data-contrast=\"none\">and\u00a0<\/span><span data-contrast=\"none\">your pixel shader, if provided,\u00a0<\/span><span data-contrast=\"none\">must\u00a0<\/span><span data-contrast=\"none\">be<\/span><span data-contrast=\"none\">\u00a0DXIL<\/span><span data-contrast=\"none\">. 
Shaders can be attached to a <span class=\"lang:default decode:true crayon-inline \">D3D12_PIPELINE_STATE_STREAM_DESC\u00a0<\/span>\u00a0<\/span><span data-contrast=\"none\">struct with the types <span class=\"lang:default decode:true crayon-inline \">CD3DX12_PIPELINE_STATE_STREAM_AS\u00a0<\/span>\u00a0<\/span><span data-contrast=\"none\">and <span class=\"lang:default decode:true crayon-inline \">CD3DX12_PIPELINE_STATE_STREAM_MS<\/span>\u00a0<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:285}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">To call the shader, run\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:285}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">DispatchMesh(ThreadGroupCountX, ThreadGroupCountY, ThreadGroupCountZ)<\/pre>\n<p><span data-contrast=\"none\">Which will launch either the Mesh Shader or the Amplification Shader if it is present. 
You can also use\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:285}\">\u00a0<\/span><\/p>\n<pre class=\"lang:default decode:true\">void ExecuteIndirect(  \r\n    ID3D12CommandSignature *pCommandSignature,  \r\n    UINT MaxCommandCount,  \r\n    ID3D12Resource *pArgumentBuffer,  \r\n    UINT64 ArgumentBufferOffset,  \r\n    ID3D12Resource *pCountBuffer,  \r\n    UINT64 CountBufferOffset );<\/pre>\n<p><span data-contrast=\"none\">To launch the shaders from the GPU instead of the CPU.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:1,&quot;335559739&quot;:160,&quot;335559740&quot;:285}\">\u00a0<\/span><\/p>\n<hr \/>\n<h4 aria-level=\"2\"><span data-contrast=\"none\"><a id=\"getting_started\"><\/a>Getting Started<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h4>\n<p><span data-contrast=\"none\">To use Mesh Shaders and Amplification Shaders in your application, install the latest Windows 10 Insider Preview build and SDK Preview Build for\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/windows-insider\/flight-hub\/#in-development-builds-of-windows-10-20h1\"><span data-contrast=\"none\">Windows 10 (20H1)<\/span><\/a><span data-contrast=\"none\">\u00a0from the\u00a0<\/span><a href=\"https:\/\/insider.windows.com\/\"><span 
data-contrast=\"none\">Windows Insider Program<\/span><\/a><span data-contrast=\"none\">. You\u2019ll also need to download and use the latest DirectX Shader Compiler. Finally, because this feature relies on GPU hardware support, you\u2019ll need to contact GPU vendors to find out specifics regarding supported hardware and drivers.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">You can find more information in the Mesh Shader specification, located here: <a href=\"https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/MeshShader.html\">https:\/\/microsoft.github.io\/DirectX-Specs\/d3d\/MeshShader.html<\/a>.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>D3D12 is\u00a0adding\u00a0two new shader stages: the Mesh Shader and the Amplification Shader. These additions will streamline the rendering pipeline, while simultaneously boosting flexibility and efficiency.\u00a0 In\u00a0this\u00a0new and improved pre-rasterization pipeline, Mesh and Amplification Shaders\u00a0will\u00a0optionally\u00a0replace\u00a0the section of the pipeline consisting of\u00a0the Input Assembler\u00a0as well as\u00a0Vertex, Geometry, Domain, and Hull Shaders\u00a0with richer and more\u00a0general purpose\u00a0capabilities.\u00a0This is possible [&hellip;]<\/p>\n","protected":false},"author":10279,"featured_media":12651,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2612","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>D3D12 is\u00a0adding\u00a0two new shader stages: the Mesh Shader and the Amplification Shader. 
These additions will streamline the rendering pipeline, while simultaneously boosting flexibility and efficiency.\u00a0 In\u00a0this\u00a0new and improved pre-rasterization pipeline, Mesh and Amplification Shaders\u00a0will\u00a0optionally\u00a0replace\u00a0the section of the pipeline consisting of\u00a0the Input Assembler\u00a0as well as\u00a0Vertex, Geometry, Domain, and Hull Shaders\u00a0with richer and more\u00a0general purpose\u00a0capabilities.\u00a0This is possible [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2612","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/10279"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=2612"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/2612\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/12651"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=2612"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=2612"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=2612"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}