{"id":3278,"date":"2020-11-23T12:15:05","date_gmt":"2020-11-23T20:15:05","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/directx\/?p=3278"},"modified":"2020-11-23T12:21:43","modified_gmt":"2020-11-23T20:21:43","slug":"in-the-works-hlsl-shader-model-6-6","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/directx\/in-the-works-hlsl-shader-model-6-6\/","title":{"rendered":"In the works: HLSL Shader Model 6.6"},"content":{"rendered":"<p><a href=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k.jpg\"><img decoding=\"async\" class=\"size-medium wp-image-845 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k-300x300.jpg\" alt=\"\" width=\"300\" height=\"300\" srcset=\"https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k-300x300.jpg 300w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k-150x150.jpg 150w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k-768x768.jpg 768w, https:\/\/devblogs.microsoft.com\/directx\/wp-content\/uploads\/sites\/42\/2017\/01\/XII_BLACK_1kx1k.jpg 1024w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Microsoft and its partners are happy to announce the development of Shader Model 6.6, the latest advancement in HLSL capability.<\/p>\n<p>Shader Model 6.6 will grant shader developers increased flexibility to enhance and expand existing rendering approaches and devise all new ones.\nNew features include <strong>expanded atomic operations<\/strong>, <strong>dynamic resource binding<\/strong>, <strong>derivatives and samples in compute shaders<\/strong>, <strong>packed 8-bit computations<\/strong>, and <strong>wave size.<\/strong><\/p>\n<h2>Feature Details<\/h2>\n<h3>64-bit Integer Atomic Operations<\/h3>\n<p>Shader Model 
6.6 will introduce the ability to perform atomic arithmetic, bitwise, and exchange\/store operations on 64-bit values.<\/p>\n<p>All the following atomic intrinsic functions and methods will take 64-bit values when used on <span style=\"font-family: terminal, monaco, monospace;\">RWByteAddressBuffer<\/span> and <span style=\"font-family: terminal, monaco, monospace;\">RWStructuredBuffer<\/span> types in all shader stages:<\/p>\n<pre class=\"prettyprint\">void InterlockedAdd(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedAnd(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedOr(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedXor(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedMin(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedMax(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedExchange(inout BufType dest, int64_t value, out int64_t orig);\r\nvoid InterlockedCompareStore(inout BufType dest,  int64_t cmpval, int64_t value);\r\nvoid InterlockedCompareExchange(inout BufType dest, int64_t cmpval, int64_t value,\r\n                                 out int64_t orig);<\/pre>\n<p>Where <span style=\"font-family: terminal, monaco, monospace;\">RWByteAddressBuffer<\/span> methods are concerned, each of these will have a <span style=\"font-family: terminal, monaco, monospace;\">*64<\/span>\u00a0suffix to indicate the expected type.<\/p>\n<p>Shader Model 6.6 will include optional support for other resource and variable types. Typed resources, including writeable typed buffers and textures, will be supported where the <span style=\"font-family: terminal, monaco, monospace;\">AtomicInt64OnTypedResourceSupported<\/span> option is set. 
Shared memory <span style=\"font-family: terminal, monaco, monospace;\">groupshared<\/span> variables will be supported where <span style=\"font-family: terminal, monaco, monospace;\">AtomicInt64OnGroupSharedSupported<\/span>\u00a0is set.<\/p>\n<p>&nbsp;<\/p>\n<h3>Integer Atomics on Float-Typed Resources<\/h3>\n<p>Shader Model 6.6 will introduce support for using floating point values in the existing integer compare and exchange intrinsic functions. The functions that use compares use bitwise compares and not true floating point compares:<\/p>\n<pre class=\"prettyprint\">void InterlockedExchange(inout BufType dest, float value, out float orig);\r\nvoid InterlockedCompareStoreFloatBitwise(inout BufType dest, float cmpval, float value);\r\nvoid InterlockedCompareExchangeFloatBitwise(inout BufType dest, float cmpval, float value,\r\n                                             out float orig);<\/pre>\n<p><span style=\"font-family: terminal, monaco, monospace;\">InterlockedExchange<\/span> was an existing intrinsic function that has been extended to include floats. Since it involves no compare, no new suffix was needed. The <span style=\"font-family: terminal, monaco, monospace;\">RWByteAddressBuffer<\/span> version was given a <span style=\"font-family: terminal, monaco, monospace;\">*Float<\/span> suffix to indicate the intended type.<\/p>\n<p>&nbsp;<\/p>\n<h3>Dynamic Resource Binding<\/h3>\n<p>Shader Model 6.6 will introduce the ability to create resources from descriptors by directly indexing into the CBV_SRV_UAV heap or the Sampler heap. This resource creation method eliminates the need for root signature descriptor table mapping but requires new global root signature flags to indicate the use of each heap.<\/p>\n<p>The feature is exposed as two new builtin global indexable objects: <span style=\"font-family: terminal, monaco, monospace;\">ResourceDescriptorHeap<\/span> and <span style=\"font-family: terminal, monaco, monospace;\">SamplerDescriptorHeap<\/span>. 
Indexing these global objects returns an internal handle object. This object can be assigned to temporary resource or sampler objects without requiring resource binding locations or mapping through root signature descriptor tables.<\/p>\n<pre class=\"prettyprint\">&lt;resource variable&gt; = ResourceDescriptorHeap[uint index];\r\n&lt;sampler variable&gt; = SamplerDescriptorHeap[uint index];<\/pre>\n<p>The assigned variable must match the heap type of the indexed array.<\/p>\n<p>&nbsp;<\/p>\n<h3>Compute Shader Derivatives and Samples<\/h3>\n<p>Shader Model 6.6 will introduce derivative and sample intrinsic functions to compute shaders. Previous shader models restricted these functions to pixel shaders.<\/p>\n<p>Derivative operations depend on 2&#215;2 quads. Compute shaders don&#8217;t have quads. So in order to map these functions to a compute shader which views data as a serial sequence, we&#8217;ve defined the quads these functions operate on according to the compute shader lane index. One quad consists of the first four elements in the lane index sequence in left-to-right and then top-to-bottom order. Another quad similarly consists of the next four and so on. 
This gives the 2&#215;2 quads that the following intrinsic functions operate on.<\/p>\n<p>The derivative functions added:<\/p>\n<pre class=\"prettyprint\">T ddx(in T value)\r\nT ddx_coarse(in T value)\r\nT ddy(in T value)\r\nT ddy_coarse(in T value)\r\nT ddx_fine(in T value)\r\nT ddy_fine(in T value)<\/pre>\n<p>The sample functions added:<\/p>\n<pre class=\"prettyprint\">float TexObject::CalculateLevelOfDetail( SamplerState sampler_state, F pos )\r\nfloat TexObject::CalculateLevelOfDetailUnclamped( SamplerState sampler_state, F pos )\r\nR TexObject::Sample( SamplerState sampler_state, F location)\r\nR TexObject::SampleBias( SamplerState sampler_state, F location, float Bias)\r\nfloat TexObject::SampleCmp( SamplerComparisonState S, F location, float cmpval)<\/pre>\n<p>These operations will be optionally available for Amplification and Mesh shader stages where the <span style=\"font-family: terminal, monaco, monospace;\">DerivativesInMeshAndAmplificationShadersSupported<\/span>\u00a0capability bit is set.<\/p>\n<p>&nbsp;<\/p>\n<h3>Packed 8-Bit Operations<\/h3>\n<p>Shader Model 6.6 will add a new set of intrinsic functions for processing packed 8-bit data. These are useful to reduce bandwidth usage where lower precision calculations are acceptable.<\/p>\n<p>These are the new data types representing a vector of packed 8-bit values:<\/p>\n<pre class=\"prettyprint\">uint8_t4_packed \/\/ 4 packed uint8_t values in a uint32_t\r\nint8_t4_packed \/\/ 4 packed int8_t values in a uint32_t<\/pre>\n<p>These new types can be cast to and from <span style=\"font-family: terminal, monaco, monospace;\">uint32_t<\/span> values without a change in the bitwise representation.<\/p>\n<p>The pack intrinsic functions allow packing a vector of 4 signed or unsigned values into a packed 32-bit value represented by the new packed data types. 
One version performs a datatype clamp and the other simply drops the unused bits.<\/p>\n<pre class=\"prettyprint\">uint8_t4_packed pack_u8(uint32_t4 unpackedVal); \/\/ Pack lower 8 bits, drop unused bits\r\nint8_t4_packed pack_s8(int32_t4 unpackedVal); \/\/ Pack lower 8 bits, drop unused bits\r\n\r\nuint8_t4_packed pack_u8(uint16_t4 unpackedVal); \/\/ Pack lower 8 bits, drop unused bits\r\nint8_t4_packed pack_s8(int16_t4 unpackedVal); \/\/ Pack lower 8 bits, drop unused bits\r\n\r\nuint8_t4_packed pack_clamp_u8(int32_t4 unpackedVal); \/\/ Pack and Clamp [0, 255]\r\nint8_t4_packed pack_clamp_s8(int32_t4 unpackedVal); \/\/ Pack and Clamp [-128, 127]\r\n\r\nuint8_t4_packed pack_clamp_u8(int16_t4 unpackedVal); \/\/ Pack and Clamp [0, 255]\r\nint8_t4_packed pack_clamp_s8(int16_t4 unpackedVal); \/\/ Pack and Clamp [-128, 127]<\/pre>\n<p>To unpack 32-bit values representing 4 8-bit values into a vector of 16-bit or 32-bit signed or unsigned values:<\/p>\n<pre class=\"prettyprint\">int16_t4 unpack_s8s16(int8_t4_packed packedVal); \/\/ Sign Extended\r\nuint16_t4 unpack_u8u16(uint8_t4_packed packedVal); \/\/ Non-Sign Extended\r\n\r\nint32_t4 unpack_s8s32(int8_t4_packed packedVal); \/\/ Sign Extended\r\nuint32_t4 unpack_u8u32(uint8_t4_packed packedVal); \/\/ Non-Sign Extended\r\n\r\n<\/pre>\n<h3>Wave Size<\/h3>\n<p>Shader Model 6.6 will introduce a new compute shader attribute that allows the shader author to specify a wave size that the compute shader is compatible with.<\/p>\n<p>This feature allows the application to guarantee that a shader will be run at the required wave size. With this attribute, DirectX 12 runtime validation will fail if shaders in a pipeline state object have a required wave size that is not in the range reported by the driver. Because use of this feature limits shader flexibility, we recommend it only for shaders that are compatible with a single wave size.<\/p>\n<p>The required wave size is specified by an attribute before the entry function. 
The allowed wave sizes that an HLSL shader may specify are the powers of 2 between 4 and 128, inclusive. In other words, the set: <span style=\"font-family: terminal, monaco, monospace;\">[4, 8, 16, 32, 64, 128]<\/span>.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"prettyprint\">[WaveSize(&lt;numLanes&gt;)]\r\nvoid main() ...\r\n\r\n<\/pre>\n<p><span style=\"font-family: terminal, monaco, monospace;\">&lt;numLanes&gt;<\/span> must be an immediate integer value of an allowed wave size.<\/p>\n<h2>Development<\/h2>\n<p>Shader Model 6.6 is a work in progress.<\/p>\n<p>Initial compiler implementation of these features will be submitted to the <a href=\"https:\/\/github.com\/microsoft\/DirectXShaderCompiler\">DirectXShaderCompiler GitHub repository<\/a> at https:\/\/github.com\/microsoft\/DirectXShaderCompiler. We will continue making additional improvements there over the coming months. Hardware vendors will be implementing driver backend support in parallel. Once complete, we will formally release Shader Model 6.6 and developers can take full advantage of it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Microsoft and its partners are happy to announce the development of Shader Model 6.6, the latest advancement in HLSL capability. Shader Model 6.6 will grant shader developers increased flexibility to enhance and expand existing rendering approaches and devise all new ones. 
New features include expanded atomic operations, dynamic resource binding, derivatives and samples in [&hellip;]<\/p>\n","protected":false},"author":45155,"featured_media":845,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3278","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-directx"],"acf":[],"blog_post_summary":"<p>&nbsp; Microsoft and its partners are happy to announce the development of Shader Model 6.6, the latest advancement in HLSL capability. Shader Model 6.6 will grant shader developers increased flexibility to enhance and expand existing rendering approaches and devise all new ones. New features include expanded atomic operations, dynamic resource binding, derivatives and samples in [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/3278","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/users\/45155"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/comments?post=3278"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/posts\/3278\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media\/845"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/media?parent=3278"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/categories?post=3278"},{"taxonomy":"post_tag","embeddable":true,"hr
ef":"https:\/\/devblogs.microsoft.com\/directx\/wp-json\/wp\/v2\/tags?post=3278"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}