Agility SDK 1.606.3: Shader Model 6.7 is now publicly available!

Greg Roth

The DirectX Compiler Team and our partners are pleased to announce the release of Shader Model 6.7!

Shader Model 6.7 expands texture fetching, quad querying, and wave capabilities to enable ever more complex (and compatible) new shader-driven features!

Advanced Texture Operations

SM 6.7 adds a collection of useful texture capabilities, collectively referred to as Advanced Texture Operations, that fill gaps in the capabilities of existing texture operations and add versatile new ones. This is an optional feature.

Integer Sampling

Textures with integer components can now be sampled. To enable this, we’ve created a new way to describe samplers with integer type border colors.

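typedef struct D3D12_SAMPLER_DESC2 {
  D3D12_FILTER Filter; // Most of this is the same as D3D12_SAMPLER_DESC
  D3D12_TEXTURE_ADDRESS_MODE AddressU;
  D3D12_TEXTURE_ADDRESS_MODE AddressV;
  D3D12_TEXTURE_ADDRESS_MODE AddressW;
  FLOAT MipLODBias;
  UINT MaxAnisotropy;
  D3D12_COMPARISON_FUNC ComparisonFunc;
  union {
    FLOAT FloatBorderColor[4];
    UINT UintBorderColor[4]; // <--- This is new!
  };
  FLOAT MinLOD;
  FLOAT MaxLOD;
  D3D12_SAMPLER_FLAGS Flags; // <-- This indicates you are using the new thing!
} D3D12_SAMPLER_DESC2;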

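Where Flags is the enum:

typedef enum D3D12_SAMPLER_FLAGS {
  D3D12_SAMPLER_FLAG_NONE = 0x0,
  D3D12_SAMPLER_FLAG_UINT_BORDER_COLOR = 0x1 // Use UintBorderColor instead of FloatBorderColor
} D3D12_SAMPLER_FLAGS;
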
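For static samplers, you can now specify new values for the BorderColor field of D3D12_STATIC_SAMPLER_DESC, which has the type D3D12_STATIC_BORDER_COLOR:

typedef enum D3D12_STATIC_BORDER_COLOR {
  D3D12_STATIC_BORDER_COLOR_TRANSPARENT_BLACK = 0,
  D3D12_STATIC_BORDER_COLOR_OPAQUE_BLACK,
  D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE,
  D3D12_STATIC_BORDER_COLOR_OPAQUE_BLACK_UINT, // <--- New!
  D3D12_STATIC_BORDER_COLOR_OPAQUE_WHITE_UINT  // <--- New!
} D3D12_STATIC_BORDER_COLOR;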

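The union is the reason Flags matters: FloatBorderColor and UintBorderColor occupy the same 16 bytes, so the same bits mean very different things depending on which member is intended. The following C++ sketch mirrors only that part of the descriptor using hypothetical stand-in types (BorderColorPart, SamplerFlags, MakeFloatBorder, MakeUintBorder, and RawBorderBits are illustrative names, not D3D12 API). It shows, for example, that a float border channel of 1.0 is stored as 0x3F800000, which an integer interpretation would read as 1065353216.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins mirroring only the border-color part of
// D3D12_SAMPLER_DESC2; real code should include d3d12.h instead.
enum SamplerFlags : uint32_t {
  kSamplerFlagNone = 0x0,
  kSamplerFlagUintBorderColor = 0x1,  // mirrors D3D12_SAMPLER_FLAG_UINT_BORDER_COLOR
};

struct BorderColorPart {
  union {
    float FloatBorderColor[4];
    uint32_t UintBorderColor[4];
  };
  SamplerFlags Flags;
};

// Float border color: the FLOAT member is written and no flag is set.
BorderColorPart MakeFloatBorder(float r, float g, float b, float a) {
  BorderColorPart d{};
  d.FloatBorderColor[0] = r;
  d.FloatBorderColor[1] = g;
  d.FloatBorderColor[2] = b;
  d.FloatBorderColor[3] = a;
  d.Flags = kSamplerFlagNone;
  return d;
}

// Integer border color: the UINT member is written and the flag says so.
BorderColorPart MakeUintBorder(uint32_t r, uint32_t g, uint32_t b, uint32_t a) {
  BorderColorPart d{};
  d.UintBorderColor[0] = r;
  d.UintBorderColor[1] = g;
  d.UintBorderColor[2] = b;
  d.UintBorderColor[3] = a;
  d.Flags = kSamplerFlagUintBorderColor;
  return d;
}

// Copies the 16 shared bytes out as integers. The union is the first member
// of a standard-layout struct, so it starts at offset 0; memcpy avoids
// reading an inactive union member directly.
void RawBorderBits(const BorderColorPart& d, uint32_t out[4]) {
  std::memcpy(out, &d, sizeof(uint32_t) * 4);
}
```

In real code the same idea applies to D3D12_SAMPLER_DESC2 itself: whenever you write UintBorderColor, also set Flags to D3D12_SAMPLER_FLAG_UINT_BORDER_COLOR.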
Raw Gather

Existing gather operations retrieve only a single channel of the sampled elements, so retrieving all the channels of an element requires multiple gathers. Additionally, these elements undergo implicit conversions and other processing that the programmer has no control over.

Raw gathers give control to the author by retrieving the raw element data, including all channels, without any conversion. Like other gathers, they return the four elements that a bilinear-filtered Sample at the same location would use. Each element is returned as an unsigned integer whose size matches the size of the full element, with the channels packed as specified by the underlying format.

Raw gather grants the author full access to the contents of the texture and full control on how to process it.
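Which four elements a gather returns is determined by the bilinear footprint of the sample location. A minimal C++ sketch of that footprint, assuming CLAMP addressing and a single width by height MIP level (Texel and GatherFootprint are illustrative names). The half-texel offset and the counter-clockwise ordering starting from the lower-left texel follow the usual D3D gather convention for assigning results to .x, .y, .z, and .w.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Texel { int x; int y; };

// Returns the 2x2 texel footprint a gather reads for a normalized (u, v),
// assuming CLAMP addressing on one MIP level. Results are ordered
// counter-clockwise from the lower-left texel (.x, .y, .z, .w).
std::array<Texel, 4> GatherFootprint(float u, float v, int width, int height) {
  assert(width > 0 && height > 0);
  // Texel centers lie at half-integer coordinates, hence the -0.5 offset.
  const int x0 = static_cast<int>(std::floor(u * width - 0.5f));
  const int y0 = static_cast<int>(std::floor(v * height - 0.5f));
  auto cx = [width](int x) { return std::clamp(x, 0, width - 1); };
  auto cy = [height](int y) { return std::clamp(y, 0, height - 1); };
  return {{
      {cx(x0), cy(y0 + 1)},      // .x: lower-left
      {cx(x0 + 1), cy(y0 + 1)},  // .y: lower-right
      {cx(x0 + 1), cy(y0)},      // .z: upper-right
      {cx(x0), cy(y0)},          // .w: upper-left
  }};
}
```

For a 4x4 texture sampled at (0.5, 0.5), this returns texels (1,2), (2,2), (2,1), and (1,1). A raw gather returns the unconverted contents of exactly those elements.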

Casting Resources to Unsigned Integer Views

To get the full benefit of raw gathers, you'll need the Relaxed Format Casting D3D12 feature, which lets you create unsigned integer views of resources of other formats. It is available when both D3D12_FEATURE_DATA_D3D12_OPTIONS12::EnhancedBarriersSupported and D3D12_FEATURE_DATA_D3D12_OPTIONS12::RelaxedFormatCastingSupported are TRUE. To create a resource that can be cast to an unsigned integer format, use the new ID3D12Device10::CreateCommittedResource3 method (or the similar CreatePlacedResource2 and CreateReservedResource2 methods) and provide the list of view formats it may be cast to using the NumCastableFormats and pCastableFormats parameters. The unsigned integer cast target must be the same size as a full element of the created resource.

    HRESULT CreateCommittedResource3(
        const D3D12_HEAP_PROPERTIES* pHeapProperties,
        D3D12_HEAP_FLAGS HeapFlags,
        const D3D12_RESOURCE_DESC1* pDesc,
        D3D12_BARRIER_LAYOUT InitialLayout,
        const D3D12_CLEAR_VALUE* pOptimizedClearValue,
        ID3D12ProtectedResourceSession* pProtectedSession,
        UINT32 NumCastableFormats,
        DXGI_FORMAT *pCastableFormats,
        REFIID riidResource,
        void** ppvResource);

These resource views are then used in the shader to retrieve the raw texture data.
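The size requirement can be checked ahead of time when building the pCastableFormats list. A C++ sketch using a hypothetical Fmt enum as a stand-in for DXGI_FORMAT (Fmt, BytesPerElement, IsUintView, and MeetsRawGatherSizeRule are illustrative names). It encodes only the same-size rule stated above, so passing this check does not by itself guarantee that the runtime accepts a particular cast; that also depends on Relaxed Format Casting support.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of formats for illustration only; real code would use
// DXGI_FORMAT values and rely on the runtime's own validation.
enum class Fmt {
  R8G8B8A8_UNORM, R16G16_FLOAT, R16_FLOAT, R16G16B16A16_FLOAT,
  R16_UINT, R32_UINT, R32G32_UINT,
};

// Size in bytes of one full element (all channels) of each format.
constexpr uint32_t BytesPerElement(Fmt f) {
  switch (f) {
    case Fmt::R16_FLOAT:
    case Fmt::R16_UINT:
      return 2;
    case Fmt::R8G8B8A8_UNORM:
    case Fmt::R16G16_FLOAT:
    case Fmt::R32_UINT:
      return 4;
    case Fmt::R16G16B16A16_FLOAT:
    case Fmt::R32G32_UINT:
      return 8;
  }
  return 0;
}

constexpr bool IsUintView(Fmt f) {
  return f == Fmt::R16_UINT || f == Fmt::R32_UINT || f == Fmt::R32G32_UINT;
}

// The same-size rule: an unsigned integer cast target must match the size of
// a full element of the resource being cast.
constexpr bool MeetsRawGatherSizeRule(Fmt resource, Fmt view) {
  return IsUintView(view) && BytesPerElement(resource) == BytesPerElement(view);
}
```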

Raw Gather Built-in Shader Function

For example, an R32_UINT resource view could be created for an R8G8B8A8 texture, and the shader could then raw gather through that view into four 32-bit unsigned integers holding the raw R8G8B8A8 data. The author is then free to use that data however they wish.

The simplest raw gather overload is:

uint32_t4 TexObj2D.GatherRaw(SamplerState S, float2 location);

Where the return type is a vector of four 16-, 32-, or 64-bit unsigned integers, matching the texture's declared element type, and the type of location depends on the texture object type.

An example of using it to gather 4 32-bit values for a R8G8B8A8 texture:

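Texture2D<uint32_t> R8G8B8A8Tex : register(t0); // R32_UINT SRV aliased to a R8G8B8A8 resource
SamplerState samp : register(s0);

uint32_t4 main(float2 uv : TEXCOORD0) : SV_Target {
  // Each component holds one full R8G8B8A8 element packed into 32 bits.
  uint32_t4 elements = R8G8B8A8Tex.GatherRaw(samp, uv);
  return elements;
}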

The elements variable will then contain the four elements selected by the sampler state and the uv location, each packed into a 32-bit integer as four 8-bit values representing the RGBA channels. 2D texture arrays can also be used. Where 16-bit and 64-bit integers are supported, formats of those sizes can be cast to the corresponding integer views and raw gathered as well, using similar shader code. The number of channels and their layout depend on the underlying resource format.
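Once the raw elements are in hand, the shader unpacks them itself with shifts and masks, applied to each component of elements. The same arithmetic is shown here as a C++ sketch so it can be checked on the CPU. It assumes DXGI's R8G8B8A8 layout, in which R occupies the least significant byte and A the most significant. Rgba8, UnpackR8G8B8A8, and UnormToFloat are illustrative names, and UnormToFloat assumes the data was stored as UNORM.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba8 { uint8_t r, g, b, a; };

// Splits one raw R8G8B8A8 element into its channels: R is bits 0-7,
// G bits 8-15, B bits 16-23, and A bits 24-31.
constexpr Rgba8 UnpackR8G8B8A8(uint32_t raw) {
  return Rgba8{
      static_cast<uint8_t>(raw & 0xFFu),
      static_cast<uint8_t>((raw >> 8) & 0xFFu),
      static_cast<uint8_t>((raw >> 16) & 0xFFu),
      static_cast<uint8_t>((raw >> 24) & 0xFFu)};
}

// Converts one 8-bit UNORM channel to [0, 1], the conversion a regular
// Sample would otherwise have applied implicitly.
constexpr float UnormToFloat(uint8_t c) { return c / 255.0f; }
```

For example, a raw value of 0x11223344 unpacks to r = 0x44, g = 0x33, b = 0x22, a = 0x11. Because the conversion is now explicit, the author can skip it, reinterpret the bits, or apply a different decoding.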

Raw Gather Limitations

Raw gathers require Enhanced Barriers to support the full set of formats the feature was designed for. Enhanced Barriers are available in our preview Agility SDK. Without Enhanced Barriers and Relaxed Format Casting support, only the following formats, which could previously be cast to uint views, will be castable:

Format            Castable to View Format
R32G32_TYPELESS   R32G32_UINT (serves as a 64-bit uint)
R32G32_SINT       R32G32_UINT

Programmable Offsets

Existing sample and load operations require their offsets to be immediate integers, so programmers had to settle on the offset values before the shader was even compiled. To say the least, this limited their usefulness.

Shader Model 6.7 allows the offset arguments of the full suite of sample and load operations to be variable values, just as they can be in gather operations. The effective range remains [-8,7], because only the 4 least significant bits of each provided offset value are respected. The full list of affected resource methods:

Load( int3 Location, int2 Offset, [out uint Status] ); // Location.z selects the MIP level
Sample( SamplerState S, float2 Location, int2 Offset,
        [float Clamp], [out uint Status] );
SampleBias( SamplerState S, float2 Location, float Bias,
            int2 Offset, [float Clamp], [out uint Status] );
SampleCmp( SamplerComparisonState S, float2 Location,
           float CompareValue, int2 Offset, [float Clamp], [out uint Status] );
SampleCmpLevelZero( SamplerComparisonState S, float2 Location,
                    float CompareValue, int2 Offset, [out uint Status] );
SampleGrad( SamplerState S, float2 Location, float2 DDX, float2 DDY,
            int2 Offset, [float Clamp], [out uint Status] );
SampleLevel( SamplerState S, float2 Location, float LOD, int2 Offset,
             [out uint Status]);

Note that the above list assumes a Texture2D resource. For other resources, the int2 and float2 types for offset and location might be different sized vectors depending on the method’s object.
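Keeping only the 4 least significant bits means out-of-range offsets wrap rather than clamp. A C++ sketch of that mapping; treating the 4 bits as a signed two's complement value is an assumption, though it is the reading consistent with the stated [-8,7] range. EffectiveOffset is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Maps a programmable offset component to its effective value, assuming
// the low 4 bits are interpreted as a 4-bit two's complement number.
constexpr int EffectiveOffset(int32_t offset) {
  const int low4 = static_cast<int>(static_cast<uint32_t>(offset) & 0xFu);
  return (low4 ^ 0x8) - 0x8;  // sign-extend bit 3
}
```

Under this reading an offset of 9 behaves like -7 and 16 behaves like 0, so shaders that compute offsets at run time should keep them within [-8,7] if they expect the values to be used as written.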

Explicit Sample Compare Level

Previously, to perform a sample-and-compare operation, you could either use the default SampleCmp, which uses a MIP level determined by the location gradients, or access level zero using SampleCmpLevelZero. This left an obvious gap: there was no way to explicitly select a smaller MIP level. Shader Model 6.7 adds SampleCmpLevel, which simply lets you specify the level you want to sample and compare against:

SampleCmpLevel( SamplerComparisonState S, float2 Location,
                float CompareValue, float LOD, [int2 Offset], [out uint Status]);

As before, this is the Texture2D method, and the vector sizes of Location and Offset will vary for other resource types. Like the pre-existing sample operations, SampleCmpLevel supports the programmable offsets feature of 6.7, so its offset parameter can be variable as well.

Cube map variants are also included; they take a slightly different signature, with no Offset parameter:

Format TextureCube::SampleCmpLevel( SamplerComparisonState S, float3 Location,
                                   float CompareValue, float LOD, [out uint Status]);
Format TextureCubeArray::SampleCmpLevel( SamplerComparisonState S, float4 Location,
                                        float CompareValue, float LOD, [out uint Status]);
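To make the difference from SampleCmpLevelZero concrete, here is a CPU reference sketch of comparison sampling at an explicitly chosen level. It assumes point filtering, CLAMP addressing, and a LESS_EQUAL-style comparison that passes when the compare value is less than or equal to the stored depth (the usual shadow-map convention). It also simplifies the LOD to an integer level index. DepthMip and SampleCmpLevelPoint are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel depth MIP level, for illustration only.
struct DepthMip {
  int width;
  int height;
  std::vector<float> texels;  // row-major, width * height values
};

// Point-filtered comparison at an explicit MIP level with CLAMP addressing.
// Returns 1.0 when compareValue <= stored depth, otherwise 0.0.
float SampleCmpLevelPoint(const std::vector<DepthMip>& mips, float u, float v,
                          float compareValue, int level) {
  assert(!mips.empty());
  level = std::clamp(level, 0, static_cast<int>(mips.size()) - 1);
  const DepthMip& m = mips[static_cast<std::size_t>(level)];
  const int x = std::clamp(static_cast<int>(std::floor(u * m.width)), 0, m.width - 1);
  const int y = std::clamp(static_cast<int>(std::floor(v * m.height)), 0, m.height - 1);
  const float stored = m.texels[static_cast<std::size_t>(y) * m.width + x];
  return compareValue <= stored ? 1.0f : 0.0f;
}
```

With a 2x2 level 0 whose top-left texel is 0.2 and a 1x1 level 1 holding 0.5, comparing 0.4 at uv (0.25, 0.25) fails at level 0 but passes at level 1. An explicit level lets the shader choose between those answers directly.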

Writable Multisampled Textures

We’ve introduced writable multisampled texture types to permit targeted changes to multisampled textures beyond their use as render targets. Previously, multisampled textures could only be bound as render targets or read-only inputs. Writable multisampled textures let shader authors alter individual samples, perhaps focusing on key areas while leaving areas of less interest to the default behavior.

The following resource types with corresponding methods can be used to bind and write to multisampled texture information:

RWTexture2DMS<Type, Samples>
RWTexture2DMSArray<Type, Samples>

The Type and Samples template variables represent the HLSL type of the resource and the number of samples. Unlike read-only multisampled texture resource types, they are required.

These new types can be accessed much like their read-only counterparts: the single index operator [loc] references sample index 0 at location loc, and the .sample[samp][loc] operator accesses sample index samp at location loc. The difference is that, in addition to being readable as before, they can be assignment targets:

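RWTexture2DMS<float4, 4> g_ms;
float4 main(float4 pos : SV_Position) : SV_Target {
  uint2 loc = (uint2)pos.xy;              // integer texel coordinates
  g_ms[loc] = GetZeroSamp(loc);           // writes sample 0 (GetZeroSamp is a placeholder)
  g_ms.sample[1][loc] = GetOneSamp(loc);  // writes sample 1 (GetOneSamp is a placeholder)
  return 0;
}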


QuadAny and QuadAll

We’ve added two new quad operations to HLSL that check whether a condition is true for any or all of the lanes within a quad. As with all quad operations, this is best illustrated with cats in boxes:

[Figure: cat-in-box illustrations of quad operations. Top row: IsHelperLane and QuadReadAcrossDiagonal. Middle row: QuadReadAcrossX and QuadReadAcrossY. Bottom row: QuadAny and QuadAll.]

(Sorry they don’t look much like cats😕 Engineer art 😆)

The bottom row shows the new queries, which efficiently determine whether a given expression is true for any or all of the lanes in the current quad.

Quads can return whether any or all of them evaluate an expression to true
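The semantics can be modeled on the CPU as a per-quad reduction whose result is broadcast back to every lane in the quad. This C++ sketch assumes that lanes 4q through 4q+3 form quad q and that all lanes are active. Real hardware lane-to-quad mapping is abstracted away, and QuadAnyModel and QuadAllModel are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each lane receives true if the condition holds on any lane of its quad.
std::vector<bool> QuadAnyModel(const std::vector<bool>& cond) {
  assert(cond.size() % 4 == 0);
  std::vector<bool> out(cond.size());
  for (std::size_t q = 0; q < cond.size(); q += 4) {
    const bool any = cond[q] || cond[q + 1] || cond[q + 2] || cond[q + 3];
    for (std::size_t i = 0; i < 4; ++i) out[q + i] = any;
  }
  return out;
}

// Each lane receives true only if the condition holds on all lanes of its quad.
std::vector<bool> QuadAllModel(const std::vector<bool>& cond) {
  assert(cond.size() % 4 == 0);
  std::vector<bool> out(cond.size());
  for (std::size_t q = 0; q < cond.size(); q += 4) {
    const bool all = cond[q] && cond[q + 1] && cond[q + 2] && cond[q + 3];
    for (std::size_t i = 0; i < 4; ++i) out[q + i] = all;
  }
  return out;
}
```

Because every lane in a quad gets the same answer, branching on QuadAny or QuadAll keeps the whole quad on the same path. This is what makes the result safe to use around derivative-dependent operations.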

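QuadAny can efficiently resolve problems caused by non-quad-uniform control flow. Sample computes implicit derivatives from the neighboring lanes of its quad, so calling it inside a branch that only some lanes of a quad take, as the first version does, leaves the derivatives undefined. The second version uses QuadAny so that the whole quad performs the sample whenever any of its lanes needs the result:

// Before: Sample is called in non-quad-uniform control flow.
// t0, s0, modifyuv and SCREEN_X are assumed to be declared elsewhere.
float4 main(float x : X, float2 uv : TEXCOORD0) : SV_Target {
  float4 ret = 0;
  float2 temp_uv = modifyuv(uv);
  if (x > SCREEN_X)
    ret = t0.Sample(s0, temp_uv);
  return ret;
}

// After: the whole quad samples if any lane in it needs the result.
float4 main(float x : X, float2 uv : TEXCOORD0) : SV_Target {
  float4 ret = 0;
  if (QuadAny(x > SCREEN_X)) {
    float2 temp_uv = modifyuv(uv);
    float4 sampled_result = t0.Sample(s0, temp_uv);
    if (x > SCREEN_X)
      ret = sampled_result;
  }
  return ret;
}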

Helper Lanes in Wave Ops Mode

Sometimes you need a little help. For those times, Shader Model 6.7 introduces the WaveOpsIncludeHelperLanes attribute!

Helper lanes previously contributed only to derivative calculations, not to wave operations. That meant that derivative operations depending on values or control flow derived from wave operations had undefined results. Combined with the IsHelperLane() query added in Shader Model 6.6, this attribute gives developers full control over how wave ops interact with and behave on helper lanes. This control allows derivative operations to be used reliably in the presence of wave operations.

This is effected by applying the WaveOpsIncludeHelperLanes attribute to the shader entry function:

[WaveOpsIncludeHelperLanes] // Yes, it's just this easy!
float4 main(float2 uv : TEXCOORD0) : SV_Target ... // helper lanes exist only in pixel shaders

Lanes will be created as or demoted to helper lanes exactly as they were before. The only effect of this attribute is that they will now contribute to the wave functions and, as such, they will persist until the last wave operation is complete so that they can contribute to those results. So some helper lanes might stick around a bit longer than they used to.
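The attribute changes which lanes feed a wave operation, not which lanes become helpers. This C++ sketch models that for a reduction such as WaveActiveSum. It assumes every listed lane is otherwise active and ignores how long helper lanes persist. WaveActiveSumModel is an illustrative name, not an HLSL intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// isHelper[i] marks helper lanes. Without [WaveOpsIncludeHelperLanes] they
// are left out of the reduction; with it they contribute like other lanes.
int WaveActiveSumModel(const std::vector<int>& values,
                       const std::vector<bool>& isHelper,
                       bool includeHelperLanes) {
  assert(values.size() == isHelper.size());
  int sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (includeHelperLanes || !isHelper[i]) sum += values[i];
  }
  return sum;
}
```

For a quad with values {1, 2, 3, 4} where the last lane is a helper, the model sums to 6 without the attribute and 10 with it. In the shader, IsHelperLane() is what lets you tell those contributions apart when it matters.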

Great! How Can I Try It Out?

I love your enthusiasm! You need three things:

  • A compiler that allows you to compile shader model 6.7 shaders
  • A runtime SDK that allows you to use the new D3D interfaces and recognizes the compiled shaders
  • A hardware-specific driver to run those shaders

You can get the compiler from the July 2022 DXC release. To compile a shader that uses the new features, specify a shader target ending in _6_7, such as ps_6_7.

You’ll need to use either the 1.606.3 Agility SDK or the 1.706.3 preview Agility SDK with your build.

Finally, you can get preview drivers for the following platforms: 


AMD support for Shader Model 6.7 will be publicly released in an upcoming AMD Radeon Software Adrenalin release. The AMD Adrenalin driver for 1.706.3 preview Agility SDK can be used until then:  


Developers interested in working with the latest Public and Preview Agility SDKs on NVIDIA hardware should reach out to their NVIDIA representative for more details. 


A preview driver for Intel® Arc™ Graphics Family (DG2) cards is available at:  

Support for additional Intel® graphic cards will be available in future driver updates. 


For more details about these features, see the documentation and specs in DirectX-Specs.

1 comment


  • tangogu

    Based on Programmable Offsets section of this blog:

    The DXIL documentation needs an update for this, since it currently mentions only constants: “Offset input parameters are i8 constants in [-8,+7] range; default offset is 0.”
    HLSL Sample document also needs to update for this. Currently, it requires texture offsets to be “static”:

    [in] An optional texture coordinate offset, which can be used for any texture-object type; the offset is applied to the location before sampling. The texture offsets need to be static. The argument type is dependent on the texture-object type. For more info, see Applying texture coordinate offsets.

