May 27th, 2025

HLSL Native and Long Vectors

Greg Roth
Dev Lead

Vectors have been supported as native primitive types in HLSL from the beginning. However, they have been limited to a maximum of 4 elements. This was reasonable for the use cases for which HLSL was designed, since 3D vertices, 3D vectors, and RGBA colors can all be fully represented using 4 or fewer scalar values.

Other applications, particularly in the machine learning (ML) space, use vectors in ways that don’t map directly to these concepts, and they benefit from longer vectors. Sometimes they benefit from much longer vectors! These uses arise principally, though not exclusively, through Cooperative Vectors.

To meet these needs, Shader Model 6.9 drastically increases the limit on vector length from 4 to 1024, a 256x increase! This comes with a few usage constraints and some changes to DXIL needed to accommodate them. Hereafter, vectors of more than 4 elements will be referred to as “long vectors” and the preexisting shorter vectors as “short vectors”.

A Few Words on Native Vectors

Long vectors are made possible by a conceptually straightforward and mostly invisible feature: Native Vectors.

While vectors have been supported in HLSL since its inception, they are not directly representable in DXIL 1.8 and earlier. Instead, vectors were converted into scalar values (“scalarized”) that were each loaded, stored, and otherwise operated on as individual values.

Native Vectors allow the vectors in HLSL to be represented as such in DXIL 1.9 shaders. This means they will be loaded and stored from and to supported resources and operated on using native vector DXIL operations that retain the vector primitive in the DXIL interchange format.
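As a rough mental model of the difference, the two hypothetical functions below compute the same sum. The first spells out, at the HLSL level, what scalarization amounts to; the second leaves the vector intact, which is the form DXIL 1.9 can retain. This is only an illustration with made-up function names, not actual DXIL and not how the compiler is implemented.

```hlsl
// Conceptual model only; AddScalarized and AddNative are illustrative names.

// What scalarization amounts to: each element is handled as its own value.
float3 AddScalarized(float3 A, float3 B) {
  float R0 = A[0] + B[0];
  float R1 = A[1] + B[1];
  float R2 = A[2] + B[2];
  return float3(R0, R1, R2);
}

// What a native vector representation preserves: one operation on the
// whole vector, left for the consuming platform to map onto its hardware.
float3 AddNative(float3 A, float3 B) {
  return A + B;
}
```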

Native Vectors share most of the restrictions that apply to Long Vectors, which are described in more detail below:

  • Only loadable and storable from and to raw buffers.
  • Only usable as local, static global, or groupshared variables.
  • Only usable with DXIL intrinsics that represent elementwise operations.

Other uses will scalarize the vectors as needed to perform the operations and then package the results back into a vector.

While Native Vectors are largely invisible to the HLSL author, they are not without some potential functional changes. By representing vector operations as native vectors, the platforms that consume DXIL can potentially better optimize the operations for their specific hardware. Additionally, representing them as vector operations avoids repeating the same operation on each scalar, which should result in more compact output DXIL.

For more information about Native Vectors, see the specification.

Declaring Long Vectors

Long vectors use an existing but underappreciated syntax for declaring vector types. Since adding the full list of “shorthand” vector types from float5 to float1024, and the same range for all other types, would be cumbersome and potentially confusing, the template-style type declaration is the only way to declare vectors of more than 4 elements:

vector< TYPE, SIZE > VectorName;

Here, TYPE is the scalar type for the elements of the vector, SIZE is the number of elements in the vector, and VectorName is the user-chosen name of the vector. Note that the way a vector is declared has no effect on how it can be used or how it is compiled to the target interchange format. Long vectors can be used with existing short vectors in whatever ways they are compatible, regardless of whether the short vector was declared using the template or shorthand syntax. When targeting DXIL 1.9, short and long vectors alike will be represented as native vectors regardless of declaration style.
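As a minimal sketch of the two declaration styles side by side, assuming a Shader Model 6.9 target and a compiler with long vector support: the function and variable names below are illustrative only, and the initializer-list form is an assumption that should be checked against the specification.

```hlsl
// Illustrative sketch only; every name here is hypothetical.

float4 FirstFourScaled(vector<float, 6> Long, float Scale) {
  // Shorthand (float4) and template (vector<float, 4>) declarations
  // name the same 4-element type and can be mixed freely.
  float4 Head = float4(Long[0], Long[1], Long[2], Long[3]);
  vector<float, 4> Scaled = Head * Scale;
  return Scaled;
}

vector<float, 6> MakeRamp() {
  // Beyond 4 elements, the template form is the only spelling available.
  // Initializer-list construction is assumed to be supported here.
  vector<float, 6> Ramp = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0};
  return Ramp;
}
```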

Long vectors can be declared as local, static global, or groupshared variables, as user function parameters, and as user function return types. They cannot currently be used in cbuffers or as any part of the shader signature. This includes parameters or return types of any entry functions. Parameters and return types of exported library functions or user utility functions within another target shader are allowed as they will ultimately be inlined.
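The placement rules above might look like the following in a small compute shader. This is a sketch under assumptions: it targets Shader Model 6.9, and the buffer, variable, and function names are all hypothetical. The disallowed cbuffer case is left commented out.

```hlsl
// Sketch only; Output, Bias, Shared, and AddBias are hypothetical names.
RWByteAddressBuffer Output;

// Static global (initializer-list form assumed to be supported).
static vector<float, 8> Bias = {1, 1, 1, 1, 1, 1, 1, 1};

// Groupshared variable.
groupshared vector<float, 8> Shared;

// Not allowed per the rules above: a long vector inside a cbuffer.
// cbuffer Params { vector<float, 8> Rejected; };

// User function with a long vector parameter and return type.
vector<float, 8> AddBias(vector<float, 8> In) {
  return In + Bias;
}

// The entry function's signature contains no long vectors,
// only a scalar system value.
[numthreads(1, 1, 1)]
void main(uint GI : SV_GroupIndex) {
  vector<float, 8> Local = AddBias(Bias); // local variable
  Shared = Local;
  GroupMemoryBarrierWithGroupSync();
  Output.Store<vector<float, 8> >(0, Shared);
}
```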

Operating on Long Vectors

Long Vectors can be used with all native operators, including arithmetic, assignment, and comparison operators. They can also be used with HLSL intrinsics that perform elementwise operations. An elementwise operation is one that conceptually performs the same operation on each element of all of the parameters, where the indices of the element operands match. For such intrinsics where the output is a vector whose size corresponds to that of the parameters, the result for each element is assigned to the result element at the corresponding index.

Consider a mental model of the implementation of an imaginary intrinsic “dogoodthings()”:

template<typename T, int N>
vector<T, N> dogoodthings(vector<T,N> AVec, vector<T,N> BVec) {
  vector<T,N> Res;
  for(int Ix = 0; Ix < N; Ix++)
    Res[Ix] = dogoodthings(AVec[Ix], BVec[Ix]); // scalar overload
  return Res;
}

This is only a conceptual model of what an elementwise operation does. In practice, the implementation will likely perform these operations at least somewhat in parallel.

Elementwise intrinsics can be unary, such as the trigonometric intrinsics (cos, sin, tan, etc.), binary (e.g. max and min), or ternary (e.g. mad). Most of these return the same types as the parameters they take, but some perform some kind of reduction or other conversion, such as dot and countbits. For these, the output is not the elementwise result in the type of the inputs, but a value derived from it.
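To make the categories above concrete, here is a short sketch that applies operators and unary, binary, and ternary intrinsics to 16-element vectors, followed by a comparison and a reduction. It assumes a Shader Model 6.9 target; the function names are hypothetical, and the exact set of intrinsics that accept long vectors should be confirmed in the specification.

```hlsl
// Sketch only; Blend, IsLess, and Dot16 are hypothetical names.

vector<float, 16> Blend(vector<float, 16> A, vector<float, 16> B,
                        vector<float, 16> T) {
  vector<float, 16> Sum  = A + B;     // arithmetic operator, per element
  vector<float, 16> Hi   = max(A, B); // binary intrinsic, per element
  vector<float, 16> Wave = sin(Sum);  // unary intrinsic, per element
  return mad(Wave, T, Hi);            // ternary: Wave * T + Hi per element
}

// A comparison operator yields one bool per element.
vector<bool, 16> IsLess(vector<float, 16> A, vector<float, 16> B) {
  return A < B;
}

// A reduction: the article names dot as an intrinsic whose result is not
// a per-element vector. Here it returns a single scalar.
float Dot16(vector<float, 16> A, vector<float, 16> B) {
  return dot(A, B);
}
```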

Loading and Storing Long Vectors

However useful the elementwise operations are, if they can’t be performed on vectors loaded from interesting external input and stored to usable external output, we’ll still have scalarization to contend with.

Long vectors can be used for static and groupshared global variables, which allows sharing between function scopes or threads in the same shader, but to actually interact with external memory, we need to employ resources. Long vectors aren’t a good match for the tight restrictions of typed buffers, but they work well with raw buffers (both ByteAddressBuffers and StructuredBuffers). For these, the templated load and store operations on ByteAddressBuffers or the subscript operators on StructuredBuffers can be used to retrieve vectors of a specified size and element type from the raw buffer storage:


RWByteAddressBuffer BABVectors;
RWStructuredBuffer<vector<float,15> > StVectors;

...
  // Loads
  vector<float,15> AVec = BABVectors.Load<vector<float,15> >(/*offset*/0);
  vector<float,15> BVec = StVectors[0];
  // Stores
  BABVectors.Store<vector<float,15> >(sizeof(AVec), BVec);
  StVectors[1] = AVec;

Note that a space is required between the closing > of the vector declaration and the closing > of the Load or Store specialization due to the ambiguity of >> in the current HLSL language version.
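One way to sidestep the spacing issue is to name the vector type once with a typedef, so that no nested closing > appears at the use sites. This is a common HLSL idiom rather than something the article prescribes. The alias name Float15 and the entry point are hypothetical, and the sketch assumes a Shader Model 6.9 compute target.

```hlsl
// Sketch only; the alias Float15 is a hypothetical name.
typedef vector<float, 15> Float15;

RWByteAddressBuffer BABVectors;
RWStructuredBuffer<Float15> StVectors;

[numthreads(1, 1, 1)]
void main() {
  // Loads
  Float15 AVec = BABVectors.Load<Float15>(/*offset*/0);
  Float15 BVec = StVectors[0];
  // Stores: write BVec just past the vector that was loaded from offset 0.
  BABVectors.Store<Float15>(sizeof(Float15), BVec);
  StVectors[1] = AVec;
}
```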

For more information about declaring, operating on, loading, and storing Long Vectors, see the specification.

Category
HLSL
