Today DirectX 12 provides APIs to support GPU video encode acceleration for several applications, as detailed in D3D12 Video Encoding – Windows drivers | Microsoft Learn and in previous blog posts such as Announcing new DirectX 12 feature – Video Encoding! – DirectX Developer Blog.
In this blog post we’re happy to announce a series of new features included in the Agility SDK 1.716.0-preview that provide more control to apps using the D3D12 Video Encode API. These new features help reduce latency and improve quality in several scenarios.
New feature list
- Subregion notifications: Slice/tile partial encoding and async completion signaling
- Dirty regions: Configurable skip encoding for frame regions
- Motion vector hints provided externally to the encoder
- Enhanced frame/block statistics per encoded frame
- HEVC 4:2:2/4:4:4 profiles support
- Readable DPB reconstructed pictures
- Input ID3D12Resource QP Map
Let’s take a look at each of the features. Except for the HEVC additions, these are codec-agnostic features, meaning that the interfaces are defined without using any codec-specific structures. For a detailed description of the features and their interfaces, please refer to the new video specs uploaded to DirectX-Specs | Engineering specs for DirectX features.
Subregion notifications
When encoding a frame partitioned into multiple slices or tiles, apps previously had to wait for the GPU to finish executing the encode commands before they could access the compressed bitstream buffer for the entire frame containing all slices.
With the new subregion notifications feature, EncodeFrame can now write each slice/tile into a separate ID3D12Resource object (or into a single suballocated buffer), and apps can wait for completion on independent ID3D12Fence objects, one for each of the frame subregions. The full frame metadata is still reported at the end of the frame encoding in ResolveEncoderOutputMetadata as usual, but the subregion offsets/sizes are reported asynchronously during EncodeFrame as each subregion's completion is signaled, allowing apps to start consuming the finished compressed bitstream buffers while the rest of the subregions are still being encoded. This helps reduce latency in scenarios such as streaming, where subregions can now be sent over the network while the remaining subregions are still being encoded.
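To make the single-buffer option concrete, here is a minimal sketch of the offset arithmetic an app might use to suballocate one large bitstream buffer into per-subregion ranges. The actual D3D12 structures for subregion notifications are defined in the DirectX-Specs video encoding spec; this helper, its name, and the alignment parameter are illustrative assumptions, not part of the API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a D3D12 API): place each slice/tile bitstream
// at an aligned offset inside one shared buffer, given the maximum
// compressed size the app reserves for each subregion.
std::vector<uint64_t> SuballocateSubregionOffsets(
    const std::vector<uint64_t>& maxSubregionSizes, uint64_t alignment)
{
    std::vector<uint64_t> offsets;
    offsets.reserve(maxSubregionSizes.size());
    uint64_t cursor = 0;
    for (uint64_t size : maxSubregionSizes)
    {
        // Round the cursor up to the next aligned offset.
        cursor = (cursor + alignment - 1) / alignment * alignment;
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

Each range would then be paired with its own ID3D12Fence; once a subregion's fence signals (e.g. via ID3D12Fence::SetEventOnCompletion), the app can read that range and, in a streaming scenario, put it on the wire immediately.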
Dirty regions
In scenarios where most regions of a video don’t change between consecutive frames (e.g. screen sharing), and the app knows which regions did change (the dirty regions), it is now possible to feed this information to the D3D12 Video Encoder, so the encoder can “skip” regions that didn’t change, accelerating the encode operation. This helps improve encoding speed compared to having the encoder re-scan the entire frame. To avoid having to stall the CPU/GPU between encoded frames when the dirty regions are produced on the GPU timeline, the API supports both CPU and GPU buffer inputs, but initial driver support will be for CPU input only.
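As an illustration of where dirty regions come from, the sketch below diffs two frames block by block on the CPU and reports the blocks that changed. The block size, the frame layout (8-bit single-plane), and the function itself are assumptions for illustration; the actual dirty-region input structures are defined in the DirectX-Specs video encoding spec.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Position of a dirty block, in pixels (top-left corner).
struct BlockRect { uint32_t x, y; };

// Hypothetical helper: compare two single-plane 8-bit frames and return
// the list of blockSize x blockSize blocks whose pixels changed.
std::vector<BlockRect> FindDirtyBlocks(const std::vector<uint8_t>& prev,
                                       const std::vector<uint8_t>& curr,
                                       uint32_t width, uint32_t height,
                                       uint32_t blockSize)
{
    std::vector<BlockRect> dirty;
    for (uint32_t by = 0; by < height; by += blockSize)
        for (uint32_t bx = 0; bx < width; bx += blockSize)
        {
            bool changed = false;
            for (uint32_t y = by; y < by + blockSize && y < height && !changed; ++y)
                for (uint32_t x = bx; x < bx + blockSize && x < width; ++x)
                    if (prev[y * width + x] != curr[y * width + x])
                    {
                        changed = true;
                        break;
                    }
            if (changed) dirty.push_back({bx, by});
        }
    return dirty;
}
```

In a screen-sharing app this diff typically comes for free from the desktop duplication/present path rather than a brute-force comparison; the point is only that the resulting rect list is what gets handed to the encoder.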
Motion vector hints
When the application driving the D3D12 video encoder has knowledge about the motion of the content being encoded, such as a scroll in a shared screen with an open document, or when the app rendering the content knows the exact motion vectors, it can now feed motion vector hints to the encoder. These hints help accelerate the motion search process and guide the encoder toward higher quality results. As with dirty regions, the API supports both GPU and CPU inputs for these motion hints; however, initial driver support will be for CPU inputs only.
Enhanced frame statistics
Three new optional stats can be collected at the end of the ResolveEncoderOutputMetadata execution on the GPU: the quantization parameter (QP map) used per block, the sum of absolute transformed differences (SATD) per block, and the rate-control bit allocation per block. These stats are provided as ID3D12Resource GPU textures with per-block information that can be accessed on the GPU timeline.
These new stats give more information to the application, which can then tweak future frame parameters in a closed-loop feedback context. For example, by analyzing the SATD and the QP map, the application can identify blocks with high or low distortion and dynamically adjust those regions of interest in future frames (e.g. using delta QP), while using the per-block bit counts as a guide to stay within the expected bitrate usage. Similarly, by analyzing the bits used per block, the app can identify regions that consume a disproportionate amount of bandwidth without being of much interest, and adjust the QP map in those regions for future frames, improving bitrate usage.
HEVC 4:2:2/4:4:4 profiles support
The D3D12 Encoding API has been extended to support the HEVC profiles for 4:2:2 and 4:4:4 subsampling formats and different color depths.
Readable DPB reconstructed pictures
Until now, the reconstructed pictures stored in the D3D12 DPB required the ID3D12Resource containing them to be created with the D3D12_RESOURCE_FLAG_VIDEO_ENCODE_REFERENCE_ONLY flag, restricting access to them. Starting today, this is no longer a mandatory restriction, and IHV drivers can optionally support regular ID3D12Resource textures for DPB resources. This is useful for applications that want to preview the video being encoded in real time, or calculate statistics from the reconstructed pictures, without having to re-decode the compressed bitstream.
Input ID3D12Resource for QPMap
Until now, the QPMap input provided to the encoder had to be passed as a CPU array. From now on, the D3D12 encoder can also accept the QPMap input as an ID3D12Resource GPU texture, when the driver supports this. This avoids having to stall the CPU/GPU between encoded frames when adjusting the QPMap based on output frame statistics or other data coming from the GPU pipeline.
Supported OS & Hardware
This section specifies the IHV drivers & hardware supporting these features, which are included in the Agility SDK 1.716.0-preview release.
Please note that the “Subregion notifications” feature requires Windows 11, version 24H2 or later. Additionally, for certain hardware, the latest updates from Windows Update are required, as they include essential fixes for this feature.
NVIDIA
| HEVC Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty regions | |
| Motion vectors hints | |
| Per block output stats: SATD, bits usage and QP | |
| 444 input texture support (DXGI formats AYUV, YUY2, Y210, Y410) | |
| Readable DPB reconstructed pictures (NV12 only) | |
| Input QPMap as GPU texture | |
AMD
| H264 Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Input QPMap as CPU/GPU texture | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |

| HEVC Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |

| AV1 Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |
Intel
For Intel drivers, please contact your developer representative.