Today DirectX 12 provides APIs to support GPU video encode acceleration for several applications, as detailed in D3D12 Video Encoding – Windows drivers | Microsoft Learn and in previous blog posts such as Announcing new DirectX 12 feature – Video Encoding! – DirectX Developer Blog.
In this blog post we’re happy to announce a series of new features included in the Agility SDK 1.716.0-preview that provide more control to apps using the D3D12 Video Encode API. These new features help reduce latency and improve quality in several scenarios.
New feature list
- Subregion notifications: Slice/tile partial encoding and async completion signaling
- Dirty regions: Configurable skip encoding for frame regions
- Motion vector hints provided externally to the encoder
- Enhanced frame/block statistics per encoded frame
- HEVC 4:2:2/4:4:4 profiles support
- Readable DPB reconstructed pictures
- Input ID3D12Resource QP Map
Let’s take a look at each of the features. Except for the HEVC additions, these are codec-agnostic features, meaning that the interfaces are defined without using any codec-specific structures. For a detailed description of the features and their interfaces, please refer to the new video specs uploaded to DirectX-Specs | Engineering specs for DirectX features.
Subregion notifications
When encoding a frame partitioned into multiple slices or tiles, apps previously had to wait for the GPU to finish executing the encode commands before they could access the compressed bitstream buffer for the entire frame containing all slices.
With the new subregion notifications feature, EncodeFrame can now write each slice/tile into a separate ID3D12Resource object (or into a single suballocated buffer), and apps can wait for completion on independent ID3D12Fence objects, one for each of the frame subregions. The full frame metadata is still reported at the end of the frame encoding in ResolveEncoderOutputMetadata as usual, but the subregion offsets/sizes are reported asynchronously during EncodeFrame as each subregion's completion is signaled, allowing apps to start consuming the finished compressed bitstream buffers while the rest of the subregions are still being encoded. This helps reduce latency in scenarios such as streaming, where subregions can now be sent over the network while the remaining subregions are still being encoded.
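To make the single-buffer option concrete, here is a minimal sketch of the offset arithmetic an app might use to suballocate one large bitstream buffer into per-subregion ranges. The actual D3D12 structures for subregion notifications are defined in the DirectX-Specs video encoding spec; this helper, its name, and the alignment parameter are illustrative assumptions, not part of the API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a D3D12 API): place each slice/tile bitstream
// at an aligned offset inside one shared buffer, given the maximum
// compressed size the app reserves for each subregion.
std::vector<uint64_t> SuballocateSubregionOffsets(
    const std::vector<uint64_t>& maxSubregionSizes, uint64_t alignment)
{
    std::vector<uint64_t> offsets;
    offsets.reserve(maxSubregionSizes.size());
    uint64_t cursor = 0;
    for (uint64_t size : maxSubregionSizes)
    {
        // Round the cursor up to the next aligned offset.
        cursor = (cursor + alignment - 1) / alignment * alignment;
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

Each range would then be paired with its own ID3D12Fence; once a subregion's fence signals (e.g. via ID3D12Fence::SetEventOnCompletion), the app can read that range and, in a streaming scenario, put it on the wire immediately.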
Dirty regions
In scenarios where most regions of a video don’t change between consecutive frames (e.g. screen sharing), and the app knows which regions did change (the dirty regions), it is now possible to feed this information to the D3D12 Video Encoder, so the encoder can “skip” regions that didn’t change, accelerating the encode operation. This helps improve encoding speed compared to having the encoder re-scan the entire frame. To avoid having to stall the CPU/GPU between encoded frames when the dirty regions are produced on the GPU timeline, the API supports both CPU and GPU buffer inputs, but initial driver support will be for CPU input only.
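As an illustration of where dirty regions come from, the sketch below diffs two frames block by block on the CPU and reports the blocks that changed. The block size, the frame layout (8-bit single-plane), and the function itself are assumptions for illustration; the actual dirty-region input structures are defined in the DirectX-Specs video encoding spec.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Position of a dirty block, in pixels (top-left corner).
struct BlockRect { uint32_t x, y; };

// Hypothetical helper: compare two single-plane 8-bit frames and return
// the list of blockSize x blockSize blocks whose pixels changed.
std::vector<BlockRect> FindDirtyBlocks(const std::vector<uint8_t>& prev,
                                       const std::vector<uint8_t>& curr,
                                       uint32_t width, uint32_t height,
                                       uint32_t blockSize)
{
    std::vector<BlockRect> dirty;
    for (uint32_t by = 0; by < height; by += blockSize)
        for (uint32_t bx = 0; bx < width; bx += blockSize)
        {
            bool changed = false;
            for (uint32_t y = by; y < by + blockSize && y < height && !changed; ++y)
                for (uint32_t x = bx; x < bx + blockSize && x < width; ++x)
                    if (prev[y * width + x] != curr[y * width + x])
                    {
                        changed = true;
                        break;
                    }
            if (changed) dirty.push_back({bx, by});
        }
    return dirty;
}
```

In a screen-sharing app this diff typically comes for free from the desktop duplication/present path rather than a brute-force comparison; the point is only that the resulting rect list is what gets handed to the encoder.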
Motion vector hints
When the application driving the D3D12 video encoder has knowledge about the motion of the content being encoded, such as a scroll in a shared screen with an open document, or when the app rendering the content knows the exact motion vectors, it can now feed motion vector hints to the encoder. These hints help accelerate the motion search process and guide the encoder toward higher quality results. As with dirty regions, the API supports both GPU and CPU inputs for these motion hints; however, initial driver support will be for CPU inputs only.
Enhanced frame statistics
Three new optional stats can be collected at the end of the ResolveEncoderOutputMetadata execution on the GPU: the quantization parameter (QP map) used per block, the sum of absolute transformed differences (SATD) per block, and the rate-control bit allocation per block. These stats are provided as ID3D12Resource GPU textures with per-block information that can be accessed on the GPU timeline.
These new stats give more information to the application, which can then tweak future frame parameters in a closed-loop feedback context. For example, by analyzing the SATD and the QP map, the application can identify blocks with high or low distortion and dynamically adjust those regions of interest in future frames (e.g. using delta QP), while using the per-block bit counts as a guide to stay within the expected bitrate usage. Similarly, by analyzing the bits used per block, the app can identify regions that consume a disproportionate amount of bandwidth without being of much interest, and adjust the QP map in those regions for future frames, improving bitrate usage.
HEVC 4:2:2/4:4:4 profiles support
The D3D12 Encoding API has been extended to support the HEVC profiles for 4:2:2 and 4:4:4 subsampling formats and different color depths.
Readable DPB reconstructed pictures
Until now, the reconstructed pictures stored in the D3D12 DPB required the ID3D12Resource containing them to be created with the D3D12_RESOURCE_FLAG_VIDEO_ENCODE_REFERENCE_ONLY flag, restricting access to them. Starting today, this is no longer a mandatory restriction, and IHV drivers can optionally support regular ID3D12Resource textures for DPB resources. This is useful for applications that want to preview the video being encoded in real time, or calculate statistics from the reconstructed pictures, without having to re-decode the compressed bitstream.
Input ID3D12Resource for QPMap
Until now, the QPMap input provided to the encoder had to be passed as a CPU array. From now on, the D3D12 encoder can also accept the QPMap input as an ID3D12Resource GPU texture, when the driver supports this. This avoids having to stall the CPU/GPU between encoded frames when adjusting the QPMap based on output frame statistics or other data coming from the GPU pipeline.
Supported OS & Hardware
This section specifies the IHV drivers & hardware supporting these features, which are included in the Agility SDK 1.716.0-preview release.
Please note that the “Subregion notifications” feature requires Windows 11, version 24H2 or later. Additionally, for certain hardware, the latest updates from Windows Update are required, as they include essential fixes for this feature.
NVIDIA
| HEVC Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty regions | |
| Motion vectors hints | |
| Per block output stats: SATD, bits usage and QP | |
| 444 input texture support (DXGI formats AYUV, YUY2, Y210, Y410) | |
| Readable DPB reconstructed pictures (NV12 only) | |
| Input QPMap as GPU texture | |
AMD
| H264 Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Input QPMap as CPU/GPU texture | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |

| HEVC Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |

| AV1 Feature | Supported platforms |
| --- | --- |
| Subregion notifications | |
| Dirty rects (Repeat frame, CPU input) | |
| Motion vectors (Full search, CPU input) | |
Intel
For Intel drivers, please contact your developer representative.