New in D3D12 – Motion Estimation



In the Windows 10 May 2019 Update, codenamed 19H1, D3D12 added a new Motion Estimation feature. Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another. It is an essential part of video encoding and can also be used in frame rate conversion algorithms. Windows Mixed Reality leverages this feature as part of its Motion Reprojection feature as of the latest beta release.

While motion estimation can be implemented with shaders, the purpose of the D3D12 Motion Estimation feature is to expose fixed-function acceleration for motion searching, offloading this part of the work from the 3D engine. In practice, this often means exposing the motion estimator of the GPU's video encoder. The goal of D3D12 Motion Estimation is optical flow, but note that encoder motion estimators may be optimized to improve compression, so the vectors they return may not always follow the true motion in the scene.
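To make the operation concrete, here is a CPU sketch of exhaustive block matching: for one block of the current frame, it tries every displacement within a search window and keeps the one whose block in the reference frame has the smallest sum of absolute differences (SAD). This only illustrates the kind of search a shader implementation might perform; hardware motion estimators use their own search strategies, and the names FullSearch and BlockSad are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Whole-pixel motion vector: (dx, dy) is the offset from a block in the
// current frame to its best match in the reference frame.
struct MotionVector {
    int dx;
    int dy;
};

// Sum of absolute differences between the block at (bx, by) in `cur` and the
// block displaced by (dx, dy) in `ref`. Both are 8-bit luma planes, row-major.
static int BlockSad(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                    int width, int bx, int by, int dx, int dy, int blockSize) {
    int sad = 0;
    for (int y = 0; y < blockSize; ++y) {
        for (int x = 0; x < blockSize; ++x) {
            int c = cur[(by + y) * width + (bx + x)];
            int r = ref[(by + y + dy) * width + (bx + x + dx)];
            sad += std::abs(c - r);
        }
    }
    return sad;
}

// Full search: try every displacement within +/- searchRange that keeps the
// candidate block inside the reference frame; return the lowest-SAD offset.
MotionVector FullSearch(const std::vector<uint8_t>& cur, const std::vector<uint8_t>& ref,
                        int width, int height, int bx, int by, int blockSize, int searchRange) {
    assert(cur.size() == ref.size() && cur.size() == static_cast<std::size_t>(width) * height);
    assert(bx >= 0 && by >= 0 && bx + blockSize <= width && by + blockSize <= height);

    MotionVector best{0, 0};
    int bestSad = std::numeric_limits<int>::max();
    for (int dy = -searchRange; dy <= searchRange; ++dy) {
        for (int dx = -searchRange; dx <= searchRange; ++dx) {
            // Skip candidates that would read outside the reference frame.
            if (bx + dx < 0 || by + dy < 0 ||
                bx + dx + blockSize > width || by + dy + blockSize > height) {
                continue;
            }
            int sad = BlockSad(cur, ref, width, bx, by, dx, dy, blockSize);
            if (sad < bestSad) {  // strict '<' keeps the first minimum found
                bestSad = sad;
                best = {dx, dy};
            }
        }
    }
    return best;
}
```

For example, if a textured 4x4 block sits at (4, 4) in the reference frame and at (6, 5) in the current frame, searching the block at (6, 5) returns (-2, -1), the offset back to its position in the reference frame. Full search is simple but costs one SAD per candidate offset per block, which is the work fixed-function hardware is meant to take off the shader cores.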

Checking for Support
To find the supported block sizes, vector precisions, and resolutions for a given input format, call ID3D12VideoDevice::CheckFeatureSupport with D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR and a D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR struct. Currently only DXGI_FORMAT_NV12 is supported, so content may need to be color converted and downsampled before motion estimation.

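The D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR struct takes NodeIndex and InputFormat as inputs. The driver fills in BlockSizeFlags, PrecisionFlags, and SizeRange with the search block sizes, motion vector precisions, and range of input resolutions it supports for that format.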
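A minimal sketch of the support check, assuming an ID3D12VideoDevice has already been created, a single-adapter system (node 0), and that the application wants 16x16 blocks with quarter-pel precision. The helper name SupportsNv12MotionEstimation16x16 is hypothetical; the struct, feature, and flag names come from d3d12video.h. This is Windows-only code and needs the Windows SDK.

```cpp
#include <d3d12.h>
#include <d3d12video.h>

// Hypothetical helper: queries NV12 motion estimation support and reports
// whether 16x16 blocks with quarter-pel vectors work for the given frame size.
bool SupportsNv12MotionEstimation16x16(ID3D12VideoDevice* videoDevice,
                                       UINT frameWidth, UINT frameHeight,
                                       D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR* support)
{
    *support = {};
    support->NodeIndex = 0;                   // input: single-adapter node
    support->InputFormat = DXGI_FORMAT_NV12;  // input: only NV12 is currently supported

    HRESULT hr = videoDevice->CheckFeatureSupport(
        D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR, support, sizeof(*support));
    if (FAILED(hr)) {
        return false;
    }

    // The remaining fields are outputs filled in by the driver.
    const bool has16x16 =
        (support->BlockSizeFlags & D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAG_16X16) != 0;
    const bool hasQuarterPel =
        (support->PrecisionFlags & D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAG_QUARTER_PEL) != 0;
    const bool sizeInRange =
        frameWidth >= support->SizeRange.MinWidth && frameWidth <= support->SizeRange.MaxWidth &&
        frameHeight >= support->SizeRange.MinHeight && frameHeight <= support->SizeRange.MaxHeight;

    return has16x16 && hasQuarterPel && sizeInRange;
}
```

Returning the filled struct through the out parameter lets the caller reuse the reported SizeRange when it creates the estimator.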

Creating the Motion Estimator
The Video Motion Estimator (ID3D12VideoMotionEstimator) is a driver state object that performs the motion estimation operation. Its block size, precision, and size range must be chosen from the values that the D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR feature check reports as supported by the hardware. You can select a smaller size range than the driver supports; the size range informs the sizes of the driver's internal allocations.
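The sketch below creates the estimator with ID3D12VideoDevice1::CreateVideoMotionEstimator, along with a Motion Vector Heap (described in the next section) created with ID3D12VideoDevice1::CreateVideoMotionVectorHeap. The 16x16 block size, quarter-pel precision, and 1280x720 to 1920x1080 size range are example values that must be replaced with ones the feature check reported, and giving the heap the same parameters as the estimator is an assumption of this sketch. The helper name is hypothetical; the code is Windows-only.

```cpp
#include <d3d12.h>
#include <d3d12video.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: creates a 16x16, quarter-pel NV12 motion estimator and
// a motion vector heap with matching parameters.
HRESULT CreateMotionEstimatorAndHeap(ID3D12VideoDevice1* videoDevice,
                                     ComPtr<ID3D12VideoMotionEstimator>& estimator,
                                     ComPtr<ID3D12VideoMotionVectorHeap>& heap)
{
    // Example range of input sizes; narrower than the driver's range is allowed
    // and informs internal allocation sizes.
    const D3D12_VIDEO_SIZE_RANGE sizeRange = { 1920, 1080, 1280, 720 }; // MaxW, MaxH, MinW, MinH

    const D3D12_VIDEO_MOTION_ESTIMATOR_DESC estimatorDesc = {
        0,                                                          // node mask (single adapter)
        DXGI_FORMAT_NV12,                                           // input format
        D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_16X16,       // block size
        D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL,  // vector precision
        sizeRange,
    };
    HRESULT hr = videoDevice->CreateVideoMotionEstimator(
        &estimatorDesc, nullptr /* no protected session */, IID_PPV_ARGS(&estimator));
    if (FAILED(hr)) {
        return hr;
    }

    // Heap for the hardware-dependent results, using the same parameters.
    const D3D12_VIDEO_MOTION_VECTOR_HEAP_DESC heapDesc = {
        0,
        DXGI_FORMAT_NV12,
        D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_16X16,
        D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL,
        sizeRange,
    };
    return videoDevice->CreateVideoMotionVectorHeap(
        &heapDesc, nullptr /* no protected session */, IID_PPV_ARGS(&heap));
}
```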

Creating the Motion Vector Output
A Motion Vector Heap holds the output of motion estimation operations in a hardware-dependent format. A resolve operation then translates those results into an API-defined format in a standard 2D texture. The resolved texture uses DXGI_FORMAT_R16G16_SINT, where R holds the horizontal component and G holds the vertical component of the motion vector, and it is sized to hold one pair of components per block.
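Because the resolved texture holds one vector per block, its dimensions are the frame dimensions divided by the block size. The sketch below computes them, rounding up so that partial blocks at the right and bottom edges are counted (an assumption of this sketch), and shows how a vector would be read once the texture contents are on the CPU. It assumes the texels were copied into a tightly packed array; an actual readback buffer pads each row to a multiple of D3D12_TEXTURE_DATA_PITCH_ALIGNMENT (256 bytes). All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of the resolved motion vector texture, in blocks (one texel per block).
struct BlockGridSize {
    uint32_t width;
    uint32_t height;
};

// Divides the frame size by the block size, rounding up for partial edge blocks.
BlockGridSize ResolvedTextureSize(uint32_t pixelWidth, uint32_t pixelHeight, uint32_t blockSize) {
    return { (pixelWidth + blockSize - 1) / blockSize,
             (pixelHeight + blockSize - 1) / blockSize };
}

// One resolved vector: R channel = horizontal component, G channel = vertical.
struct MotionVector16 {
    int16_t horizontal;
    int16_t vertical;
};

// Reads the vector for block (bx, by) from tightly packed R16G16_SINT data:
// two int16 values per texel, rows of grid.width texels.
MotionVector16 ReadVector(const std::vector<int16_t>& texels, BlockGridSize grid,
                          uint32_t bx, uint32_t by) {
    assert(bx < grid.width && by < grid.height);
    const std::size_t base = (static_cast<std::size_t>(by) * grid.width + bx) * 2;
    assert(base + 1 < texels.size());
    return { texels[base], texels[base + 1] };
}
```

For a 1920x1080 frame, 16x16 blocks give a 120x68 texture (1080 / 16 = 67.5, rounded up) and 8x8 blocks give 240x135.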

Performing the Motion Search
The motion search and the resolve of the motion vectors to the 2D texture are recorded with EstimateMotion and ResolveMotionVectorHeap on a command list of type D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE. D3D12 resources used as input to EstimateMotion must be in the D3D12_RESOURCE_STATE_VIDEO_ENCODE_READ state, and the resource written by ResolveMotionVectorHeap must be in the D3D12_RESOURCE_STATE_VIDEO_ENCODE_WRITE state.
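A minimal sketch of recording the search, assuming an ID3D12VideoMotionEstimator and ID3D12VideoMotionVectorHeap created with matching parameters, two NV12 textures, and an R16G16_SINT output texture with one texel per block. It also assumes all three resources start in D3D12_RESOURCE_STATE_COMMON; adjust the StateBefore values to match your application's state tracking. The helper names Transition and RecordMotionSearch are hypothetical; the code is Windows-only.

```cpp
#include <d3d12.h>
#include <d3d12video.h>
#include <iterator>

// Hypothetical helper: builds a transition barrier covering a whole resource.
static D3D12_RESOURCE_BARRIER Transition(ID3D12Resource* resource,
                                         D3D12_RESOURCE_STATES before,
                                         D3D12_RESOURCE_STATES after)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
    barrier.Transition.pResource = resource;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = before;
    barrier.Transition.StateAfter = after;
    return barrier;
}

// Hypothetical helper: records a motion search between two NV12 textures and
// resolves the result into an R16G16_SINT texture.
void RecordMotionSearch(ID3D12VideoEncodeCommandList* commandList,
                        ID3D12VideoMotionEstimator* estimator,
                        ID3D12VideoMotionVectorHeap* heap,
                        ID3D12Resource* currentFrame,
                        ID3D12Resource* referenceFrame,
                        ID3D12Resource* resolvedVectors,
                        UINT pixelWidth, UINT pixelHeight)
{
    // Inputs must be in VIDEO_ENCODE_READ, the resolve target in VIDEO_ENCODE_WRITE.
    const D3D12_RESOURCE_BARRIER toEncode[] = {
        Transition(currentFrame, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_VIDEO_ENCODE_READ),
        Transition(referenceFrame, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_VIDEO_ENCODE_READ),
        Transition(resolvedVectors, D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_VIDEO_ENCODE_WRITE),
    };
    commandList->ResourceBarrier(static_cast<UINT>(std::size(toEncode)), toEncode);

    // Hardware-dependent results are written into the motion vector heap.
    const D3D12_VIDEO_MOTION_ESTIMATOR_OUTPUT output = { heap };
    const D3D12_VIDEO_MOTION_ESTIMATOR_INPUT input = {
        currentFrame, 0,    // input texture and subresource index
        referenceFrame, 0,  // reference texture and subresource index
        nullptr,            // no hint motion vector heap
    };
    commandList->EstimateMotion(estimator, &output, &input);

    // Translate the heap contents into the API-defined texture, starting at texel (0, 0).
    const D3D12_RESOLVE_VIDEO_MOTION_VECTOR_HEAP_OUTPUT resolveOutput = { resolvedVectors, {} };
    const D3D12_RESOLVE_VIDEO_MOTION_VECTOR_HEAP_INPUT resolveInput = { heap, pixelWidth, pixelHeight };
    commandList->ResolveMotionVectorHeap(&resolveOutput, &resolveInput);
}
```

Before another queue reads the resolved vectors, transition the texture out of VIDEO_ENCODE_WRITE (for example back to COMMON) and synchronize the queues with a fence.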


Randy Tidd
