New in D3D12 - Motion Estimation - DirectX Developer Blog

In the Windows 10 May 2019 Update, codenamed 19H1, D3D12 has added a new Motion Estimation feature to D3D12. Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another. Motion estimation is an essential part of video encoding and can be used in frame rate conversion algorithms. Windows Mixed Reality leverages this feature as part of it’s Motion Reprojection feature as of the latest beta release.

While motion estimation can be implemented with shaders, the purpose of the D3D12 Motion Estimation feature is to expose fixed function acceleration for motion searching to offload this part of the work from 3D. Often this comes in the form of exposing the GPU video encoder motion estimator. The goal of D3D12 Motion estimation is optical flow, but it should be noted that encoder motion estimators may be optimized for improving compression.

Checking for Support To understand the supported block size and resolutions for a given format, use the D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR check with the D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR struct like the example below. Currently only DXGI_FORMAT_NV12 is supported, so content may need to be color converted and downsampled to use motion estimation:

D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR MotionEstimatorSupport = {0u, DXGI_FORMAT_NV12};
VERIFY(spVideoDevice->CheckFeatureSupport(D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR, &MotionEstimatorSupport, sizeof(MotionEstimatorSupport)));

The D3D12_FEATURE_DATA_MOTION_ESTIMATOR struct looks like this:

// D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR
typedef struct D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR
{
    UINT NodeIndex;                                                                 // input
    DXGI_FORMAT InputFormat;                                                        // input
    D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_FLAGS BlockSizeFlags;            // output
    D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_FLAGS PrecisionFlags;             // output
    D3D12_VIDEO_SIZE_RANGE SizeRange;                                               // output
} D3D12_FEATURE_DATA_VIDEO_MOTION_ESTIMATOR;

Creating the Motion Estimator The Video Motion Estimator is a driver state object for performing the motion estimation operation. The selected block size, precision, and supported size range would depend on values supported by hardware returned from the D3D12_FEATURE_VIDEO_MOTION_ESTIMATOR feature check. You can select a smaller size range than the driver supports. Size range informs internal allocation sizes.

D3D12_VIDEO_MOTION_ESTIMATOR_DESC motionEstimatorDesc = { 
    0, //NodeIndex
    DXGI_FORMAT_NV12, 
    D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_16X16,
    D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL, 
    {1920, 1080, 1280, 720} // D3D12_VIDEO_SIZE_RANGE
    }; 

CComPtr<ID3D12VideoMotionEstimator> spVideoMotionEstimator;
VERIFY_SUCCEEDED(spVideoDevice->CreateVideoMotionEstimator(
    &motionEstimatorDesc, 
    nullptr,
    IID_PPV_ARGS(&spVideoMotionEstimator)));

Creating the Motion Vector Output A Motion Vector Heap is used as a hardware dependent output for motion estimation operations. Then, a resolve operation translates those results into an API defined format in a standard 2D texture. The resolved output 2D texture is a DXGI_FORMAT_R16G16_SINT texture where R holds the horizontal component and G holds the vertical component of the motion vector. This texture is sized to hold one pair of components per block.

D3D12_VIDEO_MOTION_VECTOR_HEAP_DESC MotionVectorHeapDesc = { 
    0, // NodeIndex 
    DXGI_FORMAT_NV12, 
    D3D12_VIDEO_MOTION_ESTIMATOR_SEARCH_BLOCK_SIZE_16X16,
    D3D12_VIDEO_MOTION_ESTIMATOR_VECTOR_PRECISION_QUARTER_PEL, 
    {1920, 1080, 1280, 720} // D3D12_VIDEO_SIZE_RANGE
    }; 

CComPtr<ID3D12VideoMotionVectorHeap> spVideoMotionVectorHeap;
VERIFY_SUCCEEDED(spVideoDevice->CreateVideoMotionVectorHeap(
    &MotionVectorHeapDesc, 
    nullptr, 
    IID_PPV_ARGS(&spVideoMotionVectorHeap)));

CD3DX12_RESOURCE_DESC resolvedMotionVectorDesc =
    CD3DX12_RESOURCE_DESC::Tex2D(
        DXGI_FORMAT_R16G16_SINT, 
        Align(1920, 16) / 16, // This example uses a 16x16 block size. Pixel width and height
        Align(1080, 16) / 16, // are adjusted to store the vectors for those blocks.
        1, // ArraySize
        1  // MipLevels
        );

    ATL::CComPtr< ID3D12Resource > spResolvedMotionVectors;
    VERIFY_SUCCEEDED(pDevice->CreateCommittedResource(
        &Properties,
        D3D12_HEAP_FLAG_NONE,
        &resolvedMotionVectorDesc,
        D3D12_RESOURCE_STATE_COMMON,
        nullptr,
        IID_PPV_ARGS(&spResolvedMotionVectors)));

Performing the Motion Search The example below executes the motion search and resolves the motion vectors to the 2D texture with D3D12_COMMAND_LIST_TYPE_VIDEO_ENCODE. D3D12 Resources used as input to Estimate Motion must be in the ENCODE_READ state and the resource written to by ResolveMotionVectorHeap must be in the ENCODE_WRITE state.

const D3D12_VIDEO_MOTION_ESTIMATOR_OUTPUT outputArgs = {spVideoMotionVectorHeap};

const D3D12_VIDEO_MOTION_ESTIMATOR_INPUT inputArgs = {
    spCurrentResource,
    0,
    spReferenceResource,
    0,
    nullptr // pHintMotionVectorHeap
    };

spCommandList->EstimateMotion(spVideoMotionEstimator, &outputArgs, &inputArgs);

const D3D12_RESOLVE_VIDEO_MOTION_VECTOR_HEAP_OUTPUT outputArgs = { 
    spResolvedMotionVectors,
    {}};

const D3D12_RESOLVE_VIDEO_MOTION_VECTOR_HEAP_INPUT inputArgs = {
    spVideoMotionVectorHeap,
    1920,
    1080
    };

spCommandList->ResolveMotionVectorHeap(&outputArgs, &inputArgs);
        
VERIFY(spCommandList->Close());

// Execute Commandlist.
ID3D12CommandList *ppCommandLists[1] = { spCommandList.p };
spCommandQueue->ExecuteCommandLists(1, ppCommandLists);