Agility SDK 1.610.3: Updated RenderPasses and minor Vulkan compatibility improvements

Contents:
Updated RenderPasses
Minor Vulkan compatibility improvements
PIX support for these features

Updated RenderPasses

Background

RenderPasses are an option for apps to organize rendering commands to be friendly to GPUs that need tile-based rendering to perform efficiently. Ideally, these GPUs can internally loop sequences of commands from the app in order to render surfaces a tile (sub-region) at a time. The tile size is a hidden implementation detail, and data for the current tile sits in small, fast memory close to the GPU’s processors.
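To make the tile loop idea concrete, here is a minimal CPU-side sketch in C++. It is a toy model only: RenderTiled, PixelOp, TiledResult, AddOne and Double are hypothetical names invented for this illustration, and tileSize is exposed as a parameter even though real hardware keeps it hidden. Each "pass" is a purely per-pixel operation, which is the case where tiling is safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a render pass: a purely per-pixel operation, so the
// result for one tile never depends on neighboring tiles.
using PixelOp = int (*)(int);

struct TiledResult {
    std::size_t tilesVisited = 0;        // iterations of the tile loop
    std::size_t mainMemoryTransfers = 0; // pixels loaded from plus stored to the surface
};

// Applies all passes to a width x height surface, one tile at a time.
// tileSize is a hypothetical parameter; on real hardware it is hidden.
TiledResult RenderTiled(std::vector<int>& surface, std::size_t width,
                        std::size_t height, std::size_t tileSize,
                        const std::vector<PixelOp>& passes) {
    assert(tileSize > 0 && surface.size() == width * height);
    TiledResult result;
    std::vector<int> tile; // models the small, fast on-chip tile memory
    for (std::size_t ty = 0; ty < height; ty += tileSize) {
        for (std::size_t tx = 0; tx < width; tx += tileSize) {
            const std::size_t yEnd = std::min(ty + tileSize, height);
            const std::size_t xEnd = std::min(tx + tileSize, width);
            // Load the tile from the surface once.
            tile.clear();
            for (std::size_t y = ty; y < yEnd; ++y)
                for (std::size_t x = tx; x < xEnd; ++x)
                    tile.push_back(surface[y * width + x]);
            // Run every pass while the data stays in tile memory.
            for (PixelOp pass : passes)
                for (int& pixel : tile) pixel = pass(pixel);
            // Store the finished tile back once.
            std::size_t i = 0;
            for (std::size_t y = ty; y < yEnd; ++y)
                for (std::size_t x = tx; x < xEnd; ++x)
                    surface[y * width + x] = tile[i++];
            result.mainMemoryTransfers += 2 * tile.size();
            ++result.tilesVisited;
        }
    }
    return result;
}

// Two example per-pixel passes (hypothetical).
inline int AddOne(int v) { return v + 1; }
inline int Double(int v) { return v * 2; }
```

In this model, a 4x4 surface with tileSize 2 and the passes AddOne then Double visits 4 tiles and moves 32 pixels to or from the surface in total. Running each pass over the whole surface separately would load and store all 16 pixels once per pass, or 64 transfers for the same two passes.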

The RenderPass support that shipped soon after D3D12 launched tried hard to be as lenient as possible on applications so that adoption would be easy. It turned out, however, that the GPUs that might have taken advantage of the original RenderPasses never gained traction. As a result, the RenderPass APIs were never worth using.

Moving to the present, there are now GPUs supported by Windows that could benefit from applications using RenderPass APIs. These are in devices like Windows Dev Kit 2023 that are built around Qualcomm SoCs.

Unfortunately, the aggressive simplification in the original D3D12 RenderPasses design made it impractical for these GPUs to see any performance benefit. The design didn’t let the app explicitly indicate when a given rendering pass is safe for the GPU to process in a tile loop in on-chip memory. It is exceedingly difficult for the driver to work out on its own what the app is doing, so potential performance wins are left on the table.

What’s new

This Agility SDK release extends the existing RenderPass design, adding ways for apps to indicate more precisely what assumptions drivers can make about a given pass. Basically, this is about helping drivers identify situations where the GPU can use tiling because the sequence of rendering operations doesn’t depend on the contents of neighboring tiles.

Details are in the RenderPasses spec. In particular, see the various flags with PRESERVE_LOCAL in their name. In short, these new flags let the app tell the driver things like: “it is safe to keep what is sitting in tile memory from the previous pass there for the next pass”. As a result, drivers know when they can easily chain multiple passes together into a larger loop over tiles.
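As a rough illustration of why this helps, the sketch below models how a driver might group passes into chains that share one loop over tiles. It is a toy model under stated assumptions: the Access enum, PassDesc and ChainPasses are hypothetical names, not the real D3D12 enumerants, and the pairing rule used here (previous pass ends with a preserve-local access, next pass begins with one) is a simplification. The actual flags and their precise rules are in the RenderPasses spec.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified access types. The real enumerants and their
// precise rules are defined in the RenderPasses spec.
enum class Access { Discard, Preserve, PreserveLocal };

struct PassDesc {
    Access beginning; // what the pass expects of the render target when it starts
    Access ending;    // what happens to the render target when the pass ends
};

// Groups consecutive passes into chains and returns the length of each chain.
// Assumed rule for this sketch: two neighboring passes can share a tile loop
// only if the first ends with PreserveLocal and the second begins with it.
std::vector<std::size_t> ChainPasses(const std::vector<PassDesc>& passes) {
    std::vector<std::size_t> chains;
    for (std::size_t i = 0; i < passes.size(); ++i) {
        const bool linked = i > 0 &&
            passes[i - 1].ending == Access::PreserveLocal &&
            passes[i].beginning == Access::PreserveLocal;
        if (linked) {
            ++chains.back();     // extend the current tile loop
        } else {
            chains.push_back(1); // start a new tile loop
        }
    }
    return chains;
}
```

In this model, four passes where the first three are linked by preserve-local accesses and the last one is not collapse into two tile loops of lengths 3 and 1, instead of four separate loops.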

For systems that don’t care about RenderPasses, the D3D runtime simply converts the RenderPass APIs into the equivalent non-RenderPass form, such as binding render targets for the pass via OMSetRenderTargets(). Therefore, if a developer is willing to use the RenderPass APIs with this Agility SDK, their app can use one code path regardless of GPU or driver.
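The following sketch shows the general shape of such a conversion using a mock command list that only records call names. Everything here is an assumption made for illustration: MockCommandList, BeginAccess, BeginPassEmulated and EndPassEmulated are hypothetical, the mock methods take no parameters, and the mapping of a "clear" beginning access to ClearRenderTargetView is a guess at a plausible equivalent rather than a description of what the runtime actually does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a pass's beginning access; not the real D3D12 type.
enum class BeginAccess { Clear, Preserve };

// Mock command list that records which calls were made, in order.
// The method names mirror real D3D12 calls, but the signatures are simplified.
struct MockCommandList {
    std::vector<std::string> calls;
    void ClearRenderTargetView() { calls.push_back("ClearRenderTargetView"); }
    void OMSetRenderTargets() { calls.push_back("OMSetRenderTargets"); }
    void Draw() { calls.push_back("Draw"); }
};

// Emulated start of a pass: express the pass's intent with ordinary calls.
void BeginPassEmulated(MockCommandList& cl, BeginAccess access) {
    if (access == BeginAccess::Clear) {
        cl.ClearRenderTargetView(); // assumed equivalent of a clearing beginning access
    }
    cl.OMSetRenderTargets();        // bind the pass's render targets
}

// Emulated end of a pass: in this simplified model nothing needs to happen.
void EndPassEmulated(MockCommandList&) {}
```

The point of the model is that the app writes the same begin, draw, end sequence either way; only the layer underneath decides whether that becomes a tile loop or a plain sequence of traditional calls.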

In early testing, performance wins on GPUs that benefit from RenderPasses are in the 5% to 18% range for the portions of rendering that involve rasterization and can be tiled.

As of this post, a public driver from Qualcomm that takes advantage of the RenderPass APIs is not quite ready. Some developers have private access to a driver, and for them this release is useful. For other developers this feature is probably not interesting yet. That said, the APIs still function through emulation on any GPU and driver. If and when a public driver that takes advantage of RenderPasses becomes available, it will be listed here.

Minor Vulkan compatibility improvements

This release supports the features in the Vulkan compatibility features specification. These are mainly relaxations of specification wording, and of the corresponding validation, that address corner-case mismatches between D3D and Vulkan.

PIX support for these features

A private preview of PIX is available with support for the new RenderPasses and Vulkan compatibility improvements. Please contact the PIX team (askwinpix@microsoft.com) if you would like this preview. A public version of PIX with support for these features will be released in the coming weeks.