Moving Gears to Tier 2 Variable Rate Shading
The team saw similarly large perf gains from VRS Tier 2 (up to 14%!), this time with no noticeable visual impact. See for yourself whether you can tell which side of the first image in the blog has VRS enabled for a perf boost and which side doesn’t.
Because of the unprecedented alignment between PC and Xbox with DirectX 12 Ultimate, The Coalition could bring their implementation to both console and PC with ease. That’s right: their VRS Tier 2 implementation runs on the full range of DirectX 12 Ultimate-capable devices, from the Xbox Series X|S to supported AMD and NVIDIA cards on PC!
Here’s another excellent guest blog from The Coalition, where Chris Wallis shares implementation details, performance data, awesome screenshots and a useful section for developers evaluating whether or not to bring VRS to their engines.
Chris Wallis, Senior Software Engineer at The Coalition
Tier 2 VRS gave Gears 5/Tactics a GPU performance boost of up to 14% with no perceptible impact on visual quality. It is available on all hardware that supports DirectX 12 Ultimate, including Xbox Series X|S, AMD Radeon™ RX 6000 Series graphics cards, and NVIDIA GeForce RTX 20 Series and 30 Series GPUs.
VRS is enabled on the left and disabled on the right.
Moving to Tier 2
The Xbox Series X|S launch of Gears 5/Tactics and the Hivebusters story DLC added new rendering features, including contact shadows and screen space global illumination, as well as an emphasis on 60 FPS cinematics. While these features looked great, they were costly even on high-end GPUs. We looked for ways to keep 4K at 60 FPS while maintaining the rich detail of our PC Ultra textures and running these new visual features on both Xbox Series X|S and PC. This led us to revisit some of the VRS work done in Gears Tactics. While Tier 1 VRS in Gears Tactics offered some great performance gains, it made small compromises in visual quality and didn’t work well with Dynamic Resolution Scaling. As a result, we investigated whether the extra flexibility of Tier 2 could solve these Tier 1 shortcomings.
The primary difference between Tier 1 and Tier 2 VRS is granularity. Tier 1 lets you specify a shading rate per draw. Tier 2 instead lets you specify the shading rate in a screen-space texture. This texture does not map 1:1 to the render target; each texel covers one VRS tile of either 8×8 or 16×16 pixels, depending on the hardware. By analyzing the previous frame’s scene color, we could output a texture that applies a coarse shading rate only in regions where we determined shading could be reduced without any perceptible difference. For more details on the VRS API, refer to the VRS announcement.
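As a minimal sketch of that mapping (CPU-side C++, not engine code), the helpers below compute how large a shading rate image must be for a given render target and which texel controls a given pixel. They assume the tile size has already been queried from the device (D3D12 reports it as `ShadingRateImageTileSize` in `D3D12_FEATURE_DATA_D3D12_OPTIONS6`) and encode rates the way `D3D12_SHADING_RATE` does, with log2 of the X rate in bits 2-3 and log2 of the Y rate in bits 0-1. The type and function names are hypothetical.

```cpp
#include <cstdint>

// Shading rates encoded like D3D12_SHADING_RATE: log2(X) in bits 2-3, log2(Y) in bits 0-1.
enum ShadingRate : std::uint8_t {
    Rate1x1 = 0x0,
    Rate1x2 = 0x1,
    Rate2x1 = 0x4,
    Rate2x2 = 0x5,
    Rate2x4 = 0x6,
    Rate4x2 = 0x9,
    Rate4x4 = 0xA,
};

// Builds a rate code from per-axis log2 values (0 = 1x, 1 = 2x, 2 = 4x).
constexpr std::uint8_t makeRate(std::uint32_t log2X, std::uint32_t log2Y) {
    return static_cast<std::uint8_t>((log2X << 2) | log2Y);
}

struct Extent {
    std::uint32_t width;
    std::uint32_t height;
};

// One shading-rate texel covers one tileSize x tileSize block of the render target,
// so partial tiles at the right/bottom edges still need a texel (round up).
constexpr Extent shadingRateImageSize(Extent target, std::uint32_t tileSize) {
    return {(target.width + tileSize - 1) / tileSize,
            (target.height + tileSize - 1) / tileSize};
}

// Row-major index of the shading-rate texel that controls render-target pixel (x, y).
constexpr std::uint32_t tileIndexForPixel(std::uint32_t x, std::uint32_t y,
                                          Extent image, std::uint32_t tileSize) {
    return (y / tileSize) * image.width + (x / tileSize);
}
```

For a 3840×2160 target this gives a 480×270 image with 8×8 tiles and 240×135 with 16×16 tiles, so the VRS texture holds roughly 1/64 or 1/256 as many texels as the render target has pixels.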
Tier 1 VRS Visualization on Gears Tactics. Colored regions mark the use of coarser shading rates
Tier 2 VRS Visualization on Gears Tactics. Colored regions mark the use of coarser shading rates
VRS Texture Generation
We generated the VRS texture by running a Sobel edge detection compute shader on our final scene color buffer. The VRS texture is then reprojected for use in the next frame as part of a rescale shader, described later. The edge detection runs on the luminance of the sRGB color; using sRGB values means edges are detected based on perceptual rather than linear differences in color. A configurable threshold passed to the shader controls how aggressive the edge detection is, and it is also the primary knob for tuning the different VRS quality settings on PC.
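The edge detection step can be sketched on the CPU in C++ as below. This is an illustration under stated assumptions, not The Coalition’s shader: it assumes an 8-bit sRGB image, Rec. 709 luma weights applied to the gamma-encoded values (the post does not give the weights), clamped reads at the image border, and a simple two-rate policy where a tile gets 2×2 if its strongest Sobel response is below the threshold and 1×1 otherwise. A production shader could also pick 1×2, 2×1 or coarser rates. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SrgbImage {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> rgb;  // 3 bytes per pixel, row-major, sRGB-encoded
};

constexpr std::uint8_t kRate1x1 = 0x0;  // D3D12_SHADING_RATE-style encoding
constexpr std::uint8_t kRate2x2 = 0x5;

// Luma of the gamma-encoded values in [0, 1]; weights are an assumption (Rec. 709).
float srgbLuma(const SrgbImage& img, int x, int y) {
    x = std::clamp(x, 0, img.width - 1);  // clamp reads at the image border
    y = std::clamp(y, 0, img.height - 1);
    const std::uint8_t* p = &img.rgb[3 * (static_cast<std::size_t>(y) * img.width + x)];
    return (0.2126f * p[0] + 0.7152f * p[1] + 0.0722f * p[2]) / 255.0f;
}

// Sobel gradient magnitude of the luma at (x, y).
float sobelMagnitude(const SrgbImage& img, int x, int y) {
    auto L = [&](int dx, int dy) { return srgbLuma(img, x + dx, y + dy); };
    const float gx = (L(1, -1) + 2 * L(1, 0) + L(1, 1)) - (L(-1, -1) + 2 * L(-1, 0) + L(-1, 1));
    const float gy = (L(-1, 1) + 2 * L(0, 1) + L(1, 1)) - (L(-1, -1) + 2 * L(0, -1) + L(1, -1));
    return std::sqrt(gx * gx + gy * gy);
}

// One rate per tile (row-major): 2x2 where no gradient reaches `threshold`, else 1x1.
// A higher threshold coarsens more tiles, i.e. is more aggressive.
std::vector<std::uint8_t> buildVrsImage(const SrgbImage& img, int tileSize, float threshold) {
    const int tilesX = (img.width + tileSize - 1) / tileSize;
    const int tilesY = (img.height + tileSize - 1) / tileSize;
    std::vector<std::uint8_t> rates(static_cast<std::size_t>(tilesX) * tilesY);
    for (int ty = 0; ty < tilesY; ++ty) {
        for (int tx = 0; tx < tilesX; ++tx) {
            float maxGrad = 0.0f;
            for (int y = ty * tileSize; y < std::min((ty + 1) * tileSize, img.height); ++y)
                for (int x = tx * tileSize; x < std::min((tx + 1) * tileSize, img.width); ++x)
                    maxGrad = std::max(maxGrad, sobelMagnitude(img, x, y));
            rates[static_cast<std::size_t>(ty) * tilesX + tx] =
                maxGrad < threshold ? kRate2x2 : kRate1x1;
        }
    }
    return rates;
}
```

Raising `threshold` lets stronger gradients count as "flat", which is the aggressiveness knob described above.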
Screenshot from the Gears 5 Hivebusters DLC on Xbox Series X:
The relatively simple edge detection filter can find areas of lower-frequency detail produced by a wide variety of passes. Common cases where the edge detection chooses to reduce the shading rate include:
- Reduced visibility due to shadowing/low lighting
- Occlusion due to volumetric fog
- Dense translucent particles (e.g. the waterfall)
Because the edge detection runs at the end of the frame, it also catches areas blurred by post processing effects such as motion blur and depth of field. This is particularly effective in cinematics, where much of the screen is often blurred by depth of field.
We learned from Gears Tactics that coarser shading is more noticeable in some passes than in others. To handle this, we generated a second, conservative VRS texture that passes can opt in to. The conservative texture added no notable overhead, because both textures are generated in the same edge detection shader; the shading rate for the conservative tile is simply checked against a more conservative threshold. For example, we found that our translucency pass mostly contained low-frequency content, such as water or dust particles, that generally tolerates more aggressive VRS. Techniques that relied on dithering and temporal accumulation, like our screen space reflections (SSR), benefited from a more conservative use of VRS.
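Because both textures come from the same gradient data, the conservative texture costs only one extra comparison per tile. A minimal C++ sketch, assuming the per-tile maximum Sobel response has already been computed (as described in VRS Texture Generation) and that the conservative threshold is the lower of the two; the names and the 2×2/1×1 policy are illustrative assumptions:

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint8_t kRate1x1 = 0x0;  // D3D12_SHADING_RATE-style encoding
constexpr std::uint8_t kRate2x2 = 0x5;

struct VrsRates {
    std::vector<std::uint8_t> aggressive;    // default texture
    std::vector<std::uint8_t> conservative;  // opt-in texture for sensitive passes
};

// tileMaxGradient holds the strongest edge response per tile, computed once.
// Both outputs are written in the same loop; the conservative one only coarsens
// tiles that clear the stricter (lower) threshold.
VrsRates classifyTiles(const std::vector<float>& tileMaxGradient,
                       float threshold, float conservativeThreshold) {
    VrsRates out;
    out.aggressive.reserve(tileMaxGradient.size());
    out.conservative.reserve(tileMaxGradient.size());
    for (float g : tileMaxGradient) {
        out.aggressive.push_back(g < threshold ? kRate2x2 : kRate1x1);
        out.conservative.push_back(g < conservativeThreshold ? kRate2x2 : kRate1x1);
    }
    return out;
}
```

A pass such as translucency would bind `aggressive`, while a pass such as SSR would opt in to `conservative`.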
We made it a priority to minimize the overhead of the edge detection shader, since an unoptimized shader can make VRS slower than not using it at all. We made several key optimizations that got our VRS texture generation under 0.1 ms on Xbox Series X|S and DX12 Ultimate GPUs:
Skip edge detection on the borders of a VRS tile
The Variable Rate Shading spec specifies that coarse shading will never straddle the edge of a VRS tile, so a contrast between pixels in two different tiles is never merged into a single coarse pixel. As a result, we skipped running edge detection on the outer border of each VRS tile. With an 8×8 tile, for example, this reduces the number of pixels requiring edge detection from 64 to 36.
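Skipping the one-pixel ring leaves (T - 2)² of the T² pixels in a T×T tile: 36 of 64 for 8×8 tiles and 196 of 256 for 16×16 tiles. The sketch below (a hypothetical CPU-side C++ helper) enumerates the pixels that still get edge detection. A side effect worth noting is that the 3×3 Sobel footprint of every remaining pixel stays inside its own tile, so no neighboring-tile reads are needed.

```cpp
#include <vector>

struct PixelCoord {
    int x;
    int y;
};

// Pixels of one tileSize x tileSize VRS tile that still need edge detection when
// the outer ring is skipped. Each returned pixel's 3x3 neighborhood lies entirely
// within the tile starting at (tileOriginX, tileOriginY).
std::vector<PixelCoord> interiorPixels(int tileOriginX, int tileOriginY, int tileSize) {
    std::vector<PixelCoord> pixels;
    for (int y = 1; y < tileSize - 1; ++y)
        for (int x = 1; x < tileSize - 1; ++x)
            pixels.push_back({tileOriginX + x, tileOriginY + y});
    return pixels;
}
```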
Merge the VRS texture generation to be part of tonemapping
Our first iterations ran the edge detection as a standalone compute shader at the end of post processing. At 4K, however, this introduced a bandwidth bottleneck because the pass had to read in the whole scene color buffer. We moved the VRS texture generation into our tonemapping shader, the last shader in our post processing, which removed a full memory round trip for the color buffer.
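The benefit of fusing is that the edge detector consumes colors the tonemapper has just produced instead of reading them back from memory. The C++ sketch below counts full-screen accesses to the tonemapped output for both arrangements. The Reinhard curve and Rec. 709 luma are stand-ins (the post does not describe the tonemapper), and `edgeInput` stands in for whatever on-chip storage (registers or groupshared memory) would feed the Sobel filter in a real shader. All names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Rgb { float r, g, b; };

// Stand-in operators: a Reinhard curve and the sRGB transfer function.
float tonemap(float c) { return c / (1.0f + c); }
float toSrgb(float c) {
    return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
float luma(const Rgb& p) { return 0.2126f * p.r + 0.7152f * p.g + 0.0722f * p.b; }

Rgb tonemapPixel(const Rgb& hdr) {
    return {toSrgb(tonemap(hdr.r)), toSrgb(tonemap(hdr.g)), toSrgb(tonemap(hdr.b))};
}

// Counts accesses to the full-resolution tonemapped output buffer.
struct Traffic {
    std::size_t outputWrites = 0;
    std::size_t outputReads = 0;
};

// Standalone edge pass: tonemap writes the output, then a second pass reads it all back.
void separatePasses(const std::vector<Rgb>& hdr, std::vector<Rgb>& out,
                    std::vector<float>& edgeInput, Traffic& t) {
    out.resize(hdr.size());
    edgeInput.resize(hdr.size());
    for (std::size_t i = 0; i < hdr.size(); ++i) {
        out[i] = tonemapPixel(hdr[i]);
        ++t.outputWrites;
    }
    for (std::size_t i = 0; i < out.size(); ++i) {
        edgeInput[i] = luma(out[i]);  // extra read of the whole output buffer
        ++t.outputReads;
    }
}

// Fused: the edge detector's input is taken from the color just computed.
void fusedPass(const std::vector<Rgb>& hdr, std::vector<Rgb>& out,
               std::vector<float>& edgeInput, Traffic& t) {
    out.resize(hdr.size());
    edgeInput.resize(hdr.size());
    for (std::size_t i = 0; i < hdr.size(); ++i) {
        const Rgb c = tonemapPixel(hdr[i]);
        out[i] = c;
        ++t.outputWrites;
        edgeInput[i] = luma(c);  // no read back from the output buffer
    }
}
```

Both versions produce the same output and the same edge-detection input; only the fused one avoids re-reading the full-resolution buffer.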
Run the VRS texture generation on the async compute queue
Since VRS texture generation runs as the last step of post processing, anything in the next frame leading up to the first pass that uses the VRS texture (the base pass, in our case) is a candidate for async compute overlap. We had already moved the post processing chain to overlap with the next frame’s depth pass, so this was an easy optimization requiring minimal changes.
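The benefit of the overlap can be reasoned about with a toy timeline. In D3D12 the real synchronization would be an `ID3D12Fence` that the compute queue signals with `ID3D12CommandQueue::Signal` and the graphics queue waits on with `ID3D12CommandQueue::Wait` before the first pass that reads the VRS texture. The C++ sketch below only models timing, with hypothetical pass names and durations, and assumes perfect overlap; in practice the two queues share the GPU, so real gains are smaller.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy two-queue timeline with hypothetical durations; assumes perfect overlap.
struct Pass {
    std::string name;
    double ms;                // duration of the pass
    int waitsOnCompute = -1;  // index of a compute-queue pass that must finish first
};

// Returns the finish time of each graphics-queue pass. Compute passes run
// back-to-back on their own queue; a graphics pass with waitsOnCompute >= 0
// behaves like a fence Wait() on the Signal() issued after that compute pass.
std::vector<double> graphicsFinishTimes(const std::vector<Pass>& graphics,
                                        const std::vector<Pass>& compute) {
    std::vector<double> computeEnd;
    double t = 0.0;
    for (const Pass& p : compute) {
        t += p.ms;
        computeEnd.push_back(t);
    }

    std::vector<double> finish;
    t = 0.0;
    for (const Pass& p : graphics) {
        if (p.waitsOnCompute >= 0) t = std::max(t, computeEnd.at(p.waitsOnCompute));
        t += p.ms;
        finish.push_back(t);
    }
    return finish;
}
```

With a 0.5 ms tonemap-plus-VRS pass on compute, a 1.0 ms depth pass and a 3.0 ms base pass that waits on it, the base pass finishes at 4.0 ms instead of 4.5 ms when everything runs serially on the graphics queue.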
VRS for different rendering passes
We applied VRS to the following passes:
- Base Pass: Renders all opaque meshes
- Screen Space Ambient Occlusion: Uses screen-space information to approximate areas that should receive less light due to occlusion.
- Lighting: Calculates lighting for all light sources on the visible opaque meshes.
- Screen Space Global Illumination: Uses screen-space information to calculate bounced lighting.
- Screen Space Reflections: Uses screen-space information to create reflections.
- SSR Temporal AA: Anti-aliasing for the results of the Screen Space Reflection pass.
- Translucency: Renders all translucent meshes.
Many of these are the same passes we applied VRS to in Gears Tactics with our Tier 1 implementation, but one interesting pass to call out is Translucency. With Tier 1, applying VRS to Translucency caused artifacts too severe to keep, because some UI effects rely on the translucency pass. With the extra control of Tier 2, we were able to bring VRS back to Translucency: UI elements in the translucency pass keep their crispness, while particles such as dust remain good candidates for coarse shading.
Screen Space Global Illumination (SSGI) is a unique use case for VRS because it is done in a compute shader. VRS is a rasterization feature, so it cannot be natively applied to compute shaders. Instead, we emulated VRS behavior in the compute shader, which is possible because the VRS texture can be read as an SRV. SSGI is a costly GPU pass, and global illumination results tend to hold up well when composited at lower resolutions, so applying VRS seemed like a good fit. The emulation works by eliminating threads in a threadgroup based on the shading rate; the remaining threads then expand their coverage to fill in for the terminated ones. One caveat is that SSGI requires a denoiser pass, and VRS can amplify the noise because it effectively reduces the number of samples taken. To handle this, we feed the VRS texture into the denoiser, which uses the shading rate to help weight the final blur.
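A CPU-side C++ sketch of that emulation for a single 8×8 thread group, where each loop iteration stands in for one thread. The one-thread-per-pixel layout, the group being aligned to a VRS tile, and the function names are assumptions for illustration; the real shader reads the rate from the VRS texture SRV. On a GPU, threads that exit early still occupy their wave, which is why this approach is brute force compared with compacting the work.

```cpp
#include <functional>
#include <vector>

// Emulates a coarse shading rate (rateX, rateY in {1, 2, 4}) for one 8x8 group.
// Threads that are not the top-left of their rateX x rateY block exit early; each
// surviving thread evaluates `shade` once and writes the result to its whole block.
// Returns the number of threads that did work.
int emulateCoarseShading(int rateX, int rateY,
                         const std::function<float(int, int)>& shade,
                         std::vector<float>& groupOutput /* 8*8, row-major */) {
    constexpr int kGroup = 8;
    int activeThreads = 0;
    for (int ty = 0; ty < kGroup; ++ty) {
        for (int tx = 0; tx < kGroup; ++tx) {  // each (tx, ty) plays one thread
            if (tx % rateX != 0 || ty % rateY != 0) continue;  // early-out thread
            ++activeThreads;
            const float value = shade(tx, ty);  // the expensive SSGI work
            for (int y = ty; y < ty + rateY && y < kGroup; ++y)
                for (int x = tx; x < tx + rateX && x < kGroup; ++x)
                    groupOutput[y * kGroup + x] = value;  // expand coverage
        }
    }
    return activeThreads;
}
```

At 2×2, only 16 of the 64 threads run the expensive work, yet every pixel of the group still receives a value.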
Output of SSGI:
Working with Dynamic Resolution Scaling
Gears 5/Tactics leverage Dynamic Resolution Scaling to hit a smooth 60 FPS. If we detect that we’re close to going over budget, Dynamic Resolution Scaling kicks in and renders the next frame at a lower resolution so that a frame isn’t dropped. We also leveraged Unreal Engine’s temporal upscaling to run post processing at full resolution even when Dynamic Resolution Scaling is downscaling, which keeps the final image high quality. However, this causes a problem: the VRS texture is generated at full resolution but may need to be applied at a lower resolution. To resolve this, we ran a compute shader that rescales the VRS texture to account for dynamic resolution. Because the VRS texture is much smaller than the full-resolution buffer, this rescale is very cheap (0.02 ms on Xbox Series X|S).
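A minimal sketch of the rescale in C++, assuming the lower-resolution frame is rendered into the top-left of the render target (so destination tile (dx, dy) maps back to source tiles starting at (dx / scale, dy / scale)) and that, where several full-resolution tiles land in one destination tile, keeping the finest rate per axis is an acceptable conservative choice. The reprojection mentioned in VRS Texture Generation is omitted, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-axis finest of two D3D12_SHADING_RATE-style codes (log2 X in bits 2-3,
// log2 Y in bits 0-1), e.g. finestRate(2x2, 1x2) == 1x2.
std::uint8_t finestRate(std::uint8_t a, std::uint8_t b) {
    const int x = std::min(a >> 2, b >> 2);
    const int y = std::min(a & 3, b & 3);
    return static_cast<std::uint8_t>((x << 2) | y);
}

struct RateImage {
    int width = 0;                    // in tiles
    int height = 0;                   // in tiles
    std::vector<std::uint8_t> rates;  // row-major
};

// Rescales a full-resolution VRS image for a frame rendered at `scale` (0 < scale <= 1).
// Each destination tile covers about 1/scale source tiles per axis; the finest rate
// among them is kept so that no edge found at full resolution is lost.
RateImage rescaleForDynamicResolution(const RateImage& src, double scale) {
    RateImage dst;
    dst.width = static_cast<int>(std::ceil(src.width * scale));
    dst.height = static_cast<int>(std::ceil(src.height * scale));
    dst.rates.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int dy = 0; dy < dst.height; ++dy) {
        for (int dx = 0; dx < dst.width; ++dx) {
            const int sx0 = static_cast<int>(std::floor(dx / scale));
            const int sy0 = static_cast<int>(std::floor(dy / scale));
            const int sx1 = std::min(src.width, static_cast<int>(std::ceil((dx + 1) / scale)));
            const int sy1 = std::min(src.height, static_cast<int>(std::ceil((dy + 1) / scale)));
            std::uint8_t r = 0xA;  // start from the coarsest rate (4x4)
            for (int sy = sy0; sy < sy1; ++sy)
                for (int sx = sx0; sx < sx1; ++sx)
                    r = finestRate(r, src.rates[static_cast<std::size_t>(sy) * src.width + sx]);
            dst.rates[static_cast<std::size_t>(dy) * dst.width + dx] = r;
        }
    }
    return dst;
}
```

Because the input is a tile grid rather than the full render target, the loop touches only a few thousand to a few hundred thousand texels, which is consistent with the rescale being very cheap.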
Variable Rate Shading and Dynamic Resolution Scaling are both powerful techniques with different strengths and weaknesses. Dynamic Resolution Scaling scales resolution by a percentage that can be dialed up and down in very fine steps to maintain the target frame rate while keeping the GPU fully utilized. Its weakness, however, is that the scaling applies to the entire render target, resulting in a global reduction in resolution. Tier 2 Variable Rate Shading is the complete flip: the reduction in resolution is limited to a small handful of discrete shading rates, but in exchange you control exactly which parts of the render target are affected.
We found our approach allowed us to play to the strengths of both Dynamic Resolution Scaling and Variable Rate Shading. VRS takes a first stab at applying coarse shading based on the edge detection results. On the next frame, Dynamic Resolution Scaling looks at the total GPU frame time, which already includes the VRS savings, and adjusts the scaling if needed. For example, VRS applied to the real-time cinematics on Xbox Series X allowed dynamic resolution to run an average of 10% higher and, in the best cases, removed the need for any downscaling altogether.
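The feedback loop can be illustrated with a deliberately simple controller. This is a hypothetical C++ sketch, not Unreal Engine’s dynamic resolution heuristic: it assumes GPU time scales with pixel count (so with the square of the resolution scale) and picks the largest scale that would just fit the budget. Real controllers also add headroom and smoothing.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical controller: if time ~ scale^2, the scale that exactly fits the
// budget is currentScale * sqrt(budget / measured). The measured GPU time already
// includes any VRS savings, so cheaper frames translate into a higher scale.
double nextResolutionScale(double currentScale, double gpuFrameMs, double budgetMs,
                           double minScale = 0.5, double maxScale = 1.0) {
    const double fit = currentScale * std::sqrt(budgetMs / gpuFrameMs);
    return std::clamp(fit, minScale, maxScale);
}
```

For instance, a frame at 0.8 scale that comes in at 12 ms against a 16 ms budget moves up to about 0.92, and at 9 ms it would reach full resolution.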
On PC, VRS can be tuned with three video settings: Quality, Balanced, and Performance. Quality matches the default on Xbox Series X|S and targets no perceptual impact. Balanced also targets minimal perceptual difference, but under scrutiny may show some differences in exchange for extra performance. Performance is an aggressive use of VRS that makes some visible compromises but recovers the most performance.
The previous post on VRS in Gears Tactics focused on performance on NVIDIA Turing hardware, so this time we look at AMD’s latest RDNA 2 cards instead. We did, however, see similar performance scaling on both AMD and NVIDIA hardware.
The results below were captured on an AMD 6900 XT at 4K resolution with all graphics settings set to Ultra:
| Frametime (ms) | Savings (ms) | Savings (%) |
|---|---|---|
To push the AMD 6900 XT further, we ran another test at 4K resolution with all settings set to Insane and Screen Space Global Illumination enabled:
| Frametime (ms) | Savings (ms) | Savings (%) |
|---|---|---|
Comparison of shading rate usage in Quality vs Balanced vs Performance
| Rendering Pass | Total Cost (ms) | Quality Savings (ms) | Balanced Savings (ms) | Performance Savings (ms) |
|---|---|---|---|---|
| Screen Space Ambient Occlusion | 2.13 | 0.94 | 1 | 1.17 |
| Screen Space Global Illumination* | 3 | – | – | 0.64 |
| Screen Space Reflections | 2.67 | 1.27 | 1.27 | 1.49 |
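Dividing each saving by the pass’s total cost puts the rows on a common scale. Worked through for the per-pass table: Quality removes about 44% of the SSAO cost and about 48% of the SSR cost, Performance removes about 55% and 56% respectively, and SSGI (Performance only) saves about 21%. The small C++ helper below reproduces that arithmetic from the table values; the struct and function names are illustrative.

```cpp
#include <optional>
#include <string>
#include <vector>

struct PassRow {
    std::string pass;
    double totalMs;                      // "Total Cost (ms)"
    std::optional<double> qualityMs;     // savings per setting; empty where the table shows "–"
    std::optional<double> balancedMs;
    std::optional<double> performanceMs;
};

// Values copied from the per-pass table.
const std::vector<PassRow> kPerPassTable = {
    {"Screen Space Ambient Occlusion", 2.13, 0.94, 1.0, 1.17},
    {"Screen Space Global Illumination", 3.0, std::nullopt, std::nullopt, 0.64},
    {"Screen Space Reflections", 2.67, 1.27, 1.27, 1.49},
};

// Savings as a percentage of the pass's own cost; empty where VRS was not applied.
std::optional<double> percentOfPass(double totalMs, std::optional<double> savedMs) {
    if (!savedMs) return std::nullopt;
    return 100.0 * *savedMs / totalMs;
}
```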
Is it worth implementing Tier 2 VRS for my game?
Every engine is different, and not all games will benefit equally from VRS. There are two things to keep in mind when evaluating VRS:
- VRS is an optimization that reduces the number of pixel shader invocations. As such, it only improves performance in games that are GPU-bound on pixel shader work.
- Tier 2 VRS sees higher performance gains at higher resolutions. While actual results will vary by engine and content, we found that resolutions of 1080p or lower generally saw diminishing returns from Tier 2 VRS.
One of the perks of the VRS API is its ease of integration. By using Tier 1 VRS and calling RSSetShadingRate at the start of every command list to set the shading rate to 2×2, you can quickly get a sense of the upper bound of the performance gain from VRS. We recommend taking 30-50% of those savings as an estimate of what to expect from a proper Tier 2 implementation. It’s also important to look at the savings of individual passes rather than the whole frame time, ignoring passes that Tier 2 VRS might not apply to. For example, our Tier 2 VRS texture couldn’t be used for a shadow pass, since it’s generated from the point of view of the player camera, not the light.
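The estimate can be written down as a small calculation. A minimal C++ sketch, assuming you have captured each pass’s GPU time once with VRS off and once with Tier 1 forced to 2×2 (for example via `ID3D12GraphicsCommandList5::RSSetShadingRate` with `D3D12_SHADING_RATE_2X2`), and have flagged the passes a screen-space Tier 2 texture cannot drive, such as shadow passes. The struct, function and example numbers are hypothetical; only the 30-50% factor comes from the text.

```cpp
#include <string>
#include <utility>
#include <vector>

struct PassTiming {
    std::string name;
    double baselineMs;     // pass time with VRS off
    double forced2x2Ms;    // pass time with Tier 1 VRS forced to 2x2
    bool tier2Applicable;  // false for e.g. shadow passes (rendered from the light)
};

// Sums the forced-2x2 savings over passes Tier 2 could affect (the upper bound),
// then applies the 30-50% rule of thumb. Returns {low, high} in milliseconds.
std::pair<double, double> estimateTier2Savings(const std::vector<PassTiming>& passes) {
    double upperBoundMs = 0.0;
    for (const PassTiming& p : passes)
        if (p.tier2Applicable) upperBoundMs += p.baselineMs - p.forced2x2Ms;
    return {0.3 * upperBoundMs, 0.5 * upperBoundMs};
}
```

With hypothetical timings of 4.0 to 2.0 ms for the base pass, 3.0 to 2.0 ms for lighting and a shadow pass excluded, the upper bound is 3.0 ms and the estimate is roughly 0.9 to 1.5 ms.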
We implemented VRS for all the passes that gave us the biggest bang for the buck, but due to time constraints it was not plumbed through the entire engine. A deeper integration would allow VRS to provide even larger GPU savings.
“Software-Based Variable Rate Shading in Call of Duty”, presented at SIGGRAPH 2020 (http://advances.realtimerendering.com/s2020/index.htm), has some interesting thoughts on this topic as well. They present a method that leverages how console hardware handles MSAA to emulate VRS on platforms without hardware VRS support, which also offers extra flexibility such as smaller tile sizes. In addition, they present an optimized way to apply VRS to compute shaders that uses ExecuteDispatchIndirect to ensure only waves with actual work are dispatched, in contrast to our brute-force method. However, software-based VRS also has trade-offs, including implementation complexity and the overhead of a de-blocking pass. One possibility is a hybrid of both techniques, switching between them based on the characteristics of each rendering pass.
Tier 2 VRS provides a nearly free boost in performance with minimal visual impact. As we see more adoption of 120+ FPS and higher-fidelity effects, it’s become increasingly important that we spend our GPU budget in all the right places, making Tier 2 VRS a welcome tool to help tackle the next generation of rendering.