DirectStorage is coming to PC

Andrew Yeung - MSFT

Earlier this year, Microsoft showed the world how the Xbox Series X, with its portfolio of technology innovations, will introduce a new era of no-compromise gameplay. Alongside the actual console announcements, we unveiled the Xbox Velocity Architecture, a key part of how the Xbox Series X will deliver next generation gaming experiences.

We’re excited to bring DirectStorage, an API in the DirectX family originally designed for the Velocity Architecture, to Windows PCs! DirectStorage will bring best-in-class IO tech to both PC and console, just as DirectX 12 Ultimate does with rendering tech. With a DirectStorage-capable PC and a DirectStorage-enabled game, you can look forward to vastly reduced load times and virtual worlds that are more expansive and detailed than ever.

In this blog post, we’re going to give gaming enthusiasts more details on how it’s going to work and how it will revolutionize PC gaming.

 

The evolution of storage technologies and game IO patterns

Recent advancements in SSD and PCIe technologies, specifically NVMe technologies, allow gaming PCs to have storage solutions that deliver far more bandwidth than was ever possible with older hard drive technologies. Instead of tens of megabytes per second, drives like the upcoming Xbox Series X console’s custom NVMe drive can deliver a blazing-fast multiple gigabytes per second.

Game workloads have also evolved. Modern games load in much more data than older ones and are smarter about how they load this data. These data loading optimizations are necessary for this larger amount of data to fit into shared memory/GPU accessible memory. Instead of loading large chunks at a time with very few IO requests, games now break assets like textures down into smaller pieces, only loading in the pieces that are needed for the current scene being rendered. This approach is much more memory efficient and can deliver better looking scenes, though it does generate many more IO requests.

Unfortunately, current storage APIs were not optimized for this high number of IO requests, which prevents them from scaling up to these higher NVMe bandwidths and creates bottlenecks that limit what games can do. Even with super-fast PC hardware and an NVMe drive, games using the existing APIs will be unable to fully saturate the IO pipeline, leaving precious bandwidth on the table.

That’s where DirectStorage for PC comes in. This API is the response to an evolving storage and IO landscape in PC gaming. DirectStorage will be supported on certain systems with NVMe drives and work to bring your gaming experience to the next level. If your system doesn’t support DirectStorage, don’t fret; games will continue to work just as well as they always have.

 

What exactly will DirectStorage do for my PC gaming experience and how?

There are two primary areas this new API is going to improve: reducing frustratingly long load times of the past and enabling games to be more detailed and expansive than ever.

Although seemingly different, both benefits stem from the same IO system advancements that DirectStorage brings. Whether it’s the textures of your character’s clothing or the details of the mountains off in the distance, both fundamentally involve loading data from a storage device, and that data eventually needs to get to the GPU. The former just happens while on a loading screen, whereas the latter happens as you walk through an open world game that loads in the distant scenery coming into view in real time while dumping things that drop out of view.

In either case, previous gen games had an asset streaming budget on the order of 50MB/s, which even at smaller 64k block sizes (i.e. one texture tile) amounts to only hundreds of IO requests per second. To take advantage of the full bandwidth of NVMe drives capable of multiple gigabytes per second, this quickly explodes to tens of thousands of IO requests per second. Taking the Series X’s 2.4GB/s capable drive and the same 64k block sizes as an example, that amounts to >35,000 IO requests per second to saturate it.
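The arithmetic behind those figures can be checked with a short sketch. It assumes decimal megabytes and gigabytes for the bandwidth numbers and 64 KiB (65,536 bytes) for a "64k" block; the function and constant names are made up for illustration.

```python
# Rough check of the request-rate figures quoted above.
# Assumptions: MB and GB are decimal units, a "64k" block is 64 KiB.

KIB = 1024
MB = 1_000_000
GB = 1_000_000_000


def io_requests_per_second(bandwidth_bytes_per_s: float, block_size_bytes: int) -> float:
    """Number of block-sized reads per second needed to sustain a bandwidth."""
    return bandwidth_bytes_per_s / block_size_bytes


# Previous-gen streaming budget: about 50MB/s in 64k blocks.
prev_gen_rate = io_requests_per_second(50 * MB, 64 * KIB)

# Series X drive: 2.4GB/s in the same 64k blocks.
series_x_rate = io_requests_per_second(2.4 * GB, 64 * KIB)

print(f"previous gen: {prev_gen_rate:,.0f} requests/s")
print(f"Series X:     {series_x_rate:,.0f} requests/s")
```

Under these assumptions the first number lands in the hundreds and the second comes out a little above 36,000, which is consistent with the ">35,000" figure.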

Existing APIs require the application to manage and handle each of these requests one at a time: first submitting the request, then waiting for it to complete, and then handling its completion. The overhead of each request is not very large and wasn’t a choke point for older games running on slower hard drives, but multiplied tens of thousands of times per second, IO overhead can quickly become too expensive, preventing games from taking advantage of the increased NVMe drive bandwidths.
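To see why a small per-request cost starts to matter, here is a back-of-the-envelope sketch. The 20 microsecond overhead is a purely hypothetical number chosen for illustration and is not a measurement of any real API; the request rates are the ones implied by a 50MB/s budget and a 2.4GB/s drive at 64 KiB blocks.

```python
# Toy model: CPU time spent per second on per-request overhead alone.
# ASSUMED_OVERHEAD_S is hypothetical, not a measured value.

ASSUMED_OVERHEAD_S = 20e-6  # hypothetical CPU cost of handling one IO request


def cpu_seconds_per_second(requests_per_second: float, overhead_s: float) -> float:
    """Fraction of one CPU core consumed by request handling."""
    return requests_per_second * overhead_s


# About 763 requests/s for 50MB/s, about 36,621 requests/s for 2.4GB/s.
hdd_era_load = cpu_seconds_per_second(763, ASSUMED_OVERHEAD_S)
nvme_era_load = cpu_seconds_per_second(36_621, ASSUMED_OVERHEAD_S)

print(f"older workload: {hdd_era_load:.1%} of one core")
print(f"NVMe workload:  {nvme_era_load:.1%} of one core")
```

With this assumed overhead, the older workload costs a percent or two of a core, while the NVMe-rate workload would eat most of a core before the game has done anything useful with the data.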

On top of that, many of these assets are compressed. In order to be used by the CPU or GPU, they must first be decompressed. A game can pull as much data off the disk as it wants, but you still need an efficient way to decompress and get it to the GPU for rendering. By using DirectStorage, your games are able to leverage the best current and upcoming decompression technologies.

In a world where a game knows it needs to load and decompress thousands of blocks for the next frame, the one-at-a-time model results in loss of efficiency at various points in the data block’s journey. The DirectStorage API is architected in a way that takes all this into account and maximizes performance throughout the entire pipeline from NVMe drive all the way to the GPU.

It does this in several ways: by reducing per-request NVMe overhead, enabling batched many-at-a-time parallel IO requests which can be efficiently fed to the GPU, and giving games finer grain control over when they get notified of IO request completion instead of having to react to every tiny IO completion.
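The difference between the two models can be sketched with a toy example. This is not the DirectStorage API: `ToyDrive`, `ToyBatchQueue`, and `load_one_at_a_time` are hypothetical names, the "drive" is just a byte buffer in memory, and the only thing being illustrated is how many completions the game has to react to.

```python
# Toy comparison of one-at-a-time IO versus batched submission.
# All classes and functions here are hypothetical illustrations.
from dataclasses import dataclass, field
from typing import List, Tuple

BLOCK = 64 * 1024  # 64 KiB block, matching the texture-tile example


@dataclass
class ToyDrive:
    """In-memory stand-in for a storage device."""
    data: bytes

    def read(self, offset: int, size: int) -> bytes:
        return self.data[offset:offset + size]


def load_one_at_a_time(drive: ToyDrive, offsets: List[int]) -> Tuple[List[bytes], int]:
    """Submit, wait for, and handle each request individually."""
    blocks, completions = [], 0
    for offset in offsets:
        blocks.append(drive.read(offset, BLOCK))
        completions += 1  # the caller reacts to every single completion
    return blocks, completions


@dataclass
class ToyBatchQueue:
    """Collects many requests, then reports completion once per batch."""
    drive: ToyDrive
    pending: List[int] = field(default_factory=list)

    def enqueue(self, offset: int) -> None:
        self.pending.append(offset)

    def submit(self) -> Tuple[List[bytes], int]:
        blocks = [self.drive.read(offset, BLOCK) for offset in self.pending]
        self.pending.clear()
        return blocks, 1  # one notification for the whole batch


drive = ToyDrive(data=bytes(BLOCK * 8))
offsets = [i * BLOCK for i in range(8)]

serial_blocks, serial_completions = load_one_at_a_time(drive, offsets)

queue = ToyBatchQueue(drive)
for offset in offsets:
    queue.enqueue(offset)
batched_blocks, batched_completions = queue.submit()

print(serial_completions, batched_completions)
```

Both paths return the same data; the batched path simply lets the caller decide when it wants to hear about completion instead of being interrupted once per block.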

In this way, developers are given an extremely efficient way to submit and handle many orders of magnitude more IO requests than ever before, ultimately minimizing the time you wait to get in game and bringing you larger, more detailed virtual worlds that load in as fast as your game character can move through them.

 

Why NVMe?

NVMe devices are not only extremely high bandwidth SSD based devices, but they also have hardware data access pipes called NVMe queues which are particularly suited to gaming workloads. To get data off the drive, an OS submits a request to the drive and data is delivered to the app via these queues. An NVMe device can have multiple queues and each queue can contain many requests at a time. This is a perfect match to the parallel and batched nature of modern gaming workloads. The DirectStorage programming model essentially gives developers direct control over that highly optimized hardware.

In addition, existing storage APIs also incur a lot of ‘extra steps’ between an application making an IO request and the request being fulfilled by the storage device, resulting in unnecessary request overhead. These extra steps can be things like data transformations needed during certain parts of normal IO operation. However, these steps aren’t required for every IO request on every NVMe drive on every gaming machine. With a supported NVMe drive and properly configured gaming machine, DirectStorage will be able to detect up front that these extra steps are not required and skip all the unnecessary checks/operations, making every IO request cheaper to fulfill.

For these reasons, NVMe is the storage technology of choice for DirectStorage and high-performance next generation gaming IO.

 

When can we expect more details?

For every DirectX family feature, Microsoft brings together the best of the PC gaming industry players to standardize new gaming features, make them available to game developers, and eventually get them into your gaming machines.

This process has already begun for DirectStorage and we’re working with our industry partners right now to finish designing/building the API and its supporting components. We’re targeting getting a development preview of DirectStorage into the hands of game developers next year.

24 comments

  • Ron King

    I suppose my question becomes why wasn’t this change made at the kernel level of Windows years ago when the first SSDs started hitting the market. I’ve often wondered why modern I/O seems to offer no real world benefit in gaming, but now knowing the pipeline itself was limited, it makes sense.

    • Joris Lamberty

      I would guess that no big studio invested in the technology while it was not usable on consoles anyway. But now that the next console gen is on the way, studios can develop games that make use of this technology on all platforms.

      • Andre Ch

        I see; so RTX 30 series already has that feature.

        “AMD ryzen and pcie 4.0 AMD motherboards will sell even more now ”

        This does apply to all storage configurations on X570 boards, but on non-X570 boards, only the SSD connected directly to the CPU can have an end-to-end PCIe 4.0 connection to the video card, since on B550 and A520 boards the chipset itself only provides PCIe 3.0 lanes to peripherals.

    • Gareth Evans

      I checked the storage spaces driver and it is dated June 2006 (and there are no new drivers available). Even the NVMe specific Samsung driver hasn’t been updated since February 2018 (again, no new updates even within Samsung’s Magician software). It strikes me as odd that there have been no new updates for so long. SSDs were a rarity in 2006, and NVMe drives are quite a different beast to those. Yet no updates to the drivers? 🤷‍♂️

  • Gareth Evans

    How does this fit in with the proposed compression on the new consoles (and RTX30?)? Doubling the effective drive bandwidth with compression (of files that are already optimised) doesn’t seem viable to me, and it begs the obvious question: if you can double the transfer rate of reads, why not compress the data written to disk to double the effective storage capacity?

    PS: Does this mean you’re going to use more than a single queue for Windows Explorer? Even with NVMe it’s painfully slow when you have a lot of small files, and it doesn’t even begin to stretch its legs.

    Might also be a good time to change the max transfer speed of BITS, the default 10Mb/s doesn’t seem very intelligent these days.

    • Ronny Heuschkel

      The block size on SSDs is usually 4k anyway, so that wouldn’t matter. Cluster size only helps to reduce the size and overhead of the file table, but DirectStorage bypasses that. The GPU issues NVMe commands directly to the SSD via PCIe and so has to deal with the 4k block size of the storage device.

      • Dmitry

        Thanks for the reply, but I seriously doubt that the GPU manages the SSD directly. The slide Nvidia showed yesterday on this is a bit of a generalization. My guess is that RTX IO is based on Nvidia GPUDirect Storage https://developer.nvidia.com/blog/gpudirect-storage/ and does not read data at the hardware level on its own, but rather provides an interface for the CPU (OS) to feed fetched bits into it (using a DMA engine). Put simply, the main point here is to skip reading data into RAM first and instead direct it to the GPU along the shortest path. So the OS will keep reading bits from NTFS files but forward them directly to the GPU. I still think that cluster size matters. Also, a larger cluster size on an SSD may be beneficial mostly for reducing write amplification (in case some sort of zoned spaces approach is used), since a block is usually about 1-2MB (not 4k; 4k is one of the possible page sizes, but data is written onto the SSD in blocks), and having fewer reads is better. If the average read chunk is about 64k, using a 64k cluster gives the best results in random reads (to my knowledge).

  • Михаил Гаврилов

    I have a huge game collection (1500 games), which I store on a large 16TB HDD. So with DirectStorage, you kill this ability to store a lot of games, because large NVMe drives are very expensive and games will stop running from the HDD.

  • Ronny Heuschkel

    How would this work if your NVMe drive is BitLocker (software) encrypted? For example with the Nvidia RTX I/O shown yesterday. Is there a defined process of the OS lending the sym key to the GPU so it could do the decrypting itself when reading directly via PCIe with NVMe commands from the SSD? Assuming that the hardware block does support AES-XTS.