DirectML ❤ Stable Diffusion
Text-to-image models are amazing tools that can transform natural language into stunning images. Stable Diffusion is particularly interesting: the base model can create images from text and, since it’s open-source, developers can customize it for their own needs and preferences. With some extra training, developers can fine-tune their model to generate images of any domain, subject, or style they want.
We are demonstrating what can be done with Stable Diffusion models in two of our Build sessions tomorrow: Shaping the future of work with AI and Deliver AI-powered experiences across cloud and edge, with Windows.
We’ve optimized DirectML to accelerate transformer and diffusion models, like Stable Diffusion, so that they run even better across the Windows hardware ecosystem. Our goal is to enable developers to infuse apps with AI hardware acceleration at scale. For more on how Stable Diffusion lights up on our partners’ hardware with DML, check out:
- AMD: https://gpuopen.com/amd-microsoft-directml-stable-diffusion/
- NVIDIA: https://blogs.nvidia.com/blog/2023/05/23/microsoft-build-nvidia-ai-windows-rtx
See our Drivers section and Python sample if you want to get started right away.
Getting the best performance with DirectML
We worked closely with the Olive team to build a powerful optimization tool that leverages DirectML to produce models optimized to run across the Windows ecosystem. For more on Olive with DirectML, check out our post, Optimize DirectML performance with Olive.
You can use Olive to ensure your Stable Diffusion model runs as well as possible with DirectML. Make sure your model is in the ONNX format first; Olive can perform this conversion for you. Once your model is in ONNX format, follow the steps in our DirectML and Olive blog post.
See here for a sample that shows how to optimize a Stable Diffusion model. We’ve tested this with CompVis/stable-diffusion-v1-4 and runwayml/stable-diffusion-v1-5. Stable Diffusion models that share the same architecture and layers as these, even with different checkpoints or weights, will also work well with Olive.
DirectML in action
Check out tomorrow’s Build Breakout Session to see Stable Diffusion in action: Deliver AI-powered experiences across cloud and edge, with Windows
See here for a Python sample showing how to use Stable Diffusion with Olive.
We also built some samples to show how you can use DirectML in general in C++. For more links to help you get started, check out our documentation and helpful links page.
Drivers
We recommend upgrading to the latest drivers for the best performance.
- AMD: AMD plans to release optimized graphics drivers in the next month supporting AMD RDNA™ 3 devices, including AMD Radeon™ RX 7900 Series graphics cards and AMD Ryzen™ 7040 Series Mobile processors with Radeon™ graphics.
- Intel: Developers interested in Intel drivers supporting Stable Diffusion on DirectML should contact Intel Developer Relations for additional details.
- NVIDIA: Users of NVIDIA GeForce RTX 30 and 40 Series GPUs can see these improvements firsthand in GeForce Game Ready Driver 532.03.
Get in touch
The AI space is changing fast! If you run into any problems, feel free to open an issue on our GitHub repo or email us at firstname.lastname@example.org.
Comments
“Users of NVIDIA GeForce RTX 30 Series and 40 Series GPUs, can see these improvements first hand, in GeForce Game Ready Driver 532.03”
What about RTX 20 series? They also have tensor cores and are RTX GPUs. Nvidia mentions RTX GPUs, so the editor of this article probably forgot that.
This is very good news indeed. I am working on an entirely C++/WinRT/WinUI-based Stable Diffusion app, which has no dependency on Python and already uses DirectML + ONNX (it is available here: https://github.com/axodox/native-diffusion, you can build and run the Unpaint project easily with just VS). Will make sure to give this a go during the weekend.
Edit: could not wait till weekend, gave it a go, I see massive improvement in my app, inference time with 25 steps is much faster than with 15 steps before (even in debug mode), my VRAM usage got halved as well. Great work!