{"id":592,"date":"2023-11-15T08:00:12","date_gmt":"2023-11-15T16:00:12","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/windowsai\/?p=592"},"modified":"2023-11-16T11:26:33","modified_gmt":"2023-11-16T19:26:33","slug":"directml-llama2","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/windowsai\/directml-llama2\/","title":{"rendered":"Announcing preview support for Llama 2 in DirectML"},"content":{"rendered":"<p>At Inspire this year we <a href=\"https:\/\/blogs.microsoft.com\/blog\/2023\/07\/18\/microsoft-and-meta-expand-their-ai-partnership-with-llama-2-on-azure-and-windows\/\">talked<\/a> about how developers will be able to run Llama 2 on Windows with DirectML and the ONNX Runtime and we&#8217;ve been hard at work to make this a reality.<\/p>\n<p>We now have a sample showing our progress with Llama 2 7B!<\/p>\n<p>See <a href=\"https:\/\/github.com\/microsoft\/Olive\/tree\/main\/examples\/directml\/llama_v2\">https:\/\/github.com\/microsoft\/Olive\/tree\/main\/examples\/directml\/llama_v2<\/a><\/p>\n<p>This sample relies on first doing an optimization pass on the model with <a href=\"https:\/\/github.com\/microsoft\/Olive\/tree\/main\">Olive<\/a>, a powerful optimization tool for ONNX models. Olive utilizes powerful graph fusion optimizations from the ONNX Runtime and a model architecture optimized for DirectML to speed up inference times by up to <strong>10X<\/strong>!<\/p>\n<p>After this optimization pass, Llama 2 7B runs fast enough that you can have a conversation in real time on multiple vendors\u2019 hardware!<\/p>\n<p>We\u2019ve also built a little UI to make it easy to see the optimized model in action.<\/p>\n<p>Thank you to our hardware partners who helped make this happen. 
<p>For more on how Llama 2 lights up on our partners' hardware with DirectML, see:</p>
<ul>
<li><strong>AMD</strong>: <a href="https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190">https://community.amd.com/t5/ai/how-to-running-optimized-llama2-with-microsoft-directml-on-amd/ba-p/645190</a></li>
<li><strong>Intel</strong>: <a href="https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-and-Microsoft-Collaborate-to-Optimize-DirectML-for-Intel/post/1542055">https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Intel-and-Microsoft-Collaborate-to-Optimize-DirectML-for-Intel/post/1542055</a></li>
<li><strong>NVIDIA</strong>: <a href="https://blogs.nvidia.com/blog/ignite-rtx-ai-tensorrt-llm-chat-api/?#directml-llama">https://blogs.nvidia.com/blog/ignite-rtx-ai-tensorrt-llm-chat-api/?#directml-llama</a></li>
</ul>
<p>We're excited about this milestone, but this is only a first peek; stay tuned for future enhancements supporting even larger models, fine-tuning, and lower-precision data types.</p>
<h2>Getting Started</h2>
<h3>Requesting Llama 2 access</h3>
<p>To run the Olive optimization pass in our sample, first <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/">request access</a> to the Llama 2 weights from Meta.</p>
<h3>Drivers</h3>
<p>We recommend upgrading to the latest drivers for the best performance.</p>
<ul>
<li><strong>AMD</strong> has released optimized graphics drivers supporting AMD RDNA™ 3 devices, including AMD Radeon™ RX 7900 Series graphics cards. Download Adrenalin Edition™ 23.11.1 or newer (<a href="https://www.amd.com/en/support">https://www.amd.com/en/support</a>).</li>
<li><strong>Intel</strong> has released optimized graphics drivers supporting Intel Arc A-Series graphics cards.
Download the latest drivers from the <a href="https://www.intel.com/content/www/us/en/download-center/home.html">Intel Download Center</a>.</li>
<li><strong>NVIDIA</strong>: Users of NVIDIA GeForce RTX 20, 30, and 40 Series GPUs can see these improvements firsthand in <a href="https://www.nvidia.com/download/driverResults.aspx/216300/en-us/">GeForce Game Ready Driver 546.01</a>.</li>
</ul>
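<p>With drivers installed, the optimized model runs through ONNX Runtime's DirectML execution provider. The sketch below shows one way to build a provider preference list that falls back to CPU when DirectML is unavailable; the provider names are the real ONNX Runtime identifiers, but the selection helper itself is illustrative, not part of the sample.</p>

```python
# Illustrative helper: prefer the DirectML execution provider, fall back to
# CPU. "DmlExecutionProvider" and "CPUExecutionProvider" are real ONNX
# Runtime provider names; the filtering logic is our own sketch.
def preferred_providers(available):
    """Return the subset of `available` we want, in priority order."""
    order = ["DmlExecutionProvider", "CPUExecutionProvider"]
    return [p for p in order if p in available]

# In a real session (with the onnxruntime-directml package installed) you
# would pass the result to onnxruntime.InferenceSession, e.g.:
#   sess = ort.InferenceSession(model_path,
#                               providers=preferred_providers(
#                                   ort.get_available_providers()))
print(preferred_providers(
    ["CUDAExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]
))  # -> ['DmlExecutionProvider', 'CPUExecutionProvider']
```

<p>Keeping CPU as the last entry means the same script still runs on machines without a supported GPU, just more slowly.</p>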