{"id":1584,"date":"2025-11-04T12:30:11","date_gmt":"2025-11-04T12:30:11","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/all-things-azure\/?p=1584"},"modified":"2025-11-07T03:20:04","modified_gmt":"2025-11-07T03:20:04","slug":"powering-distributed-aiml-at-scale-with-azure-and-anyscale","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/all-things-azure\/powering-distributed-aiml-at-scale-with-azure-and-anyscale\/","title":{"rendered":"Powering Distributed AI\/ML at Scale with Azure and Anyscale"},"content":{"rendered":"<p>The path from prototype to production for AI\/ML workloads is rarely straightforward. As data pipelines expand and model complexity grows, teams can find themselves spending more time orchestrating distributed compute than building the intelligence that powers their products. Scaling from a laptop experiment to a production-grade workload still feels like reinventing the wheel. What if scaling AI workloads felt as natural as writing in Python itself? That\u2019s the idea behind <a href=\"https:\/\/www.ray.io\/\">Ray<\/a>, the open-source distributed computing framework born at UC Berkeley\u2019s RISELab. Now it\u2019s coming to Azure in a whole new way.<\/p>\n<p>Today, at Ray Summit, we announced a new partnership between Microsoft and <a href=\"https:\/\/www.anyscale.com\/\">Anyscale<\/a>, the company founded by Ray\u2019s creators, to bring Anyscale\u2019s managed Ray service to Azure as an Azure-native offering in private preview. 
This new managed offering will deliver the simplicity of Anyscale\u2019s developer experience on top of Azure\u2019s enterprise-grade Kubernetes infrastructure, making it possible to run distributed Python workloads with native integrations, unified governance, and streamlined operations, all inside your Azure subscription.<\/p>\n<h3><strong>Ray: Open-Source Distributed Computing for Python<\/strong><\/h3>\n<p>Ray reimagines distributed systems for the Python ecosystem, making it simple for developers to scale code from a single laptop to a large cluster with minimal changes. Instead of rewriting applications for distributed execution, Ray offers Pythonic APIs that allow functions and classes to be transformed into distributed tasks and actors without altering core logic. Its smart scheduling seamlessly orchestrates workloads across CPUs, GPUs, and heterogeneous environments, ensuring efficient resource utilization.<\/p>\n<p>Developers can also build complete AI systems using Ray\u2019s native libraries\u2014Ray Train for distributed training, Ray Data for data processing, Ray Serve for model serving, and Ray Tune for hyperparameter optimization\u2014all fully compatible with frameworks like PyTorch and TensorFlow. By abstracting away infrastructure complexity, Ray lets teams focus on model performance and innovation.<\/p>\n<h3><strong>Anyscale: Enterprise Ray on Azure<\/strong><\/h3>\n<p>Ray makes distributed computing accessible; Anyscale running on Azure takes it to the next level of enterprise readiness. At the heart of this offering is Anyscale Runtime, Anyscale\u2019s high-performance runtime for Ray. 
Anyscale Runtime is designed to maximize cluster efficiency and accelerate Python workloads, enabling teams on Azure to:<\/p>\n<ul>\n<li>Spin up Ray clusters in minutes, without Kubernetes expertise, directly from the Azure portal or CLI.<\/li>\n<li>Dynamically allocate tasks across CPUs, GPUs, and heterogeneous nodes, ensuring efficient resource utilization and minimizing idle time.<\/li>\n<li>Run large experiments quickly and cost-effectively with elastic scaling, GPU packing, and native support for Azure spot VMs.<\/li>\n<li>Run reliably at production scale with automatic fault recovery, zero-downtime upgrades, and integrated observability.<\/li>\n<li>Maintain control and governance; clusters run inside your Azure subscription, so data, models, and compute stay secure, with unified billing and compliance under Azure standards.<\/li>\n<\/ul>\n<p>By combining Ray\u2019s flexible APIs with Anyscale\u2019s managed platform and runtime performance, Python developers can move from prototype to production faster, with less operational overhead, and at cloud scale on Azure.<\/p>\n<h3><strong>Kubernetes for Distributed Computing<\/strong><\/h3>\n<p>Under the hood, <a href=\"https:\/\/azure.microsoft.com\/en-in\/products\/kubernetes-service\/\">Azure Kubernetes Service (AKS)<\/a> powers this new managed offering, providing the infrastructure foundation for running Ray at production scale. 
AKS handles the complexity of orchestrating distributed workloads while delivering the scalability, resilience, and governance that enterprise AI applications require.<\/p>\n<p>AKS delivers:<\/p>\n<ul>\n<li>Dynamic resource orchestration: Automatically provision and scale clusters across CPUs, GPUs, and mixed configurations as demand shifts.<\/li>\n<li>High availability: Self-healing nodes and failover keep workloads running without interruption.<\/li>\n<li>Elastic scaling: Scale from development clusters to production deployments spanning hundreds of nodes.<\/li>\n<li>Integrated Azure services: Native connections to Azure Monitor, Microsoft Entra ID, Blob Storage, and policy tools streamline governance across IT and data science teams.<\/li>\n<\/ul>\n<p>AKS gives Ray and Anyscale a strong foundation\u2014one that\u2019s already trusted for enterprise workloads and ready to scale from small experiments to global deployments.<\/p>\n<h3><strong>Enabling teams with Anyscale running on Azure<\/strong><\/h3>\n<p>With this partnership, Microsoft and Anyscale are bringing together the best of open-source Ray, managed cloud infrastructure, and Kubernetes orchestration. By pairing Ray\u2019s distributed computing platform for Python with Anyscale\u2019s management capabilities and AKS\u2019s robust orchestration, Azure customers gain flexibility in how they scale AI workloads. Whether you want to start small with rapid experimentation or run mission-critical systems at global scale, this offering gives you the choice to adopt distributed computing without the complexity of building and managing infrastructure yourself.<\/p>\n<p>You can leverage Ray\u2019s open-source ecosystem, integrate with Anyscale\u2019s managed experience, or combine both with Azure-native services, all within your subscription and governance model. 
This optionality means teams can choose the path that best fits their needs: prototype quickly, optimize for cost and performance, or standardize for enterprise compliance.<\/p>\n<p>Together, Microsoft and Anyscale are removing operational barriers and giving developers more ways to innovate with Python on Azure, so they can move faster, scale smarter, and focus on delivering breakthroughs. Read the <a href=\"https:\/\/www.anyscale.com\/press\/anyscale-collaborates-with-microsoft-to-deliver-ai-native-computing-on-azure\">full press release<\/a>.<\/p>\n<h3>Get started<\/h3>\n<p>Learn more about the private preview and how to request access at <a href=\"https:\/\/aka.ms\/anyscale\">https:\/\/aka.ms\/anyscale<\/a> or subscribe to Anyscale in the <a href=\"https:\/\/marketplace.microsoft.com\/en-us\/product\/saas\/anyscale1750870039553.anyscale-2025-1?tab=Overview\" target=\"_blank\" rel=\"noopener\"><u>Azure Marketplace<\/u><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The path from prototype to production for AI\/ML workloads is rarely straightforward. As data pipelines expand and model complexity grows, teams can find themselves spending more time orchestrating distributed compute than building the intelligence that powers their products. Scaling from a laptop experiment to a production-grade workload still feels like reinventing the wheel. 
What if [&hellip;]<\/p>\n","protected":false},"author":201574,"featured_media":1590,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[36,1,87],"tags":[30,40,43,42,2,93],"class_list":["post-1584","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-apps","category-azure","category-containers","tag-ai","tag-ai-apps","tag-app-development","tag-appdev","tag-azure","tag-containers"],"acf":[],"blog_post_summary":"<p>The path from prototype to production for AI\/ML workloads is rarely straightforward. As data pipelines expand and model complexity grows, teams can find themselves spending more time orchestrating distributed compute than building the intelligence that powers their products. Scaling from a laptop experiment to a production-grade workload still feels like reinventing the wheel. What if [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/1584","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/users\/201574"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/comments?post=1584"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/posts\/1584\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media\/1590"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/media?parent=1584"}],"wp:term":[{"taxonomy":"category","embeddable":true,"
href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/categories?post=1584"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/all-things-azure\/wp-json\/wp\/v2\/tags?post=1584"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}