November 4th, 2025

Powering Distributed AI/ML at Scale with Azure and Anyscale

The path from prototype to production for AI/ML workloads is rarely straightforward. As data pipelines expand and model complexity grows, teams can find themselves spending more time orchestrating distributed compute than building the intelligence that powers their products. Scaling from a laptop experiment to a production-grade workload still feels like reinventing the wheel. What if scaling AI workloads felt as natural as writing in Python itself? That’s the idea behind Ray, the open-source distributed computing framework born at UC Berkeley’s RISELab, and now, it’s coming to Azure in a whole new way.

Today, at Ray Summit, we announced a new partnership between Microsoft and Anyscale, the company founded by Ray’s creators, to bring Anyscale’s managed Ray service to Azure as a first-party offering in private preview. This new managed service will deliver the simplicity of Anyscale’s developer experience on top of Azure’s enterprise-grade Kubernetes infrastructure, making it possible to run distributed Python workloads with native integrations, unified governance, and streamlined operations, all inside your Azure subscription.

Ray: Open-Source Distributed Computing for Python

Ray reimagines distributed systems for the Python ecosystem, making it simple for developers to scale code from a single laptop to a large cluster with minimal changes. Instead of rewriting applications for distributed execution, Ray offers Pythonic APIs that allow functions and classes to be transformed into distributed tasks and actors without altering core logic. Its smart scheduling seamlessly orchestrates workloads across CPUs, GPUs, and heterogeneous environments, ensuring efficient resource utilization.
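For a sense of what that looks like in practice, here is a minimal sketch using Ray's core task and actor APIs (the @ray.remote decorator, .remote() calls, and ray.get()); the square function and Counter actor are illustrative examples only, not part of any Azure or Anyscale API.

    import ray

    ray.init()  # connect to an existing Ray cluster, or start a local one

    # A plain Python function becomes a distributed task with one decorator.
    @ray.remote
    def square(x):
        return x * x

    # A Python class becomes a stateful actor the same way.
    @ray.remote
    class Counter:
        def __init__(self):
            self.total = 0

        def add(self, value):
            self.total += value
            return self.total

    # .remote() schedules work on the cluster and returns futures;
    # ray.get() blocks until the results are ready.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]

    counter = Counter.remote()
    print(ray.get(counter.add.remote(10)))  # 10

The same code runs unchanged on a laptop or a multi-node cluster; only the resources Ray can schedule onto differ.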

Developers can also build complete AI systems using Ray’s native libraries—Ray Train for distributed training, Ray Data for data processing, Ray Serve for model serving, and Ray Tune for hyperparameter optimization—all fully compatible with frameworks like PyTorch and TensorFlow. By abstracting away infrastructure complexity, Ray lets teams focus on model performance and innovation.
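As one small example of those libraries, the sketch below uses Ray Tune's Tuner API (as exposed in recent Ray 2.x releases) to search over a toy objective; the objective function and search space are hypothetical and chosen only to show the shape of the API, not a recommended training setup.

    from ray import tune

    # A toy objective: Tune calls it once per trial with a sampled config
    # and records the returned metrics.
    def objective(config):
        return {"loss": (config["x"] - 3) ** 2}

    tuner = tune.Tuner(
        objective,
        param_space={"x": tune.uniform(-10, 10)},
        tune_config=tune.TuneConfig(num_samples=20, metric="loss", mode="min"),
    )
    results = tuner.fit()
    print(results.get_best_result().config)

Each trial runs as a Ray task, so the same script fans out across whatever CPUs and GPUs the cluster provides.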

Anyscale: Enterprise Ray on Azure

Ray makes distributed computing accessible; Anyscale running on Azure makes it enterprise-ready. At the heart of this offering is RayTurbo, Anyscale’s high-performance runtime for Ray. RayTurbo is designed to maximize cluster efficiency and accelerate Python workloads, enabling teams on Azure to:

  • Spin up Ray clusters in minutes, without Kubernetes expertise, directly from the Azure portal or CLI.
  • Dynamically allocate tasks across CPUs, GPUs, and heterogeneous nodes, ensuring efficient resource utilization and minimizing idle time.
  • Run large experiments quickly and cost-effectively with elastic scaling, GPU packing, and native support for Azure spot VMs.
  • Run reliably at production scale with automatic fault recovery, zero-downtime upgrades, and integrated observability.
  • Maintain control and governance; clusters run inside your Azure subscription, so data, models, and compute stay secure, with unified billing and compliance under Azure standards.

By combining Ray’s flexible APIs with Anyscale’s managed platform and RayTurbo’s performance, Python developers can move from prototype to production faster, with less operational overhead, and at cloud scale on Azure.

Kubernetes for Distributed Computing

Under the hood, Azure Kubernetes Service (AKS) powers this new managed offering, providing the infrastructure foundation for running Ray at production scale. AKS handles the complexity of orchestrating distributed workloads while delivering the scalability, resilience, and governance that enterprise AI applications require.

AKS delivers:

  • Dynamic resource orchestration: Automatically provision and scale clusters across CPUs, GPUs, and mixed configurations as demand shifts.
  • High availability: Self-healing nodes and failover keep workloads running without interruption.
  • Elastic scaling: Scale from development clusters to production deployments spanning hundreds of nodes.
  • Integrated Azure services: Native connections to Azure Monitor, Microsoft Entra ID, Blob Storage, and policy tools streamline governance across IT and data science teams.

AKS gives Ray and Anyscale a strong foundation—one that’s already trusted for enterprise workloads and ready to scale from small experiments to global deployments.

Enabling teams with Anyscale running on Azure

With this partnership, Microsoft and Anyscale are bringing together the best of open-source Ray, managed cloud infrastructure, and Kubernetes orchestration. By pairing Ray’s distributed computing platform for Python with Anyscale’s management capabilities and AKS’s robust orchestration, Azure customers gain flexibility in how they scale AI workloads. Whether you want to start small with rapid experimentation or run mission-critical systems at global scale, this offering gives you the choice to adopt distributed computing without the complexity of building and managing infrastructure yourself.

You can leverage Ray’s open-source ecosystem, integrate with Anyscale’s managed experience, or combine both with Azure-native services, all within your subscription and governance model. This optionality means teams can choose the path that best fits their needs: prototype quickly, optimize for cost and performance, or standardize for enterprise compliance.

Together, Microsoft and Anyscale are removing operational barriers and giving developers more ways to innovate with Python on Azure, so they can move faster, scale smarter, and focus on delivering breakthroughs. Read the full release here.

Get started

Learn more about the private preview and how to request access at https://aka.ms/anyscale or subscribe to Anyscale in the Azure Marketplace. 

Author

CVP Azure OSS Cloud Native
