October 15th, 2025

Easily connect AI workloads to Azure Blob Storage with adlfs

We’re excited to announce the newest release of adlfs—a unified Pythonic file system interface to Azure Blob Storage and Azure Data Lake Storage. Microsoft works closely with the fsspec open-source community to enhance this package. We continue to focus on optimizing performance, security, and authentication. This collaboration ensures that adlfs can be a robust and reliable solution for connecting AI workloads to Azure Blob and Data Lake Storage.

Who is this for?

Data professionals face a constant challenge: bridging code and cloud storage no matter the scale. Python’s fsspec standard is the universal “file adapter,” and adlfs is its specialized, high-performance gateway to Azure’s Blob and Data Lake Storage. While any Python code can utilize adlfs, what sets it apart is the native integration into frameworks like Dask, Pandas, Ray, PyTorch, PyIceberg, and more. adlfs is a clear choice for Azure-centric ML, data science, and extract-transform-load (ETL) workloads. For example, developers can use adlfs to load datasets and save model checkpoints to Azure using PyTorch and PyTorch Lightning.

Improvements in adlfs benefit a broad range of AI/ML tools at once. By making Azure Storage faster and more reliable in adlfs, every tool from PyTorch to pandas that uses fsspec gets a boost on Azure. There’s no custom integration required. Switching from local files or other cloud storage to Azure is often as simple as changing a file path (for example, from file:// or s3:// to az://) or a configuration setting.
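
As a minimal sketch of the path swap described above, the helper below reads bytes through fsspec given only a URL. The names `protocol_of` and `read_head`, the account name, and the paths are hypothetical. The az:// case assumes adlfs is installed and credentials are available.

```python
from urllib.parse import urlsplit


def protocol_of(url: str) -> str:
    """Return the URL scheme that selects the storage backend."""
    scheme = urlsplit(url).scheme
    # A bare path such as "/tmp/train.csv" has no scheme; treat it as local.
    return scheme or "file"


def read_head(url: str, nbytes: int = 64, **storage_options) -> bytes:
    """Read the first nbytes of a file from any fsspec-supported store."""
    import fsspec  # imported lazily so protocol_of has no dependencies

    # Extra keyword arguments are forwarded to the backend file system.
    with fsspec.open(url, "rb", **storage_options) as f:
        return f.read(nbytes)


# Hypothetical usage: only the URL and the storage options change.
# read_head("file:///tmp/train.csv")
# read_head("az://mycontainer/train.csv", account_name="myaccount", anon=False)
```

The reading code stays the same across backends. The scheme in the URL decides which file system implementation handles the request.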

What’s new?

The 2025.8.0 release of adlfs (now available on PyPI) brings several key enhancements focused on performance, resiliency, and ease of use. In summary, this update delivers faster file operations (via parallel uploads) and improved reliability (smaller default block sizes to reduce timeouts, plus fixes for geo-redundant storage scenarios).

  • Writing large files is two to five times faster due to support for concurrent block uploads.
  • The default block size is now 50 MiB, down from 1 GiB, which addresses timeout and connection issues for large file uploads.
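
To make the block-size change concrete, the sketch below counts how many blocks a file splits into under each default. It is plain arithmetic for illustration, not adlfs's internal code, and the 10 GiB file size is an arbitrary example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

OLD_BLOCK_SIZE = 1 * GIB   # previous default, as stated above
NEW_BLOCK_SIZE = 50 * MIB  # default in 2025.8.0, as stated above


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to upload file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


file_size = 10 * GIB  # arbitrary example size
print(block_count(file_size, OLD_BLOCK_SIZE))  # 10
print(block_count(file_size, NEW_BLOCK_SIZE))  # 205
```

With the smaller default, each request carries at most 50 MiB, so a slow or dropped connection costs less to retry. The larger number of blocks also gives concurrent uploads more units of work to spread out.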

Installation

adlfs is listed on PyPI, and you can install it using your favorite package manager. This example utilizes pip:

pip install adlfs==2025.8.0
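
If you're unsure which version an environment has, a quick check along these lines may help. This is a sketch using only the Python standard library. The helper names `parse_release` and `adlfs_is_current` are hypothetical, and the parsing is deliberately simple, so it ignores suffixes such as pre-release tags.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (2025, 8, 0)


def parse_release(text: str) -> tuple:
    """Turn '2025.8.0' into (2025, 8, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def adlfs_is_current() -> bool:
    """True if the installed adlfs is at least MINIMUM; False if older or missing."""
    try:
        return parse_release(version("adlfs")) >= MINIMUM
    except PackageNotFoundError:
        return False


print(adlfs_is_current())
```

Comparing tuples of integers rather than raw strings keeps a version such as 2025.10.1 ordered after 2025.8.0.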

Example usage in Ray

adlfs can be used directly with Ray to enable distributed access to Azure Blob Storage within Ray data pipelines. You can pass an adlfs AzureBlobFileSystem as the filesystem argument in Ray’s data loading functions. Doing so allows data to be read from Azure storage in parallel across the Ray cluster. You can configure authentication through various methods, including Azure CLI credentials, environment variables, managed identity, or explicit parameters. This flexibility makes it easy to switch from one development environment to another.

import ray
from adlfs import AzureBlobFileSystem

ray.init()
# Configure authentication - set anon=False to use credentials
abfs = AzureBlobFileSystem(account_name="myaccount", anon=False)
ds = ray.data.read_parquet("az://mycontainer/data/", filesystem=abfs)
print(ds.count())
ray.shutdown()

This pattern allows seamless, scalable data access from cloud storage within Ray jobs using standard Azure authentication and storage paths.
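
The same options can be passed to other fsspec-aware libraries. The sketch below builds them from a settings mapping and hands them to pandas, switching between an explicit account key and ambient credentials. The variable names `MY_STORAGE_ACCOUNT` and `MY_STORAGE_KEY` and both helper functions are hypothetical choices for this illustration. `load_frame` assumes pandas, a Parquet engine such as pyarrow, and adlfs are installed.

```python
import os


def azure_storage_options(env=None) -> dict:
    """Build fsspec storage options for adlfs from a mapping of settings.

    The setting names used here are illustrative, not an adlfs convention.
    """
    env = os.environ if env is None else env
    options = {"account_name": env["MY_STORAGE_ACCOUNT"]}
    key = env.get("MY_STORAGE_KEY")
    if key:
        # Explicit account key, for example in a local development setup.
        options["account_key"] = key
    else:
        # No key supplied: set anon=False so credentials are used,
        # as in the Ray example above.
        options["anon"] = False
    return options


def load_frame(url: str, env=None):
    """Read a Parquet dataset with pandas using the options above."""
    import pandas as pd  # lazy import; only needed when data is actually read

    return pd.read_parquet(url, storage_options=azure_storage_options(env))


# Hypothetical usage:
# df = load_frame("az://mycontainer/data/", {"MY_STORAGE_ACCOUNT": "myaccount"})
```

Keeping the credential choice in one small function means the data-loading code does not change when moving between a laptop and a cluster.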

What’s next?

Microsoft is actively contributing to the adlfs package to ensure that customers can have the best experience when interacting with their data in Azure Blob Storage. If you work on Python or AI/ML frameworks, upgrade to adlfs 2025.8.0 and try it out. If you already use adlfs through these frameworks, the improvements take effect once you upgrade and require no code changes.

We’d love to hear your feedback. If you have feature requests or run into issues, let us know on the adlfs GitHub repo. Community input will directly influence our next round of contributions. We’re excited to continuously improve how AI workloads can utilize Azure Storage.

Summary

The 2025.8.0 release of adlfs brings significant performance improvements and enhanced reliability for AI workloads using Azure Blob Storage and Data Lake Storage. With large file writes that are two to five times faster and a smaller default block size that reduces timeouts, adlfs makes it easier than ever to connect Python AI/ML frameworks to Azure storage services.
