In this post, we describe how we ran our Dask-based data preparation at scale on Azure ML Compute Clusters, using Dask-MPI and Azure ML's native MPI support.
This code story describes a collaboration with ZenCity around detecting trending topics at scale. We discuss the datasets, data preparation, models used and the deployment story for this scenario.
Storing and querying big data is challenging, especially when running real-time queries against huge amounts of data. In this post, I'll share benchmarking results that compare several big data technologies in the context of genomic data queries.