Running Parallel Apache Spark Notebook Workloads On Azure Databricks

This article walks through the development of a technique for running Spark jobs in parallel on Azure Databricks. The technique enabled us to reduce the processing times for JetBlue's reporting threefold while keeping the business logic implementation straightforward. It can be reused for any notebook-based Spark workload on Azure Databricks.

How to Build A K8S HTTP API For Helm, and Serve Micro-services Using A Single IP

The Commercial Software Engineering team (CSE) partnered with Axonize to automate the process of deploying apps to Kubernetes and exposing them to the internet via a single IP. This post shows how to enable applications in your Kubernetes cluster to programmatically install Helm charts and expose them through a single public-facing IP.

Attaching and Detaching an Edge Node From an HDInsight Spark Cluster when running Dataiku Data Science Studio (DSS)

Earlier this year, Dataiku and Microsoft joined forces to add extra flexibility to DSS on HDInsight and to allow Dataiku customers to attach a persistent edge node to an HDInsight cluster, something that the most recent edition of Azure HDInsight did not previously support.

Infrastructure as Code – On demand GPU clusters with Terraform & Jenkins

Developing robust algorithms for self-driving cars requires sourcing event data from over 10 billion hours of recorded driving time. CSE worked with Cognata, a startup developing simulation platforms for autonomous vehicles, to build a Jenkins pipeline and Terraform solution that enabled our partner to dynamically scale GPU resources for their simulations.