Scala

Running Parallel Apache Spark Notebook Workloads On Azure Databricks

January 18, 2019

Clemens Wolff

This article walks through the development of a technique for running Spark jobs in parallel on Azure Databricks. The technique enabled us to reduce the processing times for JetBlue's reporting threefold while keeping the business logic implementation straightforward. The technique can be reused for any notebook-based Spark workload on Azure Databricks.

Social Stream Pipeline on Databricks with auto-scaling and CI/CD using Travis

December 12, 2018

Mor Shemesh

This code story describes CSE's work with ZenCity to create a data pipeline on Azure Databricks supported by a CI/CD pipeline on TravisCI. The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying trends.

Runtime Configuration of Spark Streaming Jobs

May 1, 2018

Kevin Hartman

Using Azure Service Bus, we achieved zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis.

Permissively-Licensed Named Entity Recognition on the JVM

November 20, 2017

Clemens Wolff

The ability to correctly identify entities, such as places, people, and organizations, adds a powerful level of natural language understanding to applications. This post introduces an MIT-licensed one-click deployment to Azure for web services that lets developers get started with a wide range of natural language tasks in 5 minutes or less, by consuming simple HTTP services for language identification, tokenization, part-of-speech tagging, and named entity recognition.

ISE Developer Blog
