Showing results for Spark - ISE Developer Blog

Jan 18, 2019

Running Parallel Apache Spark Notebook Workloads On Azure Databricks

Clemens Wolff

This article walks through the development of a technique for running Spark jobs in parallel on Azure Databricks. The technique enabled us to reduce the processing times for JetBlue's reporting threefold while keeping the business logic implementation straightforward. The technique can be re-used for any notebooks-based Spark workload on Azure Dat...

Big Data
Dec 12, 2018

Social Stream Pipeline on Databricks with auto-scaling and CI/CD using Travis

Mor Shemesh

This code story describes CSE's work with ZenCity to create a data pipeline on Azure Databricks supported by a CI/CD pipeline on TravisCI. The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying trends.

DevOps, Big Data, Azure App Services
May 1, 2018

Runtime Configuration of Spark Streaming Jobs

Kevin Hartman

Using Azure Service Bus, we achieved zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis.

Big Data
Nov 20, 2017

Permissively-Licensed Named Entity Recognition on the JVM

Clemens Wolff

The ability to correctly identify entities, such as places, people, and organizations, adds a powerful level of natural language understanding to applications. This post introduces an MIT-licensed one-click deployment to Azure for web services that lets developers get started with a wide range of natural language tasks in 5 minutes or less, by consu...

Machine Learning, Azure App Services
Nov 1, 2017

Building a Custom Spark Connector for Near Real-Time Speech-to-Text Transcription

Clemens Wolff

This post describes in detail the Azure Cognitive Services speech-to-text WebSocket protocol and shows how to implement the protocol in Java. This enables us to transcribe audio to text in near real-time. We then show how to feed the transcribed radio into a pipeline based on Spark Streaming for further analysis, augmentation, and aggregation. The ...

Machine Learning, Cognitive Services
Sep 29, 2016

Benchmarking Technologies for Storing and Querying Genomic Data

Ami Turgman

Storing and querying big data is challenging, especially when performing real-time queries on huge amounts of data. In this post, I'll share benchmarking results and compare several big data technologies in the context of genomic data queries.

Big Data