ISE Developer Blog

Developing and Deploying a Recommender Model for Continuous Systematic Literature Reviews

February 10, 2021

Paolo Tenti

This post describes how to use the Microsoft Academic Graph and NLP to build a personalized recommender system that suggests new scientific publications to researchers maintaining Systematic Literature Reviews.

Running Parallel Apache Spark Notebook Workloads On Azure Databricks

January 18, 2019

Clemens Wolff

This article walks through the development of a technique for running Spark jobs in parallel on Azure Databricks. The technique enabled us to cut the processing time for JetBlue's reporting threefold while keeping the business logic implementation straightforward. It can be reused for any notebook-based Spark workload on Azure Databricks.

Social Stream Pipeline on Databricks with auto-scaling and CI/CD using Travis

December 12, 2018

Mor Shemesh

This code story describes CSE's work with ZenCity to create a data pipeline on Azure Databricks supported by a CI/CD pipeline on TravisCI. The aim of the collaboration was to create a pipeline capable of processing a stream of social posts, analyzing them, and identifying trends.

Unsupervised driver safety estimation at scale, a collaboration with Pointer Telocation

July 30, 2018

Omri Mendels

A scalable, unsupervised approach to driver safety estimation on Pointer Telocation's dataset.

Runtime Configuration of Spark Streaming Jobs

May 1, 2018

Kevin Hartman

We used Azure Service Bus to achieve zero-downtime reconfiguration and management of the Spark Streaming job used in Project Fortis.

Permissively-Licensed Named Entity Recognition on the JVM

November 20, 2017

Clemens Wolff

The ability to correctly identify entities, such as places, people, and organizations, adds a powerful level of natural language understanding to applications. This post introduces an MIT-licensed one-click deployment to Azure for web services that lets developers get started with a wide range of natural language tasks in 5 minutes or less, by consuming simple HTTP services for language identification, tokenization, part-of-speech tagging, and named entity recognition.

Building a Custom Spark Connector for Near Real-Time Speech-to-Text Transcription

November 1, 2017

Clemens Wolff

This post describes the Azure Cognitive Services speech-to-text WebSocket protocol in detail and shows how to implement it in Java, which enables us to transcribe audio to text in near real-time. We then show how to feed the transcribed radio into a pipeline based on Spark Streaming for further analysis, augmentation, and aggregation. The Java client is reusable across a wide range of scenarios that require time-efficient speech-to-text transcription in more than 10 languages, including English, French, Spanish, German, and Chinese.

Project Fortis: Accelerating UN Humanitarian Aid Planning with GraphQL

May 10, 2017

Erik Schlegel

Using GraphQL and Azure to create a data processing pipeline for identifying trends and providing insights about global humanitarian crises.

Benchmarking Technologies for Storing and Querying Genomic Data

September 29, 2016

Ami Turgman

Storing and querying big data is challenging, especially when performing real-time queries on huge amounts of data. In this post, I'll share benchmarking results and compare several big data technologies in the context of genomic data queries.

Spark - ISE Developer Blog