CSE Developer Blog

Detecting “Action” and “Cut” in Archival Footage Using a Multi-model Computer Vision and Audio Approach with Azure Cognitive Services

Movies and TV shows require multiple takes per scene, which can leave a substantial amount of archival footage. Here, we use Azure Cognitive Services and custom code to build a multi-model Machine Learning (ML) solution that automatically detects discardable footage, saving media companies manual archiving hours and storage space.

Building an Action Detection Scoring Pipeline for Digital Dailies

Media companies capture all of the footage filmed in a day in what are known as ‘digital dailies’. With terabytes or petabytes of content, storage costs can be a significant factor. Let's explore Machine Learning approaches for identifying which content can be archived or discarded, which reduces those storage costs.

Using Azure Cognitive Services to Analyse Evidence in Public Safety and Justice

A scenario commonly encountered in public safety and justice is the need to collect and index digital data recovered from devices, so that investigating officers can perform evidence-based analysis. We recently built an advanced evidence analysis platform that uses Azure AI services for automated labelling of media and documents.

Making sense of Handwritten Sections in Scanned Documents using the Azure ML Package for Computer Vision and Azure Cognitive Services

Extracting general concepts, rather than specific phrases, from documents and contracts is challenging. It's even more complicated when applied to scanned documents containing handwritten annotations. We describe how to use object detection and OCR with the Azure ML Package for Computer Vision and the Cognitive Services API.

Building a Custom Spark Connector for Near Real-Time Speech-to-Text Transcription

This post describes the Azure Cognitive Services speech-to-text WebSocket protocol in detail and shows how to implement it in Java, which lets us transcribe audio to text in near real-time. We then show how to feed the transcribed radio into a pipeline based on Spark Streaming for further analysis, augmentation, and aggregation. The Java client is reusable across a wide range of scenarios that require time-efficient speech-to-text transcription in more than 10 languages, including English, French, Spanish, German and Chinese.
