Co-authored by Rodrigo Souza, Ramnandan Krishnamurthy, Anitha Adusumilli and Jovan Popovic (Azure Cosmos DB and Azure Synapse Analytics teams)
Azure Synapse Link now supports querying Azure Cosmos DB data using Synapse SQL serverless. This capability, available in public preview, allows you to use familiar analytical T-SQL queries and build powerful near real-time BI dashboards on Azure Cosmos DB data.
As announced at Ignite 2020, you can now also query Azure Cosmos DB API for Mongo DB data using Azure Synapse Link, enabling analytics with Synapse Spark and Synapse SQL serverless.
Support for T-SQL queries and building near real-time BI dashboards
Azure Synapse SQL serverless (previously known as SQL on-demand) is a serverless, distributed data processing service offering built-in query execution fault-tolerance and a consumption-based pricing model. It enables you to analyze your data in Cosmos DB analytical store within seconds, without any performance or RU impact on your transactional workloads.
Using OPENROWSET syntax and automatic schema inference, data and business analysts can use familiar T-SQL query language to quickly explore and reason about the contents in Azure Cosmos DB analytical store. You can query this data in place without the need to copy or load the data into a specialized store.
You can also create SQL views to join data in the analytical stores across multiple Azure Cosmos DB containers, to better organize your data in a semantic layer that will accelerate your data exploration and reporting workloads. BI Professionals can quickly create Power BI reports on top of these SQL views in Direct Query mode.
You can further extend this by building a logical data warehouse to create and analyze unified views of data across Azure Cosmos DB, Azure Data Lake Storage and Azure Blob Storage.
In this video, you can see support for SQL serverless in Azure Synapse Link in action:
Querying Azure Cosmos DB API for MongoDB data
In addition to existing support of Azure Cosmos DB analytical store for Core (SQL) API, you can now enable analytical store on API for MongoDB collections. This automatically creates a fully isolated column store for enabling large-scale analytics against operational data in your MongoDB collections, without any impact to your transactional workloads.
You can run Apache Spark or T-SQL queries against live data in MongoDB collections, build rich near real-time BI dashboards using Power BI in Direct Query mode or build ML/AI pipelines using Azure Synapse Analytics and its integration with Azure Machine Learning.
Azure Synapse Link supports full-fidelity schema representation for your data in MongoDB collections, which is designed to handle the full breadth of polymorphic schemas in your schema-agnostic operational data. This allows handling of properties with multiple data types while querying this data.
With full fidelity schema representation, a distinct column is created in analytical store for each observed datatype in the leaf properties of your documents. For example, let’s say that your documents have a property named DoB (date of birth) and it has datetime format in some items and string format in others. Your analytical store representation will have two columns: DoB.date and DoB.string. At the same time, if your documents have a property named StreetName and it is always a string across all items. Your analytical store representation will have one column, StreetName.string.
You can hide the complexities of full fidelity schema representation from data analysts and developers using queries or views in Synapse Spark and Synapse SQL, formatting the schema of the data as you want.
Pricing update
Starting November 2nd, 2020, Azure Cosmos DB analytical store usage will be billed, which follows a consumption-based pricing model.
The billing model of Azure Synapse Link includes the costs incurred by using the Azure Cosmos DB analytical store as well as the Synapse compute runtime. For information on pricing, please see Analytical store pricing and Azure Synapse Link pricing.
Next steps
To get started with Synapse SQL serverless preview, please see our documentation Using T-SQL to query Azure Cosmos DB data and Use SQL views to build Power BI dashboards using Azure Synapse Link.
You can also use our sample published on GitHub.
For any feedback or suggestions to improve our product offering, please reach out to us directly at cosmosdbsynapselink@microsoft.com.
These are great news Ram!