{"id":3623,"date":"2021-11-02T08:00:51","date_gmt":"2021-11-02T15:00:51","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=3623"},"modified":"2021-11-02T07:33:19","modified_gmt":"2021-11-02T14:33:19","slug":"custom-partitioning-azure-synapse-link","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/custom-partitioning-azure-synapse-link\/","title":{"rendered":"Now in preview \u2013 Support for custom partitioning in Azure Synapse Link for Azure Cosmos DB"},"content":{"rendered":"<p><em>This blog post was co-authored by Revin Chalil, Rodrigo Souza and Anitha Adusumilli from the Azure Cosmos DB Team.<\/em><\/p>\n<p><span data-contrast=\"auto\">Today, we are\u00a0excited\u00a0to announce\u00a0that\u00a0custom partitioning is in preview for\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/synapse-link\"><span data-contrast=\"none\">Azure Synapse Link for Azure Cosmos DB<\/span><\/a><span data-contrast=\"auto\">. This capability improves query performance by <\/span><span data-contrast=\"none\">enabling you to partition Azure Cosmos DB analytical store data using keys that are <\/span><span data-contrast=\"auto\">frequently used<\/span><span data-contrast=\"none\"> as query filters.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">In addition to custom partitioning,\u00a0this blog post also covers two\u00a0Azure Synapse Link\u00a0capabilities that\u00a0have\u00a0been\u00a0enabled in the last few months:\u00a0serverless compatibility and full fidelity schema.\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"2\"><span data-contrast=\"none\">Custom\u00a0partitioning\u00a0for analytical\u00a0workloads<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">You can now use custom partitioning to improve\u00a0analytical\u00a0query performance. By default, data in Azure Cosmos DB analytical store is not partitioned.\u00a0<\/span><span data-contrast=\"none\">If your analytical queries have frequently used filters, you\u202fnow have the option to partition based on these fields.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Partitioned data from the Azure Cosmos DB analytical store is written to the partitioned store, which points to the primary storage account linked to your Azure Synapse workspace.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><span data-contrast=\"auto\">Analytical queries with filters on the partition columns can benefit from partition pruning and avoid scanning any partition that does not satisfy the filters. The pruned data from the partitioned store is then automatically merged with the most recent non-partitioned data from the analytical store, returning near real-time results for your queries. <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">By compacting\u00a0and optimizing the\u00a0analytical data\u00a0written to\u00a0the\u00a0partitioned store, custom partitioning al<\/span><span data-contrast=\"none\">so helps\u00a0to\u00a0improve the query performance when there is a\u00a0<\/span><span data-contrast=\"none\">high volume of update or delete operations on the underlying Azure Cosmos DB container<\/span><span data-contrast=\"none\">.\u00a0<\/span><span data-contrast=\"auto\">Instead of keeping track of multiple versions of records in the analytical store and loading them during each query execution, the partitioned store only contains the latest version of the data.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><figure id=\"attachment_3625\" aria-labelledby=\"figcaption_attachment_3625\" class=\"wp-caption aligncenter\" ><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1.png\"><img decoding=\"async\" class=\"wp-image-3625 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1.png\" alt=\"Architecture of custom partitioning in Azure Synapse Link for Azure Cosmos DB showing operational data flowing into the Azure Cosmos DB transactional store, then being automatically synced into the analytical store before the data flows into a partitioned store for analysis using Azure Synapse Analytics.\" width=\"1476\" height=\"679\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1.png 1476w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1-300x138.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1-1024x471.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-1-768x353.png 768w\" sizes=\"(max-width: 1476px) 100vw, 1476px\" \/><\/a><figcaption id=\"figcaption_attachment_3625\" class=\"wp-caption-text\">Diagram of Azure Synapse Link custom partitioning architecture<\/figcaption><\/figure><\/p>\n<h3><span data-contrast=\"none\">Triggering a\u00a0custom partitioning job<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"none\">Partitioning can be triggered from an Azure Synapse Spark notebook using Azure Synapse Link. You can schedule it to run as a background job, once or twice a day, or it can be executed more often if needed. <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><span data-contrast=\"none\">You can also choose one or more fields from the dataset as the analytical store partition key. <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">In the below sample, the\u202f&#8221;readDate&#8221;\u202fis chosen as the partition key. Each time you trigger the partitioning job, the incremental analytical store data will be partitioned and written to the partitioned store, which is indicated by the &#8220;basePath&#8221; config. <\/span><span data-contrast=\"auto\">Please see\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/configure-custom-partitioning\"><span data-contrast=\"none\">configure custom partitioning<\/span><\/a><span data-contrast=\"auto\"> to understand the different configurations, which can be used with custom partitioning.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335551550&quot;:6,&quot;335551620&quot;:6,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><figure id=\"attachment_3624\" aria-labelledby=\"figcaption_attachment_3624\" class=\"wp-caption alignnone\" ><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2.png\"><img decoding=\"async\" class=\"size-full wp-image-3624\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2.png\" alt=\"Azure Synapse Link custom partitiniong job definition\" width=\"1426\" height=\"420\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2.png 1426w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2-300x88.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2-1024x302.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-2-768x226.png 768w\" sizes=\"(max-width: 1426px) 100vw, 1426px\" \/><\/a><figcaption id=\"figcaption_attachment_3624\" class=\"wp-caption-text\">Azure Synapse Link custom partitioning job definition code snippet<\/figcaption><\/figure><\/p>\n<h3 aria-level=\"3\"><span data-contrast=\"none\">Analytical query execution\u00a0on\u00a0partitioned store\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p aria-level=\"3\"><span data-contrast=\"none\">While querying, specify\u202fthe\u202fsame\u202f&#8221;partition.keys&#8221; and\u202f&#8221;basePath&#8221;\u202fconfigs, as defined in the partitioning job,\u202fto make use of the partitioned store\u202fin the query path.\u202f<\/span><span data-contrast=\"none\">\u00a0<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">Below filter will avoid scanning partitions other than &#8220;readDate = &#8216;2021-11-01&#8217;\u201d\u202fpartition: <\/span><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><figure id=\"attachment_3626\" aria-labelledby=\"figcaption_attachment_3626\" class=\"wp-caption alignnone\" ><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3.png\"><img decoding=\"async\" class=\"size-full wp-image-3626\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3.png\" alt=\"Azure Synapse Link custom partitioning query \" width=\"1416\" height=\"697\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3.png 1416w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3-300x148.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3-1024x504.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2021\/10\/Custom-Partitioning-3-768x378.png 768w\" sizes=\"(max-width: 1416px) 100vw, 1416px\" \/><\/a><figcaption id=\"figcaption_attachment_3626\" class=\"wp-caption-text\">Azure Synapse Link custom partitioning query<\/figcaption><\/figure><\/p>\n<p><span data-contrast=\"auto\">In summary,\u00a0custom partitioning\u00a0can help\u00a0improve\u00a0analytical\u00a0query performance\u00a0when partition column(s) are used\u00a0as query filters.\u00a0To learn more,\u00a0refer to\u00a0our\u00a0<\/span><a href=\"https:\/\/aka.ms\/synapselink-partitioning\"><span data-contrast=\"none\">documentation<\/span><\/a><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3>What customers say about Azure Synapse Link for Azure Cosmos DB<\/h3>\n<blockquote><p>&#8220;AEMO leverages Azure and Azure Cosmos DB to ingest terabytes of critical metering data. With the significant amounts of data that our system brings in, the capability to do custom partitioning in Azure Cosmos DB&#8217;s HTAP functionality was essential for us to provide performant insights to our internal stakeholders.&#8221;<\/p>\n<p>&#8211; Australian\u00a0Energy\u00a0Market\u00a0Operator<\/p><\/blockquote>\n<h3><span data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><span data-contrast=\"none\">Azure Synapse Link compatibility with Azure Cosmos DB serverless<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"none\">You can now use Azure Synapse Link with <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/serverless\" target=\"_blank\" rel=\"noopener\">Azure Cosmos DB serverless<\/a> accounts. Serverless is ideal for developing, testing, and running production applications where the traffic pattern is unknown. These accounts are billed on<\/span><span data-contrast=\"none\"> consumption, meaning that you are charged only for the Request Units consumed by database operations and the GBs of transactional and analytical storage consumed by your data.\u00a0<\/span> <span data-contrast=\"none\">Visit <a href=\"https:\/\/azure.microsoft.com\/pricing\/details\/cosmos-db\/\" target=\"_blank\" rel=\"noopener\">our pricing page<\/a> for details on Azure Cosmos DB options<\/span><span data-contrast=\"none\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"none\">This feature is generally available in all Azure regions, using Azure Powershell or CLI.\u00a0<\/span><\/p>\n<h3 aria-level=\"2\"><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><span data-contrast=\"none\">Full fidelity schema for Azure Cosmos DB Core (SQL) API accounts<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559738&quot;:40,&quot;335559739&quot;:0,&quot;335559740&quot;:259}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"none\">Now you can use\u00a0<\/span><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/analytical-store-introduction#full-fidelity-schema-for-sql-api-accounts\"><span data-contrast=\"none\">full fidelity schema for Azure Cosmos DB Core (SQL) API containers<\/span><\/a><span data-contrast=\"none\">. This provides you with more flexibility to run analytical queries against Core (SQL) API containers with mixed data types for properties or nested structures. Here is some important information on this capability:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"1\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\"><span data-contrast=\"none\">By default, Azure Synapse Link supports well-defined schema for Core (SQL) API containers. If you want this to be changed to full fidelity schema, you should specify this option, when you are enabling Azure Synapse Link on your account. <\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"1\" aria-setsize=\"-1\" data-aria-posinset=\"2\" data-aria-level=\"1\"><span data-contrast=\"none\">Once set, this setting cannot be changed. Existing accounts with Azure Synapse Link-enabled can\u2019t have their schema option changed.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"1\" aria-setsize=\"-1\" data-aria-posinset=\"3\" data-aria-level=\"1\"><span data-contrast=\"none\">If you enable Azure Synapse Link in your Core (SQL) API database account using the Azure portal, it will always be well-defined. You must use PowerShell or Azure CLI to override the default option and use full-fidelity schema in Core (SQL) API database accounts.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<ul>\n<li data-leveltext=\"%1.\" data-font=\"Calibri\" data-listid=\"1\" aria-setsize=\"-1\" data-aria-posinset=\"4\" data-aria-level=\"1\"><span data-contrast=\"none\">This is an account-level configuration, meaning that all containers in all databases in that database account will have the same schema inference method defined when Azure Synapse Link was enabled.<\/span><span data-ccp-props=\"{&quot;134233279&quot;:true,&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/li>\n<\/ul>\n<p><span data-contrast=\"none\">This feature is\u00a0generally available\u00a0in\u00a0all Azure regions.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"2\">About Azure Synapse Link for Azure Cosmos DB<\/h3>\n<p><span data-contrast=\"none\">Run near-real-time analytics and AI on the operational data within your Azure Cosmos DB NoSQL database, to reduce time to insight. <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/synapse-link\" data-bi-an=\"media-01\" data-bi-tn=\"undefined\">Azure Synapse Link<\/a> for Azure Cosmos DB seamlessly integrates with Azure Synapse Analytics without data movement or diminishing the performance of your operational store.\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Support for custom partitioning in Azure Synapse Link for Azure Cosmos DB was announced in preview at Microsoft Ignite. This capability\u00a0improves query performance by enabling\u00a0you\u00a0to partition Azure Cosmos DB\u00a0analytical store data using keys that are\u00a0frequently used\u00a0as query filters.<\/p>\n","protected":false},"author":21894,"featured_media":3625,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1216,12,14,1611,1310,1778],"tags":[1242,1228,1796,1737,292,1177,1798],"class_list":["post-3623","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-analytics","category-announcements","category-core-sql-api","category-data-architecture","category-notebooks","category-spark","tag-analytics","tag-azure-synapse-link","tag-azure-synapse-spark","tag-partitioning","tag-pyspark","tag-serverless","tag-spark-sql"],"acf":[],"blog_post_summary":"<p>Support for custom partitioning in Azure Synapse Link for Azure Cosmos DB was announced in preview at Microsoft Ignite. This capability\u00a0improves query performance by enabling\u00a0you\u00a0to partition Azure Cosmos DB\u00a0analytical store data using keys that are\u00a0frequently used\u00a0as query filters.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3623","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/21894"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=3623"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3623\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/3625"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=3623"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=3623"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=3623"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}