{"id":27611,"date":"2018-11-14T00:11:20","date_gmt":"2018-11-14T07:11:20","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/premier-developer\/?p=27611"},"modified":"2019-02-14T20:17:44","modified_gmt":"2019-02-15T03:17:44","slug":"synchronizing-azure-cosmos-db-collections-for-blazing-fast-queries","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/synchronizing-azure-cosmos-db-collections-for-blazing-fast-queries\/","title":{"rendered":"Synchronizing Azure Cosmos DB Collections for Blazing Fast Queries"},"content":{"rendered":"<p>App Dev Manager <a href=\"https:\/\/www.linkedin.com\/in\/jabele\/\">John Abele<\/a> spotlights how materialized views and the right partition key strategy can make a huge difference in your Cosmos DB query performance.<\/p>\n<hr \/>\n<p>Azure Cosmos DB is the fastest growing data service in Azure \u2013 and for good reason. The service offers global distribution in a few clicks, seamless horizontal scaling, automatic indexing, and 99.99% guarantees for availability, throughput, latency, and consistency. Enabling Cosmos DB <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/how-to-manage-database-account#configure-multiple-write-regions\">multi-master mode<\/a> provides a service-level agreement backed read and write availability of 99.999% &#8211; financially backed by Microsoft.<\/p>\n<p>While Cosmos DB offloads many of the hard NoSQL scaling problems, shaping your data and choosing a logical partition key are left to you. 
The partition key choice is arguably the most important decision you\u2019ll need to make \u2013 it must be chosen when the collection is created and cannot be changed afterward.<\/p>\n<p>Among the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/partition-data#best-practices-when-choosing-a-partition-key\">best practices for choosing a partition key<\/a>, it is often recommended to choose a value that appears frequently as a filter in your queries. This is because single-partition queries are inherently faster and cheaper to run.<\/p>\n<p>Take a look at this list, which orders query types from fastest\/most efficient to slowest\/least efficient:<\/p>\n<ul>\n<li>GET on a single document (&lt; 1ms)<\/li>\n<li>Single-partition query<\/li>\n<li>Cross-partition query (&lt; 10ms)<\/li>\n<li>Scan query (query without filters)<\/li>\n<\/ul>\n<p>Cosmos DB offers guaranteed &lt;10ms read and write latency at the 99<sup>th<\/sup> percentile anywhere in the world \u2013 making it ideal for highly responsive and mission-critical applications. That said, optimizing your partitioning strategy around single-partition queries significantly improves query performance and reduces RU\/s (<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/request-units\">request unit<\/a> per second) consumption. 
Let\u2019s look at how to optimize reads around multiple query filters while avoiding fan-out queries.<\/p>\n<p>Assume we have a concert and events application which tracks tours, artists, dates, locations, ticket information, etc. For simplicity, let\u2019s assume we are working with the following document:<\/p>\n<pre class=\"lang:default decode:true\">{\r\n   \"eventId\":\"1756307\",\r\n   \"type\" : \"Concert\",\r\n   \"performers\" : [{\r\n       \"performerId\" : \"22047\",\r\n       \"performerName\" : \"The Contosos\"\r\n   },{\r\n       \"performerId\" : \"19118\",\r\n       \"performerName\" : \"Fabrikams\"\r\n   }],\r\n   \"eventName\" : \"The Final Countdown\",\r\n   \"description\" : \"We're excited to announce the reunion of the Fabrikams and Contosos!\",\r\n   \"startdate\":\"1540320148\",\r\n   \"enddate\":\"1540320148\",\r\n   \"location\" : {\r\n       \"locationId\" : \"112\",\r\n       \"streetAddress\" : \"100 Main St.\",\r\n       \"locality\" : \"Seattle\",\r\n       \"region\" : \"WA\",\r\n       \"postalCode\" : \"98101\"\r\n   }\r\n   \u2026\r\n}<\/pre>\n<p>In order to avoid hot partitions and any future storage issues, we might partition on <em>eventId<\/em> or a location key since they have high cardinality. Doing so might cause our partitioning to look like this:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-27612 size-large\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo1-1024x446.png\" alt=\"\" width=\"640\" height=\"279\" \/><\/p>\n<p>While this partition key choice checks most of the best-practice boxes, we can expect users of our app to want to query by performer name, or to find events by city or by date &#8211; resulting in cross-partition or fan-out queries. 
In practice, we shouldn\u2019t expect to eliminate all fan-out queries, but if we have a read-intensive, latency-sensitive workload we can optimize around multiple pivots using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/change-feed\">Azure Cosmos DB Change Feed<\/a>. The change feed supports the following scenarios:<\/p>\n<ul>\n<li>Triggering a notification or a call to an API when an item is inserted or updated.<\/li>\n<li>Real-time stream processing for IoT or real-time analytics processing on operational data.<\/li>\n<li>Additional data movement by synchronizing with a cache, a search engine, or a data warehouse, or by archiving data to cold storage.<\/li>\n<\/ul>\n<p>The Change Feed works by monitoring an Azure Cosmos DB collection for any changes (inserts and updates only). It then creates a sorted list of documents in the order in which they were modified. These changes are persisted and can be distributed across one or more consumers for parallel processing.<\/p>\n<p>For our purposes, we will use the change feed and an <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/functions\/\">Azure Function<\/a> to implement a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/architecture\/patterns\/materialized-view\">materialized view pattern<\/a> to synchronize a secondary collection with a different partition key, such as <em>performerName<\/em>. 
A materialized view supports efficient querying when the original data isn\u2019t partitioned or shaped for an additional required query, which improves overall read performance.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-27613 size-large\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo2-1024x300.png\" alt=\"feed\" width=\"640\" height=\"188\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo2-1024x300.png 1024w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo2-300x88.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo2-768x225.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2018\/11\/cosmo2.png 1422w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/p>\n<p>This setup is very simple to configure within an Azure Function by using the <em>Azure Cosmos DB Trigger<\/em> template. 
Once you specify the account and collection details, the trigger automatically runs the function\u2019s code whenever documents change in the monitored collection.<\/p>\n<p>Finally, by adding the following code to our function, we can write documents to the secondary collection:<\/p>\n<pre class=\"lang:r decode:true\">#r \"Microsoft.Azure.DocumentDB.Core\"\r\nusing System;\r\nusing System.Collections.Generic;\r\nusing System.Threading.Tasks;\r\nusing Microsoft.Azure.Documents;\r\nusing Microsoft.Azure.Documents.Client;\r\n\/\/Get Account URI and Primary Key from the Keys menu inside your Cosmos Account.\r\n\/\/The client is static so it is reused across function invocations.\r\nstatic System.Uri uri = new System.Uri(Environment.GetEnvironmentVariable(\"CosmosDBAccountURI\"));\r\nstatic DocumentClient client = new DocumentClient(uri,Environment.GetEnvironmentVariable(\"CosmosDBAccountKey\"));\r\n \r\npublic static async Task Run(IReadOnlyList&lt;Document&gt; changes, ILogger log)\r\n{\r\n   if (changes != null &amp;&amp; changes.Count &gt; 0){\r\n       foreach(var doc in changes)\r\n       {\r\n           \/\/Await each upsert so failures surface and the function does not return before the write completes.\r\n           await client.UpsertDocumentAsync(\"\/dbs\/SyncDatabase\/colls\/DestinationCollection\", doc);\r\n       }\r\n       log.LogInformation(\"Documents added or modified: \" + changes.Count);\r\n   }\r\n}<\/pre>\n<p>That\u2019s it! Once the function is created, future writes will be automatically synchronized across the collections. In addition to achieving the best possible performance with single-partition queries across multiple pivots, this configuration also allows you to manage RU\/s with more granularity, since throughput is provisioned on each collection.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>While Cosmos DB offloads many of the hard NoSQL scaling problems, shaping your data and choosing a logical partition key are left to you. The partition key choice is arguably the most important decision you\u2019ll need to make \u2013 it cannot be changed and must be determined upon creation of a collection. 
<\/p>\n","protected":false},"author":582,"featured_media":27612,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[25,8],"tags":[24,186,3],"class_list":["post-27611","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure","category-data","tag-azure","tag-cosmosdb","tag-team"],"acf":[],"blog_post_summary":"<p>While Cosmos DB offloads many of the hard NoSQL scaling problems, shaping your data and choosing a logical partition key are left to you. The partition key choice is arguably the most important decision you\u2019ll need to make \u2013 it cannot be changed and must be determined upon creation of a collection. <\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/27611","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/582"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=27611"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/27611\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/27612"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=27611"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=27611"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.co
m\/premier-developer\/wp-json\/wp\/v2\/tags?post=27611"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}