{"id":1602,"date":"2020-08-24T13:18:35","date_gmt":"2020-08-24T20:18:35","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=1602"},"modified":"2020-08-25T10:01:43","modified_gmt":"2020-08-25T17:01:43","slug":"how-to-handle-reference-tables-in-cosmos-db-that-are-common-in-the-relational-database-world","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/how-to-handle-reference-tables-in-cosmos-db-that-are-common-in-the-relational-database-world\/","title":{"rendered":"Azure Cosmos DB for the SQL Professional &#8211; Referencing Tables"},"content":{"rendered":"<p>I had a <a href=\"https:\/\/www.jamesserra.com\/archive\/2018\/06\/understanding-cosmos-db\/\">previous blog comparing Azure Cosmos DB to a relational database<\/a> and one topic that it did not address that I want to now is how to handle reference tables that are common in the relational database world.<\/p>\n<p>A big difference with Azure Cosmos DB, a NoSQL database, compared to a relational database is you will create a <em>denormalized data model<\/em>.\u00a0 Take a person record for example.\u00a0 You will embed all the information related to a person, such as their contact details and addresses, into a single JSON document.\u00a0 Retrieving a complete person record from the database is now a single read operation against a single container and for a single item.\u00a0 Updating a person record, with their contact details and addresses, is also a\u00a0single write operation against a single item.\u00a0 By denormalizing data, your application typically will have better read performance and write performance and allow for a scale-out architecture since you don&#8217;t need to join tables.<\/p>\n<p><em>(Side note: &#8220;container&#8221; is the generic term. Depending on the API, a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/databases-containers-items\">specific term<\/a> is used such as &#8220;collection&#8221; for the Core (SQL) API). Think of a container as one or more tables in the relational world. Going a little deeper, think of a container as a group of one or more &#8220;entities&#8221; which share the same partition key. A relational table shares a schema, but containers are not bound in that way.)<\/em><\/p>\n<p>Embedding data works nicely for many cases but there are scenarios when denormalizing your data will cause more problems than it is worth.\u00a0 In a document database, you can have information in one document that relates to data in other documents. While there may be some use cases that are better suited for a relational database than in Azure Cosmos DB (see <a href=\"#choices\">below<\/a>), in most cases you can handle relationships in Azure Cosmos DB by creating a <em>normalized data model<\/em> for them, with the trade-off that it can require more round trips to the server to read data (but improve the efficiency of write operations since less data is written).\u00a0 In general, use normalized data models to represent one-to-many relationships or many-to-many relationships when related data changes frequently. <em>The key is knowing whether the cost of the updates is greater than the cost of the queries<\/em>.<\/p>\n<p>When using a normalized data model, your application will need to handle creating the reference document.\u00a0 One way would be to use a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/change-feed\">change feed<\/a> that triggers on the creation of a new document &#8211; the change feed essentially triggers an <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/functions\/\">Azure function<\/a> that creates the\u00a0 relationship record.<\/p>\n<p>When using a normalized data model, your application will need to query the multiple documents that need to be joined (costing more money because it will use more <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/request-units\">request units<\/a>), and do the joining within the application (i.e. join a main document with documents that contain the reference data) as you cannot do a &#8220;join&#8221; between documents within different containers in Azure Cosmos DB (joins between documents within the same container can be done via <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql-query-join\">self-joins<\/a>).\u00a0 Since every time you display a document it needs to search the entire container for the name, it would be best to put the other document type (the reference data) in a different container so you can have different partition keys for each document type (read up on how <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/how-to-model-partition-example\">partitioning<\/a> can make a big impact on performance and cost).<\/p>\n<p>Note that &#8220;partitioning&#8221; in a relational database management system (RDBMS) compared to Azure Cosmos DB are different things: partitioning in Azure Cosmos DB refers to &#8220;sharding&#8221; or &#8220;<em>horizontal partitioning<\/em>&#8220;, where replica sets which contain both the data\u00a0and copies of compute (database) resources operating in a \u201cshared nothing\u201d architecture (i.e. scaled \u201chorizontally\u201d where each compute resource (server node) operates independently of every other node, but with a programming model transparent to developers). Conversely, what is often referred to as \u201cpartitioning\u201d in a RDBMS is\u00a0purely a separation of\u00a0data into separate file groups within a\u00a0shared compute (database) environment. This is also often called \u201c<em>vertical partitioning<\/em>\u201d.<\/p>\n<p>Another option that is common pattern for NoSQL databases is to create a separate container to satisfy specific queries.\u00a0 For example, having a container for products based on category and another container for products based on geography.\u00a0 Both of those containers for my query\/app are being sourced from one that is my &#8220;main&#8221; or &#8220;source&#8221; container that is being updated (front end, or another app) and the change feed attached to that pushes out to my other containers that I use for my queries.\u00a0 This means duplicating data, but storage is cheap and you save costs to retrieve data (think of those extra containers as covering indexes in the relational database world).<\/p>\n<p>Since joining data can involve multiple ways of reading the data, it&#8217;s important to understand the two ways to read data using the Azure Cosmos DB Core (SQL) API:<\/p>\n<ul class=\"wp-block-list\">\n<li>Point reads\u00a0&#8211; You can do a key\/value lookup on a single\u00a0<em>item ID<\/em>\u00a0and partition key. The\u00a0<em>item ID<\/em>\u00a0and partition key combination is the key and the item itself is the value. For a 1 KB document, point reads typically cost 1\u00a0<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/request-units\">request unit<\/a>\u00a0with a latency under 10ms. Point reads return a single item<\/li>\n<li>SQL queries\u00a0&#8211; You can query data by writing queries using the Structured Query Language (SQL) as a JSON query language. Queries always cost at least 2.3 request units and, in general, will have a higher and more variable latency than point reads. Queries can return many items. See <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql-query-getting-started\">Getting started with SQL queries<\/a><\/li>\n<\/ul>\n<p><em>The key in deciding when to use a normalized data model is how frequently the data will change<\/em>.\u00a0 If the data only changes once a year it may not be worthwhile to create a reference document and instead just do an update to all the documents.\u00a0 But be aware that the update has to be done from the client side spread over the affected documents, doing it in batches as one big UPDATE statement does not exist in Cosmos DB.\u00a0 You will need to retrieve the entire document from Azure Cosmos DB, update the property\/properties in your application and then call the \u2018Replace\u2019 method in the zure Cosmos DB SDK to replace the document in question (see <a href=\"https:\/\/thedavidmasters.com\/2019\/01\/01\/cosmosdb-updating-a-document-partially\/\">Cosmos DB \u2013 Updating a Document (Partially)<\/a>). If you are using Core (SQL) API and .NET or Java, you can consider using <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/bulk-executor-dot-net\">bulk support<\/a>\u00a0(.NET) or <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/bulk-executor-java\">bulk executor<\/a> (Java). Other ideas would involve using <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/change-feed\">change feed<\/a>, or if you really need a level of ACID consistency, you can achieve this using <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/how-to-write-stored-procedures-triggers-udfs\">stored procedures<\/a>, with snapshot isolation scoped to a single partition\u00a0(this is not the same as stored procedures in SQL &#8211; rather these are designed specifically to support multi-doc transactions).<\/p>\n<p>Also be aware that because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships that you have in documents are effectively &#8220;weak links&#8221; and will not be verified by the database itself.\u00a0 If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.<\/p>\n<h4 class=\"wp-block-heading\" id=\"choices\"><strong>What are OLTP scenarios where a relational database is essential?<\/strong><\/h4>\n<p>Avoiding the broader topic of when to use a <a href=\"https:\/\/www.jamesserra.com\/archive\/2015\/08\/relational-databases-vs-non-relational-databases\/\">relational database over a non-relational database<\/a>, there are a few use cases where a relational database is essential:<\/p>\n<ul class=\"wp-block-list\">\n<li>The customer experience and comfort zone is with relational databases. It is a reality that relational databases are ahead in the maturity curve with respect to tooling (an example would be foreign-key constraint behavior). However, it should be noted that this is not the same as saying that \u201cmore use cases are technically better suited to the relational model\u201d. Rather, the barrier to entry in new customer projects tends to be lower because mindshare is greater in the relational space. In these cases, it often isn\u2019t worth the effort for companies to upskill<\/li>\n<li>The system really needs strict ACID semantics across the entire dataset. Sharded\/partitioned databases like Azure Cosmos DB will not provide ACID guarantees across the entire set of physical partitions (and likely never will). In reality, however, the use cases where this is necessary is quite small. Things like transaction management and other SDK-level things that go along with these aspects come easier in the RDBMS space, but this is really the same as above point \u2013 RDBMS is ahead on maturity curve for user-level tooling to help abstract paradigm specific concepts \u2013 but this does not make the paradigm better suited to a greater number of use cases<\/li>\n<li>Having a single data store that services both operational and analytical needs with equal utility, including tabular models \u2013 this is probably the most powerful argument, and NoSQL engines are likely never going to serve a data structure that coalesces as well into tabular models that produce reports, charts, graphs, etc. But again, history has proven that, at scale, the \u201cone fits all\u201d approach can have some non-trivial drawbacks. And the new <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/analytical-store-introduction\">Analytical Store<\/a> in Azure Cosmos DB is addressing the need to service both operational and analytical needs<\/li>\n<\/ul>\n<p>You can create complex hierarchical \u201crelationships\u201d in Azure Cosmos DB, which would have to be modelled in separate tables in an RDBMS. Azure Cosmos DB can\u2019t handle them using joins \u2013 but again, this is a paradigmatic\/semantic difference, not a fundamental flaw in the database model itself. In order to do the equivalent of what one may be trying to achieve in a relational database, you may have to \u201cunlearn what you have learned\u201d, but this comes back to your comfort level with a RDBMS, which is not a trivial thing and can be the main and very valid reason for staying with a RDBMS.<\/p>\n<p>In summary, in a NoSQL database like Azure Cosmos DB, most use cases are covered. Some things are a little harder (due to lack of maturity in tooling), but most things are easier, many things can only be done in NoSQL (i.e. handling millions of transactions per second), and very few things cannot be done in a NoSQL database. Most NoSQL engines are characterized by having a lot more configurability, tunability, and flexibility than a RDBMS. And in many ways, that is the hardest challenge for newcomers.<\/p>\n<p>More info:<\/p>\n<p><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/modeling-data\">Data modeling in Azure Cosmos DB<\/a><\/p>\n<p>Video <a href=\"https:\/\/myignite.techcommunity.microsoft.com\/sessions\/79932?source=sessions\">Data modelling and partitioning in Azure Cosmos DB: What every relational database user needs to know<\/a><\/p>\n<p>Video <a href=\"https:\/\/www.youtube.com\/watch?time_continue=211&amp;v=CgYQo6uHyt0&amp;feature=emb_logo\">A tour of Azure Cosmos DB database operations models<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How should you handle reference tables that are common in the relational database world when working in Azure Cosmos DB? This post walks through some of the differences SQL professionals will encounter, and how developing in a Azure Cosmos DB environment compares to a relational database management service (RDBMS).<\/p>\n","protected":false},"author":37831,"featured_media":61,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14,1611],"tags":[],"class_list":["post-1602","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","category-data-architecture"],"acf":[],"blog_post_summary":"<p>How should you handle reference tables that are common in the relational database world when working in Azure Cosmos DB? This post walks through some of the differences SQL professionals will encounter, and how developing in a Azure Cosmos DB environment compares to a relational database management service (RDBMS).<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/1602","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/37831"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=1602"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/1602\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/61"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=1602"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=1602"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=1602"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}