{"id":1017,"date":"2020-05-04T06:30:49","date_gmt":"2020-05-04T13:30:49","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=1017"},"modified":"2020-05-06T09:11:22","modified_gmt":"2020-05-06T16:11:22","slug":"migrating-relational-data-into-azure-cosmos-db-using-only-azure-data-factory","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/migrating-relational-data-into-azure-cosmos-db-using-only-azure-data-factory\/","title":{"rendered":"Migrate relational data into Azure Cosmos DB using Azure Data Factory"},"content":{"rendered":"<p>In my <a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/migrating-relational-data-into-cosmos-db-using-azure-data-factory-and-azure-databricks\/\" target=\"_blank\" rel=\"noopener noreferrer\">previous post<\/a>, I explained how to use Azure Databricks and the Apache Spark <a href=\"https:\/\/spark.apache.org\/docs\/2.4.0\/api\/python\/pyspark.sql.html?highlight=collect_list#pyspark.sql.functions.collect_list\" target=\"_blank\" rel=\"noopener noreferrer\">collect_list<\/a> function to perform a two-table relational data migration to NoSQL, using the embedding approach to support a one-to-many relationship. I used Apache Spark because at that time we didn&#8217;t have the right native functions in Azure Data Factory (ADF) to support this transformation. Well, <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-data-factory\/adf-adds-new-hierarchical-data-handling-and-new-flexibility-for\/ba-p\/1353956#\" target=\"_blank\" rel=\"noopener noreferrer\">now we have it<\/a> and it is (not surprisingly) called <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/data-flow-expression-functions#collect\" target=\"_blank\" rel=\"noopener noreferrer\">collect<\/a>. This function will take multiple values and aggregate them into an array. We can use collect to create arrays or long strings:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 50%;\"><code>collect(ProductID)<\/code><\/td>\n<td style=\"width: 50%;\">[52734, 734625, 742325, 946256, 245462]<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 50%;\"><code>toString ( reduce ( collect (ProductID) , '', #item + ', ' + #acc, #result) )<\/code><\/td>\n<td style=\"width: 50%;\">&#8220;52734, 734625, 742325, 946256, 245462&#8221;<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Or we can use them to create more complex structures:<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 50.3963%;\"><code>collect(@(prodId = ProductID,\nqty = OrderQty))<\/code><\/td>\n<td style=\"width: 49.6037%;\"><code>[<\/code>\n<code>{<\/code><code>prodId: 52734,<\/code><code>qty: 24<\/code><code>},<\/code>\n<code>{<\/code><code>prodId: 734625,<\/code><code>qty: 13<\/code><code>},<\/code>\n<code>{<\/code><code>prodId: 742325,<\/code><code>qty: 18<\/code><code>},<\/code>\n<code>{<\/code><code>prodId: 946256,<\/code><code>qty: 2<\/code><code>},<\/code>\n<code>{<\/code><code>prodId: 245462,<\/code><code>qty: 71<\/code><code>}<\/code>\n<code>]<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This post will show you how to migrate relational data to Azure Cosmos DB using only Azure Data Factory, with no code needed. The use case is exactly the same as in my previous post, I&#8217;m adding it here again for quick reference:<\/p>\n<h3><strong><span style=\"font-size: 18pt;\">One-to-many relationships using the embedding approach<\/span><\/strong><\/h3>\n<p>In some One-to-Many scenarios, the recommended approach is to Embed the many side into the one side, thus eliminating the need for joins. A common example is when we have a master\/detail pair of tables like Order Header and Order Detail.<\/p>\n<p><a href=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-scaled.jpg\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-411\" src=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-1024x366.jpg\" alt=\"Order Header and Order Detail results\" width=\"640\" height=\"229\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-1024x366.jpg 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-300x107.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-768x274.jpg 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-1536x549.jpg 1536w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailresults-scaled.jpg 2048w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>Here we have one record for the Order Header and three corresponding records for the Order Detail. In a relational world, we are required to join these two tables (by SalesOrderID) to get a complete picture of sales data. When using the embedded approach to migrate this data to an Azure Cosmos DB (Core SQL API), the data will look like a single document with data for the order, and an array of elements representing data for the detail..<\/p>\n<p><a href=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument.jpg\"><img decoding=\"async\" class=\"aligncenter wp-image-412\" src=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument-654x1024.jpg\" alt=\"Order Header and Order Detail As Document\" width=\"400\" height=\"627\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument-654x1024.jpg 654w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument-192x300.jpg 192w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument-768x1203.jpg 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument-981x1536.jpg 981w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsDocument.jpg 1045w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/a><\/p>\n<p>Notice that I left the SalesOrderID element on the embedded documents just for reference. The final implementation will remove these elements as they are not necessary anymore.<\/p>\n<p>&nbsp;<\/p>\n<h3><strong><span style=\"font-size: 18pt;\">The solution: migrating relational data<\/span><\/strong><\/h3>\n<p>The solution has a single Azure Data Factory pipeline with a single Mapping Data Flow activity that reads the relational data, transforms (embed) the data, and finally loads the data to migrate relational data into Azure Cosmos DB. The final data flow should look like this:<\/p>\n<p><figure id=\"attachment_1032\" aria-labelledby=\"figcaption_attachment_1032\" class=\"wp-caption aligncenter\" ><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionComplete-1.jpg\"><img decoding=\"async\" class=\"wp-image-1032 size-full\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionComplete-1.jpg\" alt=\"example of a mapping data flow using Azure Data Factory\" width=\"871\" height=\"368\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionComplete-1.jpg 871w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionComplete-1-300x127.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionComplete-1-768x324.jpg 768w\" sizes=\"(max-width: 871px) 100vw, 871px\" \/><\/a><figcaption id=\"figcaption_attachment_1032\" class=\"wp-caption-text\">Azure Data Factory Mapping Data Flow<\/figcaption><\/figure><\/p>\n<p>&nbsp;<\/p>\n<p>The DecimalToDouble transformation is required because Azure Cosmos DB can&#8217;t store Decimals with set precision. To create the required Mapping Data Flow:<\/p>\n<ol>\n<li>First we add two <strong>Data Sources<\/strong>: Sales Order Header and Sales Order Detail. Optionally, we could set a hash partition by SalesOrderID on both datasets in the Optimize options.<\/li>\n<li>Then, we add an <strong>Aggregate<\/strong> transform on the Sales Order Detail source grouping by SalesOrderID. We will add one single Aggregate column called Details. This will include all columns we want to &#8220;embed&#8221;. Make sure to wrap the structure into a <code>collect<\/code> function. The expression for the Details field should be:\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\"><code>collect(@(SalesOrderDetailID=SalesOrderDetailID,\nOrderQty=OrderQty,\nProductID=ProductID,\nUnitPrice=toDouble(UnitPrice),\nUnitPriceDiscount=toDouble(UnitPriceDiscount),\nLineTotal=toDouble(LineTotal),\nrowguid=rowguid,\nModifiedDate=ModifiedDate) )<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We use <code>toDouble<\/code> here to make sure we don&#8217;t send decimals to Azure Cosmos DB. The Data Preview on the new Aggregate step should look like this:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunction.jpg\"><img decoding=\"async\" class=\"size-full wp-image-1025 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunction.jpg\" alt=\"Image ADFMappingDataFlowCollectFunction\" width=\"632\" height=\"392\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunction.jpg 632w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunction-300x186.jpg 300w\" sizes=\"(max-width: 632px) 100vw, 632px\" \/><\/a><\/li>\n<li>Now we can <strong>Join<\/strong> the output of the Aggregate transformation with the original SalesOrderHeader source. We use an inner join by SalesOrderID. The output of step has all the columns we need to store in Azure Cosmos DB, and at the right level of granularity (SalesOrderID)\n<a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionInnerJoin.jpg\"><img decoding=\"async\" class=\"size-full wp-image-1028 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionInnerJoin.jpg\" alt=\"Image ADFMappingDataFlowCollectFunctionInnerJoin\" width=\"1025\" height=\"305\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionInnerJoin.jpg 1025w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionInnerJoin-300x89.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowCollectFunctionInnerJoin-768x229.jpg 768w\" sizes=\"(max-width: 1025px) 100vw, 1025px\" \/><\/a><\/li>\n<li>Then add a Derived Column transformation to cast all remaining Decimal columns to double. These are the four Decimal columns coming from SalesOrderHeader:<a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowDecimaltoDouble.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1033\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowDecimaltoDouble.jpg\" alt=\"Image ADFMappingDataFlowDecimaltoDouble\" width=\"892\" height=\"105\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowDecimaltoDouble.jpg 892w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowDecimaltoDouble-300x35.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/05\/ADFMappingDataFlowDecimaltoDouble-768x90.jpg 768w\" sizes=\"(max-width: 892px) 100vw, 892px\" \/><\/a><\/li>\n<li>The final step is a Sink operation directly into our Azure Cosmos DB collection to load the transformed data. Make sure to Allow Schema Drift and to Remove Duplicated Input columns.<\/li>\n<li>Once loaded into the collection, documents will look like this from Azure Cosmos DB Data Explorer<\/li>\n<\/ol>\n<p><a href=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB.jpg\"><img decoding=\"async\" class=\"aligncenter size-large wp-image-433\" src=\"http:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB-1024x840.jpg\" alt=\"Order Header and Order Detail As Combined Document in Cosmos DB\" width=\"640\" height=\"525\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB-1024x840.jpg 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB-300x246.jpg 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB-768x630.jpg 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2020\/03\/OrderHeaderandOrderDetailAsCombinedDocumentinCosmosDB.jpg 1034w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><strong><span style=\"font-size: 18pt;\">Implementation Notes<\/span><\/strong><\/h3>\n<p><span style=\"font-size: 1rem;\">Using Azure Data Factory Mapping Data Flows no-code approach makes it really easy to migrate relational data to Azure Cosmos DB. You can use this same approach to create even more complex multi-level hierarchies or create arrays of values when needed. Read more about how to use <a href=\"https:\/\/docs.microsoft.com\/azure\/data-factory\/how-to-sqldb-to-cosmosdb\" target=\"_blank\" rel=\"noopener noreferrer\">Collect with Azure Cosmos DB<\/a>.<\/span><\/p>\n<h3><span style=\"font-size: 18pt;\"><strong>Get started with Azure Cosmos DB<\/strong><\/span><\/h3>\n<ul>\n<li>Create a new account using <a href=\"https:\/\/portal.azure.com\">Azure Portal<\/a>, ARM template or Azure CLI and connect to it using your favorite tools.<\/li>\n<li>Stay up-to-date on the latest <a href=\"https:\/\/twitter.com\/search?q=%23azurecosmosdb\" target=\"_blank\" rel=\"noopener noreferrer\">#AzureCosmosDB<\/a>\u202fnews and features by following us on Twitter\u202f<a href=\"https:\/\/twitter.com\/azurecosmosdb\" target=\"_blank\" rel=\"noopener noreferrer\">@AzureCosmosDB<\/a>. We are really excited to see what you will build with Azure Cosmos DB!<\/li>\n<\/ul>\n<h3><span style=\"font-size: 18pt;\"><strong>About Azure Cosmos DB<\/strong><\/span><\/h3>\n<p><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/introduction\" target=\"_blank\" rel=\"noopener noreferrer\">Azure Cosmos DB<\/a> is a fully managed NoSQL database for modern app development, with <a href=\"https:\/\/azure.microsoft.com\/support\/legal\/sla\/cosmos-db\/v1_3\/\" target=\"_blank\" rel=\"noopener noreferrer\">SLA-backed speed and availability<\/a>, automatic and instant <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/scaling-throughput\" target=\"_blank\" rel=\"noopener noreferrer\">scalability<\/a>, and open source APIs for MongoDB, Cassandra, and other NoSQL engines.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s now possible to migrate relational data to Azure Cosmos DB in just six easy steps. Discover the no-code way by using the mapping data flows capability of Azure Data Factory.<\/p>\n","protected":false},"author":20289,"featured_media":1032,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[996,19],"tags":[499,344,742,293,285],"class_list":["post-1017","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-migration","category-tips-and-tricks","tag-azure-cosmos-db","tag-data-factory","tag-data-migration","tag-one-to-few","tag-relational"],"acf":[],"blog_post_summary":"<p>It&#8217;s now possible to migrate relational data to Azure Cosmos DB in just six easy steps. Discover the no-code way by using the mapping data flows capability of Azure Data Factory.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/1017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/20289"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=1017"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/1017\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/1032"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=1017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=1017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=1017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}