{"id":4437,"date":"2022-06-28T06:51:17","date_gmt":"2022-06-28T13:51:17","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=4437"},"modified":"2022-06-28T06:51:17","modified_gmt":"2022-06-28T13:51:17","slug":"benchmarking-data-migration-from-cassandra-to-azure-cosmos-db-cassandra-api","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/benchmarking-data-migration-from-cassandra-to-azure-cosmos-db-cassandra-api\/","title":{"rendered":"Benchmarking Data Migration from Cassandra to Azure Cosmos DB Cassandra API"},"content":{"rendered":"<p>About the authors: \u00a0<a href=\"https:\/\/www.linkedin.com\/in\/akashshankaran\/\">Akash<\/a> &amp; <a href=\"https:\/\/www.linkedin.com\/in\/kayaalp\/\">Alp<\/a><\/p>\n<p>We are working with many customers who want to move their Cassandra IaaS workloads to Azure for a variety of reasons, such as offloading O\/S patching and upgrades or improving scalability. One of the destinations is <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/cassandra\/cassandra-introduction\">Azure Cosmos DB Cassandra API<\/a>. We wanted to dive deeper into the performance of a historical data migration from Cassandra IaaS into Azure Cosmos DB Cassandra API. By historical data migration, we mean the one-time movement of existing data, as opposed to the incremental updates that typically come from application-level modifications. Specifically, we wanted to explore whether the INSERT throughput of Azure Cosmos DB scales linearly during this historical data migration. We define linear scalability as increasing a compute resource by X% and observing at least a corresponding X% improvement in load performance. To validate this, we ran a series of experiments and benchmarked the results.
In these experiments, we loaded the same set of rows under varying Azure Cosmos DB throughput levels and Azure Databricks settings, with the objective of observing how load performance improves as Azure Cosmos DB capacity increases.<\/p>\n<p>To evaluate our scalability theory, we deployed an architecture consisting of the following components:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1.png\"><img decoding=\"async\" class=\"size-large wp-image-4441 alignleft\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-1024x529.png\" alt=\"Image Screenshot 2022 06 27 082430 component architecture v1\" width=\"640\" height=\"331\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-1024x529.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-300x155.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-768x397.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-1536x794.png 1536w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-2048x1058.png 2048w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><span style=\"font-size: 10pt;\"><em>See Appendix for deployment details of each of the above
solutions<\/em><\/span><\/p>\n<ol>\n<li><strong>Cassandra<\/strong> was deployed on Azure as an IaaS instance. We loaded a dataset of 4.08M rows into a single keyspace\/table.<\/li>\n<li><strong>Azure Databricks<\/strong> is a Spark-based data integration platform; we used it to read from the IaaS Cassandra instance and write to Azure Cosmos DB Cassandra API.<\/li>\n<li><strong>Azure Cosmos DB<\/strong> with the Cassandra API is the migration target for the Cassandra IaaS data.<\/li>\n<\/ol>\n<h3>Setup<\/h3>\n<p>Keep the following factors in mind as we go over the benchmark findings:<\/p>\n<ul style=\"list-style-type: square;\">\n<li>The Cassandra IaaS, Azure Databricks and Azure Cosmos DB Cassandra API instances are co-located in the same Azure region. This avoids latency from cross-region network traffic.<\/li>\n<li>Azure Cosmos DB uses <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/request-units\">RUs (Request Units)<\/a> to provision and scale throughput. See the linked article for how the RU cost of each request is determined.<\/li>\n<li>The Cassandra instance was in a single region, i.e., no geo-replication was involved.<\/li>\n<li>Each row was ~1.3 KB, generated using the publicly available <a href=\"https:\/\/cassandra.apache.org\/doc\/latest\/cassandra\/tools\/cassandra_stress.html\">cassandra-stress<\/a> tool. Total data size was approximately 5 GB (4.08M rows at ~1.3 KB each).<\/li>\n<li>Data is evenly distributed using a composite primary key generated with a random distribution. A well-distributed partition key helps avoid hot partitions.
<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/partitioning-overview\">This article<\/a> covers partitioning in Azure Cosmos DB in more depth.<\/li>\n<li>This benchmark focuses primarily on the write path.<\/li>\n<li>Environment details such as machine types and worker nodes can be found in the Appendix.<\/li>\n<\/ul>\n<h4>Benchmark Results<\/h4>\n<p>The following table summarizes the benchmark of moving the initial dataset from Cassandra IaaS to Azure Cosmos DB Cassandra API. Details on each run, and our findings, follow the table:<\/p>\n<p>&nbsp;<\/p>\n<table style=\"width: 47.9999%; height: 1566px;\">\n<tbody>\n<tr>\n<td style=\"width: 8.90625%;\" width=\"57\">Run<\/td>\n<td style=\"width: 12.8125%;\" width=\"81\">Source Data in Cassandra<\/td>\n<td style=\"width: 73.5901%;\" width=\"336\">Parameters<\/td>\n<td style=\"width: 5.51997%;\" width=\"69\">Duration (mins)<\/td>\n<td style=\"width: 11.0938%;\" width=\"69\">Throttled Requests %<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 8.90625%;\" width=\"57\">#1<\/td>\n<td style=\"width: 12.8125%;\" width=\"81\">4.08M rows<\/td>\n<td style=\"width: 73.5901%;\" width=\"336\">Cosmos:<\/p>\n<p>24,000 RUs<\/p>\n<p>Azure Databricks:<\/p>\n<p>spark.cassandra.output.concurrent.writes -&gt; 25, spark.cassandra.concurrent.reads -&gt; 512,<\/p>\n<p>spark.cassandra.output.batch.grouping.buffer.size -&gt;
512<\/p>\n<p>&nbsp;<\/td>\n<td style=\"width: 5.51997%;\" width=\"69\">23.07 mins<\/td>\n<td style=\"width: 11.0938%;\" width=\"69\">62%<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 8.90625%;\" width=\"57\">#3<\/td>\n<td style=\"width: 12.8125%;\" width=\"81\">4.08M rows<\/td>\n<td style=\"width: 73.5901%;\" width=\"336\">Cosmos:<\/p>\n<p><span style=\"color: #99cc00;\">60,000 RUs<\/span><\/p>\n<p>Azure Databricks:<\/p>\n<p>spark.cassandra.output.concurrent.writes -&gt; 25, spark.cassandra.concurrent.reads -&gt; 512,<\/p>\n<p>spark.cassandra.output.batch.grouping.buffer.size -&gt; 512<\/p>\n<p>&nbsp;<\/td>\n<td style=\"width: 5.51997%;\" width=\"69\">14.62 mins<\/td>\n<td style=\"width: 11.0938%;\" width=\"69\">19.8%<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 8.90625%;\" width=\"57\">#4<\/td>\n<td style=\"width: 12.8125%;\" width=\"81\">4.08M rows<\/td>\n<td style=\"width: 73.5901%;\" width=\"336\">Cosmos:<\/p>\n<p><span style=\"color: #99cc00;\">80,000 RUs<\/span><\/p>\n<p>Azure Databricks:<\/p>\n<p>spark.cassandra.output.concurrent.writes -&gt; 25, spark.cassandra.concurrent.reads -&gt; 512,<\/p>\n<p>spark.cassandra.output.batch.grouping.buffer.size -&gt; 512<\/p>\n<p>&nbsp;<\/td>\n<td style=\"width: 5.51997%;\" width=\"69\">11.53 mins<\/td>\n<td style=\"width: 11.0938%;\" width=\"69\">0%<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 8.90625%;\" width=\"57\">#5<\/td>\n<td style=\"width: 12.8125%;\" width=\"81\">4.08M rows<\/td>\n<td style=\"width: 73.5901%;\" width=\"336\">Cosmos: 80,000 RUs<\/p>\n<p>Azure Databricks:<\/p>\n<p><span style=\"color: #99cc00;\">spark.cassandra.output.concurrent.writes -&gt; 20<\/span>, spark.cassandra.concurrent.reads -&gt; 512,<\/p>\n<p>spark.cassandra.output.batch.grouping.buffer.size -&gt; 512<\/p>\n<p>&nbsp;<\/td>\n<td style=\"width: 5.51997%;\" width=\"69\">5.72 mins<\/td>\n<td style=\"width: 11.0938%;\" width=\"69\">0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em><span style=\"color: #99cc00;\">Green text<\/span> above indicates what was 
changed from the previous run<\/em><\/p>\n<h5>Run #1<\/h5>\n<p>For our first run, we chose an initial throughput of 24,000 RUs based on estimates of how we plan to use the dataset. The <a href=\"https:\/\/cosmos.azure.com\/capacitycalculator\/\">Azure Cosmos DB capacity calculator<\/a> can also help with this estimate. We captured a baseline time of 52.36 minutes to INSERT 4.08M rows. After the first run, we adjusted the environment based on the bottleneck(s) we found. For instance, if we saw 429 errors (<a href=\"https:\/\/docs.microsoft.com\/en-us\/rest\/api\/cosmos-db\/http-status-codes-for-cosmosdb\">which indicate throttling on Azure Cosmos DB<\/a>), we would increase the request units (i.e., throughput) of Azure Cosmos DB or adjust Spark job parallelism. Our approach was to make <u>only one change<\/u> at a time and re-execute the run to capture the findings. In our final run, we migrated the same 4.08M rows in 5.72 minutes.<\/p>\n<p>&nbsp;<\/p>\n<h5>Adjustments After Run #1<\/h5>\n<p>After executing Run #1, we checked the Azure Cosmos DB monitoring metrics and found many throttled (429) responses, as shown in the diagram below. With a little over 4 million rows at the source, we saw over 3 million 429 errors during the run, and around 81% of requests were throttled, so we knew we had to increase the RUs (i.e., Azure Cosmos DB throughput).
The number of throttled requests versus total requests on Azure Cosmos DB can be viewed in the Metrics or Metrics (Classic) blades of the Azure Cosmos DB account.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/run-1-429-errors.png\"><img decoding=\"async\" class=\"alignleft wp-image-4446 \" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/run-1-429-errors.png\" alt=\"Image run 1 429 errors\" width=\"507\" height=\"203\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>In addition to checking the various architecture components for bottlenecks, we checked the source Cassandra IaaS VM for resource contention; the machine was not under any strain. We also reviewed the Azure Databricks environment (not captured in a screenshot), where CPU utilization was below 10%. Because neither the Cassandra IaaS VM nor Azure Databricks was under resource strain, there was headroom to increase read parallelism by adjusting the Scala read settings in the Databricks notebook (specifically spark.cassandra.concurrent.reads, which was set to 512). Once the 429 throttling is addressed, parallelizing reads further could also be a good strategy for improving performance substantially.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/cassandra-iaas-vm-on-azure-with-cpu-monitoring.png\"><img decoding=\"async\" class=\"alignleft wp-image-4449 \" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/cassandra-iaas-vm-on-azure-with-cpu-monitoring.png\" alt=\"Image cassandra iaas vm on azure with cpu monitoring\" width=\"580\" height=\"282\"
\/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>We also checked whether the data was relatively evenly distributed across partitions. Although partitioning is not the focus of this article, choosing a partition key that distributes both data and throughput evenly is important. As can be seen in the screenshot below, our data and throughput were evenly distributed across the partitions. We also checked in detail whether any of the partitions were hot (overused), and none were. Be aware that provisioned RUs are divided evenly among the physical partitions, so each partition does not get the full RU allocation but only a fraction of it: roughly (total RUs \/ number of physical partitions).<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/partition-distribution-cassandra-cosmos-db.png\"><img decoding=\"async\" class=\"alignleft size-medium wp-image-4450\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/partition-distribution-cassandra-cosmos-db-300x278.png\" alt=\"Image partition distribution cassandra cosmos db\" width=\"300\" height=\"278\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/partition-distribution-cassandra-cosmos-db-300x278.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/partition-distribution-cassandra-cosmos-db.png 471w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h5>Adjustments After Run #2 and #3<\/h5>\n<p>After run #2, we still saw 429 errors, but at a smaller volume and for a shorter duration than in run #1.
Specifically, run #2 produced over 2.5 million 429 errors, which still meant that 62% of requests were throttled. So we further increased throughput to 60,000 RUs for run #3. As the diagram below shows, both the number of 429 errors and the run duration (represented on the X-axis) were significantly lower than in run #1. Run #3 peaked at 1.2 million 429 errors, ran for a much shorter time, and had 19.8% of its requests throttled.<\/p>\n<p>Run #3<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-3-azure-cosmosdb.png\"><img decoding=\"async\" class=\"alignleft wp-image-4452 \" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-3-azure-cosmosdb.png\" alt=\"Image 429 errors after run 3 azure cosmosdb\" width=\"471\" height=\"209\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Because we were still getting 429 errors, we increased the RUs on Azure Cosmos DB to 80,000 and moved on to run #4.<\/p>\n<h5>Adjustments \u2013 Run #4<\/h5>\n<p>After adjusting the RUs to 80,000, we no longer received any 429 errors, which meant we had found the right capacity for this workload from an initial data migration perspective. The diagram below shows no 429 errors at 80,000 RUs.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-4-azure-cosmosdb-no-remaining-errors.png\"><img decoding=\"async\" class=\"alignleft size-medium wp-image-4454\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-4-azure-cosmosdb-no-remaining-errors-300x173.png\" alt=\"Image 429 errors after run 4 azure cosmosdb no remaining errors\" width=\"300\" height=\"173\"
srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-4-azure-cosmosdb-no-remaining-errors-300x173.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-4-azure-cosmosdb-no-remaining-errors-768x443.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/429-errors-after-run-4-azure-cosmosdb-no-remaining-errors.png 958w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>We then decided to optimize the migration further and execute one more run with adjusted settings in the Azure Databricks notebook. See details of Run #5 below.<\/p>\n<h5>Final Run Adjustments \u2013 Run #5<\/h5>\n<p>CPU utilization on Azure Databricks was still below 10%, so for the final run we decided to tweak the concurrent write setting in the notebook. Our thinking was to walk down the architecture and optimize the next component. Since 80,000 RUs was enough to prevent Azure Cosmos DB throttling, we shifted our focus in this run to Azure Databricks optimization and chose to address contention on the write side rather than the read side. With NoSQL databases, the write side is often where contention occurs, because each INSERT carries a significant cost from data replication and indexing. Azure Cosmos DB maintains 4 replicas of the data within a region, so each INSERT ultimately leads to 4 writes. With this in mind, we reduced the concurrent writes slightly, from 25 to 20, and the load time improved further to 5.72 minutes. Reducing concurrent writes to 20 means queuing fewer write tasks, which reduces the chance of Spark jobs restarting if the target Cassandra endpoint (here, Azure Cosmos DB Cassandra API) cannot keep up.
Although 80,000 RUs was just barely enough to prevent throttling, also reducing the concurrent writes per Spark task gave us a further increase in performance. An alternative would have been to increase the concurrent writes while also raising the Azure Cosmos DB throughput above 80,000 RUs. The key takeaway is that, for the same resource cost, further tuning can lead to significant performance gains.<\/p>\n<p>To summarize, for a given dataset on an initial\/one-time load, we reduced the load time from 52.36 minutes to 5.72 minutes, roughly a 9x improvement, while provisioned throughput increased about 3.3x, from 24,000 RUs to 80,000 RUs. This is close to an order of magnitude better performance and better than linear scalability.<\/p>\n<p>&nbsp;<\/p>\n<h5>Other Considerations &#8211; Cost<\/h5>\n<p>Azure Cosmos DB has 2 cost components:<\/p>\n<ol>\n<li>Azure Cosmos DB compute, provisioned as Request Units (RUs); typically the largest portion of the cost<\/li>\n<li>Storage; typically the smallest portion of the cost<\/li>\n<\/ol>\n<p>At a high level, provisioned throughput is billed at $0.008 per 100 RU\/s per hour (a month is about 730 hours). For the initial load:<\/p>\n<p>(80,000 RUs \/ 100) * 24 hours (assuming the initial load runs for the entire day) * $0.008 per 100 RU\/s per hour =<\/p>\n<p>$153.60 USD per day<\/p>\n<p>&nbsp;<\/p>\n<p>After the initial load is complete, you can reduce the 80,000 RUs to a minimum of 10% of the maximum throughput ever provisioned, i.e., 8,000 RUs. This is possible because Azure Cosmos DB is an elastic NoSQL database whose throughput can be scaled up or down to meet the needs of the workload. Scaling can be done manually, as described above, or with the built-in <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/provision-throughput-autoscale\">Autoscale<\/a> capability.
So, for ongoing operations, 8,000 RUs would be the minimum allowable setting:<\/p>\n<p>(8,000 RUs \/ 100) * 24 hours * $0.008 per 100 RU\/s per hour =<\/p>\n<p>$15.36 USD per day<\/p>\n<p>Note that we increased RUs incrementally through the experiments rather than immediately setting Azure Cosmos DB to a high RU value, because we knew we could only reduce throughput to a minimum of 10% of the highest RUs ever provisioned. For instance, if we had provisioned 1,000,000 RUs initially, we could reduce it to no lower than 100,000 RUs for ongoing operations. When onboarding your workloads and configuring throughput for Azure Cosmos DB, plan for regular usage as well as the ability to scale up 10x to meet peak needs such as data loads or migrations.<\/p>\n<h5>Conclusion \/ Lessons Learned<\/h5>\n<ul>\n<li>Azure Cosmos DB Cassandra API scaled at a better than linear rate as capacity increased. In summary, we increased the RUs about 3.3x and saw load performance improve more than 9x. Even without the run #5 write-concurrency change, run #4 at 80,000 RUs (11.53 minutes) was about 4.5x faster than run #1 (52.36 minutes).<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"319\">Run Summary<\/td>\n<td width=\"319\">Load Duration<\/td>\n<\/tr>\n<tr>\n<td width=\"319\">24,000 RUs to 80,000 RUs (~3.3x increase in capacity)<\/td>\n<td width=\"319\">52.36 minutes down to 5.72 minutes (~9x better performance)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<ul style=\"list-style-type: square;\">\n<li>Performance could likely be improved further at the same cost by increasing the read rate in the Azure Databricks environment, i.e., by modifying the Scala notebook to parallelize reads even further.<\/li>\n<li>We were able to migrate around 4M rows (about 5 GB of data) in under 6 minutes.
This was achieved at a reasonable cost, as shown above.<\/li>\n<li>The goal was to find an optimal trade-off between Azure Cosmos DB cost (i.e., RUs) and migration speed for a sample dataset.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p><!--nextpage--><\/p>\n<p>&nbsp;<\/p>\n<h3>Appendix<\/h3>\n<p>&nbsp;<\/p>\n<p>Environment Details<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1.png\"><img decoding=\"async\" class=\"alignleft size-medium wp-image-4441\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-300x155.png\" alt=\"Image Screenshot 2022 06 27 082430 component architecture v1\" width=\"300\" height=\"155\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-300x155.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-1024x529.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-768x397.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-1536x794.png 1536w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/Screenshot-2022-06-27-082430-component-architecture-v1-2048x1058.png 2048w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<ol>\n<li>Cassandra IaaS on Azure (single VM)<\/li>\n<\/ol>\n<ul style=\"list-style-type: square;\">\n<li>Azure VM SKU DS14-8 v2<\/li>\n<li>8 vCPUs<\/li>\n<li>112 GB RAM<\/li>\n<li>Single node
cluster<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>Sample of the source data in the Cassandra IaaS VM that was migrated to Azure Cosmos DB Cassandra API:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM.png\"><img decoding=\"async\" class=\"alignleft wp-image-4458 size-large\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM-1024x126.png\" alt=\"Image data sample of rows cassandra iaas VM\" width=\"640\" height=\"79\" srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM-1024x126.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM-300x37.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM-768x94.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/data-sample-of-rows-cassandra-iaas-VM.png 1369w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>Keyspace\/table schema:<\/p>\n<pre>stresscql.typestest10m (\n    name text,\n    choice boolean,\n    date timestamp,\n    address inet,\n    dbl double,\n    lval bigint,\n    ival int,\n    uid timeuuid,\n    value blob,\n    col1 text,\n    col2 text,\n    col3 text,\n    col4 text,\n    col5 text\n)<\/pre>\n<p>&nbsp;<\/p>\n<ol start=\"2\">\n<li>Azure Databricks\n<ul style=\"list-style-type: square;\">\n<li>Standard Cluster\/Runtime 9.1 LTS (Scala 2.12\/Spark 3.1.2)<\/li>\n<li>Standard DS3_v2<\/li>\n<li>Autoscaling from a minimum of 1 to a maximum of 4 workers (14 GB RAM, 4 cores each)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<p>Azure Databricks provides a baseline Scala notebook that only needs to be configured for
your environment. Once configured, the baseline Scala notebook reads from your Cassandra environment and writes to Azure Cosmos DB. The baseline Azure Databricks configuration is described in the <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/cassandra\/migrate-data-databricks\">Azure Cosmos DB documentation on migrating data with Azure Databricks<\/a>.<\/p>\n<p>Sample of the Azure Databricks notebook configuration and the parameters that can be modified:<\/p>\n<p>stresscql, table -&gt; typestest10m, cosmosCassandra: scala.collection.immutable.Map[String,String] = Map(<span style=\"color: #0000ff;\">spark.cassandra.output.concurrent.writes -&gt; 25<\/span>, <span style=\"color: #0000ff;\">spark.cassandra.concurrent.reads -&gt; 512<\/span>, spark.cassandra.connection.ssl.enabled -&gt; true, <span style=\"color: #0000ff;\">spark.cassandra.connection.keep_alive_ms -&gt; 600000000<\/span>, spark.cassandra.output.batch.size.rows -&gt; 1, <span style=\"color: #0000ff;\">spark.cassandra.output.batch.grouping.buffer.size -&gt; 512<\/span>,<\/p>\n<p>Note: <span style=\"color: #0000ff;\">blue<\/span> text above indicates the configurations that can be changed.<\/p>\n<ol start=\"3\">\n<li>Azure Cosmos DB\n<ul style=\"list-style-type: square;\">\n<li>Single-region deployment<\/li>\n<li>Provisioned throughput ranged from 24,000 RUs (run #1) to 80,000 RUs (runs #4 and #5)<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p><em>Sample of the Cassandra data after it was inserted into Azure Cosmos DB:<\/em><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load.png\"><img decoding=\"async\" class=\"alignleft size-large wp-image-4459\" src=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load-1024x332.png\" alt=\"Image sampledata in azure cosmos db after load\" width=\"640\" height=\"208\"
srcset=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load-1024x332.png 1024w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load-300x97.png 300w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load-768x249.png 768w, https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-content\/uploads\/sites\/52\/2022\/06\/sampledata-in-azure-cosmos-db-after-load.png 1252w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>About the authors: \u00a0Akash &amp; Alp We are working with many customers who for a variety of reasons such as not having to deal with patching the O\/S, upgrades, scalability, etc. are looking to move their Cassandra IaaS workloads to Azure and one of the destinations is Azure Cosmos DB Cassandra API. We wanted to [&hellip;]<\/p>\n","protected":false},"author":94427,"featured_media":61,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"image","meta":{"_acf_changed":false,"footnotes":""},"categories":[16,1809,996,1778,19],"tags":[499,1075,287,957,286,290],"class_list":["post-4437","post","type-post","status-publish","format-image","has-post-thumbnail","hentry","category-cassandra-api","category-customers","category-migration","category-spark","category-tips-and-tricks","tag-azure-cosmos-db","tag-cassandra-api","tag-cosmos-db","tag-cosmosdb","tag-migration","tag-spark","post_format-post-format-image"],"acf":[],"blog_post_summary":"<p>About the authors: \u00a0Akash &amp; Alp We are working with many customers who for a variety of reasons such as not having to deal with patching the O\/S, upgrades, scalability, etc. are looking to move their Cassandra IaaS workloads to Azure and one of the destinations is Azure Cosmos DB Cassandra API. 
We wanted to [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/4437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/94427"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=4437"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/4437\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/61"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=4437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=4437"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=4437"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}