{"id":567,"date":"2020-10-20T10:02:22","date_gmt":"2020-10-20T17:02:22","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=567"},"modified":"2023-10-04T10:14:46","modified_gmt":"2023-10-04T17:14:46","slug":"azure-sql-database-as-a-key-value-store","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/azure-sql-database-as-a-key-value-store\/","title":{"rendered":"Testing performance of Azure SQL Database as a key-value store"},"content":{"rendered":"<h2>Executive summary<\/h2>\n<p>Our testing shows that Azure SQL Database can be used as a highly scalable low latency key-value store. Starting with a cost-efficient 4-core General Purpose database, we see an order of magnitude increase in workload throughput as we increase dataset size by 100x and scale across the spectrum of database SKUs to a Business Critical database with 128 cores, <strong>with read and write latency remaining in single-digit milliseconds<\/strong>.<\/p>\n<h2>Key-value store in a RDBMS<\/h2>\n<p>Azure SQL is a relational database, so the title of this blog may come as a surprise. But Azure SQL is also a general-purpose database. The strength of a general-purpose DBMS is in accommodating multiple data models and designs, not all of them necessarily relational, in the same physical database, and still providing robust functionality and sufficiently good performance and scale. To that end, in addition to representing data relationally using tables, columns, and rows, Azure SQL also provides <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/multi-model-features\">multi-model<\/a> capabilities, including support for JSON, XML, and Graph data. Regardless of data model, developers can continue using familiar SQL language.<\/p>\n<p>One non-relational data model popular among developers is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Key%E2%80%93value_database\">key-value store<\/a>. 
There are many specialized key-value data stores, each with its unique capabilities and strengths. In this blog we will take a closer look at the performance and scalability of Azure SQL Database when used as a key-value store.<\/p>\n<p>To do that, we will use <a href=\"https:\/\/github.com\/brianfrankcooper\/YCSB\">YCSB<\/a>, or Yahoo! Cloud Serving Benchmark. To quote from the project GitHub page, <em>The goal of the YCSB project is to develop a framework and common set of workloads for evaluating the performance of different \u201ckey-value\u201d and \u201ccloud\u201d serving stores<\/em>. YCSB is a mature benchmark. In the last 10 years, it has been used to test well-known NoSQL and key-value data stores, such as Cassandra, MongoDB, Redis, DynamoDB, and many others, over thirty in total. Notably, YCSB also supports JDBC as a generic interface to relational databases. Our YCSB tests used JDBC to talk to Azure SQL.<\/p>\n<h2>Test workload<\/h2>\n<p>YCSB provides several canned <a href=\"https:\/\/github.com\/brianfrankcooper\/YCSB\/wiki\/Core-Workloads\">workloads<\/a>. For our tests, we chose Workload A in its default configuration. 
This is an update-heavy workload with a 50\/50 read\/write ratio and a uniform distribution of read and write requests across the dataset.<\/p>\n<p>In a relational database, YCSB uses a single table to represent the key-value dataset:<\/p>\n<pre class=\"prettyprint\">CREATE TABLE usertable (\r\nYCSB_KEY varchar(255) NOT NULL,\r\nFIELD0 varchar(100) NOT NULL, \r\nFIELD1 varchar(100) NOT NULL, \r\nFIELD2 varchar(100) NOT NULL, \r\nFIELD3 varchar(100) NOT NULL, \r\nFIELD4 varchar(100) NOT NULL, \r\nFIELD5 varchar(100) NOT NULL, \r\nFIELD6 varchar(100) NOT NULL, \r\nFIELD7 varchar(100) NOT NULL, \r\nFIELD8 varchar(100) NOT NULL, \r\nFIELD9 varchar(100) NOT NULL\r\nCONSTRAINT pk_usertable PRIMARY KEY (YCSB_KEY)\r\n);<\/pre>\n<p>We kept the default YCSB schema with a key column and ten value columns, but used the <em>varchar(100)<\/em> data type for value columns instead of the default <em>text<\/em>, which is deprecated in SQL Server.<\/p>\n<p>As the first step before running each test, we used YCSB to load this table with a configurable number of rows. In the generated dataset, each FIELD[N] column contains a random string of 100 characters. With the default workload configuration, each read request is a SELECT query returning one row matching a given YCSB_KEY value, with all columns included in the result set, and each write request is an UPDATE query updating one of the FIELD[N] columns for a given YCSB_KEY value. Each test runs as many total operations (requests) as there are rows in the table. With a 50\/50 read\/write ratio, these total requests are split about equally between SELECT and UPDATE queries.<\/p>\n<p>Single-row requests are common in key-value stores. On one hand, such queries are simple for the query processing engine of a relational database to optimize and execute. The query execution plans we observed were trivial. On the other hand, using single-row queries creates a very chatty workload, with respect to both network and storage traffic. 
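<\/p>\n<p>To make the request shapes concrete, the single-row operations described above correspond to queries of the following form (a sketch; YCSB actually issues these as parameterized statements over JDBC, and the key and field values below are placeholders):<\/p>\n<pre class=\"prettyprint\">-- Read request: return the full row for one key\r\nSELECT * FROM usertable WHERE YCSB_KEY = 'user12345';\r\n\r\n-- Write request: update a single field of one row\r\nUPDATE usertable SET FIELD3 = 'new-random-100-character-string' WHERE YCSB_KEY = 'user12345';<\/pre>\n<p>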
For chatty workloads, network and storage latencies in the public cloud, which are higher than typical latencies in on-premises data centers, may create a performance challenge.<\/p>\n<p>YCSB supports many configuration options for its core workloads, and supports building custom workloads as well. In our testing, we intentionally chose to focus on a more challenging update-heavy workload and larger datasets (10 million \u2013 1 billion rows) to determine performance and scalability across the range of Azure SQL Database SKUs.<\/p>\n<h2>Running YCSB against Azure SQL<\/h2>\n<p>Our testing started with a relatively small and cost-efficient database, and then moved to two progressively larger SKUs from the spectrum of Azure SQL Database offerings. Our testing environment is described in the Appendix.<\/p>\n<h3>Entry level database<\/h3>\n<p>Our first test was on a General Purpose database with 4 CPU cores and 20 GB of memory, using Gen5 <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tiers-vcore?tabs=azure-portal#hardware-generations\">hardware<\/a>. 
<a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tier-general-purpose\">General Purpose<\/a> databases use Azure Premium storage, which is remote (network attached) with respect to the machine running the database engine.<\/p>\n<p>With a 10 million row dataset, and using 20 concurrent threads issuing requests against the database, we observed the following results (here and below we show abbreviated output from YCSB at the end of a test run):<\/p>\n<p>YCSB output:<\/p>\n<pre class=\"prettyprint\">[OVERALL], RunTime(ms), 1089898\r\n[OVERALL], Throughput(ops\/sec), 9175.170520544125\r\n[READ], Operations, 4999607\r\n[READ], AverageLatency(us), 1106.8701749957547\r\n[READ], MinLatency(us), 183\r\n[READ], MaxLatency(us), 311807\r\n[READ], 95thPercentileLatency(us), 2265\r\n[READ], 99thPercentileLatency(us), 4767\r\n[READ], Return=OK, 4999607\r\n[UPDATE], Operations, 5000393\r\n[UPDATE], AverageLatency(us), 3176.099026216539\r\n[UPDATE], MinLatency(us), 224\r\n[UPDATE], MaxLatency(us), 313087\r\n[UPDATE], 95thPercentileLatency(us), 7403\r\n[UPDATE], 99thPercentileLatency(us), 15175\r\n[UPDATE], Return=OK, 5000393 \r\n<\/pre>\n<p><figure id=\"attachment_571\" aria-labelledby=\"figcaption_attachment_571\" class=\"wp-caption alignnone\" ><a style=\"font-weight: bold; font-size: inherit; background-color: #f7f7f9;\" href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits.png\"><img decoding=\"async\" class=\"wp-image-571 size-full\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits.png\" alt=\"Database metrics for entry-level database test\" width=\"1544\" height=\"1686\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits.png 1544w, 
https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits-275x300.png 275w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits-938x1024.png 938w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits-768x839.png 768w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/GP_Gen5_4-10M-20threads-waits-1407x1536.png 1407w\" sizes=\"(max-width: 1544px) 100vw, 1544px\" \/><\/a><figcaption id=\"figcaption_attachment_571\" class=\"wp-caption-text\">Database metrics for entry-level database test<\/figcaption><\/figure><\/p>\n<p><span style=\"font-size: 1rem;\">Observations:<\/span><\/p>\n<ol>\n<li>This 18-minute test started at a relatively low throughput of about 5,800 batch requests per second, or, in YCSB terms, operations per second (ops\/sec). We see that the primary wait type in the database, by far, is WRITELOG. This is not surprising \u2013 as we mentioned, a General Purpose database uses remote storage, so when each single-row UPDATE statement in this write-heavy workload commits and writes to the transaction log, the database engine must wait for an acknowledgement from storage that data has been hardened. Thus, on each commit we incur the latency of a network round-trip call, and that leads to high WRITELOG waits. Performance impact of higher network and storage latency can be reduced using batching techniques, as described in this documentation <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/performance-improve-use-batching\">article<\/a>. 
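\n<p>One simple form of batching is to commit several single-row updates in one transaction, so that a single log flush (and its storage round trip) covers many rows. A sketch, with placeholder keys and values:<\/p>\n<pre class=\"prettyprint\">BEGIN TRANSACTION;\r\nUPDATE usertable SET FIELD0 = 'new-value' WHERE YCSB_KEY = 'user1';\r\nUPDATE usertable SET FIELD0 = 'new-value' WHERE YCSB_KEY = 'user2';\r\nUPDATE usertable SET FIELD0 = 'new-value' WHERE YCSB_KEY = 'user3';\r\nCOMMIT TRANSACTION; -- one log flush covers all three updates<\/pre>\n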
This is the recommended approach; however, it requires application\/workload changes.\nAnother way to improve performance when storage write latency is high is to enable <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/logs\/control-transaction-durability\">delayed durability<\/a>. With delayed durability, transaction log writes no longer occur with every UPDATE statement. Instead, they occur when a log buffer in the database engine becomes full, which, with small transactions, happens much less frequently. Thus, the impact of high storage latency is greatly reduced.<\/li>\n<li>About half-way into this test, we enabled delayed durability on the test database. The sharp inflection point at around 02:14 demonstrates the positive performance impact. WRITELOG waits have all but disappeared, throughput sharply increases to around 14,000 batch requests per second, and there is a corresponding increase in CPU utilization and transaction log write rate. Write latency has sharply decreased as well.\n<strong style=\"font-size: 1rem;\">Important warning<\/strong><span style=\"font-size: 1rem;\">: Delayed durability means that <\/span><span style=\"text-decoration: underline;\">data modified in recently committed transactions may be <em>permanently lost<\/em><\/span><span style=\"font-size: 1rem;\"> if the database engine crashes. Despite a drastic improvement in throughput and latency, this is not an option that can be enabled indiscriminately and without considering pros and cons. Nevertheless, we wanted to show the positive performance impact this achieved, because in some scenarios, such as dev\/test workloads or when processing transient data, the possibility of bounded data loss may be acceptable. In those cases, using the less expensive General Purpose service tier with delayed durability enabled, rather than a more expensive Premium or Business Critical database with low latency local storage, can achieve your performance goals. 
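<\/span>\n<p>For reference, delayed durability can be forced for the whole database, or allowed and then requested on individual commits (a sketch against the test table, with a placeholder key and value):<\/p>\n<pre class=\"prettyprint\">-- Force delayed durability for all transactions in the database\r\nALTER DATABASE CURRENT SET DELAYED_DURABILITY = FORCED;\r\n\r\n-- Or allow it, and opt in per transaction at commit time\r\nALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;\r\nBEGIN TRANSACTION;\r\nUPDATE usertable SET FIELD0 = 'new-value' WHERE YCSB_KEY = 'user1';\r\nCOMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);<\/pre>\n<span style=\"font-size: 1rem;\">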
When considering delayed durability, keep in mind that it can be enabled per-transaction. That is useful if most data in the database must be fully durable, but for some transactions, higher throughput is preferred to full durability.<\/span><\/li>\n<li>Even with increased throughput in the second part of the test, the workload did not consume all available resources (CPU, log write, worker threads). Other workloads against the same database could still run using remaining resource capacity.<\/li>\n<\/ol>\n<h3>Mid-range database<\/h3>\n<p>Our next test was on a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tier-business-critical\">Business Critical<\/a> database with 14 CPU cores and 412 GB of memory, using M-series <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tiers-vcore?tabs=azure-portal#hardware-generations\">hardware<\/a>. We used a 100 million row dataset, and ran the test with 200 concurrent threads issuing requests against the database. In other words, we increased dataset size and workload intensity 10x compared to the previous test. 
Unlike in the previous test, delayed durability was disabled.<\/p>\n<p>We made one important change to the test table, making it a <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/introduction-to-memory-optimized-tables\">memory-optimized table<\/a>:<\/p>\n<pre class=\"prettyprint\">CREATE TABLE usertable\r\n(\r\nYCSB_KEY varchar(255) NOT NULL,\r\nFIELD0 varchar(100) NOT NULL, \r\nFIELD1 varchar(100) NOT NULL, \r\nFIELD2 varchar(100) NOT NULL, \r\nFIELD3 varchar(100) NOT NULL, \r\nFIELD4 varchar(100) NOT NULL, \r\nFIELD5 varchar(100) NOT NULL, \r\nFIELD6 varchar(100) NOT NULL, \r\nFIELD7 varchar(100) NOT NULL, \r\nFIELD8 varchar(100) NOT NULL, \r\nFIELD9 varchar(100) NOT NULL\r\nCONSTRAINT pk_usertable PRIMARY KEY NONCLUSTERED HASH (YCSB_KEY) WITH (BUCKET_COUNT = 100000000)\r\n)\r\nWITH (MEMORY_OPTIMIZED = ON);<\/pre>\n<p>A memory-optimized table is well suited for this workload. Not only does it eliminate reads from disk, it also supports very efficient single-row lookups in SELECT and UPDATE queries using a hash index on the YCSB_KEY column.<\/p>\n<p>With a large dataset now fully resident in memory, using M-series hardware becomes very attractive, because it provides 29 GB of memory per core, compared to 5.1 GB per core on Gen5 hardware. Because Azure SQL databases are priced per core, for the same total amount of memory M-series databases are much less expensive than Gen5 databases. 
For example, if we had chosen Gen5 hardware for this test, we would need to use a much more expensive database with 80 cores to get the amount of memory we needed for this 100 million row dataset, instead of only needing 14 cores with M-series.<\/p>\n<p>YCSB output:<\/p>\n<pre class=\"prettyprint\">[OVERALL], RunTime(ms), 2048116\r\n[OVERALL], Throughput(ops\/sec), 48825.35950112201\r\n[READ], Operations, 50003240\r\n[READ], AverageLatency(us), 2645.3984274019044\r\n[READ], MinLatency(us), 1003\r\n[READ], MaxLatency(us), 389887\r\n[READ], 95thPercentileLatency(us), 4447\r\n[READ], 99thPercentileLatency(us), 8055\r\n[READ], Return=OK, 50003240\r\n[UPDATE], Operations, 47735666\r\n[UPDATE], AverageLatency(us), 4838.112348573915\r\n[UPDATE], MinLatency(us), 1646\r\n[UPDATE], MaxLatency(us), 386559\r\n[UPDATE], 95thPercentileLatency(us), 7219\r\n[UPDATE], 99thPercentileLatency(us), 11071\r\n[UPDATE], Return=OK, 47735666\r\n[UPDATE], Return=ERROR, 2261094<\/pre>\n<p><figure id=\"attachment_569\" aria-labelledby=\"figcaption_attachment_569\" class=\"wp-caption alignnone\" ><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits.png\"><img decoding=\"async\" class=\"wp-image-569 size-full\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits.png\" alt=\"Database metrics for mid-range database test\" width=\"1536\" height=\"1680\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits.png 1536w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits-274x300.png 274w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits-936x1024.png 936w, 
https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits-768x840.png 768w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_14-100M-200threads-waits-1404x1536.png 1404w\" sizes=\"(max-width: 1536px) 100vw, 1536px\" \/><\/a><figcaption id=\"figcaption_attachment_569\" class=\"wp-caption-text\">Database metrics for mid-range database test<\/figcaption><\/figure><\/p>\n<p>Observations:<\/p>\n<ol>\n<li>In this 35-minute test, the average throughput is 48,825 ops\/sec, which is an almost <strong>10x increase <\/strong>compared to the first test, <strong>despite a<\/strong> <strong>10x increase in dataset size<\/strong>. Yet the increase in CPU capacity was only 3.5x, from 4 cores to 14 cores. The qualitative performance improvement in this test is due to using a memory-optimized table and low-latency local SSD storage in this Business Critical database.<\/li>\n<li>The dominant wait type is no longer WRITELOG; it is HADR_SYNC_COMMIT. This wait type is encountered when the database engine synchronously commits each transaction to database replicas in an availability group, used to implement <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/high-availability-sla#premium-and-business-critical-service-tier-availability\">high availability<\/a> for Business Critical databases. While transaction commits still require a network round trip between compute nodes hosting the primary and secondary replicas, the latency of writes to local SSD storage is much lower than to remote Azure Premium storage.<\/li>\n<li>As in the previous test, there is remaining headroom to sustain other workloads in the same database, in addition to the key-value store workload we are running.<\/li>\n<li>In the output from YCSB above, we see that some UPDATE calls failed with an error. 
The error number is 41302, and the message is <em>The current transaction attempted to update a record that has been updated since this transaction started. The transaction was aborted.<\/em> This happens because with memory-optimized tables, the database engine uses an optimistic concurrency model by design. Transaction isolation and consistency are validated at commit time, and if a conflict is detected, all but one of the conflicting transactions are aborted. The application is expected to implement retry logic to deal with this, as described in <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/transactions-with-memory-optimized-tables\">Transactions with Memory-Optimized Tables<\/a>. (As an aside, optimistic concurrency is a common approach in NoSQL key-value stores, where similar error handling is required.) But since YCSB talks to the Azure SQL database via a generic JDBC interface, it has no provision for handling such database engine-specific errors. That said, the relatively small number of failed UPDATEs observed in this test does not materially change the overall throughput.<\/li>\n<\/ol>\n<h3>High-end database<\/h3>\n<p>In our next and final test, we wanted to get to the far end of the scalability spectrum in Azure SQL. To that end, we used a <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tier-business-critical\">Business Critical<\/a> database with 128 CPU cores and 3.7 TB of memory on M-series <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/service-tiers-vcore?tabs=azure-portal#hardware-generations\">hardware<\/a>, which, as of this writing, is the largest available Azure SQL database in terms of CPU and memory capacity. This test was done with a 1 billion row dataset, using 600 concurrent workload threads. 
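<\/p>\n<p>The retry logic recommended above for update conflict errors can be sketched in T-SQL as follows (an illustrative pattern with a placeholder key; a real client would typically implement the equivalent loop in application code):<\/p>\n<pre class=\"prettyprint\">DECLARE @retries int = 3;\r\nWHILE @retries > 0\r\nBEGIN\r\n    BEGIN TRY\r\n        UPDATE usertable SET FIELD0 = 'new-value' WHERE YCSB_KEY = 'user1';\r\n        SET @retries = 0; -- success, exit the loop\r\n    END TRY\r\n    BEGIN CATCH\r\n        SET @retries -= 1;\r\n        -- retry only error 41302 (update conflict); rethrow anything else,\r\n        -- or give up once the retries are exhausted\r\n        IF ERROR_NUMBER() != 41302 OR @retries = 0 THROW;\r\n    END CATCH\r\nEND;<\/pre>\n<p>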
We continued using the same memory-optimized table, but increased the number of hash buckets tenfold to accommodate the 10x increase in dataset size.<\/p>\n<p>YCSB output:<\/p>\n<pre class=\"prettyprint\">[OVERALL], RunTime(ms), 10292190\r\n[OVERALL], Throughput(ops\/sec), 97161.05124371004\r\n[READ], Operations, 499994473\r\n[READ], AverageLatency(us), 3254.8293574877175\r\n[READ], MinLatency(us), 210\r\n[READ], MaxLatency(us), 1798143\r\n[READ], 95thPercentileLatency(us), 7011\r\n[READ], 99thPercentileLatency(us), 22511\r\n[READ], Return=OK, 499994473\r\n[UPDATE], Operations, 439151009\r\n[UPDATE], AverageLatency(us), 7248.006196554134\r\n[UPDATE], MinLatency(us), 865\r\n[UPDATE], MaxLatency(us), 1799167\r\n[UPDATE], 95thPercentileLatency(us), 15071\r\n[UPDATE], 99thPercentileLatency(us), 28783\r\n[UPDATE], Return=OK, 439151009\r\n[UPDATE], Return=ERROR, 60854518<\/pre>\n<p><figure id=\"attachment_570\" aria-labelledby=\"figcaption_attachment_570\" class=\"wp-caption alignnone\" ><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits.png\"><img decoding=\"async\" class=\"wp-image-570 size-full\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits.png\" alt=\"Database metrics for high-end database test\" width=\"1536\" height=\"1750\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits.png 1536w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits-263x300.png 263w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits-899x1024.png 899w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits-768x875.png 768w, 
https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2020\/10\/BC_M_128-1B-600threads-waits-1348x1536.png 1348w\" sizes=\"(max-width: 1536px) 100vw, 1536px\" \/><\/a><figcaption id=\"figcaption_attachment_570\" class=\"wp-caption-text\">Database metrics for high-end database test<\/figcaption><\/figure><\/p>\n<p>Observations:<\/p>\n<ol>\n<li>This test achieved an average workload throughput of approximately 97,000 ops\/sec,<strong> doubling the throughput<\/strong> of the previous test. This was achieved with a <strong>10x larger dataset<\/strong>, and <strong>sustained for over 2 hours<\/strong> until the workload began to ramp down at the tail end of the test.<\/li>\n<li>We continued seeing ample resource headroom, in terms of CPU, log write rate, and worker threads. <strong>CPU utilization was below 30%<\/strong>, and the <strong>1 billion row dataset consumed only 57% of the quota<\/strong> for memory-optimized tables in this 128-core database.<\/li>\n<\/ol>\n<h3>Scaling further<\/h3>\n<p>Have we reached the limits of Azure SQL scalability in these tests? Not at all. To scale further, we can:<\/p>\n<ol>\n<li>Increase dataset size and workload threads, and\/or use multiple YCSB clients to run the workload. As mentioned earlier, we haven\u2019t exhausted the compute capacity of the chosen SKUs.<\/li>\n<li>Use <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-sql\/database\/read-scale-out\">readable secondary replicas<\/a> to serve read requests, in scenarios where slightly stale reads are acceptable. This is particularly useful in the Hyperscale service tier, where multiple readable secondary replicas are supported. An application can implement a separate Get API reading from secondary replicas, to be used when appropriate.<\/li>\n<li>Shard the dataset and workload over multiple databases, to scale out both reads and writes. 
With the simple one-table design of a key-value store in Azure SQL, sharding by key range becomes straightforward. Each shard can be scaled independently in accordance with workload demand.<\/li>\n<li>When using memory-optimized tables, also use <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/native-compilation-of-tables-and-stored-procedures\">natively compiled<\/a> stored procedures for database reads and writes. While YCSB does not support calling database stored procedures over JDBC, an application using Azure SQL as a key-value store could easily implement this. A SELECT or UPDATE query in a natively compiled stored procedure (which is a C DLL loaded into the database engine process space) would be more efficient than the same query executed via interpreted T-SQL, and would provide higher throughput and lower latency.<\/li>\n<li>For some workloads, key-value data is purely transient and does not have to be durable at all. In that case, the memory-optimized table can be made <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/relational-databases\/in-memory-oltp\/defining-durability-for-memory-optimized-objects\">non-durable<\/a>. This eliminates all disk access, making the workload purely memory-resident, and further improving throughput and latency. 
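\n<p>Such a non-durable table can be declared by changing a single option in the DDL shown earlier, for example (abbreviated sketch):<\/p>\n<pre class=\"prettyprint\">CREATE TABLE usertable\r\n(\r\nYCSB_KEY varchar(255) NOT NULL,\r\nFIELD0 varchar(100) NOT NULL,\r\n-- FIELD1 through FIELD9 declared as before\r\nCONSTRAINT pk_usertable PRIMARY KEY NONCLUSTERED HASH (YCSB_KEY) WITH (BUCKET_COUNT = 100000000)\r\n)\r\nWITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);<\/pre>\n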
An example of an application that achieved very high scale by implementing a key-value store using SQL Server non-durable memory-optimized tables is described in our earlier blog: <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/datacat\/how-bwin-is-using-sql-server-2016-in-memory-oltp-to-achieve\/ba-p\/305413\">How bwin is using SQL Server 2016 In-Memory OLTP to achieve unprecedented performance and scale<\/a>.<\/li>\n<\/ol>\n<h2>Results<\/h2>\n<p>To recap the results, when using an entry-level General Purpose database with a 10 million row dataset, we observed the following throughput and latency:<\/p>\n<table style=\"border-collapse: collapse;\">\n<tbody>\n<tr>\n<td>Service tier<\/td>\n<td>General Purpose<\/td>\n<td>General Purpose<\/td>\n<\/tr>\n<tr>\n<td>CPU cores<\/td>\n<td>4<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>Hardware generation<\/td>\n<td>Gen5<\/td>\n<td>Gen5<\/td>\n<\/tr>\n<tr>\n<td>Table type<\/td>\n<td>Disk<\/td>\n<td>Disk<\/td>\n<\/tr>\n<tr>\n<td>Durability<\/td>\n<td style=\"background-color: #ffff00;\">Full<\/td>\n<td style=\"background-color: #ffff00;\">Delayed<\/td>\n<\/tr>\n<tr>\n<td>Dataset size (rows)<\/td>\n<td>10,000,000<\/td>\n<td>10,000,000<\/td>\n<\/tr>\n<tr>\n<td>Workload threads<\/td>\n<td>20<\/td>\n<td>20<\/td>\n<\/tr>\n<tr>\n<td>Average ops\/sec<\/td>\n<td style=\"background-color: #ffff00;\">5,860<\/td>\n<td style=\"background-color: #ffff00;\">13,640<\/td>\n<\/tr>\n<tr>\n<td>Average read latency (ms)<\/td>\n<td style=\"background-color: #ffff00;\">0.6<\/td>\n<td style=\"background-color: #ffff00;\">1.4<\/td>\n<\/tr>\n<tr>\n<td>Average write latency (ms)<\/td>\n<td style=\"background-color: #ffff00;\">6.5<\/td>\n<td style=\"background-color: #ffff00;\">1.4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>We see a major improvement in throughput and write latency when delayed durability is used, but do heed the warning about potential data loss earlier in the article.<\/p>\n<p>Switching to Business Critical databases and memory-optimized 
tables, we started with a 10x larger dataset, compared to earlier tests on General Purpose. We then increased the dataset by another 10x, to 1 billion rows. Despite these order of magnitude increases in dataset size, we saw 2-3x increases in throughput in each test, while keeping read and write latency in single-digit milliseconds.<\/p>\n<table style=\"border-collapse: collapse;\">\n<tbody>\n<tr>\n<td>Service tier<\/td>\n<td>Business Critical<\/td>\n<td>Business Critical<\/td>\n<\/tr>\n<tr>\n<td>CPU cores<\/td>\n<td style=\"background-color: #ffff00;\">14<\/td>\n<td style=\"background-color: #ffff00;\">128<\/td>\n<\/tr>\n<tr>\n<td>Hardware generation<\/td>\n<td>M<\/td>\n<td>M<\/td>\n<\/tr>\n<tr>\n<td>Table type<\/td>\n<td>Memory-optimized<\/td>\n<td>Memory-optimized<\/td>\n<\/tr>\n<tr>\n<td>Durability<\/td>\n<td>Full<\/td>\n<td>Full<\/td>\n<\/tr>\n<tr>\n<td>Dataset size (rows)<\/td>\n<td style=\"background-color: #ffff00;\">100,000,000<\/td>\n<td style=\"background-color: #ffff00;\">1,000,000,000<\/td>\n<\/tr>\n<tr>\n<td>Workload threads<\/td>\n<td style=\"background-color: #ffff00;\">200<\/td>\n<td style=\"background-color: #ffff00;\">600<\/td>\n<\/tr>\n<tr>\n<td>Average ops\/sec<\/td>\n<td style=\"background-color: #ffff00;\">48,825<\/td>\n<td style=\"background-color: #ffff00;\">97,161<\/td>\n<\/tr>\n<tr>\n<td>Average read latency (ms)<\/td>\n<td style=\"background-color: #ffff00;\">2.7<\/td>\n<td style=\"background-color: #ffff00;\">3.3<\/td>\n<\/tr>\n<tr>\n<td>Average write latency (ms)<\/td>\n<td style=\"background-color: #ffff00;\">3.2<\/td>\n<td style=\"background-color: #ffff00;\">7.3<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Appendix<\/h2>\n<p>In this appendix we describe the details of testing environment we used.<\/p>\n<p>The latest YCSB version as of this writing is v0.17.0. We weren\u2019t able to use that version because of a YCSB configuration <a href=\"https:\/\/github.com\/brianfrankcooper\/YCSB\/issues\/1458\">issue<\/a> we ran into. 
As a workaround, we used the <a href=\"https:\/\/github.com\/brianfrankcooper\/YCSB\/releases\/download\/0.16.0\/ycsb-0.16.0.tar.gz\">bits<\/a> for the previous release, v0.16.0.<\/p>\n<p>To run YCSB, we used an F16s_v2 Azure VM with 16 cores and 32 GB of memory, running Ubuntu 18.04. A smaller VM could be used for all but the last test, which required higher compute to sustain YCSB client load.<\/p>\n<p>We set up YCSB using the steps in <a href=\"https:\/\/github.com\/brianfrankcooper\/YCSB#getting-started\">Getting Started<\/a>, but downloading YCSB v0.16.0 as noted above.<\/p>\n<p>After downloading and extracting YCSB bits to <em>ycsb-0.16.0<\/em> directory, we modified the <em>jdbc-binding\/conf\/db.properties<\/em> file as follows:<\/p>\n<pre class=\"prettyprint\">db.driver=com.microsoft.sqlserver.jdbc.SQLServerDriver\r\ndb.url=jdbc:sqlserver:\/\/server-name-here.database.windows.net:1433;databaseName=database-name-here;sendStringParametersAsUnicode=false;disableStatementPooling=false;statementPoolingCacheSize=10;responseBuffering=full\r\ndb.user=ycsb\r\ndb.passwd=strong-password-here\r\ndb.batchsize=5000\r\njdbc.batchupdateapi=true<\/pre>\n<p>The JDBC driver parameters in the <em>db.url<\/em> configuration after the <em>databaseName<\/em> parameter are optional. We found that performance of the YCSB workload was slightly improved with this driver configuration.<\/p>\n<p>We <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/connect\/jdbc\/download-microsoft-jdbc-driver-for-sql-server\">downloaded<\/a> the latest Microsoft JDBC driver for SQL Server, v8.4.1. 
After <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/connect\/jdbc\/deploying-the-jdbc-driver#deploying-the-driver-on-unix-systems\">extracting<\/a> the bits, we copied the <em>mssql-jdbc-8.4.1.jre11.jar<\/em> file from <em>sqljdbc_8.4\/enu<\/em> to the <em>jdbc-binding\/conf\/lib<\/em> directory.<\/p>\n<p>We installed JRE v11 using <em>sudo apt install openjdk-11-jre-headless<\/em>.<\/p>\n<p>We created a <em>ycsb<\/em> user with a password in the test database, and granted this user SELECT, INSERT, and UPDATE permissions on the <em>usertable<\/em> table.<\/p>\n<p>To load data, we used a command similar to the following:<\/p>\n<pre class=\"prettyprint\">bin\/ycsb load jdbc -P workloads\/workloada -p recordcount=10000000 -P jdbc-binding\/conf\/db.properties -cp mssql-jdbc-8.4.1.jre11.jar -threads 200 -s<\/pre>\n<p>This example loads 10 million rows into the database referenced in the <em>db.properties<\/em> file, using 200 concurrent threads. Note that data loading performance is greatly improved by setting <em>jdbc.batchupdateapi<\/em> to true, and specifying a sufficiently large value for <em>db.batchsize<\/em>. With this configuration, multiple rows will be committed to the database in the same transaction, reducing the impact of storage latency.<\/p>\n<p>To run a test, we used a command similar to the following:<\/p>\n<pre class=\"prettyprint\">bin\/ycsb run jdbc -P workloads\/workloada -p operationcount=10000000 -P jdbc-binding\/conf\/db.properties -cp mssql-jdbc-8.4.1.jre11.jar -threads 200 -s<\/pre>\n<p>This example runs a test with 10 million operations. 
Note that the number of operations should be equal or less than the number of rows in the <em>usertable<\/em> table.<\/p>\n<p>For tests using memory-optimized tables, we modified a database option as follows:<\/p>\n<pre class=\"prettyprint\">ALTER DATABASE CURRENT SET memory_optimized_elevate_to_snapshot ON;<\/pre>\n<p>We used a <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-sql\/monitoring-azure-sql-database-with-telegraf\/ba-p\/882790\">monitoring solution<\/a> based on telegraf, InfluxDB, and Grafana to monitor database metrics during tests.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Executive summary Our testing shows that Azure SQL Database can be used as a highly scalable low latency key-value store. Starting with a cost-efficient 4-core General Purpose database, we see an order of magnitude increase in workload throughput as we increase dataset size by 100x and scale across the spectrum of database SKUs to a [&hellip;]<\/p>\n","protected":false},"author":43124,"featured_media":81,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[465,478,164],"class_list":["post-567","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sql","tag-azuresql","tag-key-value-store","tag-performance"],"acf":[],"blog_post_summary":"<p>Executive summary Our testing shows that Azure SQL Database can be used as a highly scalable low latency key-value store. 
Starting with a cost-efficient 4-core General Purpose database, we see an order of magnitude increase in workload throughput as we increase dataset size by 100x and scale across the spectrum of database SKUs to a [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/567","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/43124"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=567"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/567\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/81"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=567"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=567"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=567"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}