{"id":3848,"date":"2022-01-20T06:00:14","date_gmt":"2022-01-20T14:00:14","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=3848"},"modified":"2022-01-18T11:29:19","modified_gmt":"2022-01-18T19:29:19","slug":"read-many-items-fast-with-the-java-sdk-for-azure-cosmos-db","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/read-many-items-fast-with-the-java-sdk-for-azure-cosmos-db\/","title":{"rendered":"Read many items fast with the Java SDK for Azure Cosmos DB"},"content":{"rendered":"<p>Typically, when retrieving many records from a database, you will run filter or aggregation queries. In these scenarios, latency is usually less important than the accuracy of the results. However, there are some use cases where many separate records need to be collected back very quickly. For example, you may be building a comparison site where thousands of entities are maintained independently, but frequently need to be pulled back during user comparisons.<\/p>\n<p>In this situation, you can have a trade-off between the simplicity (but high latency) of running a large query, or the lower latency (but high client-side compute cost and complexity) of hitting the database with many <a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/point-reads-versus-queries\/\">point reads<\/a> in parallel.<\/p>\n<p>The following has been extracted from a real-world example, and has been codified into a more generic sample <a href=\"https:\/\/github.com\/Azure-Samples\/cosmosdb-read-many-items-java\">here.<\/a> The sample demonstrates how the <span style=\"font-size: 1rem;\"><a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span> method in <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/sql\/sql-api-sdk-java-v4\">Azure Cosmos DB Java SDK<\/a> can help in this scenario. In this case we are using the <span style=\"font-size: 1rem;\"><a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span> method along with <a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer?view=azure-java-stable\">CosmosAsyncContainer<\/a> in a multi-threaded application using the <a href=\"https:\/\/projectreactor.io\/\">reactor<\/a> framework (note: the <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/sql\/sql-api-sdk-dotnet-standard\">Azure Cosmos DB .NET SDK<\/a> has the <a href=\"https:\/\/docs.microsoft.com\/dotnet\/api\/microsoft.azure.cosmos.container.readmanyitemsasync?view=azure-dotnet\">same method<\/a>, which can be used in the same way).<\/p>\n<p>Suppose you have a client machine (or machines), and you want to bring back 1,000 records using point reads as quickly as possible. In our <a href=\"https:\/\/github.com\/Azure-Samples\/cosmosdb-read-many-items-java\">demo sample,<\/a> we have two methods that are compared for differences in their performance.<\/p>\n<p>First, we have a method that executes point reads using as many concurrent threads as possible on the local client machine:<\/p>\n<div>\n<div>\n<div>\n<pre class=\"prettyprint\">\u00a0 \u00a0 private void readManyItems() throws InterruptedException {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \/\/ collect the ids that were generated when writing the data.\r\n\u00a0 \u00a0 \u00a0 \u00a0 List&lt;String&gt; list = new ArrayList&lt;String&gt;();\r\n\u00a0 \u00a0 \u00a0 \u00a0 for (final JsonNode doc : docs) {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 list.add(doc.get(\"id\").asText());\r\n\u00a0 \u00a0 \u00a0 \u00a0 }\r\n\u00a0 \u00a0 \u00a0 \u00a0 final long startTime = System.currentTimeMillis();\r\n\u00a0 \u00a0 \u00a0 \u00a0 Flux.fromIterable(list)\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 .flatMap(id -&gt; container.readItem(id, new PartitionKey(id), Item.class))\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 .flatMap(itemResponse -&gt; {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 if (itemResponse.getStatusCode() == 200) {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 double requestCharge = itemResponse.getRequestCharge();\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 BinaryOperator&lt;Double&gt; add = (u, v) -&gt; u + v;\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 totalRequestCharges.getAndAccumulate(requestCharge, add);\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 } else\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 logger.info(\"WARNING insert status code {} != 200\" + itemResponse.getStatusCode());\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 request_count.getAndIncrement();\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 return Mono.empty();\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 }).subscribe();\r\n\u00a0 \u00a0 \u00a0 \u00a0 logger.info(\"Waiting while subscribed async operation completes all threads...\");\r\n\u00a0 \u00a0 \u00a0 \u00a0 while (request_count.get() &lt; NUMBER_OF_DOCS) {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \/\/ looping while subscribed async operation completes all threads\r\n\u00a0 \u00a0 \u00a0 \u00a0 }\r\n\u00a0 \u00a0 \u00a0 \u00a0 if (request_count.get() == NUMBER_OF_DOCS) {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 request_count.set(0);\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 final long endTime = System.currentTimeMillis();\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 final long duration = (endTime - startTime);\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 totalReadLatency.getAndAdd(duration);\r\n\u00a0 \u00a0 \u00a0 \u00a0 }\r\n\u00a0 \u00a0 }<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p>Next, we have a method that divides the requests into chunks using the <span style=\"font-size: 1rem;\">NUMBER_OF_DOCS_PER_THREAD parameter, and each thread will send a list of id and partition key tuples to the database to be executed using <a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span>:<\/p>\n<pre class=\"prettyprint\">    private void readManyItemsEnhanced() throws InterruptedException {\r\n        \/\/ collect the ids that were generated when writing the data.\r\n        List&lt;String&gt; list = new ArrayList&lt;String&gt;();\r\n        for (final JsonNode doc : docs) {\r\n            list.add(doc.get(\"id\").asText());\r\n        }\r\n        List&lt;List&lt;String&gt;&gt; lists = ListUtils.partition(list, NUMBER_OF_DOCS_PER_THREAD);\r\n\r\n        final long startTime = System.currentTimeMillis();\r\n        Flux.fromIterable(lists).flatMap(x -&gt; {\r\n\r\n            List&lt;CosmosItemIdentity&gt; pairList = new ArrayList&lt;&gt;();\r\n\r\n            \/\/ add point reads in this thread as a list to be sent to Cosmos DB\r\n            for (final String id : x) {\r\n                \/\/ increment request count here so that total requests will equal total docs\r\n                request_count.getAndIncrement();\r\n                pairList.add(new CosmosItemIdentity(new PartitionKey(String.valueOf(id)), String.valueOf(id)));\r\n            }\r\n\r\n            \/\/ instead of reading sequentially, send CosmosItem id and partition key tuple of items to be read\r\n            Mono&lt;FeedResponse&lt;Item&gt;&gt; documentFeedResponse = container.readMany(pairList, Item.class);                    \r\n            double requestCharge = documentFeedResponse.block().getRequestCharge();\r\n            BinaryOperator&lt;Double&gt; add = (u, v) -&gt; u + v;\r\n            totalEnhancedRequestCharges.getAndAccumulate(requestCharge, add);\r\n            return documentFeedResponse;\r\n        })\r\n                .subscribe();\r\n\r\n        logger.info(\"Waiting while subscribed async operation completes all threads...\");\r\n        while (request_count.get() &lt; NUMBER_OF_DOCS) {\r\n            \/\/ looping while subscribed async operation completes all threads\r\n        }\r\n        if (request_count.get() == NUMBER_OF_DOCS) {\r\n            request_count.set(0);\r\n            final long endTime = System.currentTimeMillis();\r\n            final long duration = (endTime - startTime);\r\n            totalEnhancedReadLatency.getAndAdd(duration);\r\n        }\r\n    }<\/pre>\n<\/div>\n<div><\/div>\n<div>First, let&#8217;s see what happens when we run the demo and configure just 10 documents per thread for the enhanced <span style=\"font-size: 1rem;\">readManyItemsEnhanced() method:<\/span><\/div>\n<div><code><code><\/code><\/code><\/div>\n<div>\n<pre class=\"prettyprint\">public static final int NUMBER_OF_DOCS = 1000;\r\npublic static final int NUMBER_OF_DOCS_PER_THREAD = 10;\r\n\r\nINFO: Reading many items.... \r\nINFO: Waiting while subscribed async operation completes all threads...\r\nINFO: Reading many items (enhanced using readMany method)\r\nINFO: Waiting while subscribed async operation completes all threads...\r\nINFO: Total latency with standard multi-threading: 907 \r\nINFO: Total latency using readMany method: 2109 \r\nINFO: Total request charges with standard multi-threading: 1000.0 \r\nINFO: Total request charges using readMany method: 927.1299999999991\r\n\r\n<\/pre>\n<\/div>\n<div>\n<p>&nbsp;<\/p>\n<p>In this test, we can see that latency is higher when using <span style=\"font-size: 1rem;\"><a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span>. With the local resources available, executing point reads with standard multi-threading performs better than splitting the requests and sending 10 requests per thread using <span style=\"font-size: 1rem;\"><a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span>. However, notice that the total <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/request-units\">request unit<\/a> charges are slightly lower with <span style=\"font-size: 1rem;\"><a href=\"https:\/\/docs.microsoft.com\/java\/api\/com.azure.cosmos.cosmosasynccontainer.readmany?view=azure-java-stable\">readMany<\/a><\/span>. Now let&#8217;s see what happens when we increase the number of NUMBER_OF_DOCS_PER_THREAD to 100:<\/p>\n<div>\n<div>\n<pre class=\"prettyprint\">public static final int NUMBER_OF_DOCS = 1000;\r\npublic static final int NUMBER_OF_DOCS_PER_THREAD = 100;\r\n\r\nINFO: Reading many items.... \r\nINFO: Waiting while subscribed async operation completes all threads...\r\nINFO: Reading many items (enhanced using readMany method)\r\nINFO: Waiting while subscribed async operation completes all threads...\r\nINFO: Total latency with standard multi-threading: 825\r\nINFO: Total latency using readMany method: 380\r\nINFO: Total request charges with standard multi-threading: 1000.0\r\nINFO: Total request charges using readMany method: 212.79000000000002<\/pre>\n<p><code><\/code><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p>We can now see that latency is significantly reduced compared to using multi-threading. Not only that, but RU charges are relatively much lower for the same number of point reads! This approach is a great solution if you need to run many point reads in parallel, but don&#8217;t have a large number of client machines or cores standing by to serve a large &#8220;scatter gather&#8221; request, and need to mitigate the trade-off on overall latency. Keep in mind that the extent to which this approach will benefit latency in the application will depend on the average size of each document, so you should benchmark the performance when implementing this.<\/p>\n<h3 id=\"get-started\">Get Started<i class=\"fabric-icon fabric-icon--Link\" aria-hidden=\"true\"><\/i><\/h3>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/sql\/sql-api-sdk-java-v4\" target=\"_blank\" rel=\"noopener\">Azure Cosmos DB Java SDK v4 technical documentation<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql\/sql-api-java-sdk-samples\" target=\"_blank\" rel=\"noopener\">Java SDK v4 getting started sample application<\/a><\/li>\n<li class=\"\"><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cosmos-db\/sql\/sql-api-sdk-java-v4\" target=\"_blank\" rel=\"noopener\">Release notes and additional resources<\/a><\/li>\n<li><a href=\"https:\/\/devblogs.microsoft.com\/cosmosdb\/java-sdk-v4-async-vs-sync\/\">Exploring the Async API (reactor programming)<\/a><\/li>\n<\/ul>\n<h3 id=\"about-azure-cosmos-db\">About Azure Cosmos DB<i class=\"fabric-icon fabric-icon--Link\" aria-hidden=\"true\"><\/i><\/h3>\n<p><a href=\"https:\/\/azure.microsoft.com\/services\/cosmos-db\/\" target=\"_blank\" rel=\"noopener\">Azure Cosmos DB<\/a>\u00a0is a fast and scalable distributed NoSQL database, built for modern application development. Get guaranteed single-digit millisecond response times and 99.999-percent availability,\u00a0<a href=\"https:\/\/azure.microsoft.com\/support\/legal\/sla\/cosmos-db\/\" target=\"_blank\" rel=\"noopener\" data-bi-an=\"content-overview-01\" data-bi-tn=\"undefined\">backed by SLAs<\/a>,\u00a0<a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/scaling-throughput\" target=\"_blank\" rel=\"noopener\" data-bi-an=\"content-overview-01\" data-bi-tn=\"undefined\">automatic and instant scalability<\/a>, and open-source APIs for MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with turnkey data replication and multi-region writes.<\/p>\n<p class=\"\">To easily build your first database, watch our\u00a0<a href=\"https:\/\/youtube.com\/playlist?list=PLmamF3YkHLoLLGUtSoxmUkORcWaTyHlXp\" target=\"_blank\" rel=\"noopener\">Get Started videos<\/a> on YouTube and explore ways to <a href=\"https:\/\/docs.microsoft.com\/azure\/cosmos-db\/optimize-dev-test\" target=\"_blank\" rel=\"noopener\">dev\/test free<\/a>.<\/p>\n<\/div>\n<div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Discover ways to collect lots of separate records quickly using the Java SDK for Azure Cosmos DB Java.<\/p>\n","protected":false},"author":9387,"featured_media":3976,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14,643],"tags":[1806,1807,1805,1804],"class_list":["post-3848","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-core-sql-api","category-java-sdk","tag-cosmos-db-java-sdk","tag-many-point-reads","tag-read-many-items","tag-readmany"],"acf":[],"blog_post_summary":"<p>Discover ways to collect lots of separate records quickly using the Java SDK for Azure Cosmos DB Java.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3848","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/9387"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=3848"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/3848\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/3976"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=3848"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=3848"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=3848"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}