Reading data with Spring Data Azure Cosmos DB v3

The repository abstraction of Spring Data Azure Cosmos DB v3 for Core (SQL) API provides multiple ways of reading data from Azure Cosmos DB. In this blog post, I highlight some of the APIs and best practices that help improve the read performance of Spring applications using Spring Data Azure Cosmos DB v3.

There are two ways to read data from Azure Cosmos DB: point reads and queries. When retrieving a single document, a point read is both cheaper and faster than a query, because a query goes through the query engine while a point read bypasses it and reads the data directly. A point read is a simple key/value lookup of a single item by the item's ID and partition key value. Read more about how point reads compare with queries here.

Point reads:

Imagine a collection of books partitioned by category. You have the id and the partition key value of a book that you want to retrieve from Azure Cosmos DB. Spring Data Azure Cosmos DB provides an overloaded findById method that takes both the id and the partition key as arguments and performs a point read instead of a query:

  bookRepository.findById(id, new PartitionKey(category))

The following overload, which takes only the id, behaves differently depending on the version. Prior to version 3.36.0 of the Spring Data module for Azure Cosmos DB, it executes a fan-out query, incurring both higher cost and higher latency. In version 3.36.0 and later, it executes a point read only if the partition key has been defined as id. Otherwise, you need to provide the partition key, as in the method above, to ensure that a point read is executed.

  bookRepository.findById(id)
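To make the cost difference concrete, here is a minimal sketch in plain Java. It is not the Cosmos DB SDK: it models a container as a map of partitions keyed by category and only counts how many partitions each kind of lookup touches. The class PointReadSketch, its methods, and the sample books are hypothetical names invented for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of a container partitioned by category.
// NOT the Cosmos DB SDK: it only counts how many partitions a lookup touches.
final class PointReadSketch {
    // partition key (category) -> item id -> book title
    private final Map<String, Map<String, String>> partitions = new HashMap<>();
    int partitionsVisited = 0;

    void save(String category, String id, String title) {
        partitions.computeIfAbsent(category, k -> new HashMap<>()).put(id, title);
    }

    // Models findById(id, new PartitionKey(category)): one direct key/value lookup.
    String pointRead(String id, String category) {
        partitionsVisited = 1;
        Map<String, String> partition = partitions.get(category);
        return partition == null ? null : partition.get(id);
    }

    // Models findById(id) when the partition key is not id: every partition is checked.
    String fanOutRead(String id) {
        partitionsVisited = 0;
        String found = null;
        for (Map<String, String> partition : partitions.values()) {
            partitionsVisited++;
            if (partition.containsKey(id)) {
                found = partition.get(id);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        PointReadSketch container = new PointReadSketch();
        container.save("fiction", "b1", "Dune");
        container.save("history", "b2", "SPQR");
        container.save("science", "b3", "Cosmos");

        System.out.println(container.pointRead("b2", "history")
                + " after visiting " + container.partitionsVisited + " partition(s)");
        System.out.println(container.fanOutRead("b2")
                + " after visiting " + container.partitionsVisited + " partition(s)");
    }
}
```

Both lookups return the same book, but in this model the fan-out version visits all three partitions while the point read visits one. The real service measures cost in request units rather than partitions visited, so treat the counter as an analogy only.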

Queries:

Real-world workloads contain both point reads and SQL queries. As we have seen, point reads can only do key/value lookups; for everything else you need SQL queries. When you query data from containers and the query specifies a partition key filter, Spring Data Azure Cosmos DB routes the query to the physical partitions corresponding to the partition key values, resulting in lower cost and latency. Read more about in-partition and cross-partition queries here.
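The routing idea can be sketched in a few lines of plain Java. This is a toy model, not the SDK: the partition count, the use of String.hashCode as the hash function, and the names QueryRoutingSketch, physicalPartitionFor and targetPartitions are all assumptions made for illustration. The service uses its own hashing scheme.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of query routing. NOT the Cosmos DB SDK: the hash function and the
// partition count below are stand-ins chosen for illustration.
final class QueryRoutingSketch {
    static final int PHYSICAL_PARTITIONS = 4; // hypothetical partition count

    // Stand-in for the service's partition key hashing.
    static int physicalPartitionFor(String partitionKeyValue) {
        return Math.floorMod(partitionKeyValue.hashCode(), PHYSICAL_PARTITIONS);
    }

    // With partition key values in the filter, only their partitions are targeted.
    // With none, the query has to fan out to every physical partition.
    static Set<Integer> targetPartitions(String... partitionKeyValues) {
        Set<Integer> targets = new TreeSet<>();
        if (partitionKeyValues.length == 0) {
            for (int p = 0; p < PHYSICAL_PARTITIONS; p++) {
                targets.add(p);
            }
            return targets;
        }
        for (String value : partitionKeyValues) {
            targets.add(physicalPartitionFor(value));
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println("one category:   " + targetPartitions("fiction"));
        System.out.println("two categories: " + targetPartitions("fiction", "history"));
        System.out.println("no filter:      " + targetPartitions());
    }
}
```

A filter on one category targets exactly one physical partition, a filter on two categories targets at most two, and a query without a partition key filter targets all of them.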

Spring Data Azure Cosmos DB provides multiple ways to query data. All the options allow you to specify partition key values. Let’s take a quick look at the options:

@Query annotation

When you query data from containers using the @Query annotation, you define a native Azure Cosmos DB SQL query. Because it is a native SQL query, you can specify the partition key values in the query filter.

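The following is an example of a query scoped to a single partition:

  // @Param (org.springframework.data.repository.query.Param) binds the argument to @category
  @Query(value = "select * from c where c.category = @category")
  Flux<Book> findByCategoryQuery(@Param("category") String category);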

The following is an example of a query scoped to two partitions:

  @Query(value = "select * from c where c.category in (@category1, @category2)")
  Flux<Book> findByCategoriesQuery(@Param("category1") String category1,
                                   @Param("category2") String category2);

Derived Query Methods

When you query data using a derived query method, if you specify the partition key property name in the criteria and supply the partition key value as an argument to the method, Spring Data Azure Cosmos DB routes the query to the physical partitions corresponding to the partition key values specified in the filter, resulting in lower cost and lower latency.

Derived query methods have an added advantage over the @Query annotation: when scoped to a single partition, they take advantage of query plan caching. Once the plan is cached, subsequent executions of the derived query method do not incur the query plan call and instead execute the query directly on the replicas. Read more about query plan caching here.

  Flux<Book> findByCategory(String category);
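The effect of query plan caching can be pictured with a small memoization sketch in plain Java. This is not the SDK's cache: keying the cache by query text, the "plan" string, and the names QueryPlanCacheSketch, fetchQueryPlan and queryPlanCalls are assumptions made purely to show why repeated executions skip the extra call.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of query plan caching. NOT the SDK's cache: the cache key, the
// "plan" string and the counter are hypothetical and only show the effect.
final class QueryPlanCacheSketch {
    private final Map<String, String> planCache = new HashMap<>();
    int queryPlanCalls = 0;

    // Stand-in for the extra round trip that fetches a query plan.
    private String fetchQueryPlan(String queryText) {
        queryPlanCalls++;
        return "plan-for:" + queryText;
    }

    // The plan is fetched on the first execution and reused afterwards,
    // even when the parameter value changes.
    String execute(String queryText, String category) {
        String plan = planCache.computeIfAbsent(queryText, this::fetchQueryPlan);
        return plan + " with category=" + category;
    }

    public static void main(String[] args) {
        QueryPlanCacheSketch client = new QueryPlanCacheSketch();
        String query = "select * from c where c.category = @category";
        client.execute(query, "fiction");
        client.execute(query, "history");
        client.execute(query, "science");
        System.out.println("query plan calls: " + client.queryPlanCalls);
    }
}
```

In this model, three executions of the same parameterized query trigger a single plan fetch; only the first execution pays for it.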

Pagination

Sometimes query results contain many items and you want to retrieve them in smaller chunks. In Spring Data Azure Cosmos DB, you can achieve this by passing a CosmosPageRequest, a Pageable type, as a parameter to your repository method. The CosmosPageRequest constructor has parameters for page index and page size; the latter is used behind the scenes to set the “x-ms-max-item-count” header value on the query request sent to the replicas. The “x-ms-max-item-count” header value is specified per request and tells the query engine to return that number of items or fewer. Because the number of items returned can vary from one request to the next, you should keep requesting the next page until none remains, as shown in the following example.

    List<Book> getBookByCategoryByPage(String category, int pageIndex, int pageSize) {
        // The third argument is the request continuation; null means no continuation token.
        final Pageable pageRequest = new CosmosPageRequest(pageIndex, pageSize, null);
        Slice<Book> books = bookRepository.findByCategory(category, pageRequest);
        List<Book> result = new ArrayList<>(books.getContent());
        // A page may hold fewer than pageSize items, so loop until no page remains.
        while (books.hasNext()) {
            Pageable nextPageable = books.nextPageable();
            books = bookRepository.findByCategory(category, nextPageable);
            result.addAll(books.getContent());
        }
        return result;
    }
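To see why the loop above is driven by hasNext() rather than by a precomputed page count, consider the following plain-Java sketch. It is not Spring Data: the Page class and the fetch method are hypothetical stand-ins for Slice<Book> and a repository method taking a Pageable, and the rule that every second request returns one item fewer is an artificial way to mimic short pages.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of paged reads. NOT Spring Data: Page and fetch(...) are hypothetical
// stand-ins for Slice<Book> and a repository method that takes a Pageable.
final class PagingSketch {
    static final class Page {
        final List<String> items;
        final int nextOffset; // -1 when there are no more results

        Page(List<String> items, int nextOffset) {
            this.items = items;
            this.nextOffset = nextOffset;
        }

        boolean hasNext() {
            return nextOffset >= 0;
        }
    }

    static int lastRequestCount = 0;

    // Returns at most maxItemCount items. Every second request returns one item
    // fewer, to mimic a backend that is allowed to return short pages.
    static Page fetch(List<String> data, int offset, int maxItemCount, int requestNumber) {
        int size = (requestNumber % 2 == 1) ? Math.max(1, maxItemCount - 1) : maxItemCount;
        int end = Math.min(data.size(), offset + size);
        List<String> items = new ArrayList<>(data.subList(offset, end));
        return new Page(items, end < data.size() ? end : -1);
    }

    // Keeps requesting pages until the source reports that none remains.
    static List<String> readAll(List<String> data, int maxItemCount) {
        int requests = 0;
        Page page = fetch(data, 0, maxItemCount, requests++);
        List<String> result = new ArrayList<>(page.items);
        while (page.hasNext()) {
            page = fetch(data, page.nextOffset, maxItemCount, requests++);
            result.addAll(page.items);
        }
        lastRequestCount = requests;
        return result;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        for (int i = 1; i <= 9; i++) {
            data.add("book-" + i);
        }
        List<String> all = readAll(data, 3);
        System.out.println(all.size() + " items in " + lastRequestCount + " requests");
    }
}
```

With nine items and a page size of three, this model needs four requests instead of the three that a simple division would suggest, so a loop that stopped after three requests would miss the last item.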

If you need sorting for results on each page, there is an overloaded constructor for CosmosPageRequest that takes an additional parameter of type “org.springframework.data.domain.Sort”.

As always, we highly recommend upgrading to the latest version of the SDK for best results.

About Azure Cosmos DB

Azure Cosmos DB is a fast and scalable distributed NoSQL database, built for modern application development. Get guaranteed single-digit millisecond response times and 99.999-percent availability, backed by SLAs, automatic and instant scalability, and open-source APIs for MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with turnkey data replication and multi-region writes.

To easily build your first database, watch our Get Started videos on YouTube and find ways to dev/test free.