April 4th, 2023

Building multi-tenant Java apps using Spring Data and Azure Cosmos DB

Theo van Kraay
Principal Program Manager

There are many factors to take into account when designing a multi-tenant application in Azure Cosmos DB. We’ve covered many of these aspects at a high level in our documentation on Multitenancy and Azure Cosmos DB.

In this blog, we’ll quickly dive into the mechanics of implementing a database per tenant or container per tenant performance isolation model using the Spring Data library for Azure Cosmos DB. We’ll show samples where each app instance can reference and/or create multiple databases or containers on-the-fly, and we’ll discuss some of the trade-offs between these two approaches.

 

Flexible Tenant Density

Sometimes each tenant in your application architecture may require physically separate databases or entities in order to achieve a certain level of performance or storage isolation. However, in many databases, creating an entirely new cluster or instance of the database may require a time-consuming deployment in order to guarantee performance isolation. In Azure Cosmos DB, by contrast, a database and/or container with guaranteed throughput can be created within seconds. This allows you to build multi-tenant apps that can quickly and dynamically create new tenant databases or containers, or reference existing ones, with guaranteed throughput isolation.

By default, the Spring Data framework requires deployment of separate application instances for each tenant in this scenario. This is because both database and entities are usually tightly coupled at project start-up. However, in this blog, we’re exposing changes we’ve recently made to the Cosmos Spring Data Client Library that allow you to reference and create databases and containers on-the-fly within the same app instance, while preserving all the functionality in your Spring Data Repository classes. This means you can maximise tenant density in the application tier, while still retaining performance isolation at the data tier, taking full advantage of Azure Cosmos DB’s multi-tenant capabilities!

 

Database per tenant

Let’s take a quick look at how to set up a database per tenant isolation model with Spring Data in Azure Cosmos DB. First, make sure you’re using version 3.33.0, or later, of the Cosmos Spring Data Client Library. Then, create a class that extends CosmosFactory:

import com.azure.cosmos.CosmosAsyncClient;
import com.azure.spring.data.cosmos.CosmosFactory;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MultiTenantDBCosmosFactory extends CosmosFactory {
    private static final Logger logger = LoggerFactory.getLogger(MultiTenantDBCosmosFactory.class);
    private CosmosAsyncClient client;
    public String tenantId;

    public MultiTenantDBCosmosFactory(CosmosAsyncClient cosmosAsyncClient, String databaseName) {
        super(cosmosAsyncClient, databaseName);
        this.client = cosmosAsyncClient;
        // Until a request sets a tenant, fall back to the configured database name.
        this.tenantId = databaseName;
    }

    @Override
    public String getDatabaseName() {
        // Add logic here that determines the tenant id for the current thread
        // and sets this.tenantId to that tenant before returning it.
        return this.tenantId;
    }
}

Next, in your configuration class, add your new MultiTenantDBCosmosFactory class as a bean:

@Bean
public MultiTenantDBCosmosFactory cosmosFactory(CosmosAsyncClient cosmosAsyncClient) {
    return new MultiTenantDBCosmosFactory(cosmosAsyncClient, getDatabaseName());
}

That’s all the wiring required. What remains is to add logic in your new MultiTenantDBCosmosFactory class to capture the tenant id of the current request. You can do that in a variety of different ways, depending on your application’s security and authentication needs. If you intend to create tenant databases separately in a different workflow or app, then that’s all you need to do!
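One common way to make the tenant id available "in the current thread" is a ThreadLocal holder that a request interceptor populates and clears. The sketch below is only an illustration under that assumption: TenantContext and its methods are hypothetical names, not part of the Cosmos Spring Data Client Library, and it presumes each request is handled on a single thread.

```java
import java.util.Optional;

// Hypothetical helper (not a library class): holds the tenant id for the
// request currently being handled on this thread.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() { }

    // Call at the start of a request, for example from an interceptor.
    static void set(String tenantId) {
        if (tenantId == null || tenantId.trim().isEmpty()) {
            throw new IllegalArgumentException("tenantId must not be blank");
        }
        CURRENT.set(tenantId);
    }

    // Empty when no tenant has been set on the calling thread.
    static Optional<String> current() {
        return Optional.ofNullable(CURRENT.get());
    }

    // Call when the request completes so pooled threads do not keep a stale tenant id.
    static void clear() {
        CURRENT.remove();
    }
}
```

With a holder like this, the overridden getDatabaseName() could return TenantContext.current().orElse(this.tenantId), falling back to the configured database name when no tenant has been set.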

If you want to go a bit further and create tenant databases on-the-fly that don’t already exist, you will need to implement that logic in the overridden getDatabaseName() method of your new MultiTenantDBCosmosFactory. Here’s a sample Spring REST app that uses WebRequestInterceptor to capture an HTTP request header of TenantId, creates databases on-the-fly based on the id, and maintains a thread-safe list of pre-created tenant databases that subsequent requests can check against. As you will see in the controller classes, all of the repository methods can be used as normal. In fact, if you already have an existing REST app built using Spring, you could just drop the tenant folder classes into your project, and add MultiTenantDBCosmosFactory as a bean per the above, to turn your single-tenant app instance into a multi-tenant app instance!
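To give a feel for the "thread-safe list of pre-created tenant databases", here is a minimal sketch of one way such a registry could look. It is not the sample app’s code: TenantDatabaseRegistry and ensureExists are hypothetical names, and the database creation step is passed in as a callback so the sketch stays independent of the Cosmos client. It assumes that callback is idempotent (for example, a create-if-not-exists call), because two threads racing on a brand-new tenant may both invoke it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical registry (not a library class) of tenant databases this app
// instance already knows exist.
final class TenantDatabaseRegistry {
    private final Set<String> known = ConcurrentHashMap.newKeySet();
    private final Consumer<String> createDatabase;

    // createDatabase is an assumption: in a real app it might wrap a
    // create-if-not-exists call on the Cosmos client. It must be idempotent.
    TenantDatabaseRegistry(Consumer<String> createDatabase) {
        this.createDatabase = createDatabase;
    }

    // Returns the database name for the tenant, creating it on first sight.
    String ensureExists(String tenantId) {
        if (!known.contains(tenantId)) {
            // Racing threads may both reach this point for a new tenant;
            // an idempotent callback makes that harmless.
            createDatabase.accept(tenantId);
            known.add(tenantId);
        }
        return tenantId;
    }

    boolean isKnown(String tenantId) {
        return known.contains(tenantId);
    }
}
```

In the overridden getDatabaseName(), the factory could then return registry.ensureExists(currentTenantId), so only the first request for a given tenant pays the creation cost.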

 

Container per tenant

There are some advantages to maintaining a container per tenant, instead of a database per tenant. Firstly, this opens up the possibility of using shared throughput, allowing you to increase tenant density at the database level, while still being able to choose dedicated performance isolation for selected tenants. You can also co-locate entities in a container, where they are partitioned by the same key, allowing you to retrieve all entities in a single request (this can be an important performance consideration if the entities are often needed together, as Azure Cosmos DB does not support joins across items or containers).
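As a rough illustration of entity co-location, the sketch below models one tenant’s container in memory: every item carries a shared partition key value plus a "type" discriminator, so a single read of one partition returns users and orders together. All class and method names here are hypothetical, and the lists merely stand in for a real container.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical item shape: different entity types share one partition key.
final class Item {
    final String id;
    final String type;         // discriminator, e.g. "user" or "order"
    final String partitionKey; // same value for all co-located entities

    Item(String id, String type, String partitionKey) {
        this.id = id;
        this.type = type;
        this.partitionKey = partitionKey;
    }
}

// Hypothetical in-memory stand-in for one tenant's container; not a Cosmos DB API.
final class TenantContainerSketch {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // One read of a single partition returns every entity type together.
    List<Item> readPartition(String partitionKey) {
        List<Item> result = new ArrayList<>();
        for (Item item : items) {
            if (item.partitionKey.equals(partitionKey)) {
                result.add(item);
            }
        }
        return result;
    }

    // Narrows the same partition to one entity type via the discriminator.
    List<Item> readByType(String partitionKey, String type) {
        List<Item> result = new ArrayList<>();
        for (Item item : readPartition(partitionKey)) {
            if (item.type.equals(type)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

The trade-off is that every query for a single entity type has to filter on the discriminator, since the container no longer implies the type.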

We’ve illustrated this in a container per tenant version of the sample above. As with the database sample, the classes for multi-tenant functionality are all in the tenant folder. When you add MultiTenantContainerCosmosFactory as a bean in your app configuration, entities will then be created and maintained in the same container, and you will still be able to use the Spring Repository APIs as before. However, unlike the database sample, this sample also captures a TenantTier HTTP header, to determine whether the tenant should use shared throughput or dedicated throughput. In the UserController and OrderController classes, you will see that entities are queried using custom Spring queries that filter by a “type” field to determine the entity (this is necessary since both entities are co-located in the same container). In the HomeController class, you will also see that both Order and User entities are retrieved from the same container, again using a custom query.
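The tier decision can be kept very small. The following sketch shows one possible mapping from the TenantTier header value to a throughput mode. The names ThroughputMode and TenantTierResolver, and the header value "premium", are assumptions made for illustration; the sample app may use different values and logic.

```java
import java.util.Locale;

// Hypothetical throughput choices for a tenant's container.
enum ThroughputMode { SHARED, DEDICATED }

// Hypothetical resolver (not from the sample app) for the TenantTier header.
final class TenantTierResolver {
    private TenantTierResolver() { }

    // "premium" is an assumed header value; anything else, including a
    // missing header, falls back to shared throughput.
    static ThroughputMode resolve(String tierHeader) {
        if (tierHeader == null) {
            return ThroughputMode.SHARED;
        }
        String tier = tierHeader.trim().toLowerCase(Locale.ROOT);
        return "premium".equals(tier) ? ThroughputMode.DEDICATED : ThroughputMode.SHARED;
    }
}
```

Defaulting to shared throughput keeps an unexpected or missing header from provisioning dedicated capacity by accident.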

The container per tenant model is a better approach if you need to balance the trade-off between performance isolation and tenant density and/or your tenant data model is simple and can benefit from entity co-location (or only has a single entity). But if you need performance isolation for every tenant and/or your data model is more complex, e.g. with different partition keys for each entity, the database per tenant model is probably a better fit.

Get Started with Java in Azure Cosmos DB

About Azure Cosmos DB

Azure Cosmos DB is a fast and scalable distributed NoSQL database, built for modern application development. Get guaranteed single-digit millisecond response times and 99.999-percent availability, backed by SLAs, automatic and instant scalability, and open-source APIs for MongoDB and Cassandra. Enjoy fast writes and reads anywhere in the world with turnkey data replication and multi-region writes.

Author

Theo van Kraay
Principal Program Manager

Principal Program Manager on the Azure Cosmos DB engineering team. Currently focused on AI, programmability, and developer experience for Azure Cosmos DB.
