NoSQL vs. relational: Which database should you use for your app?
Microsoft recently announced the introduction of Azure Cosmos DB for PostgreSQL, a distributed relational database, making Azure the first cloud platform to offer support for both relational and NoSQL (non-relational) data in a single database service. This means Azure Cosmos DB developers now have ultimate flexibility when building cloud-native apps: with using Azure Cosmos DB’s own NoSQL database as well as others like MongoDB and Apache Cassandra, or with relational data using increasingly popular PostgreSQL tools and extensions. Which one would you choose for your app?
Traditionally, the answer has been simple: developers would choose NoSQL databases if they needed massive scale for high throughput or very large database sizes (or both). Since relational databases have traditionally only scaled up, not out, they reach a limit beyond which they can’t handle anymore requests or data. So, developers have generally sought NoSQL databases for horizontal scaling. On the flip side, developers would choose relational databases over NoSQL systems if they needed ACID compliance and data accuracy for their applications.
With the introduction of distributed relational databases like Azure Cosmos DB for PostgreSQL, developers now have the best of both worlds: the scale-out abilities, flexibility, and performance of NoSQL databases, and the ACID compliance and rich SQL querying of relational databases.
Yet the question remains: when should you use which database system? This is a question we hear a lot from our readers. This blog post will dig into the answer by exploring some of the main factors developers should consider when building their apps.
Developer table stakes: Factors to consider when determining which database to use
There have been two types of database systems commonly used in cloud-native app development: relational and NoSQL.
Relational databases, or relational database management systems (RDBMS), store information in tables with rows and columns, with the tables having shared data attributes. This makes the data highly organized and structured with a fixed schema. The most common way to query the data is through SQL (Structured Query Language). As this language has been around for decades, the knowledge and tooling available for it is vast making it easier to obtain help and leverage existing integrations with specialized reporting and business software. Relational databases support Atomicity, Consistency, Isolation, and Durability (ACID) guarantees, which ensure the reliability of database transactions.
It’s worth mentioning that PostgreSQL, a relational database, behaves a bit differently. It has a native JSONB data type and the ability to query JSON documents and even build indexes against their complex structure. This often brings PostgreSQL on par with NoSQL systems when it comes to performance and the feature set of SQL storage.
NoSQL databases store data non-relationally, or without the tables, rows, and columns approach of relational databases. These types of databases are best for schema that needs to be flexible or may change in the future, and for cases when you need to scale horizontally. Rather than using table joins, NoSQL systems store unstructured or semi-structured data often in key-value pairs, JSON documents, key-value pairs in wide-column stores, graph structures and others. While NoSQL databases generally don’t support ACID transactions, at least to the same degree as relational databases, if you’re seeking to avoid up-front schema maintenance, have high volume services that require no downtime and have large data stores that are predicted to grow, NoSQL databases are often a good fit.
When to use which database system often depends on the use case and on a variety of considerations such as database structure, concurrency, queries, scalability, and migration.
|Consider NoSQL when:
|Consider relational when:
|Your data is semi-structured or unstructured, and workloads require predictable latency at a large scale
|Your data is structured, and operations require transactional consistency and enforced referential integrity
|Your workload volume is dynamic or unpredictable and can go from small to extremely large
|Your workload volume generally fits within thousands of transactions per second
|Relationships and aggregations can be de-normalized and materialized within the data model
|Your workload is comprised of joins and aggregations
|Your workload requires very high availability with tunable RPO or has geographically distributed users requiring low latency data access
|Your workload requires in-region or cross-region scalable reads with asynchronous latency
|You want to do exploratory programming – i.e. exploring the data structure for your application
|Your data structure is known, but you need to explore what your data queries will be (what ad-hoc questions will you ask of your system).
|You need to ingest unstructured data from many sources. Good solution when the database is the integration target (you have many tools that write).
|You want to plug into a database and generate reports/work directly with the data. Good solution when the database is the integration or synchronization source (you have many tools that read & write).
|You’re already using NoSQL databases such as MongoDB or Apache Cassandra
|Your existing application is built on a relational database and migrating to relational is simpler
In general, NoSQL databases are well-suited for storing large amounts of semi-structed or unstructured data, serving real-time data to users, and big data and analytics applications. Relational databases are a good fit for applications that rely on relationships between data, and need easy organization and retrieval of data, complex querying, and transaction support. It’s important to note that no developer’s workloads will fall exactly into one category or another in the table above; developers should typically opt for the platform where the benefits of one outweigh the tradeoffs of the other for their given workload.
Additionally, the use case examples provided below are not mutually exclusive. Either a NoSQL or relational database can be a good fit for building, for example, a health care or banking app but the advantages and considerations for each will differ. We look at some of these considerations below.
If your data is semi-structured or unstructured and your workloads require predictable latency at a large scale, you should consider a NoSQL database. Take for example, a personalization or recommendation engine that needs fast response times and predictable low latency as the customer base grows. A NoSQL database can handle different types of data at scale, from a range of sources, and offers high flexibility for continuous innovation and customer experience improvement.
If your data is structured and has a fixed schema, and your workloads require transactional consistency, a relational database is a good bet. Financial applications, for example, have highly structured data and require data integrity. Relational databases offer the ability to scale rapidly while maintaining ACID properties.
Consider NoSQL databases if you have workloads that are dynamic and unpredictable and can reach massive volumes of data, like IoT and device sensor applications. NoSQL databases can be a great match for these types of apps because they send huge amounts of data and often run 24/7. These apps benefit from the instant scalability and elasticity of NoSQL databases which are needed to handle write-heavy data ingestion. You may not need ACID guarantees, for example, for systems that store device read-outs but you need the performance of write multiplication from different devices writing to the database. For example, take a smart thermometer reading out temperatures every 5 minutes. Say you have a thousand thermometers but only a few hundred users looking at them – this means you have many more writes than reads, making a NoSQL database ideal.
On the flip side, let’s say you have a smaller number of writes, and many more reads, for example in a social media app where a popular person is posting something. It may be easier to support this on a relational database if you can divide your queries into read-only queries and write queries, where the writes can be handled by a single machine, allowing you to then scale reading without any limits and without losing ACID properties. Therefore, if you have workload volumes that generally fit within the thousands of transactions per second, like a social media app that has thousands of updates per second and tens of thousands of reads per second, a relational database can be a good fit. A cluster with scalable replicas can support that.
NoSQL databases are ideal for apps where data relationships and aggregations can be de-normalized and materialized. With de-normalization, fewer table joins are necessary, meaning you can execute queries and retrieve data faster. This makes NoSQL databases a strong choice for workloads requiring real-time analytics and that have large data sets, such as fraud detection apps. NoSQL databases are schema-agnostic, giving developers the flexibility to leverage data from multiple sources and to detect anomalies in real-time.
Apps that require data integrity, for example handling private health information, can benefit from relational databases, where workloads comprise of joins and aggregations. Normalization and defined constraints help enforce data integrity and security, and avoid data duplication. Relational databases are well-suited for these types of applications because they are based on relationships between different data items, which allows for complex querying and data analysis.
Consider a NoSQL database if your workloads need high availability and have geographically distributed users requiring low latency data access. For example, NoSQL systems are a good fit for online retail/e-commerce applications with large, constantly growing catalogs and inventory. Online fashion retailer ASOS uses Azure Cosmos DB to provide real-time recommendations and order updates for 15.4 million customers around the world.
Distributed relational databases
As mentioned in the opening of this article, distributed relational or distributed SQL databases are an emerging class of database systems that bring together the scale-out features of NoSQL systems and the ACID properties of relational databases. In Azure Cosmos DB, distributed PostgreSQL support allows businesses to process high volumes of data and scale out operational workloads to execute on a cluster on machines, with full ACID compliance.
One group of apps which can benefit from distributed PostgreSQL capabilities is multi-tenant SaaS. Digital natives who are building multi-tenant apps can scale out Postgres to millions of tenants, without having to rearchitect their application. They can benefit from features such as tenant isolation with high availability, optimizing for cost and performance as their app grows.
* * *
This blog post explored the differences between NoSQL and relational databases and looked at some key considerations developers should think about when deciding which database to use. We also examined use case examples that each type of database can be well-suited to. To learn more about Azure Cosmos DB’s NoSQL and relational capabilities, visit our website or check out our documentation. If you’re ready to start building, try Azure Cosmos DB for free.