Distributed PostgreSQL comes to Azure Cosmos DB
Today, we’re excited to announce Azure Cosmos DB for PostgreSQL, a new Generally Available service to build cloud-native relational applications. This service brings developers the latest PostgreSQL features, lets you start with a free trial, and lets you scale out your database as your workload grows.
With this announcement, Azure also becomes the first cloud provider to offer its own single database service that supports both relational and NoSQL workloads. You can now build cloud-native applications for relational and non-relational data using the familiar Azure Cosmos DB database.
This blog post provides a high-level overview of our service, powered by open source Citus and PostgreSQL, and shares some of its key features.
Azure Cosmos DB for PostgreSQL
Azure Cosmos DB for PostgreSQL is the first managed database that brings together a combination of three key properties:
- True PostgreSQL, with latest versions: We work with and contribute to open-source PostgreSQL. This way, you don’t get a partial API. You get the full familiarity and benefits of PostgreSQL, within two weeks of each release.
- Cloud database: Benefit from a broad range of managed database features – so that you don’t have to worry about your database again. For example, you can create high availability configurations across availability zones, fork or restore your cluster to a particular point in time, or one-click upgrade your PostgreSQL & database extensions.
- Start small, scale globally—powered by Citus: Start testing your apps with a Free Trial. As your workload grows, scale it out by enabling distributed tables, powered by the Citus open source extension to PostgreSQL. This way, we’ll take care of relational features at scale – distributed transactions, deadlocks, foreign keys, and more – for you. If you need to go global, enable cross-region replication for lower latency & global availability.
If you’re just starting with Azure Cosmos DB, let’s see how you can benefit from these three properties with a few examples.
The Azure Cosmos DB free trial is the easiest way to get started in building your cloud-native app. With this feature, you get all the native capabilities that come with PostgreSQL, including rich JSON support, powerful indexing, extensive datatypes, full text search, and much more. Furthermore, as PostgreSQL releases new versions, we make those versions available to you within two weeks. This way, you can benefit from the latest features in PostgreSQL without delays.
Of course, the free trial is enough to get started, but you’ll need more as your application matures. When this happens, you can click the Upgrade button to enable a broad range of new features.
Your cloud-native database
Upgrading from the free trial, or creating a new database for PostgreSQL, gives you many new capabilities. Example features include:
- High Availability across Availability Zones (AZ)
- Automatic backup/restore & ability to rewind to a particular point-in-time
- One-click upgrade to latest PostgreSQL & extension versions
- Scale up/down your CPU and storage resources
- Encryption at rest and private endpoints
- Compliance with global and local certifications across 30 Azure regions
- Global distribution across Azure regions to tolerate regional failures
- And more
With these features, you get managed database capabilities native to the cloud. We also provide cloud integrations so that you have an easier time building on Azure.
Azure cloud integrations
Another key feature of a cloud-native database is how well it integrates with the rest of the cloud. Prior to today, if you had data in Azure Blob Storage, you’d need to download that data to another VM and then upload it to your database. This introduced unnecessary friction when you were building your application.
Starting now, you can directly interface with Azure Blob Storage through a brand-new PostgreSQL extension, pg_azure_storage. After connecting to your PostgreSQL database, you just need to run the following commands:
SELECT create_extension('azure_storage');
SELECT azure_storage.account_add('mystorageaccount', 'SECRET_ACCESS_KEY');

CREATE TABLE github_events (
    event_id bigint,
    event_type text,
    event_public boolean,
    repo_id bigint,
    payload jsonb,
    repo jsonb,
    user_id bigint,
    org jsonb,
    created_at timestamp
);

COPY github_events
FROM 'https://mystorageaccount.blob.core.windows.net/data/github_events.csv'
WITH (format 'csv');
With pg_azure_storage, you can also make modifications as you’re ingesting data from Azure Blob Storage using user-defined functions. For a detailed list of all features, you can refer to our documentation here.
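As a rough sketch of modifying rows during ingestion, the query below reads the same blob through `azure_storage.blob_get` and normalizes one column before inserting. It assumes the blob lives in a container named `data` (inferred from the URL above) and that `blob_get` accepts a record type as its fourth argument, as described in the extension's documentation at the time of writing; check the signature against the version installed on your cluster. The `lower(...)` call is only an illustrative transformation, and a user-defined function could take its place.

```sql
-- Sketch only: assumes azure_storage.blob_get(account, container, path, rec)
-- is available and that github_events already exists as defined above.
INSERT INTO github_events
SELECT event_id,
       lower(event_type) AS event_type,  -- illustrative transformation
       event_public, repo_id, payload, repo, user_id, org, created_at
FROM azure_storage.blob_get(
         'mystorageaccount',      -- storage account added via account_add
         'data',                  -- container name (assumed from the URL)
         'github_events.csv',     -- blob path inside the container
         NULL::github_events      -- row shape used to decode each line
     );
```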
With these cloud-native capabilities, you can build your application with ease. And better yet, you can build your applications ready for running at any scale. For this, our service for PostgreSQL has the Citus extension built-in and allows you to scale out your database without limitations. Packaged as a fully open-source extension, Citus extends PostgreSQL with the power of distributed tables, enabling distributed query execution and performance at scale. Citus does this while preserving true PostgreSQL at its core, with support for JSONB, geospatial, rich indexing, relational semantics, and more.
Postgres with the power of distributed tables
With our service for PostgreSQL, you can start building your apps on a single node server group, the same way you would with PostgreSQL. As your app’s scalability and performance requirements grow, you can enable distributed tables and seamlessly scale to multiple nodes.
Azure Cosmos DB makes this transition – enabling distributed tables – easy. Previously, if you wanted to use the Citus extension to create a distributed table, you’d first have to pick a sharding key. You’d then have to run a command that would block write operations. With Citus 11.1, creating a distributed table, along with many operations that previously blocked writes, becomes fully online.
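For illustration, here is what distributing the `github_events` table from the earlier example might look like. The choice of `user_id` as the distribution column is an assumption made for this sketch; pick the column that matches your own access patterns. `create_distributed_table` is the long-standing Citus function, and `create_distributed_table_concurrently` is the online variant added in Citus 11.1. Run one or the other, not both.

```sql
-- Blocking variant: writes to the table wait until distribution finishes.
-- SELECT create_distributed_table('github_events', 'user_id');

-- Online variant (Citus 11.1+): the table stays writable while it is sharded.
-- 'user_id' is an assumed distribution column, chosen for this example only.
SELECT create_distributed_table_concurrently('github_events', 'user_id');
```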
If you want to see how this online operation works, you’ll love this 1-minute video.
Once you create a distributed table, Citus takes care of the rest. Example features include:
- Distributed transactions & distributed deadlock detection
- Automatic colocation groups that allow you to enforce foreign keys, constraints, and easily join your data without costly repartition operations
- Distributed query processing, where computations are shipped to the data
- Distributed utility commands, such as index creation, Vacuum / Analyze
- Ability to read from and write to any one of the nodes in the cluster
- Online shard rebalancing & isolating noisy tenants / shards
- And more
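A minimal sketch of a few of these features, using a hypothetical multi-tenant schema: the table names (`accounts`, `account_events`) and the `account_id` distribution column are made up for illustration. It shows colocation, a foreign key between colocated tables, a transaction that spans tenants, a join on the distribution column, and shard rebalancing.

```sql
-- Hypothetical multi-tenant schema; table and column names are illustrative.
CREATE TABLE accounts (
    account_id bigint PRIMARY KEY,
    name       text
);

CREATE TABLE account_events (
    account_id bigint,
    event_id   bigint,
    payload    jsonb,
    PRIMARY KEY (account_id, event_id)
);

-- Distribute both tables on the same column so matching rows share a node.
SELECT create_distributed_table('accounts', 'account_id');
SELECT create_distributed_table('account_events', 'account_id',
                                colocate_with => 'accounts');

-- Colocation lets the foreign key be enforced locally on each node.
ALTER TABLE account_events
    ADD CONSTRAINT account_events_account_fk
    FOREIGN KEY (account_id) REFERENCES accounts (account_id);

-- One transaction writing rows for two tenants, which may live on
-- different nodes; Citus coordinates the commit across them.
BEGIN;
INSERT INTO accounts VALUES (1, 'first'), (2, 'second');
INSERT INTO account_events VALUES (1, 1, '{}'), (2, 1, '{}');
COMMIT;

-- Join on the distribution column: no repartitioning is needed.
SELECT a.name, count(*) AS events
FROM accounts a
JOIN account_events e USING (account_id)
GROUP BY a.name;

-- After adding nodes to the cluster, spread existing shards across them.
SELECT rebalance_table_shards();
```

Because the foreign key and the join both use the distribution column, each node can resolve them against its own shards.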
Using these features, you can scale out many types of applications. Real-world customer applications built this way include multi-tenant SaaS, real-time operational analytics, and high throughput transactional apps. These apps span various verticals such as sales & marketing automation, healthcare, IoT/telemetry, asset tracking & logistics, finance, and search.
Performance at scale
The primary benefit of scaling is performance. Since users run PostgreSQL across many workloads, we use various benchmarks in testing our service’s performance. For these workloads and their respective benchmarks, we shared a detailed description here.
Among these benchmarks, HammerDB is an open-source one that implements the TPC-C specification. HammerDB also provides benchmark implementations for many different databases, including Citus. This makes it easy to compare results across different database engines.
For our tests, we first thought about running HammerDB against a custom hardware config with the goal of showing high performance results. However, we then decided to test our service’s performance in an easily repeatable way, using the exact same setup and features you’d get in production.
So, we open sourced a benchmarking tool that provisions a production cluster in Azure using our service for PostgreSQL. Once you have the benchmark and an Azure subscription, all you need to do is run this simple command:
# IMPORTANT NOTE: Running this command will provision 4 new Citus clusters
# and 4 times a 64-vCore driver VM in your Azure subscription. So, running
# the following command will cost you (or your employer) money!
azure/bulk-run.sh azure/how-to-benchmark-blog.runs | tee -a results.csv
When we ran HammerDB’s TPC-C implementation on a Citus cluster of 20 nodes, we saw results exceeding 2.0 million NOPM. Even more exciting, this result didn’t come with a custom setup, but rather with our regular managed service and all its available features.
You can read more about our 2M NOPM HammerDB results here.
Globally distributed database
Another key benefit to Azure Cosmos DB is global availability. With Azure Cosmos DB, you can create clusters spanning across regions and have your application query the database across those regions. We aspire to bring you the same benefits with our service for PostgreSQL.
Starting today, you can create read replicas for PostgreSQL in any supported region. You can also promote a replica to an independent server group that is readable and writable. Cross-region read replicas, along with cluster promotion, bring you the following benefits:
- Low latency reads: For geo-distributed applications, you can serve reads from the same or nearest region
- Disaster recovery: If you’re observing a regional outage that covers multiple Availability Zones, you can fail over to another region by promoting the replica in that region
- Migrating to another region: If you want to move to another region, you can create a replica in the new region, wait for the data to catch up, and then promote the replica
Start your journey towards a globally distributed PostgreSQL database
Today, we’re excited to announce General Availability for Azure Cosmos DB for PostgreSQL. With this service, you can now start your journey in building cloud-native applications using our Free Trial. You can then move on to a feature-rich managed database, natively integrate with other Azure cloud services, scale out your database as your workload grows, and globally distribute your database across regions.
Thanks to these features, you can focus on your application and stop worrying about your database.
If this sounds interesting, you can spin up a new instance using our Try Azure Cosmos DB Free trial today. If you’re further along in your journey and need access to all features, you can create a small instance through the Azure Portal instead.
Of course, if you have questions or comments in your journey, we’d be happy to hear from you. Please feel free to reach us anytime.
Postgres, PostgreSQL and the Slonik Logo are trademarks or registered trademarks of the PostgreSQL Community Association of Canada, and used with their permission.