Enabling PostgreSQL 15 in Azure Cosmos DB for PostgreSQL – how we did it
As you may have heard, we recently made PostgreSQL 15 generally available in Azure Cosmos DB for PostgreSQL within just 1 week of the PostgreSQL 15 release. The Postgres 15 version is available for you whether you need to create a new cluster in Azure Cosmos DB for PostgreSQL, or upgrade your existing cluster. (Note: you can do in-place major version upgrades in Azure Cosmos DB for PostgreSQL.) And the PostgreSQL 15 support is available in all Azure regions that support Azure Cosmos DB for PostgreSQL. You may be surprised since it’s usually not the norm for a managed database service to start supporting the new major PostgreSQL version that early… This post will walk you through what’s going on behind the scenes that enables us to do such a feat.
Some background before diving in:
Azure Cosmos DB for PostgreSQL is powered by native Postgres and Citus open source—and enables you to run PostgreSQL at any scale, from a single node to a large, distributed cluster. Customers can also scale out as much as they want depending on their needs with many additional features. The Hyperscale (Citus) managed service recently moved into Azure Cosmos DB family (more info on the launch of Azure Cosmos DB for PostgreSQL in this blog post) and with that introduced try Azure Cosmos DB for PostgreSQL for free where you can try out PostgreSQL 15 with Citus 11.1.
Figure 1: Screenshot of the the Quick start page for Azure Cosmos DB for PostgreSQL in the Azure portal. You can try out PostgreSQL 15 with Citus 11.1 in Azure Cosmos DB for PostgreSQL via free trial now.
In this blog post, you will learn about how these 3 components in Azure Cosmos DB for PostgreSQL were integrated with PostgreSQL 15 and how we did it in such a short amount of time:
Our service architecture has stayed mostly the same since Ozgun Erdogan’s How We Shipped PostgreSQL 14 on Azure Within One Day of its Release – Microsoft Community Hub post last year. In the “Releasing a new PostgreSQL version” section of Ozgun’s blog post, there is a good starting point for understanding what we have behind the scene and some of our best practices like testing and Safe Deployment Practice/Policy (SDP). Quoting the following bit to give a brief summary of the design:
“In our architecture, the control plane is responsible for the business logic for managing Postgres/Citus databases. This logic includes periodic health checks, high availability and failover, backup and restore, read replicas, regular maintenance operations, and others. The data plane is solely responsible for running the database. As such, the data plane contains almost nothing else other than stock PostgreSQL and its extensions”.
More info regarding the Safe Deployment Practices can be found here. Usual deployment cycle looks like the general definition made there which is:
Figure 2: Azure Safe Deployment Practices general diagram. Each phase of deployment contains automated health checks and rollout moves to the next phase only if all of the regions inside the previous phase’s completed successfully.
The Azure Cosmos DB for PostgreSQL service team also made the call to change deployment scheduling from bi-weekly to weekly recently, in order to be more agile and to make the cross-team collaborations easier—with the end goal of delivering PostgreSQL 15 within the service sooner.
Active Monitoring Infrastructure Ensures Compatibility
On top of having 100% unit test code coverage and E2E test pipeline (using Azure Pipelines) that serve as a gatekeeper for our codebase, we also have active monitoring in place that continuously runs against the managed Azure Cosmos DB for PostgreSQL service directly in each of the Azure regions we support. The idea is to execute various E2E scenarios in production environment by only using service endpoints via Azure Resource Manager (ARM).
Each job also produces telemetry (logs and metrics) and we have internal monitors in place where it would generate incident tickets for our on-call engineer whenever it detects an issue. This setup comes in handy in multiple places:
- It’s integrated in our deployment process where we wait for active monitoring to give a green light for each region before we can conclude deployment officially finished.
- Its alerting system also works as overall system health per region during non-deployment times.
- Finally, we used it as preview testing for PostgreSQL 15 which I will go into more detail in a bit.
As you can imagine, having this many layers of “testing” on top of having a solid architecture with separation of concerns and good coding practices gives us more confidence and makes it easier to introduce important changes to our service in a short time.
Getting Beta and Release Candidate Versions of PostgreSQL 15 Ready for Preview
For this major release, we decided to prepare and deploy beta and release candidate versions of PostgreSQL 15 internally, so that the integration between Azure Portal, the Citus database extension, and Azure Cosmos DB for PostgreSQL service teams would go smoothly during the final phase when we got closer to the actual release.
There were 3 steps that needed to be done before we shipped an internal preview version for internal QA purposes:
- Fixing any breaking changes introduced by PG15. The first one was cleaning up any usage of non-existing GUCs from our service since starting with PG15 doing that would give an error. The second part was removal of long-deprecated exclusive backup mode.
- Temporarily excluding extensions that didn’t have PG15 supported versions available yet.
- Creating an internal Azure Feature Exposure Control (AFEC) flag and allowing PG15 interaction only to subscriptions that have it enabled. Will give more details about AFEC under Azure Portal section.
Azure Portal is one of the ways customers can interact with our service backend as seen in the figure below and can mainly be considered as our service’s front. The way Azure Portal communicates with our service is via ARM (Azure Resource Manager). ARM acts as management layer that also handles authentication/security of incoming requests
Figure 3: Azure Resource Manager as management layer which acts as a bridge between user interactable layers like Azure Portal, Azure CLI etc. and Azure Services.
In our case, if you tried to create a new Azure Cosmos DB for PostgreSQL cluster on (quick start on how), Azure Portal then would create ARM template with all the information you specified converted to suitable syntax. ARM then would redirect the request to that region’s control plane. Upsert requests are defined in async nature and tracked by their “request id”. Azure Portal continuously checks the status of the async operation to give users visual status update.
Going back to the topic of enabling PostgreSQL 15 support, we had two goals on the Azure Portal front
- Supporting beta and release candidate versions of PG15 for internal testing; meaning we wanted to have certain internal development subscriptions to have access to it both from resource creation UI and in-place upgrade blade in Azure Portal. This is where the Azure Feature Exposure Control (AFEC) came in handy.We created a new internal feature for preview purposes, registered internal test subscriptions to that feature, then had a logic deployed to Control Plane that enabled PG15 preview version depending on the feature state. In the end PG15 option only showed up in Azure Portal resource creation page for those internal subscriptions.
- Having the ability to make PG15 publicly available at the same time in all regions at the time we wanted. This is important for multiple reasons; due to nature of our SDP deployment process, new deployment payload would be finished on some regions earlier than others (in some cases even multiple days earlier) and we also wanted to do validations in each region after the deployment. This was done in a similar feature flag way where after we gave the final confirmation Portal Team enabled PG15 for every subscription
Azure Cosmos DB for PostgreSQL is mainly powered by Citus extension but we also provide a lot of extensions, some enabled by default and some set as optional. Optional ones can be enabled from the database itself by running the following after connecting with admin user (here postgis can be changed with any other):
Some of these extensions would require fixes in order to support the new major PostgreSQL 15 version depending on the breaking changes; which would mean getting our hands on the packages for the newest versions of these Postgres extensions as early as possible.
The work here was divided into two parts.
- Postgres open-source extensions that we—the Postgres team in Azure—maintain: Citus, pg_cron, HLL (HyperLogLog), and TopN.
- Other important open-source Postgres extensions like PostGIS and pg_partman, that we provide as a part of our service that had to be updated to support PG15 that were updated by other committers (as opposed to committers in our own team) and we had to work with them on timing and getting (public) packages as soon as humanly possible.
Making the Citus database extension Compatible with PostgreSQL 15
Extensions other than Citus in this area were very easily updated to support PG15 (usually within a day of effort) and new versions were tagged and released. This means most of the Citus Engine team’s effort went into updating Citus open source. Enabling the new PostgreSQL major version in Citus extension can be divided into 3 parts.
- Fixing any breaking changes introduced so that extensions can be compiled against the new PG version. There were multiple pre-release versions released during PG15’s development period; ranging from Beta 1 to Beta 4, then Release Candidate 1 and 2. We iterated over this step for every new pre-release version so that we would be ready for the PG15 GA packages as soon as possible. The transition from Beta3 to Beta4 was especially interesting since instead of usual bug fixes and minor additions, one of the changes was: “The SQL/JSON features proposed for this release have been removed”. Some of the other extensions like PostGIS also needed to release a new version to support Beta 4. The Citus database engine team released the Citus 11.1 open source release with PG 15 Beta 4 support (blog post here).
- Going over new PG features and add support for them within our extension’s source code. You can look at this issue If you want to learn about Citus support for all PG15 features in detail. MERGE command, CLUSTER command on partitioned distributed tables and ALTER TABLE .. SET ACCESS METHOD command features are not being supported at the moment due to current limitations in Citus but we are working on integrating them into Citus with future releases.
- Simply doing a lot of testing. These range from Continuous Integration (CI) tests to benchmarks and release tests with variety of tools like Valgrind, SQLancer etc. Last year’s blog post about how we shipped PostgreSQL 14 on Azure within one day of its release has a section called “Making Citus & other extensions compatible with PostgreSQL 14” that goes into more detail if you want to learn more.This time for PG15, we also decided to generate nightly packages during development which would mean we had the ability to test the latest state on the field in our service under the umbrella of “internal preview version release” I mentioned before. Doing that gave us another confidence boost and helped iron out some issues earlier otherwise would be found much later the development cycle.
Testing with new versions of other key extensions, including PostGIS
The data plane for the Azure Cosmos DB for PostgreSQL service currently uses CentOS on the server VMs, and most of the extension package management is done by using YUM. External packages are managed by and directly consumed from Public PostgreSQL Yum Repository. Some of the PostgreSQL 15 compatible rpm packages for the extensions were shipped a few days after the PostgreSQL 15 release thus we waited for all of them to be ready. It’s also important to consume and install necessary debug info packages for service management.
How everything came together after PostgreSQL 15 GA
PostgreSQL 15 was officially released on October 13, Thursday. At that point we had already finished and deployed most of the necessary changes in our service code. The remaining bits were disabling the AFEC feature check and adding the final extension versions to the service. It turned out the final PG15 GA version had some breaking changes for Citus extension, so the Citus database engine team started working on an update. On the next day, the Citus engine team announced the public release of version 11.1.3 which had PG15 GA support. At a similar time towards Friday evening, the latest PostGIS package that supported PG15 was also shipped to PostgreSQL Yum Repository.
We started the safe deployment process on Monday which finished on Wednesday for every region. After finishing the final tests/validations and doing necessary cleanups for some of our existing customer clusters so they can upgrade to PG15 whenever they want, we enabled the feature on Azure Portal for everyone on October 20 Thursday. Following that was the announcement post on the Azure Cosmos DB Blog on Friday October 21st.
PostgreSQL 15 is Available for New Clusters and In-Place Upgrades
To sum things up: we made Postgres 15 generally available in Azure Cosmos DB for PostgreSQL within a week of the PG 15 GA—and it’s super easy to get a PG15 cluster or upgrade an existing one. If you want to scale out PostgreSQL 15 on cloud, you can create a new Azure Cosmos DB for PostgreSQL via Azure Portal.
Figure 4: PostgreSQL 15 is now available as an option in Azure Cosmos DB for PostgreSQL cluster creation
Existing Azure Cosmos DB for PostgreSQL customers also can upgrade older PG versions in place via the in-place upgrade option in the Azure Portal as well. With minimum downtime, you can upgrade your existing clusters from any supported major version to any supported major version in a single operation. Find out more in the docs for in-place major version upgrade support.
Please let us know what you think via email at Ask Azure Cosmos DB for PostgreSQL.
New to Azure Cosmos DB?
Azure Cosmos DB is a fast, distributed NoSQL and relational database built for applications of any size or scale.
- Use the free trial for Azure Cosmos DB for PostgreSQL.
- Watch this Microsoft Mechanics intro to Azure Cosmos DB for PostgreSQL