{"id":6201,"date":"2023-05-19T09:20:28","date_gmt":"2023-05-19T16:20:28","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/cosmosdb\/?p=6201"},"modified":"2023-05-19T09:20:28","modified_gmt":"2023-05-19T16:20:28","slug":"auto-scaling-azure-cosmos-db-for-postgresql-with-citus-grafana-azure-functions","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cosmosdb\/auto-scaling-azure-cosmos-db-for-postgresql-with-citus-grafana-azure-functions\/","title":{"rendered":"Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, &#038; Azure Functions"},"content":{"rendered":"<p class=\"pg-section section-active\">One of the top Citus features is the ability to run PostgreSQL at any scale, on a single node as well as a distributed database cluster.<\/p>\n<p class=\"pg-section section-active\">As your application needs to scale, you can add more nodes to the Citus cluster, rebalance existing data to the new Postgres nodes, and seamlessly scale out. However, these operations require manual intervention: a) first you must create alerts on metrics, b) then, based on those alerts, you need to add more nodes, c) then you must kick off and monitor the shard rebalancer. Automating these steps will give you a complete auto scale experience\u2014and make your life so much easier.<\/p>\n<p class=\"pg-section section-active\">In this blog post, you will learn how to build a full-fledged auto scaling setup for the Citus database extension running as a managed service on Azure\u2014called\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/postgresql\/introduction\">Azure Cosmos DB for PostgreSQL<\/a>. You\u2019ll also learn how you can easily add nodes to the Azure Cosmos DB for PostgreSQL cluster and use any metrics available to trigger actions in your cluster! Let\u2019s dive into the following chapters:<\/p>\n<ul class=\"pg-section section-active\">\n<li><a href=\"#overview\">Overview of the components for Auto Scaling<\/a><\/li>\n<li><a href=\"#creating-populating-cluster\">Step 1: Creating &amp; populating the cluster<\/a><\/li>\n<li><a href=\"#scale-out-cluster\">Step 2: Use Azure Functions &amp; ARM templates to scale out your Citus cluster<\/a><\/li>\n<li><a href=\"#configuring-grafana\">Step 3: Configuring Grafana data sources<\/a><\/li>\n<li><a href=\"#creating-alerts\">Step 4: Creating Grafana alerts<\/a><\/li>\n<li><a href=\"#testing-setup\">Step 5: Testing the auto scaling setup<\/a><\/li>\n<li><a href=\"#monitor-tenants\">Using Grafana &amp; Citus 11.3 to monitor tenants<\/a><\/li>\n<li><a href=\"#key-takeaways\">Key takeaways for auto scaling Azure Cosmos DB for PostgreSQL<\/a><\/li>\n<\/ul>\n<p class=\"pg-section\"><strong>You can also see a demo of the auto scaling setup in action in a talk I gave recently at Citus Con:<\/strong><\/p>\n<figure>\n<div class=\"youtube-video-place video-wrapper\" tabindex=\"0\" role=\"button\" aria-label=\"Click for video\" data-yt-title=\"Azure Cosmos DB for PostgreSQL with Citus, Grafana, and Azure Serverless\" data-yt-url=\"https:\/\/www.youtube-nocookie.com\/embed\/YGdqbbQ-dlw?start=912&amp;enablejsapi=1&amp;rel=0&amp;autoplay=1\" data-yt-id=\"YGdqbbQ-dlw\">\n<div class=\"play-youtube-video\">\n<picture><source srcset=\"..\/..\/..\/..\/..\/assets\/images\/blog\/cituscon23-odsession-lucas.webp\" type=\"image\/webp\" \/><\/picture><iframe src=\"\/\/www.youtube.com\/embed\/YGdqbbQ-dlw\" width=\"560\" height=\"314\" allowfullscreen=\"allowfullscreen\" data-mce-fragment=\"1\"><\/iframe><\/p>\n<div class=\"overlay\" aria-hidden=\"true\"><\/div>\n<\/div>\n<\/div><figcaption><strong>Figure 1:<\/strong>\u00a0Video of the conference talk I gave at Citus Con: An Event for Postgres 2023, titled \u201cAuto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, &amp; Azure Serverless\u201d<\/figcaption><\/figure>\n<h2 id=\"overview\"><a id=\"overview\"><\/a>Overview of the components for Auto Scaling<\/h2>\n<p class=\"pg-section\">The auto scaling architecture this blog proposes combines multiple components that could be either managed in the cloud or on-premises. These components are:<\/p>\n<ul class=\"pg-section\">\n<li><strong>Azure Functions<\/strong><\/li>\n<li><strong>Azure Resource Manager (ARM)<\/strong><\/li>\n<li><strong>Grafana<\/strong><\/li>\n<li><strong>Azure Cosmos DB for PostgreSQL (aka Citus on Azure)<\/strong><\/li>\n<\/ul>\n<p class=\"pg-section\">For the examples presented in this blog post, you will use the managed version of Citus on Azure, by using the Azure Cosmos DB for PostgreSQL service. The same architecture and steps outlined in this blog post will work for a\u00a0<a href=\"https:\/\/github.com\/citusdata\/citus\">Citus open source<\/a>\u00a0cluster, with the only change being the API used to trigger cluster modifications (e.g., add new nodes, scale up compute power, etc.). In the Azure example in this blog, cluster modifications are done through the Azure Resource Manager (ARM) APIs.<\/p>\n<p class=\"pg-section\">First let\u2019s walk through each of these components. Feel free to skip ahead to\u00a0<a href=\"#creating-populating-cluster\">Step 1: Creating &amp; populating the cluster<\/a>\u00a0if you are already familiar with all of the components.<\/p>\n<h3>What are Azure Functions?<\/h3>\n<p class=\"pg-section\"><a href=\"https:\/\/learn.microsoft.com\/azure\/azure-functions\/functions-overview\">Azure Functions<\/a>\u00a0is a serverless solution offered by Microsoft that allows you to deploy your applications without having to worry about servers. (I sometimes refer to it conversationally as Azure Serverless.) The cloud infrastructure is all managed by Azure and you only have to focus on the code that matters to you. It is also a great way to save costs and run workloads that don\u2019t require servers up and running 24\/7.<\/p>\n<p class=\"pg-section\">These functions can be created in many languages, and Azure also provides a CLI to create boilerplates and deploy these functions in the cloud. See the following link for more details:\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/azure-functions\/functions-run-local\">Azure Functions Core Tools<\/a><\/p>\n<p class=\"pg-section\">After deploying the functions, you can use REST APIs to trigger them. It\u2019s important to mention that a function may have a maximum execution time, so it\u2019s advisable to not block them on I\/O or other external integrations. You should consider an asynchronous model when possible.<\/p>\n<h3>What is Azure Resource Manager (ARM)?<\/h3>\n<p class=\"pg-section\"><a href=\"https:\/\/learn.microsoft.com\/azure\/azure-resource-manager\/management\/overview\">Azure Resource Manager<\/a>\u00a0(also known as ARM) is a deployment and management service for Azure. ARM can be used to interact with the Azure cloud to create, update, and delete resources.<\/p>\n<p class=\"pg-section\">ARM uses templates that provide a declarative way to define your resources, and you can apply these through Azure APIs such as REST clients, SDKs, CLI and even in the Azure portal itself.<\/p>\n<h3>What is Grafana?<\/h3>\n<p class=\"pg-section\">Grafana is an amazing open source tool that helps improve software observability with operational dashboards. It provides analytics and interactive visualizations for data from many sources, including PostgreSQL and Azure Monitor. These visualizations can exist in the form of tables, graphs and even alerts to external applications. One common usage of Grafana is to monitor application health and create alert triggers based on certain events, such as a high CPU usage spike, for example.<\/p>\n<p class=\"pg-section\">Grafana can be installed on-premises but can also be found in the form of a managed service in the cloud, such as the\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/managed-grafana\/overview\">Azure Managed Grafana<\/a>.<\/p>\n<h2 id=\"creating-populating-cluster\"><a id=\"creating-populating-cluster\"><\/a>Step 1: Creating &amp; populating the cluster<\/h2>\n<div class=\"highlight\">\n<pre class=\"highlight sql\"><\/pre>\n<p class=\"pg-section\">The first step is to create a Citus database cluster. For this project you will use the Azure Cosmos DB for PostgreSQL managed service, which is powered by the Citus extension. You can think of it as Citus on Azure. The cluster specification used in this example is:<\/p>\n<ul class=\"pg-section\">\n<li><strong>Coordinator:<\/strong>\u00a032vCores \/ 128 GiB RAM<\/li>\n<li><strong>Workers:<\/strong>\u00a016vCores \/ 128 GiB RAM<\/li>\n<li><strong>PostgreSQL v15 \/ Citus 11.3<\/strong><\/li>\n<\/ul>\n<p class=\"pg-section\">You can find instructions on how to create an Azure Cosmos DB for PostgreSQL cluster in the following link:\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/postgresql\/quickstart-create-portal\">Create an Azure Cosmos DB for PostgreSQL cluster<\/a>.<\/p>\n<p class=\"pg-section\">Once you have provisioned the database cluster, connect to it and execute the following statements:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight sql\"><code><span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">companies<\/span> <span class=\"p\">(<\/span>\r\n  <span class=\"n\">id<\/span> <span class=\"n\">bigserial<\/span> <span class=\"k\">PRIMARY<\/span> <span class=\"k\">KEY<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">name<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">image_url<\/span> <span class=\"nb\">text<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">created_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">updated_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span>\r\n<span class=\"p\">);<\/span>\r\n\r\n<span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">campaigns<\/span> <span class=\"p\">(<\/span>\r\n  <span class=\"n\">id<\/span> <span class=\"n\">bigserial<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">company_id<\/span> <span class=\"nb\">bigint<\/span> <span class=\"k\">REFERENCES<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">companies<\/span> <span class=\"p\">(<\/span><span class=\"n\">id<\/span><span class=\"p\">),<\/span>\r\n  <span class=\"n\">name<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">cost_model<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"k\">state<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">monthly_budget<\/span> <span class=\"nb\">bigint<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">blacklisted_site_urls<\/span> <span class=\"nb\">text<\/span><span class=\"p\">[],<\/span>\r\n  <span class=\"n\">created_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">updated_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"k\">PRIMARY<\/span> <span class=\"k\">KEY<\/span> <span class=\"p\">(<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span> <span class=\"n\">id<\/span><span class=\"p\">)<\/span>\r\n<span class=\"p\">);<\/span>\r\n\r\n<span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">ads<\/span> <span class=\"p\">(<\/span>\r\n  <span class=\"n\">id<\/span> <span class=\"n\">bigserial<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">company_id<\/span> <span class=\"nb\">bigint<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">campaign_id<\/span> <span class=\"nb\">bigint<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">name<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">image_url<\/span> <span class=\"nb\">text<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">target_url<\/span> <span class=\"nb\">text<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">impressions_count<\/span> <span class=\"nb\">bigint<\/span> <span class=\"k\">DEFAULT<\/span> <span class=\"mi\">0<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">clicks_count<\/span> <span class=\"nb\">bigint<\/span> <span class=\"k\">DEFAULT<\/span> <span class=\"mi\">0<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">created_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">updated_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"k\">PRIMARY<\/span> <span class=\"k\">KEY<\/span> <span class=\"p\">(<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span> <span class=\"n\">id<\/span><span class=\"p\">),<\/span>\r\n  <span class=\"k\">FOREIGN<\/span> <span class=\"k\">KEY<\/span> <span class=\"p\">(<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span> <span class=\"n\">campaign_id<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"k\">REFERENCES<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">campaigns<\/span> <span class=\"p\">(<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span> <span class=\"n\">id<\/span><span class=\"p\">)<\/span>\r\n<span class=\"p\">);<\/span>\r\n\r\n<span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">clicks<\/span> <span class=\"p\">(<\/span>\r\n  <span class=\"n\">id<\/span> <span class=\"n\">bigserial<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">company_id<\/span> <span class=\"nb\">bigint<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">ad_id<\/span> <span class=\"nb\">bigint<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">clicked_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">site_url<\/span> <span class=\"nb\">text<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">cost_per_click_usd<\/span> <span class=\"nb\">numeric<\/span><span class=\"p\">(<\/span><span class=\"mi\">20<\/span><span class=\"p\">,<\/span><span class=\"mi\">10<\/span><span class=\"p\">),<\/span>\r\n  <span class=\"n\">user_ip<\/span> <span class=\"n\">inet<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">user_data<\/span> <span class=\"n\">jsonb<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"k\">PRIMARY<\/span> <span class=\"k\">KEY<\/span> <span class=\"p\">(<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span> <span class=\"n\">id<\/span><span class=\"p\">)<\/span>\r\n<span class=\"p\">);<\/span>\r\n\r\n<span class=\"k\">CREATE<\/span> <span class=\"k\">TABLE<\/span> <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"nv\">`worker_count`<\/span><span class=\"p\">(<\/span>\r\n  <span class=\"n\">id<\/span> <span class=\"n\">bigserial<\/span> <span class=\"k\">PRIMARY<\/span> <span class=\"k\">KEY<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"n\">created_at<\/span> <span class=\"nb\">timestamp<\/span> <span class=\"k\">with<\/span> <span class=\"nb\">time<\/span> <span class=\"k\">zone<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">NULL<\/span><span class=\"p\">,<\/span>\r\n  <span class=\"k\">count<\/span> <span class=\"nb\">bigint<\/span>\r\n<span class=\"p\">);<\/span>\r\n\r\n<span class=\"k\">SELECT<\/span> <span class=\"n\">cron<\/span><span class=\"p\">.<\/span><span class=\"n\">schedule<\/span><span class=\"p\">(<\/span>\r\n    <span class=\"s1\">'worker-count'<\/span><span class=\"p\">,<\/span>\r\n    <span class=\"s1\">'5 seconds'<\/span><span class=\"p\">,<\/span>\r\n    <span class=\"err\">$$<\/span>\r\n        <span class=\"k\">INSERT<\/span> <span class=\"k\">INTO<\/span> <span class=\"nv\">`worker_count`<\/span><span class=\"p\">(<\/span><span class=\"n\">created_at<\/span><span class=\"p\">,<\/span> <span class=\"k\">count<\/span><span class=\"p\">)<\/span>\r\n        <span class=\"k\">SELECT<\/span>\r\n            <span class=\"n\">now<\/span><span class=\"p\">(),<\/span>\r\n            <span class=\"p\">(<\/span><span class=\"k\">select<\/span>\r\n                <span class=\"k\">count<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)<\/span> <span class=\"k\">as<\/span> <span class=\"nv\">`worker_count`<\/span>\r\n            <span class=\"k\">from<\/span>\r\n                <span class=\"n\">pg_dist_node<\/span>\r\n            <span class=\"k\">where<\/span>\r\n                <span class=\"n\">nodename<\/span> <span class=\"k\">ilike<\/span> <span class=\"s1\">'private-w%'<\/span> <span class=\"k\">limit<\/span> <span class=\"mi\">1<\/span><span class=\"p\">);<\/span>\r\n    <span class=\"err\">$$<\/span>\r\n<span class=\"p\">);<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">You will notice a\u00a0<code>worker_count<\/code>\u00a0table and a cron job creating a new entry every 5 seconds. This is to keep track of the number of nodes in the cluster and will be used to trigger a shard rebalancing operation after a new worker is added. If you are running aCitus open source cluster, make sure to install the\u00a0<a href=\"https:\/\/github.com\/citusdata\/pg_cron\">pg_cron extension<\/a>.<\/p>\n<p class=\"pg-section\">The next step is to distribute the tables:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight sql\"><code><span class=\"k\">SELECT<\/span> <span class=\"n\">create_distributed_table<\/span><span class=\"p\">(<\/span><span class=\"s1\">'public.companies'<\/span><span class=\"p\">,<\/span> <span class=\"s1\">'id'<\/span><span class=\"p\">);<\/span>\r\n<span class=\"k\">SELECT<\/span> <span class=\"n\">create_distributed_table<\/span><span class=\"p\">(<\/span><span class=\"s1\">'public.campaigns'<\/span><span class=\"p\">,<\/span> <span class=\"s1\">'company_id'<\/span><span class=\"p\">);<\/span>\r\n<span class=\"k\">SELECT<\/span> <span class=\"n\">create_distributed_table<\/span><span class=\"p\">(<\/span><span class=\"s1\">'public.ads'<\/span><span class=\"p\">,<\/span> <span class=\"s1\">'company_id'<\/span><span class=\"p\">);<\/span>\r\n<span class=\"k\">SELECT<\/span> <span class=\"n\">create_distributed_table<\/span><span class=\"p\">(<\/span><span class=\"s1\">'public.clicks'<\/span><span class=\"p\">,<\/span> <span class=\"s1\">'company_id'<\/span><span class=\"p\">);<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The last step is to fill those tables with data. I will omit this step as inserting data should be a straightforward process to do. INSERT..SELECT statements and scripts using benchmarking tools such as\u00a0<a href=\"https:\/\/www.postgresql.org\/docs\/current\/pgbench.html\">pgbench<\/a>\u00a0are both options to get data into the newly created cluster.<\/p>\n<h2 id=\"scale-out-cluster\"><a id=\"scale-out-cluster\"><\/a>Step 2: Use Azure Functions &amp; ARM templates to scale out your Citus cluster<\/h2>\n<p class=\"pg-section\">In the event of a CPU usage spike, we want to trigger Azure Functions that will:<\/p>\n<ol>\n<li>Use the Azure Resource Manager (ARM) API to add new nodes to the Citus cluster<\/li>\n<li>Connect to the coordinator PostgreSQL server and execute a shard rebalancing operation<\/li>\n<\/ol>\n<p class=\"pg-section\">Instead of creating 1 function for both steps, you should create 2 separate functions. The reason is because there\u2019s a limit in the execution time of any given function, and adding new nodes can take several minutes before completing. When Grafana detects that a new worker was added, then the 2nd function is triggered. I will explain this in more detail in the Grafana section.<\/p>\n<p class=\"pg-section\">Those functions were created using the Typescript language, and the\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/azure-functions\/functions-run-local\">Azure Function Core Tools<\/a>\u00a0was used to create the initial boilerplates and to publish the code to the Azure cloud. The last step returns a URL for each function, and these should be used when configuring the Grafana alerts.<\/p>\n<h3>Scale-out function<\/h3>\n<p class=\"pg-section\">The scale-out function requires the following dependencies:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\"><code><span class=\"k\">import<\/span> <span class=\"p\">{<\/span> <span class=\"nx\">DefaultAzureCredential<\/span> <span class=\"p\">}<\/span> <span class=\"k\">from<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">@azure\/identity<\/span><span class=\"dl\">\"<\/span>\r\n<span class=\"k\">import<\/span> <span class=\"p\">{<\/span> <span class=\"nx\">ResourceManagementClient<\/span> <span class=\"p\">}<\/span> <span class=\"k\">from<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">@azure\/arm-resources<\/span><span class=\"dl\">\"<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The\u00a0<code>@azure\/identity<\/code>\u00a0dependency is required for AAD authentication using service principals and\u00a0<code>@azure\/arm-resources<\/code>\u00a0for the ARM API used to modify our Citus cluster.<\/p>\n<p class=\"pg-section\">AAD authentication is done through the\u00a0<code>new DefaultAzureCredential()<\/code>\u00a0call and uses environment variables to pull the credentials. Next, you have to pass the resulting object and the subscription id to the ARM API class constructor. The whole process can be seen below.<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\" tabindex=\"0\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">azureSubscriptionId<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">process<\/span><span class=\"p\">.<\/span><span class=\"nx\">env<\/span><span class=\"p\">.<\/span><span class=\"nx\">AZURE_SUBSCRIPTION_ID<\/span><span class=\"p\">;<\/span>\r\n<span class=\"kd\">const<\/span> <span class=\"nx\">credential<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nx\">DefaultAzureCredential<\/span><span class=\"p\">();<\/span>\r\n<span class=\"kd\">const<\/span> <span class=\"nx\">resourceClient<\/span> <span class=\"o\">=<\/span> <span class=\"k\">new<\/span> <span class=\"nx\">ResourceManagementClient<\/span><span class=\"p\">(<\/span><span class=\"nx\">credential<\/span><span class=\"p\">,<\/span> <span class=\"nx\">azureSubscriptionId<\/span><span class=\"p\">);<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The next step is to build a new ARM template containing the desired state (i.e., the cluster containing new nodes), and use the ARM API to update the Citus cluster. You can find the template reference in the following link:\u00a0<a href=\"https:\/\/learn.microsoft.com\/azure\/templates\/microsoft.dbforpostgresql\/servergroupsv2\/nodeconfigurations?pivots=deployment-language-arm-template\">ARM template for Cosmos DB for PostgreSQL.<\/a><\/p>\n<p class=\"pg-section\">The final part of the function is to execute the following lines:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\" tabindex=\"0\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">deployment<\/span><span class=\"p\">:<\/span> <span class=\"nx\">Deployment<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span> <span class=\"na\">properties<\/span><span class=\"p\">:<\/span> <span class=\"p\">{<\/span> <span class=\"na\">mode<\/span><span class=\"p\">:<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">Incremental<\/span><span class=\"dl\">\"<\/span><span class=\"p\">,<\/span> <span class=\"na\">template<\/span><span class=\"p\">:<\/span> <span class=\"nx\">template<\/span> <span class=\"p\">}<\/span> <span class=\"p\">};<\/span>\r\n<span class=\"k\">await<\/span> <span class=\"nx\">client<\/span><span class=\"p\">.<\/span><span class=\"nx\">deployments<\/span><span class=\"p\">.<\/span><span class=\"nx\">beginCreateOrUpdate<\/span><span class=\"p\">(<\/span><span class=\"nx\">resource_group<\/span><span class=\"p\">,<\/span> <span class=\"nx\">deploymentName<\/span><span class=\"p\">,<\/span> <span class=\"nx\">deployment<\/span><span class=\"p\">);<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The\u00a0<code>Deployment<\/code>\u00a0object contains the ARM template and the\u00a0<code>beginCreateOrUpdated<\/code>\u00a0method is called with the resource group and a custom deployment name. The client is the same we created instantiating the\u00a0<code>ResourceManagementClient<\/code>\u00a0class.<\/p>\n<p class=\"pg-section\">After the\u00a0<code>beginCreateOrUpdated<\/code>\u00a0method is executed, the cluster is resized and the process takes about 10 minutes to finish.<\/p>\n<p class=\"pg-section\">You can find more information about all steps mentioned below:<\/p>\n<ul class=\"pg-section\">\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/developer\/javascript\/core\/use-azure-sdk\">Azure SDK for JavaScript and TypeScript<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/samples\/azure\/azure-sdk-for-js\/identity-typescript\/\">Azure Identity client library samples<\/a><\/li>\n<\/ul>\n<h3>Shard rebalancing function<\/h3>\n<p class=\"pg-section\">The Citus shard rebalancing function is very straightforward and has a PostgreSQL client dependency that can be seen below. Remember, Citus is just plain PostgreSQL, so any client library will work out-of-the-box!<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\"><code><span class=\"k\">import<\/span> <span class=\"o\">*<\/span> <span class=\"k\">as<\/span> <span class=\"nx\">postgres<\/span> <span class=\"k\">from<\/span> <span class=\"dl\">\"<\/span><span class=\"s2\">postgres<\/span><span class=\"dl\">\"<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">Next, using a connection string inside an environment variable, the function connects to the server and returns a client object.<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\"><code><span class=\"kd\">const<\/span> <span class=\"nx\">pgClient<\/span> <span class=\"o\">=<\/span> <span class=\"nx\">postgres<\/span><span class=\"p\">(<\/span><span class=\"nx\">process<\/span><span class=\"p\">.<\/span><span class=\"nx\">env<\/span><span class=\"p\">.<\/span><span class=\"nx\">PG_CONNECTION_URL<\/span><span class=\"p\">);<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The\u00a0<code>pgClient<\/code>\u00a0object is used to trigger the shard rebalancing Citus UDF.<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight typescript\" tabindex=\"0\"><code><span class=\"k\">await<\/span> <span class=\"nx\">pgClient<\/span><span class=\"s2\">`\r\n    SELECT citus_rebalance_start(rebalance_strategy:='by_shard_count', shard_transfer_mode:='block_writes');\r\n`<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">After a few minutes the cluster shards will be fully rebalanced and the CPU usage per worker node will diminish. You can find the PostgreSQL library in the following link:\u00a0<a href=\"https:\/\/www.npmjs.com\/package\/postgres\">PostgreSQL JS library.<\/a><\/p>\n<p class=\"pg-section\">Next, we\u2019ll see how to configure Grafana to trigger the functions we just created using alerts.<\/p>\n<h2 id=\"configuring-grafana\"><a id=\"configuring-grafana\"><\/a>Step 3: Configuring Grafana data sources<\/h2>\n<p class=\"pg-section\">Grafana has the concept of data sources, which are connectors used to pull data from third-party software. For this project, we are interested in 2 data sources:<\/p>\n<ol>\n<li><strong>Azure Monitor<\/strong><\/li>\n<li><strong>PostgreSQL<\/strong><\/li>\n<\/ol>\n<p class=\"pg-section\">The first is needed to pull metrics from the Azure VMs used to provision the Azure Cosmos DB for PostgreSQL cluster, which by now you\u2019ve figured out is Citus on Azure. The Azure Monitor data source can be used to create time series charts and alerts on CPU, memory, disk usage, etc.<\/p>\n<p class=\"pg-section\">The second data source is responsible for providing an interface to a PostgreSQL database and can be used to create charts and alerts from custom queries. There is one important addition to Citus 11.3 that can be used together with custom PostgreSQL charts and alerts: Tenant monitoring or what I like to call\u00a0<a href=\"https:\/\/www.citusdata.com\/blog\/2023\/05\/12\/tenant-monitoring-in-citus-and-postgres-with-citus-stat-tenants\/\">tenant level metrics<\/a>.<\/p>\n<p class=\"pg-section\">If you have a Citus open source cluster, then you might need a different data source to get the server metrics. One possible solution could be a\u00a0<a href=\"https:\/\/prometheus.io\/docs\/introduction\/overview\/\">Prometheus<\/a>\u00a0instance connected to your cluster.<\/p>\n<p class=\"pg-section\">Here\u2019s the documentation on how to configure each of these data sources:<\/p>\n<ul class=\"pg-section\">\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/datasources\/azure-monitor\/\">Azure Monitor data source<\/a><\/li>\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/datasources\/postgres\/\">PostgreSQL data source<\/a><\/li>\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/datasources\/prometheus\/\">Prometheus data source<\/a><\/li>\n<\/ul>\n<h2 id=\"creating-alerts\"><a id=\"creating-alerts\"><\/a>Step 4: Creating Grafana alerts<\/h2>\n<p class=\"pg-section\">With the Grafana data sources configured, now it\u2019s time to create alerts. These alerts will be used to call HTTP webhooks that in turn will trigger Azure Functions.<\/p>\n<p class=\"pg-section\">A Grafana alert allows you to learn about problems in your systems before or moments after they occur. You can create custom queries on data sources and specify conditions for these alerts to trigger, and when they do, take an action.<\/p>\n<p class=\"pg-section\">An action could be a Slack message, an email, a webhook triggering some software, etc. This is the beauty of Grafana alerts and what makes me love Grafana.<\/p>\n<p class=\"pg-section\">In this project, you will create a CPU usage alert that triggers the Azure Function that adds new nodes to the Citus cluster. You will also create an alert that triggers when the number of worker nodes change and invokes an Azure Function that connects to the Citus cluster and executes a shard rebalancing operation. The end result is the overall decrease of CPU usage in each individual node, making the alerts stop firing.<\/p>\n<p class=\"pg-section\">The threshold for the CPU alert can be 80%, but it could be configured in any way. For the worker count alert, you should create a custom query on the\u00a0<code>worker_count<\/code>\u00a0table for a given time frame, where if the newest value is different from the first value in the series, then the alert is fired. The idea behind is that if the number of worker nodes changes, then a shard rebalancing operation is needed.<\/p>\n<p class=\"pg-section\">Next is the query you can use to create the alert:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight sql\"><code><span class=\"k\">SELECT<\/span>\r\n    <span class=\"n\">to_timestamp<\/span><span class=\"p\">(<\/span><span class=\"n\">trunc<\/span><span class=\"p\">(<\/span><span class=\"k\">extract<\/span><span class=\"p\">(<\/span><span class=\"n\">epoch<\/span> <span class=\"k\">from<\/span> <span class=\"n\">created_at<\/span><span class=\"p\">)))<\/span> <span class=\"k\">as<\/span> <span class=\"nb\">time<\/span><span class=\"p\">,<\/span>\r\n    <span class=\"k\">count<\/span> <span class=\"k\">as<\/span> <span class=\"n\">worker_count<\/span>\r\n<span class=\"k\">FROM<\/span>\r\n    <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">worker_count<\/span>\r\n<span class=\"k\">WHERE<\/span>\r\n    <span class=\"n\">created_at<\/span> <span class=\"o\">&gt;<\/span> <span class=\"n\">now<\/span><span class=\"p\">()<\/span> <span class=\"o\">-<\/span> <span class=\"s1\">'5m'<\/span><span class=\"p\">::<\/span><span class=\"n\">interval<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">You can learn more about how to set up alerts and notifications in the following links:<\/p>\n<ul class=\"pg-section\">\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/alerting\/alerting-rules\/create-grafana-managed-rule\/\">Create Grafana-managed alert rules<\/a><\/li>\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/alerting\/manage-notifications\/manage-contact-points\/\">Manage contact points<\/a><\/li>\n<li><a href=\"https:\/\/grafana.com\/docs\/grafana\/latest\/alerting\/manage-notifications\/create-notification-policy\/\">Manage notification policies<\/a><\/li>\n<\/ul>\n<p class=\"pg-section\">When configuring the webhook contact points, you should use the Azure Function URLs that were created previously. After that, create a notification policy routing both alerts to their respective contact points.<\/p>\n<h2 id=\"testing-setup\"><a id=\"testing-setup\"><\/a>Step 5: Testing the auto scaling setup<\/h2>\n<p class=\"pg-section\">In order to test whether the auto scaling setup worked or not, you can run a pgbench script with the two following distributed queries.<\/p>\n<p class=\"pg-section\"><strong>Query 1:<\/strong><\/p>\n<div class=\"highlight\">\n<pre class=\"highlight sql\" tabindex=\"0\"><code><span class=\"err\">\\<\/span><span class=\"k\">set<\/span> <span class=\"n\">company_id1<\/span> <span class=\"n\">random<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2000<\/span><span class=\"p\">)<\/span>\r\n<span class=\"err\">\\<\/span><span class=\"k\">set<\/span> <span class=\"n\">company_id2<\/span> <span class=\"n\">random<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2000<\/span><span class=\"p\">)<\/span>\r\n<span class=\"err\">\\<\/span><span class=\"k\">set<\/span> <span class=\"n\">company_id3<\/span> <span class=\"n\">random<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2000<\/span><span class=\"p\">)<\/span>\r\n<span class=\"k\">BEGIN<\/span><span class=\"p\">;<\/span>\r\n    <span class=\"k\">SELECT<\/span>\r\n        <span class=\"n\">ads<\/span><span class=\"p\">.<\/span><span class=\"n\">company_id<\/span><span class=\"p\">,<\/span>\r\n        <span class=\"n\">ads<\/span><span class=\"p\">.<\/span><span class=\"n\">id<\/span><span class=\"p\">,<\/span>\r\n        <span class=\"k\">count<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"k\">FROM<\/span>\r\n        <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">ads<\/span> <span class=\"n\">ad<\/span>\r\n    <span class=\"k\">INNER<\/span> <span class=\"k\">JOIN<\/span>\r\n        <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">campaigns<\/span> <span class=\"n\">ca<\/span> <span class=\"k\">ON<\/span> <span class=\"n\">ad<\/span><span class=\"p\">.<\/span><span class=\"n\">campaign_id<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ca<\/span><span class=\"p\">.<\/span><span class=\"n\">id<\/span> <span class=\"k\">AND<\/span> <span class=\"n\">ad<\/span><span class=\"p\">.<\/span><span class=\"n\">company_id<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ca<\/span><span class=\"p\">.<\/span><span class=\"n\">company_id<\/span>\r\n    <span class=\"k\">WHERE<\/span>\r\n        <span class=\"n\">ads<\/span><span class=\"p\">.<\/span><span class=\"n\">company_id<\/span> <span class=\"k\">IN<\/span> <span class=\"p\">(<\/span>\r\n            <span class=\"p\">:<\/span><span class=\"n\">company_id1<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"p\">:<\/span><span class=\"n\">company_id2<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"p\">:<\/span><span class=\"n\">company_id3<\/span>\r\n        <span class=\"p\">)<\/span>\r\n    <span class=\"k\">GROUP<\/span> <span class=\"k\">BY<\/span>\r\n        <span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">END<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\"><strong>Query 2:<\/strong><\/p>\n<div class=\"highlight\">\n<pre class=\"highlight sql\"><code><span class=\"err\">\\<\/span><span class=\"k\">set<\/span> <span class=\"n\">company_id<\/span> <span class=\"n\">random<\/span><span class=\"p\">(<\/span><span class=\"mi\">1<\/span><span class=\"p\">,<\/span> <span class=\"mi\">2000<\/span><span class=\"p\">)<\/span>\r\n<span class=\"k\">BEGIN<\/span><span class=\"p\">;<\/span>\r\n    <span class=\"k\">SELECT<\/span>\r\n        <span class=\"n\">co<\/span><span class=\"p\">.<\/span><span class=\"n\">id<\/span> <span class=\"k\">as<\/span> <span class=\"n\">company_id<\/span><span class=\"p\">,<\/span>\r\n        <span class=\"n\">co<\/span><span class=\"p\">.<\/span><span class=\"n\">name<\/span> <span class=\"k\">as<\/span> <span class=\"n\">company_name<\/span><span class=\"p\">,<\/span>\r\n        <span class=\"k\">SUM<\/span><span class=\"p\">(<\/span><span class=\"n\">ca<\/span><span class=\"p\">.<\/span><span class=\"n\">monthly_budget<\/span><span class=\"p\">)<\/span> <span class=\"k\">as<\/span> <span class=\"n\">monthly_budget<\/span>\r\n    <span class=\"k\">FROM<\/span>\r\n        <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">companies<\/span> <span class=\"n\">co<\/span>\r\n    <span class=\"k\">INNER<\/span> <span class=\"k\">JOIN<\/span>\r\n        <span class=\"k\">public<\/span><span class=\"p\">.<\/span><span class=\"n\">campaigns<\/span> <span class=\"n\">ca<\/span> <span class=\"k\">ON<\/span> <span class=\"n\">co<\/span><span class=\"p\">.<\/span><span class=\"n\">id<\/span> <span class=\"o\">=<\/span> <span class=\"n\">ca<\/span><span class=\"p\">.<\/span><span class=\"n\">company_id<\/span>\r\n    <span class=\"k\">WHERE<\/span>\r\n        <span class=\"n\">co<\/span><span class=\"p\">.<\/span><span class=\"n\">id<\/span> <span class=\"o\">=<\/span> <span class=\"p\">:<\/span><span class=\"n\">company_id<\/span>\r\n    <span class=\"k\">GROUP<\/span> <span class=\"k\">BY<\/span> <span class=\"mi\">1<\/span><span class=\"p\">;<\/span>\r\n<span class=\"k\">END<\/span><span class=\"p\">;<\/span>\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">The following command was issued to execute both queries:<\/p>\n<div class=\"highlight\">\n<pre class=\"highlight shell\" tabindex=\"0\"><code>pgbench <span class=\"s1\">'postgres:\/\/citus:&lt;password&gt;@&lt;cluster-url&gt;:5432\/citus?sslmode=require'<\/span> <span class=\"nt\">-j<\/span> 4 <span class=\"nt\">-c<\/span> 16 <span class=\"nt\">-f<\/span> query01.sql <span class=\"nt\">-f<\/span> query02.sql <span class=\"nt\">-T<\/span> 2000 <span class=\"nt\">--progress<\/span><span class=\"o\">=<\/span>1\r\n<\/code><\/pre>\n<\/div>\n<p class=\"pg-section\">After a few seconds, the CPU usage surpassed the 80% threshold and after ~15 minutes the cluster was resized, rebalanced, and the alerts stopped firing.<\/p>\n<p class=\"pg-section\">The same demo can be seen in the\u00a0<a href=\"https:\/\/www.youtube.com\/watch?v=YGdqbbQ-dlw\">Citus Con 2023 talk<\/a>\u00a0I gave recently about Auto scaling Azure Cosmos DB for PostgreSQL with Citus, Grafana, and Azure Serverless.<\/p>\n<h2 id=\"monitor-tenants\"><a id=\"monitor-tenants\"><\/a>Using Grafana &amp; Citus 11.3 to monitor tenants<\/h2>\n<p class=\"pg-section\">The new\u00a0<a href=\"https:\/\/www.citusdata.com\/blog\/2023\/05\/05\/whats-new-in-citus-11-3-multi-tenant-saas\">Citus 11.3 release<\/a>\u00a0brings many improvements and one of the best (in my opinion), is the support for\u00a0<a href=\"https:\/\/www.citusdata.com\/blog\/2023\/05\/12\/tenant-monitoring-in-citus-and-postgres-with-citus-stat-tenants\/\">tenant monitoring<\/a>\u00a0with the\u00a0<code>citus_stat_tenants<\/code>\u00a0view. If you are running a multi-tenant SaaS application on Citus, then this view can give very good insights on how each tenant is performing in your cluster.<\/p>\n<p class=\"pg-section\">Some of the attributes available in the\u00a0<code>citus_stat_tenants<\/code>\u00a0view are:<\/p>\n<ul class=\"pg-section\">\n<li>Number of read queries for tenant in current period<\/li>\n<li>Number of read queries for tenant in the last period<\/li>\n<li>Number of read\/write queries for tenant in current period<\/li>\n<li>Number of read\/write queries for tenant in last period<\/li>\n<li>Seconds of CPU time spent for tenant in current period<\/li>\n<li>Seconds of CPU time spent for tenant in last period<\/li>\n<\/ul>\n<p class=\"pg-section\">These attributes in the\u00a0<code>citus_stat_tenants<\/code>\u00a0view\u2014in combination with the Grafana integration\u2014can be very powerful. You can use Citus tenant monitoring to detect possible tenant bottlenecks (e.g, outgrowing tenant unbalancing worker nodes, degradation of node performance due to a single tenant, etc). Citus offers a handful of PostgreSQL functions to respond to such events, including: tenant isolation, shard rebalancing, move tenant to different node, and more.<\/p>\n<h2 id=\"key-takeaways\"><a id=\"key-takeaways\"><\/a>Key takeaways for auto scaling Azure Cosmos DB for PostgreSQL<\/h2>\n<p class=\"pg-section\">The main takeaway from this blog is the possibility of extending and creating responsive Azure Cosmos DB for PostgreSQL clusters\u2014or Citus open source clusters\u2014with tenant-level metrics, server metrics, custom queries etc. Auto scaling, even though the focus of this blog, is one of the many possibilities of the setup I described. You can think of it as an event-based cluster management solution!<\/p>\n<p class=\"pg-section\">Because Azure Cosmos DB for PostgreSQL is powered by the <a href=\"https:\/\/github.com\/citusdata\/citus\">Citus open source extension to PostgreSQL<\/a>\u2014which is not a fork\u2014it works well with native PostgreSQL connectors and libraries. This versatility together with the power of Citus distributed tables makes Azure Cosmos DB for PostgreSQL one of the most powerful and extensible tools out there.<\/p>\n<h3>Next steps<\/h3>\n<p class=\"pg-section section-active\">I hope you enjoyed the ride and in case you want to learn more, here are some resources:<\/p>\n<ul class=\"pg-section section-active\">\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/cosmos-db\/postgresql\/introduction\">Azure Cosmos DB for PostgreSQL<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/managed-grafana\/overview\">Azure Managed Grafana<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/azure-functions\/functions-overview\">Azure Functions<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/azure-resource-manager\/management\/overview\">Azure Resource Manager (ARM)<\/a><\/li>\n<li><a href=\"https:\/\/learn.microsoft.com\/azure\/developer\/javascript\/core\/use-azure-sdk\">Azure SDKs for JavaScript and TypeScript<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/citusdata\/citus\">Citus open source repo<\/a><\/li>\n<\/ul>\n<h3><strong>Get Started with Azure Cosmos DB for free<\/strong><\/h3>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/cosmos-db\/\" target=\"_blank\" rel=\"noopener\">Azure Cosmos DB<\/a>\u00a0is a fully managed NoSQL and relational database for modern app development with SLA-backed speed and availability, automatic and instant scalability, and support for open source PostgreSQL, MongoDB and Apache Cassandra.\u00a0<a href=\"https:\/\/cosmos.azure.com\/try\/\" target=\"_blank\" rel=\"noopener\">Try Azure Cosmos DB for free here<\/a>. To stay in the loop on Azure Cosmos DB updates, follow us on\u00a0<a href=\"https:\/\/twitter.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">Twitter<\/a>,\u00a0<a href=\"https:\/\/www.youtube.com\/AzureCosmosDB\" target=\"_blank\" rel=\"noopener\">YouTube<\/a>, and\u00a0<a href=\"https:\/\/www.linkedin.com\/company\/azure-cosmos-db\/\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a>.<\/p>\n<pre class=\"highlight sql\"><code><\/code><\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>One of the top Citus features is the ability to run PostgreSQL at any scale, on a single node as well as a distributed database cluster. As your application needs to scale, you can add more nodes to the Citus cluster, rebalance existing data to the new Postgres nodes, and seamlessly scale out. However, these [&hellip;]<\/p>\n","protected":false},"author":119277,"featured_media":6203,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1838],"tags":[],"class_list":["post-6201","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-postgresql"],"acf":[],"blog_post_summary":"<p>One of the top Citus features is the ability to run PostgreSQL at any scale, on a single node as well as a distributed database cluster. As your application needs to scale, you can add more nodes to the Citus cluster, rebalance existing data to the new Postgres nodes, and seamlessly scale out. However, these [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/6201","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/users\/119277"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/comments?post=6201"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/posts\/6201\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media\/6203"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/media?parent=6201"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/categories?post=6201"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cosmosdb\/wp-json\/wp\/v2\/tags?post=6201"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}