Backups for Azure Managed Instance for Apache Cassandra®

Wenting Wu

Azure Managed Instance for Apache Cassandra® (Azure MI Cassandra) is a fully managed service that simplifies deploying and managing Cassandra clusters on the Azure platform. Beyond its core functionality, Azure MI Cassandra offers a backup feature that lets you capture snapshots of your Cassandra clusters at critical moments. In this blog, we’ll explore how to use this feature to create robust backups for your Azure MI Cassandra cluster.

Why Backups Matter

In the context of an Azure MI Cassandra cluster, backups are copies of the cluster’s data. Let’s look at why backups matter and at the scenarios where they prove invaluable:

  • Disaster Recovery: Consider a scenario where a crucial table or keyspace within your Azure MI Cassandra cluster gets corrupted or accidentally deleted. In such dire circumstances, backups come to the rescue. With backups, Azure MI Cassandra can minimize data loss by rolling back your cluster to a previous state.
  • Data Transfer: Imagine the need to create a replica of your Azure MI Cassandra cluster for testing purposes. Using backups, Azure MI Cassandra can seamlessly transfer data from one cluster to another, ensuring a smooth replication process.
  • Data Retention: In today’s data-driven landscape, organizations manage vast amounts of information, much of which may need to be retained for compliance, historical analysis, or reference purposes. Backups can serve as a reliable means of data retention, ensuring that valuable information is preserved for future needs.

How to Create Backups

Now, you may be wondering how to create backups. It’s a breeze: open your Azure MI Cassandra cluster in the Azure portal and configure backup policies for it, as pictured below.

[Screenshot: configuring backup policies for an Azure MI Cassandra cluster in the portal]

These policies include things like how often you want backups to happen (Schedule) and how long you want to keep them around (Retention). Once you set your backup policies, Azure MI Cassandra springs into action and generates backups according to these policies. You can conveniently monitor these backups right from the portal.

Azure MI Cassandra securely stores your backups off-site, physically separating them from the data it backs up. Curious about storage costs? You can find all the details here.

Azure MI Cassandra uses differential backups: each backup stores only the data files created since the previous backup. This significantly reduces storage usage and saves you money.
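To make the idea concrete, here is a simplified Python model of differential backups. It leans on a Cassandra property: SSTable data files are immutable once written, so a file that has already been uploaded never needs uploading again. The `BackupStore` class, its methods, and the file names are hypothetical illustrations, not the service’s actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BackupStore:
    """Toy model of differential backups (hypothetical, not the service's code)."""
    # Every SSTable file uploaded by any earlier backup.
    stored_files: set[str] = field(default_factory=set)
    # One manifest per backup: the full set of files needed to restore it.
    manifests: list[frozenset[str]] = field(default_factory=list)

    def take_backup(self, files_on_disk: set[str]) -> set[str]:
        """Upload only files not stored yet and return them."""
        # SSTables are immutable and, in this model, file names are assumed
        # to be unique, so a name seen before is already in backup storage.
        new_files = files_on_disk - self.stored_files
        self.stored_files |= new_files
        # The manifest still lists every file, so each backup can be restored
        # on its own even though its data was uploaded in pieces.
        self.manifests.append(frozenset(files_on_disk))
        return new_files


store = BackupStore()
first = store.take_backup({"nb-1-big-Data.db", "nb-2-big-Data.db"})
second = store.take_backup({"nb-1-big-Data.db", "nb-2-big-Data.db", "nb-3-big-Data.db"})
print(sorted(first))   # ['nb-1-big-Data.db', 'nb-2-big-Data.db']
print(sorted(second))  # ['nb-3-big-Data.db']
```

The second backup uploads one file instead of three, yet its manifest still describes the complete data set.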

In the subsequent sections of this blog, we’ll guide you through crafting optimal policies tailored to your requirements.

Backup Schedule

First, consider when and how often backups should run. You set this with the Backup Schedule of a backup policy, using a cron expression.

Periodic backups are essential for any database, as they protect against data loss resulting from various factors, including software bugs and accidental deletions. If any of these events occur, having recent backups allows you to restore your data to a previous state, minimizing the impact on your business operations.

Although restoring from backups limits data loss, it offers only partial protection. A backup captures the state of your data at a specific moment, much like a photograph: it freezes everything as it exists when the backup is created, so later modifications and additions are not reflected in it. For example, if you take a backup at 10:00 AM and change your data at 10:30 AM, those changes are not in the 10:00 AM backup, and restoring from it loses them. To limit this loss, create backups frequently enough to keep the gap between the latest backup and the present small.
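The data-loss arithmetic from the 10:00 / 10:30 example fits in a few lines of Python. The function name `data_loss_window` and the sample dates are illustrative assumptions; the point is that restoring the latest backup taken before a failure loses everything written between that backup and the failure.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime], failure_time: datetime) -> timedelta:
    """How much recent data is lost when restoring the latest backup taken
    at or before `failure_time`."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup was taken before the failure")
    return failure_time - max(earlier)


# Hypothetical hourly backups at 10:00 and 11:00 on the same day.
backups = [datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 11, 0)]
print(data_loss_window(backups, datetime(2024, 5, 1, 10, 30)))  # 0:30:00
```

With a regular schedule, this window is at most one backup interval: up to an hour for hourly backups, up to a day for daily backups.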

Here are some general guidelines to help you decide backup frequency based on the criticality of your data:

  1. Hourly Backups: For highly critical data that undergoes frequent changes throughout the day, such as real-time financial transactions.
  2. Daily Backups: For critical data that changes frequently, such as customer information.
  3. Weekly Backups: For data that is important but doesn’t change as frequently.
  4. Monthly or Quarterly Backups: For data with minimal business impact.
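These tiers translate into cron expressions along the following lines. This sketch assumes standard five-field cron syntax (minute, hour, day of month, month, day of week); the times used (02:00, Sunday, the 1st of the month) are arbitrary examples, so confirm the exact cron dialect the portal accepts before reusing them. `cron_matches` is a hypothetical, deliberately limited matcher (only `*`, single numbers, and comma lists) for checking when a schedule fires.

```python
from datetime import datetime

# Example schedules for the tiers above, in standard five-field cron syntax:
# minute, hour, day of month, month, day of week (0 = Sunday).
# Assumption: verify the cron dialect the Azure portal accepts before reuse.
EXAMPLE_SCHEDULES = {
    "hourly": "0 * * * *",            # minute 0 of every hour
    "daily": "0 2 * * *",             # every day at 02:00
    "weekly": "0 2 * * 0",            # every Sunday at 02:00
    "monthly": "0 2 1 * *",           # the 1st of every month at 02:00
    "quarterly": "0 2 1 1,4,7,10 *",  # Jan 1, Apr 1, Jul 1, Oct 1 at 02:00
}


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports only '*', single numbers, and comma lists."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Simplified check of whether `expr` fires at `when` (minute precision).

    Unlike real cron, this ANDs day-of-month and day-of-week; that is fine
    for the examples above, which never restrict both.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # datetime: Monday = 0; cron: Sunday = 0
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, cron_dow)
    )


monday = datetime(2024, 4, 1, 2, 0)  # 1 April 2024 was a Monday
print(cron_matches(EXAMPLE_SCHEDULES["quarterly"], monday))  # True
print(cron_matches(EXAMPLE_SCHEDULES["weekly"], monday))     # False
```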

Furthermore, identify your cluster’s busy hours and avoid scheduling backups during them. During a backup, all writes held in memory (memtables) are flushed to data files on disk (SSTables), which may in turn trigger compaction. This consumes system resources such as CPU and memory and can degrade Cassandra’s performance. If you’re unsure when your busy hours are, check metrics such as CPU and memory usage in the portal, then schedule backups at your desired frequency outside those peaks.
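If you export hourly average CPU usage for your cluster, picking a quiet backup window becomes a simple minimum search. The sketch below assumes you already have 24 hourly averages and that the metrics and the backup schedule use the same time zone; the sample numbers are made up, and `quietest_hours` is a hypothetical helper, not a portal feature.

```python
def quietest_hours(avg_cpu_by_hour: list[float], count: int = 1) -> list[int]:
    """Return the `count` hours of the day (0-23) with the lowest average CPU."""
    if len(avg_cpu_by_hour) != 24:
        raise ValueError("expected exactly one average per hour of the day")
    # sorted() is stable, so ties resolve to the earlier hour.
    return sorted(range(24), key=lambda hour: avg_cpu_by_hour[hour])[:count]


# Made-up hourly CPU averages (percent); index 0 covers 00:00-01:00.
cpu = [22, 18, 12, 15, 20, 30, 45, 60, 72, 80, 78, 75,
       70, 74, 79, 82, 77, 65, 55, 48, 40, 35, 30, 26]

print(quietest_hours(cpu, count=3))         # [2, 3, 1]
print(f"0 {quietest_hours(cpu)[0]} * * *")  # 0 2 * * *
```

The quietest hour then goes into the hour field of a cron expression, here giving a daily backup at 02:00.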

Backup Retention

Having set the backup schedule, let’s now focus on backup retention — how long you should hang onto these backups. Azure MI Cassandra gives you quite the range, from one hour to 10 years.

Here’s the thing: the main point of backups is to restore your cluster when things go awry. So, you’ve got to think ahead. Imagine you only notice data corruption or deletion three days later. You’d want to roll back your cluster to a time before the mishap, right? But if your backup retention is only two days, you’re out of luck.

For effective recovery, it’s advisable to keep backups for between 7 and 90 days (about 3 months). A week’s worth of backups covers you in most situations. But hey, if you’re the cautious type or deal with sensitive data, a longer retention period might ease your worries. Just remember that beyond 90 days you enter the realm of diminishing returns for recovery: it’s rare that restoring data that old proves valuable.
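The three-days-late example generalizes into a simple sizing rule. In the worst case, the last backup before an incident was taken almost a full backup interval earlier, so by the time you notice the problem that backup is up to detection delay plus one interval old. The Python sketch below encodes that rule; the function names are illustrative, and the result is a lower bound for rolling back, not a reason to stop at the minimum.

```python
from datetime import timedelta


def min_retention(detection_delay: timedelta, backup_interval: timedelta) -> timedelta:
    """Retention that guarantees a pre-incident backup still exists when an
    incident is noticed `detection_delay` after it happened (worst case)."""
    # The last backup before the incident can be up to one full interval
    # older than the incident, so at detection time it is at most
    # detection_delay + backup_interval old.
    return detection_delay + backup_interval


def can_roll_back(retention: timedelta, detection_delay: timedelta,
                  backup_interval: timedelta) -> bool:
    """True if `retention` covers the worst case described above."""
    return retention >= min_retention(detection_delay, backup_interval)


three_days, daily = timedelta(days=3), timedelta(days=1)
print(min_retention(three_days, daily))                     # 4 days, 0:00:00
print(can_roll_back(timedelta(days=2), three_days, daily))  # False
print(can_roll_back(timedelta(days=7), three_days, daily))  # True
```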

Moreover, backups aren’t solely about disaster recovery. They can also serve as a tool for data retention, fulfilling various needs such as audit trails and compliance requirements. In such instances, you may need to set longer retention periods to align with regulatory standards or organizational policies.

Multiple Backup Policies

With Azure MI Cassandra supporting multiple backup policies, you can create several policies tailored to different purposes for your Azure MI Cassandra cluster. For example, you may require recent backups for immediate recovery purposes while also needing backups for long-term data retention over a five-year period. In such cases, define two backup policies:

  • daily backups with a 7-day retention period
  • quarterly backups with a 5-year retention period

This approach ensures that you have both short-term recovery options and long-term archival backups in place, covering a range of potential scenarios.
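To see how the two policies complement each other, the following sketch models backups on a simple day counter and finds the most recent retained backup at or before a target restore date. It assumes every policy fires on day 0 and then every `interval_days` days, approximates a quarter as 91 days, and deletes a backup once it reaches its retention age; `Policy`, `retained_backups`, and `best_restore_point` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    name: str
    interval_days: int   # days between backups (a quarter is approximated as 91)
    retention_days: int  # a backup is deleted once it is this many days old


def retained_backups(policies: list[Policy], today: int) -> list[tuple[int, str]]:
    """All (day, policy name) backups still retained on `today`, assuming each
    policy ran on day 0 and every `interval_days` days after that."""
    points = []
    for policy in policies:
        for day in range(0, today + 1, policy.interval_days):
            if today - day < policy.retention_days:
                points.append((day, policy.name))
    return sorted(points)


def best_restore_point(
    policies: list[Policy], today: int, target_day: int
) -> Optional[tuple[int, str]]:
    """Most recent retained backup taken on or before `target_day`, if any."""
    candidates = [p for p in retained_backups(policies, today) if p[0] <= target_day]
    return max(candidates) if candidates else None


policies = [Policy("daily", 1, 7), Policy("quarterly", 91, 5 * 365)]
print(best_restore_point(policies, today=1000, target_day=997))  # (997, 'daily')
print(best_restore_point(policies, today=1000, target_day=500))  # (455, 'quarterly')
```

A restore to three days ago comes from the daily policy, while a restore to day 500 falls back to the quarterly backup from day 455; with the daily policy alone, no backup that old would exist.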

If you’re unsure how to create effective backup policies for your cluster, you can adopt the Grandfather-Father-Son (GFS) method. For instance, you can establish three backup policies:

  • daily backups with a 7-day retention period (Son)
  • weekly backups with a one-month retention period (Father)
  • monthly backups with a one-quarter retention period (Grandfather)

By employing this tiered approach, you balance short-term recovery needs with long-term data retention requirements, ensuring a comprehensive backup strategy.
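The following Python sketch shows what a GFS setup keeps on hand on a given date. It assumes daily backups every day, weekly backups on Sundays, and monthly backups on the 1st; it approximates one month as 31 days and one quarter as 92 days, and assumes today’s backup has already run. The tier keys and the `retained_on` function are illustrative only.

```python
from datetime import date, timedelta

# Each GFS tier: (does a backup run on this date?, retention in days).
# Assumptions for this sketch: weekly backups run on Sundays, monthly backups
# on the 1st, one month ~ 31 days, one quarter ~ 92 days.
TIERS = {
    "son": (lambda d: True, 7),                   # daily, kept 7 days
    "father": (lambda d: d.weekday() == 6, 31),  # weekly, kept ~1 month
    "grandfather": (lambda d: d.day == 1, 92),    # monthly, kept ~1 quarter
}


def retained_on(today: date, history_days: int = 365) -> dict[str, list[date]]:
    """List the backups each tier still holds on `today`, assuming the
    policies have run for `history_days` days and today's backup is done."""
    start = today - timedelta(days=history_days)
    days = [start + timedelta(days=i) for i in range(history_days + 1)]
    return {
        # A backup is deleted once it reaches its tier's retention age.
        tier: [d for d in days if runs_on(d) and (today - d).days < retention_days]
        for tier, (runs_on, retention_days) in TIERS.items()
    }


for tier, backups in retained_on(date(2024, 6, 15)).items():
    print(tier, len(backups), "oldest:", min(backups))
```

For 15 June 2024 this yields 7 daily, 4 weekly, and 3 monthly restore points, reaching back to 1 April 2024 while storing far fewer backups than keeping every daily backup for a quarter.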

Conclusion

By adhering to these best practices and leveraging the backup feature of Azure MI Cassandra, you can fortify the resilience and reliability of your Cassandra clusters, ensuring business continuity and data integrity in the face of unforeseen events.

About Azure Cosmos DB

Azure Cosmos DB is a fully managed and serverless distributed database for modern app development, with SLA-backed speed and availability, automatic and instant scalability, and support for open-source PostgreSQL, MongoDB, and Apache Cassandra. Try Azure Cosmos DB for free here. To stay in the loop on Azure Cosmos DB updates, follow us on X, YouTube, and LinkedIn.

Apache Cassandra, Cassandra, and the Cassandra logo, are either registered trademarks or trademarks of The Apache Software Foundation.
