Maintenance window strategy for database migrations

Marc van Duyn

Maintenance window strategy for database migrations

In a recent project, we were tasked with implementing a database migration strategy.The goal was to minimize the impact on our users and to ensure that duringthe database migration database would not be corrupted with writes and readsin the process. This blog post will describe the strategy we implemented andhow we applied it.

Database migrations can be a tricky task for any organization, especially whenit comes to maintaining the availability of the system during the migration process.One way to mitigate this risk is to implement a maintenance window strategy.

In a maintenance window, the application is taken offline for maintenance or updates.This approach enables IT teams to perform necessary updates or migrationswithout compromising the system’s availability for end users or risking acorrupt database.

As can be seen below in the diagram, the maintenance window strategy isimplemented by taking the system offline during the maintenance window andperforming the database migration. Once the migration is complete, the systemis brought back online. During the maintenance window, the system is notavailable for end users and any requests made to the system receive a 503http status code.

maintenance window sequence diagram

The criteria for implementing a maintenance window strategy for databasemigrations are:

  • Your database does not have migration tools that can be used to perform migrationswithout taking the database offline.
  • Your database is not hosted on a cloud provider that provides a managed databaseservice that can be used to perform migrations without taking the database offline.
  • You’re using a single database for your application and not a set of distributed database.

Real-world example

All source code in this article can be found here.

For a real-world example, we will look at a maintenance window strategy implementation fora FLASK application with a SQL based database. Given that the application uses SQLAlchemy and alembic fordatabase migrations we will have the following components:

Service context database model

The service context database model is a database model that is used to store thecurrent state of the service. This model is used to determine if the service isin maintenance mode or not. The model is defined as follows:

Note: The service context database model can also be extended to store otherinformation about the service such as the current version of the service.

from src.infrastructure.databases import sqlalchemy_db as db
from src.infrastructure.models.model_extension import ModelExtension

class ServiceContext(db.Model, ModelExtension):
    __tablename__ = 'service_context'
    id = db.Column(db.Integer, primary_key=True)
    maintenance = db.Column(db.Boolean, default=False)

We use a database model to store the current state of the service because it allowsus to store the state of the service persistently. This means that if theservice is restarted, the state of the service will be restored. Also, if the serviceis running on multiple instances, the state of the service will be consistent acrossall instances.

Service context service

The service context service is a service that is used to interact with the servicecontext database model. The service context service is defined as follows:

from src.infrastructure.models import ServiceContext
from src.infrastructure.databases import sqlalchemy_db as db

class ServiceContextService:

    def update(self, data):
        service_context = ServiceContext.query.first()

        if service_context is None:
            service_context = ServiceContext()

        service_context.update(db, data)
        return service_context

    def get_service_context(self):
        status = ServiceContext.query.first()

        if status is None:
            status = ServiceContext()

        return status

Maintenance mode activation and deactivation

To activate of deactivate the maintenance mode, we can use a route as shown below:

@blueprint.route('/service-context', methods=['PATCH'])
def update_service_context(
    validated_data = ServiceContextSchema().load(json_data)
    service_context = service_context_service.update(validated_data)
    return create_response(service_context, ServiceContextSchema)

When creating this public route, make sure that you have a way to authenticate the user that makes the request in order to determine that the user has the necessary permissions.

You can also create a management command with Flask cli toactivate the maintenance mode from within the application:

def activate_maintenance_mode():
    service_context_service = app.container.service_context_service()
    service_context_service.update({"maintenance": True})"Maintenance mode activated")

def deactivate_maintenance_mode():
    service_context_service = app.container.service_context_service()
    service_context_service.update({"maintenance": False})"Maintenance mode deactivated")

This allows you to activate or deactivate the maintenance mode from the command line ifyou have access to the server where the application is running.

Maintenance mode check

We are using the before_request decorator to run the maintenance mode check beforeeach request. The maintenance mode check is defined as follows:

def check_for_maintenance():
    service_context_service = app.container.service_context_service()
    status = service_context_service.get_service_context()

    if not ("maintenance" in request.path or "status" in request.path):
        if status.maintenance:
            return jsonify(
                {"message": "Service is currently enduring maintenance"}
            ), 503

With this implementation, the service will return a 503 http status code for allrequests that are not made to the maintenance mode activation, deactivationor status routes.

Applying migrations during maintenance window

You have several ways to apply migrations during a maintenance window.

One way is to do the migration manually by running the migration commands fromwithin the application. This is the simplest way to apply migrations and givesyou the most control over the migration process.

Manual migrations

An example of applying the migration manually is shown below in a kubernetes cluster:

  1. Access a pod that has access to a database and has the maintenance window strategy implemented
    kubectl exec -it <pod_name> -- /bin/bash
  2. Activate the maintenance mode
    python activate_maintenance_mode
  3. Apply the migration
    python db upgrade
  4. Deactivate the maintenance mode
    python deactivate_maintenance_mode

Pipeline based migration

This is a more advanced way to apply migrations during a maintenance window.The pipeline below is an azure devops pipeline and an example of an azureapp service deployment where a maintenance window is used to apply the migration.

The pipeline is defined as follows:

    - <service_path_from_rooth>/migrations/*

  production_web_app_service_url: "https://<production_service_url>"
  staging_web_app_service_url: "https://<staging_web_app_service_url>"
  projectRoot: $CI_PROJECT_DIR
  vmImageName: ubuntu-latest
  pythonVersion: 3.8
  isMain: $[eq(variables['Build.SourceBranch'], 'refs/heads/main')]

- stage: "deploy_stable"
  condition: and(always(), eq(variables['isMain'], True))
  - job: "build_web_app"
    displayName: "Build web app"
      vmImage: $(vmImageName)
    - task: UsePythonVersion@0
        versionSpec: '$(pythonVersion)'
    - script: |
        python -m venv venv
        source  venv/bin/activate
        python -m pip install --upgrade pip
        pip install setup
        pip install --target="./.python_packages/lib/site-packages" -r ./requirements.txt
      workingDirectory: $(projectRoot)
      displayName: "Install requirements"
    - task: ArchiveFiles@2
        rootFolderOrFile: '$(Build.SourcesDirectory)'
        includeRootFolder: false
        archiveType: 'zip'
        archiveFile: '$(Build.ArtifactStagingDirectory)/Application$(Build.BuildId).zip'
        replaceExistingArchive: true
    - publish: $(Build.ArtifactStagingDirectory)/Application$(Build.BuildId).zip
      displayName: 'Upload package'
      artifact: drop
    - script: |
        curl -X PATCH -H "Content-Type: application/json" -d '{"maintenance": true}' $(production_web_app_service_url)/service-context
      name: activate_maintenance_mode_production
      displayName: Activate maintenance mode on production
    - task: AzureWebApp@1
      displayName: "Deploy web app to staging"
        azureSubscription: '<Azure service connection>'
        appType: webAppLinux
        appName: '<name of web app>'
        deployToSlotOrASE: true
        resourceGroupName: '<name of resource group>'
        slotName: staging
    - script: |
        curl -X PATCH -H "Content-Type: application/json" -d '{"maintenance": true}' $(staging_web_app_service_url)/service-context
      name: activate_maintenance_mode_staging
      displayName: Activate maintenance mode on staging
    - task: AzureKeyVault@2
      displayName: Load key vault db connection string
        connectedServiceName: azure-keyvault-connection
        keyVaultName: $(keyVaultName)
        secretsFilter: '*'
    - script: |
        > .env
      name: setup_env_file
      displayName: Setup env file
    - script: |
        python db upgrade
      name: apply_migrations
      displayName: Apply migrations
    - task: AzureAppServiceManage@0
      displayName: 'Swap staging and production slots'
        azureSubscription: '<Azure service connection>'
        appType: webAppLinux
        WebAppName: '<name of web app>'
        ResourceGroupName: '<name of resource group>'
        SourceSlot: staging
        SwapWithProduction: true
    - script: |
        curl -X PATCH -H "Content-Type: application/json" -d '{"maintenance": false}' $(production_web_app_service_url)/service-context
      name: deactivate_maintenance_mode_production
      displayName: Deactivate maintenance mode on production

The pipeline is triggered when a migration is pushed to the main branch.The pipeline will then build the application,set the production and staging app in maintenance mode, apply the migration andswap the production and staging slots.

Closing remarks

When planning a maintenance window for a database migration, it’s important toconsider the following:

1) Identify the right time: Choose a time that has the least impact on end users, such as during off-peak hours or weekends.

2) Communicate with stakeholders: Notify all stakeholders, including customers and internal teams, of the planned maintenance window well in advance. This will give them ample time to plan and prepare for the interruption in service.

3) Test the migration process: Before the actual migration, test the process in a non-production environment to ensure it runs smoothly. This will also help identify any potential issues that need to be addressed.

4) Have a rollback plan: In case something goes wrong during the migration, it’s important to have a rollback plan in place to quickly restore the previous version of the database.

5) Monitor the migration: Continuously monitor the migration process to ensure it’s running smoothly and to quickly address any issues that may arise.

6) By implementing a maintenance window strategy, organizations can minimize the impact of database migrations on end users and ensure the availability of the system during the migration process.

It’s also important to note that some migrations such as those of very large databases orthose that require a lot of data transformation may require more than one maintenance window.In such scenarios it is important to plan and test the migration in chunks, allowing for a moregradual and controlled migration process.

Overall, a maintenance window strategy can be a useful tool for organizations looking to perform database migrationswhile minimizing disruption to end users. With careful planning, testing and monitoring, IT teams can ensure asmooth and successful migration process.

If you want to learn more or want to check out the code for yourself, you can find all source code here. This will give a cookiecutter repository that provides you with a flask onion architecture with support for maintenance mode and tests with test containers.