Cloud Design Patterns – Availability
In this post, App Dev Manager John Tran explores some important availability concepts you need to consider when moving applications to the cloud.
Moving to the cloud forces us to change the way we design and deploy applications. Developing for on-premises infrastructure is not the same as developing for the cloud: the cloud has its own unique set of challenges, and when building an application to be cloud first, you must keep several cloud design patterns in mind.
In this article, we explore the design patterns used to maximize an application's availability in the cloud. Availability is the proportion of time that a system is functional and working. An application's availability must be defined and measured. Traditional metrics such as uptime are not as accurate when used to measure today's modern applications. To accurately measure a system's availability, we must take into account all of its subsystems and dependencies. For example, if our web application is responsive but functionality is impacted because the database is down, can we say that the app is available?
There are a number of factors that have an impact on your application's availability:
- System errors – access/error logs
- Infrastructure problems – network latency, performance of compute/storage, network bandwidth
- Malicious attacks – DoS, DDoS
- Genuine system load – either planned or unexpected
There are three primary design patterns that will help to maximize an application's availability in the cloud.
Design Pattern #1: Health checks – endpoint monitoring
To know when your application is experiencing problems, it's a best practice to monitor the web application and back-end services. It is important to treat the system as a whole and take into account all of your application's subsystems and dependencies.
Also, consider that applications running in the cloud present some unique challenges. For example, your team may not have the full control of the production environment that you have with on-premises hosting.
When creating application monitors, some typical checks that can be performed include:
- Validating the response code. For example, an HTTP response of 1XX Informational, 2XX Success, 3XX Redirection, 4XX Client errors, 5XX Server Error. Check the content of the response to detect errors, even when a 200 (OK) status code is returned.
- Measuring the response time, which indicates a combination of the network latency and the time that the application took to execute the request. An increasing value can indicate an emerging problem with the application or network.
- Checking resources or services located outside the application, such as a content delivery network used by the application to deliver content from global caches.
- Checking for expiration of SSL certificates.
- Measuring the response time of a DNS lookup for the URL of the application to measure DNS latency and DNS failures.
- Validating the URL returned by the DNS lookup to ensure correct entries.
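The principle that the system must be treated as a whole can be sketched in code. The following is a minimal illustration (not any specific monitoring product's API): each subsystem gets a named check callable, every check is timed and run, and the application only reports itself healthy if all of its dependencies pass.

```python
import time
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    healthy: bool
    latency_ms: float
    detail: str = ""

def run_health_checks(checks):
    """Run each named check callable, timing it and capturing failures."""
    results = []
    for name, check in checks.items():
        start = time.monotonic()
        try:
            check()
            healthy, detail = True, ""
        except Exception as exc:
            healthy, detail = False, str(exc)
        latency_ms = (time.monotonic() - start) * 1000
        results.append(CheckResult(name, healthy, latency_ms, detail))
    return results

def overall_status(results):
    # The system counts as available only if every subsystem check passed.
    return all(r.healthy for r in results)

def web_check():
    pass  # stand-in for a real HTTP probe of the web tier

def database_check():
    # Hypothetical failure: the database dependency is down.
    raise ConnectionError("db down")

results = run_health_checks({"web": web_check, "database": database_check})
print(overall_status(results))  # False: the app is responsive, but not available
```

This mirrors the earlier question about the responsive web app with a down database: the web check passes, yet the overall status is unavailable.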
Monitoring and alerting are critical for any team to gain insight into how their applications are performing. Health checks must be written so that they exercise the critical functionality of the system, and alerts must be set up to trigger a response from the operations team. Keep in mind that alerting thresholds must be set very carefully: false alarms lower the sense of urgency and cause legitimate alarms to be ignored.
Don’t let your end users be your monitoring system. Setting up proactive monitors and alerts will enable your team to forecast and mitigate potential problems before they become business-impacting.
Design Pattern #2: Queue based load leveling
A queue acts as a buffer between a task and the services that it relies on. Queuing requests provides consistency during heavy load and helps to minimize the impact of peaks in demand on the availability of the application.
Implementing a queue means that you must either refactor the solution or build queuing into your application from the beginning. The application must be built so that tasks and services run asynchronously: a task posts to a message queue, and the queue acts as a buffer, storing each message until it is retrieved by the service. Once a message has been retrieved and processed, it is removed from the queue. Requests can now be generated at a high and unpredictable rate, yet the queue allows the service to consume them at a consistent rate, regardless of the volume of requests in the queue.
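As a sketch of the producer/queue/consumer shape described above, here is a minimal in-process version using Python's standard `queue` module (a production system would use a durable broker such as a cloud queue service, but the structure is the same): a burst of requests is posted far faster than a synchronous service could absorb, and the worker drains the queue at its own pace.

```python
import queue
import threading

task_queue = queue.Queue()
processed = []

def service_worker():
    # The service consumes messages at its own steady rate,
    # regardless of how fast requests arrive at the queue.
    while True:
        msg = task_queue.get()
        if msg is None:          # sentinel value used to stop the worker
            break
        processed.append(msg)
        task_queue.task_done()

worker = threading.Thread(target=service_worker)
worker.start()

# A burst of requests arrives all at once; the queue absorbs the peak
# instead of overloading the service.
for i in range(100):
    task_queue.put(f"request-{i}")

task_queue.join()        # block until the service has consumed everything
task_queue.put(None)     # signal shutdown
worker.join()
print(len(processed))    # 100
```

The producer never waits on the service; it only waits on the queue, which is exactly the decoupling that levels the load.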
Queue based load leveling has the following benefits:
- It can help to maximize availability because delays arising in services won’t have an immediate and direct impact on the application, which can continue to post messages to the queue even when the service isn’t available or isn’t currently processing messages.
- It can help to maximize scalability because both the number of queues and the number of services can be varied to meet demand.
- It can help to control costs because the number of service instances deployed only has to be adequate to meet average load rather than peak load.
Implementing this pattern would benefit any application that uses services subject to overloading. However, the pattern isn’t useful if the application expects a response from the service with minimal latency.
Design Pattern #3: Throttling load
The goal of throttling the load is to control the consumption of resources used by an application so that the system can continue to function even under extreme demand. The load on a typical cloud application varies greatly: it may depend on the number of concurrent users, the type of tasks being performed, or the time of day. Load can also vary around events like holidays, media/press launches, viral content, and day-to-day business operations.
Throttling allows the application to consume resources until an upper limit is reached. The system then slows the flood of requests so that it can respond in a timely manner, and when the volume of requests subsides, the limits are removed. Some strategies for throttling include:
- Rejecting requests from an individual user who has already accessed system APIs more than n times per second over a given period of time.
- Disabling or degrading the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources.
- Deferring operations being performed on behalf of lower priority applications or tenants. These operations can be suspended or limited, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later.
Planning for application availability is often forgotten or only addressed after an event has impacted the business. The cloud poses some challenges when planning for availability but also provides us with a number of tools that we can take advantage of to maximize availability.