Monitoring & Alerting – Operations in the Cloud
In this post, App Dev Manager John Tran discusses monitoring considerations and options for Azure.
Monitoring is a core tenet of DevOps. Monitoring our applications provides a health signal, gives us a performance baseline, and notifies us when things go wrong. Implementing a robust monitoring plan is important whether your application lives on-premises or in the cloud; however, there are some differences that you may not expect to encounter when moving from on-premises to the cloud.
When working on-premises, you have all the control in the world as well as all of the responsibility. You have access to all levels of the infrastructure as well as the application. You have full control of the monitoring thresholds as well as what you choose to monitor. Things can become more difficult when you move to the cloud. You may have limited access, and you may not have the visibility that you are used to.
Modern cloud applications == less control for operations & development teams
Monitoring is traditionally managed by the operations/development teams. These groups are used to having a very high level of control. They must know what is happening with their applications at all levels, and at all times. This is fairly simple when you have full control of your environment, but this can be difficult when working within the cloud. How can a team give up some of the control that they once had while still implementing a monitoring plan which covers all of their requirements?
Below are some guidelines to help you implement a monitoring strategy which covers your cloud platform, your IaaS Infrastructure, and your PaaS services.
- Trust in your provider. In some situations, your cloud provider may not publish certain monitoring statistics or give you visibility into the inner workings of the platform. This can be due to security constraints.
- Know the SLAs and what application/infrastructure requirements are needed. For example, some Azure VMs require that you place them in an availability set in order to qualify for the advertised SLA.
- Put monitors in place and ensure that they are being watched. Alerts must have defined actions, so document them!
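The last guideline can be checked mechanically if the team keeps its alert catalog as data. The following is a minimal sketch, not an Azure API: the `AlertRule` record, its fields, and the alert names are all hypothetical and only illustrate the idea that every alert needs someone watching it and a documented action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical record describing one alert in a team's catalog."""
    name: str
    owner: str = ""   # team or on-call rotation that watches the alert
    action: str = ""  # documented response, e.g. a runbook step


def undocumented_alerts(rules):
    """Return the names of alerts that lack an owner or a documented action."""
    return [r.name for r in rules if not r.owner.strip() or not r.action.strip()]


# Example catalog (all names are made up for illustration).
catalog = [
    AlertRule("vm-unavailable", owner="ops-oncall",
              action="Check Service Health, then restart the VM"),
    AlertRule("high-cpu", owner="ops-oncall"),                             # no action documented
    AlertRule("failed-requests", action="Roll back the last deployment"),  # nobody watching
]

print(undocumented_alerts(catalog))
```

Running a check like this as part of a review or a build step is one possible way to catch alerts that fire into a void before they matter.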
A platform such as Microsoft Azure comprises many services. The platform allows customers to develop, run, and manage applications. These platforms are fully managed and are built to be highly available and incredibly resilient. Because of the wide breadth of services within Azure, it can be difficult to diagnose an issue and determine whether the problem is within the platform or within your application. By enabling platform monitors, you can notify your team when there is an issue with the platform so that they can quickly react to the impact on the application layer.
E.g.: Azure Service Health
Azure Status page: https://status.azure.com/status/
A global view of the health of all Azure services. This page provides a quick reference for determining whether a service is up or down.
Service Health: https://portal.azure.com/#blade/Microsoft_Azure_Health/AzureHealthBrowseBlade/serviceIssues
Allows users to personalize a view of the health of the Azure services and regions which they are using. Because this is targeted to your environment and the services which you are subscribed to, this is the best place to look for communications regarding outages, planned maintenance activities and other health advisories.
Other Microsoft platforms have different methods of notifying users of service interruptions.
IaaS Infrastructure Monitoring
Using Azure Monitor, you can configure alerts to notify you of availability changes to your cloud resources. Azure Monitor notifications will help you stay better informed about the availability of your resources in real time and quickly assess whether an issue is due to a problem on your side or related to an Azure platform event.
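As a rough illustration of the "your side or the platform" assessment described above, the sketch below shows one possible triage rule. It is an assumption-laden simplification and does not call Azure Monitor: the `ResourceSignal` record, the `triage` function, and the idea of passing in a set of regions with active platform events are all hypothetical, and a platform event in the same region does not prove it caused the outage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSignal:
    """Hypothetical snapshot of one resource, assembled from your own alert data."""
    resource: str
    region: str
    available: bool


def triage(signal, regions_with_platform_events):
    """Suggest where to look first when an availability alert fires.

    regions_with_platform_events is a set of region names assumed to have an
    active platform event (for example, collected from service health alerts).
    """
    if signal.available:
        return "healthy"
    if signal.region in regions_with_platform_events:
        return "likely platform event"
    return "investigate your side"


# Example: one VM is down in a region with a known platform event,
# another is down in a region where the platform reports no issues.
active_events = {"westus"}
print(triage(ResourceSignal("vm-web-01", "westus", available=False), active_events))
print(triage(ResourceSignal("vm-api-01", "eastus", available=False), active_events))
```

The point of the rule is only to decide which runbook to open first; the team still needs to confirm the cause.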
PaaS Service Monitoring
Application Insights monitors the availability, performance, and usage of your web applications. It provides you with deep insights into your application’s operations and helps you diagnose errors without waiting for a user to report them. It gives you the ability to continuously improve performance and usability. It works for apps on many platforms including .NET, Node.js and Java EE, hosted on-premises, hybrid, or in any public cloud. It integrates with your DevOps process, and has connection points to a variety of development tools.
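To make "availability" and "performance" concrete, here is a small sketch of how a series of availability probe results could be summarized and compared against a target. It does not use the Application Insights SDK; the `ProbeResult` record, the function names, and the sample numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical outcome of one availability probe against a web endpoint."""
    success: bool
    duration_ms: float


def summarize(results):
    """Return (availability percentage, average response time in ms)."""
    if not results:
        raise ValueError("no probe results to summarize")
    successes = sum(1 for r in results if r.success)
    availability = 100.0 * successes / len(results)
    average_ms = sum(r.duration_ms for r in results) / len(results)
    return availability, average_ms


def breaches_target(results, target_pct):
    """True when measured availability falls below the target percentage."""
    availability, _ = summarize(results)
    return availability < target_pct


# Example: four probes, one of which failed (sample values only).
probes = [
    ProbeResult(True, 100.0),
    ProbeResult(True, 200.0),
    ProbeResult(False, 300.0),
    ProbeResult(True, 400.0),
]
print(summarize(probes))
print(breaches_target(probes, target_pct=99.0))
```

The average response time from a healthy period can serve as the performance baseline mentioned at the start of this post, and the target percentage would normally come from the SLA your application has to meet.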