Optimize Uptime While Staying Secure

Steve Michelotti

With the recent announcement of the industry-wide, hardware-based CPU vulnerability, organizations must ensure they have appropriately updated their infrastructure to secure their environment. Microsoft Azure Government customer environments were patched prior to the public announcement. Additionally, Microsoft publicly announced the patching procedures this week.

If you are an Azure Government customer, you have peace of mind knowing that Microsoft is proactively securing your infrastructure with minimal to no time investment of your own. However, the question often gets asked: Will I experience downtime while my environment is patched and secured?

Even during an update like this, Microsoft maintains our commitment to SLAs on Availability Sets, VM Scale Sets, and Cloud Services. Also, if you’re using PaaS (Platform as a Service) services, the entire underlying infrastructure is updated for you seamlessly – yet another reason to use PaaS services.

For IaaS (Infrastructure as a Service) workloads that run directly on Virtual Machines (VMs), we must look a little more closely. If you have not followed high-availability best practices, you may experience some brief downtime during the update. The best thing you can do for your IaaS workloads is to ensure that all of your VMs are placed in Availability Sets, which distribute them across fault domains and update domains. But what does all this mean?

Availability Set – An availability set is a logical grouping of VMs (two or more) that enables Azure to provide redundancy and availability by coordinating functions such as machine reboots.

Fault Domain – A fault domain is a logical group of VMs (or other underlying hardware) that shares a common power source and network switch. Each VM in an availability set is automatically assigned a fault domain.

Update Domain – An update domain is a logical group of VMs (or other underlying hardware) that can undergo updates (e.g., a reboot) at the same time. Each VM in an availability set is automatically assigned an update domain.

You can specify which Availability Set(s) your VMs are assigned to (and you need to ensure at least 2 VMs are in each Availability Set), but Azure will determine the fault domains and update domains for you.
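
To make these definitions concrete, here is a minimal Python sketch of how VMs in an availability set might be spread across domains. It assumes 3 fault domains and 5 update domains (common defaults; the real counts depend on the region and on how the set was created) and a simple round-robin assignment. Azure documents round-robin behavior for update domains (the sixth VM shares an update domain with the first); round-robin for fault domains is an assumption of this sketch. `Placement` and `place_vms` are hypothetical names, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where one VM sits inside an availability set."""
    vm: str
    fault_domain: int
    update_domain: int


def place_vms(vm_names, fault_domains=3, update_domains=5):
    """Spread VMs across fault and update domains round-robin.

    Assumed model: VM i goes to fault domain i % fault_domains and update
    domain i % update_domains. Azure chooses the real assignment for you.
    """
    return [
        Placement(name, i % fault_domains, i % update_domains)
        for i, name in enumerate(vm_names)
    ]


if __name__ == "__main__":
    # Seven VMs: the sixth (vm5) wraps around to the same update domain as vm0.
    for p in place_vms([f"vm{i}" for i in range(7)]):
        print(p.vm, f"FD{p.fault_domain}", f"UD{p.update_domain}")
```

Because both indices wrap independently, VMs that share an update domain generally land in different fault domains, which is the property the rest of this post relies on.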

In the diagram below, we see an infrastructure with 12 VMs. Four VMs are assigned to one fault domain (FD0), four to a second fault domain (FD1), and four to a third fault domain (FD2). This ensures that, even in the case of an unexpected power or network failure, only 4 of the VMs will be affected and the other 8 VMs will continue to operate smoothly. Meanwhile, Azure performs a self-healing process on the 4 VMs in the affected fault domain.
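
The arithmetic in this 12-VM example can be checked with a short sketch. It assumes, hypothetically, that VM i lands in fault domain i mod 3, which reproduces the 4/4/4 split across FD0, FD1, and FD2; `vms_per_fault_domain` and `survivors_after_fault` are illustrative helpers, not Azure APIs.

```python
from collections import Counter


def vms_per_fault_domain(vm_count, fault_domains):
    """Count VMs per fault domain under an assumed round-robin spread."""
    return Counter(i % fault_domains for i in range(vm_count))


def survivors_after_fault(vm_count, fault_domains, failed_fd):
    """VMs still running when one whole fault domain loses power or network."""
    per_fd = vms_per_fault_domain(vm_count, fault_domains)
    return vm_count - per_fd[failed_fd]


if __name__ == "__main__":
    print(dict(vms_per_fault_domain(12, 3)))  # {0: 4, 1: 4, 2: 4}
    print(survivors_after_fault(12, 3, 0))    # 8
```

Whichever fault domain fails, the result is the same: 4 VMs go down and 8 keep running.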

The diagram also shows that the VMs are assigned to different update domains across fault domains. This ensures that the updates/reboots will not occur on your entire fleet of VMs at one time.

Additionally, because the update domains are spread across fault domains, there is always at least one of your VMs running at all times.

For example, it enables you to avoid the theoretical scenario where reboots occur on some of your VMs, and then you lose power to the remaining VMs during the reboots. This is exactly the scenario that Availability Sets (along with fault domains and update domains) protect you against.
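
This combined scenario can be modeled the same way: one update domain is rebooting for planned maintenance while one fault domain suffers an outage. The sketch below reuses a hypothetical round-robin placement (3 fault domains, 5 update domains) for the 12-VM example and, for contrast, an assumed worst case in which every VM shares a single fault domain and update domain. The helper names are illustrative only, and the model ignores the details of how Azure actually schedules maintenance.

```python
def place(vm_count, fault_domains=3, update_domains=5):
    """Hypothetical round-robin placement: VM i -> (i % FDs, i % UDs)."""
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


def running_during(placement, rebooting_ud, failed_fd):
    """Count VMs that are neither rebooting nor in the failed fault domain."""
    return sum(1 for fd, ud in placement if ud != rebooting_ud and fd != failed_fd)


def worst_case(placement, fault_domains=3, update_domains=5):
    """Fewest VMs left running over every (rebooting UD, failed FD) pair."""
    return min(
        running_during(placement, ud, fd)
        for ud in range(update_domains)
        for fd in range(fault_domains)
    )


if __name__ == "__main__":
    spread = place(12)
    packed = [(0, 0)] * 12  # assumed worst case: all VMs share FD0 and UD0
    print(worst_case(spread))  # 6
    print(worst_case(packed))  # 0
```

With the spread placement, the worst combination of a reboot and an outage still leaves 6 of the 12 VMs running; with everything packed into one fault domain and update domain, the same combination takes all of them offline.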

Although Azure assigns the fault domain and update domain for you, you can always inspect which domains each VM is assigned to in the Azure Government portal (or with PowerShell).

The Azure Government team at Microsoft views your security as our highest priority. In addition to all the features we provide behind the scenes for our SLA, you should also make sure you are following high availability best practices so that you maximize your uptime.

For the latest updates and information about Azure Government, subscribe to our blog.
