Azure DevOps Service

Featured posts

We’ve Moved! – Introducing Azure DevOps Service Status Portal
Nov 28, 2018

Azure DevOps SRE

Today, we’re happy to introduce the Azure DevOps Service Status Portal, which offers real-time insights into active service events and provides further details o...

Latest posts

Azure DevOps Availability Issues – 19 April 2019
Apr 30, 2019

Azure DevOps SRE

Please refer to the Azure DevOps Status Portal for details on this incident.

Postmortem: Azure DevOps Service Outages in October 2018
Oct 16, 2018

Azure DevOps SRE

Earlier this month, Azure DevOps experienced several significant service outages, for which we are deeply sorry. As with every significant live site incident, we have completed a detailed root cause analysis for these. Due to the proximity of these incidents and common underlying causes, we wanted to share the details with you to ensure that you know what happened and what we’re doing to prevent them from recurring.

October 3, 4 and 8 Incidents

The incident on Wednesday, 3 October 2018 started with a networking issue in the North Central US region. Since our authentication service, SPS, is in this region the issu...

Postmortem – VS Marketplace outage – 4 September 2018
Sep 24, 2018

Azure DevOps SRE

On Tuesday, 4 September 2018, Visual Studio Marketplace suffered an extended outage affecting most of its customers. Marketplace hosts and serves extensions for the Visual Studio IDE, Visual Studio Code, and Azure DevOps. This was the first instance of the Marketplace service going down completely, and we sincerely apologize for the outage.

What happened and resultant customer impact

Azure resources that Marketplace depends on (largely Compute, Storage and SQL) were down during the incident in Azure South Central US, and this took down the single-instance Marketplace service completely from 2018-09-04 09:45 UTC to...

Postmortem: VSTS 4 September 2018
Sep 10, 2018

Azure DevOps SRE

Postmortem – VSTS Outage – 4 September 2018

On Tuesday, 4 September 2018, VSTS (now called Azure DevOps) suffered an extended outage affecting customers with organizations hosted in the South Central US region (one of the 10 regions globally hosting VSTS customers). The outage also impacted customers globally due to cross-service dependencies. It required more than 21 hours to recover all VSTS services in South Central US because the recovery of VSTS services was dependent upon Azure restoring the data center. After VSTS services were recovered, we had an additional incident which lasted 2 hours impacting Release...

Postmortem: Global VSTS availability issues – 22 May 2018
May 25, 2018

Azure DevOps SRE

Customer Impact:

On 22 May 2018, Visual Studio Team Services (VSTS) experienced a major incident across multiple regions between 15:00 and 16:55 UTC. An event in a Western European scale unit of the Team Foundation Service (TFS) caused a chain reaction that sporadically took TFS scale units in other regions offline. Based on our telemetry, we estimate that a total of 20,800 users were impacted during the incident.

[Figures: Impacted users over time; Total request volume over time]

What Happened:

First, some background on a few components in VSTS. In the example: So,...

Updated and Completed Postmortem: Performance Issues and failures in VSTS West Europe – 7 February 2018
Feb 26, 2018

Azure DevOps SRE

A week ago we posted an incomplete postmortem and are now following up with the completed version. If you want the full story of how we progressed through this incident, start by reading that. This postmortem will cover the full root cause analysis but it won’t rehash the first part of the investigation.

Customer Impact

On 7 February 2018 we had an incident which impacted users in our Western European scale unit. During this time, users experienced slow performance and 503 errors (service unavailable) when interacting with VSTS services. Close to 5,000 users were impacted at the peak of the incident. The incid...

Preliminary Postmortem: Performance Issues and failures in VSTS West Europe – 7 February 2018
Feb 14, 2018

Azure DevOps SRE

Edit February 26, 2018: We have just posted an updated and complete postmortem here: https://devblogs.microsoft.com/devopsservice/?p=16295

Customer Impact

On 7 February 2018 we had an incident which impacted users in our Western European scale unit. During this time, users experienced slow performance and 503 errors (service unavailable) when interacting with VSTS services. Close to 5,000 users were impacted at the peak of the incident. The incident lasted for two and a half hours on 7 February 2018 from 10:10 - 12:40 UTC.

What Happened

Our root cause analysis (RCA) has not gone well for this incident. ...

Postmortem – Intermittent Failures for Visual Studio Team Services on 14 Dec 2017
Dec 27, 2017

Azure DevOps SRE

Starting on 14 December 2017, we had a series of incidents with Visual Studio Team Services (VSTS) over several days that seriously impacted the availability of our service for many customers (incident blogs #1 #2 #3). We apologize for the disruption these incidents caused you and your team. Below we describe the cause and the actions we are taking to address the issues which caused these incidents.

Customer Impact

The issues caused intermittent failures across multiple instances of the VSTS service within certain US and Brazilian data centers. During this time, we experienced failures within our applicati...

Postmortem – Availability issues with Visual Studio Team Services on 6 Dec 2017
Dec 19, 2017

Azure DevOps SRE

On 6 December 2017 we had a global incident with Visual Studio Team Services (VSTS) that had a serious impact on the availability of our service (incident blog here). We apologize for the disruption. Below we describe the cause and the actions we are taking to address the issues.

Customer Impact

This was a global incident that caused performance issues and errors across multiple instances of VSTS, impacting many different scenarios. The incident occurred within Shared Platform Services (SPS), which contains identity, account, and commerce information for VSTS. The incident started on 6 December at 8:45 UTC an...