The Journey to a DevOps Culture
App Dev Manager Wyn Lewis-Bevan reflects on the journey to a DevOps culture and how our team can help your business get there.
Before joining Microsoft, I worked on a slew of development projects, both large and small. This was during the 1990s, when Rapid Application Development (RAD) was popular, though it was only one of several approaches available to help accelerate the delivery of software to customers. Around the time I joined Microsoft, Scrum and agile concepts were still in their infancy, and enterprises tended to follow heavyweight software development approaches, which we now often lump under the term waterfall. Under these approaches, after several months of detailed planning, followed by a few months of development and then testing, QA, rewriting, retesting, etc. (including last-minute change requests), the application was ready for release to an unsuspecting client. I say “unsuspecting” because, in my experience working with enterprise customers, applications were often delivered late and full of bugs, and required swift corrective action and redeployment to function correctly. The idea of more rapid, iterative development had been around well before the nineties, but the inclusion of feedback from the customer and/or end user during the process seems to have been first discussed formally in Boehm’s spiral model [Boehm, 1988. “A Spiral Model of Software Development and Enhancement,” Computer, May 1988, pp 61-72.]. To a first approximation, this and other RAD-type methods developed around this time are the precursors to today’s agile software development.
Around the same time that the various RAD methods were being developed, Jeff Sutherland and Ken Schwaber introduced the concept of Scrum in 1995. Six years later, in 2001, with the collaboration of Alistair Cockburn and 14 other developers, the Agile Manifesto was released. Its 12 principles, or tenets, are still used today to determine whether an organization is Agile (with a capital A). Agile includes much more than RAD. While the two have some strong similarities, there are definite differences, especially when it comes to the culture of the team. Not to dwell on these differences, but one key difference is that RAD places less emphasis on technical excellence: its focus is on rapid prototyping and gaining user feedback, at the expense of technical robustness. The ideas of developing quickly and expecting change during the development process are common to both.
Problems Adopting Anything Different
So why couldn’t larger enterprises adopt RAD methodologies or, better yet, Agile approaches to software development? From working with enterprise customers over the years, I think I can point to a few areas of concern. First, RAD and early versions of Agile didn’t scale well. Most larger development efforts required large teams of developers, and adoption of these methods was immediately quashed on that basis. While RAD has largely been replaced by Agile and/or DevOps, it was the introduction of Scaled Agile and various techniques to scale Scrum that began to make a dent in the armor. Even so, these newer approaches initially didn’t appear to convince many enterprises to make the switch.
The cost estimation process was also a blocker. Large enterprise customers tend to focus more on the cost of the functionality of a given software product than on the cost of the teams building it. Using various estimation tools, the features to be delivered are planned and milestones are marked in the sand. Breaking the project down into small, manageable pieces, each with an associated cost, helps support this model, regardless of the relative size or complexity of each piece. This often leads to long hours for the teams as milestones approach or, worse, to missed deadlines, not to mention lower-quality code, the introduction of bugs, increases in technical debt, and neglect of practical aspects like non-functional requirements (NFRs).
The idea of giving up total control of the development process to a self-organizing team was a blocker, and it rested on a misconception. Being self-organizing doesn’t mean that teams ignore the business requirements. On the contrary, they incorporate the requirements of the business through the backlog, in one form or another, and through regular discussion and feedback with the business. Being self-organizing allows them to be more efficient and to choose the best way for that individual team to function.
The next hurdle (at this stage of the discussion), and the most difficult to tackle before the developers could ever experiment with Agile, was how to convince the business that this change was beneficial or indeed necessary. Bottom-up change sometimes worked, but it usually ran into more issues than it helped solve. Changing the mindset of the business has proven to be the most difficult part. Having participated in several customer Agile assessments (discussed below), I am amazed by some of the conflicting responses that business participants provide. For example: Do you want higher-quality code, with fewer bugs in releases? The answer is always yes. Do you trust that your developer teams can deliver this? The answer is often no. Do you want to understand the user flow and the features your end users are touching within the application? The answer is often “I don’t care,” but for external-facing applications where another business is paying the bill, the answer is often “yes,” and this actually leads to the adoption of agile methods.
Beyond Agile and on to DevOps
The evolution of DevOps principles is the next step in the story, but before we get there, we need to understand what changed to make some enterprises take the leap to agile. As hinted at above, one of the biggest complaints I have heard concerns application performance, stability, and the number of bugs end users find in production, with B2B customers questioning why they are paying vast sums of money to “test” the software while it’s in production. This factor led one customer to explore Agile development in order to get a handle on improving application quality. At around the same time, as applications became more sophisticated, the consistency of the underlying infrastructure was becoming an issue. In order to scale production or to quickly test new features in isolation, it was becoming necessary to create new instances of the server environments in a timely manner. Enterprise customers consistently had issues with both provisioning new environments and maintaining the correct configuration settings and app dependencies. Luckily, this was also the time when the concepts of infrastructure as code and automated provisioning were becoming popular, so customers were making some inroads into agile concepts and automation for reasons other than adopting Agile practices.
In order to be successful with agile practices, teams realized that they should adopt a truly Agile approach. A decade ago, one of the services that we offered was a lightweight development practice analysis, which measured the maturity of a developer organization at a very high level. If necessary, this was followed by an in-depth Agile assessment. Once we had delivered the results of the in-depth assessment, customers would recognize the gaps and choose a model to work with. Scaled Scrum was a popular choice among enterprise customers, and their adoption of more “Agile” approaches spurred on the development of higher-quality code and more stable applications. The realization that certain aspects of the application lifecycle should be considered earlier in the lifecycle pushed customers to bring security best-practice checks, unit testing, and operational provisioning forward in order to be more successful. So, why wasn’t everyone embracing these new practices?
The Business is Still a Blocker
Convincing the business that this is the way forward was the key to success. Adopting a DevOps culture is a huge shift in the way applications are developed, not just for the development teams but also for the layers around and above that support them. Test, QA, Operations, Security, image provisioning, etc. all need to be uprooted and re-included. Not only did the business leaders need convincing, but often their B2B partners and customers needed convincing too, because the possibility of features missing a given release in favor of higher quality became a reality. Promises of fewer bugs, better change control, more opportunity to provide feedback, and a steady flow of new features were often enough to convince the business that this was the way ahead. The possibility of new features on a regular cadence often scared not only the business but the dev and IT teams too.
The core tenet of a successful Agile lifecycle is the delivery of value to the user (in the form of stable, high-quality working code) on a continual basis. Many teams adopt the delivery of value to the end user as a key part of their Definition of Done. This blows the minds of business owners and some IT leads, since many organizations tend to wrap a mini waterfall process in Agile ceremonies and deliver working code at predetermined milestones (quarterly or semi-annually). This is often referred to as “Scrum-fall”: although there is a piece of software that can be demoed at the end of each sprint, the real hardening takes place much later in the lifecycle, as the code still has to be “thrown over the fence” for security and compliance checks, testing, QA, etc.
Another blocker is the term CD, as in “Continuous Delivery,” which is often misunderstood, especially when we talk about CI/CD pipelines. CI/CD pipelines are the basis of any modern Agile and DevOps development approach. To many, CD is interpreted as constant deployment to production, whereas we are really talking about the delivery of tested software, with minimal bugs, that can be integrated into the main working branch and delivered to a repository. With feature flags or a similar mechanism in place, this can be extended to deployment to production. The concept causes trepidation when the customer is used to deploying to production once a quarter to meet a milestone.
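To make the feature-flag idea a little more concrete, here is a minimal sketch in Python. The `FeatureFlags` class, the `bulk_discount` flag name, and the discount rule are all hypothetical and exist only for illustration; real teams typically read flags from configuration or a flag service rather than an in-memory set.

```python
from typing import Iterable, List


class FeatureFlags:
    """Hypothetical in-memory flag store, for illustration only."""

    def __init__(self, enabled: Iterable[str] = ()) -> None:
        self._enabled = set(enabled)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def order_total(prices: List[float], flags: FeatureFlags) -> float:
    """Return the order total.

    The new discount logic is merged and deployed with the rest of the
    code, but it only runs when the (hypothetical) flag is switched on.
    """
    total = sum(prices)
    if flags.is_enabled("bulk_discount") and len(prices) >= 3:
        total *= 0.9  # assumed rule: 10% off orders of three or more items
    return round(total, 2)
```

With the flag off, `order_total` behaves exactly as before, so the new code path can travel through the pipeline into production without changing what end users see until the business decides to turn it on.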
As implied by the last paragraph, the next step for a typical enterprise is to implement CI/CD. This is often celebrated with the deployment of bits into various new environments, from Dev to Test to QA and sometimes pre-production if the operations team will allow it, but hardly ever into production. The customers I have worked with have eventually brought their Operations teams around to the idea that automation from the same CI/CD pipelines can also safely deploy into production. Shifting a mindset (whether it’s Ops, Sec, or others who are kicking and screaming) into a DevOps culture is the hardest aspect.
While hands-on UI testing has a place, whether as part of a ring-based deployment model (which is way beyond where we are in this discussion) or through failing fast, automated testing has an extremely important place in this discussion. The realization that automated testing is difficult often comes late, along with the realization that getting to an advanced Agile or DevOps state is rather difficult. Shifting left means not only writing as many automated tests as possible but also writing tests that have very few dependencies. At Microsoft we label these dependencies (starting on the far left) as L0, with no dependencies, L1, with one dependency, and so on, increasing as we move back to the right. In order to do this successfully, it is often necessary to rewrite code to make it testable.
This has a high cost and really stretches organizational sensibilities when it comes to being Agile. As discussed above, the business and PMO don’t often consider the cost of technical debt and NFRs when they explore the cost of developing an application. As the backlog of new features continues to grow, it’s hard to rationalize such costs. On the other hand, without automation and the appropriate pipelines in place, it’s hard to envision rapid, continuous delivery of value to the end users.
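As a rough illustration of what rewriting code to make it testable can look like, the Python sketch below passes the clock into a function instead of reading it directly, so the test needs no external dependency (an L0 test in the terminology above). The function name and the Monday-to-Thursday release rule are hypothetical, chosen only to keep the example small.

```python
from datetime import datetime, timezone
from typing import Callable


def utc_now() -> datetime:
    """Default clock: the real system time."""
    return datetime.now(timezone.utc)


def is_release_window_open(clock: Callable[[], datetime] = utc_now) -> bool:
    """Return True Monday through Thursday (a hypothetical release rule).

    The clock is passed in rather than read directly, so a test can
    supply a fixed time and needs no real dependency.
    """
    return clock().weekday() < 4  # Monday is 0, Friday is 4


def test_release_window() -> None:
    """L0-style test: fixed inputs, no system clock, network, or database."""
    wednesday = datetime(2024, 1, 3, tzinfo=timezone.utc)
    friday = datetime(2024, 1, 5, tzinfo=timezone.utc)
    assert is_release_window_open(lambda: wednesday) is True
    assert is_release_window_open(lambda: friday) is False
```

Production code calls `is_release_window_open()` with no arguments and gets the real clock; only the test supplies a fixed one. The same pattern extends to databases, file systems, and web services, which is where the rewriting cost mentioned above tends to come from.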
How We Often Help
Having touched on many of the blockers, and on the steps that organizations often take to move toward an Agile culture and then on to a DevOps culture, it might be useful to describe the steps we, as Premier Developers, go through to help our customers move forward. First, we strive to understand the pain points the customer is suffering. For example, if waterfall or CMMI is working just fine, and they are successfully deploying high-quality applications and the end user sees value, then perhaps there is nothing to do. However, if we hear that end users are opening tickets and the developers are unable to deploy fixes or new features without significant delay or downtime, we move forward. For a while we would conduct workshops on “The Day in the Life of a Development Team” or “A Release in the Life of an Agile Team,” and while these were often eye-opening for the customer, they were often too broad in scope and coverage to meet the needs of everyone in the audience.
Now, we usually begin by understanding the current state of application development at a customer through a thorough DevOps assessment. An Agile or DevOps assessment is an engagement that spans about two weeks. During the first week we introduce the purpose of the engagement and conduct multiple interviews (usually face-to-face) with individual representatives from each role across the complete lifecycle of the application. Ideally, this crosses all the disciplines, including dev, sec, ops, business ownership, etc. Finally, we deliver a quick high-level review of the preliminary results before the final analysis and delivery of the results.
The outcomes of the assessment have changed little over the past decade. We rank maturity in many different areas across people, process, and technology so we can understand the overall environment, and then step back and make a set of recommendations for next steps (usually identifying low-hanging fruit) in the form of a plan of prioritized actions. How we approach helping a customer move forward has matured over the years. Possible actions range from helping form teams and giving advice on how to operate those teams, to helping set up and manage a backlog, etc. However, based on the outcome of an assessment, what we tend toward today as a viable next step is to explain how product groups at Microsoft have transformed from non-Agile, through the Agile adoption process, and on to a DevOps culture. These discussions and education sessions are conducted at various levels and with different groups, from the CXO stakeholders down the chain to the development teams. At one point we would sometimes have customers visit the product group (PG) for discussions on how a particular team worked, but this sometimes led to the adoption of the wrong behaviors, so we now tend toward a customized delivery (depending upon the attendees), stressing that they should learn from what we share rather than copy it.
Moving beyond this step really depends on the size of the organization and how we feel we can best help them. We often bring in Microsoft Consulting Services (MCS) for custom engagements in which they are embedded in a team and help with all aspects of DevOps, including designing pipelines, writing test code, reinforcing the culture, etc. A relatively recent MCS offering is to invite the team to take part in a DevOps Dojo. Here the team is immersed in a sprint-long engagement, with DevOps experts overseeing every step of the way.
To wrap up: I’ve not covered some of the key elements of a DevOps culture, such as sharing code and inner sourcing, which come as we move DevOps practices across an organization. Generally, most applications are built in isolation from one another, so common code, proven practices, and solution architectures are repeatedly reinvented. As customers automate their processes, they generally need to shift their culture to accommodate the changes throughout and to minimize bottlenecks. Adoption of Agile processes, the introduction of pipelines, shifting left (and right), and failing early are all prerequisites that lead to an easier adoption of a DevOps culture.
As customers move to Azure, sharing best practices across the enterprise becomes more important. Reshaping their organizational structure and transforming it around modern Agile practices with a DevOps culture, while quite difficult, will lead to the faster delivery of stable, high-quality applications with fewer bugs, less technical debt, and the ability to deliver value to the end users more rapidly.