Background
More and more companies are adopting Kubernetes as their new cloud platform. The promise of being able to standardize on Kubernetes as a platform is attractive as it offers unique solutions to many of the challenges companies face. One such challenge is the dynamic between the developer and the operator. Many times, the operator is not aware of the considerations that went into development and the developer is not aware of deployment constraints. The issues that stem from this disconnect tend to lead to major drops in developer productivity and large holes in security.
Challenges and Objectives
As we developed the Azure Service Operator (ASO), we worked with three customers so ASO could be used in production to address their needs. This helped us identify and address key requirements for real-life deployments across the globe.
- Media and Communications company: Their goal was to dynamically provision Azure services using Operators for their Data Processing Pipeline
- Financial institution: Their goal was to empower developers to self-provision Azure SQL databases and connect to applications on Azure Kubernetes Service seamlessly
- Automotive Manufacturer: They were building the backend for one of their services in Azure with Kubernetes and wanted to deploy Azure Services from within Kubernetes
“Our Data Hub, like any large system comprises of many components – Configuration, Relationships, External Azure Services, Pods, Deployments, Databricks notebooks and more. All these systems, their unique configuration and requirements needed to be managed from a central orchestration environment – Kubernetes. Using Operators, we could create and remove resources like Notebooks or EventHub in a CRUD like fashion all via Kubernetes manifests. The unique challenge here was the variety of services that needed to be provisioned to realize a single use case.” – Principal Architect (Media and Communications company)
This summarizes a common challenge that customers have – the inability to provision resources and applications, all from within the same environment. They do not want to use multiple tools – one for deploying Azure resources, one for deploying apps, one for configuring the Azure services they provision and so on.
“At our company, our goal was to provision Azure SQL and integrate with applications on AKS in a self-serve manner. There were three main challenges we were trying to solve: usability, scalability, and security. Since we are building a public cloud platform for application teams across the company, we had to ensure the solution was simple to use and can easily integrate with workloads deployed on AKS. In addition, security requirements, such as storing connection credentials in Azure Key Vault and having MSI support for SQL database, needed to be in place. Azure Service Operators helped us achieve this self-serve scenario for our developers.” – Director, Cloud (Financial Institution)
Giving application developers a way to self-provision the resources they require is another compelling use case for the Azure Service Operator. Developers can use the same language they already use to author application deployment manifests (YAML) to deploy the Azure resources they need.
“We are building up our backend in Azure Cloud with Kubernetes. We needed a smoother integration with Azure Cloud, so that we can adopt our current deployment approach and move even faster. We were using Open Service Broker for Azure and were introduced to Azure Service Operator which met our needs. Azure Service Operator helped us ensure we were on track on our timeline to provide our business services in Azure.” – Software Architect (Automotive Manufacturer)
For customers who want to continue deploying infrastructure in a Kubernetes-native way, whether they were using operators with another cloud provider or Open Service Broker for Azure**, Azure Service Operator addresses this need.
Solution
“Operators are software extensions to Kubernetes that make use of custom resources to manage applications and their components. Operators follow Kubernetes principles, notably the control loop” – From https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
Kubernetes and the Operator model aim to bridge the gap between the developer and the cluster operator. They do this by empowering developers to provision the services their applications depend on, while staying within the confines of RBAC and native Kubernetes constructs.
Azure Service Operator provides the ability to provision Azure services using the same YAML-based syntax and declarative model that developers are familiar with, right from their Kubernetes environment.
Azure Service Operator – What is it?
The Azure Service Operator defines each Azure service using a Custom Resource Definition (CRD) and implements a control loop that acts on creation, deletion, or update of Custom Resources of these CRD types.
For instance, there is a Custom Resource Definition for an Azure SQL server.
The application developer deploys a custom resource of this type alongside the application deployment manifest and only needs to update the name, location, and resource group where the Azure SQL server should be deployed. The Azure Service Operator deployed in the Kubernetes cluster takes care of deploying the SQL server instance while the built-in Kubernetes controllers attempt to deploy the application.
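As a rough illustration of how small such a custom resource is, the Go sketch below builds one and prints it as a JSON manifest (Kubernetes accepts JSON as well as YAML). The API group and version, the JSON field names, and the sample values are assumptions made for this example; the project's samples give the exact schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metadata mirrors the small part of Kubernetes object metadata used here.
type Metadata struct {
	Name string `json:"name"`
}

// AzureSqlServerSpec holds the values the developer is expected to set.
// The JSON field names are assumptions for illustration.
type AzureSqlServerSpec struct {
	Location      string `json:"location"`
	ResourceGroup string `json:"resourceGroup"`
}

// AzureSqlServer is a hypothetical custom resource for an Azure SQL server.
type AzureSqlServer struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Metadata   Metadata           `json:"metadata"`
	Spec       AzureSqlServerSpec `json:"spec"`
}

// NewAzureSqlServer fills in the three values a developer updates:
// the name, the location, and the resource group.
func NewAzureSqlServer(name, location, resourceGroup string) AzureSqlServer {
	return AzureSqlServer{
		APIVersion: "example.com/v1alpha1", // placeholder group/version, not the real one
		Kind:       "AzureSqlServer",
		Metadata:   Metadata{Name: name},
		Spec:       AzureSqlServerSpec{Location: location, ResourceGroup: resourceGroup},
	}
}

func main() {
	cr := NewAzureSqlServer("sqlserver-sample", "westus", "my-resource-group")
	out, err := json.MarshalIndent(cr, "", "  ")
	if err != nil {
		panic(err)
	}
	// The printed document could be applied next to the application manifest.
	fmt.Println(string(out))
}
```

Everything else about the server, such as credentials, is left to the operator rather than written into the manifest.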
You can find examples of applications that demonstrate this in the samples repository here.
We see how this helps address some of the challenges we saw above:
- Developers do not have to use another tool to deploy the Azure infrastructure required by their applications.
- Developers can self-provision the infrastructure they want without additional learning required as it’s using the same language the application uses – YAML.
- They do not have to wait for the infrastructure to be deployed before they deploy the app. They deploy everything at once, and Kubernetes ensures the SQL server is deployed and then the application.
Architecture details
Operator architecture – How customer-driven design made this better
The Azure Service Operator consists of:
- The Custom Resource Definitions (CRDs) for each of the Azure services a Kubernetes user can provision.
- The Kubernetes controller that watches for requests to create Custom Resources for each of these CRDs and creates them.
The Azure Service Operator project was built using the Kubebuilder framework, which enables developers to easily generate boilerplate CRDs and controller code. However, we realized that for every new service we added a CRD for, the developer had to write a lot of repetitive code in the controller. So, we added a generic controller framework to separate the Kubernetes controller from the Azure service provisioning logic. This helps abstract away some of the complexities and lets developers focus on the idempotent Azure service provisioning logic.
The project uses the Azure SDK for Go to perform the actual service provisioning. One of the reasons we chose the Azure SDK over ARM templates was to satisfy the more granular management needs and access permissions of the customers we worked with. In most organizations, developers only have access to certain resource groups and need the flexibility to provision and delete a specific service without having to manage the resource group itself.
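One way to picture that separation is the sketch below: a single shared reconcile function dispatches to a small per-service interface. The names `Provisioner`, `GenericController`, and `FakeSQL` are hypothetical and chosen for this example; they are not the project's actual types, and the in-memory fake stands in for real Azure calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a minimal stand-in for a Kubernetes custom resource.
type Resource struct {
	Kind    string
	Name    string
	Deleted bool   // true when the user has requested deletion
	Status  string // last observed state, written by the controller
}

// Provisioner is the hypothetical per-service contract. Each Azure service
// implements it and does not need to know about the control loop.
type Provisioner interface {
	// Ensure must be idempotent: repeating it for an existing resource changes nothing.
	Ensure(r *Resource) (bool, error)
	Delete(r *Resource) error
}

// GenericController holds the reconcile logic that is written only once.
type GenericController struct {
	Provisioners map[string]Provisioner // keyed by resource kind
}

// Reconcile dispatches to the matching Provisioner and records the status.
// The boolean result reports whether the resource should be retried later.
func (c *GenericController) Reconcile(r *Resource) (bool, error) {
	p, ok := c.Provisioners[r.Kind]
	if !ok {
		return false, errors.New("no provisioner registered for kind " + r.Kind)
	}
	if r.Deleted {
		if err := p.Delete(r); err != nil {
			r.Status = "DeleteFailed"
			return true, err
		}
		r.Status = "Deleted"
		return false, nil
	}
	done, err := p.Ensure(r)
	if err != nil {
		r.Status = "Failed"
		return true, err
	}
	if !done {
		r.Status = "Provisioning"
		return true, nil
	}
	r.Status = "Provisioned"
	return false, nil
}

// FakeSQL simulates an Azure SQL server provisioner with an in-memory set.
type FakeSQL struct{ Existing map[string]bool }

func (f *FakeSQL) Ensure(r *Resource) (bool, error) {
	f.Existing[r.Name] = true // creating twice has the same effect as creating once
	return true, nil
}

func (f *FakeSQL) Delete(r *Resource) error {
	delete(f.Existing, r.Name)
	return nil
}

// NewDemoController wires the fake provisioner into a controller.
func NewDemoController() (*GenericController, *FakeSQL) {
	sql := &FakeSQL{Existing: map[string]bool{}}
	c := &GenericController{Provisioners: map[string]Provisioner{"AzureSqlServer": sql}}
	return c, sql
}

func main() {
	c, _ := NewDemoController()
	r := &Resource{Kind: "AzureSqlServer", Name: "sqlserver-sample"}
	requeue, err := c.Reconcile(r)
	fmt.Println(r.Status, requeue, err)
}
```

With this shape, adding a new service means writing only another `Provisioner`; the status handling and requeue decisions stay in one place.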
Another important design aspect that was driven by customer requirements was ensuring eventual consistency with the desired list of services, without having to worry about the order of deployment. For instance, developers can trigger the provisioning of an Azure SQL server and database at the same time and be sure that both will eventually be provisioned successfully.
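The order independence described here can be sketched as a queue that retries any resource whose parent is not ready yet, instead of failing it. This is a simplified, in-memory model with hypothetical names (`Item`, `RunUntilConsistent`); the real operator relies on the Kubernetes controller requeue mechanism rather than an explicit loop like this one.

```go
package main

import "fmt"

// Item is a hypothetical custom resource with an optional parent dependency.
type Item struct {
	Name   string
	Parent string // empty when the resource has no parent
}

// reconcile provisions the item if its parent already exists.
// It reports false when the item must be retried later.
func reconcile(it Item, provisioned map[string]bool) bool {
	if it.Parent != "" && !provisioned[it.Parent] {
		return false // parent not ready yet: requeue instead of failing
	}
	provisioned[it.Name] = true
	return true
}

// RunUntilConsistent processes the queue until every item is provisioned
// or maxPasses is exhausted. It returns the order in which items succeeded.
func RunUntilConsistent(queue []Item, maxPasses int) []string {
	provisioned := map[string]bool{}
	var order []string
	for pass := 0; pass < maxPasses && len(queue) > 0; pass++ {
		var retry []Item
		for _, it := range queue {
			if reconcile(it, provisioned) {
				order = append(order, it.Name)
			} else {
				retry = append(retry, it)
			}
		}
		queue = retry
	}
	return order
}

func main() {
	// The database is submitted before the server it depends on.
	queue := []Item{
		{Name: "sqldatabase", Parent: "sqlserver"},
		{Name: "sqlserver"},
	}
	fmt.Println(RunUntilConsistent(queue, 5)) // prints [sqlserver sqldatabase]
}
```

The database is requeued on the first pass and succeeds on the second, so the developer never has to sequence the two manifests by hand.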
Security
Security considerations are important for customers when they run an operator in production. While we provided the option to store the secrets generated during provisioning as Kubernetes secrets, our customers also wanted the option to store them in Azure Key Vault: either a global Key Vault configured for the operator, or a per-resource Key Vault specified as part of the spec.
The Azure Service Operator can run in the context of a Service Principal or a Managed Identity. We support both because we discovered during the customer engagements that most production environments prefer Managed Identity for increased security.
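A minimal sketch of how these three destinations could be chosen between is shown below. The precedence (per-resource vault, then global vault, then Kubernetes secrets) and all type names are assumptions for illustration, and the stores are in-memory stand-ins that call no Kubernetes or Azure API.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretStore is a hypothetical abstraction over the places secrets can go.
type SecretStore interface {
	Name() string
	Put(key string, data map[string]string)
}

// MemoryStore is an in-memory stand-in for a Kubernetes secret backend
// or an Azure Key Vault.
type MemoryStore struct {
	name string
	data map[string]map[string]string
}

func NewMemoryStore(name string) *MemoryStore {
	return &MemoryStore{name: name, data: map[string]map[string]string{}}
}

func (m *MemoryStore) Name() string { return m.name }

func (m *MemoryStore) Put(key string, data map[string]string) { m.data[key] = data }

// Stores lists the destinations available to the operator.
type Stores struct {
	Kubernetes  SecretStore            // always available
	GlobalVault SecretStore            // nil when no operator-wide vault is configured
	Vaults      map[string]SecretStore // vaults that a resource spec may name
}

// Select applies the assumed precedence: a vault named in the resource spec
// wins, then the global vault, then Kubernetes secrets.
func (s Stores) Select(perResourceVault string) (SecretStore, error) {
	if perResourceVault != "" {
		v, ok := s.Vaults[perResourceVault]
		if !ok {
			// Refuse rather than silently store the secret somewhere else.
			return nil, errors.New("unknown Key Vault: " + perResourceVault)
		}
		return v, nil
	}
	if s.GlobalVault != nil {
		return s.GlobalVault, nil
	}
	return s.Kubernetes, nil
}

func main() {
	stores := Stores{
		Kubernetes:  NewMemoryStore("kubernetes-secrets"),
		GlobalVault: NewMemoryStore("operator-keyvault"),
		Vaults:      map[string]SecretStore{"team-a-vault": NewMemoryStore("team-a-vault")},
	}
	for _, vault := range []string{"", "team-a-vault"} {
		store, err := stores.Select(vault)
		if err != nil {
			panic(err)
		}
		store.Put("sqlserver-sample", map[string]string{"username": "placeholder"})
		fmt.Println("stored in", store.Name())
	}
}
```

Returning an error for an unknown vault is a design choice in this sketch: falling back to a different store would put credentials somewhere the resource owner did not ask for.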
Skipping reconcile for easier migration
As we worked with one of the customers, we came across an interesting scenario where the operator had to be migrated to a different cluster. While doing this, they wanted to make sure that the resources already deployed in Azure were not affected. To support this scenario, the Azure Service Operator has a “skipReconcile” annotation that can be applied to prevent changes from being propagated to Azure.
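In controller terms, honoring such an annotation amounts to an early return at the top of the reconcile function. The sketch below assumes the annotation key is spelled exactly as quoted above and that the value "true" enables it; both details, along with the `Object` type, are assumptions to verify against the project documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// skipAnnotation is the annotation key assumed for this sketch.
const skipAnnotation = "skipReconcile"

// Object is a minimal stand-in for a Kubernetes custom resource.
type Object struct {
	Name        string
	Annotations map[string]string
}

// ShouldSkip reports whether the object opts out of reconciliation.
// Reading a missing key, or a nil map, yields "" and therefore false.
func ShouldSkip(o Object) bool {
	return strings.EqualFold(o.Annotations[skipAnnotation], "true")
}

// Reconcile applies the object to Azure unless it is annotated to be skipped.
// The apply callback stands in for the real provisioning call.
func Reconcile(o Object, apply func(Object)) string {
	if ShouldSkip(o) {
		return "skipped" // leave the existing Azure resource untouched
	}
	apply(o)
	return "applied"
}

func main() {
	apply := func(o Object) { fmt.Println("provisioning", o.Name) }

	migrated := Object{
		Name:        "sqlserver-sample",
		Annotations: map[string]string{skipAnnotation: "true"},
	}
	fmt.Println(Reconcile(migrated, apply))
	fmt.Println(Reconcile(Object{Name: "eventhub-sample"}, apply))
}
```

During a migration, the annotated resources can be created in the new cluster without the operator issuing any change against Azure for them.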
Azure Service Provisioning flow
Once the Azure Service Operator is deployed in your cluster, this is how a typical Azure Service is provisioned.
- The user deploys an application that includes the custom resource manifest for installing an Azure service that the app depends on.
- The application is deployed using its manifest. However, the deployment does not yet succeed as it waits on the Azure service to be successfully created. The application references a secret that provides the information required to consume the Azure service, and the secret does not exist yet.
- The Azure Service Operator continuously watches for custom resources of the custom resource definitions (CRDs) corresponding to the Azure services and recognizes the request for a custom resource.
- The Azure Service Operator then updates the Kubernetes instance for the requested resource with the correct status and events.
- The Azure Service Operator requests an authorizer from Azure Active Directory for the Azure resource management endpoint as the identity it is running as and receives an authorizer token.
- The Azure Service Operator then sends the provisioning request to Azure API, along with the authorizer token in the request.
- Azure API provisions/deprovisions the resource and returns the Resource object to the Service Operator.
- The Azure Service Operator retrieves the information required to access/consume the Azure resource from the Resource object and stores it in a Kubernetes secret, or as a secret in a pre-specified Azure Key Vault.
- The app is deployed successfully now that the Azure service it depends on is provisioned and the secret it references exists.
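The flow above can be traced end to end with a small in-memory simulation. The `Cluster`, `TokenSource`, and `AzureAPI` names are hypothetical seams introduced for this sketch; the token and the Azure call are faked, so nothing here reflects the real Azure SDK for Go signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is an in-memory stand-in for the state held by the Kubernetes API server.
type Cluster struct {
	Secrets map[string]map[string]string // secret name -> key/value data
	Status  map[string]string            // custom resource name -> status
}

func NewCluster() *Cluster {
	return &Cluster{Secrets: map[string]map[string]string{}, Status: map[string]string{}}
}

// AppReady mirrors the waiting behavior: the app can start only once
// the secret it references exists.
func (c *Cluster) AppReady(secretName string) bool {
	_, ok := c.Secrets[secretName]
	return ok
}

// TokenSource stands in for requesting a token as the operator's identity.
type TokenSource func() (string, error)

// AzureAPI stands in for the provisioning request; it returns connection details.
type AzureAPI func(token, name string) (map[string]string, error)

// FakeAzure rejects requests without a token and otherwise "provisions" the resource.
func FakeAzure(token, name string) (map[string]string, error) {
	if token == "" {
		return nil, errors.New("missing token")
	}
	return map[string]string{"server": name + ".example.invalid"}, nil
}

// Provision walks one custom resource through the operator's side of the flow:
// update status, obtain a token, call Azure, then store the result as a secret.
func Provision(c *Cluster, name string, getToken TokenSource, azure AzureAPI) error {
	c.Status[name] = "Provisioning"
	token, err := getToken()
	if err != nil {
		c.Status[name] = "Failed"
		return err
	}
	info, err := azure(token, name)
	if err != nil {
		c.Status[name] = "Failed"
		return err
	}
	c.Secrets[name] = info
	c.Status[name] = "Provisioned"
	return nil
}

func main() {
	c := NewCluster()
	fmt.Println("app ready before provisioning:", c.AppReady("sqlserver-sample"))

	getToken := func() (string, error) { return "fake-token", nil }
	if err := Provision(c, "sqlserver-sample", getToken, FakeAzure); err != nil {
		panic(err)
	}
	fmt.Println("status:", c.Status["sqlserver-sample"])
	fmt.Println("app ready after provisioning:", c.AppReady("sqlserver-sample"))
}
```

The only coupling between the application and the operator in this model is the secret name, which is what lets both manifests be applied at the same time.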
Summary
“For us, the main goal of this project was to be able to define the Data pipelines using abstract methods without having to worry about underlying technology, their integration and associated configuration. Azure Service Operator played a key role in being able to achieve this since most of these concepts can be implemented using Custom Resource Definitions (CRDs).” – Principal Architect (Media & Communications company)
The Azure Service Operator, though used by each of these customers for slightly different reasons, helps achieve the goal of seamlessly provisioning Azure resources from within Kubernetes.
“This was a new solution we wanted to incorporate into our automation for provisioning PaaS services and allow applications on AKS to connect to the services. We are using Azure Service Operator with our AKS platform solution for seamless integration with PaaS services while leveraging the capability to secure connection information inside Azure Key Vault.” – Director, Cloud (Financial Institution)
The customer scenarios we have seen for the Azure Service Operator are common, practical, real-life ones that could apply to any organization using Kubernetes.
“With Azure Service Operator, we have achieved a better integration during the deployment of applications in Kubernetes.” – Software Architect (Automotive Manufacturer)
As we worked with these customers in developing the Azure Service Operator, we discovered the need for, and built in, features like the skipReconcile annotation and better debugging logs that are useful more widely.
We have open sourced the Azure Service Operator here and hope to see more customers use it. We welcome contributions, feature requests and feedback to the project.
The Team
Below is the Microsoft team that worked on this project, making sure we met the customer requirements while arriving at a solution that works broadly!
Erin Corson (Tech lead and architect), Melanie Rush, William Mortl, Sakthi Vetrivel (PM), Justin Pflueger, Jarrod Skulavik, Claudia Nadolny, Denis Kisselev, Janani Vasudevan
The project is now maintained by the Azure Kubernetes Service team at Microsoft. See this link for an announcement of the v2 of Azure Service Operator.
Last but not least, we would like to acknowledge the engineering teams of the customers we worked with, for their input, feedback, clear requirements, and collaboration throughout the project. Thank you for shaping the v1 of this project!
** The Open Service Broker for Azure project is no longer being maintained by Microsoft. The recommendation is to look at Azure Service Operator as an alternative.