This past year, Azure introduced the concept of ‘Resource Groups’ and the ability to write templates to deploy resources to Azure. This makes it easy to describe your infrastructure as code which you can easily redeploy.
The Azure engineering team maintains an open source repository of sample templates where anyone can contribute a useful template that can be used by anyone to deploy resources to Azure. For example, you can deploy anything from a simple Linux virtual machine to an entire Mesos cluster with Marathon and Swarm and pretty much anything in between. This repository has become the central place for all community-curated templates for provisioning Azure resources for a variety of partners such as WordPress, Dokku, Deis, Cloud Foundry and many others.
The Problem
When maintaining an open source repository, it’s important to make clear the expectation of quality and validation in each contribution to that project. Many times that’s done by utilizing continuous integration to validate that contributions work and don’t regress any other part of the project. This method makes projects very contribution-friendly while driving the quality of the project for users.
As the Quickstart Templates project has grown, one of the problems encountered is the review process for each new template that enters the repository. Template validation wasn’t streamlined, and there wasn’t any pre-merge validation of submitted templates. Reviewers had to validate submissions manually, which made it tough to spot pesky bugs before or during actual template deployments. As a result, many template files had errors ranging from invalid JSON syntax to problems deploying to Azure Resource Manager (ARM).
What the project needed was a community-friendly way of doing basic validation of a template, as well as deploying the template to Azure to test that it works. This approach provides a guarantee to the community that every submitted and modified template works for users who wish to use them.
The Solution
Travis CI is one of the most trusted CI-as-a-Service platforms in the industry for open source projects. They provide free compute time for open source projects to run tests for each commit and pull request received on GitHub. Because JSON templates comprise the Quickstart Templates repository, using JavaScript tooling to run tests comes quite naturally.
The solution requires templates to be validated end-to-end in an automated way which is public enough for everyone in the community to view. At the same time, the solution cannot expose the credentials of the test account. The Azure Quickstart Templates repository is unique in that nearly all templates are self-contained in their folder, and each should conform to a uniform convention. A typical template folder has the following structure:
- azuredeploy.json – The actual Azure Resource Manager template language file, describing the resources to deploy
- azuredeploy.parameters.json – The placeholder parameters file to use for the deployment
- metadata.json – A metadata file describing the template, the date created and the author information. The Azure template search indexer uses this file.
- README.md – A readme file including information about the template and a ‘Deploy to Azure’ button for quick point-and-click deployments of a template.
With a fully end-to-end validation solution, we quickly spotted many templates with issues ranging from naming conventions to ARM template language errors.
Architectural Diagram
The diagram below outlines the high-level overview of the solution’s architecture. We’ll do a deep dive of each part later but the basic flow is:
- Contributors open pull requests against the Azure/azure-quickstart-templates repository for new template contributions
- Travis CI responds to the GitHub pull request web hook and clones the pull request contents.
- Travis executes the dynamically generated mocha.js test suite within the repo and deploys in parallel only the modified templates in the pull request.
- The ARM validation server accepts the requested templates, deploys them to Azure and returns the deployment result back to Travis CI.
- Travis receives the test result from the ARM validation server and reports the status back to the GitHub pull request.
- The contributor sees the test result on their pull request.
Validations
Metadata Files
To validate that the metadata.json file conforms to the required schema, we use an awesome Node.js module called skeemas to describe the required schema and validate the JSON against it:
var metadata = tryParse(metadataPath, metadataData);
var result = skeemas.validate(metadata, {
  properties: {
    itemDisplayName: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 60
    },
    description: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 1000
    },
    summary: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 200
    },
    githubUsername: {
      type: 'string',
      required: true,
      minLength: 2
    },
    dateUpdated: {
      type: 'string',
      required: true,
      minLength: 10
    }
  },
  additionalProperties: false
});
You can see that the second parameter describes the required schema of the metadata file by specifying the field names and their minimum and maximum lengths, as well as enforcing that no other properties exist. This requirement makes it easy to validate the template without the need to write additional logic.
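The snippet above calls a tryParse helper without showing it. As a rough sketch only, such a helper might parse the file contents and name the offending file when the JSON is malformed. The body below is an assumption for illustration, not the project's actual implementation:

```javascript
// Hypothetical helper: parse JSON text and report which file failed to parse.
function tryParse(filePath, contents) {
  try {
    return JSON.parse(contents);
  } catch (err) {
    throw new Error(filePath + ' is not valid JSON: ' + err.message);
  }
}

// Usage with an in-memory string instead of a file read from disk.
var parsed = tryParse('my-template/metadata.json', '{"githubUsername": "octocat"}');
console.log(parsed.githubUsername); // octocat
```

Surfacing the file path in the error keeps the test output useful when a pull request touches several templates at once.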
Template Files
Template and parameters files are statically validated using the Azure API, so no real logic was required besides submitting the template and parameters for validation. If static validation passes, then the template is deployed to an ephemeral resource group. The validation passes if the deployment is successful, otherwise it will fail with the Azure Resource Manager error.
Contributor-Friendly CI
If you are familiar with Travis, you know it can run CI test suites on GitHub pull requests and put a pass checkmark or fail X mark on the commits in your pull request. This mark tells contributors of new templates whether their contribution to the Quickstart Templates repository is formatted correctly.
Since running CI tests on a pull request is essentially the same thing as running arbitrary code from anyone on the internet, Travis doesn’t expose any secure environment variables to pull requests. It treats pull requests as untrusted code since anyone could execute a printenv command to reveal your sensitive information to the world.
At the same time, full validation of templates submitted by the community is valuable for ensuring the quality of the sample templates.
An ARM Validation Server
To protect our test account credentials, while still allowing for deployment validation, an ‘ARM Template Server’ was built, which essentially is a wrapper around the Azure CLI exposing a simple RESTful API to validate and deploy templates without needing credentials. This setup allows us to make subscription management changes while providing limited access to the requesting client. This server is very simple, tiny and uses the Express web framework for Node.js deployed using Dokku on Azure.
The most important thing about the server is to remove resources as soon as they are provisioned to minimize the load on our subscription, as well as avoid running arbitrary untrusted code within our subscription.
The API has two simple endpoints:
POST /validate
POST /validate validates the template by using the Azure CLI command azure group template validate. The post body is simple: it requires the Azure deployment template as the template value and the parameters file as the parameters value.
{
"template": "[azuredeploy.json contents]",
"parameters" : "[azuredeploy.parameters.json contents]"
}
POST /deploy
POST /deploy deploys the template using long polling. The post body is the same as for the /validate endpoint. The response status is always HTTP 202 Accepted, and its body must be inspected to understand the result of the deployment.
Sometimes template parameters are required to be unique or to contain a public key. For the POST /deploy API, any unique parameters are replaced with a unique value. For example, given the parameters file:
{
"$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"newStorageAccountName": {
"value": "GEN-UNIQUE-12"
},
"adminUsername": {
"value": "azureuser"
},
"sshPublicKey": {
"value": "SSH-PUB-KEY"
},
"dnsNameForPublicIP": {
"value": "GEN-UNIQUE"
},
"departmentName": {
"value": "myDepartment"
},
"applicationName": {
"value": "myApp"
},
"createdBy": {
"value": "myName"
}
}
}
The validation server can be configured to replace any placeholder GEN-UNIQUE or GEN-UNIQUE-[N] value with a unique value. This configuration works for other things, such as the placeholder value SSH-PUB-KEY, which the server will replace with an actual SSH public key. You can read the latest documentation for more details.
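To make the substitution concrete, here is a minimal sketch of how placeholder replacement could work on a parsed parameters file. The function names, the default length of 18 and the shape of the generated string are all assumptions made for illustration; the real server may generate values differently.

```javascript
var crypto = require('crypto');

// Assumption: length used when the placeholder is plain GEN-UNIQUE.
var DEFAULT_LENGTH = 18;

// Hypothetical generator: lowercase string starting with a letter, exactly `length` long.
function uniqueValue(length) {
  var hex = crypto.randomBytes(Math.ceil(length / 2)).toString('hex');
  return ('u' + hex).slice(0, length);
}

// Replace placeholder values in a parsed parameters file; returns a new object.
function fillPlaceholders(parametersFile, sshPublicKey) {
  var copy = JSON.parse(JSON.stringify(parametersFile));
  Object.keys(copy.parameters).forEach(function (name) {
    var value = copy.parameters[name].value;
    if (typeof value !== 'string') { return; }
    var match = /^GEN-UNIQUE(?:-(\d+))?$/.exec(value);
    if (match) {
      var length = match[1] ? parseInt(match[1], 10) : DEFAULT_LENGTH;
      copy.parameters[name].value = uniqueValue(length);
    } else if (value === 'SSH-PUB-KEY') {
      copy.parameters[name].value = sshPublicKey;
    }
  });
  return copy;
}

var filled = fillPlaceholders({
  parameters: {
    newStorageAccountName: { value: 'GEN-UNIQUE-12' },
    sshPublicKey: { value: 'SSH-PUB-KEY' },
    adminUsername: { value: 'azureuser' }
  }
}, 'ssh-rsa AAAA... ci@example');
console.log(filled.parameters.newStorageAccountName.value.length); // 12
```

Values that are not placeholders, such as adminUsername above, pass through untouched.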
Long Polling
Long polling is the technique of making an HTTP request from a client to a server that keeps the connection open for an extended time until the client leaves or retrieves posted data. Before the advent of web sockets, this was used to simulate push-like behavior.
The default Azure deployment CLI command holds an HTTP connection open to the resource management API until the deployment is complete, which can take a while. Because of this, when a client sends a template for deployment via POST /deploy, the server holds open the HTTP connection and sends the headers immediately. What’s not obvious right away is that most cloud services, including Azure, will quietly close your HTTP connection if you don’t transmit anything within a certain amount of time. The validation server gets around this issue by sending insignificant bytes (space characters) to the client until the deployment completes.
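The keep-alive trick can be sketched in a few lines. This is an illustration rather than the server's actual code: holdOpen is a hypothetical name, `work` stands for any promise that settles when the deployment finishes, and the error body shape is an assumption.

```javascript
// Hold a response open while `work` (a promise) is pending, writing a space
// every `intervalMs` so an idle connection is not closed along the way.
function holdOpen(res, work, intervalMs) {
  // Headers go out immediately, before the result is known.
  res.writeHead(202, { 'Content-Type': 'application/json' });
  var timer = setInterval(function () { res.write(' '); }, intervalMs);

  function finish(body) {
    clearInterval(timer);
    res.end(JSON.stringify(body));
  }

  // Both outcomes end the response with a JSON body after the padding.
  return work.then(finish, function (err) {
    finish({ error: String((err && err.message) || err) });
  });
}
```

Because leading spaces are legal JSON whitespace, the client can still parse the complete body once the connection closes. In an Express handler this would be called with the response object and whatever promise wraps the deployment command.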
When the deployment is complete, the status code is HTTP 202 for both success and failure, so you differentiate a successful deployment from an unsuccessful one by examining the response body:
Successful Deployment:
HTTP 202 Accepted:
{
"result": "Deployment Successful"
}
Failed Deployment:
HTTP 202 Accepted
{
"_rgName": "[the deployed test resource group name. The group is in the process of deleting by the time this message returns and is not accessible by the client]",
"command": "[the azure-cli command]",
"parameters":"[the parameters JSON used to deploy the provided template, including any generated values]"
}
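Since the status code carries no signal, a client has to look at the body. A small sketch of that check, assuming the two body shapes shown above (deploymentSucceeded is a hypothetical name):

```javascript
// Decide whether a /deploy response body reports success.
// The body may start with keep-alive spaces, so trim before parsing.
function deploymentSucceeded(bodyText) {
  var body = JSON.parse(bodyText.trim());
  return body.result === 'Deployment Successful';
}

console.log(deploymentSucceeded('   {"result": "Deployment Successful"}')); // true
console.log(deploymentSucceeded('   {"_rgName": "ci-group", "command": "..."}')); // false
```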
Cleaning up Deployments
It’s critical to quickly delete resources provisioned by the test run, especially in the case of untrusted pull requests which can be any arbitrary cloud deployment.
All resource groups are immediately deleted after deployment to combat this risk. Storing each resource group name in a small MongoDB database for persistence purposes ensures the test subscription always deletes the deployment. In the event of a restart during deployment, any leftover resource groups not removed from the database will be requested to be deleted on startup.
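The bookkeeping described here can be sketched as follows. This is a simplified illustration: an in-memory Set stands in for the MongoDB collection (so unlike the real thing it would not survive a restart), and createGroupTracker and deleteGroup are hypothetical names.

```javascript
// Track resource groups that still need deleting.
// `deleteGroup` is whatever function requests deletion of a group by name.
function createGroupTracker(deleteGroup) {
  var pending = new Set(); // stand-in for a persistent store

  function remove(name) {
    // Forget the group only after the delete request went through.
    return Promise.resolve(deleteGroup(name)).then(function () {
      pending.delete(name);
    });
  }

  return {
    // Record the group before deploying so a crash cannot orphan it.
    track: function (name) { pending.add(name); },
    remove: remove,
    // On startup, request deletion of everything a previous run left behind.
    sweep: function () {
      return Promise.all(Array.from(pending).map(function (name) {
        return remove(name);
      }));
    },
    pendingNames: function () { return Array.from(pending); }
  };
}
```

The important ordering is that a group name is recorded before the deployment starts and removed only after deletion has been requested, so any failure in between leaves a record for the next sweep.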
Deploying the Server
We host our server using a Dokku server on Azure. We won’t go into much detail on the specifics, but you can check out our previous post on deploying Dokku apps to Azure. Dokku provides us with straightforward nginx web server configuration and Heroku-like deployments, and helps us run multiple instances of the server. Its zero-downtime deployments also allow us to update the server without stopping current test runs. In short, Dokku keeps previous versions of the application running until web traffic to that version is complete. During this time new requests are routed to the latest version.
Dynamically Generating Template Validations
Although the actual template validations happen on the ARM Validation Server, the Mocha.js test harness which runs within Travis CI controls the test case generation and execution for each template.
The Test Framework
Mocha is a powerful test execution framework for JavaScript tests on the server or in the browser. Our use of Mocha is unusual in that we generate tests dynamically, one per template directory. The object describing each test is very straightforward:
{
args: ['template-directory-name/azuredeploy.json', 'template-directory-name/azuredeploy.parameters.json', 'template-directory-name/metadata.json'],
expected: true
}
By using Node’s fs.readdir API, we can obtain the list of directories to test, and generate an array of these test objects. For flexibility, tests won’t be generated for directories with a .ci_skip file placed at their root.
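A sketch of that generation step might look like the following. It uses the synchronous fs.readdirSync for brevity rather than fs.readdir, and generateTests is a hypothetical name. Skipping hidden directories is an added assumption, and a real suite would likely exclude other non-template folders as well.

```javascript
var fs = require('fs');
var path = require('path');

// Build one test object per template directory under `root`, skipping hidden
// directories and any directory that contains a .ci_skip file.
function generateTests(root) {
  return fs.readdirSync(root)
    .filter(function (name) {
      var dir = path.join(root, name);
      return name[0] !== '.' &&
        fs.statSync(dir).isDirectory() &&
        !fs.existsSync(path.join(dir, '.ci_skip'));
    })
    .map(function (name) {
      // Paths are relative to `root`, matching the test object shape above.
      return {
        args: [
          name + '/azuredeploy.json',
          name + '/azuredeploy.parameters.json',
          name + '/metadata.json'
        ],
        expected: true
      };
    });
}
```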
Using the existing Mocha functions describe and it, we can verbosely describe tests which will be used by the test reporter:
describe('Template Validation Suite', function() {
  /** code omitted for brevity **/
  tests.forEach(function(test) {
    it(test.args[0] + ' & ' + test.args[1] + ' should be valid', function() {
      // validate template files are in correct place
      test.args.forEach(function (path) {
        ensureExists.apply(null, [path]);
      });
      // validate metadata.json
      validateMetadata.apply(null, [test.args[2]]);
      // template validation
      return validateTemplate.apply(null, test.args)
        .then(function (result) {
          debug('template validation successful, deploying template...');
          return deployTemplate.apply(null, test.args);
        })
        .then(function () {
          // success
          return assert(true);
        })
        .catch(function (err) {
          assert(false, errorString + ' \n\nServer Error:' + JSON.stringify(err));
        });
    });
  });
});
Doing Things in Parallel
This kind of testing is extremely IO bound, so running template validations in parallel can save a ton of time. A few things come together nicely here:
- Azure Resource Management APIs support simultaneous deployments
- Node.js is asynchronous, so our test validation server can handle many parallel test requests at once
- Using the mocha.parallel extension, we can run batches of tests in parallel within Travis
Using the parallel function within mocha.parallel we can modify the code snippet from the last section to run in parallel by batching our tests in groups:
describe('Template Validation Suite', function() {
  /** code omitted for brevity **/
  // testGroups is an array of test arrays
  testGroups.forEach(function(tests) {
    // one parallel suite per batch, so the tests inside it run concurrently
    parallel('Running ' + tests.length + ' Parallel Template Validation(s)...', function () {
      tests.forEach(function(test) {
        it(test.args[0] + ' & ' + test.args[1] + ' should be valid', function() {
          // validate template files are in correct place
          test.args.forEach(function (path) {
            ensureExists.apply(null, [path]);
          });
          validateMetadata.apply(null, [test.args[2]]);
          return validateTemplate.apply(null, test.args)
            .then(function (result) {
              debug('template validation successful, deploying template...');
              return deployTemplate.apply(null, test.args);
            })
            .then(function () {
              // success
              return assert(true);
            })
            .catch(function (err) {
              assert(false, errorString + ' \n\nServer Error:' + JSON.stringify(err));
            });
        });
      });
    });
  });
});
You can batch tests by a configurable amount. Capping the number of parallel deployments keeps the test run from violating subscription limits.
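Building the testGroups array is a plain chunking step. A minimal sketch, where batchTests is a hypothetical name and the batch size would come from configuration:

```javascript
// Split a flat list of tests into groups of at most `batchSize` items.
function batchTests(tests, batchSize) {
  if (batchSize < 1) {
    throw new Error('batchSize must be at least 1');
  }
  var groups = [];
  for (var i = 0; i < tests.length; i += batchSize) {
    groups.push(tests.slice(i, i + batchSize));
  }
  return groups;
}

console.log(JSON.stringify(batchTests(['a', 'b', 'c', 'd', 'e'], 2)));
// [["a","b"],["c","d"],["e"]]
```

Each inner array then becomes one parallel suite, so no more than batchSize deployments are in flight at once.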
Test output looks like the following (some stack traces omitted for brevity):
Running 3 Parallel Template Validation(s)...
...
✓ 101-simple-linux-vm/azuredeploy.json & 101-simple-linux-vm/azuredeploy.parameters.json should be valid (153820ms)
1) 101-tags-vm/azuredeploy.json & 101-tags-vm/azuredeploy.parameters.json should be valid
2) 101-webapp-with-golang/azuredeploy.json & 101-webapp-with-golang/azuredeploy.parameters.json should be valid
1 passing (3m)
2 failing
1) Template Validation Suite Running 3 Parallel Template Validation(s)... 101-tags-vm/azuredeploy.json & 101-tags-vm/azuredeploy.parameters.json should be valid:
AssertionError: Template Validation Failed. Try deploying your template with the commands:
azure group template validate --resource-group (your_group_name) --template-file 101-tags-vm/azuredeploy.json --parameters-file 101-tags-vm/azuredeploy.parameters.json
azure group deployment create --resource-group (your_group_name) --template-file 101-tags-vm/azuredeploy.json --parameters-file 101-tags-vm/azuredeploy.parameters.json
Server Error:{"error":"Deployment provisioning state was not successfuln","_rgName":"sedouard-ci4a407279-5a6b-3b78-7670-947cd5dd7bcf","command":"azure group deployment create --resource-group (your_group_name) --template-file azuredeploy.json --parameters-file azuredeploy.parameters.json","parameters":"{"$schema":"http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#","contentVersion":"1.0.0.0","parameters":{"newStorageAccountName":{"value":"@invalid-account-name"},"adminUsername":{"value":"okdude"},"adminPassword":{"value":"Cortama131"},"dnsNameForPublicIP":{"value":"citest25c541ad90ba1dfd"},"departmentName":{"value":"myDepartment"},"applicationName":{"value":"myApp"},"createdBy":{"value":"myName"}}}"}
+ expected - actual
-false
+true
at test/tests.js:288:13
2) Template Validation Suite Running 3 Parallel Template Validation(s)... 101-webapp-with-golang/azuredeploy.json & 101-webapp-with-golang/azuredeploy.parameters.json should be valid:
AssertionError: Template Validation Failed. Try deploying your template with the commands:
azure group template validate --resource-group (your_group_name) --template-file 101-webapp-with-golang/azuredeploy.json --parameters-file 101-webapp-with-golang/azuredeploy.parameters.json
azure group deployment create --resource-group (your_group_name) --template-file 101-webapp-with-golang/azuredeploy.json --parameters-file 101-webapp-with-golang/azuredeploy.parameters.json
Server Error:{"error":"Deployment provisioning state was not successfuln","_rgName":"sedouard-ci6df55539-52be-9916-e237-67bf0c4eba0f","command":"azure group deployment create --resource-group (your_group_name) --template-file azuredeploy.json --parameters-file azuredeploy.parameters.json","parameters":"{"$schema":"http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json","contentVersion":"1.0.0.0","parameters":{"siteName":{"value":"$!@!@#D"},"appServicePlanName":{"value":"demoHostingPlan"},"siteLocation":{"value":"West U"},"sku":{"value":"Free"},"workerSize":{"value":"0"},"use32bitWorkerProcess":{"value":true},"enableAlwaysOn":{"value":false}}}"}
+ expected - actual
-false
+true
at test/tests.js:288:13
The log above explains that 1 template validation passed and 2 failed. For each failure, you get a generated command line to re-run the deployment. You can take a look at a real run on Travis CI here.
Testing Changes Only
If you take a look at the Quickstart Templates, there are heaps of templates and running each template validation for every single commit doesn’t make much sense.
Travis CI makes the commit range available to us via an environment variable, TRAVIS_COMMIT_RANGE. Its value looks something like:
TRAVIS_COMMIT_RANGE=[Last Commit Hash]...[First Commit Hash]
The Last Commit Hash value is the hash of the last commit on the branch that you are testing. The First Commit Hash value is the hash of the first commit, representing the point at which the HEAD and BASE branches diverged.
By executing a git diff --name-only command using First Commit Hash...Last Commit Hash we can precisely calculate the changes in the pull request.
git diff --name-only 8965c253425198806c9cfd7f5abd10ced263dcbb...1f39d53c1b73959e73233fc2b9e5286be952cf83
101-simple-linux-vm/azuredeploy.json
101-tags-vm/azuredeploy.parameters.json
101-webapp-with-golang/azuredeploy.parameters.json
test/tests.js
Using this information, we can match directory names like those shown above and generate tests only for templates that have changed. This method is necessary because the repository holds a vast number of templates, and forcing all of them into compliance at once is not reasonable.
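The directory matching can be sketched as below, assuming the diff output has already been captured as text with one bare file path per line. Both function names are hypothetical, and the test objects are assumed to have the args shape described earlier.

```javascript
// Given `git diff --name-only` output, return the top-level directory
// names that contain at least one changed file.
function changedDirectories(diffOutput) {
  var dirs = {};
  diffOutput.split('\n').forEach(function (line) {
    var file = line.trim();
    var slash = file.indexOf('/');
    // Files at the repository root have no directory and are ignored.
    if (slash > 0) { dirs[file.slice(0, slash)] = true; }
  });
  return Object.keys(dirs);
}

// Keep only the tests whose template directory was touched by the change.
function filterChangedTests(tests, diffOutput) {
  var changed = changedDirectories(diffOutput);
  return tests.filter(function (test) {
    return changed.indexOf(test.args[0].split('/')[0]) !== -1;
  });
}
```

Filtering the generated tests rather than the raw paths means changes outside template folders, such as edits to the test harness itself, simply match nothing.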
By only testing changes to the repository, we decrease workflow time tremendously since contributors only have to be concerned about their own templates passing validation. At the same time, maintainers can run a test on the entire repository by changing an environment variable.
How to Submit Your Own Azure Template
Ensuring Your Template Compliance
Once you’re done creating your template, ensure:
- Your template deploys via PowerShell, Azure CLI or the Azure portal
- Any unique parameters in your azuredeploy.parameters.json file such as domain names or storage account names have the placeholder GEN-UNIQUE or GEN-UNIQUE-[N] where [N] is the length of the unique parameter.
- Any public SSH key parameter in your azuredeploy.parameters.json file has SSH-PUB-KEY as the placeholder value
- Read the contribution guide to make sure you comply with template folder naming conventions, and also follow the GitHub workflow
Checking Your Template Validation
When you create a new pull request to the Quickstart Templates repository, you’ll notice a box toward the bottom of the feed indicating your test run is in progress:
Click the ‘Details’ link, and you can see the tests run live, which will show you the actual error log output.
After some time (depending on how many templates you’ve submitted) you’ll either see a pass or fail message.
Pass:
Fail:
If your test fails and you aren’t exactly sure why, you can check out the parameters used by the test in the error output. The tests use the cross-platform azure-cli and, as the test output suggests, you can do the template validation yourself by executing:
azure group create (your_group_name) westus
### Validate your template
azure group template validate --resource-group (your_group_name) --template-file 201-1-vm-loadbalancer-2-nics/azuredeploy.json --parameters-file 201-1-vm-loadbalancer-2-nics/azuredeploy.parameters.json
### Deploy your template
azure group deployment create --resource-group (your_group_name) --template-file 201-1-vm-loadbalancer-2-nics/azuredeploy.json --parameters-file 201-1-vm-loadbalancer-2-nics/azuredeploy.parameters.json
You can then check the Azure portal for the deployment error diagnostics. After your template passes validation, maintainers will see this on your pull request and be able to merge your template pending a code review.