October 31st, 2016

Automated Azure Template Validations

Anthony Turner
Sr. Software Security Engineer

This past year, Azure introduced the concept of ‘Resource Groups’ and the ability to write templates to deploy resources to Azure. This makes it easy to describe your infrastructure as code which you can easily redeploy.

The Azure engineering team maintains an open source repository of sample templates where anyone can contribute a useful template that can be used by anyone to deploy resources to Azure. For example, you can deploy anything from a simple Linux virtual machine to an entire Mesos cluster with Marathon and Swarm and pretty much anything in between. This repository has become the central place for all community-curated templates for provisioning Azure resources for a variety of partners such as WordPress, Dokku, Deis, Cloud Foundry and many others.

The Problem

When maintaining an open source repository, it’s important to make clear the expectation of quality and validation in each contribution to that project. Many times that’s done by utilizing continuous integration to validate that contributions work and don’t regress any other part of the project. This method makes projects very contribution-friendly while driving the quality of the project for users.

As the Quickstart Templates project has grown, one problem we encountered is the review process for each new template that enters the repository. Validation wasn’t streamlined, and there was no pre-merge validation of submitted templates. Reviewers had to validate submissions manually, which made it tough to spot pesky bugs before and during template deployments. As a result, many template files had errors ranging from invalid JSON syntax to failures when deploying to Azure Resource Manager (ARM).

What the project needed was a community-friendly way of doing basic validation of a template, as well as deploying the template to Azure to test that it works. This approach provides a guarantee to the community that every submitted and modified template works for users who wish to use them.

The Solution

Travis CI is one of the most trusted CI-as-a-Service platforms in the industry for open source projects. They provide free compute time for open source projects to run tests for each commit and pull request received on GitHub. Because the Quickstart Templates repository consists of JSON templates, using JavaScript tooling to run tests comes quite naturally.

The solution requires templates to be validated end-to-end in an automated way which is public enough for everyone in the community to view. At the same time, the solution cannot expose the credentials of the test account. The Azure Quickstart Templates repository is unique in that nearly all templates are self-contained in their folder, and each should conform to a uniform convention. A typical template folder has the following structure:

  • azuredeploy.json – The actual Azure Resource Manager template language file, describing the resources to deploy
  • azuredeploy.parameters.json – The placeholder parameters file to use for the deployment
  • metadata.json – A metadata file describing the template, the date created and the author information. The Azure template search indexer uses this file.
  • README.md – A readme file including information about the template and a ‘Deploy to Azure’ button for quick point-and-click deployments of a template.
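
As a rough illustration of this convention, the sketch below reports which of the four expected files are missing from a template folder. The name missingTemplateFiles is hypothetical and is not taken from the repository's test suite.

```javascript
const fs = require('fs');
const path = require('path');

// The four files every template folder is expected to contain (from the list above).
const REQUIRED_FILES = [
  'azuredeploy.json',
  'azuredeploy.parameters.json',
  'metadata.json',
  'README.md'
];

// Hypothetical helper: returns the required files that are absent from templateDir.
function missingTemplateFiles(templateDir) {
  return REQUIRED_FILES.filter(function (name) {
    return !fs.existsSync(path.join(templateDir, name));
  });
}
```

An empty array means the folder follows the layout; anything else names the files a contributor still needs to add.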

With a fully end-to-end validation solution, we quickly spotted many templates with issues ranging from naming conventions to ARM template language errors.

Architectural Diagram

The diagram below outlines the high-level overview of the solution’s architecture. We’ll do a deep dive of each part later but the basic flow is:

Diagram of solution detailed in the list below, showing how template contributor pulls Quickstart template and validates it using Travis CI and ARM validation server.

  1. Contributors open pull requests against the Azure/azure-quickstart-templates repository for new template contributions.
  2. Travis CI responds to the GitHub pull request web hook and clones the pull request contents.
  3. Travis executes the dynamically generated mocha.js test suite within the repo and deploys in parallel only the modified templates in the pull request.
  4. The ARM validation server accepts the requested templates, deploys them to Azure and returns the deployment result back to Travis CI.
  5. Travis receives the test result from the ARM validation server and reports the status back to the GitHub pull request.
  6. The contributor sees the test result on their pull request.

Validations

Metadata Files

To validate that the metadata.json file conforms to the required schema, we use an awesome Node.js module called skeemas to describe the required schema and validate the JSON against it:

var metadata = tryParse(metadataPath, metadataData);

var result = skeemas.validate(metadata, {
  properties: {
    itemDisplayName: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 60
    },
    description: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 1000
    },
    summary: {
      type: 'string',
      required: true,
      minLength: 10,
      maxLength: 200
    },
    githubUsername: {
      type: 'string',
      required: true,
      minLength: 2
    },
    dateUpdated: {
      type: 'string',
      required: true,
      minLength: 10
    }
  },
  additionalProperties: false
});

 

You can see that the second parameter describes the required schema of the metadata file by specifying the field names and their minimum and maximum lengths, as well as enforcing that no other properties exist. This requirement makes it easy to validate the template without the need to write additional logic.
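
The snippet above calls a tryParse helper that is not shown. The sketch below is one plausible shape for it, together with a dependency-free function that applies the same length and property rules by hand, purely to make explicit what the schema enforces. Both tryParse as written here and metadataErrors are assumptions for illustration, not the project's actual code.

```javascript
// Same constraints as the schema above, expressed as a plain lookup table.
const METADATA_RULES = {
  itemDisplayName: { minLength: 10, maxLength: 60 },
  description: { minLength: 10, maxLength: 1000 },
  summary: { minLength: 10, maxLength: 200 },
  githubUsername: { minLength: 2 },
  dateUpdated: { minLength: 10 }
};

// Hypothetical sketch of tryParse: parse JSON text and, on failure,
// throw an error that names the offending file.
function tryParse(filePath, contents) {
  try {
    return JSON.parse(contents);
  } catch (err) {
    throw new Error(filePath + ' is not valid JSON: ' + err.message);
  }
}

// Hypothetical hand-written equivalent of the schema check.
// Returns a list of human-readable problems; an empty list means valid.
function metadataErrors(metadata) {
  const errors = [];
  Object.keys(METADATA_RULES).forEach(function (field) {
    const rule = METADATA_RULES[field];
    const value = metadata[field];
    if (typeof value !== 'string') {
      errors.push(field + ' is required and must be a string');
    } else if (value.length < rule.minLength) {
      errors.push(field + ' is shorter than ' + rule.minLength + ' characters');
    } else if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push(field + ' is longer than ' + rule.maxLength + ' characters');
    }
  });
  // Mirrors additionalProperties: false.
  Object.keys(metadata).forEach(function (field) {
    if (!Object.prototype.hasOwnProperty.call(METADATA_RULES, field)) {
      errors.push(field + ' is not an allowed property');
    }
  });
  return errors;
}
```

Writing the rules out this way shows why a schema library is attractive: the declarative version is shorter and leaves no branching logic to maintain.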

Template Files

Template and parameters files are statically validated using the Azure API, so no real logic was required besides submitting the template and parameters for validation. If static validation passes, then the template is deployed to an ephemeral resource group. The validation passes if the deployment succeeds; otherwise, it fails with the Azure Resource Manager error.

Contributor-Friendly CI

If you are familiar with Travis, you know that it runs CI test suites on GitHub pull requests and puts a pass checkmark or a fail X mark next to the commits in your pull request. This mark tells contributors of new templates whether their contribution to the Quickstart Templates repository is formatted correctly.

Since running CI tests on a pull request is essentially the same thing as running arbitrary code from anyone on the internet, Travis doesn’t allow for any secure environment variables to be exposed to pull requests. It treats pull requests as untrusted code since anyone can execute a printenv command to reveal your sensitive information to the world.

At the same time, full validation of templates submitted by the community is valuable for ensuring the quality of the sample templates.

An ARM Validation Server

To protect our test account credentials, while still allowing for deployment validation, an ‘ARM Template Server’ was built, which essentially is a wrapper around the Azure CLI exposing a simple RESTful API to validate and deploy templates without needing credentials. This setup allows us to make subscription management changes while providing limited access to the requesting client. This server is very simple, tiny and uses the Express web framework for Node.js deployed using Dokku on Azure.

The most important thing about the server is to remove resources as soon as they are provisioned to minimize the load on our subscription, as well as avoid running arbitrary untrusted code within our subscription.

The API has two simple endpoints:

POST /validate

POST /validate validates the template by using the Azure CLI command azure group template validate. The post body is simple and requires the Azure deployment template as the template value and the parameters file as the parameters.

{
  "template": "[azuredeploy.json contents]",
  "parameters" : "[azuredeploy.parameters.json contents]"
}
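
A minimal sketch of how a client might assemble this body from a template folder follows. The helper name buildRequestBody is hypothetical, and whether the server expects the parsed file contents (as here) or the raw file text under each key is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the JSON request body from a template folder.
// Assumes the server accepts the parsed file contents under the
// "template" and "parameters" keys shown above.
function buildRequestBody(templateDir) {
  const read = function (name) {
    return JSON.parse(fs.readFileSync(path.join(templateDir, name), 'utf8'));
  };
  return JSON.stringify({
    template: read('azuredeploy.json'),
    parameters: read('azuredeploy.parameters.json')
  });
}
```

Parsing the files before sending them has a side benefit: a template with broken JSON syntax fails locally, before any request is made.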

 

POST /deploy

POST /deploy deploys the template using long polling. The post body is the same as the /validate endpoint. The response status is always HTTP 202 Accepted, and its body must be inspected to understand the result of the deployment.

Sometimes template parameters are required to be unique or have a public key. For the POST /deploy API, any unique parameters are replaced with a unique value. For example, given the parameters file:

{
  "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "newStorageAccountName": {
      "value": "GEN-UNIQUE-12"
    },
    "adminUsername": {
      "value": "azureuser"
    },
    "sshPublicKey": {
      "value": "SSH-PUB-KEY"
    },
    "dnsNameForPublicIP": {
      "value": "GEN-UNIQUE"
    },
    "departmentName": {
      "value": "myDepartment"
    },
    "applicationName": {
      "value": "myApp"
    },
    "createdBy": {
      "value": "myName"
    }
  }
}

 

The validation server can be configured to replace any placeholder GEN-UNIQUE or GEN-UNIQUE-[N] value with a unique value. This configuration works for other things, such as the placeholder value SSH-PUB-KEY which the server will replace with an actual SSH public key. You can read the latest documentation for more details.
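
The replacement step might look roughly like the sketch below. It is not the server's actual code: the function names, the 'ci' prefix, and the default length used for a bare GEN-UNIQUE are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Assumed default length for a bare GEN-UNIQUE placeholder; the real server may differ.
const DEFAULT_UNIQUE_LENGTH = 16;

// Returns a lowercase alphanumeric string of exactly `length` characters.
// The 'ci' prefix makes the value start with a letter, so `length` is assumed
// to be comfortably larger than two for the result to be unique.
function uniqueValue(length) {
  const hex = crypto.randomBytes(length).toString('hex');
  return ('ci' + hex).slice(0, length);
}

// Hypothetical sketch: returns a copy of a parsed parameters file with
// GEN-UNIQUE, GEN-UNIQUE-[N] and SSH-PUB-KEY placeholder values replaced.
function fillPlaceholders(parametersFile, sshPublicKey) {
  const filled = JSON.parse(JSON.stringify(parametersFile)); // deep copy
  Object.keys(filled.parameters).forEach(function (name) {
    const value = filled.parameters[name].value;
    if (typeof value !== 'string') { return; }
    const match = /^GEN-UNIQUE(?:-(\d+))?$/.exec(value);
    if (match) {
      const length = match[1] ? parseInt(match[1], 10) : DEFAULT_UNIQUE_LENGTH;
      filled.parameters[name].value = uniqueValue(length);
    } else if (value === 'SSH-PUB-KEY') {
      filled.parameters[name].value = sshPublicKey;
    }
  });
  return filled;
}
```

Working on a copy keeps the contributor's original parameters intact, so the generated values can be reported back alongside any deployment error.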

Long Polling

Long polling is a technique in which a client makes an HTTP request and the server keeps the connection open for an extended time, until it has data to return or the client disconnects. Before the advent of web sockets, this was used to simulate push-like behavior.

The default Azure deployment CLI command holds an HTTP connection open to the resource management API until the deployment is complete, which can take a while. Because of this, when a client sends a template for deployment via POST /deploy, the server holds open the HTTP connection and sends the headers immediately. What’s not obvious right away is that most cloud services, including Azure, will quietly close your HTTP connection if you don’t transmit anything in a certain amount of time. The validation server gets around this issue by sending non-significant bytes to the client (‘ ‘ characters) until the deployment completes.
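
The keep-alive trick can be sketched as follows. This is an illustration under assumptions, not the validation server's code: respondWhenDone is a hypothetical name, `work` stands for a promise that settles when the deployment finishes, and the shape of the error body is invented for the example.

```javascript
// Hypothetical helper: sends headers immediately, then writes a space at a
// fixed interval so intermediaries do not close the idle connection.
// When `work` settles, the timer stops and the JSON result is sent.
function respondWhenDone(res, work, intervalMs) {
  res.writeHead(202, { 'Content-Type': 'application/json' });
  // Leading whitespace is insignificant in JSON, so the client can still
  // parse the complete body once it arrives.
  const timer = setInterval(function () { res.write(' '); }, intervalMs);
  return work
    .then(
      function (result) { return { result: result }; },
      function (err) { return { error: String((err && err.message) || err) }; }
    )
    .then(function (body) {
      clearInterval(timer);
      res.end(JSON.stringify(body));
    });
}
```

Because the status line has to be sent before the outcome is known, the status code cannot carry the result, which is why the body alone distinguishes success from failure.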

When the deployment is complete, the status code is HTTP 202 for both success and failure, so you differentiate a successful deployment from an unsuccessful one by examining the response body:

Successful Deployment:

HTTP 202 Accepted

{
  "result": "Deployment Successful"
}

 

Failed Deployment:

HTTP 202 Accepted

{
  "_rgName": "[the deployed test resource group name. The group is in the process of deleting by the time this message returns and is not accessible by the client]",
  "command": "[the azure-cli command]",
  "parameters":"[the parameters JSON used to deploy the provided template, including any generated values]"
}
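
On the client side, telling the two cases apart could look like the sketch below. The helper name parseDeployResponse and the shape of its return value are hypothetical; the only facts relied on are the "result" value shown above and that the body may begin with keep-alive spaces.

```javascript
// Hypothetical client helper: interprets the body of a POST /deploy response.
// JSON.parse tolerates the leading keep-alive spaces, so no trimming is needed.
function parseDeployResponse(rawBody) {
  const body = JSON.parse(rawBody);
  return {
    ok: body.result === 'Deployment Successful',
    details: body
  };
}
```

Anything other than the success message is treated as a failure, and the details are kept so the test can print the command and parameters for the contributor.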

 

Cleaning up Deployments

It’s critical to quickly delete resources provisioned by the test run, especially in the case of untrusted pull requests which can be any arbitrary cloud deployment.

To combat this risk, all resource groups are deleted immediately after deployment. The server stores each resource group name in a small MongoDB database so that the deployment is always deleted from the test subscription, even across restarts. If the server restarts during a deployment, it requests deletion of any leftover resource groups still recorded in the database on startup.
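
The bookkeeping behind this can be sketched in a few lines. The example below uses an in-memory Set as a stand-in for the MongoDB collection and an injected deleteGroup function in place of the real Azure call; all names are hypothetical.

```javascript
// Sketch of the cleanup bookkeeping. `store` is a Set standing in for the
// MongoDB collection; `deleteGroup` is an injected, hypothetical function
// that asks Azure to delete a resource group by name.
function createCleanupTracker(store, deleteGroup) {
  return {
    // Record the group before deploying so a crash cannot orphan it.
    track: function (rgName) {
      store.add(rgName);
    },
    // Request deletion, then forget the group.
    cleanup: function (rgName) {
      deleteGroup(rgName);
      store.delete(rgName);
    },
    // On startup, request deletion of anything a previous run left behind.
    recover: function () {
      Array.from(store).forEach(function (rgName) {
        deleteGroup(rgName);
        store.delete(rgName);
      });
    }
  };
}
```

The important ordering is that a name is recorded before its deployment starts and removed only after deletion has been requested, so every crash point leaves the name available for recovery.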

Deploying the Server

We host our server using a Dokku server on Azure. We won’t go into much detail on the specifics, but you can check out our previous post on deploying Dokku apps to Azure. Dokku provides us with straightforward nginx web server configuration and Heroku-like deployments, and helps us run multiple instances of the server. It also allows us to update the server without stopping current test runs with zero-downtime deployments. In short, Dokku keeps previous versions of the application running until web traffic to that version is complete. During this time new requests are routed to the latest version.

Dynamically Generating Template Validations

Although the actual template validations happen on the ARM Validation Server, the Mocha.js test harness which runs within Travis CI controls the test case generation and execution for each template.

The Test Framework

Mocha is a powerful test execution framework for JavaScript tests on the server or in the browser. Our use of Mocha is unusual in that we generate tests dynamically, one per template directory. Each test is described by a very straightforward object:

{
  args: ['template-directory-name/azuredeploy.json', 'template-directory-name/azuredeploy.parameters.json', 'template-directory-name/metadata.json'],
  expected: true
}

 

By using Node’s fs.readdir API, we can obtain the list of directories to test, and generate an array of these test objects. For flexibility, tests won’t be generated for directories with a .ci_skip file placed at their root.
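
A minimal sketch of that generation step is shown below. It uses the synchronous fs.readdirSync variant for brevity, the function name generateTests is hypothetical, and a real implementation would also need to exclude non-template directories such as the test folder.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical sketch: builds one test object per directory under root,
// skipping any directory that contains a .ci_skip file.
function generateTests(root) {
  return fs.readdirSync(root, { withFileTypes: true })
    .filter(function (entry) { return entry.isDirectory(); })
    .map(function (entry) { return entry.name; })
    .filter(function (dir) {
      return !fs.existsSync(path.join(root, dir, '.ci_skip'));
    })
    .sort()
    .map(function (dir) {
      return {
        args: [
          dir + '/azuredeploy.json',
          dir + '/azuredeploy.parameters.json',
          dir + '/metadata.json'
        ],
        expected: true
      };
    });
}
```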

Using the existing Mocha functions describe and it, we can verbosely describe tests which will be used by the test reporter:

describe('Template Validation Suite', function() {

/** code omitted for brevity **/

  tests.forEach(function(test) {
    it(test.args[0] + ' & ' + test.args[1] + ' should be valid', function() {
      // validate template files are in correct place
      test.args.forEach(function (path) {
        var res = ensureExists.apply(null, [path]);
      });

      // validate metadata.json
      validateMetadata.apply(null, [test.args[2]]);

      // template validation
      return validateTemplate.apply(null, test.args)
      .then(function (result) {
        debug('template validation successful, deploying template...');
        return deployTemplate.apply(null, test.args);
      })
      .then(function () {
        // success
        return assert(true);
      })
      .catch(function (err) {
        assert(false, errorString + ' \n\nServer Error:' + JSON.stringify(err));
      });
    });
  });
});

 

Doing Things in Parallel

This kind of testing is extremely IO bound, so running template validations in parallel can save a ton of time. A few things come together nicely here:

  • Azure Resource Management APIs support simultaneous deployments
  • Node.js is asynchronous, so our test validation server can handle many parallel test requests at once
  • Using the mocha.parallel extension, we can run batches of tests in parallel within Travis

Using the parallel function within mocha.parallel we can modify the code snippet from the last section to run in parallel by batching our tests in groups:

describe('Template Validation Suite', function() {
  /** code omitted for brevity **/

  // testGroups is an array of test arrays; each group runs as one parallel batch
  testGroups.forEach(function(tests) {
    parallel('Running ' + tests.length + ' Parallel Template Validation(s)...', function () {
      tests.forEach(function(test) {
        it(test.args[0] + ' & ' + test.args[1] + ' should be valid', function() {
          // validate template files are in correct place
          test.args.forEach(function (path) {
            var res = ensureExists.apply(null, [path]);
          });

          // validate metadata.json
          validateMetadata.apply(null, [test.args[2]]);

          // template validation, followed by deployment
          return validateTemplate.apply(null, test.args)
          .then(function (result) {
            debug('template validation successful, deploying template...');
            return deployTemplate.apply(null, test.args);
          })
          .then(function () {
            // success
            return assert(true);
          })
          .catch(function (err) {
            assert(false, errorString + ' \n\nServer Error:' + JSON.stringify(err));
          });
        });
      });
    });
  });
});

 

You can batch tests by a configurable amount, which caps the number of parallel deployments and keeps the test run from violating subscription limits.
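
Splitting the generated tests into such batches takes only a small helper. The sketch below is one way to produce an array of test arrays; the name batchTests is hypothetical.

```javascript
// Splits tests into consecutive groups of at most batchSize items.
function batchTests(tests, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const groups = [];
  for (let i = 0; i < tests.length; i += batchSize) {
    groups.push(tests.slice(i, i + batchSize));
  }
  return groups;
}
```

With a batch size of 3, for example, seven changed templates would run as two batches of three followed by a final batch of one.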

Test output looks like below (some stack traces omitted for brevity):

Running 3 Parallel Template Validation(s)...
...
      ✓ 101-simple-linux-vm/azuredeploy.json & 101-simple-linux-vm/azuredeploy.parameters.json should be valid (153820ms)
      1) 101-tags-vm/azuredeploy.json & 101-tags-vm/azuredeploy.parameters.json should be valid
      2) 101-webapp-with-golang/azuredeploy.json & 101-webapp-with-golang/azuredeploy.parameters.json should be valid


  1 passing (3m)
  2 failing

  1) Template Validation Suite Running 3 Parallel Template Validation(s)... 101-tags-vm/azuredeploy.json & 101-tags-vm/azuredeploy.parameters.json should be valid:

      AssertionError: Template Validation Failed. Try deploying your template with the commands:
azure group template validate --resource-group (your_group_name)  --template-file 101-tags-vm/azuredeploy.json --parameters-file 101-tags-vm/azuredeploy.parameters.json
azure group deployment create --resource-group (your_group_name)  --template-file 101-tags-vm/azuredeploy.json --parameters-file 101-tags-vm/azuredeploy.parameters.json

Server Error:{"error":"Deployment provisioning state was not successful\n","_rgName":"sedouard-ci4a407279-5a6b-3b78-7670-947cd5dd7bcf","command":"azure group deployment create --resource-group (your_group_name) --template-file azuredeploy.json --parameters-file azuredeploy.parameters.json","parameters":"{\"$schema\":\"http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#\",\"contentVersion\":\"1.0.0.0\",\"parameters\":{\"newStorageAccountName\":{\"value\":\"@invalid-account-name\"},\"adminUsername\":{\"value\":\"okdude\"},\"adminPassword\":{\"value\":\"Cortama131\"},\"dnsNameForPublicIP\":{\"value\":\"citest25c541ad90ba1dfd\"},\"departmentName\":{\"value\":\"myDepartment\"},\"applicationName\":{\"value\":\"myApp\"},\"createdBy\":{\"value\":\"myName\"}}}"}
      + expected - actual

      -false
      +true

      at test/tests.js:288:13
  2) Template Validation Suite Running 3 Parallel Template Validation(s)... 101-webapp-with-golang/azuredeploy.json & 101-webapp-with-golang/azuredeploy.parameters.json should be valid:

      AssertionError: Template Validation Failed. Try deploying your template with the commands:
azure group template validate --resource-group (your_group_name)  --template-file 101-webapp-with-golang/azuredeploy.json --parameters-file 101-webapp-with-golang/azuredeploy.parameters.json
azure group deployment create --resource-group (your_group_name)  --template-file 101-webapp-with-golang/azuredeploy.json --parameters-file 101-webapp-with-golang/azuredeploy.parameters.json

Server Error:{"error":"Deployment provisioning state was not successful\n","_rgName":"sedouard-ci6df55539-52be-9916-e237-67bf0c4eba0f","command":"azure group deployment create --resource-group (your_group_name) --template-file azuredeploy.json --parameters-file azuredeploy.parameters.json","parameters":"{\"$schema\":\"http://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json\",\"contentVersion\":\"1.0.0.0\",\"parameters\":{\"siteName\":{\"value\":\"$!@!@#D\"},\"appServicePlanName\":{\"value\":\"demoHostingPlan\"},\"siteLocation\":{\"value\":\"West U\"},\"sku\":{\"value\":\"Free\"},\"workerSize\":{\"value\":\"0\"},\"use32bitWorkerProcess\":{\"value\":true},\"enableAlwaysOn\":{\"value\":false}}}"}
      + expected - actual

      -false
      +true

      at test/tests.js:288:13

 

The log above explains that 1 template validation passed and 2 failed. For each failure, you get a generated command line to re-run the deployment. You can take a look at a real run on Travis CI here.

Testing Changes Only

If you take a look at the Quickstart Templates, there are heaps of templates, and running every template validation for every single commit doesn’t make much sense.

Travis CI makes available to us the commit range which is exposed via an environment variable, TRAVIS_COMMIT_RANGE. Its value looks something like:

TRAVIS_COMMIT_RANGE=[Last Commit Hash]...[First Commit Hash]

 

The Last Commit Hash value is the hash of the last commit on the branch that you are testing. The First Commit Hash value is the hash of the first commit, representing the point at which the HEAD and BASE branches diverged.

By executing a git diff --name-only command using First Commit Hash...Last Commit Hash, we can precisely calculate the changes in the pull request.

git diff --name-only 8965c253425198806c9cfd7f5abd10ced263dcbb...1f39d53c1b73959e73233fc2b9e5286be952cf83

  101-simple-linux-vm/azuredeploy.json
  101-tags-vm/azuredeploy.parameters.json
  101-webapp-with-golang/azuredeploy.parameters.json
  test/tests.js

 

Using this information, we match directory names like those shown above and generate tests only for the templates that have changed. This method is necessary because the repository holds a vast number of templates, and forcing all of them into compliance at once is not reasonable.
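
The directory matching might be sketched as follows. The function takes the text printed by git diff --name-only, which lists one changed path per line, and a list of known template directories; both the function name changedTemplateDirs and its signature are assumptions.

```javascript
// Hypothetical sketch: maps `git diff --name-only` output (one path per line)
// to the sorted list of known template directories that contain a change.
function changedTemplateDirs(diffOutput, knownTemplateDirs) {
  const changed = new Set();
  diffOutput.split('\n').forEach(function (line) {
    const file = line.trim();
    const slash = file.indexOf('/');
    if (slash === -1) { return; } // blank line or a file at the repository root
    const dir = file.slice(0, slash);
    if (knownTemplateDirs.indexOf(dir) !== -1) {
      changed.add(dir);
    }
  });
  return Array.from(changed).sort();
}
```

Changes outside any template folder, such as edits to the test harness itself, produce no template tests under this scheme.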

By only testing changes to the repository, we decrease workflow time tremendously since contributors only have to be concerned about their own templates passing validation. At the same time, maintainers can run a test on the entire repository by changing an environment variable.

How to Submit Your Own Azure Template

Ensuring Your Template Compliance

Once you’re done creating your template, ensure:

  • Your template deploys via PowerShell, Azure CLI or the Azure portal
  • Any unique parameters in your azuredeploy.parameters.json file such as domain names or storage account names have the placeholder GEN-UNIQUE or GEN-UNIQUE-[N] where [N] is the length of the unique parameter.
  • Any public SSH key parameter in your azuredeploy.parameters.json file has SSH-PUB-KEY as the placeholder value
  • You have read the contribution guide, comply with the template folder naming conventions, and follow the GitHub workflow

Checking Your Template Validation

When you create a new pull request to the Quickstart Templates repository, you’ll notice a box toward the bottom of the feed indicating your test run is in progress:

GitHub in-progress dialog showing some checks haven't completed yet, with a check mark indicating the branch is up to date with the base branch

Click the ‘Details’ link, and you can see the tests run live, which will show you the actual error log output.

After some time (depending on how many templates you’ve submitted) you’ll either see a pass or fail message.

Pass:

Test pass dialog showing all checks have passed and the Travis CI build passed

Fail:

Test fail dialog showing all checks have failed and the Travis CI build failed

In case your test fails and you aren’t exactly sure why, you can check out the parameters used by the test in the error output. The tests use the cross-platform azure-cli and, as the test output suggests, you can do the template validation yourself by executing:

azure group create (your_group_name) westus

### Validate your template
azure group template validate --resource-group (your_group_name)  --template-file 201-1-vm-loadbalancer-2-nics/azuredeploy.json --parameters-file 201-1-vm-loadbalancer-2-nics/azuredeploy.parameters.json

### Deploy your template
azure group deployment create --resource-group (your_group_name)  --template-file 201-1-vm-loadbalancer-2-nics/azuredeploy.json --parameters-file 201-1-vm-loadbalancer-2-nics/azuredeploy.parameters.json

 

You can then check the Azure portal for the deployment error diagnostics. After your template passes validation, maintainers will see this on your pull request and be able to merge your template pending a code review.
