Caching and faster artifacts in Azure Pipelines

Alex Mullans


I’m excited to announce the public previews of pipeline caching and pipeline artifacts in Azure Pipelines. Together, these technologies can make every run of your pipeline faster by accelerating the transfer of artifacts between jobs and stages, and by caching the results of common operations like package restores.

Pipeline caching

Pipeline caching introduces a new CacheBeta task that takes a path of files to cache and a cache key. A cache key can be the contents of a file (like a package lockfile), a string of your choice, or a combination of both.

For example, to cache Node.js dependencies installed with Yarn:

steps:
- task: NodeTool@0
  inputs:
    versionSpec: '10.x'
  displayName: 'Install Node.js 10.x'

- task: CacheBeta@0
  inputs:
    key: |
      $(Agent.OS)
      $(Build.SourcesDirectory)/yarn.lock
    path: $(Pipeline.Workspace)/.cache/yarn
  displayName: 'Cache yarn'

- script: yarn install
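
Note that the step above only helps if Yarn actually uses `$(Pipeline.Workspace)/.cache/yarn` as its cache folder, which isn’t Yarn’s default location on every agent. One way to make sure cache hits are reused (a sketch, assuming Yarn 1.x, which reads the `YARN_CACHE_FOLDER` environment variable) is to set it on the install step:

```yaml
# Sketch: point Yarn's cache at the directory the CacheBeta task restores,
# so that a cache hit is actually reused by yarn install.
- script: yarn install
  env:
    YARN_CACHE_FOLDER: $(Pipeline.Workspace)/.cache/yarn
```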

As we’ve implemented pipeline caching, we’ve learned that every tool behaves differently. So, we’re excited to release this preview and see the cache used in the wild. If you try out the cache and find it doesn’t improve the performance of the step you’re caching, we’d like to hear about it as an issue on azure-pipelines-tasks. If you can create a public repo and pipeline that we can fork to reproduce your issue, all the better. We’ll be listening to these issues and tuning the cache.

We’ve already got some improvements planned, including preserving file attributes and fallback keys, that we’ll ship while the preview is running. We look forward to your ideas and feedback.

There’s a possibility we’ll make breaking changes between v0 and v1 of the task, so we recommend not yet including the cache in production/master branch CI builds.

Learn more about pipeline caching on docs.microsoft.com.

Pipeline artifacts

Pipeline artifacts are the built-in way to move files between jobs and stages of your pipeline and to save files from your pipeline for later use. Pipeline artifacts intelligently deduplicate content and only upload net-new content to the server, which helps speed up repeated builds of the same repository.

To use pipeline artifacts, use the publish and download YAML shortcuts. For example, to publish the contents of the bin directory as an artifact named binaries:

steps:
- publish: bin
  artifact: binaries

By default, all artifacts published by previous jobs are downloaded at the beginning of subsequent jobs, so it’s not necessary to add a download step. If you want to control this behavior, use the download shortcut, like this:

steps:
- download: none
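
The download shortcut can also select a single artifact and filter which of its files are downloaded. For example, a sketch (assuming an artifact named binaries was published by an earlier job) that downloads only that artifact’s .dll files:

```yaml
steps:
# Download just the 'binaries' artifact from the current run,
# keeping only files that match the pattern.
- download: current
  artifact: binaries
  patterns: '**/*.dll'
```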

For most common use cases, the publish and download shortcuts are recommended. However, if you need more control over how your artifacts are downloaded, you can also use the Download Pipeline Artifact and Publish Pipeline Artifact tasks directly.
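
As a sketch of the task-based approach (input names here follow the v2 task reference and may differ across task versions; the artifact name is assumed), downloading an artifact to an explicit path looks like:

```yaml
steps:
- task: DownloadPipelineArtifact@2
  inputs:
    artifact: binaries                    # name of the published artifact (assumed)
    path: $(Pipeline.Workspace)/binaries  # explicit destination folder
```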

Learn more about pipeline artifacts on docs.microsoft.com. And, if you run into issues, let us know on Developer Community.

Alex Mullans

Senior Program Manager, Azure Artifacts

13 comments

  • Sam Smith

    Hi Alex. Thanks for posting this. I’m confused how I would use this with my NuGet packages, so I don’t have to run a dotnet restore for every build. I don’t see any docs about this. Did I miss something? 

  • Adriaan de Beer

    That’s awesome, thanks Alex – this should greatly help with slow NuGet restores and (not yet tried) multiple stages within the same pipeline that use the same artifacts.
    To get it to work with NuGet I used this:

    steps:
    - task: NuGetToolInstaller@0
      displayName: 'Use NuGet 4.4.1'
      inputs:
        versionSpec: 4.4.1

    - task: CacheBeta@0
      displayName: 'CacheBeta (to speed up NuGet restore)'
      inputs:
        key: |
          $(Agent.OS)
          $(Build.SourcesDirectory)/Src/Apps/MyProj/MyProj/packages.config
          "Nuget_V4.4.1"
        path: '$(Build.SourcesDirectory)/Src/Apps/MyProj/packages'

    - task: NuGetCommand@2
      displayName: 'NuGet restore'
      inputs:
        restoreSolution: '$(Parameters.solution)'
        vstsFeed: '/79667df1-fe11-4171-bdb6-927e0ca39336'
        restoreDirectory: '$(Build.SourcesDirectory)/Src/Apps/MyProj/packages'
        verbosityRestore: Normal

    Specifically – to get it working, I had to explicitly tell the Azure DevOps NuGet restore task which package directory to use (the NUGET_PACKAGES environment variable doesn’t seem to apply). That said, some observations:
    I got about a 50% improvement with NuGet restore – specifically, down from 2m20s to 1m15s. NuGet being NuGet, it still seems unnecessarily slow, as the cache task itself only takes 30 seconds (so in reality, I only saved 30s). What would be even better is conditionally running NuGet restore based on whether or not there was a cache hit. Likely this would be very handy for other pipelines too.

  • Adriaan de Beer

    Brilliant – I just noticed you can output whether or not there was a cache hit to a variable. This allows you to conditionally disable unnecessary tasks.
    So, to get this working, just add a cacheHitVar to the cache step – e.g.
      cacheHitVar: 'Nuget_Cache_Hit'
    And then set the next step’s (in my case the NuGet restore) custom condition to:
      and(succeeded(), eq(variables['Nuget_Cache_Hit'], 'false'))

    Unfortunately, even though the above works, the project compile still failed and wanted a proper NuGet restore. So the above doesn’t work for the NuGet restore scenario.
    Specifically, I got the following error:
    Error NETSDK1004: Assets file 'D:\a\1\s\Src\Public\MyProj\obj\project.assets.json' not found. Run a NuGet package restore to generate this file.
    Any chance you can further enhance/simplify/document the NuGet restore scenario?

  • Ben Coleman

    It would be useful to have some examples of caching with various package management systems.
    I’m using Python 3 (with the UsePythonVersion task) and pip. I run `pip install -r requirements.txt`, which is a standard step in any Python build, and I’d really like to cache the packages. However, it’s not clear what path I would need to cache.

  • Riccardo Corradin

    Hi Alex,
    Thanks for this awesome news. I experimented with this, but I must say I stopped using it and reverted to the pipeline caching solution from MS DevLabs: https://marketplace.visualstudio.com/items?itemName=1ESLighthouseEng.PipelineArtifactCaching
    The main reason is that the latter is much faster at restoring caches; see my screenshots for a comparison.
    I’m apparently unable to insert screenshots in this forum – if you contact me, I’m happy to provide them.

    I tested with a node_modules size of 434.8 MB. CacheBeta: 1m 10s; MS DevLabs solution: 23s.

  • Abhishek Kumar

    Hi Alex,
    This seems like an interesting feature. We are in the process of creating pipelines for a huge project where, in a single Azure DevOps project, we will have around 700+ Java Azure Repos repositories, and as many build pipelines.
    The builds use Maven, where we would use this heavily, but our concern is: if we make pom.xml the key (we have main and child poms), how will this solution benefit us? As I understand from the documents, there will be a different cache for every branch. How can we achieve a cache at the Azure DevOps project level, rather than a cache per pipeline and per branch? That is, all 700+ builds of the project would use the same cache, and if any new dependency is added, it would be added to the cache as well.

    Currently we run our builds on a Jenkins server where we have a common location, */.m2/repository. Any new dependency of any build is downloaded there once, and it is then available for all subsequent builds.

    My question is how we can achieve the same kind of behaviour (a common cache across the project) with the Azure cache.

    Thanks,
    Abhishek
