Let’s Hack a Pipeline: Shared Infrastructure
Welcome back to Let’s Hack a Pipeline. We’ve seen argument injection and source code stealing. This week, we’ll wrap up the miniseries with Episode III: a Shared Infrastructure attack.
One more time: security is a shared responsibility. The purpose of this series is to showcase some potential pitfalls to help you avoid them.
Let’s say I’m part of a large company called Fabrikam.
Fabrikam’s Azure DevOps organization is divided into lots of separate projects.
We have a centralized team responsible for setting up pipelines infrastructure.
The central team has created an agent pool of powerful build machines called FabrikamPool. The pool is shared with several projects, so every team has access to these beefy machines.
This isn’t so much “an” attack as a class of possible attacks. All I have to do is compromise one pipeline in any project. I could be an outside attacker or even a malicious insider. Maybe I use one of the attacks mentioned in previous Let’s Hack a Pipeline posts.
From there, I can do anything to the agent that its credentials allow. I can also change anything on the host machine that the agent has access to. Maybe I install a persistent backdoor which lets me remote into the machine. Or, maybe I install a filesystem watcher that can automatically steal code the next time a pipeline runs from another project. What about a hacked compiler that adds malicious code to everything it compiles? Once I’ve poisoned the agent, my attack possibilities are basically endless.
Why this works
When you run a pipeline job, you’re extending the trust boundary out to a machine that’s beyond Azure DevOps’s direct control. If multiple projects are each targeting the same agent instance, then everyone is at the mercy of the least-defended pipeline. And most self-hosted agents are persistent, meaning that the environment they run on lasts beyond the scope of a single job.
The agent segregates runs from different pipelines into their own folders, but that’s for convenience, not secure isolation. In fact, there’s a whole class of non-malicious, but still painful, “poisoning” that can take place. Imagine a pipeline which needed to test changing some operating system-level feature (TLS 1.2, localization) or installing a global package (new version of Python). All the other pipeline jobs which run on that agent would see the changed environment, whether or not they expected it.
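The effect is easy to demonstrate in miniature. In the sketch below (plain Python, with a dict standing in for host-wide state such as OS settings or globally installed tools), two "jobs" run back to back on the same persistent agent, and the second one inherits the first one's changes whether it expected them or not:

```python
# A dict standing in for host-wide state on a persistent agent
# (OS-level settings, globally installed tools, etc.).
host_state = {"python": "3.7", "tls_min_version": "1.0"}

def job_a(state):
    # Pipeline A "needs" a newer global Python and stricter TLS for its tests.
    state["python"] = "3.8"
    state["tls_min_version"] = "1.2"

def job_b(state):
    # Pipeline B just reads whatever environment it lands on.
    return dict(state)

# On a persistent agent, consecutive jobs share the same host state.
job_a(host_state)
observed = job_b(host_state)
print(observed)  # Pipeline B now sees Python 3.8 and TLS 1.2, ready or not.
```

Per-pipeline work folders don't help here, because the mutated state lives at the host level, outside any single job's directory.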
Mitigating attacks on infrastructure
The best way to mitigate this attack is to not share infrastructure. Projects are a pretty firm security boundary, and shared pools violate that boundary.
One-time use agents
For any pipelines where you can, prefer Microsoft-hosted agents or scale set agents configured for one-time use. Because the agent is reimaged after each job, there's no standing, persistent infrastructure to attack.
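The one-time-use property can be sketched in a few lines of Python (purely illustrative; real scale set agents are reimaged at the VM level, not the directory level). Every job gets a fresh workspace that is destroyed when the job ends, so nothing a malicious job plants survives to the next one:

```python
import pathlib
import shutil
import tempfile

def run_job(job):
    """Run a job in a throwaway workspace, then destroy the workspace."""
    workspace = pathlib.Path(tempfile.mkdtemp(prefix="agent-"))
    try:
        return job(workspace)
    finally:
        shutil.rmtree(workspace)  # the "reimage": nothing persists afterward

# Job 1 tries to plant a backdoor in its workspace...
def evil_job(ws):
    (ws / "backdoor.sh").write_text("#!/bin/sh\necho pwned\n")
    return ws

planted_in = run_job(evil_job)

# ...but by the time the next job could run, the workspace is already gone.
print(planted_in.exists())  # False
```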
If you can’t use Microsoft-hosted or scale set agents, run your self-hosted agent software with the minimal privileges needed to run your pipelines. Remember, the agent’s job is to run arbitrary, untrusted code (that’s what CI does, after all). Your host machine should treat the agent as potentially malicious.
Malicious pipelines in the same project
Security exists in balance with other attributes like usability and maintainability. While it would increase security to maintain a dedicated pool of agents per-pipeline, this would likely be expensive, hard to use, and hard to manage. We think that the project level is an appropriate middle ground: you get a fair amount of isolation while still keeping most of the benefits of shared infrastructure. Your security team may feel differently, and you should absolutely consider other spots along this spectrum.
Sharing pools across projects is tempting, since you can potentially save money and complexity by setting up fewer agents. However, that configuration puts all pipelines in all projects at risk: the most insecure pipeline in any project sets the security level for every pipeline that shares the pool.
Great post, thanks!
A question around single-use agents: when can we expect this support for containers within Kubernetes?
We chose AKS as the infrastructure for our self-hosted agents. It would be great if the agent deployment could scale dynamically or kill itself automatically when a job is done, to prevent workspace sharing.
Getting an expert answer on this one – hang tight.
Thanks for the suggestion. We are evaluating this possibility at this time. I will have an update on the timelines in the next month or so.
If you check the Azure DevOps feature timeline, you can expect this in Q3 2020 at the latest:
There is a specific item that will support AKS and elastic self-hosted agent pools: