A colleague told a story of how he once broke the entire Office division’s ability to check in code because he accidentally checked in a syntax error to the script that is used to verify that your proposed change has satisfied all the pre-submit requirements such as passing static analysis and unit testing.¹
Since the script was now failing, all attempts to check in code were being blocked, including the attempts to fix the script itself!
In order to get things working again, they had to find someone who had access to the console of the machine that runs the validation tests and manually edit the script so that the script repair check-in would pass its own validation.
Just another example of I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS.
¹ The error was caused by smart quotes being used by mistake instead of straight quotes. He doesn’t know how they snuck in, and the two styles of quotation marks were sufficiently similar that it eluded everyone’s notice.
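A check for the failure described in the footnote, as an added example that is not part of the original post or the team's tooling: it reports the line and column of every typographic quote in a script. It goes beyond the footnote by also flagging other common word-processor substitutions (dashes, ellipsis, non-breaking space); the function names and the sample script are constructed for illustration, and it assumes the scripts are expected to use ASCII punctuation only.

```python
import unicodedata

# Characters word processors commonly substitute for ASCII punctuation,
# mapped to the ASCII text the author almost certainly meant.
LOOKALIKES = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # non-breaking space
}


def find_lookalikes(text):
    """Return (line, column, Unicode name, ASCII replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                hits.append((lineno, col, unicodedata.name(ch), LOOKALIKES[ch]))
    return hits


def normalize(text):
    """Replace every look-alike with its ASCII equivalent."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


# Constructed sample: a shell fragment with pasted-in smart quotes.
SAMPLE = 'if [ "$RESULT" = \u201cpass\u201d ]; then\n  echo \u2018ok\u2019\nfi\n'

if __name__ == "__main__":
    for lineno, col, name, ascii_form in find_lookalikes(SAMPLE):
        print(f"{lineno}:{col}: {name} (did you mean {ascii_form!r}?)")
```

Reporting the Unicode name matters here because the glyphs themselves are nearly indistinguishable in most review tools.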
At a previous company, most of the teams would host their on-call playbooks at a site one team was maintaining. That team also hosted their playbook there (“dogfooded”), but the story (I heard) was that when their service went down – they could not… well 🙂 – follow their own playbook. Since that time (many years ago), it was mandatory (somewhat) that you need to host your up-to-date playbooks on a USB stick, or some other way.
Wow, the linked PDF is (1) hilarious and (2) frankly offensive to HCI people. But despite being a part-time HCI person myself, I think given the trajectory from the “decently well designed, if flawed here and there” UI of Windows 7, to Windows 11, that HCI people deserve every bit of grief they get. Maybe more.
I considered this kind of problem when I made a little language for a calculator app. I wanted people to be able to add a code snippet to a Word doc and still have the code be correct. (And one of those people, of course, is me when I went to make the manual).
My solution was to make sure that every character that Word converted into something else would also be valid in the language. So...
That’s not as bad as the time Facebook messed up their network configuration, so they could no longer access the machines inside the data centers to fix the problem, and also the door locks didn’t work anymore for the same reason, so they could not get inside the data centers to fix the problem.
https://www.datacenterdynamics.com/en/news/facebook-blames-major-outage-on-maintenance-work-effectively-disconnecting-facebook-data-centers-globally/
And that’s why a few trusted and tenured developers should have a way to bypass the checks and push emergency fixes directly.
Isn’t that the path to a CVE, or an internal leak of IP to the world in these modern times?
Well, if x and y say their z is ok then never mind.
Not if you do it well, and you choose the trusted developers carefully.
I've worked places with trusted developers who can bypass the checks where they have to switch accounts to bypass the checks, and everything they do while in the bypass account creates an audit trail - which is monitored by a security team in real time, and also saved for later checks.
This tends to prevent both CVEs and leaks, because the people with access...
It can be, but you set up the system to scream at everybody when they do it. So unless it’s expected it triggers an investigation.