The history of defect tracking in the Windows team goes back to Windows 1.0, which used a text file.
After Windows 1.01 released, a bunch of people in the apps division got together and threw together a bug tracking database. Because hey, a database, wouldn’t that be neat?
The name was chosen by vote among the team, and the selected name was RAID, which is the name of a brand of insecticide whose advertisements in the United States use the tag line “Kills bugs dead.” The icon for the program was a can of bug spray, naturally.
The letters RAID were retroactively declared to be an acronym for “Reporting and Incidents Database”, but nobody knew that or cared. It was RAID.
After you built a bug query, you could save it for future use, and the file extension was .rdq, short for “RAID Query”.
The name RAID was linguistically productive, because you can “RAID a bug”, which means “File a bug in the project’s RAID database.” The .rdq also could be used as its own noun, meaning the query file. “Can you send me the .rdq for the bugs we are reviewing tomorrow?”
The database was written back in the days of 16-bit computing, so naturally it had a limit of 32,767 bugs. This was sufficient for many years, but eventually products encountered the record limit and had to “roll over” to new databases, where all bugs from the old database that hadn’t yet been closed were copied to a new database (and received new record numbers), and the old database was put into read-only mode.
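The 32,767 ceiling is what you would get if the record number lived in a signed 16-bit integer, whose largest value is 2^15 − 1. The post does not say how RAID actually stored its record numbers, so the sketch below is only an illustration under that assumption; `wrap_int16` and `next_record_number` are hypothetical names, not anything from RAID itself.

```python
# Assumption: record numbers are stored in a signed 16-bit field.
INT16_MAX = 2**15 - 1  # 32,767, the largest signed 16-bit value


def wrap_int16(n: int) -> int:
    """Interpret n as a signed 16-bit two's-complement value."""
    return (n + 2**15) % 2**16 - 2**15


def next_record_number(current: int) -> int:
    """Return the next record number, refusing to go past the 16-bit limit."""
    if current >= INT16_MAX:
        # Hypothetical behavior: the database is full and must be rolled over.
        raise OverflowError("record limit reached; roll over to a new database")
    return current + 1
```

Under this assumption, one more record past 32,767 would not fit: `wrap_int16(32768)` comes out negative, which is why a full database had to be replaced rather than simply extended.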
Naturally, this created confusion when you were reading through some code, and it had a comment like “This fixes bug 3141,” with no indication as to which bug database that bug number refers to.
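To see why a bare bug number becomes ambiguous, here is a toy model of a rollover. It is a sketch only: the `BugDatabase` class, the `roll_over` function, and the bug titles are all made up for illustration, and the only behavior taken from the description above is that open bugs are copied to the new database with new numbers and the old database becomes read-only.

```python
from dataclasses import dataclass, field


@dataclass
class BugDatabase:
    """Hypothetical stand-in for one bug database."""
    name: str
    bugs: dict[int, str] = field(default_factory=dict)  # record number -> title
    closed: set[int] = field(default_factory=set)
    read_only: bool = False

    def file_bug(self, title: str) -> int:
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only")
        number = len(self.bugs) + 1  # numbers are handed out sequentially
        self.bugs[number] = title
        return number


def roll_over(old: BugDatabase, new_name: str) -> BugDatabase:
    """Copy still-open bugs into a fresh database, renumbering them."""
    new = BugDatabase(new_name)
    for number, title in sorted(old.bugs.items()):
        if number not in old.closed:
            new.file_bug(title)  # receives a new record number
    old.read_only = True
    return new


db1 = BugDatabase("first")
db1.file_bug("Crash on startup")   # bug 1
db1.file_bug("Typo in dialog")     # bug 2
db1.file_bug("Printing hangs")     # bug 3
db1.closed.add(1)                  # bug 1 was fixed before the rollover

db2 = roll_over(db1, "second")
```

After the rollover, “bug 2” names a different bug in each database (`db1.bugs[2]` versus `db2.bugs[2]`), so a code comment that gives only the number cannot tell you which one it meant.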
I think Windows 95 went through three RAID databases during its life.
The original authors of RAID had no idea that their little bug tracking database tool would be the primary defect tracking tool across all of Microsoft for multiple decades. If they had known, they might have been too scared to write it. When looking back on the origin of RAID, one of the original developers confessed, “It really wasn’t made to last that long. Sorry!”
Another scalability problem was that by the time the Windows XP project was chugging along, you would get into situations where there were so many people using RAID at once that the server would simply stop accepting new connections. When the ship room convened to go over the state of the Windows project, they sometimes had to call into operations and ask them to kill a few active connections to the back-end database so that the ship room could connect.
It was clear that RAID was being pushed far beyond what it was originally designed for. A new defect tracking system was developed, named Product Studio, because naming your app Something Studio was fashionable at the time.
Product Studio didn’t have a limit of 32,767 records. It used a three-tier architecture for improved reliability and flexibility. It supported file attachments!
Product Studio served as the primary bug-tracking database for many years. But even with its improved architecture, you often ran into cases where the app stopped responding and simply told you “There was an error contacting the middle tier.”
I liked to joke that we should just get rid of that middle tier. It’s always the one that’s causing problems.
Product Studio kept things going until Windows 8, at which point Windows switched to on-premises Team Foundation Server for work item tracking.
The most recent move was in Windows 10, when the Windows team switched to Visual Studio Online for its work item tracking database. Mind you, that doesn’t mean that things have been stable, because the name of the service changed from Visual Studio Online to Visual Studio Team Services, and then again to Azure DevOps Services.
Even Azure DevOps wasn’t big enough to contain all of the Windows work items. Periodically, old work items are archived and moved to another project.¹ But at least the remaining work items didn’t get renumbered. They kept their old numbers, thank goodness.
¹ Unfortunately, the archive project renumbers the work items. Fortunately, the original work item is remembered in the title, so you can do a search for originalid:3141 to find the old work item known as number 3141.
I was in Critical Problem Resolution for Exchange 1999-2010ish… I remember when we switched from RAID to Product Studio: while the interfaces were somewhat similar, the performance difference was like night and day. There were a lot of bugs initially, but it seemed that the PTT handled them pretty quickly when reported.
Where I’ve been since leaving MSFT, we use TFS, and it serves our purposes well.
While SD was kind of a PITA to work with, it worked well… but I would have loved to have something like TFS for the Exchange codebase back then.
> Naturally, this created confusion when you were reading through some code, and it had a comment like “This fixes bug 3141,” with no indication as to which bug database that bug number refers to.
It’s easy. If the bugfix is for item 3141 in the third database, just say “This fixes bug 3-3141,” much like how we cite a page as “volume+page_num” when discussing books.
At the time we were writing fixes (in my case, Exchange), we didn’t necessarily know what database number/version it was… We just knew the bug number… RAID went away soon after I started at MSFT, but back then, many of us writing fixes in CPR didn’t know its back-end limitations.
Exactly. Nobody remembers how many rollovers there have been. The number 3141 is the number that shows up in queries, reports, email, etc. It’s like nobody prints “206-555-1212 after the area code split of 1995” on their business cards. They just write “206-555-1212”, and then you call that number and it’s the wrong number, because the card was printed in 1996, and the number changed as a result of the area code split of 1997.
Yeah I would’ve thought so too. And fortunately if you’re referencing a bug because you’re fixing it, then it wouldn’t (shouldn’t!?) be found in a later bug database under a different number.
Sure, but there’s every possibility that a completely different bug is in the new database with that bug number. So when you look for the bug that was fixed you find some bug that (if you’re *lucky*) is in a completely different part of the application and couldn’t possibly have been fixed by the code you’re looking at.
If you’re unlucky, the new bug is just close enough to the old one that it’s vaguely plausible that there might be some connection between the bug and the code, and you waste a lot of time trying to puzzle out the details before you realise it was all an illusion.
Funny, when I saw RAID, the first thing that came to mind was Borland’s internal bug tracker, which was also known as RAID. Gee, I wonder where they got that name from…
RAID (the insecticide) is sold in Hong Kong; maybe it’s sold in the UK too.
RAID and Mortein were the two big brands here in Australia when I was growing up. They still are, except I have the luxury of not seeing any TV ads these days so can only judge based on shelf space and eye-level placement when shopping for groceries.
> ¹ Unfortunately, the archive project renumbers the work items. Fortunately, the original work item is remembered in the title, so you can do a search for originalid:3141 to find the old work item known as number 3141.
Employees can also visit https://task.ms/3141, which routes you to the bug you wanted, even if it was archived and renumbered.
The two things I remember most about RAID after all these years are:
1. Admin access to the RAID database was determined by having specific credentials in the connection string in the .rdq file. If you asked someone “Hey, can you send me the .rdq for the Frobnozzle database?”, and that someone was an admin for that database, there was a pretty good chance they’d accidentally send you the .rdq with the admin credentials, at which point _you_ were now an admin for the Frobnozzle database.
2. I was mildly disturbed the first time I ran a big query, waited for it for 10 minutes, and then got a message box saying “You have been chosen as the deadlock victim.”
This brings back memories. I was the first lead for Product Studio from ’99 until ’05, and it was one of a suite of tools that came out of the newly organised Productivity Tools Team, which was part of the Windows division. The others that have been mentioned on this blog and elsewhere were Source Depot, for source control tracking, and LocStudio, which was the primary localisation tool for most of the other products we shipped. When I left, PTT was re-organised into DevDiv, and most of my old team went on to build Team Foundation Server using much of the experience gained from working on Source Depot and Product Studio. Frankly, those six years at PTT were the best I spent in my 15-year career at Microsoft.
RAID, incidentally, wasn’t just one version but at least two. Office cloned the source and built their own custom version, and one of my jobs when we brought RAID into PTT was to merge the Windows and Office versions into one. Office had its own peculiar customisations specific to how they worked, so it wasn’t easy, but the knowledge we gained went into making Product Studio flexible enough for all of the big six divisions.
(The precursor to Source Depot came from Windows and was written by one of the original NT devs (Steve Wood, I think) in response to requirements to store the NT source. The challenges of getting the Windows source code onto Source Depot were legion and most of the technical designs we made were primarily to meet its requirements. In any case TFS wasn’t up to it so they went from Source Depot to an especially customised version of Git.)
I think SLM dates before NT.
I think you’re right. My memory is a bit hazy but I remember SLM quite well in the early days and seeing Steve’s name in the source code function headers.
When I joined MSFT in Jan ’94, both SLM and Windows NT existed. So SLM is at least as old as ’94. How much older, I dunno.