Some people objected to the length, the structure, the metaphors, the speculation, and fabrication. So let’s say they were my editors. Here’s what the article might have looked like, had I taken their recommendations. (Some recommendations applied to text that was also recommended for cutting. I applied the recommendations before cutting; the cuts are in gray.) You tell me whether you like the original or the edited version.
Back in the days of Windows 95 development,
one of my colleagues debugged
a line-of-business application for a major delivery service.
This was a program that the company gave to its top-tier
high-volume customers,
so that they could place and track their orders directly.
And by directly,
I mean that the program
dialed the modem
(since that was how computers communicated with each other back then)
to contact
|
[Length. The “top tier customer” part of the story is irrelevant.]
[Length. The mainframe part of the story is irrelevant.]
[Speculation.
No proof that the computer being dialed is a mainframe.
For all you know, it was an Apple ][ on the other end of the modem.]
|
Version 1.0 of the application had a notorious bug:
Ninety days after you installed the program,
it stopped working.
|
[Length. Version 1.0 is irrelevant.]
[Speculation.
No proof that the beta expiration code was left by mistake.
It could have been intentional, for whatever reason.
Probably some nefarious reason.]
|
Anyway, the bug that my colleague investigated was that if you entered a particular type of order with a particular set of options in a particular way, then the application crashed your system. Setting up a copy of the application in order to replicate the problem was itself a bit of an ordeal, but that’s a whole different story. |
[Length. Retransition no longer necessary.
The “setting up” story is irrelevant.]
|
Okay, the program is set up, and yup, it crashes exactly as described when run on Windows 95. Actually, it also crashes exactly as described when run on Windows 3.1. This is just plain an application bug. |
[Length. Irrelevant.]
|
The initial crash |
[Structure. Create heading (even though it gives away some of the story).]
|
|
[Fabrication.
All that is known is that there was a list box that lost focus to
a dialog box.]
|
Okay, well, that’s no big deal. A null pointer fault should just put up the Unrecoverable Application Error dialog box and close the program. Why does this particular null pointer fault crash the entire system? |
[Embellishment.]
|
Recovering from the crash |
[Structure. Create heading.]
|
|
[Speculation.
No way of knowing that this was what the developers were thinking
when they wrote the code.]
|
Now, 16-bit Windows didn’t have structured exception handling. The only type of exception handler was a global exception handler, and this wasn’t just global to the process. This was global to the entire system. Your exception handler was called for every exception everywhere. If you screwed it up, you screwed up the entire system. (I think you can see where this is going.) |
[Embellishment.] |
extern jmp_buf caught;
extern BOOL trapExceptions;

void scaryFunction(...)
{
    if (setjmp(caught)) return;
    trapExceptions = TRUE;
    ... body of function ...
    trapExceptions = FALSE;
}
Their global exception handler checks the |
[Speculation.
No way of knowing that this was what the developers were thinking
when they wrote the code.
No proof that the code was first written without a global exception handler,
and that the handler was added later.
No proof that every such function set this variable.
No proof that the reason for adding the
setjmp was to
protect against null pointer failures.]
|
Yes, things are kind of messed up as a result of this. Yes, there is a memory leak. But at least their application didn’t crash. |
[Embellishment.]
|
On the other hand, if the global variable is Okay, so far so good, for certain values of good. |
[Embellishment.]
|
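For readers who have not met the setjmp pattern before, here is a minimal portable sketch of the recovery scheme described above. It is not the application's code. The names simulatedFaultHandler, appTerminated, and lastValue are hypothetical, and the "fault" is an ordinary function call rather than a real processor exception, since a real 16-bit exception handler cannot be reproduced in standard C.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf caught;          /* recovery point, as in the snippet above */
static int trapExceptions = 0;  /* nonzero while a protected body is running */
static int appTerminated = 0;   /* hypothetical: set when the handler gives up */
static int lastValue = 0;       /* hypothetical: result of the protected body */

/* Hypothetical stand-in for the custom exception handler. */
static void simulatedFaultHandler(void)
{
    if (trapExceptions) {
        trapExceptions = 0;
        longjmp(caught, 1);     /* resume at the setjmp in scaryFunction */
    }
    appTerminated = 1;          /* no recovery point was armed */
}

/* Returns 1 if the body ran to completion, 0 if it was abandoned. */
static int scaryFunction(const int *p)
{
    if (setjmp(caught)) return 0;   /* reached a second time via longjmp */
    trapExceptions = 1;
    if (p == NULL) {
        /* Simulated null pointer fault inside the protected body. */
        simulatedFaultHandler();
        assert(!"not reached: the handler jumps back to setjmp");
    } else {
        lastValue = *p;
    }
    trapExceptions = 0;
    return 1;
}

/* A function that faults without arming the recovery point. */
static void unprotectedFunction(const int *p)
{
    if (p == NULL) simulatedFaultHandler();
}
```

Calling scaryFunction(NULL) comes back with 0 and the flag cleared, whereas unprotectedFunction(NULL) leaves appTerminated set. Whatever the body had half-finished when the jump happened stays half-finished, which is where the leak comes from.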
Failed recovery |
[Structure. Add heading here.]
|
These system-wide exception handlers had to be written in assembly
code because they were dispatched with a very strange calling
convention.
|
[Speculation.
No proof that the program was written with MFC in the
Microsoft Visual C++ IDE.
It could have been written with Notepad in
assembly language
that just happens to look like the assembly language generated by
the Microsoft Visual C++ compiler when it compiles code written in MFC.]
|
The |
[Need to explain the
DS register in case the reader cannot
infer this from the description that comes later.
We have established that neither the author nor the reader
is allowed to draw inferences.]
|
|
[Embellishment.]
|
The application crashes on a null pointer. The system-wide custom exception handler is called. The crash is not one that is being protected by the global variable, so the custom exception handler frees the application from memory. The system-wide custom exception handler now returns, but wait, what is it returning to?
The crash was in the application,
which means that the |
[Embellishment.]
|
That’s right, the system-wide custom exception handler crashed with an exception. |
[Embellishment.]
|
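One way to picture the failed recovery is with a toy model in which memory is a pair of segments that are either present or not. This is only a sketch under that assumption. The segment table, canResumeIn, and handleUnprotectedFault are hypothetical names, and nothing here touches real memory management.

```c
#include <assert.h>

enum { APP_SEGMENT = 0, SYSTEM_SEGMENT = 1, SEGMENT_COUNT = 2 };

/* 1 while the segment is loaded, 0 once it has been freed. */
static int segmentPresent[SEGMENT_COUNT] = { 1, 1 };

/* Returns 1 if execution can resume in the segment, 0 if resuming faults. */
static int canResumeIn(int segment)
{
    assert(segment >= 0 && segment < SEGMENT_COUNT);
    return segmentPresent[segment];
}

/* Stand-in for "frees the application from memory". */
static void freeApplication(void)
{
    segmentPresent[APP_SEGMENT] = 0;
}

/*
 * Hypothetical handler for a fault that is not protected by the flag:
 * it frees the application and then returns to the code that faulted.
 * Returns 1 if that return succeeds, 0 if the return itself faults.
 */
static int handleUnprotectedFault(int faultingSegment)
{
    freeApplication();
    return canResumeIn(faultingSegment);
}
```

In this model, handleUnprotectedFault(APP_SEGMENT) reports failure, because the handler has just freed the place it is about to return to.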
The chain reaction |
[Structure. Add heading here.]
|
|
[Embellishment.]
|
Since an exception was raised, the custom exception handler is called recursively. Each time through the recursion, the custom exception handler frees all the DLLs and memory associated with the application. But that’s okay, right? Because the second and subsequent times, the memory was already freed, so the attempts to free them again will just fail with an invalid parameter error.
But wait, their list of DLLs associated
with the application included
|
[Speculation.
No way of knowing that this was what the developers
were thinking when they wrote the code.]
|
Therefore, each time through the loop,
the usage counts for
|
[Embellishment.]
|
Boom, bluescreen. Hot flaming death.
|
[Length. Irrelevant.]
|
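The usage-count arithmetic behind the chain reaction can be sketched with a small simulation. The assumptions are loud ones: the Module structure, the module names, and the starting counts are all hypothetical, and "shared" simply means a module that other programs are also using. The sketch shows why a repeated cleanup pass fails harmlessly on modules that are already gone and keeps succeeding on a module that still has other users.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;   /* hypothetical module name */
    int usageCount;     /* 0 means the module is unloaded */
} Module;

enum { FREE_OK = 0, FREE_INVALID = 1 };

/* Drops one usage; fails with an invalid-parameter result if already unloaded. */
static int freeModule(Module *m)
{
    assert(m != NULL);
    if (m->usageCount <= 0) return FREE_INVALID;
    m->usageCount--;
    return FREE_OK;
}

/* One pass of the cleanup: try to free every module on the application's list.
   Returns how many frees succeeded. */
static int cleanupPass(Module *const *list, size_t count)
{
    int freed = 0;
    for (size_t i = 0; i < count; i++) {
        if (freeModule(list[i]) == FREE_OK) freed++;
    }
    return freed;
}

/* Repeats the cleanup, as a re-entered handler would, until the watched
   module is unloaded or maxPasses is reached. Returns the number of passes. */
static int passesUntilUnloaded(Module *const *list, size_t count,
                               const Module *watched, int maxPasses)
{
    int passes = 0;
    while (watched->usageCount > 0 && passes < maxPasses) {
        cleanupPass(list, count);
        passes++;
    }
    return passes;
}
```

With a private module at usage count 1 and a shared module at usage count 4, the first pass frees both, every later pass fails on the private module and succeeds on the shared one, and four passes are enough to unload the shared module out from under everybody else.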
Bonus chatter: What is that whole different story mentioned near the top? |
[Length. Cut the entire bonus chatter. Irrelevant story.]
|
Well, when the delivery service sent the latest version
of the software to the Windows 95 team,
they also provided an account number to use.
My colleague used that account number to
try to reproduce the problem,
and since the problem occurred only after the order was
submitted,
she would have to submit delivery requests,
|
[Fabrication.
No proof that these were the addresses and orders used.
All that is known is that fictitious orders were placed.]
|
After
about two weeks of this,
my colleague got a phone call from
people identifying themselves as
Microsoft’s
shipping department.
|
[Speculation.
No proof that the call truly came from the shipping department.
Could have been a lucky prank call.]
[Fabrication.
No transcript of this call exists.]
|
It turns out that the account number my colleague was given was Microsoft’s own corporate account number. As in a real live account. She was inadvertently prank-calling the delivery company and sending actual trucks all over the country to pick up nonexistent letters and packages. The people who identified themselves as Microsoft’s shipping department and people from the delivery service’s headquarters claimed that they were frantic trying to trace where all the bogus orders were coming from. |
[Hearsay.]
|
¹ Mind you, this sort of thing is the stuff that average Joe customers can do while still in their pajamas, but back in those days, it was a feature that only top-tier customers had access to, because, y’know, mainframe. |