A colleague of mine related to me this story about uninitialized
floating point variables.
He had a function that went something like this,
simplified for expository purposes.
The infoType
parameter specified which piece of
information you’re requesting,
and depending on what you’re asking for,
one or the other of the output parameters may not contain a
meaningful result.
BOOL GetInfo(int infoType, int *intResult, double *dblResult) { int intValue; double dblValue;switch (infoType) { case NUMBER_OF_GLOBS: intValue = …; break;
case AVERAGE_GLOB_SIZE: dblValue = …; break; … } *intResult = intValue; *dblResult = dblValue; … }
After the product shipped, they started geting crash reports.
This was in the days before Windows Error Reporting,
so all they had to work from was the faulting address,
which implicated the line
*dblResult = dblValue
.
My colleague initially suspected that dblResult
was an invalid pointer, but a search of the entire code base
ruled out that possibility.
The problem was the use of an uninitialized floating point
variable.
Unlike integers, not all bit patterns are valid for use as
floating point values.
There is a category of values
known as signaling NaNs,
or
SNaN
for short,
which are special “not a number” values.
If you ask the processor to,
it will keep an eye out for these signaling NaNs
and raise an “invalid operand” exception when one
is encountered.
(This, after all, is the whole reason why it’s called a signaling
NaN.)
The problem was that, if you are sufficiently unlucky,
the leftover values in the memory assigned to the
dblValue
will happen to
have a bit pattern corresponding to a SNaN
.
And then when the processor tries to copy it to
dblResult
, then exception is raised.
There’s another puzzle lurking behind this one: Why wasn’t this problem caught in internal testing? We’ll learn about that next time.
0 comments