A customer reported that their program would sometimes print Chinese text instead of the actual desired value. Your initial reaction is probably, “Oh, I bet I know what’s going on. They’re displaying an ANSI string as if it were Unicode, amirite?”
And then you look at the screen shot.
췍췍췍췍췍췍췍
Okay, first of all, that’s not Chinese text. That’s Korean.
But I’ll forgive that error, because to the uninitiated, Chinese, Japanese, and Korean characters look alike: They are all monospace complex symbols. Of course, once you’ve become initiated, you can instantly tell them apart. The hard part is the initiation.
If you look more closely, you may even recognize the character as Unicode code point U+CDCD.
And that’s the key to the puzzle.
The byte 0xCD is a common fill byte. Visual Studio uses it in debug mode to represent uninitialized heap memory. Read two of those bytes as a single UTF-16 code unit, and you get U+CDCD.
Therefore, the reason for the Korean character repeated over and over is that your so-called string is actually just uninitialized heap memory. Follow the money backward to the function which was supposed to fill it with data, and debug why that function failed. (While you’re at it, you might also want to add error checking, so that when that function fails, you don’t run ahead with uninitialized data.)
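To see the effect without a debugger, here is a minimal sketch in Python (not the customer's code). It assumes a buffer that holds nothing but the 0xCD fill byte and is then interpreted as UTF-16LE text, which is how Windows treats Unicode strings. The helper name `render_fill_bytes` is made up for illustration.

```python
def render_fill_bytes(fill: int, byte_count: int) -> str:
    """Interpret a buffer of identical fill bytes as UTF-16LE text."""
    # Simulate memory that was filled with a debug pattern and never written to.
    raw = bytes([fill]) * byte_count
    # Each pair of bytes becomes one UTF-16 code unit: 0xCD 0xCD -> U+CDCD.
    return raw.decode("utf-16-le")


text = render_fill_bytes(0xCD, 14)  # 14 bytes -> 7 code units
print(text)
print(f"U+{ord(text[0]):04X}")
```

Fourteen fill bytes come out as seven copies of the same Hangul syllable, which matches the screenshot.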
There’s a hidden bonus if you can read Korean: the character is pronounced very much like “check” in English!
This reminds me of an ancient joke: A newcomer to the C programming language yanked the power plug after seeing “烫烫烫” (literally “hot hot hot”) printed on screen, fearing that the computer was overheating. In code page 936, “烫” is encoded as the two bytes 0xCC 0xCC, and 0xCC is the fill byte for uninitialized stack memory.
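For the curious, the same experiment can be sketched in Python, assuming the uninitialized bytes are displayed as an ANSI string on a system whose code page is 936. The helper name `decode_cp936` is made up for illustration.

```python
def decode_cp936(fill: int, byte_count: int) -> str:
    """Interpret a buffer of identical fill bytes as code page 936 text."""
    # Simulate stack memory that still holds the debug fill pattern.
    raw = bytes([fill]) * byte_count
    # Code page 936 is a double-byte encoding: 0xCC 0xCC maps to one character.
    return raw.decode("cp936")


print(decode_cp936(0xCC, 6))  # 6 bytes -> 3 characters
```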
I’m a bit troubled by the suggestion that the customer was shipping a debug build of their software.
Not necessarily. They might have found this error during QA, for example.
OK, yes, probably they were shipping the debug build. I'm sure I've seen articles — whether here or on The Daily WTF I don't know — about companies who shipped the debug build, because the release build kept crashing but the debug build worked fine (for certain values of "worked" and "fine").
Why take the time and trouble to fix your errors, when you can...
The customer found this problem during development, even before handing over to QA. Developers use debug builds, so the problem was consistent there.
There’s money in uninitialized data? Who knew?
There is money if it leads to an exploit that lets you plant malware on the victim’s computer.
Round here, my font doesn’t support that character, so it’s a tiny box with CD CD in it, even easier to decode (if one thinks of looking).