October 7th, 2019

Why does my string consist of this Korean character repeated over and over?

A customer reported that their program would sometimes print Chinese text instead of the actual desired value. Your initial reaction is probably, “Oh, I bet I know what’s going on. They’re displaying an ANSI string as if it were Unicode, amirite?”

And then you look at the screen shot.

췍췍췍췍췍췍췍

Okay, first of all, that’s not Chinese text. That’s Korean.

But I’ll forgive that error, because to the uninitiated, Chinese, Japanese, and Korean characters look alike: They are all monospace complex symbols. Of course, once you’ve become initiated, you can instantly tell them apart. The hard part is the initiation.

If you look more closely, you may even recognize the character as Unicode code point U+CDCD.

And that’s the key to the puzzle.

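The byte 0xCD is a common fill byte. Visual Studio's debug C runtime fills newly allocated heap memory with it so that uninitialized memory stands out. Read two of those bytes as a single UTF-16 code unit, and you get 0xCDCD.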
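If you want to convince yourself, here is a minimal sketch that imitates the fill pattern by hand and reads the result back as UTF-16. The helper name `read_unwritten_buffer` and the constant `kFillByte` are made up for illustration, and the `memset` merely stands in for what a debug heap would do on your behalf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The fill value named in the text. The memset below only imitates a debug heap.
constexpr unsigned char kFillByte = 0xCD;

// Hypothetical helper: build a buffer of `count` UTF-16 code units that
// "nobody wrote to" by filling every byte with kFillByte, then return it as text.
std::u16string read_unwritten_buffer(std::size_t count) {
    static_assert(sizeof(char16_t) == 2, "sketch assumes a 16-bit char16_t");
    std::u16string text(count, u'\0');
    std::memset(&text[0], kFillByte, count * sizeof(char16_t));
    assert(text.size() == count); // the fill changes the contents, not the length
    return text;
}
```

Every code unit in the returned string is 0xCDCD, regardless of byte order, because both bytes of each unit are the same. Seven of them in a row is exactly the string in the screen shot.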

Therefore, the reason for the Korean character repeated over and over is that your so-called string is actually just uninitialized heap memory. Follow the money backward to the function which was supposed to fill it with data, and debug why that function failed. (While you’re at it, you might also want to add error checking, so that when that function fails, you don’t run ahead with uninitialized data.)
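The error checking might look something like the following sketch. Everything here is hypothetical: `fill_label` stands in for whatever function was supposed to fill the buffer, the `source_available` flag simulates its failure, and the "(unavailable)" fallback is just one possible policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for the function that is supposed to fill the buffer.
// Returns false, and writes nothing, when it cannot produce a value.
bool fill_label(bool source_available, char16_t* buffer, std::size_t capacity) {
    assert(buffer != nullptr); // a null buffer is a programming error, not a runtime failure
    static const char16_t kValue[] = u"Ready";
    const std::size_t needed = sizeof(kValue) / sizeof(kValue[0]); // includes the terminator
    if (!source_available || capacity < needed) {
        return false;
    }
    std::copy(kValue, kValue + needed, buffer);
    return true;
}

// Caller that checks the result instead of running ahead with whatever is in the buffer.
std::u16string label_for_display(bool source_available) {
    const std::size_t capacity = 16;
    std::unique_ptr<char16_t[]> buffer(new char16_t[capacity]); // contents are indeterminate here
    if (!fill_label(source_available, buffer.get(), capacity)) {
        return u"(unavailable)"; // explicit fallback; the buffer is never read
    }
    return std::u16string(buffer.get());
}
```

The point of the sketch is only that the buffer is read on the success path and nowhere else. Whether the right response to failure is a fallback string, an error return, or something else depends on your program.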


Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

9 comments

  • Robert Lim

    There’s a hidden bonus if you can read Korean: The way the character is pronounced is very similar to “check” in English!

  • GL

    This reminds me of an ancient joke: A newcomer to the C programming language yanked the power plug after seeing “烫烫烫” (literally “hot hot hot”) printed on the screen, afraid that the computer was overheating. “烫”, in code page 936, is 0xCCCC, the fill pattern for uninitialized stack.

  • cheong00

    I’m a bit troubled by the suggestion that the customer was shipping a debug build of their software.

    • Scarlet Manuka

      Not necessarily. They might have found this error during QA, for example.

      OK, yes, probably they were shipping the debug build. I’m sure I’ve seen articles — whether here or on The Daily WTF I don’t know — about companies who shipped the debug build, because the release build kept crashing but the debug build worked fine (for certain values of “worked” and “fine”).

      Why take the time and trouble to fix your errors, when you can just get the system to probably mask them for you?

      • Raymond Chen (Microsoft employee, author)

        The customer found this problem during development, even before handing over to QA. Developers use debug builds, so the problem was consistent there.

  • Paul Topping

    There’s money in uninitialized data? Who knew?

    • Piotr Siódmak

      There is money if it leads to an exploit which allows you to plant malware on the victim’s computer.

  • Simon Clarkstone

    Round here, my font doesn’t support that character, so it’s a tiny box with
    CD
    CD
    in it, even easier to decode (if one thinks of looking).
