April 13th, 2004

Unicode collation is hard

The principle of “garbage in, garbage out” applies to Unicode collation. If you hand it a meaningless string and ask to compare it to another meaningless string, you get meaningless results.

I am not a Unicode expert; I just play one on the web. A real Unicode expert is Michael Kaplan, whose explanation of how comparing invalid Unicode strings result in nonsensical results I strongly recommend to those who attempt to generate random test strings in Unicode.

Topics
Code

Author

Raymond has been involved in the evolution of Windows for more than 30 years. In 2003, he began a Web site known as The Old New Thing which has grown in popularity far beyond his wildest imagination, a development which still gives him the heebie-jeebies. The Web site spawned a book, coincidentally also titled The Old New Thing (Addison Wesley 2007). He occasionally appears on the Windows Dev Docs Twitter account to tell stories which convey no useful information.

0 comments

Discussion are closed.