{"id":42643,"date":"2003-09-05T12:23:00","date_gmt":"2003-09-05T12:23:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2003\/09\/05\/case-mapping-on-unicode-is-hard\/"},"modified":"2003-09-05T12:23:00","modified_gmt":"2003-09-05T12:23:00","slug":"case-mapping-on-unicode-is-hard","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20030905-00\/?p=42643","title":{"rendered":"Case mapping on Unicode is hard"},"content":{"rendered":"\n<p>         Occasionally, I&#8217;m asked, &#8220;I have to identify strings that are identical, case-insensitively.&nbsp;         How do I do it?&#8221;      <\/p>\n<p>         The answer is, &#8220;Well, it depends. Whose case-mapping rules do you want to use?&#8221;      <\/p>\n<p>         Sometimes the reply is, &#8220;I want this to be language-independent.&#8221;      <\/p>\n<p>         Now you have a real problem.      <\/p>\n<p>         Every locale has its own case-mapping rules. Many of them are in conflict with the         rules for other locales. For example, which of the following pairs of words compare         case-insensitive equal?      
<\/p>\n<table>\n<tbody>\n<tr>\n<td>                     1.<\/td>\n<td>                     gif                  <\/td>\n<td>                     GIF                  <\/td>\n<\/tr>\n<tr>\n<td>                     2.<\/td>\n<td>                     Ma&szlig;e                  <\/td>\n<td>                     MASSE<\/td>\n<\/tr>\n<tr>\n<td>                     3.<\/td>\n<td>                     Ma&szlig;e                  <\/td>\n<td>                     Masse<\/td>\n<\/tr>\n<tr>\n<td>                     4.<\/td>\n<td>                     m&ecirc;me<\/td>\n<td>                     MEME<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>         Answers:      <\/p>\n<ol>\n<li>             no in Turkey, yes in US          <\/li>\n<li>             no in US, yes in Germany          <\/li>\n<li>             no in US, no in Germany, yes in Switzerland! (Though you would likely never see it             written as &#8220;Ma&szlig;e&#8221; in Switzerland.)          <\/li>\n<li>             yes in France, no in Quebec!          <\/li>\n<\/ol>\n<p>         (And I&#8217;ve heard that the capitalization rules for German are context-sensitive. Maybe         that changed with <a>the most recent spelling         reform<\/a>.) <a href=\"http:\/\/www.unicode.org\/reports\/tr21\/tr21-5.html\">Unicode Technical         Report #21<\/a> has more examples.      <\/p>\n<p>         Just because you&#8217;re using Unicode doesn&#8217;t mean that all your language problems are         solved. Indeed, the ability to represent characters in nearly all of the world&#8217;s languages         means that you have more things to worry about, not fewer.      <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Occasionally, I&#8217;m asked, &#8220;I have to identify strings that are identical, case-insensitively.&nbsp; How do I do it?&#8221; The answer is, &#8220;Well, it depends. 
Whose case-mapping rules do you want to use?&#8221; Sometimes the reply is, &#8220;I want this to be language-independent.&#8221; Now you have a real problem. Every locale has its own case-mapping rules. Many [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-42643","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Occasionally, I&#8217;m asked, &#8220;I have to identify strings that are identical, case-insensitively.&nbsp; How do I do it?&#8221; The answer is, &#8220;Well, it depends. Whose case-mapping rules do you want to use?&#8221; Sometimes the reply is, &#8220;I want this to be language-independent.&#8221; Now you have a real problem. Every locale has its own case-mapping rules. 
Many [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/42643","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=42643"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/42643\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=42643"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=42643"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=42643"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}