{"id":93122,"date":"2016-03-07T07:00:00","date_gmt":"2016-03-07T22:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/?p=93122"},"modified":"2020-10-07T21:08:39","modified_gmt":"2020-10-08T04:08:39","slug":"20160307-00","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20160307-00\/?p=93122","title":{"rendered":"On word breaking in Chinese and Japanese"},"content":{"rendered":"<p>In Western languages, you can generally break a line at whitespace. (You can also break a line within a word, subject to language-specific hyphenation rules, but let&#8217;s not get into that.) People unfamiliar with other language families sometimes wonder what&#8217;s up with line breaking in other languages. In particular, line breaking in Chinese and Japanese tends to elicit confused responses.<\/p>\n<blockquote class=\"q\">\n<p>When I put text in a static control and it does not fit, the behavior is different depending on whether I&#8217;m using Chinese characters or Latin characters. Why does the Chinese string wrap to the second line, but the Latin string does not?<\/p>\n<table border=\"0\">\n<tbody>\n<tr>\n<td style=\"border: inset 2px gray;\" width=\"50\">\n<div style=\"width: 8.5em; height: 2.5em; line-height: normal; overflow: hidden;\">\u3105\u3106\u3107\u3108\u3109\u310a\u310b\u310c\u310d\u310e\u310f\u3110<\/div>\n<\/td>\n<\/tr>\n<tr>\n<td style=\"border: inset 2px gray;\" width=\"50\">\n<div style=\"width: 8.5em; height: 2.5em; line-height: normal; overflow: hidden; word-break: normal;\">ABCDEFGHIJKLMNOPQRSTUVWXYZ.<\/div>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/blockquote>\n<p>In Chinese and Japanese, there are no spaces between words, so if you&#8217;re going to wait for a space before inserting a line break, you&#8217;re going to be waiting a long time. Instead, to a first approximation, line breaks are permitted after almost any character. 
(You can learn <a href=\"https:\/\/en.wikipedia.org\/wiki\/Line_breaking_rules_in_East_Asian_languages\">the finer points of line breaking<\/a> from Wikipedia.)<\/p>\n<p>The static control uses <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/dd374091.aspx\">Uniscribe<\/a> to decide where to insert line breaks, and Uniscribe understands that in Chinese and Japanese text, you can break after almost any character. That&#8217;s why you&#8217;re seeing a line break in the static control with Chinese text. On the other hand, the static control cannot find a valid word break in the Latin string, so it all gets jammed onto one line (and the excess gets clipped).<\/p>\n<p>The <code>Draw\u00adText<\/code> function also has a rudimentary understanding of line breaks in Chinese, Japanese, and Korean text. You can override the default line breaking rule of &#8220;line breaks allowed after any full-width character&#8221; by passing the <code>DT_<wbr \/>NO\u00adFULL\u00adWIDTH\u00adCHAR\u00adBREAK<\/code> flag, which forces the <code>Draw\u00adText<\/code> function to break only at whitespace. (Basically, have it treat CJK characters as if they were Latin.)<\/p>\n<p>The documentation for <code>DT_<wbr \/>NO\u00adFULL\u00adWIDTH\u00adCHAR\u00adBREAK<\/code> notes that it may be useful to pass this flag if you know that the text is Korean, because Korean does put spaces between words, and preferring to break Korean text at whitespace can produce more attractive results. (The <code>Draw\u00adText<\/code> function is not very clever and does not try to autodetect whether the string is Korean. 
It is legal to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Korean_mixed_script\">mix Chinese characters into Korean text<\/a>, and trying to figure out whether the string is &#8220;Mostly Korean with Chinese characters mixed in&#8221; or &#8220;Mostly Chinese with Korean mixed in&#8221; would require too much fuzzy logic for the simple <code>Draw\u00adText<\/code> function.)<\/p>\n<p><b>Bonus chatter<\/b>: You thought Chinese, Japanese, and Korean line breaking is hard. Thai is even harder. In Thai, words are run together with no spaces, but <a title=\"Rules for Breaking Lines in Asian Languages\" href=\"http:\/\/web.archive.org\/web\/20080925213906\/https:\/\/msdn.microsoft.com\/en-us\/goglobal\/bb688158.aspx#EEF\">line breaks are permitted only between words<\/a>. This means that in order to break lines properly, you need a Thai dictionary.<\/p>\n<p><b>Bonus bonus chatter<\/b>: On that last page I linked to, there is a reference to the Windows Intelligent Font Emulator, which went by the acronym WIFE. 
Somebody probably worked really hard to retrofit that acronym.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Different rules.<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25],"class_list":["post-93122","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code"],"acf":[],"blog_post_summary":"<p>Different rules.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/93122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=93122"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/93122\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=93122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=93122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=93122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}