{"id":419,"date":"2022-03-02T11:30:01","date_gmt":"2022-03-02T19:30:01","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/math-in-office\/?p=419"},"modified":"2022-03-04T17:49:32","modified_gmt":"2022-03-05T01:49:32","slug":"two-phonetic-scripts-vietnamese-and-korean","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/math-in-office\/two-phonetic-scripts-vietnamese-and-korean\/","title":{"rendered":"Two Phonetic Scripts: Vietnamese and Korean"},"content":{"rendered":"<p>A few years ago, I visited two very interesting countries, Vietnam and South Korea. Being actively involved in writing software (mostly RichEdit) for editing the world\u2019s scripts, I was naturally fascinated to see Vietnamese and Korean text displayed in profusion. The Vietnamese and Korean scripts were designed with a common purpose in mind: enable the languages to be read and written easily by all members of their respective countries. Earlier on, people tried to write <a href=\"http:\/\/en.wikipedia.org\/wiki\/Vietnamese_alphabet\">Vietnamese<\/a> and <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hangul\">Korean<\/a> by customizing the Chinese script. But while the Chinese script is well suited to Chinese languages, it\u2019s considerably less suited to Vietnamese and Korean. Accordingly, only a small percentage of the Vietnamese and Korean people were able to read and write their languages using the Chinese script.<\/p>\n<p>In Vietnam in the 1500\u2019s and 1600\u2019s, Portuguese and French missionaries wanted to be able to read and write Vietnamese and to communicate with the Vietnamese people in writing as well as verbally. 
To this end, they chose a Latin alphabetic script with the letters a..z, \u0111, \u00e2, \u0103, \u00ea, \u00f4, \u01a1, \u01b0 plus the corresponding upper-case letters and five tone marks\u00a0 \u00a0\u0300 \u00a0\u0301 \u00a0\u0303 \u00a0\u0309 \u00a0\u0323\u00a0(grave, acute, tilde, hook above, and dot below, defined in the Unicode U+0300 block) for a total of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Vietnamese_Standard_Code_for_Information_Interchange\">134 characters<\/a>. This alphabetic script represented the Vietnamese language phonetically. The traditional Chinese orthography continued to be dominant until the early 1900\u2019s, when the alphabetic script took over. In Vietnam today you still see Chinese characters, but mostly on old buildings and manuscripts. The vast majority of Vietnamese text uses the alphabetic script. All 134 characters were encoded in Unicode 1.1 (June 1993). Initially people used 8-bit code pages such as <a href=\"http:\/\/en.wikipedia.org\/wiki\/Windows-1258\">1258<\/a> to encode the Vietnamese characters. But since Unicode has all the characters, it\u2019s much more efficient to use them.<\/p>\n<p>The default Windows 11 Vietnamese keyboard encodes the tone marks as combining marks in the U+0300 block instead of using the fully composed characters. This requires complex-script shaping, which slows down the display. Admittedly shaping engines can perform other useful tasks such as kerning and ligature formation, yielding finer typography. And a Vietnamese tone mark applies to a whole syllable, so it doesn\u2019t have to be placed where a fully composed vowel has it. But web sites such as Wikipedia use fully composed Unicode characters. In Windows 11 you can install the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Telex_(input_method)\">Telex<\/a> and\/or the <a href=\"http:\/\/en.wikipedia.org\/wiki\/VNI\">VNI<\/a> keyboards, which have slicker ways to enter Vietnamese characters and insert fully composed characters. 
VNI\u2019s option of automagically inserting the accents is particularly intriguing. To use one of these methods, press Windows &gt; Settings &gt; Time &amp; Language &gt; Language &amp; region &gt; Add a keyboard and add Vietnamese. Click on the Vietnamese keyboard Options \u201c\u2026\u201d and you can choose a more advanced Vietnamese keyboard from the drop-down menu.<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/Vietnamese-drop-down.png\"><img decoding=\"async\" class=\"alignnone size-medium wp-image-426\" src=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/Vietnamese-drop-down-300x153.png\" alt=\"Image Vietnamese drop down\" width=\"300\" height=\"153\" srcset=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/Vietnamese-drop-down-300x153.png 300w, https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/Vietnamese-drop-down-768x391.png 768w, https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/Vietnamese-drop-down.png 969w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>While foreign missionaries were responsible for the Vietnamese script, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Sejong_the_Great\">King Sejong<\/a> of Korea was responsible for the Korean script. His motivation was essentially the same as the European missionaries\u2019: make it easy for all Koreans to read and write their language. 
His original script published in 1446 had only 24 characters, called jamo, as shown in the following picture taken of an interactive display in the <a href=\"http:\/\/en.wikipedia.org\/wiki\/National_Palace_Museum_of_Korea\">National Palace Museum of Korea<\/a> in Seoul.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo.jpg\"><img decoding=\"async\" class=\"size-medium wp-image-420 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo-300x164.jpg\" alt=\"Image jamo\" width=\"300\" height=\"164\" srcset=\"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo-300x164.jpg 300w, https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo-1024x559.jpg 1024w, https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo-768x419.jpg 768w, https:\/\/devblogs.microsoft.com\/math-in-office\/wp-content\/uploads\/sites\/65\/2022\/03\/jamo.jpg 1430w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>Modern Korean requires more: 19 initial consonants (C), 21 vowels (V) and 27 final consonants (T). The final consonants include most of the initial consonants and add some others. The jamo are displayed in boxes called <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hangul\">Hangul<\/a> syllables. There are 19\u00d721 CV combinations and 19\u00d721\u00d727 CVT combinations for a total of 11172 possible Hangul syllables in modern Korean. The jamo are encoded in the Unicode U+1100 block (C\u2014U+1100..U+1112, V\u2014U+1161..U+1175, T\u2014U+11A8..U+11C2) and the 11172 Hangul syllables are encoded from U+AC00..U+D7A3 in CVT sort order (T varies fastest, C varies slowest).<\/p>\n<p>If you look at the <a href=\"http:\/\/www.unicode.org\/charts\/PDF\/U1100.pdf\">Unicode U+1100 block<\/a>, you\u2019ll notice it\u2019s full: 256 jamo! 
That\u2019s more than 19 + 21 + 27. The major difference is the inclusion of many old Hangul jamo that are not used in Modern Korean. Modern Korean can be handled as a simple script: just use the precomposed Hangul syllables, for which no glyph shaping is needed. In contrast, Old Hangul has many more combinations and needs to have a shaping engine to place the jamo correctly. The Unicode Standard explains how to do this in <a href=\"http:\/\/www.unicode.org\/versions\/Unicode7.0.0\/ch03.pdf\">Chapter 3<\/a>, Section 3.12 Conjoining Jamo Behavior.<\/p>\n<p>Here is some interesting Unicode Hangul history. Non-combining jamo (U+3130..U+318F) and 2350 Hangul syllables (U+3400..U+3D2D) were part of Unicode 1.0 (October 1991). Unicode 1.1 (June 1993) added the modern combining jamo (U+1100 block) and 4306 more Hangul syllables. The Korean government wanted the remaining 11172 \u2013 4306 \u2013 2350 = 4516 syllables of Modern Korean to be added as well and preferably to collect all the syllables in a single block. I had just joined the Unicode Technical Committee (over 26 years ago!) and it seemed to us to be a shame to have the Hangul syllables split up into three blocks. Furthermore, Unicode wasn\u2019t yet used for Korean anywhere as far as we could tell. Windows NT had support for Unicode, but nothing special for Hangul. Other operating systems didn\u2019t even support Unicode at that time. Word processing programs that supported Korean used a Korean code page, not Unicode. S.G. Hong of the Microsoft Korean subsidiary pleaded for us to use a single block and after considerable deliberation the UTC and WG2 (the ISO 10646 working group on character sets) elected to do so. Hence in Unicode 2.0 (July 1996) the two earlier Hangul blocks were deprecated and the Hangul syllables were assigned U+AC00..U+D7A3 in the ideal alphabetic order. To this day, no one has come up with a Korean document that was compromised by these changes. 
But you should have heard the outcries of folks who were upset that the old codes were deprecated.<\/p>\n<p>Ever since then, Unicode code points have been completely stable and such stability is a basic requirement. Shortly after the release of Unicode 2.0, Word 97 was released. Based on Unicode, it supported the modern Hangul syllables. At that point it would have been unthinkable to change the code points since documents actually existed that used the code points. Fortunately, we were able to make the changes early enough in Unicode\u2019s history that Korea enjoys excellent Unicode support. I couldn\u2019t help but think of that a bit while walking through the streets and palaces of beautiful downtown Seoul.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few years ago, I visited two very interesting countries, Vietnam and South Korea. Being actively involved in writing software (mostly RichEdit) for editing the world\u2019s scripts, I was naturally fascinated to see Vietnamese and Korean text displayed in profusion. The Vietnamese and Korean scripts were designed with a common purpose in mind: enable the [&hellip;]<\/p>\n","protected":false},"author":40611,"featured_media":55,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-419","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-math-in-office"],"acf":[],"blog_post_summary":"<p>A few years ago, I visited two very interesting countries, Vietnam and South Korea. Being actively involved in writing software (mostly RichEdit) for editing the world\u2019s scripts, I was naturally fascinated to see Vietnamese and Korean text displayed in profusion. 
The Vietnamese and Korean scripts were designed with a common purpose in mind: enable the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/419","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/users\/40611"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/comments?post=419"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/posts\/419\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media\/55"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/media?parent=419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/categories?post=419"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/math-in-office\/wp-json\/wp\/v2\/tags?post=419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}