ISO Doc. #   ISO/IEC JTC1/SC22/WG20 N1080
Korea Doc. # Korea JTC1/SC22WG20 K_______
Subject:     some thoughts about documents SC22/WG20 N1037 and N1051
             RE: Hangeul collating in the context of 14651 and UCA
Date:        2003-10-16
Author:      KIM, Kyongsok (Rep. of Korea)

0) many thanks to Mark Davis for N1037 and to Kent Karlsson for N1051.
   - they studied Korean Hangeul very hard :-)

1) Hangeul sorting/collating rule (e.g. for modern Hangeul) is not so complex to describe
   - N1037, UTS #10 (UCA for Hangeul) & N1051
     . the description of Hangeul collating/sorting seems too complicated and too hard to follow;
     . Personally, I must confess that I could not verify whether the logic is correct or not.
   - most Koreans (especially linguists, who will decide and describe the rules) will have difficulty following the logic in these two documents.
   - there is no well-established rule for Old Hangeul... so I cannot tell about Old Hangeul

2) RE: CTT -- concluded that CTT cannot properly handle cases such as
   - Old Hangeul (esp. yet-to-be-found complex letters) and
   - compatibility Hangeul letters
   - in general, preprocessing is required to sort properly.

3) if there is no preprocessing in 14651 env.
   - we must first agree on a set of assumptions as to
     a) which will be collated correctly and which won't.
        E.g., Old Hangeul not completely sorted, independent letters not completely sorted, only Wanseong syllables (U+AC00-U+D7A3) supported, both Wanseong syllables and SYL-IPF letters (U+11xx) supported, etc.
     b) how we collate Old Hangeul (not the standard Old Hangeul collating rule, but just for 14651 purposes)
   - there has been no consensus on the assumptions, yet experts have already suggested their solutions.
   - Only after we agree on the assumptions can we start describing algorithms, implementation, etc.

4) Old Hangeul collating rule:
   - probably both linguists and comp. sci. people will set up a collating rule for Old Hangeul...
     (but there is none yet)
   - once the rule for collating letters (both simple and complex) is given, it is fairly straightforward to establish the rule for collating syllables.
     . in other words, this framework can be described, and I have already written a document on it.

5) equivalent rules/methods/algos, etc.
   - two different rules/methods/algos can produce the same results.
   - In such a situation, we could first describe them in an easy-to-understand way so that most people can follow without much difficulty.
     . after that, we could improve their performance, or describe them in a compact manner, etc., to get an equivalent one.

6) a possibility of introducing a new construct in 14651
   - e.g., Kent Karlsson's suggestion (20-page-long collating element for Thai) could be described in tens of lines "if" we could introduce a new construct that can describe such cases in a compact form.
   - since I was told that there is no "status" concept in 14651, I did not consider introducing a new construct.
   - However, a new construct could be one possibility for solving the Hangeul problem in the future.

7) future work RE Hangeul collating in 14651 env.:
   a) define a few sets of assumptions, agree on one (or two or three?) reasonable sets, and then try to get a solution
   b) introduce a new construct in 14651

8) the relationship between 14651 and UTS #10 (UCA)
   - I can understand how 14651 works (no status concept, except in collating element)
   - collating element supports a limited form of status.
   - sometimes it is said that UCA is equiv. to 14651; however, there seems to be some (much?) difference (e.g. preprocessing can be done in UCA but not in 14651?)
   - I still cannot understand what can/cannot be done in UCA, or the difference between these two.
   - As a result, Kent Karlsson's doc, which talks in 14651 env., is relatively easier to understand than Mark Davis' paper, which talks in UCA env.

9) personally, I am currently staying in the US for one year.
   - it is somewhat hard to discuss the issue with people in Korea... but I will try...

* * *
---------------------------------------------------------------------
KIM, Kyongsok, Professor (visiting)
College of Computing, Georgia Institute of Technology,
Room 153, 801 Atlantic Drive,
Atlanta, GA 30332-0280, U.S.A.