[Begin document N 1263]

ISO/IEC JTC1/SC2/WG2 N 1263
Date: 1995-09-18
Title: On the complexity of Tibetan character names
Source: Michael Everson, Everson Gunn Teoranta (WG2 member for Ireland)
Status: Expert contribution
Action: For consideration by WG2
Type: ASCII Version

1.0 Naming conventions and Tibetan.

In a paper sent to BSI/IST/2 on 1995-09-04, a copy of which was sent to me, Hugh McGregor Ross raised some questions regarding the Tibetan character names adopted at the SC2/WG2 meeting in Helsinki in June 1995. In it, Hugh states that he regards "the 'dictionary spellings' of Tibetan names as suspect" and encourages me to write a paper (which I had promised to do) discussing them. This is that paper.

2.0 History of the names in the current Tibetan proposal.

The names originally proposed in a paper to the Unicode Consortium by Peter Lofting in May 1995 differed from the names used by the Chinese Member Body in a number of proposals prior to May 1995. I was invited by the Unicode Consortium to review the Tibetan situation and comment on it before the UTC meeting in May 1995, in which Chinese and Tibetan experts participated.

The Chinese Member Body's proposals used transliterations: letter-by-letter representations of the way the Tibetan words are written. Peter's names were transcriptions: phonetic renderings of the way the Tibetan words are pronounced. Some fundamental difficulties in Tibetan orthography (discussed in 3.0 below) led me to endeavour to work out a full set of both transliterations and transcriptions for all the characters. In addition, a number of characters were supplied at the last minute, without names, by Nyima Trashi (ཉི་མ་བཀྲ་ཤིས, which could be written Nyima bKrashis or N?i-ma Bkra-c,is); I myself supplied names for these.
I was in close contact with Peter Lofting as I prepared my report on Tibetan for the UTC. The resulting convention, using both transliterations and transcriptions separated by the word OR in 44 of the Tibetan names (not every name requires both), was arrived at by Peter and myself as the best, and indeed the only, realistic compromise.

Please note that Hugh Ross's suggestion in his paper to BSI/IST/2 that Lee Collins and I "tampered" with Peter Lofting's names is unfounded. Peter explained to me that after the UTC meeting in California, Lee Collins kindly prepared a "delta" name list showing all the differences between the present and earlier Unicode Tibetan encodings. Not having an electronic copy of Peter's list to hand, Lee edited the electronic text of my name list. The purpose of the list was to indicate the changes in repertoire and arrangement; names were not an issue, and no special attention was given to them at that stage.

In the proposal arising from the UTC, only the transliterations were kept in the names, and the transcriptions were moved to informative aliases in the usual Unicode format. After long discussion in several working sessions between Nyima Trashi and myself in Helsinki, in which we strove together to establish correct spellings and correct pronunciations for each character, a set of names was forwarded to WG2 in which the longer names were restored, with improvements.

3.0 How to write and spell Tibetan.

It is the general practice in ISO 10646 to transliterate names rather than transcribe them, following the ALA/LC rules or other rules when possible. In the case of Tibetan, transliteration is relatively easy. Several systems exist, but they are nearly isomorphic, and no great confusion obtains if the user encounters one slightly different from the one he is accustomed to. I have used the system in Sarat Chandra Das's Tibetan-English Dictionary with Sanskrit Synonyms (reprinted from the 1902 edition by Motilal Banarsidass, Delhi, 1976); this dictionary is easy to obtain today and gives headwords both in Tibetan and in transliteration.

I will discuss two of the characters with long names here:

0F35 TIBETAN MARK NGES-BZUNG NYI ZLA OR NYISONG NYI DA
0F17 TIBETAN SIGN SGRA-GCAN HCHAR RTAGS OR TRACHEN CHAR TA

Das writes these n>es-bzun> n?i zla and Sgra-gcan hchar rtags. In Tibetan they are written ངེས་བཟུང་ཉི་ཟླ and སྒྲ་གཅན་འཆར་རྟགས. Writing NG for n> and NY for n?, and ignoring the underscore diacritic, is consistent with ISO 10646 practice.

The problem, of course, is that the correct way of spelling the words is to use the Tibetan script. In Romanization, one may choose to transliterate, which allows one to reconstruct the Tibetan original, or to transcribe, which allows one to discuss the names with a Tibetan. If you say "S'gra-g'chan h'char r'tags" to a Tibetan, he does not understand you; he understands "Trachen char ta". Nor can you spell the word as "sa-gata-rata tsek ga-ca-na tsek ha-cha-ra tsek ra-tata-ga-sa", because Tibetans do not spell this way. Quoting from Herbert Bruce Hannah's Grammar of the Tibetan Language (reprinted from the 1912 edition by Motilal Banarsidass, Delhi, 1985):

"Tibetan spelling may be described as a cumulative process, one only of the component parts of a syllable being taken up at a time. Next, the sound so taken up is repeated, but with the addition in advance, or by way of assumption, of the second component part. Then this second component part is pronounced by itself. Finally the phonetic effect of all that has thus been taken up is pronounced together, and that effect represents the literal expression of the syllable."
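The transliteration half of a long name is mechanical enough to sketch in code, while the transcription half is not. The following is a minimal, hypothetical Python sketch (the helper names translit_to_name_part and long_name are illustrative only, not part of ISO 10646 or any tool), assuming input in the ASCII notation used above for Das's spellings, where n> stands for NG and n? for NY. It does not model the underscore diacritic, and it takes the transcription as a hand-supplied argument, since, as argued above, pronunciation cannot be derived from the written form.

```python
import re


# Hypothetical helper: convert a Das-style transliteration written in this
# paper's ASCII notation (n> for NG, n? for NY) into ISO 10646 name spelling.
# The underscore diacritic mentioned in the text is not modelled here.
def translit_to_name_part(das_ascii: str) -> str:
    s = re.sub(r"n>", "ng", das_ascii, flags=re.IGNORECASE)  # n> -> NG
    s = re.sub(r"n\?", "ny", s, flags=re.IGNORECASE)         # n? -> NY
    return s.upper()


# Hypothetical helper: build a name of the form
# "TIBETAN <TYPE> <TRANSLITERATION> OR <TRANSCRIPTION>".
# The transcription must be supplied by hand; it cannot be computed
# from the transliteration.
def long_name(char_type: str, das_ascii: str, transcription: str) -> str:
    return (f"TIBETAN {char_type} {translit_to_name_part(das_ascii)}"
            f" OR {transcription.upper()}")


if __name__ == "__main__":
    examples = [
        ("0F35", "MARK", "n>es-bzun> n?i zla", "Nyisong nyi da"),
        ("0F17", "SIGN", "Sgra-gcan hchar rtags", "Trachen char ta"),
    ]
    for code, char_type, translit, transcription in examples:
        name = long_name(char_type, translit, transcription)
        print(code, name, len(name))  # code point, full name, name length
```

Run as a script, this prints the two names discussed above with lengths 49 and 53, the same figures given for 0F35 and 0F17 in the list of 44 names.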
Hannah gives as one example the word we know in English as Darjeeling: རྡོ་རྗེ་གླིང rDo-rje-gling 'Vajra place, power place'. Spelling this according to our Latin habits, we might say:

ra-data-naro, ra-jata-dengbu, ga-lata-gigu-nga

Tibetans, spelling it, say (in Hannah's transcription):

ra, da-ta, da; da, na-ro, do; ra, ja-ta, ja; ja-deng-bu, je; ga, la-ta, la; la, gi-gu, li; ling, nga, ling; do-je-ling.

Nyima and I spent hours working together checking the Tibetan names, and we proved that both forms of name are necessary and useful. The two Latin-script spellings are orthogonal representations of the underlying Tibetan spelling: neither can be represented by the other or in terms of the other. Transliteration cannot be represented coherently in phonetic terms, and phonetic respellings cannot reliably represent the underlying spelling of Tibetan. This is analogous to the situation in English, where the phonetic reality (differing greatly from dialect to dialect) cannot be unambiguously deduced from English orthography.

Hugh Ross stated in his paper that "What Everson seems to be attempting is to do something sensible within the constraints of the Naming Guidelines which can only, in this situation, give nonsensical results." In faith, I cannot see how he can say this in light of the actual situation. There is value in being able to reconstruct the original Tibetan spelling (which you can only do by transliteration), and there is value in being able to pronounce the Tibetan names (which you can only do with transcription). It is nonsensical to try to decide which of the two approaches is "more important" than the other when, in fact, there is no compelling reason to choose between them in this case.

4.0 Changed names.

Hugh pointed out that a few names, such as ANG KHANG, are used in preference to Peter Lofting's original names (in that case, GO KYIM).
Sometimes this was because the names proposed by the Chinese differed from those Peter proposed. It is also the case that some of the symbols have more than one name. The vertical stroke SHAD OR SHEY, for instance, also has the following names:

rkyang shad or kyang shey 'simple stroke'
chig shad or chi shey 'single stroke'
phur shad or phur shey 'nail stroke'

Peter Lofting and I are preparing a comprehensive list of names which we will submit to the Tibetans for verification, correction, and emendation. We hope, for instance, that some of the names he and I have invented can be improved. We assume that Tibetan review of that list will be completed before publication of the next edition of ISO 10646, and that Tibetan recommendations vis-à-vis the list will be taken into account before final publication.

5.0 A personal remark.

It is unfortunate that the encoding of Tibetan has been rushed in the way that it has. I myself have been put under pressure to get this document out to BSI before their next meeting, in order to answer Hugh Ross and possibly stave off unnecessary negative UK ballot comments. Tibetan is an important script, to be sure, and it has been an honour and a pleasure for me to work with the Tibetans, with members of WG2, and with the Unicode Consortium to encode it. But I have been at a loss to understand the urgency given to it by the Chinese and by the Unicode Consortium. Perhaps we may soon expect a large number of Tibetan implementations to become available. In general, however, I believe that the maxim "more haste, less speed" applies to WG2 work.

6.0 Rationale for keeping the long names.

There are a number of reasons, each of them, in my opinion, sufficient in itself.

6.1 Names are used to provide informative and useful identification of characters to human readers. In principle, the function of "unique identifier" is fulfilled by the numeric address of the character.
The name is an identifier for human beings; if the requirement were only to have a unique identifier, any alphanumeric string would do. In choosing names for characters, we must endeavour to find the most unambiguous, unique, and informative name possible. If this is the requirement, then Tibetan experts with practical experience (Nyima Trashi, Peter Lofting, Michael Everson) agree that, for general purposes and within the context of the ISO Naming Guidelines, the longer names are more useful than shorter ones.

6.2 There is precedent for the use of OR already in the Standard:

05BC HEBREW POINT DAGESH OR MAPIQ

A similar use of OR has been suggested for the Runic names, and may be useful for other characters and other scripts in future (Burmese, for instance). Certainly this use of OR is not forbidden.

6.3 Inclusion of both forms in the full name of the character guarantees that both will always be available to users, whereas relegating one or the other to informative remarks, either in parentheses next to the name or in Annex Q, does not. Both forms are necessary for unambiguous identification of the character in various situations.

6.4 Deletion of either form does not improve the names in any particular way.

6.5 The longest name proposed, 0F11 TIBETAN MARK RIN-CHEN SPUNGS SHAD OR RINCHEN PUNG SHEY, is 54 characters long. ISO/IEC 10646-1 already contains 123 characters whose names are longer, ranging between 55 and 83 characters in length. In any database system, the field length has to be determined by the longest permissible string; for ISO 10646, that field must therefore already accommodate at least 83 characters.

List of 44 Tibetan character names, ranging between 25 and 54 characters in length.
Code  Name  Length
0F0D  TIBETAN MARK SHAD OR SHEY  25
0F8B  TIBETAN MARK RGYINGS OR GIM  27
0F38  TIBETAN MARK CHE MGO OR CHE GO  30
0F83  TIBETAN SIGN SNA-LDAN OR NADAN  30
0F0C  TIBETAN DELIMITER TSHEG OR TSEK  31
0F86  TIBETAN MARK LCI RTAGS OR JI TA  31
0F18  TIBETAN SIGN HKHYUD PA OR CHU PA  32
0F12  TIBETAN TRIPLE CROSS SHAD OR SHEY  33
0F1D  TIBETAN SIGN ONE RDEL-NAG OR DENA  33
0F1E  TIBETAN SIGN TWO RDEL-NAG OR DENA  33
0F34  TIBETAN MARK BSDUS RTAGS OR DU TA  33
0F7F  TIBETAN SIGN RNAM-BCAD OR NAMCHEY  33
0F89  TIBETAN MARK MCHU CAN OR CHU CHEN  33
0F0E  TIBETAN MARK NYIS SHAD OR NYI SHEY  34
0F14  TIBETAN MARK GTER SHAD OR TER SHEY  34
0F1A  TIBETAN SIGN ONE RDEL-DKAR OR DEKA  34
0F1B  TIBETAN SIGN TWO RDEL-DKAR OR DEKA  34
0F3F  TIBETAN SIGN GYAS HKHYUD OR YE CHU  34
0F87  TIBETAN MARK YANG RTAGS OR YANG TA  34
0F8A  TIBETAN MARK SQUARE RGYINGS OR GIM  34
0F3E  TIBETAN SIGN GYON HKHYUD OR YUE CHU  35
0F05  TIBETAN MARK MEDIAL MGO-YIG OR GOYIK  36
0F08  TIBETAN MARK SBRUL SHAD OR DRUL SHEY  36
0F0F  TIBETAN MARK TSHEG SHAD OR TSEK SHEY  36
0F1C  TIBETAN SIGN THREE RDEL-DKAR OR DEKA  36
0F39  TIBETAN MARK SBRANG GSAD OR TRANG SE  36
0F04  TIBETAN MARK INITIAL MGO-YIG OR GOYIK  37
0F15  TIBETAN LOGOTYPE CHAD RTAGS OR CHE TA  37
0F16  TIBETAN LOGOTYPE LHAG RTAGS OR LAK TA  37
0F19  TIBETAN SIGN KDONG TSHUGS OR DONG TSU  37
0F06  TIBETAN MARK STANDING MGO-YIG OR GOYIK  38
0F0B  TIBETAN INTERSYLLABIC MARK TSHEG OR TSEK  40
0F3C  TIBETAN MARK LEFT HANG KHANG OR ANG KHANG  41
0F88  TIBETAN MARK LCE RTSA CAN OR CHE TSA CHEN  41
0F3D  TIBETAN MARK RIGHT HANG KHANG OR ANG KHANG  42
0F07  TIBETAN MARK LEFT ORNAMENT MGO-YIG OR GOYIK  43
0F1F  TIBETAN SIGN RDEL-DKAR RDEL-NAG OR DEKA DENA  44
0F10  TIBETAN MARK NYIS TSHEG SHAD OR NYI TSEK SHEY  45
0F82  TIBETAN SIGN NYI ZLA SNA-LDAN OR NYI DA NADAN  45
0F03  TIBETAN SYLLABLE HUM WITH GTER SHAD OR TER SHEY  47
0F35  TIBETAN MARK NGES-BZUNG NYI ZLA OR NYISONG NYI DA  49
0F37  TIBETAN MARK NGES-BZUNG SGOR RTAGS OR NYISONG GOR TA  52
0F17  TIBETAN SIGN SGRA-GCAN HCHAR RTAGS OR TRACHEN CHAR TA  53
0F11  TIBETAN MARK RIN-CHEN SPUNGS SHAD OR RINCHEN PUNG SHEY  54

List of 123 character names already in ISO 10646, ranging between 55 and 83 characters in length.

Code  Name  Length
1F5F  GREEK CAPITAL LETTER UPSILON WITH DASIA AND PERISPOMENI  55
(...)
(121 characters deleted for the e-mail version of this document)
(...)
FBF9  ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM  83

[end of document N 1263]