ISO/IEC JTC1/SC2/WG2 N1132R
Date: 15 April 1995
Title: Proposal Summary for ISO/IEC JTC1/SC2/WG2 N1058
Source: Michael Everson, Everson Gunn Teoranta (WG2 member for Ireland)
Status: National position
Action: For consideration by WG2

This is a revised Proposal Summary (ISO/IEC JTC1/SC2/WG2 form N1116F) for document N1058, "Proposal to ISO/IEC 10646-1 for support of Irish Gaelic characters".

A. Administrative

1. Title: Proposal to ISO/IEC 10646-1 for support of Irish Gaelic characters

2. Requester's name: Michael Everson, Everson Gunn Teoranta

3. Requester type: Member body (Irish national position)

4. Submission date: 1995-04-15

5. Requester's reference: EGT SC2/WG2 SEIMHIU 940815, JTC1/SC2/WG2 N1058

6. Type of proposal: This is a complete proposal.

The following items are to be completed by WG2:
a. Relevant SC2/WG2 document numbers:
b. Status (list of meeting number and corresponding action or disposition):
c. Interested parties contacted:

B. Technical (General)

1. Nature of proposal: This proposal is for the addition of a set of characters to an existing block. Name of the existing block: Table 32, Latin Extended Additional.

2. Number of characters in proposal: Three characters.

3. Proposed category per SC2/WG2 N1116: Category A.

4. Proposed Level of Implementation:
LATIN SMALL LETTER LONG S WITH DOT ABOVE: Level 1
LATIN CAPITAL LETTER SEIMHIU: Level 1
LATIN SMALL LETTER SEIMHIU: Level 1
Is a rationale provided for the choice? Yes. SMALL LETTER LONG S WITH DOT ABOVE is required at Level 1 for support of existing de facto character set standards. Level 1 is required for implementation of CAPITAL LETTER SEIMHIU and SMALL LETTER SEIMHIU on existing 8-bit systems.

5. Is a repertoire including character names provided? Yes.
a. If YES, are the names in accordance with the 'character naming guidelines' in Annex K of ISO/IEC 10646-1? Yes.
b. Are the character shapes legible? Yes, see below.

6. Who will provide the appropriate computerized font for publishing the standard? Michael Everson, Everson Gunn Teoranta; TrueType format.
If available now, identify source(s) for the font: Michael Everson, Everson Gunn Teoranta, 15 Port Chaeimhghein Iochtarach, Baile Atha Cliath 2, Eire.

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? Yes, see Exhibit 9.0 in N1058.
b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? Yes, see Exhibits 1-10 in N1058.

C. Technical (Justification)

1. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)? Yes. These characters have been approved by NSAI (AGITS committee), by vendors (EGT), and by experts (Royal Irish Academy CURIA text encoding initiative project).

2. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included. A million or so people are able to use Irish Gaelic with some facility. There are no estimates of the number of those people who do text processing in Irish, and no estimates of the number of those who do text processing with the Gaelic script. On the other hand, Everson Gunn Teoranta and other publishing companies are using the Gaelic script, and one government department recently bought Gaelic fonts from EGT for printing their Christmas cards. So usage is fairly ordinary and widespread. (Cf. §2.2 in N1058.)

3. The context of use for the proposed characters (type of use; common or rare) is included. LATIN SMALL LETTER LONG S WITH DOT ABOVE is required for compatibility with existing character set standards (cf. §4.5 in N1058).
LATIN LETTER SEIMHIU is required for text processing and unambiguous reversible transfer of data for texts encoded in the Gaelic variant and the Roman variant of the Latin script (see §7.1-7.3 in N1058). One of its main uses will be the scanning in by OCR for republication of over 200 years of printed literature. The usefulness of the SEIMHIU characters has been endorsed by Dr. Patricia Kelly of the CURIA project, the principal Irish-language text encoding initiative based in University College Cork and in the Royal Irish Academy.

4. Are the proposed characters in current use by the user community? LATIN SMALL LETTER LONG S WITH DOT ABOVE is in current use in de facto coded character set standards. See Exhibits 9-10 in N1058. The SEIMHIU characters have not yet been implemented in these standards but will be as soon as positions in 10646 are allocated to them.

5. After giving due consideration to the principles in N1116, must the proposed characters be entirely in the BMP? Yes. They must be in the BMP just as all other letters used to write Irish Gaelic are in the BMP. Cf. §5.2-5.5 in N1058.

6. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? The SEIMHIUs should be kept next to each other. The characters could go in Table 32. We suggest the following allocations, keeping these characters together in the same block with other characters used in Irish:
1E9B LATIN SMALL LETTER LONG S WITH DOT ABOVE
1E9C LATIN CAPITAL LETTER SEIMHIU
1E9D LATIN SMALL LETTER SEIMHIU

7. Can any of the proposed characters be considered a presentation form of an existing character? No.

8. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? LATIN LETTER SEIMHIU has two glyph representations: one looks like LATIN CAPITAL LETTER H or LATIN SMALL LETTER H and the other looks like COMBINING DOT ABOVE.
But it is one character with two very different glyph presentations simply and solely depending on font. It is not an H and it is not a COMBINING DOT ABOVE.
If YES, is a rationale for its inclusion provided? Yes, see §7.0-8.2 in N1058.

9. Does the proposal include use of composite sequences? No.
Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided? No.

IT MUST BE NOTED that although there ARE equivalencies in 10646 related to these characters, they are NOT composite sequences as defined in ISO 10646! The following set of combinations all yield results which, with respect to reading Irish Gaelic, are equivalent:
LATIN CAPITAL LETTER T + LATIN CAPITAL LETTER SEIMHIU
LATIN CAPITAL LETTER T + LATIN SMALL LETTER SEIMHIU
LATIN SMALL LETTER T + LATIN SMALL LETTER SEIMHIU
LATIN CAPITAL LETTER T + LATIN CAPITAL LETTER H
LATIN CAPITAL LETTER T + LATIN SMALL LETTER H
LATIN SMALL LETTER T + LATIN SMALL LETTER H
LATIN CAPITAL LETTER T + COMBINING DOT ABOVE
LATIN SMALL LETTER T + COMBINING DOT ABOVE

All of these may be used to represent the "precomposed" characters:
LATIN CAPITAL LETTER T WITH DOT ABOVE
LATIN SMALL LETTER T WITH DOT ABOVE

But there is a big difference in the way in which these characters are used in roundtrip conversion of Irish Gaelic text between the Gaelic variant of the Latin script and the Roman variant of the Latin script. It is not possible to unify Gaelic script and Roman script without the introduction of a new distinctive character for lenition in Irish. Simple global substitution, for instance, cannot be relied upon to preserve non-lenited -th- or -ph- in the Gaelic script; see §2.5 and §7 in N1058.

A note on the history of the proposal of these characters: In October 1994 in San Francisco Bruce Patterson pointed out to me that it was not possible under current ISO 10646 rules to define characters as both letters and combining characters at the same time.
He suggested to me that we determine exactly what it is the SEIMHIU characters should be within the rules established for 10646. In April 1995 in Geneva WG2 members pointed out to me that the proposal to encode the SEIMHIU characters as combining characters was problematic, because these characters are defined as having case and combining characters are not permitted case distinctions. Therefore we have redefined the SEIMHIU characters as ordinary letter characters. It is the case that in some fonts, these characters may be represented by characters which look like combining characters (that is, they have zero width and are drawn above a preceding character), but it should be pointed out that this change of definition was made in order to accommodate present 10646 rules.

We know what these characters are used for, we know why they are required for Irish Gaelic text processing, and we know how to implement them. We urge WG2 members to look carefully at the arguments given in N1058 regarding the problems of text processing which these two characters will solve for us.

We propose that the following glyphs be used in the published standard:

(Glyphs to be provided on hardcopy)

(For 10646, a Gaelic-script H with a dot above will be used. For Unicode, a glyph looking like Roman-script H and h will be on the left, and a glyph looking like a combining dot will be used on the right.)

For LATIN SMALL LETTER LONG S WITH DOT ABOVE, the form given here is the form which users would expect for this character, which as far as we know is only used in Irish Gaelic.

For LATIN CAPITAL LETTER SEIMHIU and LATIN SMALL LETTER SEIMHIU, the 10646 glyph is a conceptual representation. In real implementations, only the Unicode glyphs would be actually seen by users. The special glyphs for publication in 10646 are given because glyph variants are not conventionally provided in 10646 and we feel that it is important that the glyph in 10646 be easily identifiable.
Any speaker of Irish Gaelic seeing the name LETTER SEIMHIU and the glyphs @ and @ side by side will know immediately what these characters are.
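
As an illustration only, the roundtrip problem noted under C.9 might be sketched in Python as follows. Nothing here is taken from N1058: the function names are invented for the example, the test strings are schematic and are not real Irish words, only lowercase t is handled, and the SEIMHIU character is simulated with a private-use code point (U+E000) because no position has been allocated. The sketch contrasts global substitution, which merges a lenited t with an ordinary t followed by h, against a distinct lenition character, which keeps the two apart.

```python
import unicodedata

# ASSUMPTION: a private-use code point standing in for LATIN SMALL LETTER
# SEIMHIU. It is a placeholder for this sketch, not an allocated position.
SEIMHIU = "\uE000"

T_DOT = "\u1E6B"  # LATIN SMALL LETTER T WITH DOT ABOVE

# t + COMBINING DOT ABOVE composes to the precomposed letter under NFC.
composed = unicodedata.normalize("NFC", "t\u0307")


def gaelic_to_roman_naive(text):
    """Global substitution: dotted t becomes the digraph 'th'."""
    return text.replace(T_DOT, "th")


def roman_to_gaelic_naive(text):
    """Global substitution in reverse: every 'th' becomes dotted t."""
    return text.replace("th", T_DOT)


def gaelic_to_marked(text):
    """Write lenition with the distinct marker instead of a plain 'h'."""
    return text.replace(T_DOT, "t" + SEIMHIU)


def marked_to_gaelic(text):
    """Only t + marker becomes dotted t; an ordinary 't' + 'h' is kept."""
    return text.replace("t" + SEIMHIU, T_DOT)


# Schematic strings (hypothetical, not real Irish words).
lenited = "a" + T_DOT + "a"   # contains a lenited t
plain_th = "atha"             # contains ordinary t followed by ordinary h

# With global substitution both strings collapse to the same Roman text,
# so the way back cannot tell them apart.
naive_roundtrip = roman_to_gaelic_naive(gaelic_to_roman_naive(plain_th))

# With a distinct lenition character each string survives the roundtrip.
marked_roundtrip = marked_to_gaelic(gaelic_to_marked(plain_th))
```

In this sketch `naive_roundtrip` comes back with a dotted t that the input never contained, while `marked_roundtrip` returns the input unchanged. A real implementation would also need to cover the capital forms and the other lenitable consonants.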