ISO

INTERNATIONAL ORGANIZATION FOR STANDARDIZATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set

(UCS)

ISO/IEC JTC 1/SC 2/WG 2 N 1116

1995-01-27

Title: Principles and Procedures for Allocation of New Characters and Scripts (Revised N 946)

Source: Ad hoc group on Principles and Procedures - Messrs.
V.S. Umamaheswaran, Sven Thygesen, Peter Edberg, Sten G. Lindberg

References: N 946, N 995 (section 9-a-i.3), N 1002, N 1061, N 1117, and N 1118.

Action: To be considered by SC 2/WG 2 and all potential submitters of proposals for new characters to the repertoire of ISO/IEC 10646

Distribution: ISO/IEC JTC 1/SC 2/WG 2, ISO/IEC JTC 1/SC 2 and Liaison Organizations

This document was originally prepared by Mark Davis, Edwin Hart and Sten G. Lindberg, as document N 946 (dated 11 October 1994), based on N 884 (authored by Rick McGowan and Joe Becker). It has been enhanced by an ad hoc group on principles and procedures set up at the San Francisco SC 2/WG 2 meeting no. 26, following Resolution M26.6 at that meeting. The following are the major changes made to N 946:

1. Items 5 and 6 have been added to the submitter's responsibilities, reflecting resolution M26.6 from the San Francisco SC 2/WG 2 meeting 26 and document N 1002 (which was accepted in the Washington meeting).

2. Annex A contains a 'Proposal Summary Form' that was designed during SC 2/WG 2 meeting 26 in San Francisco, which can be used as a template for submitting information accompanying the submissions for new characters to be added to the repertoire of the standard. The enhanced section on submitter's responsibilities has been moved to this annex.

3. Several minor editorial changes have also been made.


Principles and Procedures
for Allocation of New Characters and Scripts

I. Goals for Encoding New Characters into the Basic Multilingual Plane

A. The Basic Multilingual Plane should contain all contemporary characters in common use:

Generally, the Basic Multilingual Plane (BMP) should be devoted to high-utility characters that are widely implemented in some form of communication system. These include, for example, characters from hard copy typographic systems that are awaiting computerization, and characters recognizable and useful to a large community of customers. The "utility" of a character in a computer or communications standard can be measured (at least in theory) by such factors as the number of publications (for example, newspapers or books) using the character and the size of the community that can recognize the character. Characters of more limited use, for example obscure archaic characters, should be considered for encoding in supplementary planes.

B. The characters encoded into the Basic Multilingual Plane will not cover all characters included in future standards:

It is not necessary, though it may often be desirable, that all characters encoded in future international, national, and industry information technology and communication standards be included in the BMP. The first edition used characters from pre-existing standards as a means of evaluating the established utility as well as ensuring compatibility with existing practice. Characters encoded in future standards may or may not have proven utility, and may or may not establish themselves in common use.
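As a rough illustration of the distinction between the BMP and the supplementary planes, the sketch below tests whether a code position lies in the BMP. It assumes the BMP occupies code positions 0000 to FFFF and that the BMP private use range is E000 to F8FF; the function names are hypothetical and are not taken from ISO/IEC 10646.

```python
# Assumed ranges: BMP = 0000..FFFF, BMP private use = E000..F8FF.
BMP_FIRST, BMP_LAST = 0x0000, 0xFFFF
PRIVATE_USE_FIRST, PRIVATE_USE_LAST = 0xE000, 0xF8FF


def in_bmp(code_position: int) -> bool:
    """Return True if the code position lies in the Basic Multilingual Plane."""
    return BMP_FIRST <= code_position <= BMP_LAST


def in_bmp_private_use(code_position: int) -> bool:
    """Return True if the code position lies in the BMP private use range."""
    return PRIVATE_USE_FIRST <= code_position <= PRIVATE_USE_LAST


if __name__ == "__main__":
    for cp in (0x0041, 0xE000, 0x10000):
        print(f"{cp:06X} bmp={in_bmp(cp)} private_use={in_bmp_private_use(cp)}")
```

Any code position above FFFF falls outside the BMP under these assumptions, which is where characters of more limited use would be placed.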

II. Character Categories

SC 2/WG 2 will use the following categories to aid in assessing the encoding of the proposed characters[1].

A. Contemporary

There exists a contemporary community of native users who produce new printed matter with the proposed characters in newspapers, magazines, books, signs, etc. Examples include Burmese, Maldivian, Syriac, Yi, Xishuang Banna Dai.

B.1 Specialized (Small Collections of Characters)

The characters are part of a relatively small set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include Javanese, Pahlavi...

B.2 Specialized (Large Collections of Characters)

The characters are part of a relatively large set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include personal name ideographs, Chu Nom, Archaic Han.

C. Major Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively large body of literature using these characters, and a relatively large scholarly community studying that literature. Examples include Etruscan, Linear B.

D. Attested Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively limited literature using these characters and a relatively small scholarly community studying that literature. Examples include Samaritan, Meroitic.

E. Minor Extinct

The characters are part of a relatively small set. The utility of publicly encoding these characters is open to question[2]. Examples are Khotanese, Lahnda.

F. Archaic Hieroglyphic or Ideographic

These characters are part of a large set (for example, 160 or more characters) of hieroglyphic or ideographic characters. A large character set is almost by definition obscure, since it is difficult to obtain information or agreement on the precise membership of the set. Examples include Lolo, Moso, Akkadian, Egyptian Hieroglyphics, Hittite (Luwian), Khitan, Mayan Hieroglyphics, Nuchen.

G. Obscure or Questionable Usage Symbols

The characters are part of a small or large collection that is not yet deciphered, or not completely understood, or not well-attested by substantial literature or the scholarly community. Or they are symbols that are not normally used in in-line text, that are merely drawings, that are used only in two-dimensional diagrams, or that may be composed (such as a slash through a symbol to indicate forbidden). Examples include logos, pictures of cows, circuit components, weather chart symbols.

III. Procedure for Encoding New Characters and Scripts

The following defines a procedure with criteria for deciding how to encode new characters in ISO/IEC 10646. This procedure shall be used for new scripts only after thorough research into the repertoire and ordering of the characters within the script.

See submitter's responsibilities and the attached Proposal Summary Form in Annex A.

SC 2/WG 2 Evaluation Procedure

In assessing the suitability of a proposed character for encoding, SC 2/WG 2 shall use the following procedure:

1. Do not encode.

a) If the proposed character is a (shape or other) variation of a character already encoded in ISO/IEC 10646 and therefore may be unified, or

b) If the proposed character is a presentation form (glyph), variant, or ligature, or

c) If the proposed character may be better represented as a sequence of ISO/IEC 10646 encoded characters.

2. Suggest use of the Private Use Area

a) If the proposed character has an extremely small or closed community of customers, or

b) If the proposed characters are part of a script that is very complex to implement and the script has not yet been encoded in ISO/IEC 10646 (the private use area may be used for test and evaluation).

3. Encode on a supplementary plane

a) If the proposed character is used infrequently, or

b) If it is part of a set of characters for which insufficient space is available in the Basic Multilingual Plane.

4. Encode on the Basic Multilingual Plane

a) If the proposed character does not fit into one of the previous criteria (1, 2, or 3), and

b) If the proposed character is part of a well-defined character collection not already encoded in ISO/IEC 10646, or

c) If the proposed character is part of a small number of characters to be added to a script already encoded in the Basic Multilingual Plane of ISO/IEC 10646 (for example, the characters can be encoded at unallocated code positions within the block or blocks allocated for that script).
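The evaluation procedure above can be sketched as an ordered series of checks. This is only an illustration under stated assumptions: the steps are assumed to be applied in order with the first matching step deciding the outcome, step 4 is read as "a and (b or c)", and an UNDECIDED outcome is added for proposals that match none of the steps. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    DO_NOT_ENCODE = 1
    PRIVATE_USE_AREA = 2
    SUPPLEMENTARY_PLANE = 3
    BASIC_MULTILINGUAL_PLANE = 4
    UNDECIDED = 5  # assumption: not named in the procedure itself


@dataclass
class Proposal:
    # Hypothetical field names; each mirrors one criterion in the procedure.
    unifiable_with_encoded_character: bool = False  # 1a
    is_presentation_form_or_ligature: bool = False  # 1b
    representable_as_sequence: bool = False         # 1c
    tiny_or_closed_community: bool = False          # 2a
    complex_unencoded_script: bool = False          # 2b
    infrequently_used: bool = False                 # 3a
    insufficient_bmp_space: bool = False            # 3b
    well_defined_new_collection: bool = False       # 4b
    small_addition_to_bmp_script: bool = False      # 4c


def evaluate(p: Proposal) -> Disposition:
    """Apply steps 1 to 4 in order; the first matching step wins (assumed)."""
    if (p.unifiable_with_encoded_character
            or p.is_presentation_form_or_ligature
            or p.representable_as_sequence):
        return Disposition.DO_NOT_ENCODE
    if p.tiny_or_closed_community or p.complex_unencoded_script:
        return Disposition.PRIVATE_USE_AREA
    if p.infrequently_used or p.insufficient_bmp_space:
        return Disposition.SUPPLEMENTARY_PLANE
    # Criterion 4a holds here because none of steps 1, 2, or 3 matched.
    if p.well_defined_new_collection or p.small_addition_to_bmp_script:
        return Disposition.BASIC_MULTILINGUAL_PLANE
    return Disposition.UNDECIDED


if __name__ == "__main__":
    print(evaluate(Proposal(well_defined_new_collection=True)).name)
    print(evaluate(Proposal(is_presentation_form_or_ligature=True,
                            well_defined_new_collection=True)).name)
```

In practice each criterion is a judgment made by SC 2/WG 2 rather than a simple flag; the sketch only shows how the ordering of the steps affects the outcome.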


Annex A
INFORMATION ACCOMPANYING SUBMISSIONS

The process by which SC 2/WG 2 decides which characters should be included in the repertoire of ISO/IEC 10646 depends on the availability of accurate and comprehensive information about any proposed additions. SC 2/WG 2, at its San Francisco meeting 26, designed a form (template) that will assist the submitters in gathering and providing the relevant information, and will assist SC 2/WG 2 in making more informed decisions. This form is included in the following pages of this annex.

Each new submission must be accompanied by a duly completed proposal summary form, which helps SC 2/WG 2 evaluate the requirements and speeds acceptance of the submission. Submitters are also requested to ensure that a proposed character does not already exist in ISO/IEC 10646.

If a submission was made before the proposal summary form existed, the submitter is requested to re-evaluate the submission for completeness using the form as a template, and either provide reference(s) to existing information or provide additional information.

Submitter's Responsibilities

The national body or liaison organization (or any other organization or an individual) proposing a new character shall provide:

1. Proposed category for the character, character name, and description of usage.

2. Justification for the category and name.

3. A representative glyph image on paper:
if this glyph image is similar to a glyph image of a previously encoded ISO/IEC 10646 character, then additional justification for encoding the new character shall be provided.

4. Mappings to accepted sources, for example, other standards, dictionaries, accessible published materials.

5. Computerized font:
prior to the preparation of the final text a suitable computerized font shall be provided to the project editor. The minimum design resolution for the font is a 96 by 96 dot matrix, for presentation at or near 22 points in print size.

6. Equivalent glyph images:
if the submission intends using composite sequences of proposed or existing combining and non-combining characters, a list consisting of each composite sequence and its corresponding glyph image shall be provided to better understand the intended use.
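To give a sense of scale for the font requirement in item 5, the following sketch converts a 96 by 96 matrix printed at 22 points into an approximate output resolution. It assumes that the matrix spans the full 22-point body and that one point is 1/72 inch; neither assumption is stated in this document, and the function name is hypothetical.

```python
POINTS_PER_INCH = 72  # assumption: 1 point = 1/72 inch


def dots_per_inch(matrix_dots: int, print_size_points: float) -> float:
    """Resolution implied by a matrix of the given size printed at the given point size."""
    if print_size_points <= 0:
        raise ValueError("print size must be positive")
    return matrix_dots * POINTS_PER_INCH / print_size_points


if __name__ == "__main__":
    # 96 * 72 / 22 is roughly 314 dots per inch.
    print(round(dots_per_inch(96, 22)))
```

Under these assumptions the minimum corresponds to roughly 314 dots per inch, or a little over four dots per point.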

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM
TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
[3]

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Requester's name:

2. Requester type (Member body/Liaison/Individual contribution):

3. Submission date:

4. Requester's reference (if applicable):

5. (Choose one of the following:)
This is a complete proposal: ; or,
More information will be provided later:

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters):
Proposed name of script:

b. The proposal is for addition of character(s) to an existing block?:
Name of the existing block:

2. Number of characters in proposal:

3. Proposed category per SC 2/WG 2 N1116:

4. Proposed Level of Implementation (1, 2, 3 or ?):
Is a rationale provided for the choice?
If Yes, reference:

5. Is a repertoire including character names provided?:

a. If YES, are the names in accordance with the 'character naming guidelines' in
Annex K of ISO/IEC 10646-1?
b. Are the character shapes legible?

6. Who will provide the appropriate computerized font for publishing the standard?

If available now, identify source(s) for the font:

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided:

b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached?

C. Technical - Justification

1. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included.:
Reference:

2. The context of use for the proposed characters (type of use; common or rare) is
included.
Reference:

3. Are the proposed characters in current use by the user community?
If YES, where? Reference:

4. After giving due considerations to the principles in N 1116 must the proposed
characters be entirely in the BMP?
If YES, is a rationale provided?
If YES, reference:

5. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)?

6. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence?
If YES, is a rationale for its inclusion provided?
If YES, reference:

7. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
If YES, is a rationale for its inclusion provided?
If YES, reference:

8. Does the proposal include use of composite sequences?
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic
symbols) provided?
If YES, reference:

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

2. Status (list of meeting number and corresponding action or disposition):


===================== END OF PROPOSAL SUMMARY INFORMATION =========================

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM
TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
[4]

An Example: This form was filled in using material from the attachment to N 1093 (on Braille), prepared during the San Francisco meeting.

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Requester's name: Kohji Shibano

2. Requester type (Member body/Liaison/Individual contribution):
Individual Contribution (may become member body contribution)

3. Submission date: 1994-10-10

4. Requester's reference (if applicable): JTC1/SC2/WG2 N 1093

5. (Choose one of the following:)
This is a complete proposal: ; or,
More information will be provided later: XXXX

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters): YES
Proposed name of script: Braille

b. The proposal is for addition of character(s) to an existing block?:
Name of the existing block:

2. Number of characters in proposal: 448 characters (512 code points)

3. Proposed category per SC 2/WG 2 N1116: A

4. Proposed Level of Implementation (1, 2, 3 or ?): 1
Is a rationale provided for the choice?
If Yes, reference:

5. Is a repertoire including character names provided?: YES

a. If YES, are the names in accordance with the 'character naming guidelines' in
Annex K of ISO/IEC 10646-1? NO (will provide)
b. Are the character shapes legible? YES

6. Who will provide the appropriate computerized font for publishing the standard?
Japan
If available now, identify source(s) for the font:
IBM Japan, NEC etc

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided: ISO TC 173
b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached? NO (will provide)

C. Technical - Justification

1. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included.: NO
Reference: PEOPLE WITH IMPAIRED VISION (info will be provided)

2. The context of use for the proposed characters (type of use; common or rare) is
included. Common use; including on-line database services for Braille-
translated text.
Reference:

3. Are the proposed characters in current use by the user community? YES
If YES, where? Reference: Worldwide

4. After giving due considerations to the principles in N 1116 must the proposed
characters be entirely in the BMP? YES
If YES, is a rationale provided?
If YES, reference:

5. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)? YES

6. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence? NO
If YES, is a rationale for its inclusion provided?
If YES, reference:

7. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? NO
If YES, is a rationale for its inclusion provided?
If YES, reference:

8. Does the proposal include use of composite sequences? NO
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic
symbols) provided?
If YES, reference:

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

2. Status (list of meeting number and corresponding action or disposition):


ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM
TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
[5]

An Example: This form was filled in using material from the attachment to N 1094 (on CJK Symbols), prepared during the San Francisco meeting.

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Requester's name: Japan

2. Requester type (Member body/Liaison/Individual contribution):
Member Body

3. Submission date: 1994-10-13

4. Requester's reference (if applicable): J2-94-XY

5. (Choose one of the following:)
This is a complete proposal: ; or,
More information will be provided later: XXXX

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters):
Proposed name of script:

b. The proposal is for addition of character(s) to an existing block?: YES
Name of the existing block: CJK SYMBOLS AND PUNCTUATION

2. Number of characters in proposal: 2363

3. Proposed category per SC 2/WG 2 N1116: A

4. Proposed Level of Implementation (1, 2, 3 or ?): 1
Is a rationale provided for the choice?
If Yes, reference:

5. Is a repertoire including character names provided?: NO (To be provided)

a. If YES, are the names in accordance with the 'character naming guidelines' in
Annex K of ISO/IEC 10646-1?
b. Are the character shapes legible?

6. Who will provide the appropriate computerized font for publishing the standard?
Japan
If available now, identify source(s) for the font:

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided: YES; See text of SC2 WG2 N 1094

b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached?

C. Technical - Justification

1. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included.: NO
Reference: Newspapers, Libraries; 5,000,000 x 10/ per ????)

2. The context of use for the proposed characters (type of use; common or rare) is
included. Common use
Reference:

3. Are the proposed characters in current use by the user community? YES
If YES, where? Reference: Newspaper readers

4. After giving due considerations to the principles in N 1116 must the proposed
characters be entirely in the BMP? NO - partially in BMP
If YES, is a rationale provided?
If YES, reference:

5. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)? NO

6. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence? NO
If YES, is a rationale for its inclusion provided?
If YES, reference:

7. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? NO
If YES, is a rationale for its inclusion provided?
If YES, reference:

8. Does the proposal include use of composite sequences? NO
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic
symbols) provided?
If YES, reference:

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

2. Status (list of meeting number and corresponding action or disposition):


===================== END OF PROPOSAL SUMMARY INFORMATION =========================