Principles and Procedures for Allocation of New Characters and Scripts

I. Goals for Encoding New Characters into the Basic Multilingual Plane

A. The Basic Multilingual Plane should contain all contemporary characters in common use:

Generally, the Basic Multilingual Plane (BMP) should be devoted to high-utility characters that are widely implemented in some form of communication system. These include, for example, characters from hard copy typographic systems that are awaiting computerization, and characters recognizable and useful to a large community of customers. The "utility" of a character in a computer or communications standard can be measured (at least in theory) by such factors as the number of publications (for example, newspapers or books) using the character and the size of the community that can recognize the character. Characters of more limited use, for example obscure archaic characters, should be considered for encoding in supplementary planes.

B. The characters encoded into the Basic Multilingual Plane will not cover all characters included in future standards:

It is not necessary, though it may often be desirable, that all characters encoded in future international, national, and industry information technology and communication standards be included in the BMP. The first edition used characters from pre-existing standards as a means of evaluating established utility and of ensuring compatibility with existing practice. Characters encoded in future standards may or may not have proven utility, and may or may not establish themselves in common use.

II. Character Categories

SC 2/WG 2 will use the following categories to aid in assessing the encoding of the proposed characters [1].

A. Contemporary

There exists a contemporary community of native users who produce new printed matter with the proposed characters in newspapers, magazines, books, signs, etc. Examples include Burmese, Maldivian, Syriac, Yi, Xishuang Banna Dai.

B.1 Specialized (Small Collections of Characters)

The characters are part of a relatively small set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include Javanese, Pahlavi...

B.2 Specialized (Large Collections of Characters)

The characters are part of a relatively large set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include personal name ideographs, Chu Nom, Archaic Han.

C. Major Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively large body of literature using these characters, and a relatively large scholarly community studying that literature. Examples include Etruscan, Linear B.

D. Attested Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively limited literature using these characters and a relatively small scholarly community studying that literature. Examples include Samaritan, Meroitic.

E. Minor Extinct

The characters are part of a relatively small set. The utility of publicly encoding these characters is open to question [2]. Examples include Khotanese, Lahnda.

F. Archaic Hieroglyphic or Ideographic

These characters are part of a large set (for example, 160 or more characters) of hieroglyphic or ideographic characters. A large character set is almost by definition obscure, since it is difficult to obtain information or agreement on the precise membership of the set. Examples include Lolo, Moso, Akkadian, Egyptian Hieroglyphics, Hittite (Luwian), Khitan, Mayan Hieroglyphics, Nuchen.

G. Obscure or Questionable Usage Symbols

The characters are part of a small or large collection that is not yet deciphered, or not completely understood, or not well-attested by substantial literature or the scholarly community. Alternatively, they are symbols that are not normally used in in-line text, that are merely drawings, that are used only in two-dimensional diagrams, or that may be composed (such as a slash through a symbol to indicate forbidden). Examples include logos, pictures of cows, circuit components, weather chart symbols.

III. Procedure for Encoding New Characters and Scripts

The following defines a procedure with criteria for deciding how to encode new characters in ISO/IEC 10646. This procedure shall be used for new scripts only after thorough research into the repertoire and ordering of the characters within the script.

See submitter's responsibilities and the attached Proposal Summary Form in Annex A.

SC 2/WG 2 Evaluation Procedure

In assessing the suitability of a proposed character for encoding, SC 2/WG 2 shall evaluate the credibility of the submitter and then use the following procedure:

1. Do not encode

a) If the proposed character is a (shape or other) variation of a character already encoded in ISO/IEC 10646 and therefore may be unified, or

b) If the proposed character is a presentation form (glyph), variant, or ligature, or

c) If the proposed character may be better represented as a sequence of ISO/IEC 10646 encoded characters.

2. Suggest use of the Private Use Area

a) If the proposed character has an extremely small or closed community of customers, or

b) If the proposed characters are part of a script that is very complex to implement and the script has not yet been encoded in ISO/IEC 10646 (the Private Use Area may be used for test and evaluation).

3. Encode on a supplementary plane

a) If the proposed character is used infrequently, or

b) If it is part of a set of characters for which insufficient space is available in the Basic Multilingual Plane.

4. Encode on the Basic Multilingual Plane

a) If the proposed character does not fit into one of the previous criteria (1, 2, or 3), and

b) If the proposed character is part of a well-defined character collection not already encoded in ISO/IEC 10646, or

c) If the proposed character is part of a small number of characters to be added to a script already encoded in the Basic Multilingual Plane of ISO/IEC 10646 (for example, the characters can be encoded at unallocated code positions within the block or blocks allocated for that script).
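The four steps above can be read as an ordered decision procedure in which the first matching step wins. The following Python sketch is illustrative only and is not part of the procedure itself. All names (Proposal, Outcome, evaluate, and every field) are hypothetical, and each boolean stands for a judgment that SC 2/WG 2 would make by review. It assumes that step 4 is read as "a and (b or c)", and it adds an UNRESOLVED outcome, which the text does not define, for proposals that match none of the criteria. The evaluation of the submitter's credibility is not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    DO_NOT_ENCODE = "1. Do not encode"
    PRIVATE_USE_AREA = "2. Suggest use of the Private Use Area"
    SUPPLEMENTARY_PLANE = "3. Encode on a supplementary plane"
    BMP = "4. Encode on the Basic Multilingual Plane"
    # Assumption: not defined by the text; used when no criterion matches.
    UNRESOLVED = "No criterion matched"


@dataclass
class Proposal:
    # Hypothetical field names; each mirrors one criterion in the text.
    unifiable_with_encoded: bool = False        # 1a
    is_presentation_form: bool = False          # 1b (glyph, variant, ligature)
    representable_as_sequence: bool = False     # 1c
    tiny_or_closed_community: bool = False      # 2a
    complex_unencoded_script: bool = False      # 2b
    infrequently_used: bool = False             # 3a
    no_room_in_bmp: bool = False                # 3b
    well_defined_new_collection: bool = False   # 4b
    small_addition_to_bmp_script: bool = False  # 4c


def evaluate(p: Proposal) -> Outcome:
    """Apply steps 1 to 4 in order and return the first outcome that matches."""
    if (p.unifiable_with_encoded or p.is_presentation_form
            or p.representable_as_sequence):
        return Outcome.DO_NOT_ENCODE
    if p.tiny_or_closed_community or p.complex_unencoded_script:
        return Outcome.PRIVATE_USE_AREA
    if p.infrequently_used or p.no_room_in_bmp:
        return Outcome.SUPPLEMENTARY_PLANE
    # Criterion 4a holds here because steps 1 to 3 did not match.
    # Assumed reading: 4b or 4c must also hold.
    if p.well_defined_new_collection or p.small_addition_to_bmp_script:
        return Outcome.BMP
    return Outcome.UNRESOLVED
```

For example, evaluate(Proposal(is_presentation_form=True, well_defined_new_collection=True)) yields Outcome.DO_NOT_ENCODE, because step 1 is checked before step 4.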

[1] Refer to SC 2/WG 2 document N 947 for a proposed initial categorization and allocation of characters.

[2] Characters in the minor extinct category may be secondary candidates for encoding elsewhere on the BMP, or their limited scholarly communities may wish to encode them in the Private Use Area.