INTERNATIONAL ORGANIZATION FOR STANDARDIZATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

 

Universal Multiple-Octet Coded Character Set

(UCS)

Title: Principles and Procedures for Allocation of New Characters and Scripts and handling of Defect Reports on Character Names (Revised N 1402)

Source: Ad hoc group on Principles and Procedures - Messrs. V.S. Umamaheswaran, Sven Thygesen

References: N 946, N 995 (section 9-a-i.3), N 1002, N 1061, N 1117, N 1118, N 1137 and N 1203 (section 6.1, 6.2 and 10.1.2), N1218, N1352, N1402.

Action: To be considered by SC 2/WG 2 and all potential submitters of proposals for new characters to the repertoire of ISO/IEC 10646

Distribution: ISO/IEC JTC 1/SC 2/WG 2, ISO/IEC JTC 1/SC 2 and Liaison Organizations

This document was originally prepared by Mark Davis, Edwin Hart and Sten G. Lindberg as document N 946 (dated 11 October 1994), based on N 884 (authored by Rick McGowan and Joe Becker). It was enhanced by an ad hoc group on principles and procedures set up at the San Francisco SC 2/WG 2 meeting no. 26; the result was presented as SC 2/WG 2 N1116. At the Geneva SC 2/WG 2 meeting no. 27, some enhancements were proposed; the result was presented as SC 2/WG 2 N1202. At the Helsinki SC 2/WG 2 meeting no. 28, some enhancements were proposed and adopted; the result was presented as SC 2/WG 2 N1252. The document was accepted, following Resolution M28.6 at that meeting. At meeting no. 31 a new Annex C, "Description of the UCS work flow and stages in progression from initial proposal to final publication", was added, and a new question (C 10) was included in the proposal summary form. At meeting no. 32 a new Annex D, "BMP and Supplementary Planes Allocation Roadmap", was added; Annex D incorporates the US contribution N1499 with only minor editorial changes. Minor editorial changes have also been made to align the different standing documents.

 

 

 

 

Principles and Procedures

for Allocation of New Characters and Scripts

 

I. Goals for Encoding New Characters into the Basic Multilingual Plane

 

A. The Basic Multilingual Plane should contain all contemporary characters in common use:

Generally, the Basic Multilingual Plane (BMP) should be devoted to high-utility characters that are widely implemented in some form of communication system. These include, for example, characters from hard copy typographic systems that are awaiting computerization, and characters recognizable and useful to a large community of customers. The "utility" of a character in a computer or communications standard can be measured (at least in theory) by such factors as the number of publications (for example, newspapers or books) using the character and the size of the community that can recognize the character. Characters of more limited use, for example obscure archaic characters, should be considered for encoding in supplementary planes.

B. The characters encoded into the Basic Multilingual Plane will not cover all characters included in future standards:

It is not necessary, though it may often be desirable, that all characters encoded in future international, national, and industry information technology and communication standards be included in the BMP. The first edition used characters from pre-existing standards as a means of evaluating the established utility as well as ensuring compatibility with existing practice. Characters encoded in future standards may or may not have proven utility, and may or may not establish themselves in common use.

II. Character Categories

 

SC 2/WG 2 will use the following categories to aid in assessing the encoding of the proposed characters.

 

A. Contemporary

There exists a contemporary community of native users who produce new printed matter with the proposed characters in newspapers, magazines, books, signs, etc. Examples include Burmese, Maldivian, Syriac, Yi, Xishuang Banna Dai.

 

B.1 Specialized (Small Collections of Characters)

The characters are part of a relatively small set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include Javanese, Pahlavi...

 

B.2 Specialized (Large Collections of Characters)

The characters are part of a relatively large set. There exists a limited community of users (for example, liturgical) who produce new printed material with these proposed characters. Generally, these characters have few native users, or are not in day-to-day use for ordinary communication. Examples include personal name ideographs, Chu Nom, Archaic Han.

 

 

C. Major Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively large body of literature using these characters, and a relatively large scholarly community studying that literature. Examples include Etruscan, Linear B.

 

D. Attested Extinct (Small Collections of Characters)

The characters are part of a relatively small set. There exists a relatively limited literature using these characters and a relatively small scholarly community studying that literature. Examples include Samaritan, Meroitic.

 

E. Minor Extinct

The characters are part of a relatively small set. The utility of publicly encoding these characters is open to question. Examples are Khotanese, Lahnda.

 

F. Archaic Hieroglyphic or Ideographic

These characters are part of a large set (for example, 160 or more characters) of hieroglyphic or ideographic characters. A large character set is almost by definition obscure, since it is difficult to obtain information or agreement on the precise membership of the set. Examples include Lolo, Moso, Akkadian, Egyptian Hieroglyphics, Hittite (Luwian), Khitan, Mayan Hieroglyphics, Nuchen.

 

G. Obscure or Questionable Usage Symbols

The characters are part of a small or large collection that is not yet deciphered, or not completely understood, or not well-attested by substantial literature or the scholarly community. Or they are symbols that are not normally used in in-line text, that are merely drawings, that are used only in two-dimensional diagrams, or that may be composed (such as, a slash through a symbol to indicate forbidden). Examples include logos, pictures of cows, circuit components, weather chart symbols.

 

III. Procedure for Encoding New Characters and Scripts

 

The following defines a procedure with criteria for deciding how to encode new characters in ISO/IEC 10646. This procedure shall be used for new scripts only after thorough research into the repertoire and ordering of the characters within the script.

 

See submitter's responsibilities and the attached Proposal Summary Form in Annex A.

 

SC 2/WG 2 Evaluation Procedure

 

In assessing the suitability of a proposed character for encoding, SC 2/WG 2 shall evaluate the credibility of the submitter and then use the following procedure:

 

1. Do not encode.

a) If the proposed character is a (shape or other) variation of a character already encoded in ISO/IEC 10646 and therefore may be unified, or

b) If the proposed character is a presentation form (glyph), variant, or ligature, or

c) If the proposed character may be better represented as a sequence of ISO/IEC 10646 encoded characters.

 

2. Suggest use of the Private Use Area

a) If the proposed character has an extremely small or closed community of customers, or

b) If the proposed characters are part of a script that is very complex to implement and the script has not yet been encoded in ISO/IEC 10646 (the private use area may be used for test and evaluation).

 

3. Encode on a supplementary plane

a) If the proposed character is used infrequently, or

b) If it is part of a set of characters for which insufficient space is available in the Basic Multilingual Plane.

 

4. Encode on the Basic Multilingual Plane

a) If the proposed character does not fit into one of the previous criteria (1, 2, or 3), and

b) If the proposed character is part of a well-defined character collection not already encoded in ISO/IEC 10646, or

c) If the proposed character is part of a small number of characters to be added to a script already encoded in the Basic Multilingual Plane of ISO/IEC 10646 (for example, the characters can be encoded at unallocated code positions within the block or blocks allocated for that script).
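The four-step evaluation procedure above can be read as an ordered decision cascade. The following is a minimal sketch of that reading, not a WG 2 tool. All class, field and function names are hypothetical. It assumes that criterion 4 means "a) and (b) or c))", and it does not model the evaluation of the submitter's credibility. The extra NEEDS_DISCUSSION outcome is also an assumption, covering proposals that match none of the listed criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    DO_NOT_ENCODE = 1
    PRIVATE_USE_AREA = 2
    SUPPLEMENTARY_PLANE = 3
    BASIC_MULTILINGUAL_PLANE = 4
    NEEDS_DISCUSSION = 5  # hypothetical: no criterion in the text matched


@dataclass
class Proposal:
    # Hypothetical flags; each mirrors one criterion of the procedure.
    is_variant_of_encoded: bool = False            # 1a
    is_presentation_form: bool = False             # 1b
    is_representable_as_sequence: bool = False     # 1c
    has_tiny_or_closed_community: bool = False     # 2a
    is_complex_unencoded_script: bool = False      # 2b
    is_infrequently_used: bool = False             # 3a
    lacks_space_in_bmp: bool = False               # 3b
    is_well_defined_new_collection: bool = False   # 4b
    is_small_addition_to_bmp_script: bool = False  # 4c


def evaluate(p: Proposal) -> Disposition:
    """Apply criteria 1 to 4 in order; the first match decides."""
    if (p.is_variant_of_encoded or p.is_presentation_form
            or p.is_representable_as_sequence):
        return Disposition.DO_NOT_ENCODE
    if p.has_tiny_or_closed_community or p.is_complex_unencoded_script:
        return Disposition.PRIVATE_USE_AREA
    if p.is_infrequently_used or p.lacks_space_in_bmp:
        return Disposition.SUPPLEMENTARY_PLANE
    # Criterion 4a holds here because none of 1, 2 or 3 applied.
    if p.is_well_defined_new_collection or p.is_small_addition_to_bmp_script:
        return Disposition.BASIC_MULTILINGUAL_PLANE
    return Disposition.NEEDS_DISCUSSION
```

Because the checks run in order, a proposal that is both infrequently used and part of a well-defined collection is directed to a supplementary plane rather than the BMP, which matches the wording of criterion 4 a).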

 

 

Principles and Procedures

for Handling Defect Reports on Character Names

In principle, the Character Names in the standard are not to be changed. However, there may be situations where changes, deletions or annotations to names of characters are warranted. Requests for changing Character Names may be issued as a defect report. The principles by which SC 2/WG 2 deals with such defect reports are described in Annex B.

 

 

 

 

Annex A

INFORMATION ACCOMPANYING SUBMISSIONS

 

The process by which SC 2/WG 2 decides which characters should be included in the repertoire of ISO/IEC 10646 depends on the availability of accurate and comprehensive information about any proposed additions. SC 2/WG 2, at its San Francisco meeting 26, designed a form (template) that will assist the submitters in gathering and providing the relevant information, and will assist SC 2/WG 2 in making more informed decisions. This form is included in the following pages of this annex.

 

Each new submission must be accompanied by a duly completed proposal summary form, which helps SC 2/WG 2 to evaluate the requirements and leads to speedier acceptance of the submission. Submitters are also requested to ensure that a proposed character does not already exist in ISO/IEC 10646.

 

If a submission has already been made prior to the existence of the proposal summary form, the submitter(s) is requested to re-evaluate the submission for completeness using the form as a template, and either provide reference(s) to existing information or provide additional information.

 

Submitter's Responsibilities

 

The national body or liaison organization (or any other organization or an individual) proposing new character(s) or a new script shall provide:

 

1. Proposed category for the script or character(s), character name(s), and description of usage.

2. Justification for the category and name(s).

3. A representative glyph(s) image on paper:
if this glyph image is similar to a glyph image of a previously encoded ISO/IEC 10646 character, then additional justification for encoding the new character shall be provided.

4. Mappings to accepted sources, for example, other standards, dictionaries, accessible published materials

5. Computerized/camera ready font:
prior to the preparation of the final text of the next version of the standard, a suitable computerized font (camera ready font) will be needed. Camera ready copy is mandatory for the final text of any pDAMs before the next revision. Ordered preference of the fonts: True Type, PostScript or 96x96 bit-mapped format. The minimum design resolution for the font is a 96 by 96 dot matrix, for presentation at or near 22 points in print size.

6. List of all the parties consulted.

7. Equivalent glyph images:
if the submission intends using composite sequences of proposed or existing combining and non-combining characters, a list consisting of each composite sequence and its corresponding glyph image shall be provided to better understand the intended use.

 

 

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Title:

 

2. Requester's name:

 

3. Requester type (Member body/Liaison/Individual contribution):

4. Submission date:

 

5. Requester's reference (if applicable):

 

6. (Choose one of the following:)
This is a complete proposal: ; or, More information will be provided later:

 

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters):
Proposed name of script:

b. The proposal is for addition of character(s) to an existing block:
Name of the existing block:

2. Number of characters in proposal:

 

3. Proposed category (see section II, Character Categories):

 

4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1):
Is a rationale provided for the choice?
If Yes, reference:

 

5. Is a repertoire including character names provided?:

a. If YES, are the names in accordance with the 'character naming guidelines'

in Annex K of ISO/IEC 10646-1?
b. Are the character shapes attached in a reviewable form?

6. Who will provide the appropriate computerized font (ordered preference: True Type,

PostScript or 96x96 bit-mapped format) for publishing the standard?

If available now, identify source(s) for the font (include address, e-mail,

ftp-site, etc.) and indicate the tools used:

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided?

b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached?

 

8. Special encoding issues:

Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information): ______________________________________________________________________________

 

C. Technical - Justification

 

1. Has this proposal for addition of character(s) been submitted before?

If YES explain

 

2. Has contact been made to members of the user community (for example: National

Body, user groups of the script or characters, other experts, etc.)?

If YES, with whom?
If YES, available relevant documents?

3. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included?
Reference:

 

4. The context of use for the proposed characters (type of use; common or rare)
Reference:

 

5. Are the proposed characters in current use by the user community?
If YES, where? Reference:

 

6. After giving due considerations to the principles in N 1352 must the proposed
characters be entirely in the BMP?
If YES, is a rationale provided?
If YES, reference:

7. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)?

 

8. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence?
If YES, is a rationale for its inclusion provided?
If YES, reference:

 

9. Can any of the proposed character(s) be considered to be similar (in appearance

or function) to an existing character?
If YES, is a rationale for its inclusion provided?
If YES, reference:

 

10. Does the proposal include use of combining characters and/or use of composite

sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)?
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images

(graphic symbols) provided?
If YES, reference:

11. Does the proposal contain characters with any special properties such as control function or similar

semantics?
If YES, describe in detail (include attachment if necessary)

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

 

2. Status (list of meeting number and corresponding action or disposition):

3. Additional contact to user communities, liaison organizations etc:

 

4. Assigned category and assigned priority/time frame:

 

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646

An Example: Fictitious summary form filled in for illustration of the use of the form.

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Title: Braille

 

2. Requester's name: Kohji Shibano, Japan

 

3. Requester type (Member body/Liaison/Individual contribution): Individual Contribution

4. Submission date: 1994-10-10

5. Requester's reference (if applicable): J2-94-xy

6. (Choose one of the following:)
This is a complete proposal: ; or, More information will be provided later:
Yes

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters): Yes
Proposed name of script:
Braille

 

b. The proposal is for addition of character(s) to an existing block: No
Name of the existing block:

2. Number of characters in proposal: 448

 

3. Proposed category (see section II, Character Categories): A

 

4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1): 1
Is a rationale provided for the choice?
No
If Yes, reference:

 

5. Is a repertoire including character names provided?: Yes

a. If YES, are the names in accordance with the 'character naming guidelines'

in Annex K of ISO/IEC 10646-1? No (will provide)
b. Are the character shapes attached in a reviewable form? Yes

6. Who will provide the appropriate computerized font (ordered preference: TrueType,

PostScript or 96x96 bit-mapped format) for publishing the standard?
Japan
If available now, identify source(s) for the font (include address, e-mail,

ftp-site, etc.) and indicate the tools used:
IBM Japan (ftp://ifi.jp/pub/font)

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided?
ISO TC 173

b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached?
No (will provide)

8. Special encoding issues:

Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information): ______________________________________________________________________________

 

C. Technical - Justification

 

1. Has this proposal for addition of character(s) been submitted before? No

If YES explain

 

2. Has contact been made to members of the user community (for example: National

Body, user groups of the script or characters, other experts, etc.)? No

If YES, with whom?
If YES, available relevant documents?

3. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included?
Reference:
People with impaired vision (info will be provided)

 

4. The context of use for the proposed characters (type of use; common or rare) Common
Reference:
on-line database services for Braille-translated text (e.g. www: braille.dknet.dk)

 

5. Are the proposed characters in current use by the user community? Yes
If YES, where? Reference:
Worldwide

 

6. After giving due considerations to the principles in N 1352 must the proposed
characters be entirely in the BMP?
Yes
If YES, is a rationale provided?
If YES, reference:

 

7. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)?
Yes

 

8. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence?
No
If YES, is a rationale for its inclusion provided?
If YES, reference:

9. Can any of the proposed character(s) be considered to be similar (in appearance

or function) to an existing character? No
If YES, is a rationale for its inclusion provided?
If YES, reference:

 

10. Does the proposal include use of combining characters and/or use of composite

sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)? No
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images

(graphic symbols) provided?
If YES, reference:
11. Does the proposal contain characters with any special properties such as control function or similar

semantics? No
If YES, describe in detail (include attachment if necessary)

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

 

2. Status (list of meeting number and corresponding action or disposition):

3. Additional contact to user communities, liaison organizations etc.:

 

4. Assigned category and assigned priority/time frame:

 

ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646

An Example: Fictitious summary form filled in for illustration of the use of the form.

Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

A. Administrative

1. Title: Addition of two Latin characters

 

2. Requester's name: Danish Standards Association

 

3. Requester type (Member body/Liaison/Individual contribution): NB

4. Submission date: 1995-03-10

 

5. Requester's reference (if applicable):

 

6. (Choose one of the following:)
This is a complete proposal:
Yes ; or, More information will be provided later:

B. Technical - General

1. (Choose one of the following:)

a. This proposal is for a new script (set of characters): No
Proposed name of script:

b. The proposal is for addition of character(s) to an existing block: Yes
Name of the existing block:
Table 4 - Row 01: Latin Extended-B

 

2. Number of characters in proposal: 2

3. Proposed category (see section II, Character Categories): A

 

4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1): 1
Is a rationale provided for the choice?
If Yes, reference:

 

5. Is a repertoire including character names provided?: Yes

a. If YES, are the names in accordance with the 'character naming guidelines'

in Annex K of ISO/IEC 10646-1? Yes
b. Are the character shapes attached in a reviewable form? Yes

6. Who will provide the appropriate computerized font (ordered preference: True Type,

PostScript or 96x96 bit-mapped format) for publishing the standard?
Michael Everson, Everson Gunn Teoranta
If available now, identify source(s) for the font (include address, e-mail,

ftp-site, etc.) and indicate the tools used:
Michael Everson, Everson Gunn Teoranta

7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.)
provided?
Yes

b. Are published examples (such as samples from newspapers, magazines, or
other sources) of use of proposed characters attached?
Yes
8. Special encoding issues:

Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information):

Specifications enclosed

 

C. Technical - Justification

1. Has this proposal for addition of character(s) been submitted before? No

If YES explain

 

2. Has contact been made to members of the user community (for example: National

Body, user groups of the script or characters, other experts, etc.)? Yes

If YES, with whom? Irish National Body, Oxford University
If YES, available relevant documents?
Enclosed

3. Information on the user community for the proposed characters (for example: size,
demographics, information technology use, or publishing use) is included?
Yes
Reference:
The Community of Gothic and Medieval English Literature

 

4. The context of use for the proposed characters (type of use; common or rare) Rare
Reference:

 

5. Are the proposed characters in current use by the user community? Yes
If YES, where? Reference:
Scholar Communities

 

6. After giving due considerations to the principles in N 1352 must the proposed
characters be entirely in the BMP?
Yes
If YES, is a rationale provided?
Yes
If YES, reference:
Enclosed

 

7. Should the proposed characters be kept together in a contiguous range (rather than
being scattered)?
No

 

8. Can any of the proposed characters be considered a presentation form of an existing
character or character sequence?
No
If YES, is a rationale for its inclusion provided?
If YES, reference:

 

9. Can any of the proposed character(s) be considered to be similar (in appearance

or function) to an existing character? No
If YES, is a rationale for its inclusion provided?
If YES, reference:

 

10. Does the proposal include use of combining characters and/or use of composite

sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)? No
If YES, is a rationale for such use provided?
If YES, reference:
Is a list of composite sequences and their corresponding glyph images

(graphic symbols) provided?
If YES, reference:
11. Does the proposal contain characters with any special properties such as control function or similar

semantics? No
If YES, describe in detail (include attachment if necessary)

D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)

1. Relevant SC 2/WG 2 document numbers:

 

2. Status (list of meeting number and corresponding action or disposition):

3. Additional contact to user communities, liaison organizations etc.:

4. Assigned category and assigned priority/time frame:

 

Annex B

Handling of Defect Reports on Character Names

 

Since the publication of ISO/IEC 10646-1 in May 1993, several defect reports requesting changes to character names have been received by WG 2. In principle, the names in the standard are not to be changed. However, there may be situations where changes, deletions or annotations to names of characters may be warranted. The following paragraphs describe the principles of dealing with such defect reports:

 

a. Explanatory information in Annex P, "Additional Information on Characters"

If WG 2 decides that the request is justified, WG 2 will first consider accommodating the request by adding explanatory text to Annex P, "Additional Information on Characters", of the Standard ISO/IEC 10646-1.

 

b. Non-normative parenthetic annotation of the name

If WG 2 considers that the request falls within the guidelines of Rule 12 in Annex K - Character naming guidelines in the standard, then an appropriate annotation will be added to the character name.

 

In instances where a name change causes a potential problem for compliance by implementations of the existing standard, and if the concern expressed in the defect report may be handled with a simple explanatory note, a note may be added.

 

c. Deprecation

If WG 2 considers that the character identified in the defect report should not have been in the standard, for reasons such as duplication, or incorrect inclusion in a block, then that coded character will be marked with the annotation "(deprecated character)" after its name.

 

d. Technical Corrigendum

If WG 2 considers that the character identified in the defect report has indeed been incorrectly named, based on the evidence provided in the defect report, a Technical Corrigendum to correct the name will be prepared and forwarded to the SC 2 secretariat for further processing.

 

e. Reject

In all other situations, where WG 2 considers that the request is not sufficiently justified or a name change is not warranted, the defect report will not be entertained (will be rejected).
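The dispositions a to e above can be sketched as a small decision function. This is an illustration only. The flag names are hypothetical, and checking the options strictly in the order a, b, c, d, e is an assumption: the text says only that the Annex P explanation is considered first and that rejection covers all remaining cases.

```python
from enum import Enum


class Action(Enum):
    ANNEX_P_NOTE = "a"           # explanatory text in Annex P
    NAME_ANNOTATION = "b"        # non-normative parenthetic annotation
    DEPRECATION = "c"            # "(deprecated character)" after the name
    TECHNICAL_CORRIGENDUM = "d"  # name corrected via Technical Corrigendum
    REJECT = "e"


def dispose(*, justified: bool = False,
            explanatory_text_suffices: bool = False,
            within_rule_12: bool = False,
            should_not_be_in_standard: bool = False,
            incorrectly_named: bool = False) -> Action:
    """Return the first applicable disposition (assumed order a to e)."""
    if not justified:
        return Action.REJECT
    if explanatory_text_suffices:
        return Action.ANNEX_P_NOTE
    if within_rule_12:
        return Action.NAME_ANNOTATION
    if should_not_be_in_standard:
        return Action.DEPRECATION
    if incorrectly_named:
        return Action.TECHNICAL_CORRIGENDUM
    return Action.REJECT
```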

 

Some Guidelines for Submitters of Defect Reports:

 

As a supplement to the above information on dealing with defect reports, the submitters can assist the working group by following the guidelines given below:

 

a) report all defects associated with characters from the same block or set of characters as a single defect report (for example, use a single one for all defects from within a character block such as Malayalam), instead of one for each character.

b) avoid including defective characters from different character blocks or sets in the same report.

c) please check if the defect has already been processed by a national body or considered before by WG 2. Copies of the disposition of prior defect reports can be obtained from the SC 2 Secretariat.

d) if one or more new character(s) - with their own new name and glyph - is proposed to be added in conjunction with a defect report, please submit the addition requests separate from the defect report along with the Proposal Summary Form for them.

 

Annex C

Description of the UCS work flow and stages in progression from initial proposal to final publication

 

 

The UCS work flow

The UCS work flow can be illustrated in simplified form as follows:

 

Communication to WG2 and communication inside WG2 related to populating the standard | Communication from WG2 to the world outside

Input | Process | Output | Output

From whom | What | Under meetings | After meetings | What | To whom

  • Convener
  • SC2
  • JTC1
  • ITTF

Agenda (e.g. N 1387)

Ballots

Resolutions (e.g. N 1354)

Minutes (e.g. N 1353):

- Action Items

Result of request:

  • Acceptance
  • Rejection
  • Requester

    • NBs
    • WG experts
    • IRG-group
    • Liaisons

    Input documents:

    • Requests (e.g. N1324)
    • Defect reports (e.g.
    • Working documents
    • Liaison statements
       
    • Editorial corrigenda
    • Technical corrigenda (e.g. N 1393)
    • Amendments (e.g. N )
    • Standards (e.g. ISO/IEC 10646-1)
  • SC2
  • JTC1
  • ITTF
  • Secretary
  • Editor
  • Minutes:
  • Action Items
  • Standing documents

         
    • IRG

     

       

    How

    • Secretary
    • Editor

    Standing documents:

    • WG2 distribution list (e.g. N1351)
    • Document register (e.g. 1300)
    • Summary of WG2 work (e.g. N1302)
    • Cumulative list of repertoire additions (Buckets) (e.g. N 1385)
      • Alphabetic (Arabic, Cyrillic, Hebrew, Latin, etc.)
      • Symbols
      • Ideographs
    • Cumulative list of Corrigenda (editorial, technical) (e.g. N1384)
    • ISO/IEC 10646-1 Corrigendum (e.g. N1396)
    • List of character names and code positions allocated (e.g. N 1360 modified)
    • Principles and procedures (e.g. N 1352)
    • Overview of the Basic Multilingual Plane (e.g. N1332)

     

    Presentation forms:

    • Paper documents
    • Webs (the WG2 web at DKUUG and the IRG web in HK)

    Table 1, WG 2 document flow

     

     

    The stages of work:

    Any new proposal for addition of new characters will pass a number of stages from initial proposal to finalized publication. The stages are:

     

     

    This terminology indicates the stage of maturity of the proposal and the WG's confidence in the proposal.

     

    Stages 1 to 4 are in process within WG 2; stages 5 to 7 are further progression (Progression/Publication status).

    Stage   Description
    1       Initial proposal
    2       Provisional acceptance
    3       Final acceptance (allocation of bucket)
    4       Hold for ballot
    5       SC2 Ballot
    6       JTC 1 Ballot
    7       ITTF Publication

    Item                                          Stage 1   Stage 2
    1*  Char. shapes                              1.1       2.1
    2*  Char. names                               1.2       2.2
    3*  Code position allocation                  1.3       2.3
    4*  Text to be included in the standard       1.4       2.4
    5*  Font**                                    1.5       2.5
    6   Other items from proposal summary form    1.6       2.6

    * Mandatory for entering "final acceptance" stage

    ** Camera ready copy is mandatory for stage 7. It is expected that the quality of the fonts will improve to camera ready quality as the proposal progresses through the various stages. For information on the format of the font see the "Proposal summary form".

     

    The stages 1-3 may contain provisionally allocated code positions. When a proposal enters stage 4 the code positions are final.

     

    The content of the Buckets is reviewed at every meeting to decide whether the content shall progress for balloting (stage 4).

     

    The progress of the proposals is recorded in the document Summary of WG2 work (the Excel spreadsheet).

     

    When a proposal reaches stage 4 its status is included in List of character names and code positions allocated.
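The stage rules above can be captured in a few lines. The sketch below is illustrative only: the enum member names are derived from the stage headings, and the two helper functions are hypothetical names for the rules that code positions are provisional in stages 1 to 3 and final from stage 4, and that a proposal appears in the list of allocated names and code positions once it reaches stage 4.

```python
from enum import IntEnum


class Stage(IntEnum):
    INITIAL_PROPOSAL = 1
    PROVISIONAL_ACCEPTANCE = 2
    FINAL_ACCEPTANCE = 3       # allocation of bucket
    HOLD_FOR_BALLOT = 4
    SC2_BALLOT = 5
    JTC1_BALLOT = 6
    ITTF_PUBLICATION = 7


def code_positions_are_final(stage: Stage) -> bool:
    """Stages 1 to 3 may hold provisional positions; from stage 4 they are final."""
    return stage >= Stage.HOLD_FOR_BALLOT


def listed_in_allocation_list(stage: Stage) -> bool:
    """A proposal's status is recorded in the allocation list from stage 4 on."""
    return stage >= Stage.HOLD_FOR_BALLOT
```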

     

    Examples:

     

    List of character names and code positions allocated:

    Code position  Status  Reference  Character name
    ...
    20AB           6       N1092      DONG SIGN
    ...
    012C                              LATIN CAPITAL LETTER I WITH BREVE
    ...
    00E6           7       N1128      LATIN SMALL LETTER AE (ash)
    01FD           7       N1128      LATIN SMALL LETTER AE WITH ACUTE (ash)
    01E3           7       N1128      LATIN SMALL LETTER AE WITH MACRON (ash)
    ...
    1E9B           6       N1132      LATIN SMALL LETTER LONG S WITH DOT ABOVE
    ...
    FFFC           2       N1365      OBJECT REPLACEMENT CHARACTER
    ...

     

     

    Summary of WG2 work items:

     

    STATUS/STAGE:

     

    Annex D

    BMP and Supplementary Planes Allocation Roadmap

     

    Overview

    The intention of this Annex D is to lay out a logical roadmap for further allocations of scripts in ISO/IEC 10646 (and in the Unicode Standard), within and beyond the BMP. The roadmap is a snapshot intended as a general guideline; it does not attempt to make detailed allocations of characters. The roadmap consists of two parts.

     

    For Plane 1, a proposed list of all additional known scripts is provided here, with rough estimates of the sizes of the scripts. In contrast to the roadmap for the BMP, no particular locations for scripts are proposed as yet. By current estimates (see details below), all remaining General scripts and symbol sets should fit within this one plane.

    Plane 2 is envisioned as containing future Unified Ideographic character additions. The largest current Unified Ideographic character collections should fit within Planes 0 & 2, as long as duplicate character encoding is avoided. No substructure for Plane 2 is proposed here.

    The roadmap indicates that these three planes should suffice for all future encoding of characters having worldwide utility. However, note that 14 supplementary planes are available altogether for encoding (with an additional 2 planes reserved for private use). The planes described in this Roadmap, as well as all other planes accessible by UTF-16, are explicitly enumerated in Table 1.

    A list of references for WG2 documents and other sources relevant to the issue of allocation of scripts and other characters in the BMP and on supplementary planes can be found at the end of this annex. Status at a given time can be found in the standing documents such as:

     

    Table 1: Suggested Allocations for Planes in ISO/IEC 10646

    00000000..0000FFFF  Plane 0/BMP  Encoded in 10646-1
    00010000..0001FFFF  Plane 1      GSP
    00020000..0002FFFF  Plane 2      UISP
    00030000..0003FFFF  Plane 3      Reserved for Future Encoding
    00040000..0004FFFF  Plane 4      Reserved for Future Encoding
    00050000..0005FFFF  Plane 5      Reserved for Future Encoding
    00060000..0006FFFF  Plane 6      Reserved for Future Encoding
    00070000..0007FFFF  Plane 7      Reserved for Future Encoding
    00080000..0008FFFF  Plane 8      Reserved for Future Encoding
    00090000..0009FFFF  Plane 9      Reserved for Future Encoding
    000A0000..000AFFFF  Plane 10     Reserved for Future Encoding
    000B0000..000BFFFF  Plane 11     Reserved for Future Encoding
    000C0000..000CFFFF  Plane 12     Reserved for Future Encoding
    000D0000..000DFFFF  Plane 13     Reserved for Future Encoding
    000E0000..000EFFFF  Plane 14     Reserved for Future Encoding
    000F0000..000FFFFF  Plane 15     Reserved for Private Use
    00100000..0010FFFF  Plane 16     Reserved for Private Use

     

    1 plane (BMP) is accessible by UCS-2.

    16 planes (1..16 inclusive) are accessible by UTF-16.

    2 planes (15, 16) are reserved completely for private use, accessible by UTF-16.

    12 planes (3..14 inclusive) are left reserved for future standardized encoding, accessible by UTF-16.
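The relationship between code positions, planes and UTF-16 in Table 1 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the allocation labels are copied from Table 1, and the surrogate-pair arithmetic is the standard UTF-16 transformation rather than anything specified in this annex.

```python
from typing import List


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) of a code position in 0..10FFFF."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code position is not accessible by UTF-16")
    return code_point >> 16


def plane_allocation(plane: int) -> str:
    """Suggested allocation of a plane, using the labels of Table 1."""
    if plane == 0:
        return "Plane 0/BMP"
    if plane == 1:
        return "GSP"
    if plane == 2:
        return "UISP"
    if 3 <= plane <= 14:
        return "Reserved for Future Encoding"
    if plane in (15, 16):
        return "Reserved for Private Use"
    raise ValueError("no such plane")


def utf16_code_units(code_point: int) -> List[int]:
    """Return one 16-bit unit for the BMP, or a surrogate pair for planes 1..16."""
    if plane_of(code_point) == 0:
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("surrogate code positions do not encode characters")
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
```

A BMP character such as 20AB yields a single code unit, while the first code position of Plane 1 (00010000) yields the pair D800 DC00.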

     

    Notes on the BMP (Plane 0)

    All accounting of unassigned space in this proposal is done in terms of "columns": 16-character chunks starting with a coded value divisible by 16, e.g. U+0700..U+070F, etc. These are visualizable as vertical columns in the chart formats printed in IS 10646-1 (also the Unicode Standard). Since no one is proposing to encode scripts in scattershot fashion or crossing column boundaries, it is easier and more accurate to track available columns rather than unassigned character positions.
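The column accounting described above amounts to dividing a column-aligned range by 16. A minimal sketch follows; the function name is hypothetical, and the ranges in the usage note are taken from the BMP Roadmap later in this annex.

```python
COLUMN_SIZE = 16  # one column = 16 consecutive code positions


def columns_in_range(first: int, last: int) -> int:
    """Number of whole columns in an inclusive, column-aligned range."""
    if first % COLUMN_SIZE != 0 or (last + 1) % COLUMN_SIZE != 0:
        raise ValueError("range does not start and end on column boundaries")
    return (last - first + 1) // COLUMN_SIZE
```

With this definition, 0700..08FF counts as 32 columns, A000..ABFF as 192 columns and 3400..4DFF as 416 columns, matching the figures in the roadmap.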

    The roadmap is aimed at the examination of the remaining allocation space in the BMP; therefore it lists only blocks which contain free columns. Blocks for scripts which contain no free columns are omitted from the listing.

    Proposed additional scripts are placed within the open areas. The exact order at this stage is not significant. However, right-to-left script additions are placed adjacent to the currently encoded right-to-left scripts, Hebrew and Arabic.

    Because of the need to accommodate Yi, a script with 1165 characters proposed for encoding, this roadmap designates a new area: A000..ABFF = General Scripts Area II.

    The fate of CJK Unified Ideographs, Extension A is left indeterminate in this proposal. The roadmap shows 701 NO-BLOCK free columns still unallocated in BMP (Plane 0) in ISO/IEC 10646 (also in the Unicode Standard) (exclusive of the Compatibility and Specials Area). If the 412 columns of CJK Unified Ideographs, Extension A (6585 characters) are assigned in the presumed BMP target area:

    3400..4DFF 416 0 NO AREA

    that would leave 701 - 412 = 289 NO-BLOCK free columns in BMP (Plane 0) in ISO/IEC 10646 (also in the Unicode Standard).

    Thus, given the approximate placements of this draft, there remains considerable free space in BMP (Plane 0) in ISO/IEC 10646 (also in the Unicode Standard) to make adjustments in specific placements of one or another script before committing to actual encoding of the scripts.
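The figures quoted in these notes can be reproduced with a short calculation. This is only a sketch under the column definition given above; the helper name is hypothetical and the inputs (1165 Yi characters, 6585 Extension A characters, 701 free columns) are the numbers stated in the text.

```python
import math

COLUMN_SIZE = 16  # one column = 16 code positions


def columns_needed(character_count: int) -> int:
    """Whole columns needed to hold the given number of characters."""
    return math.ceil(character_count / COLUMN_SIZE)


yi_columns = columns_needed(1165)            # Yi fits in 73 columns
ext_a_columns = columns_needed(6585)         # Extension A needs 412 columns
target_area_columns = (0x4DFF - 0x3400 + 1) // COLUMN_SIZE  # 3400..4DFF holds 416

free_columns_now = 701
free_columns_after_ext_a = free_columns_now - ext_a_columns  # 289
```

The 412 columns of Extension A fit inside the 416-column target area, leaving 4 columns of that area unused.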

     

     

    BMP Roadmap

    Key:

    "-" means: replace this ISO/IEC 10646 (Unicode 2.0) line with the "+" or "#" lines following it

    "+" means: add this line representing a proposed placement

    "#" means: add this line representing assignments already in pDAM stage

    the proposed script placements are given as "~ n cols" rather than as specific assignments, because the precise size and ordering of these blocks is not known and not significant at this time

     

                  NO-    In-
                  BLOCK  Block
                  Free   Free
      Range       Cols   Cols   Area/Block
      ==========  =====  =====  ==============================

    - 0000..1FFF    249     13  General Scripts Area
    + 0000..1FFF     41     13  General Scripts Area
      ----------  -----  -----  ------------------------------
      0220..024F             3  Latin Extended-B
      02F0..02FF             1  Spacing Modifier Letters
      0350..035F             1  Combining Diacritical Marks
      0500..052F      3         NO BLOCK

    - 0700..08FF     32         NO BLOCK -- N.B.: R-to-L scripts
    +             ~ 3 cols      Maldivian (= Divehi)
    +             ~ 3 cols      Samaritan
    +             ~ 6 cols      Syriac (Jacobite, Estrangelo, Nestorian)
    +             ~ 2 cols      Phoenician
    +             ~ 2 cols      Old Aramaic
    +             ~ 3 cols      Tifinagh (= Tamasheq)
    +             ~ 3 cols      Avestan (= Pahlavi)
    +                10         NO BLOCK

      0AF0..0AFF             1  Gujarati
      0C70..0C7F             1  Telugu
      0CF0..0CFF             1  Kannada
      0D70..0D7F             1  Malayalam

    - 0D80..0DFF      8         NO BLOCK
    +             ~ 8 cols      Sinhalese

      0E60..0E7F             2  Thai
      0EE0..0EFF             2  Lao

    - 0FC0..109F     14         NO BLOCK
    +             ~ 4 cols      Tibetan Extended
    +             ~ 10 cols     Mongolian (including Manchu)

    - 1200..1DFF    192         NO BLOCK
    #             24 cols       Ethiopic [pDAM #10] -- 1200..137F
    +                 2         NO BLOCK -- 1380..139F
    #             6 cols        Cherokee [pDAM #12] -- 13A0..13FF
    #             40 cols       Canadian [pDAM #11] -- 1400..167F
    +             ~ 2 cols      Ogham
    +             ~ 6 cols      Runic
    +             ~ 8 cols      Burmese
    +             ~ 8 cols      Khmer
    +             ~ 8 cols      Dai
    +             ~ 5 cols      Cham
    +             ~ 5 cols      Tai Lue (= Chiang Mai)
    +             ~ 3 cols      Tai Nuea (= Tai Mau)
    +             ~ 5 cols      Lepcha (= Rong)
    +             ~ 6 cols      Limbu (= Kirat)
    +             ~ 6 cols      Phags-Pa (= Passepa)
    +             ~ 4 cols      Siddham
    +             ~ 6 cols      Meitei (= Manipuri)
    +             ~ 6 cols      Javanese
    +             ~ 2 cols      Batak
    +             ~ 2 cols      Buginese (= Makassar)
    +             ~ 2 cols      Lisu
    +             ~ 4 cols      Karenni (= Kayah Li)
    +             ~ 6 cols      Glagolitic (= Glagolitsa)
    +                26         NO BLOCK

    - 2000..2FFF    132     28  Symbols Area
    + 2000..2FFF    116     24  Symbols Area
      ----------  -----  -----  ------------------------------
      2050..205F             1  General Punctuation
      2090..209F             1  Superscripts and Subscripts
      20B0..20CF             2  Currency Symbols
      20F0..20FF             1  Combining Marks for Symbols
      2140..214F             1  Letterlike Symbols
    - 21F0..21FF             1  Arrows
    - 2380..23FF             8  Miscellaneous Technical
    + 23A0..23FF             6  Miscellaneous Technical
      2430..243F             1  Control Pictures
      2450..245F             1  Optical Character Recognition
      24F0..24FF             1  Enclosed Alphanumerics
    - 25F0..25FF             1  Geometric Shapes
      2670..26FF             9  Miscellaneous Symbols
    - 27C0..2FFF    132         NO BLOCK
    + 27C0..27FF      4         NO BLOCK
    +             ~ 16 cols     Braille Pattern Symbols
    + 2900..2FFF    112         NO BLOCK

    - 3000..33FF      6      1  CJK Phonetics and Symbols Area
    + 3000..33FF      4      1  CJK Phonetics and Symbols Area
      ----------  -----  -----  ------------------------------
    - 31A0..31FF      6         NO BLOCK
    +             ~ 2 cols      Kuoyu (extension to Bopomofo)
    + 31C0..31FF      4         NO BLOCK
      3250..325F             1  Enclosed CJK Letters and Months

      3400..4DFF    416      0  NO AREA
      ----------  -----  -----  ------------------------------
      3400..4DFF    416         NO BLOCK

      4E00..9FFF      0      5  CJK Ideographs Area
      ----------  -----  -----  ------------------------------
      9FB0..9FFF             5  CJK Unified Ideographs

    - A000..ABFF    192      0  NO AREA
    + A000..ABFF    119      0  General Scripts Area II
      ----------  -----  -----  ------------------------------
    - A000..ABFF    192         NO BLOCK
    +             ~ 73 cols     Yi (Nuo-su, Lolo)
    +               119         NO BLOCK

      AC00..D7AF      0      0  Hangul Syllables Area
      ----------  -----  -----  ------------------------------

      D7B0..D7FF      5      0  NO AREA
      ----------  -----  -----  ------------------------------
      D7B0..D7FF      5         NO BLOCK

      D800..DFFF      0      0  Surrogates Area
      ----------  -----  -----  ------------------------------

      E000..F8FF      0      0  Private Use Area
      ----------  -----  -----  ------------------------------

      F900..FFFF      2     17  Compatibility and Specials Area
      ----------  -----  -----  ------------------------------
      FA30..FAFF            13  CJK Compatibility Ideographs
      FBC0..FBCF             1  Arabic Presentation Forms-A
      FD40..FD4F             1  Arabic Presentation Forms-A
      FDD0..FDEF             2  Arabic Presentation Forms-A
      FE00..FE1F      2         NO BLOCK

      ==========  =====  =====  ==============================
      Totals, excluding Compatibility and Specials Area
    -              1000     47
    +               701     43

                  NO-    In-
                  BLOCK  Block
                  Free   Free
                  Cols   Cols
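The totals in the BMP Roadmap table above can be cross-checked by summing the per-area figures. The sketch below transcribes the area-level rows of the table (excluding the Compatibility and Specials Area); the data layout and variable names are hypothetical.

```python
# (area, NO-BLOCK free before, in-block free before,
#        NO-BLOCK free after,  in-block free after)
AREAS = [
    ("General Scripts Area",           249, 13,  41, 13),
    ("Symbols Area",                   132, 28, 116, 24),
    ("CJK Phonetics and Symbols Area",   6,  1,   4,  1),
    ("NO AREA 3400..4DFF",             416,  0, 416,  0),
    ("CJK Ideographs Area",              0,  5,   0,  5),
    ("A000..ABFF",                     192,  0, 119,  0),
    ("Hangul Syllables Area",            0,  0,   0,  0),
    ("NO AREA D7B0..D7FF",               5,  0,   5,  0),
    ("Surrogates Area",                  0,  0,   0,  0),
    ("Private Use Area",                 0,  0,   0,  0),
]

# Lines marked "-" in the table: the current (Unicode 2.0) state.
no_block_before = sum(row[1] for row in AREAS)   # 1000
in_block_before = sum(row[2] for row in AREAS)   # 47

# Lines marked "+" in the table: the state after the proposed placements.
no_block_after = sum(row[3] for row in AREAS)    # 701
in_block_after = sum(row[4] for row in AREAS)    # 43
```

The difference of 299 NO-BLOCK columns between the two states is the space taken up by the proposed and pDAM-stage placements listed in the table.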

     

     

    Additions in the Pipeline

    The roadmap includes the following additions which are currently in the pipeline and are presumed acceptable to WG2:

    Realized BMP assignments [approx. 78 cols total]:

    40 Canadian Syllabics

    24 Ethiopic

    6 Cherokee

    6 Runic

    2 Ogham

    2* Keyboard Layout Symbols

    1* Graphical Symbols for Control Characters

    1* Electrotechnical Symbols

    0 Object Replacement Character

    0 APL Function Symbol Quad

    0 Macedonian Vowels with Grave

    0 Pinyin N with Grave

    0 Hebrew Yod with Hiriq

    0 Modifier Letter Middle Dot

     

    * Allotted to open positions in the Miscellaneous Technical Symbols Block, 237B..23FF, the Arrows Block, 2190..21FF, the Geometric Shapes Block, 25A0..25FF, and elsewhere.

    Other Proposals for the BMP

    The Roadmap also includes the following proposals which are not yet in the pipeline, but which are generally considered active and appropriate for encoding on the BMP.

    Probable BMP assignments [approx. 136 cols total]:

     

    73 ~ 55 Yi [two proposals differ in size]

    16 ~ 32 Braille Pattern Symbols [size not agreed upon]

    10 Mongolian

    8 [?] Burmese

    8 [?] Khmer

    8 [?] Sinhalese

    8 Dai

    5 Cham

     

    The Roadmap considers the following active proposal for addition of 6585 Han ideographic characters, but does not place it in the summary chart, since encoding of these characters in the BMP or in the UISP is still an open issue under debate within WG2.

    412 CJK Unified Ideographs, Extension A

     

    The Roadmap does not include the following two inactive symbol proposals:

    150 [approx.] Non-Ideographic Japanese Characters

    14 Greek Byzantine Musical Notation

     

    Plane 1: (First) General Scripts and Symbols Supplementary Plane (GSP)

    The following section represents all other significant scripts of the world (mostly extinct) for which there exists, in principle, if not in practice, enough information to eventually produce a detailed character encoding proposal.

    These scripts are culled from UTC working documents, taking into account the placement of scripts proposed for the BMP roadmap above.

    The scripts are organized by general type and by historical and geographic affinity. The group headings are only meant for convenience in reference in this roadmap; they should not be taken as designating particular script areas for the purposes of encoding.

    Additionally, to accommodate sets of symbols which may not fit within the 116 columns still open in the Symbols Area of the BMP, we suggest setting aside 4K cells (= 256 columns) for encoding other symbols on Plane 1.

    Estimates of sizes of the scripts vary in accuracy. For some of these scripts an encoding proposal already exists, and the exact number of characters is known. For other small scripts a reasonably accurate guess can be made from the size of historically affiliated scripts. For the large scripts such as Cuneiform and various ideographic or hieroglyphic systems, only very rough estimates can be made until detailed proposals are brought forward.

    Based on these estimates, all of these scripts total to approximately 40,000 characters to encode, and fit within a single plane with plenty of room to spare.

    Name Chars # Cols

    Alphabetic

    European

    Albanian (Buthakukye) 31 2

    Albanian (Elbassan) 53 4

    Albanian (Veso Beis) 22 2

    Gothic 58 4

    Iberian 32E 2

     

    Misc. Mediterranean Classical Scripts

    Carian 32E 2

    Cretan Linear A 75 5

    Cretan Linear B (Mycenaean) 128 8

    Cypriote syllabary 55-58? 4

    Cypro-Minoan (Enkomi + Ugarit) 64E 4

    Etruscan (+ Oscan) {RL} 36 3

    Kök Turki runes (Orkhon script) 64E 4

    Old Hungarian runes ?

    Lycian {RL} 29 2

    Lydian {RL} 26 2

     

    Semitic & Middle Eastern

    Cuneiform, Old Persian (Achaemenid) 49 3

    Cuneiform, Ugaritic 31 2

    Meroitic {RL} 24 2

    Parthian 32E 2

    South Arabian {RL} 29 2

     

    Arabic-like & North African

    Ethiopic Extended 120E 8

    Maghreb 96E 6

    Mandaean (Mandaic) [see Syriac] {RL} 24? 2

    Manichaean 64E 4

    Nabataean [See Aramaic] 24? 2

    Numidian {TB or RL} 25 2

    Palmyrene {RL} [See Aramaic] 24? 2

     

    Central Asian

    Sogdian (Uzbekistan) 48E 3

    Uighur 96E 6

     

    Indic & Southeast Asian

    Ahom 41 (~48) 3

    Balinese (~Javanese?) 96E 6

    Balti {RL} 30 2

    Box-headed script 96E 6

    Brahmi (Asoka) 96E 6

    Chakma 96E 6

    Chola 96E 6

    Hmong <96E 6

    Kaithi (orig. Bihari) 96E 6

    Khamti (~ Kham) 35 3

    Kharoshthi 96E 6

    Khotanese 96E 6

    Lahnda (orig. Punjabi) 96E 6

    Modi 96E 6

    Pyu (Tircul) <64E 4

    Satavahana 96E 6

    Tankri 96E 6

     

    Indonesian & Micronesian

    Mangyan (Buhid) <64E 4

    Rejang (Sumatra) <64E 6

    Tagalog 19 2

    Woleai (Caroline) 100E 7

     

    Americas

    Chinook shorthand 48E 3

     

    Hieroglyphic, Ideographic & Misc. Syllabaries

    Middle-Eastern Classical Precursors

    Proto-Byblic 100E 7

    Proto-Elamic <500E

     

    Cuneiform, Ideographic Types (Akkadian)

    Cuneiform, Assyrian <600E

    Cuneiform, Babylonian <500E

     

    Hieroglyphs, Classical

    Cretan (Minoan) ideograms ?

    Egyptian (Hieroglyphic, Hieratic, Demotic) <9000E

    Hittite hieroglyphics >110E 7

    Hittite hieroglyphic syllabary (Luwian) 48 3

    Sumerian pictograms <1000E

     

    Hieroglyphs, pictograms, and syllabaries, other

    Aymara pictograms <1000E

    Aztec pictograms <1000E

    Bamum (Cameroon) <500E

    Kauder script (Micmac) <500E

    Mayan hieroglyphics <1000E

    Rongo-rongo (Easter Island script) 253-396?

    Indus Valley script <500E

    Paucartambo script <500E

     

    Han Ideographic Derived

    Khitan (Chi-Tan, Liao) 5000E

    Naxi (Nahsi, Nasi, Moso) ideograms 2000E

    Naxi (Moso) phonetic script 500E

    Nuchen characters (Yu-Chen) 5000E

    Tangut (Xixia) ideograms 5819

     

    Newly Invented Scripts (in roughly chronological order)

    Deseret Alphabet (Mormon) 76 6

    Pollard phonetic script 64E 4

    Vai (Liberian syllabary) <500E

    Shorthands (misc.) <200E

    Shaw Alphabet (Shavian) 53 4

    Osmanya script (Somalian) 64E 4

    Cirth 60 4

    Tengwar (Elvish) 64 4

    Aiha (Kesh) 40 3

    pIqaD (Klingon) 32E 2

     

    Others (poorly understood, single instances, etc.)

    Bone & Shell script ?

    Jindai (Shinto, Japan) ?

    Phaistos disk script 64E 4

    Sidetic ?

    Tamil Grantha (probably extension of Tamil) ?

    Tartaria (Romanian ideographs) ?

     

     

    Symbol Sets

    Plane 1 Symbols Area <4096E 256

    (For example, musical symbols and symbols from a large number of other specific disciplines and/or cultural areas. See N884 for a representative sampling of symbol sets which might be appropriate for encoding as characters.)

     

    Totals

    These totals apply to the estimates made above for the GSP. They do not include any estimates for the number of Unified Ideographic characters which may be encoded in the UISP.

     

    Alphabetic < 3767

    Syllabaries, hieroglyphs, misc.

    ideographs, and pictograms ~ 17754

    Han-derived ideographic systems ~ 18319

     

    Total for Scripts ~ 40000

     

    Plane 1 Symbols Area < 4096

     

    Grand Total (Scripts + Symbols) ~ 44000
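The totals above can be checked against the capacity of a single plane. This is a minimal sketch using only the estimates listed in this section; the variable names are hypothetical, and the figures are upper bounds or approximations as marked in the source.

```python
PLANE_SIZE = 0x10000  # 65536 code positions per plane

script_estimates = {
    "Alphabetic": 3767,
    "Syllabaries, hieroglyphs, misc. ideographs, and pictograms": 17754,
    "Han-derived ideographic systems": 18319,
}
symbols_area = 4096  # Plane 1 Symbols Area (4K cells = 256 columns)

scripts_total = sum(script_estimates.values())    # 39840, quoted as ~ 40000
grand_total = scripts_total + symbols_area        # 43936, quoted as ~ 44000
remaining_in_plane = PLANE_SIZE - grand_total     # 21600 positions to spare
```

On these estimates roughly two thirds of Plane 1 would be used, which is consistent with the statement that the scripts fit within a single plane with room to spare.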

     

     

    References

    WG2 N 884 (= X3L2/93-017 = UTC-93-004)

    Concerning Future Allocations

    Unicode Technical Committee -- Rick McGowan & Joe Becker

    1993-04-06

     

    WG2 N 1370

    Roadmap to 10646 BMP

    Michael Everson

    1996-04-22

    http://www.indigo.ie/egt/standards/iso10646/map/map.html latest update

     

    WG2 N 1385(S)

    Repertoire additions for 10646 Cumulative List No. 3

    Bruce Paterson

    1996-05-12

     

    WG2 N 1452

    Summary of WG2 work items post Quebec meeting 31 (replaces N 1302)

    Sven Thygesen

    1996-10-03

    ftp://dkuug.dk/JTC1/SC2/WG2/docs/N1452.xls also .doc

     

    WG2 N 1464

    Guidance to position allocation in 10646

    Sven Thygesen, Mike Ksar

    1996-10-03

    ftp://dkuug.dk/JTC1/SC2/WG2/docs/N1464.doc

     

    Proposed Unicode Characters

    Mark Davis

    1996-10-25

    http://www.stonehand.com/unicode/alloc/Pipeline.html