ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(UCS)

ISO/IEC JTC1/SC2/WG2 N1838
Date: 1998-09-02

Title: Proposal to add the letters LATIN SMALL/CAPITAL LETTER A WITH DOT ABOVE to the BMP
Source: Mark Davis
Status: Expert Contribution
Action: For consideration by JTC1/SC2/WG2

This document contains the proposal summary (ISO/IEC JTC1/SC2/WG2 form N1352) and a full proposal for the encoding of two new characters in the BMP of ISO/IEC 10646.



A. Administrative

1. Title
    Proposal to add LATIN SMALL/CAPITAL LETTER A WITH DOT ABOVE to the BMP

2. Requester's name
    Mark Davis

3. Requester type
    Expert contribution

4. Submission date
    1998-09-02

5. Requester's reference

6a. Completion
    This is a complete proposal.

6b. More information to be provided?
    No

B. Technical -- General

1a. New script? Name?
    No

1b. Addition of characters to existing block? Name?
    Yes, to Latin.
    Suggested locations are U+1E9C/U+1E9D. However, the characters could be added at any reasonable place in the BMP.

2. Number of characters
    2

3. Proposed category
    Category A

4. Proposed level of implementation and rationale
    Level 1

5a. Character names included in proposal?
    Yes

5b. Character names in accordance with guidelines?
    Yes

5c. Character shapes reviewable?
    Yes

6a. Who will provide computerized font?
    Mark Davis
    (if necessary--it is a trivial modification of any font containing U+01E0 and U+01E1)

6b. Font currently available?
    No, but it can be generated quickly

6c. Font format?
    TrueType

7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?
    N/A--See below

7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?
    N/A--See below

8. Does the proposal address other aspects of character data processing?
    Yes

C. Technical -- Justification

1. Has this proposal been submitted before?
    No

2. Contact with the user community?
    N/A--See below

3. Information on the user community?
    N/A--See below

4a. The context of use for the proposed characters?
    N/A--See below

4b. Reference
    N/A--See below

5a. Proposed characters in current use?
    N/A--See below

5b. Where?
    N/A--See below

6a. Characters should be encoded entirely in BMP?
    Yes

6b. Rationale
    Required for efficient normalization of Unicode/10646, as described below.

7. Should characters be kept in a continuous range?
    It would be useful, but not absolutely necessary

8a. Can the characters be considered a presentation form of an existing character or character sequence?
    To the same degree as U+01E0 LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON

8b. Where?
    N/A--See below

8c. Reference
    N/A--See below

9a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?
    No

9b. Where?

9c. Reference

10a. Combining characters or use of composite sequences included?
    No

10b. List of composite sequences and their corresponding glyph images provided?
    No

11. Characters with any special properties such as control function, etc. included?
    No

D. SC2/WG2 Administrative

To be completed by SC2/WG2

1. Relevant SC 2/WG 2 document numbers:

2. Status (list of meeting number and corresponding action or disposition)

3. Additional contact to user communities, liaison organizations etc.

4. Assigned category and assigned priority/time frame

5. Other Comments


E. Proposal

Proposal to add the letters LATIN SMALL/CAPITAL LETTER A WITH DOT ABOVE to BMP of ISO/IEC 10646-1

While the character A WITH DOT ABOVE may indeed occur in natural languages or academic use, the principal reason for this proposal is the nature of normalization. There has been a great deal of interest in providing complete specifications for the different normalized forms of Unicode/10646 (cf. http://www.unicode.org/unicode/reports/techreports.html).

One of the normalization forms of particular interest is one that normalizes to precomposed forms--for example, that always uses U+00C0 LATIN CAPITAL LETTER A WITH GRAVE instead of the sequence of A with a separate combining grave accent <U+0041, U+0300>.
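As an illustration only, the precomposed form described above can be observed with Python's standard `unicodedata` module, whose `"NFC"` form composes sequences and whose `"NFD"` form decomposes them. The choice of Python and of these form names is an assumption made for this sketch; the proposal itself names no implementation.

```python
import unicodedata

# <U+0041, U+0300>: LATIN CAPITAL LETTER A followed by COMBINING GRAVE ACCENT
decomposed = "\u0041\u0300"

# A composing normalization replaces the sequence with the single
# precomposed character U+00C0 LATIN CAPITAL LETTER A WITH GRAVE.
precomposed = unicodedata.normalize("NFC", decomposed)

# The decomposing direction recovers the original two-character sequence.
round_trip = unicodedata.normalize("NFD", precomposed)

print(f"U+{ord(precomposed):04X}", unicodedata.name(precomposed))
```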

Implementations can be particularly efficient if Unicode and 10646 are coded such that, whenever a single composed character X is canonically equivalent to a composed character sequence <B, C1, C2, ..., Cn>, there is another composed character Y that is canonically equivalent to the same sequence without its final combining mark, <B, C1, C2, ..., C(n-1)>. For the purposes of this discussion, Y is called the completion character for X. If X does not have a completion character, X is called incomplete. Only characters whose sequences contain two or more combining marks need to be checked for completeness: with a single mark, the shortened sequence is just the base character B.
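The completeness test can be sketched as a small table scan. This is a minimal sketch, not the author's procedure: the function name `find_incomplete`, the three-entry toy table, and the use of the private-use code point U+E000 as a stand-in for the proposed capital letter are all assumptions made for illustration. A real check would run over the full decomposition data of the standard.

```python
def find_incomplete(full_decomp):
    """Return code points X whose sequence <B, C1..Cn> (n >= 2) has no
    character Y equivalent to the shortened sequence <B, C1..C(n-1)>."""
    known_sequences = set(full_decomp.values())
    incomplete = []
    for code_point, sequence in sorted(full_decomp.items()):
        mark_count = len(sequence) - 1          # everything after the base B
        if mark_count >= 2 and sequence[:-1] not in known_sequences:
            incomplete.append(code_point)
    return incomplete


# Toy table (hypothetical excerpt): code point -> full canonical decomposition.
TOY_TABLE = {
    0x00C2: (0x0041, 0x0302),            # A WITH CIRCUMFLEX
    0x1EA4: (0x0041, 0x0302, 0x0301),    # A WITH CIRCUMFLEX AND ACUTE
    0x01E0: (0x0041, 0x0307, 0x0304),    # A WITH DOT ABOVE AND MACRON
}

before = find_incomplete(TOY_TABLE)

# Add a placeholder for A WITH DOT ABOVE (U+E000 is NOT the proposed location).
PLACEHOLDER = 0xE000
extended = dict(TOY_TABLE)
extended[PLACEHOLDER] = (0x0041, 0x0307)
after = find_incomplete(extended)

print([f"U+{cp:04X}" for cp in before], [f"U+{cp:04X}" for cp in after])
```

In the toy table, U+1EA4 is complete because U+00C2 covers its shortened sequence, while U+01E0 is reported until the placeholder entry is added.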

There are only two incomplete characters in 10646:

U+01E0 LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON
U+01E1 LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON

By adding the two proposed characters, which serve as the completion characters for U+01E0 and U+01E1, we can ensure that implementations of normalization can uniformly apply the best algorithms to all text. Without special cases to check, the inner loops of the transformations can be as fast as possible.
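One way to read the efficiency argument is sketched below; it is an interpretation, not the author's implementation. If every composed character has a completion character, composition can be driven by a table keyed on (character, combining mark) pairs alone, folding one mark at a time. The function `compose`, both tables, and the private-use placeholder U+E000 for A WITH DOT ABOVE are hypothetical, and the sketch deliberately ignores combining classes and mark reordering.

```python
def compose(sequence, pairs):
    """Fold each code point into the previous one whenever the
    (previous, current) pair has a single-character composed form."""
    out = []
    for code_point in sequence:
        if out and (out[-1], code_point) in pairs:
            out[-1] = pairs[(out[-1], code_point)]
        else:
            out.append(code_point)
    return out


PLACEHOLDER = 0xE000  # stand-in for A WITH DOT ABOVE, not a real assignment

# Pair table as it could look once the completion character exists.
WITH_COMPLETION = {
    (0x0041, 0x0302): 0x00C2,        # A + circumflex
    (0x00C2, 0x0301): 0x1EA4,        # A-circumflex + acute
    (0x0041, 0x0307): PLACEHOLDER,   # A + dot above
    (PLACEHOLDER, 0x0304): 0x01E0,   # A-dot-above + macron
}

# Without the completion character, no chain of pairs reaches U+01E0.
WITHOUT_COMPLETION = {
    key: value for key, value in WITH_COMPLETION.items()
    if PLACEHOLDER not in key and value != PLACEHOLDER
}

sequence = [0x0041, 0x0307, 0x0304]   # A, dot above, macron
composed = compose(sequence, WITH_COMPLETION)
stuck = compose(sequence, WITHOUT_COMPLETION)
```

With the completion character in the table, the loop reaches U+01E0 with no special handling; without it, the sequence stays decomposed unless the loop carries an extra three-character case.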

The value of composed characters is fundamentally a product of their usefulness in implementations, since they could otherwise be expressed as composed character sequences. This is a special case in which adding the characters is of particular value.


Name and glyph

LATIN CAPITAL LETTER A WITH DOT ABOVE

LATIN SMALL LETTER A WITH DOT ABOVE


Unicode Character Properties

XXXX;LATIN CAPITAL LETTER A WITH DOT ABOVE;Lu;0;L;0041 0307;;;;N;;;;YYYY;
YYYY;LATIN SMALL LETTER A WITH DOT ABOVE;Ll;0;L;0061 0307;;;;N;;;XXXX;;XXXX
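The two records above use the semicolon-separated layout of the Unicode character database, with XXXX and YYYY standing for the code points still to be assigned. The following sketch splits such a record into named fields so the case mappings can be read off directly. The field labels in `FIELDS` and the helper `parse_record` are descriptive names chosen for this example, not identifiers taken from the proposal.

```python
FIELDS = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "comment", "uppercase", "lowercase", "titlecase",
]


def parse_record(line):
    """Split one semicolon-separated record into a field-name -> value dict."""
    values = line.split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


capital = parse_record(
    "XXXX;LATIN CAPITAL LETTER A WITH DOT ABOVE;Lu;0;L;0041 0307;;;;N;;;;YYYY;"
)
small = parse_record(
    "YYYY;LATIN SMALL LETTER A WITH DOT ABOVE;Ll;0;L;0061 0307;;;;N;;;XXXX;;XXXX"
)

# The capital letter maps to YYYY in lowercase; the small letter maps to XXXX
# in both uppercase and titlecase.
print(capital["lowercase"], small["uppercase"], small["titlecase"])
```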