ISO
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 2/WG 2

Universal Multiple-Octet Coded Character Set
(UCS)

ISO/IEC JTC1/SC2/WG2 N2235
Date: 2000-08-09

 

Title: Proposal for addition of ZERO WIDTH WORD JOINER

Source: Unicode Technical Committee

Status: Liaison Communication

Action: For consideration by JTC1/SC2/WG2

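The codepoint U+FEFF serves two very different purposes: it is used as a signature (byte order mark) at the start of text, and it is used as ZERO WIDTH NO-BREAK SPACE (ZWNBSP), a character that prohibits a line break between the characters on either side of it.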

It is clear in retrospect that this overloading was a grave mistake. If U+FEFF had only the semantics of a signature, it could be freely deleted from text without affecting the interpretation of the rest of the text. That matters because appending files together, for example, can leave a signature in the middle of text. Unfortunately, U+FEFF also has significance as a character. As a ZWNBSP, it indicates that a line break is not allowed between the adjoining characters. Thus U+FEFF does affect the interpretation of text and cannot be freely deleted. This overloading of semantics has caused innumerable problems for programs, not least for the overall comprehensibility of Unicode/10646.
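The concatenation problem can be illustrated with a short sketch. It assumes two UTF-8 files that each begin with a signature (the bytes EF BB BF) and uses Python's standard `utf-8-sig` codec, which removes a signature only at the very start of the data. The helper name `concatenate_files` is hypothetical and chosen for illustration only.

```python
# U+FEFF encoded in UTF-8 is the three-byte signature EF BB BF.
BOM_UTF8 = "\ufeff".encode("utf-8")


def concatenate_files(*contents: bytes) -> str:
    """Hypothetical helper: naively append raw file contents, then decode.

    The 'utf-8-sig' codec strips a signature only at the beginning of the
    data, so the signature of every file after the first survives as a
    U+FEFF character in the middle of the text.
    """
    joined = b"".join(contents)
    return joined.decode("utf-8-sig")


file_a = BOM_UTF8 + "first file\n".encode("utf-8")
file_b = BOM_UTF8 + "second file\n".encode("utf-8")

text = concatenate_files(file_a, file_b)
# The second signature is now indistinguishable from an intentional ZWNBSP.
print(repr(text))
```

A program reading `text` cannot tell whether the remaining U+FEFF is a stray signature that may be deleted or a ZWNBSP that must be kept.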

To ameliorate this situation, the UTC has approved the addition of a new character at U+2060, ZERO WIDTH WORD JOINER. This character would have the same semantics in all cases as U+FEFF used as ZWNBSP, except that it cannot be used as a signature. The goal is to move implementations to this new character over the next few years, discouraging the use of U+FEFF as ZWNBSP. Eventually, the use of U+FEFF as ZWNBSP can be deprecated, leaving only its use as a signature. This will significantly simplify the programming model for Unicode/10646 and reduce the opportunity for error in countless implementations. The character should be encoded in the BMP, since similar format characters, such as ZERO WIDTH JOINER (U+200D), are encoded there.
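The migration described above could look roughly like the following sketch. It assumes a simple policy that is not part of this proposal: a U+FEFF at the very start of the text is treated as a signature, and every other U+FEFF is treated as ZWNBSP and replaced by the proposed U+2060. The function names `migrate_zwnbsp` and `strip_signature` are hypothetical. Note that under this policy a genuine ZWNBSP at the start of text would be misread as a signature, which is exactly the ambiguity the new character is meant to remove.

```python
ZWNBSP = "\ufeff"  # U+FEFF: signature or ZERO WIDTH NO-BREAK SPACE
ZWWJ = "\u2060"    # U+2060: the proposed ZERO WIDTH WORD JOINER


def migrate_zwnbsp(text: str) -> str:
    """Hypothetical migration: keep an initial U+FEFF as a signature and
    map every other U+FEFF to U+2060 (assumed policy, see lead-in)."""
    if text.startswith(ZWNBSP):
        return ZWNBSP + text[1:].replace(ZWNBSP, ZWWJ)
    return text.replace(ZWNBSP, ZWWJ)


def strip_signature(text: str) -> str:
    """Once U+FEFF means only 'signature', it can be dropped safely."""
    return text[1:] if text.startswith(ZWNBSP) else text


migrated = migrate_zwnbsp("\ufeffab\ufeffcd")
cleaned = strip_signature(migrated)
```

After migration, `cleaned` retains the no-break behavior through U+2060, while the signature has been removed without changing the meaning of the text.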

The UTC urges WG2 to also approve this character for addition to ISO 10646.


ISO/IEC JTC 1/SC 2/WG 2 - N2235 Attachment
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646


Please fill Sections A, B and C below. Section D will be filled by SC 2/WG 2.

For instructions and guidance for filling in the form please see the document "Principles and Procedures for Allocation of New Characters and Scripts" (http://www.dkuug.dk/JTC1/SC2/WG2/prot)

A. Administrative


1. Title:   ZERO WIDTH WORD JOINER



2. Requester's name: Unicode Technical Committee


3. Requester type (Member body/Liaison/Individual contribution):    Liaison


4. Submission date: 2000-08-10


5. Requester's reference (if applicable):


6. (Choose one of the following:)

This is a complete proposal: Yes; or,
More information will be provided later:


B. Technical - General


1. (Choose one of the following:)

a. This proposal is for a new script (set of characters): No

Proposed name of script:

b. The proposal is for addition of character(s) to an existing block: Yes

Name of the existing block: General Punctuation (2000..206F)


2. Number of characters in proposal: One


3. Proposed category (see section II, Character Categories): Alternate Format Character (as with ZWJ)


4. Proposed Level of Implementation (see clause 15, ISO/IEC 10646-1): Any level is acceptable

Is a rationale provided for the choice? N/A

If Yes, reference:


5. Is a repertoire including character names provided?: Yes

a. If YES, are the names in accordance with the 'character naming guidelines' in Annex K of ISO/IEC 10646-1? Yes

b. Are the character shapes attached in a reviewable form? N/A


6. Who will provide the appropriate computerized font (ordered preference: True Type, PostScript or 96x96 bit-mapped format) for publishing the standard? The Unicode Technical Committee

If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used:


7. References:

a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? N/A

b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached? N/A


8. Special encoding issues:

Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information): Yes, see ISO/IEC JTC1/SC2/WG2 N2235


C. Technical - Justification



1. Has this proposal for addition of character(s) been submitted before? No

If YES explain


2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)? Yes

If YES, with whom? Unicode member companies (see http://www.unicode.org/unicode/consortium/memblogo.html)

If YES, available relevant documents?


3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included? major IT industry leaders

Reference:


4. The context of use for the proposed characters (type of use; common or rare): Yes

Reference: see ISO/IEC JTC1/SC2/WG2 N2235


5. Are the proposed characters in current use by the user community? N/A

If YES, where? Reference:


6. After giving due considerations to the principles in N 1352 must the proposed characters be entirely in the BMP? Yes

If YES, is a rationale provided? Yes

If YES, reference: see ISO/IEC JTC1/SC2/WG2 N2235


7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? N/A


8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence? No

If YES, is a rationale for its inclusion provided?

If YES, reference:


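9. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character? Yes, it is identical in function to U+FEFF used as ZERO WIDTH NO-BREAK SPACE, but cannot be used as a signature

If YES, is a rationale for its inclusion provided? Yes

If YES, reference: see ISO/IEC JTC1/SC2/WG2 N2235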


10. Does the proposal include use of combining characters and/or use of composite sequences (see clause 4.11 and 4.13 in ISO/IEC 10646-1)? No

If YES, is a rationale for such use provided?

If YES, reference:

Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided? No

If YES, reference:


11. Does the proposal contain characters with any special properties such as control function or similar semantics? Yes

If YES, describe in detail (include attachment if necessary): see ISO/IEC JTC1/SC2/WG2 N2235


D. SC 2/WG 2 Administrative (To be completed by SC 2/WG 2)


1. Relevant SC 2/WG 2 document numbers:


2. Status (list of meeting number and corresponding action or disposition):


3. Additional contact to user communities, liaison organizations etc:


4. Assigned category and assigned priority/time frame: