From listadm Mon Aug 12 20:22:48 2002
Received: from email1.ansi.org (email1.ansi.org [12.15.192.17])
    by dkuug.dk (8.9.2/8.9.2) with ESMTP id UAA25984
    for ; Mon, 12 Aug 2002 20:22:46 +0200 (CEST)
    (envelope-from mdeane@ANSI.org)
Received: by email1.ansi.org with Internet Mail Service (5.5.2653.19)
    id ; Mon, 12 Aug 2002 14:24:36 -0400
Message-ID: <2F81C8110D55D411882A0020356797B2027FD6FC@email1.ansi.org>
From: Matthew Deane
To: "'SC 22 Distribution List'"
Subject: SC 22 N 3465 - SC 22/WG 4 Convenor Contribution to the Coded Character Sets Workshop
Date: Mon, 12 Aug 2002 14:24:30 -0400
X-Mailer: Internet Mail Service (5.5.2653.19)

ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)

ISO/IEC JTC 1/SC22 N3465

TITLE: SC 22/WG 4 Convenor Contribution to the Coded Character Sets Workshop, 26 August 2002, Saariselkä, Finland

DATE ASSIGNED: 2002-08-12

SOURCE: SC 22/WG 4 Convenor (A. Bennett)

BACKWARD POINTER:

DOCUMENT TYPE: Other document (Open)

PROJECT NUMBER: N/A

STATUS: This contribution will be reviewed at the Character Sets ad hoc.

ACTION IDENTIFIER: FYI

DUE DATE: N/A

DISTRIBUTION: Text

CROSS REFERENCE: N/A

DISTRIBUTION FORM: Open

Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
Matt Deane
ANSI
25 West 43rd Street
New York, NY 10036
Telephone: (212) 642-4992
Fax: (212) 840-2298
Email: mdeane@ansi.org

____end of cover page, beginning of document__________

August 9, 2002

To: SC22 character set ad hoc
From: Ann Bennett, Convener, ISO/IEC JTC 1/SC 22/WG4 - COBOL
Subject: Large character set and cultural adaptability support in COBOL

WG4 has recently completed ISO/IEC FDIS 1989, which includes support for cultural adaptability and large character sets (typically ISO/IEC 10646, but not mandated). The support is summarized below. WG4 had significant help from WG20 in developing this support, particularly in the early planning stage when I attended WG20 meetings.

WG4 is now starting to plan for future work and will be considering the following:

(1) Handling of surrogate pairs of UTF-16 as a character unit (currently the unit of processing for UTF-16 is a 2-octet code).
(2) Handling of combining sequences. WG4 needs input on the industry direction.
(3) Additional date and time formatting. Current support is minimal. WG4 needs input on the requirements.
(4) Additional extended letters for identifiers, to accord with additions to ISO/IEC 10646. WG4 would expect to adopt additional letters in a future revision or amendment of COBOL. WG4 needs stable normative references for the repertoire of extended letters in identifiers and for the associated case foldings. Both need to be provided in normative references that ISO/IEC accepts.

WG4 is looking to the character set ad hoc and WG20 for continued support in understanding character set and cultural adaptability requirements.

Summary of character set and cultural adaptability support in ISO/IEC FDIS 1989:2002:

- The COBOL FDIS adds a character data type for large character sets (such as ISO/IEC 10646), but does not mandate any particular representation. In COBOL terms, a USAGE NATIONAL clause has been added to data description entries. Operations are performed on fixed-size units (called encoding units in some implementations) with no recognition of surrogate pairs or combining sequences. Character set conversion is provided by intrinsic functions and by features for conversion on input/output of file records.
- An intrinsic function is provided for comparisons using an implementation of ISO/IEC 14651. Users can identify a table and specify a comparison level.
- Cultural adaptability support lets the user choose a "locale" for sorting, comparisons, monetary format, upper and lower casing, and date and time formatting. The POSIX locale is used for specification purposes, but a POSIX implementation is not required.
Users can select each category of cultural conventions independently of the others. For example, users might sort with one "locale" and use monetary format from another.
- Extended identifiers are provided using the repertoire of TR 10176:2001, with the addition of Catalan middle dot. Case foldings are in accordance with the tolower specification of DTR 14652, with overrides for Greek sigma (,) and Turkish small dotless i (,). The tables were copied into the COBOL FDIS.
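
Illustration of future-work items (1) and (2): the following is a minimal sketch, written in Python rather than COBOL purely for illustration, of why processing UTF-16 in fixed 2-octet units differs from processing by surrogate pair or by combining sequence. The function names are hypothetical and are not taken from the COBOL FDIS or from any implementation. The combining-sequence grouping is a deliberate simplification (it attaches any code point with a nonzero canonical combining class to the preceding code point) and is not a full text segmentation algorithm.

```python
import unicodedata


def utf16_code_units(text):
    """Return the 2-octet code units of text encoded as UTF-16 (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def group_surrogate_pairs(units):
    """Group code units so that each well-formed surrogate pair forms one item."""
    groups = []
    i = 0
    while i < len(units):
        if (is_high_surrogate(units[i]) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            groups.append(units[i:i + 2])
            i += 2
        else:
            # A BMP code unit, or an unpaired surrogate, stands alone.
            groups.append(units[i:i + 1])
            i += 1
    return groups


def group_combining_sequences(text):
    """Attach each combining mark to the code point before it (simplified)."""
    sequences = []
    for ch in text:
        if sequences and unicodedata.combining(ch) != 0:
            sequences[-1] += ch
        else:
            sequences.append(ch)
    return sequences


# 'A', U+10400 (outside the BMP), then 'e' followed by U+0301 COMBINING ACUTE ACCENT
sample = "A\U00010400e\u0301"

units = utf16_code_units(sample)
print(len(units))                              # 5 fixed-size 2-octet units
print(len(group_surrogate_pairs(units)))       # 4 when a surrogate pair counts as one unit
print(len(group_combining_sequences(sample)))  # 3 when a combining sequence counts as one unit
```

The same string therefore has three different lengths depending on the unit of processing chosen, which is the choice that items (1) and (2) leave open.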