Minutes for ISO/JTC1/SC22/WG14 and INCITS J11
Document WG14/N973, J11/02-001
15-19 April 2002
Curacao, Netherlands Antilles

1. Opening activities

1.1 Opening Comments

    Randy Marques, our host, extended a warm welcome to Curacao (and cautioned us about the tropical sun). Instructions were given for gaining web, news, and email access.

1.2 Introduction of Participants/Roll Call

    United States (J11)
        John Benito, Farance Inc. (Convener)
        Douglas Walls, Sun (HOD)
        John Parks, Intel
        Clark Nelson, Intel (non-voting)
        Fred Tydeman, Tydeman Consulting
        Randy Meyers, Silverhill Systems
        Raymond Mak, IBM
        Jeff Muller, Oracle
        David Keaton, Self
        Barry Hedquist, Perennial
        PJ Plauger, Dinkumware
        Tana Plauger, Dinkumware (non-voting)
        Tom Plum, Plum-Hall
    Canada
        Raymond Mak, IBM (HOD)
        Walter Banks, ByteCraft
    UK
        Francis Glassborow, Self (HOD)
    Denmark
        Allan Frederiksen, Nokia
        Jan Kristoffersen, RAMTEX (HOD)
    Netherlands
        Randy Marques, Atos Origin
        Willem Wakker, ACE (HOD)
    Germany
        Nobuyoshi Mori (Nobu), SAP (HOD)
    Norway
        Keld Simonsen, RAP (HOD)

1.3 Selection of Meeting Chair

    John Benito, Chair
    John Parks, Secretary

1.4 Procedures for this Meeting

    Everyone was encouraged to participate (and participate in straw votes), regardless of voting status. For formal votes, we would vote as countries.

1.5 Approval of Previous Minutes (N960)

    Approved without objection. Will get new document number and be made public.

1.6 Review of Action Items and Resolutions

    The following action items are now CLOSED
        o Gwyn: Rationale words for 6.2.6.1 trap representations
        o Tydeman: Rationale words for 6.3.1.6 complex types
        o Tydeman: Rationale words for 6.3.1.7 real and complex
        o Jones: Rationale words for 6.9.2 External object definitions - tentative definition
        o Jones: Rationale words for 6.11.(1,2,3,4,7,8,9) Future directions - array parameters removed
        o Tydeman: Rationale words for 7.12 Need list of new math (as of C99) functions
        o Benito/Thomas: Rationale words for 7.12 Redo Jim Thomas's words on math, e.g., 'this draft', 'that draft'
        o Tydeman: Rationale words for 7.12.14 FP compare
        o Tydeman: Signaling NaN paper
        o Benito: explain blue line in DRs; add (and explain) more states of DRs in DR log
        o Benito: DR 273 __STDC_ISO_10646__
        o Kristoffersen: new example for annex D (IOHW)
        o Wakker: prepare disposition of comments document
        o Benito: determine what to do about registering C locale

    The following action items are still OPEN
        o Benito: C99 Time issues and WG15
        o Mak: Sequence points paper in TR format (Tydeman, Seymour, Seebach, Meyers will review/help)
            Needs review comments, will distribute paper after meeting.
        o Meyers: write paper based upon DR 219
        o Meyers: review DR 230
        o Meyers: Rationale words for 6.3.2.1 rvalue array type
            Randy wasn't sure.
        o Gwyn: Rationale words for 7.1 General library overview on const
        o Gwyn: Rationale words for 7.18.1.5 Greatest-width integer types
        o Meyers: DR 230 paper
        o Gwyn: DR 274 alternate wording
        o Benito/Plauger: DR 224 response on INFINITE

1.7 Approval of Agenda (N967)

    Simonsen: requests time (30 min) to discuss C locale, Tues AM, first thing
    Walls: requests to change US TAG meeting to Wed afternoon
    Tydeman: signaling NaN paper, do that in liaison section
    Glassborow: requests time to talk about UK-hosted meeting

1.8 Distribution of New Documents

    None.

1.9 Information on Next Meetings

    1) October, 2002 -- Santa Cruz, CA
        hosts: Perennial and Dinkumware
        dates: 10/14 (for C) 10/20 (for C++)
        hotel: West Coast Santa Cruz Hotel (former Dream Inn)
        make reservations directly with hotel
        don't have to stay at Inn, there are B&Bs, etc. nearby
        room rate is good, all rooms have ocean view
        $125/night single or double, $135 triple, $145 quad
        rate will stretch to one or the other weekend

    2) April, 2003 -- Oxford, UK
        host: UK
        dates: 3/31 (for C), 4/7 (for C++)
        hotel: Holiday Inn (just outside ring road)
        entirely rebuilt, excellent conference center, good internet connections
        $145/room (110 pounds) at current exchange rate, includes tax and full breakfast
        trying to negotiate lower room rates
        Travelodge next door, 70 pounds/night, no breakfast
        ACCU Spring conference will be there also
        looking for presenter(s)
        sponsorship: ACCU and 2 private sponsors (including Francis Glassborow)
        co-sponsor would be very welcome, free spot at the conference
        Plum: large companies that do business in UK may want to consider
        bus service, train service, easy to get to/from London and Oxford

    3) October, 2003 -- Kona, Hawaii
        host: Plum-Hall
        preliminary dates: 10/15-10/21 (for C), 10/20 (for C++), 2 days overlap
        the Iron Man Triathlon hotel blackout period makes it difficult to begin on 10/13
        Wakker: 2 mid-week travel days may be objectionable
        could consider shorter meeting

    4) April 2004 -- Norway (possibly Oslo)

    5) October 2004 -- searching for North American host

1.10 Identification of national bodies

    Canada
    Denmark
    Germany
    Netherlands
    Norway
    United Kingdom
    USA

1.11 Identification of J11 voting members

    11 voting members present, a quorum.

2.
Rationale Editors report (Benito)

    Benito doesn't have new doc for this meeting
    with help from Jones and Tydeman, got everything into doc that was supposed to go into doc (from Redmond meeting)
    problems with index, Microsoft Word issues, removed index style, Benito now producing another one
    that's ONLY issue left with Rationale
    Benito will give doc number and post to web once the index problems are resolved
    Plum
        why not get rid of index?
    Benito
        rumors are that somebody wants to publish it with all of C99 and first set of defect reports
    Tydeman
        would it be worthwhile to publish to website without index?
    [AI] Benito: will post to website without index
    [AI] Glassborow: volunteer to take care of index

3. TR Status Report (ISO/IEC WDTR 18037.2, WG14 N968, Wakker)

    this is the latest version of the TR
    most important changes: fixed-point arithmetic and section that details exact changes to the C99 Standard
    need more comments on doc (it's been too quiet)
    hope to have decision on next ballot by the end of this meeting
    Plum: C++ liaison issues
        historical perspective: we didn't define a compatibility header to facilitate the use of complex in both languages; that issue is still being hotly discussed on the C++ side
        if we try to define THE compatibility header in this case, however, we've got no chance to make it compatible with local conventions (which are suited to local needs)
        an existence proof would be nice but neither committee should define the header
    Glassborow
        until you do existence proof, you don't know it's possible
        who should do it is difficult problem
        we shouldn't avoid issue
        a template-like syntax might help
    Wakker
        philosophy was to create basic types, implementations can provide mappings to more locally palatable names
        Annex G is informative, not normative, it only suggests how implementors might provide compatibility C/C++
    Meyers
        very little work by C and C++ committees would have gone a LONG way toward making life easier for implementors and users
    Plum
        would be
        good to show example of a compatibility header in the document (including values)
        C0X implementation has efficiency edge on C++ because it has REAL fixed point values, probably not a big deal
    Glassborow
        when either language makes future changes, if you had a compatibility header you would have something to regression test against
    Plum
        before the decision about whether header is normative or informative is made, we need header
    Plauger
        there are regrettable incompatibilities between C/C++ complex but that case is overstated, Dinkumware has existence proof of headers that reconcile library complex differences
        Dinkumware will soon begin work on fixed-point libraries, not terribly worried about incompatibilities
    Benito -- next ballot on this is technical
        next step is to move on to draft status at end of week -- are we ready to move this forward?
        if we do NOT move this to draft status after this meeting, that will REALLY push this out
        if we get this out from this meeting, we can move on

4. Liaison activities

4.1 J11 (Meyers)

    nothing much to report
    NCITS has become INCITS (International NCITS)
    Tana Plauger
        this may be difficult for small companies since they can be over-dominated on international committees
    Plum
        small corporations are probably already under-represented in all countries
        US domiciled corporations can go to individual US TAGs
        natural conflict -- should US standards committees always push US point of view OR be seen as participating in international efforts

4.2 WG14 (Benito)

    convenership open, Benito will volunteer again if there are no objections or volunteers
    plenary in August in Finland, Benito will attend
    new work item vote failed (item 5, discussed later)

4.3 J16/WG21 (Plum)

    C/C++ compatibility issue is flaring up again
    there are process issues and technical issues, Plum wants to address process issues and leave technical to Meyers
    Plum suggested in Redmond that to improve things, we should work within ISO procedures, not try to create new special
    arrangements
    proposal: let's encourage as many companies as possible to participate in both committees (and as liaisons)
    WG21 designated 7 liaisons from WG21->WG14 (that was also done for WG14->WG21 and those names were listed in the WG21 minutes)
    Tana Plauger
        if you are designated as a liaison are you automatically part of the other group?
    Plum
        no, you must have some interest and expertise in the other language
    liaisons
        Dinkumware and Intel want to be added as liaisons
        Perennial is already on the list
        Benito is now off the liaison list

4.4 WG15 (POSIX) (Simonsen)

    new revision
    IEEE and OpenGroup have voted affirmatively (the ISO vote must do so also before this can become FDIS)

4.5 WG20 (I18N) (Simonsen)

    there was a proposal to disband this group, it was resolved/defeated in SC22 though the vote was close
    Benito
        is there another disband proposal on the table?
    Simonsen
        no knowledge of that
        there are a number of standards being considered
        1) updated repertoire of 10646 (Unicode), ballot ended 4/14
        2) TR 14652, ballot ends middle of May
        3) API standard, about to be disbanded, not progressing
           changed from language independent spec to spec that is bound to C, maybe it should be dealt with in C Committee?
        4) tech report on design of programming languages
           WG14 has used character descriptions
           about to go out to ballot (next few weeks)
    Plum
        SC22 plenary talked about politics
        SC2 has close working relationship with Unicode Consortium
        strong feeling at ISO that we want just one place (committee) where identifiers are defined
        inconsistency -- Java uses Unicode, C++ and C point to different versions of WG20 document which are inconsistent with Unicode
        there is the "checks and balances" argument -- SC22 is court of last resort for small companies and countries
        would like to see document that explains why particular characters that aren't in Unicode really need to be considered
        it would be good if we stopped pointing to WG20 documents
    Simonsen
        the history is that Unicode picked up WG20 character list work
        WG20 and Unicode Technical Committee cooperate (UTC is WG20 liaison)
        Unicode is soldier of big American firms, other cultures need clout
    Nobu (who is also on Unicode committee)
        small differences, lists are synchronized, Unicode is not just American corporations, they support WG20's work
    Glassborow
        standards are about portability
        our expertise is in language design, not character sets, we should confine ourselves to language design
    Nobu
        C# may soon become ISO language, 16 bit characters, we should strive for compatibility there

4.6 WG11 (Wakker)

    the language independent arithmetic ballot succeeded
    Frank Farance taking lead in revising data types

4.7 Other Liaison Activities

4.7.1 Tydeman: signaling NaN paper

    the paper is up on the website
    he has had feedback from Nick Maclaren and a trial implementation
    will remove problem section and repost, will eventually be rewritten
    [AI] Tydeman: give Simonsen revised document, Benito and Keaton will also review

4.7.2 Changes to WG21 Liaisons

    remove Benito
    add Dinkumware and Intel
    need leader
        Plum is convener and so cannot, may drop convenership in October and then might consider it
        Francis Glassborow will do it, if only for a short
        time

4.7.3 Other Liaison Issues

    Simonsen
        SC22 plenary in August, will discuss char set issues (including Unicode)
        will we prepare input for that meeting?
    Benito
        add agenda time to discuss WG14 input to SC22 plenary tomorrow morning after C locale conversation
        Benito will attend plenary to represent WG14
    Simonsen
        there are 3 J11 ballots pending (DIS C#, DIS TLI, TR TLI), they end in middle of July, there is a ballot resolution meeting end of September in Hawaii
    Walls
        there were very few comments on C# (doc edited by Rex Jaeschke) and the review period is now over
    Plum
        US TAG part of J11 gets the opportunity to comment on language docs that come through, it is strongly encouraged to form an opinion, only several of the committee members might have any interest or knowledge of the subject matter, C++ refused to take a stance and instead voted to abstain
    Benito
        that process is clearly broken

-------------------------
lunch (sponsored by ACE)
-------------------------

5. SC22 N3356 (Benito)

    the ballot for this new work item failed
    in Redmond we had 6 countries interested in this, only 3 voted for it and we need 5 to continue this work
    what does it mean for a country to say that a work item should be pursued but they don't want to participate?
    Canada and the Netherlands think they probably should have voted for this
    hopefully, we can straighten out the vote this week
    let's talk about this work item this week anyway
    Glassborow
        doesn't understand the UK vote and nobody consulted him
        doesn't want UK to be left out of it
    Wakker
        the corrected Dutch vote will be issued in the next day or two

6. Defect report status (Benito)

    Benito made all of the changes that were requested of him
    described terms (closed, closed published, closed with date, review, open)
    Simonsen questions methodology
    there was consensus agreement that Benito followed agreed upon process
    defect reports are on the web, not in mailing

7.
ISO/IEC WDTR 18037 Editors report (Wakker)

    type names have changed
    new type conversion rules (2.1.4)
    r, q, h for constants and formatted I/O (capital for unsigned versions) for fract, accum, short variants
    open issue: wide string versions, are they necessary?
    asked committee for feedback on whether the format of section 2.2 (detailed changes to the C Standard) is reasonable
    are there machines where sizes of accum/fract don't map to size of integers? yes.
    Keaton
        in past meetings it was agreed that there are some machines that simply can't be supported by this model, that's OK since this will work for a large class of machines
    asked committee to please read and review
        annex G on C++ compatibility
        annex F on functionality not included in this TR

8. Additional character types (N969)(Nobu)

    the paper is mostly a summary of the WG14 reflector discussion on the SAP proposal to add a UTF-16 character type to C/C++
    Plauger
        many of the proposed solutions are horrible, including anything that wires UTF-16 into the C Standard, it only has currency now because Java chose it
        biggest lapse we made in C89 was to provide little/no guidance on wide character mapping
        proposes adding char16_t and char32_t
        solves user problem that we don't know what wchar_t is going to give us
        likes idea of generalizing string literals as mechanism for static initialization
        would like us to define where users can change wide char representations within their code
    Plum
        the conjecture that UTF-16 was chosen because of Java is not right, it's really the sweet spot for databases and codes that large corporations are trying to standardize on (convenience versus efficiency tradeoff), 8-bit is too short, 32-bit takes too much space
        we've got constituencies asking us for 16-bit
        the original request was for a type whose size was evident from its name and whose representation was the approved ISO-blessed Unicode
        UTF-8 and UTF-16 are stateful encodings, UTF-32 is not
        we're really extending our pioneering work that
        allows users to prepend 'L' on string
        certain operations can only be defined on "strings" not individual chars
        we can't eliminate individual char processing methods but we want to restrict its use as much as possible (hopefully, just to library)
    Simonsen
        there is a business need for us to support UTF-16
        UTF-32 is a tainted term, a Unicode-only term, UCS-4 is more politically correct
        wants to get rid of 'L' at end of strings because they're ugly
        agrees with Plum that we should start looking at "strings" as basic unit of operation
    Benito
        paper was meant to be high-level summary, not divisive
    Mak
        there is no need for UTF-32 (though there is need for UTF-16)
        C model is that wchar_t is fixed width encoding
        proposal is for 16-bit, variable width encoding (for efficiency)
        optimum length seems to be 16 bits
        if we want to be more general, let's define a 16-bit type that can map to more than just UTF-16
        we should decouple wchar_t and the new data type to prevent user problems mixing objects with those types
    Plum
        programmers that need UTF-16 strings need them now (and have needed them for several years) and have responded to that need by 1) writing text-to-text preprocessors, and 2) hacking gcc internally
        compiler solution would eliminate the need for these workarounds
        our TR charter included investigation for ongoing library work, not to define new APIs (charter: data type, initializer, APIs)
        if we add a 32-bit type, we may encourage programmers to use it as their fundamental type and that's not what the Unicode Consortium is recommending (they are recommending UTF-16)
    Nobu
        agrees that we don't need UTF-8 (redundant)
        if majority think UTF-32 is unnecessary (or undesirable) he will go along with that
        endianness is not in proposal
    Glassborow
        data types that include state information are complicated to handle (UTF-32 would be required to get rid of state info)
    Plauger
        if this was a good idea somebody would have implemented it
        doesn't like principle of somebody convincing committee
        to do something so we can see if it's a good idea...
    Simonsen
        UTF-16 is not just Unicode, it's an ISO standard
        we should not do quick fix, we'll have to live with this for years
    Meyers
        C committee was wise to leave char set stuff out of language
        there seems to be strong need for a 16-bit equivalent to char and it would probably be wise to make it generic
    Plum
        quality-of-implementation question: C++ will need to overload on array of UTF-16
        it is hopeless to try for "stateless" code points
        UTF-32 entity is a "code point" (character)
        implementations must deal with combining code points (ordering and adjacency matter) so they cannot be entirely "stateless"
        Unicode defines "canonical sequence" of code points so they can be manipulated
        it is not necessary for code points to encode endianness
        everything that we are discussing deals with array of 16-bit int type, it is an int type, there is only one endianness, it's the type that is supported on this platform, it is thus not an issue on this platform, just for interchanging data across platforms (on web), within same machine family, use native endianness
        endianness is data interchange problem (library vendor)
        there is existing practice
        the installed bases of .NET and XML and Java are existing practice, they represent a sizeable market
        we are not withdrawing support for traditional C char/library methods (traditional POSIX narrow/wide char model), this proposal simply adds support for a new market
        we are standardising another character in front of string literals, if another community needs another encoding standardised they should consider the door open to add yet another letter
    Nobu
        other 16-bit encodings (the ones he knows about) are just historical, they have been replaced by Unicode
    Walls
        vendors have standardised on char and wchar_t
        we have customers begging us for 16-bit Unicode, we tried to convince them otherwise, they wouldn't go away, we said that we'd work with them on the Standards committee
        there is more pressure
        for this for C than for C++
    Simonsen
        Unicode has something like 10 normalization forms, to canonically represent a string (10646 has just one)
    Plum
        the real problem is getting from initializer double-quotes to array of 16-bit ints, that's what cannot be done without a preprocessor
    Simonsen
        idea: instead of a new char type, offer them a string type
    Plauger
        let's NOT wire UTF-16 into C, let's make it possible but not necessary
        if the market forces make it happen, so be it
        does NOT accept premise that we've finally gotten characters right

-------------------------
break
-------------------------

    Nelson
        UTF-16 is not a valid wchar_t encoding, UTF-32 is not valid either because it is not really stateless
    Plauger
        "grapheme clusters"
        SC22 passed a resolution a decade ago that said that language groups do not need to treat combining characters
    Plum
        there are combining characters in spoken languages, so even if your char set does not use them, the language itself might (addressing Nelson's point about UTF-32 not being stateless either)
    Plauger
        the problem with processing UTF-16 as chars is embedded nulls
    Nobu
        problem with wchar_t is that it is historically implementation-dependent
        can we summarize the conversation in such a way that we can move the document forward? Do we perhaps agree that
        1) UTF-8 is not necessary
        2) we should create UTF-16 data type
        3) we should create UTF-32 data type
    Simonsen
        UTF-32 is not necessary
        a more generalized approach would be better, one that includes information about which encoding, etc.
        doesn't want to call out specific standard
    Meyers
        you can consider UTF-8 as multibyte type, store in char
        why isn't UTF-8 desirable?
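
Illustrative sketch (not part of the recorded discussion). Three points raised above can be seen in a few lines of C99: a UTF-16 string has to be written out by hand as an array of 16-bit integers because there is no literal syntax for it, the encoding is variable-width (a surrogate pair is two units but one code point), and viewing the same storage as narrow chars runs into embedded zero bytes. The helper names and the sample data are hypothetical, and the sketch assumes 8-bit bytes and an implementation that provides uint16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Hi" followed by U+1D11E, spelled out unit by unit because C99 has no
   16-bit string literal. U+1D11E needs the surrogate pair D834 DD1E. */
static const uint16_t sample[] = { 0x0048, 0x0069, 0xD834, 0xDD1E, 0x0000 };

/* Hypothetical helper: number of 16-bit units before the zero terminator. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: number of code points, native byte order assumed.
   A high surrogate followed by a low surrogate counts once. */
size_t utf16_code_points(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (*s != 0) {
        if (*s >= 0xD800 && *s <= 0xDBFF &&
            s[1] >= 0xDC00 && s[1] <= 0xDFFF)
            s += 2;     /* surrogate pair: two units, one code point */
        else
            s += 1;     /* single unit (or an unpaired surrogate) */
        n++;
    }
    return n;
}

/* Hypothetical helper: what a narrow-char routine sees in the same bytes.
   The result depends on byte order, but it stops at the first zero byte,
   which here is half of the very first 16-bit unit. */
size_t narrow_view_length(const uint16_t *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}
```

With this data, utf16_units(sample) counts four units while utf16_code_points(sample) counts three characters, and narrow_view_length(sample) stops after at most one byte.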
    Mak
        UTF-8 is not being asked for
        UTF-32 is not required either
        we should focus on just what's missing in the C standard - the translation of UTF-16 literals (which isn't now possible)
    Plum
        let's take straw votes and then move on from there
        there won't be unanimity in any case
        proposal: we should consider 2.1 with "some" string literal prefix and not including UTF-32 (2.2)
    Plauger
        what would make Sun happy?
    Nelson
        is it important to have wide-string literals and UTF-16 in the same translation unit?
    Nobu
        no, there are technical problems with that
    Plauger
        do we really need more than 2 char encodings?
        wouldn't customers be happy if the platform just made wchar_t 16-bit?
    Walls
        no, customer does not want that
        they ship their own library
    Simonsen
        iconv can solve problem ad hoc
    Plauger
        there are any number of ad hoc solutions, the question is what's the nicest thing to do to the C Standard
    Nobu
        what would be the minimal approach to make customer happy?
        1) compound literals
        2) new minimum approach
        3) 1 or 2 new data types
    Plum
        argues for 2.1 proposal since C++ doesn't have compound literals
        compound literals happen at the wrong phase of translation
    David Keaton
        compound literal approach spreads character set translation across multiple phases of translation (they wouldn't be entirely lexical)
        proposal that we take a straw vote on the concepts, not the specifics of the approaches described in the paper - are you in favor of pursuing an approach along the lines of...
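
Illustrative sketch (not part of the recorded discussion). The exact compound-literal syntax considered in N969 is not reproduced in these minutes, so this only shows what plain C99 already permits: an unnamed array of 16-bit units built with a compound literal. The elements are ordinary integer constant expressions handled when tokens are analyzed, whereas the characters of a string literal are converted to the execution character set in an earlier translation phase, which may be the phase mismatch being referred to. Helper names are hypothetical, and the numeric values assume the intended text is "Hi" in Unicode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit units before the zero terminator. */
size_t units16(const uint16_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: builds "Hi" as a C99 compound literal and measures
   it. The unnamed array lives until the enclosing block ends, so it is
   only used inside this function. */
size_t greeting_units(void)
{
    const uint16_t *greeting = (const uint16_t[]){ 0x0048, 0x0069, 0x0000 };
    return units16(greeting);
}
```

The programmer still has to supply the numeric code values by hand, which is the work a string literal prefix would hand to the compiler.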
    Walls
        suggests that we all reread the paper
        straw vote first thing tomorrow morning
    Mak
        let's separate 2.4a/2.4b so that we can separate out the 32-bit type

=========================
adjourn for day
=========================

Tuesday, April 16, 2002

continuation of discussion about N969 and the possible addition of new character types to the C language
-------------------------------------------------------------------

    Plum
        2.1 is what the Unicode Technical Committee originally requested, 2.2 is an extension of that (thinking that UTF-32 might be needed someday), the rest are afterthoughts
        everything after 2.3 tells the Unicode Technical Committee that we don't want to do what they're asking for (i.e. create a data type that is exactly UTF-16), they are just alternative ways to say "no"
    Simonsen
        there are other requests than just this one from the Unicode Technical Committee
        the Norwegian member body may/will propose that we have more portable/generic literals
    Benito
        let's not complicate vote
        let's deal with votes on this paper then move forward
        there is no other paper on the agenda about longer char types
    Walls
        some of the other proposals are being asked for (2.4 in particular)
        selectable by compile-time option (for some character set other than UTF-16)
    Plum
        what's the marketplace demanding (other than UTF-16)
    Walls
        "other" character sets
    Nobu
        2.1 is the minimum approach
        the Unicode Technical Committee would be happy with it
    Plauger
        this isn't simple issue of handling a request from another body, this is a long-standing and controversial issue
        now that UTF-16 has been raised, it is absolutely appropriate to raise other alternatives at this time
        it is important to address the issue about switching between wide char encodings
        this vote is just preliminary step towards addressing these issues
    Glassborow
        these proposals raise compatibility issues with C++
        doesn't like typedef to introduce new data types - if we want new data types we should introduce data types
        we
        shouldn't do anything without consulting WG21
    Mak
        UCS2 can already be handled in implementations by defining wchar_t (since it's fixed-width) to be 16 bits
        you can't do that for UTF-16 since it's variable-width
    Plum
        proper C++ solution would be to allow specification of underlying type of enum (like you can do in C#)
    Glassborow
        problem with all these solutions is that they are creating arithmetic types for objects that are not arithmetic

------------------------------------------------
[straw vote] on additional character types (N969)
------------------------------------------------

    1) who is in favor of something along the lines of 2.1?
        in favor 10, opposed 7, abstain 4
    2) who is in favor of something along the lines of 2.2 (as an addition to 2.1)
        in favor 0, opposed 12, abstain 8
    3) who is in favor of something along the lines of 2.3?
        in favor 0, opposed 12, abstain 8
    4) who is in favor of something along the lines of 2.4 for char16_t?
        in favor 6, opposed 2, abstain 13
    5) who is in favor of something along the lines of 2.4 for char32_t?
        in favor 2, opposed 7, abstain 12
    6) who is in favor of something along the lines of 2.5 for 16-bit?
        in favor 3, opposed 1, abstain 17
    7) who is in favor of something along the lines of 2.5 for 32-bit?
        in favor 1, opposed 6, abstain 14
    8) who is in favor of something along the lines of 2.6?
        in favor 0, opposed 11, abstain 10
    9) who is in favor of something along the lines of 2.7?
        in favor 0, opposed 12, abstain 9
    10) none of the above (do something different)
        in favor 6, opposed (I want one of the above) 6, abstain 9

    Benito
        the committee has already said it wants to do something about this (from Redmond meeting) and it has already been communicated to Unicode technical committee
    Nobu
        a lot of people seem to favor a different approach
        wishes those people had spoken up earlier
    Benito
        calls for Simonsen to get paper in post-Curacao mailing that might outline a different approach to move forward, we need these other approaches spelled out
    Plauger
        put myself in category of "asleep at the wheel"
        busy providing stop-gap solutions at Dinkumware
    Benito
        deadlines for mailings are generous and can be flexible
        the deadline is usually 4 weeks after the meeting but that can be extended

9.0 Registering C Locale (Benito)

    Benito was asked to register C locale (London 1999 meeting) and he sent doc to SC22
    there was a ballot, ballot comments, Benito tried to answer them
    we need to finish resolving comments
    doc N3236 (at SC22 website)
    SC22 decided that WG15 should answer 2 of the questions (that hasn't happened)
    Simonsen
        believes that we are responsible for answering all of them
    Benito
        in Redmond, there was discussion about whether we want to register this locale at all
        do we want to register the locale? if we do, let's resolve the comments and it's done
    Plauger
        in favor of registering it
        it fills in some of the gaps from our original C locale description
    Mak
        does registering locale mean POSIX implementations or ALL implementations must conform?
        it would be difficult for EBCDIC to conform
    Plauger
        C implementations that don't conform to POSIX need not conform to this C locale requirement
        this would only apply slightly more pressure to conform
    Plum
        specifically applies only to POSIX conforming implementations
        needs to be documented as such or there will be confusion
    Benito
        says that at the top of the document
    [AI] Benito: verify that the C locale specification says that it applies only to POSIX conforming implementations

-----------------------------------------------------------------------
[straw vote] should we continue with this registration of the C locale?
-----------------------------------------------------------------------

    in favor 17, opposed 1, abstain 2

    [AI] Benito: circulate his preliminary answers to ballot concerns to a few people (Wakker, Plauger, Simonsen, Hedquist), will put in post-Curacao mailing, will forward to SC22

10. Character set ad hoc at the SC22 plenary (Benito)

    Benito plans to attend plenary
    is there anything that WG14 wants him to talk about there?
    Simonsen
        should we support 10646 Unicode? yes.
        should we have some means of support for UTF-16? yes.
        should continue to support other char sets?
    Plum
        we've got a sub-optimal way of working now, SC2 and Unicode are not going away no matter what SC22 does, wherever possible our standards should point to Unicode and merely point out where they are wrong
        right now we have 2 moving targets, theirs and ours
    Nobu
        hopes everybody will give better support to Unicode standards, that is what market needs
    Simonsen
        Unicode has a lot of expertise and they should have their say
        SC22 is appropriate counterpart with slightly different focus
        it is good to have forum within ISO that coordinates and cooperates with Unicode Technical Committee
        ISO does API work and strings work that is not done in Unicode
        WG20 definitions are "almost same" as Unicode
    Plum
        eventually the C++ annex on identifiers will change (when everything settles down); there are 2 possible ways it might change
        1) change C++ to adopt the C approach to upward superset-ability (an implementation can support more identifier chars than the min)
        2) adopt the rule that a character is an identifier character if either it appears on the WG20 list OR the Unicode list (union of 2 lists)
    Simonsen
        stability is important, Unicode updates very frequently
    Glassborow
        purpose of standards is portability
        extended identifiers go against that
        it is unfortunate that C is written in language that is so much like English
        readability is important but it harms portability
    Plum
        do any vendors/customer bases use extended identifiers?
        moving target is part of the problem
        every few months when Unicode adds a new script implementors cannot afford to keep modifying their lexing algorithm
        we could probably make a list of all non-identifier chars, then implementors could choose to make ALL other chars legal ident chars
    Meyers
        last days of PDP-11, C compiler added support for extended Latin-1 chars and that was very popular
    others commented that they know of places that are using extended identifiers
    Simonsen
        both Unicode committee and WG20 came up with positive lists (not negative lists like Plum suggested)
        WG20 discussed positive/negative list extensively; they chose positive and he thinks that was the right decision
    Nobu
        well-defined negative lists would allow for better portability
    Benito
        summarizing - Norway is saying
        1) character set independence
        2) less frequent updates

-------------------------
break
-------------------------

    Benito
        Dutch vote on new work item TR has been officially recorded as "yes"

11. Break out into defect reports subgroup (12 people, led by Nelson) and embedded TR subgroup (9 people, led by Wakker)

=========================
adjourn for day
=========================

Thursday, April 18, 2002, Plenary Session

12. Embedded TR discussion review (Wakker)

    UK, Netherlands, and Canada have revised their ballots to say that they wish to participate
    that brings the total to 6 who wish to participate
    Norway still has concerns
    the subgroup went through section 2
    discussed John Hauser proposal
        the Annex will say that the approach being suggested is only suggested (not mandated), the Hauser approach may prove better
    Annex G (compatibility with C++)
        more input from WG21 is necessary
        current text gives good indication about what could be done in that area
    confidence that the document is complete both textually and technically
    propose that document be sent out of WG14 shortly after meeting
    will get minutes from subgroup scribe
    Benito
        how is C++ input getting solicited?
Wakker
  that was not discussed explicitly
  it is not yet on radar screen at WG21
Glassborow
  there may be interest in WG21 if it is nurtured
  will push it with WG21
Plum
  we should rely on our new 8-person liaison group
Benito
  time for feedback is NOW, not after we push document into draft
  our document is freely available on the website
  it would have been good to have their input ALREADY
Plum
  in present times of diminished budgets we need to look hard at our new work with concern towards what implementors can afford to implement
next ballot stages
  move TR into official committee draft (PDTR)
  final PDTR
  DTR
Benito
  if the ballot passes (at SC22) it will become draft technical report
Simonsen
  PDTR puts SC22 member bodies in control (in some sense), WG14 can only change through formal comment process, changes will be tracked
Glassborow
  concerned that sending this to SC22 may cause problems working with WG21 (since changes will then be harder to make since control will be in the hands of the national bodies)
Wakker
  SC22 ballot only means we will track the changes better
Simonsen
  after this has been sent to ballot we can only change it in response to comments
  there is no need to delay what we're doing
  this ballot will tell WG21 to take this seriously
Plum
  annex G is more important to WG14 than WG21, it is our document
  WG14 needs to work harder on this annex
Plauger
  there is still ample time for WG21 to get up to speed and give adequate feedback on this document
Plum
  would be nice to have agenda time devoted to technical content of Annex G
Glassborow
  politically, it would be better to go to WG21 before rather than after ballot
Simonsen
  we have already done registration vote at SC22, this isn't the first vote on this

------------- [formal vote] -------------
To instruct the WG14 convener to submit document WDTR 18037 as amended for SC22 PDTR ballot as quickly as possible after this meeting.
J11 vote (10 present)
  Benito moves, Tydeman seconds
  in favor 10   opposed 0   abstain 0
WG14 (7 present)
  in favor 6   opposed 1   abstain 0

13. Defect report review (Nelson)

13.1 DRs ready to close with no change to the committee recommendation (we're recommending that these be moved to "closed")
  224, 235, 238, 239, 240, 241, 242, 244, 246, 247, 248, 249, 254, 256, 258, 259, 262, 263, 264, 273, 276, 278
  It was noted that the response to DR 258 could be nicer. Benito commented that all of the responses will be reviewed and made nicer where appropriate.

13.2 DRs in the "review" state where we want to make some kind of change
DR 237
  we agree but will add footnote similar to the first bullet item in the response that explains implications
  [AI] Meyers: write footnote
DR 243
  we question justification for imposing documentation requirement here
  Tydeman has submitted essentially the same proposal to IEEE committee
  we want to hold this until IEEE decides what to do with it
  Wakker suggests that we should add response words that say that
  leave in "review" state
DR 250
  we think the Standard is correct (preprocessing directive includes non-directive)
  the term "preprocessing directive" should probably be italicized in paragraph 2 (6.10)
  the answer to Clive's question is "yes, it's a directive"
  [AI] Meyers: write footnote
DR 252
  we don't want to work on non-prototyped functions
  [AI] Meyers: draft words that capture this intent (possibly taken from response to DR 255)
  full committee decided to "close" with these new words
DR 253
  we agree with response but want to add to the response words that say "the initializations for x and y are different" (they are different because of paragraph 21)
  full committee decided to "close" with these new words
  (aside) the term "designated initializer" is never mentioned in the Standard though it appears in the index and new features section (we use the term "designation initializer" in the text)
DR 255
  change wording to
    The Committee does not wish to
    further refine the behavior of calls not in the scope of prototypes. In practice, this will not be a problem, and the Committee does not wish to define the behavior.
  note the emphasis on "calls"
  full committee decided to "close" with this change
DR 257
  want to add "committee discussion" words that capture our thoughts on each of these suggestions
  1) we agree with him but do not believe that this is a defect in the Standard (or a substantive problem); there is some support for changing the example
  2) takes away the "visibility rule" and we don't want to do that; this is related to DR 236
  3) we agree with him but do not think a change is warranted at this time
     [rev] we think we should consider this for a future rev of the Standard
  4) we think this should be a separate DR (and possibly a new footnote)
     [AI] Nelson: file new DR for this alone
  leave in "review" state
DR 265
  we think the footnote text should be changed to read
    Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it is unsigned in translation phase 7.
  full committee decided to "close" with this change
DR 269
  we agree with suggested resolution
  [rev] we think we should consider suggestion 2 (about unsigned types) for a future rev of the Standard
  full committee decided to "close"
DR 270
  we recommend adding Rationale words similar to
    It would be legal for an implementation to have wchar_t be either int or unsigned int and wint_t be either long or unsigned long. The suggested changes would invalidate implementations currently allowed by the Standard that made those choices.
  [AI] Tydeman: draft Rationale words that say that wchar_t does not necessarily promote to wint_t
  full committee decided to "close"
DR 271
  [rev] we think we should consider this for a future rev of the Standard
  full committee decided to "close"
DR 277
  subcommittee was divided on whether they think the words can be misread the way Clive suggests; they agree that if they make a change, the first suggested rewording is best
  ----------------------------------------
  [straw vote] should we make this change?
  ----------------------------------------
  yes 5   no 8
  Wakker
    why don't we simply remove the restriction that disallows constants?
  Meyers
    this was an arbitrary language decision, C++ precedent is to disallow this
  Plum
    wasn't necessary in C++ since you can intermingle code and decls (which C has since adopted)
  Glassborow
    people are reluctant to change standard since it's costly BUT it is also costly if the standard isn't unambiguously clear
  full committee decided to "close"

13.3 DRs that were in the "open" state
DR 245
  we agree with the suggestions, except that 7.21.4.3 is incorrectly listed as 7.21.4.4
  move to "review" state
DR 275
  we think the macro should be defined as zero in this case (and cannot be left undefined)
  move to "review" state
DR 279
  the original restriction came from Plauger and he now thinks restriction should be removed
  we recommend suggestion 2 (removing the restriction)
  move to "review" state
DR 236
  we did not like the suggested example in the "committee discussion"
  we agreed that the Committee's original intent was to create aliasing rules to allow optimizations based on types; unions can create "aliases" that break this
  we agreed that we wanted to fix these rules so that they can remain useful to optimizers
  we agreed that we want to move the Standard in the direction of "the visibility of the union at the point of the reference" is what matters
  [AI] Meyers and Nelson: draft changes to the aliasing rules (and the definition of "effective type") so
  that optimizers can continue to use these rules to optimize programs
  Mak
    the discussion focused on example 2 (unions) and we also need to handle malloc'ed storage
  Keaton
    no objection to invalidating some programs (with new rules), those programs were written in a problematic coding style (passing function arguments that are aliased to one another)
  Plauger
    in favor of new rules, they will provide guidance to real programmers trying to write correct, optimizable code
  there should be a quick response that says that we don't like example
  leave in "open" state
  [additional discussion on DR 236 from subcommittee]
  Meyers
    at Digital, we took conservative approach and any union in scope stopped optimizations (since we could look at whole compilation unit all at once)
  Walls
    at Sun, there are command-line options to control aliasing assumptions
    the compiler is 1-pass and so a union wouldn't prohibit optimization unless union was seen first
  Meyers
    the optimization in example 2 should not be allowed and it is not worth fixing the Standard to allow it
    aliasing rules are just heuristic, these optimizations are usually put under control of command-line switches, it is inherently unsafe to write code like this
    if we make the suggested change, we would be starting an education process to teach people how to fix their programs and write them in such a way that they can be optimized fully
  Nelson
    effective type is not a concept that is applied to unions, it is only for dynamically allocated memory
    suggestion
    1) apply effective type concept to unions (e.g.
       assignment to a union member has the potential for changing the effective type)
    2) add new bullet to ANSI aliasing rules about "members of a visible union type"
  Meyers
    another suggestion: restrict taking address of union member
  Nelson
    we might draft wording that touches
    1) 6.5p7 if you have 2 types which are members of a visible union type
    2) 6.5p6 effective type definition
    3) 6.2.6.1p7 when a value is stored into union member, other members become indeterminate
    4) 6.5.2.3p5
  it was also suggested that we should put examples that we want to work and not work in the TR
DR 219 (related to DR 236)
  we think that the resolution of this DR will make no practical difference to compiler writers
  Meyers suggested saying that if the memory being copied is a partial object then the effective type is array of char, if it is a full object that determines the effective type
  [AI] Meyers: write paper detailing this approach (for next meeting) and argue why we should dismiss the suggested approach
  leave in "open" state

14. Additional character types TR discussion review
Nobu
  how/when do we decide encoding (2.1, 2.4, 2.5)?
  we're now focused on 3 possible solutions (we've rejected UTF8 and UTF32)
  in what direction should we go?
  what shall we say to people who want to write encoding-dependent programs?
  100K people are writing that kind of code at SAP
Benito
  to proceed, we need to register document, then circulate, then move it to PDTR
Simonsen
  we're in early stages here, we cannot go to PDTR too quickly
Benito
  Wakker's embedded C proposal took several years to move to PDTR
Plauger
  is this a normative or informative change?
Simonsen
  want more generic approach that would also accommodate UTF-16
Nobu
  there could be a more generic approach using macros
    charU      => 1) char   2) wchar_t   3) char16_t
    strcmpU()  => strcmp()
    cU("")     => 1) ""   2) L""   3) u""
Simonsen
  if we want to proceed along these lines we need a written proposal
Benito
  [AI] Nobu: put type generic approach description on WG14 website (once he cleans the code) and forward to Benito to put in mailing
Simonsen
  WG20 document 15434 I18N API
  generic API (not bound to Unicode)
  reentrant (thread safe), locale, codeset, repertoire, string
  supports new locale specification (14652), sort of a Posix enhancement
  partially implemented in gnu compiler
Nobu
  ICU API is already widely used
Simonsen
  ICU is bound to Unicode
  15434 is also widely used (in gnu compiler)
Nobu
  15434 has missing pieces (strcmp, etc.)
  the targets of the 2 projects are different
  ICU is already implemented
Simonsen
  15434 is also implemented (in gnu compiler)
  do we want to be generic or Unicode-bound?
  assume we want to be generic (EBCDIC, etc.)
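
The macro mapping Nobu describes above (charU, strcmpU, cU) might look roughly like the following in C. This is only a sketch under assumptions: the selector macro CHARU_KIND and the helper strcmp16 are hypothetical names invented here, and char16_t with the u"" prefix is taken from the later C11 <uchar.h> facilities rather than from anything agreed at this meeting.

```c
#include <assert.h>
#include <string.h>   /* strcmp */
#include <wchar.h>    /* wchar_t, wcscmp */
#include <uchar.h>    /* char16_t (C11) */

/* CHARU_KIND is a hypothetical selector: 1 = char, 2 = wchar_t, 3 = char16_t. */
#ifndef CHARU_KIND
#define CHARU_KIND 3
#endif

#if CHARU_KIND == 1
typedef char charU;
#define cU(s) s          /* cU("x") -> "x"  */
#define strcmpU strcmp
#elif CHARU_KIND == 2
typedef wchar_t charU;
#define cU(s) L##s       /* cU("x") -> L"x" */
#define strcmpU wcscmp
#else
typedef char16_t charU;
#define cU(s) u##s       /* cU("x") -> u"x" */
/* Hypothetical helper: the C library has no strcmp for char16_t,
   so compare 16-bit code units one by one. */
static int strcmp16(const char16_t *a, const char16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}
#define strcmpU strcmp16
#endif

/* The same source line compiles for all three selections. */
int same_text(const charU *a, const charU *b)
{
    return strcmpU(a, b) == 0;
}

/* Small usage check that is independent of the selected character type. */
void charU_demo(void)
{
    assert(same_text(cU("hello"), cU("hello")));
    assert(!same_text(cU("hello"), cU("help")));
}
```

Compiling with -DCHARU_KIND=1 or -DCHARU_KIND=2 selects the char or wchar_t mapping without touching the calling code, which is the point of the generic approach.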
Nobu
  generic approach is nicer
  we should continue discussion
Keaton
  we have clear committee position that we do not want to be character set specific (in the committee DR response to allow EBCDIC)
Plum
  disagrees with assertion that somebody has asked for a Unicode-only environment
  the request will not affect char or wchar_t
  they are asking for a new name that would guarantee UTF-16 type
  if people want more generalized 16 bit type that's fine, the proponents of those encodings should come to us
Nelson
  we have already hard-wired support for ISO 10646 into C
  suggestion: we can soft-wire support for UTF-16 by defining a macro that says whether or not the extensions are available; implementors would NOT be required to have them
Plum
  this is just a technical report, not a part of the Standard
  as long as the source code can tell at compile-time whether or not UTF-16 is supported, users could be satisfied
Keaton
  likes soft-wired approach
Plum
  there are complaints about wchar_t because there are no guarantees about either its size or encoding, wchar16_t guarantees at least the size
  supports use of upper and lower case u's to decorate strings
Meyers
  prefers that typedef not say "utf" in it
  likes u (both upper and lower), we can think of it as "Universal" and the Unicode community can think it's "Unicode"
Keaton
  we could use string creation macros and just blow off the prefix
Plum
  what about adjacent string concatenation?
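
The "soft-wired" idea above, where source code tests a macro at compile time to learn whether the UTF-16 extensions exist, might be sketched as follows. The minutes do not name the macro; this sketch assumes the __STDC_UTF_16__ macro and u"" prefix that C11 later defined, and UTF16_EXT_AVAILABLE and host_name_units are hypothetical names used only for illustration. It also shows two adjacent u"" literals being concatenated, which is how C11 eventually answered Plum's question.

```c
#include <stddef.h>

/* Assumption: the implementation predefines __STDC_UTF_16__ (C11) when
   char16_t values are UTF-16 encoded. UTF16_EXT_AVAILABLE is a
   hypothetical convenience macro, not part of any standard. */
#if defined(__STDC_UTF_16__)
#include <uchar.h>
#define UTF16_EXT_AVAILABLE 1
/* Adjacent literals with the same prefix are concatenated into one array. */
static const char16_t host_name[] = u"Cura" u"\u00E7ao";
#else
#define UTF16_EXT_AVAILABLE 0
#endif

/* Number of 16-bit code units in the name, excluding the terminator,
   or 0 when the extension is not available. */
size_t host_name_units(void)
{
#if UTF16_EXT_AVAILABLE
    return sizeof host_name / sizeof host_name[0] - 1;
#else
    return 0;
#endif
}
```

Because the test happens in the preprocessor, an implementation that does not provide the extension still compiles the program and simply takes the fallback branch.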
Mak
  suggests "v" for variable-width encoding
  still has reservations about 32-bit, all we need is 16-bit
Muller
  prefer separate macros for char16 and UTF (guaranteed to be UTF)
Simonsen
  wants more generic literals
  Norwegian comments (submitted to convener and will be sent with ballot)
  1) good idea to support UTF-16 in C standard
  2) other encodings should be supported in portable, generic way
  3) cultural conventions registry should be included
  4) should be possible to create character constants in different encodings
  5) should be possible to determine whether or not something is really a constant or something intended for translation
  6) WG20 should be consulted
  Norwegian comments mailed to WG14 reflector
Nelson
  comments 3 and 4 suggest that there should be syntax for specifying meta information about strings
Simonsen
  Norwegian comments do not suggest that implementations need to support lots of encodings
Mak
  would implementations be required to support more than one encoding in the same compilation?
Simonsen
  supporting more than one kind of string at the same time would be useful
Plum
  by adding string prefix, we would make clear how other encodings could also be added as extensions
  C++ library group has been playing with how to extend string literals so that they could be user-defined encodings
  in Nobu's mid-preprocessor, it is important to know the context in which the string appears; if initializer, for instance, it must create static array with hex constants
Nobu
  we MUST support UTF-16!
  SAP and C# people are demanding it
Tydeman
  could use u followed by number and then pragmas to switch between 16-bit encodings
Meyers
  it would be rare for people to need more than one encoding in the same compilation unit
  people don't complain about this now, codes that require different character sets are simply compiled separately
  not a big deal to support more than one encoding at the same time, it could be done with techniques similar to the locale-changing techniques we have now (setlocale)
  we could use #pragmas to establish "current" locale
Plum
  people who are asking for this TR were hoping that it could be done quickly, Norway 2-5 suggest much more complex work
  we should consider possibility that if we add char32_t that it is defined to be ISO 10646 encoding
  people cannot use wchar_t for that purpose today because it isn't guaranteed to be 16 bits (and isn't on Microsoft)
Simonsen
  Norway simply wants better generic support
  should that be a separate TR?
Meyers
  making this char set independent would not necessarily be that much of a burden
Benito
  if we decide we want to do another TR to address the API issues that would need to wait (we're small group)
  the way this new work item is written, we could write a very small TR to handle char issues and then a much larger TR to handle the API issues
Meyers
  if we set good direction, implementations need not wait for the standardization process
Simonsen
  Norway doesn't feel strongly about point 6, taking this to WG20
Nobu
  I18N API was rejected elsewhere and now it's simply been shifted to WG14
Wakker
  UTF-16 is the primary issue
  would like to see more flexible mechanism, that's a more difficult task and will take more time
  WG14 is not appropriate place to do I18N API work
Kristoffersen
  favors a more generic approach
  UTF-16 is not stable
Mak
  we should concentrate on data type and string literals, not API
Plum
  I thought our charter says that we're not going to define APIs but we will look at and point to and describe APIs
Benito
  our charter is not worded that way
Simonsen
  Norway would be fine with quick TR to get UTF-16, then longer more complex TR to get generic support
Meyers
  don't believe that it would be difficult to write the 16-bit support in way that would make Unicode happy
small group was put together to review revised TR
  Plum, Wakker, Plauger, Mak, Meyers, Simonsen, Muller, Nobu
Simonsen
  let's task small group to respond to Norwegian comments (disposition of comments document)
Wakker
  there were also comments from Japan, when will those be addressed?
Benito
  we are waiting for reply/clarification from Nota-san

15.1 Future Meetings
  see 1.9 above
15.1.1 Future Meeting Schedule
15.1.2 Future Agenda Items
  generate press release (embedded TR, UTF-16 stuff, etc.)
  Simonsen and Benito volunteered to draft
  Keaton and Wakker volunteered to review
  more volunteers would be welcome
15.1.3 Future Mailings
  post Curacao 5/17
  pre Santa Cruz 9/20
  no paper mailings

16.2 Resolutions
16.2.1 Review of Decisions Reached
  All straw votes are marked [straw vote].
16.2.2 Formal Vote on Resolutions
  There was only one formal vote. The discussion and details are in section 12.
  Resolution: to instruct the WG14 convener to submit document WDTR 18037 as amended for SC22 PDTR ballot as quickly as possible after this meeting.
  WG14 (7 present)
    in favor 6   opposed 1   abstain 0
16.2.3 Review of Action Items
  all action items are marked [AI]
  all items that we should consider in a future revision of the Standard are marked [rev]
16.2.4 Thanks to Host
16.3 Other Business
  we should republish when we issue a new Standard (and update the Standard number)

17. Adjournment

___________________________________________________________________________

2002-04-16 Embedded C minutes (day 1 of 3)

Attendees:
  Walter Banks, Bytecraft
  Allan Frederiksen, Nokia
  Francis Glassborow, Independent
  Barry Hedquist, Perennial
  David Keaton, Independent
  Jan Kristoffersen, Ramtex
  P.J.
  Plauger, Dinkumware
  Tana Plauger, Dinkumware
  Tom Plum, Plum Hall
  Willem Wakker, ACE

Topics to cover:
  Status of Document
  Fixed-Point
  Hardware I/O
  Address Spaces
  Annexes
  What to do with document after this meeting.

Key: *A* means Action item; *D* means Decision.

Status of Document
  Changes from Redmond are highlighted.

Document Issues (fixed-point -- Willem):
  A number of editor's notes need to be addressed.
  Comments from Fred Tydeman (to be mentioned later).

  Introduction:
    Jan: Are all paragraphs necessary?
    *D* General agreement is to leave them in.

  2.1.3 Rounding
    Strike editor's note. Changes will be allowed after the coming ballot.

  2.1.4 Type Conversion/Usual Arithmetic Conversions
    Should the text go here or to the rationale?
    Willem feels its difference from usual C is important enough to put it here.
    Barry: TR rules are more flexible than rules for a standard w.r.t. main text vs. rationale.
    *D* General agreement to keep text here.
    Changed names of types (fixed --> _Fixed).
    *D* General agreement to keep this.

  2.1.9 Formatted I/O
    What should the conversion specifiers be?
    R,r,Q,q were proposed to avoid three-letter specifiers. This makes format specifiers different from constant specifiers.
    Use of u in format specifiers could break existing programs that use %u, so it would have to be %qu and not %uq.
    *D* General agreement to use R,r,Q,q instead.
    Current proposal uses f style conversions as opposed to e or g.
    *D* General agreement that this is OK.

  2.1.10 viewing a value as a different type
    7.18.5.4 page 32
    What guaranteed-large-enough integer type should be used?
    Bill Plauger: We should just have typedefs for integers of the right sizes. To print them, just cast them to intmax_t.
    David: intfract_t, intaccum_t, etc.
    *A* Willem will come up with a proposal.
    Does anyone need fixed-point to wide string conversion? Probably not.

  2.2 Page 19 parallels specification of floating-point types.
    Should we try to accommodate anything other than 2's complement types?
    Nobody could think of a non-2's complement fixed-point machine.
    David: Fixed-point machines are recent, and recent machines are 2's complement.
    *D* General agreement to leave it 2's complement and let people complain if their ox is gored.

  First note on page 22 is to editor of C standard if this TR is eventually adopted into the standard.
  Second note -- pointer types? Check C standard to make sure this is right.

  Page 23 note on type qualifiers.
    This is the point to introduce saturated integers if we want them.
    *A* The group agreed in Redmond not to have saturated integers, so Willem will remove this note.

  Page 36 conversions.
    Recommended practice in C std. 7.21.3. Willem suggests striking editor's note.
    Return value of conversion functions on overflow?
    *D* General agreement to use saturated result.

Document Issues (I/O HW -- Jan):
  John Hauser proposed a new method via e-mail. He was not able to attend this meeting.
  Jan presented John's proposal.
  New concept: base & index instead of addresses.
  John's proposal requires a memory instantiation for the access type because it passes pointers. 10-15% slower than macros.
  Speed was the reason for the I/O HW effort. Therefore, Jan suggests we not use John's proposal. This would also avoid delaying the document.
  Some processors cannot even use a memory instantiation because the I/O address is encoded in the instruction.
  Jan likes some aspects of John's proposal and suggests we encourage John to put it in a more final form for next meeting, addressing Jan's issues, for possible substitution instead of the existing wording in the annex. Meanwhile, work will proceed on the document as it is, without delaying it.

  Annex C, p. 64
    Atomic operation -- not documented in normative part. Needs to be.
    *A* Jan will add normative words for atomic operation. (Done)

Document Issues (Address Spaces -- Walter):
  Status: Walter's new document is not folded into the TR yet.
  There are various feelings about whether compatibility with existing C++ is required.
  Should #pragmas be used or should new keywords be introduced?
  *A* Walter will send out the latest draft of his proposal for the group's review. (Done)
  Walter: Address space extensions could save hardware. For example, currently extended address space on an 8051 requires external hardware. Compiler address space extensions including call, return, etc. could eliminate that hardware.
  There is little need to have operations other than read and write for data objects.
  The I/O HW proposal could be used to access regions of strange types of memory, but would be missing the ability to use the compiler's symbol table management, type system, alignment, etc.
  The requirements for adapting C++ to handle address spaces have not yet been examined.
  Worst case seen so far: Walter is working on a C compiler for a new processor that has 7 address spaces.
  *A* Willem, Jan, and Walter will flesh out a more detailed proposal for later this week. (Done)

Annexes
  *A* David will continue the old action item of documenting the proposal that was rejected at Redmond for the rationale. Preferred by May 1, when Willem goes on vacation for 3.5 weeks.
  Fred Tydeman suggested rationale wording indicating that we considered BCD and decided not to do it.
  *A* Walter will write this up.
  *A* Tom Plum will talk to John Hauser about collaborating on a C++ compatibility header for annex G.
  *D* For the time being, annex G will be left as is in anticipation of further elaboration.

2002-04-17 Embedded C minutes (day 2 of 3)

Willem distributed a revised TR with Walter's address space updates included. We collectively went through the result.

There was some debate over the inclusion of "register" memory spaces in the TR. Some people had seen several cases where this would be needed; others wanted it moved to the annex. Walter will consider this further.

Walter: Should we suggest naming conventions for memory spaces?
Something to think about.

Should #pragma be used or should another way be found?
*D* General agreement that #pragma should be used as currently in the document.

Allan noted that we agreed in Copenhagen that -1*-1 could yield FRACT_MAX, but this does not yet appear in section 2. Division of FRACT_MIN or ACCUM_MIN by -1 has the same problem.
*A* Willem will add this.

Willem asked an independent observer to review the document. One point that resulted was that sometimes one might want to multiply 100*0.5r and get an integer 50. The other was that it makes little sense to add and subtract integers with fracts.

Willem hopes to send out the next document revision before May 1 to make it in time for the next plenary ballot.

2002-04-18 Embedded C minutes (day 3 of 3)

Annex G, C++ compatibility
  Bill: The example can be done with templates without expanding all 1200 possibilities.

---------------------------------------------------------------------------

Minutes for J11/U.S. TAG Meeting
April 17, 2002
Randy Meyers, presiding

1) Debbie Donovan at INCITS would like the names of people who will work on the new TR for characters (Walls)
  Benito
    there's no advantage in US doing that
    let's just say US TAG and move on
  Walls
    if we had a lead individual that might be enough
  Meyers
    volunteers to lead

2) liaison request (Benito)
  Farance would like to see J11 have a formal liaison with INCITS M1 (biometrics), he would like to be that liaison
  [AI] Meyers: will contact Farance and appoint him as our informal liaison

3) Farance request for additional type (Benito)
  is there any interest in WG14 producing a TR on IEEE 1596.5? (Shared Data Formats for Optimized Scalable Coherent Interface)
  The rationale for wanting this work to start was sent by Benito after the meeting:
    "This work gives a user standardized datatypes (actually, typedefs) that address byte ordering (little, big), alignment (aligned or not), and representation (e.g., 2s complement, unsigned).
    Some of these types are
      LittleSignedByte (8-bits)
      AlignedLittleSignedDoublet (little-endian, 16-bits, aligned)
      AlignedLittleSignedQuadlet (little-endian, 32-bits, aligned)
    The reason for considering these kind of typedefs is: (1) they permit the precise layout, endianness, alignment, and representation on the common arithmetic types, (2) this certainty of layout is important for network-centric programming, processing portable file formats, and shared-memory applications."
  Keaton
    if that group was in ISO we would eventually be required to bind to them in some way
  Meyers
    is there a vendor that is actually going to do this?
  no support for adding this at this time

4) Adjournment 4:34 PM