SC22/WG20 N949

From: Kenneth Whistler [kenw@sybase.com]
Sent: Wednesday, May 29, 2002 7:33 PM
Subject: (SC22WG20.3801) What are standards for? What standards need developing?

Pat Hall ruminated:

> I have been watching the discussions on this list, aiming to get some feeling for the group and the differences in opinion that guide WG20 decisions. I have seen a lot of very strongly expressed views where the strength of expression seemed unrelated to the particular point being debated, and was left wondering whether there was not some form of inter-cultural communication problem here.
>
> On 25 Apr Ken Whistler seemed to put his finger on it, in a response to Keld Simonsen:
>
> << ... This is a clash of standards culture, between people and organizations that have different ideas about how things should proceed and who should do what kind of work. ... >>
>
> But in the debate that followed I really could not get any understanding of the different standards cultures that Ken had in mind. There was a fierce exchange about whether the US and larger companies were imposing their interests, and whether national delegations were being unduly influenced in their voting. But what were the principles at stake underlying the exchange?

The difference in standards culture that I had in mind boils down more or less to the following:

1. The U.S. participants tend to view standards as voluntary specifications that facilitate commerce by setting agreements that allow products to work with each other.

2. The "European" participants tend to view standards as regulatory devices to further social agendas, and in the case of internationalization, as mechanisms for the preservation of cultural identity.

The European participants (and here I include Canada as an honorary European ;-) ) tend to look askance at the American orientation, viewing it as an excuse for free-marketeering globalization and domination by the IT megacompanies associated primarily with the U.S.

The American participants tend to look askance at the European orientation, viewing it as an excuse for regulatory trade restrictions and for the substitution of bureaucratic requirements for good IT design driven by customer feedback and product success in the market.

In my opinion, these differences in point of view underlie much of the intractable conflict regarding the particular work of WG20.

> I have been reading TR 11017:1997 which I am told was intended to frame the work of WG20, and also reading TR 10176:2001 which supposedly incorporated advice concerning internationalisation to guide the programming language committees of SC22. TR 11017 in particular was disappointing for the errors around linguistics and writing systems, though these are not important here -- the key thing for me was Section 7, "Model and Services required for Internationalisation", summarised in Figure 11. I was looking for such a summary of the interfaces across which interoperation would take place and which could be the subject of standardisation. I was also looking for groupings of these that might guide us as to the SC within ISO in which particular standards might properly be pursued. I regret to say that the model presented was not much help, but maybe it could be made to be so.

Here I am not at all surprised that TR 11017 fails as a guide to what needs standardization in the area of internationalization.
The problem is that it is a very general conceptual model of the problems of internationalization, but it is not grounded in the actual architecture of software systems. Standardization might make sense if you can clearly identify interfaces across which systems need to interoperate. Internet protocols are excellent examples, since you have distributed processing that needs well-defined protocols to work across the net. But internationalization issues tend not to surface in such well-defined, protocol-like interfaces, and it isn't always clear that they should. One of the deep defects of the failed draft for an internationalization API standard (15435) was that it didn't see this complexity, and instead attempted to view "internationalization" as a monolithic functionality for which a (single) API could be designed and standardized.

> In thinking about such an architectural model, we must also bear in mind that this assumes some related industrial model. The pieces of technology on each side of the interface can be created independently. There has been a continual debate between, maybe even battle between, the European Union and the major IT multinationals, to get internal interfaces properly specified so that companies in Europe -- and in the United States! --

This isn't merely a multinationals-versus-Europe thing, although it is characteristic of the differences in standardization culture (and general culture) that it should be construed as such.

> could supply software to work to these interfaces. Many multinationals have been really good about doing this. But is it enough that these interfaces should be made public; should they not also be stabilised in some way, perhaps through standardisation? The discussion around ICU gave us an example of this.

Yes -- but this position in itself is highly controversial. The UTC take on this issue is that in the area of internationalization the appropriate topics for standardization are character encoding per se, data (associated with characters and with other things -- including all kinds of cultural elements), and algorithms. But when it comes to APIs, standardization inevitably tends to favor one platform over another, or one architecture over another -- and there is little agreement or consensus that can be used to sort out the alternatives.

This is why, for example, the UTC participants were in favor of standardizing a string ordering algorithm and a big, tailorable template table, but were adamantly opposed to including an API for collation in the same standard. The algorithm and the table are non-trivial, and it behooves everyone to depend on the standard and not reinvent things that will just lead to nasty interoperability problems. The API for collation, on the other hand, is fairly trivial, and it doesn't matter much whether there are one, two, or six of them -- as long as they instantiate the same algorithm and data tables.

> Thus the voice of industrial concerns is very important, though the will of large organisations must not dominate. But the history of standardisation reassures us that this does not happen -- think of IBM and EBCDIC, Microsoft versus Sun with Java.

It isn't clear to me exactly what point you are making with these examples.

> However we must take note of the unheard voices of small companies, though these might find voice through regional and national governments.

Why? Why not through industrial consortia?
Many small companies vote with their feet and join consortia precisely because they need technically appropriate standards in a timely manner, and because the international standards process, dominated by sometimes disconnected and unresponsive national standards organizations, is too risky, too slow, and sometimes leads to technically inferior standards.

> But there are also the unheard voices of small communities referred to in the recent discussion; my own concern here is mostly with respect to the creation of appropriate writing systems, but that is the business of SC2.

Actually, the business of SC2 is the standardization of characters and of properties related to characters. SC2 is not concerned with the standardization of writing systems. But I agree that those issues are not directly related to WG20's charter.

> Where does this discussion take us? Well, firstly, maybe we do need to discuss the principles underpinning our standardisation activities. Surely this must have been debated many times, and even have been well documented, but that does not mean that we should not ourselves discuss it and reach a position of shared understanding, maybe even agreement. How about it, in Tromso?

I'm all for having a discussion about basic principles. However, I am not sanguine about the likelihood of agreement on those principles. The gulf in goals is quite vast here. Recall that some of us are arguing that WG20 itself should just be shut down altogether, since by and large it is not producing useful standards, and what useful work it is engaged in could be done better elsewhere. Others are arguing that WG20's charter should be expanded, to elaborate an enormous vision of interlocking internationalization standards.

> Having agreed what standardisation is about, could we then proceed to pick up the many issues raised in the email discussion?
>
> Is there a need for a standard for internationalisation APIs for the collection of services associated with locales?

No. At least not at this time.

> This is the domain of 14652, currently subject to vote, and the now rejected 15435 - APIs for Internationalization. In the recent correspondence the ICU was suggested as meeting this need, and Keld even posed the question whether this should be standardised, to which Glen Seed gave the sensible caution that it would be premature. Of course it would not be the ICU that is standardised, but the set of interfaces.

To which you must counterbalance the fact that the architects and maintainers of ICU have no desire whatsoever that their API be standardized by ISO.

> To me this seems highly desirable, on the basis that this would enable other software suppliers to produce conformant software for locales that ICU has no interest in serving.

But there is no point, whatsoever, in going through an ISO standardization process to accomplish this. The locale definitions for ICU are clearly documented. The data is available; the utilities to manipulate it are Open Source, as is the library itself. Anyone can take a particular version of the library, add their own locales, and ship the result in their product. This is on top of the fact that ICU already has more built-in locale support for such functionality as date-time string formatting and the like than anything else available. What service would be provided by attempting to turn any aspect of this into an ISO standard, which by its very nature could not track the development and versioning of the ICU library itself?
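To make "already built in" concrete, here is a minimal sketch of the kind of functionality I mean, assuming a reasonably recent ICU4C installation (the Danish locale is an arbitrary illustrative choice). The point is that the API surface is thin; the value lies in the locale data and the algorithms behind it, which is exactly the part that benefits from being shared and versioned together.

    // Illustrative sketch, assuming a reasonably recent ICU4C release.
    // Build against ICU and link with -licuuc -licui18n.
    #include <unicode/calendar.h>
    #include <unicode/datefmt.h>
    #include <unicode/locid.h>
    #include <unicode/unistr.h>
    #include <iostream>
    #include <memory>
    #include <string>

    int main() {
        // Any locale ICU ships data for will do; Danish is just an example.
        icu::Locale locale("da", "DK");

        // The date-time formatting conventions come from ICU's bundled locale data.
        std::unique_ptr<icu::DateFormat> fmt(
            icu::DateFormat::createDateTimeInstance(
                icu::DateFormat::kLong, icu::DateFormat::kShort, locale));
        if (!fmt) return 1;

        // Format the current date and time according to that locale's conventions.
        icu::UnicodeString formatted;
        fmt->format(icu::Calendar::getNow(), formatted);

        std::string utf8;
        formatted.toUTF8String(utf8);
        std::cout << utf8 << std::endl;
        return 0;
    }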
> However I do have some concerns with ICU and its intimate association with Java, C and C++. These are the same concerns that made me conflate 14652 and 15435, much to Keld's surprise. You see, I would expect to see interfaces defined at two levels of abstraction: a high level of abstraction, independent of any programming language, that spells out the essence of the interface, and the binding of the abstract services or facilities to particular programming languages.

This was, in fact, Keld's original plan for 15435, but it failed in the execution. And it is my opinion that such interface design *cannot* -- except for the most trivial of functionality -- straddle the full range of programming language distinctions. A procedural interface design is just a different animal than an object-oriented interface design. There are ways, of course, to wrap an object-oriented interface in a procedural API layer and vice versa, but you are basically making design compromises in both directions when you do so.

The way I would approach such a problem in a real production context would be to develop an engineering requirements document that specified the required functionality at an abstract level, and then separate functional specifications that would spell out the details of the API design for procedural or object-oriented designs implementing the engineering requirements. But I don't think that methodology transfers very well to the arena of standards -- which just don't have the same flexibility for iterative development guided by actual software implementation.

> Should 14651 on string ordering be moved to SC2?

Yes.

> String ordering is clearly contingent upon the community of use, the culture or locale, and could even vary within a community.

Which is why tailoring is provided for in the algorithm. That isn't an argument as to which WG in ISO is appropriate for dealing with the maintenance of the 14651 table.

> Mostly we come across it in dictionaries and directories; it is part of the presentation of information to people, and does on that basis seem to be a matter for user interfaces and SC35.

Irrelevant. All manner of information is presented to people via computer user interfaces. But that is no argument that SC35 has any competence in the ordering of characters in Tibetan or Korean.

> One small aspect of string ordering arises when there are alternative orthographies and we are concerned with string equivalence -- string equivalence is clearly a concern for SC2, and as such ordering seems to be associated with SC2's work, though my own reaction there is to suggest that SC2 should solve the problem of alternative orthographies, not to remove them, but to characterise them independently of ordering.

It isn't exactly clear to me what you are talking about here. Do you mean string equivalence between orthographies, as for example comparing a string expressed in Cyrillic with the "same" string transliterated into a Latin romanization? Or Mongolian in the Uighur script versus Mongolian in the Cyrillic script? And so on? If so, those issues are clearly *outside* the scope of 14651 International String Ordering, anyway. And SC2/WG2 has expressed no interest whatsoever in getting involved in transliteration standards. (Or standardization of orthographies on their own terms, for that matter.)

> I personally think that the approach being taken in the W3C Character Model to pursue a single canonical form could be mistaken.
That is a string normalization issue that is completely orthogonal to considerations of string ordering.

> So there could be strong reasons for NOT moving 14651 to SC2.

I have yet to hear any coherent ones.

> Ordering can also be used within the internal mechanics of computing systems, but I would view that as an accidental rather than essential characteristic of orderings. So where is the 'proper' home for 14651?

SC2/WG2.

> We need some further principles to guide us in this.

The U.S. and Ireland have both provided arguments for why the script expertise needed for understanding the ordering of additions to the 14651 template table is located in SC2/WG2 -- not in WG20. Nobody wants to change the *architecture* of string ordering in 14651. The maintenance consists of additions to the table. Who in WG20 understands how to order Limbu or Tagbanwa or Batak? The connection to the communities and experts who understand those things is through WG2, which has considered the proposals for encoding and which solicits information about the ordering of scripts as part of the encoding process.

> I look forward with enthusiasm to our discussion in Tromso.

As do I.

Regards,

--Ken