SC22/WG20 N958

Problem statement for expressing DIN 5007 in 14651

Status: Expert contribution
Author: Marc Wilhelm Küster
Date: 2002-06-11
Action: For discussion

Umlaut and trema

DIN 5007, the long-established German standard for ordering, as well as current practice in German libraries, distinguishes between two diacritics of very similar (in fact today mostly identical) appearance:

- the umlaut
- the trema

Both diacritics are encoded in the UCS as U+0308, the COMBINING DIAERESIS. However, a number of traditional library coding schemes and long-established software such as the *Tübingen System für TextVerarbeitungs Programme (TUSTEP)* distinguish the two in their encoding.

The two diacritics have very different roots, and traditional German typography set them apart visually through the relative distance of the two dots and sometimes through their diameter. That distinction has, however, largely disappeared with the advent of PostScript and is now almost obsolete.

Traditional ordering

This analysis may sound like a plea for the encoding of two separate diacritics. This is not the case. A disunification of umlaut and trema would, for a variety of reasons, be undesirable.

However, the current unification poses a problem for German ordering, as DIN 5007 treats letters with an umlaut differently from letters with a trema. In the ordering of entities that are not names, letters with an umlaut come directly after the respective base letter, whereas letters with a trema follow after many of the remaining diacritics. Hence, you get the sequence

a ä á

if ä is an a with umlaut, but

a á ä

if ä is an a with trema. This distinction is mandatory in DIN 5007.

Both versions of ä would, from the point of view of the UCS, be encoded identically, namely as U+00E4 or as U+0061 + U+0308. Tailoring in 14651 can only operate at the level of individual characters and character/diacritic combinations.
For this reason, there is no way to express DIN 5007 as a profile of 14651. This is unfortunate and causes problems in German libraries, especially in large research libraries. Analogous problems exist for the ordering of names.

Desired guidance

The author would like guidance from WG20 on how to handle this problem within the 14651 framework. Such guidance could take the form of either:

* a note in 14651 stating the best practice, or
* some other WG20 best practice document that can be readily referenced, or
* the resolution that WG20 has no views on this matter and leaves it entirely up to the national ordering standards to make provisions for this and similar cases.

A technical recommendation could work along the following lines. In order to maintain the difference between a letter with umlaut and the same letter with trema:

* Mark up the distinction between the two diacritics through a higher-level protocol if this distinction is deemed necessary in a particular context
* Decompose the string, at least with regard to the ambiguous letters
* Map the combination of markup + combining diaeresis to a character in the private use area and treat that character as a combining diacritic for ordering purposes
* Tailor the template on that assumption.

The author is open to any other suggestions. Ideally, such suggestions should be generic enough to be applicable in comparable cases that may arise with regard to other cultural practices.
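
The four steps of the technical recommendation above can be illustrated with a minimal Python sketch. It is not a 14651 implementation and makes several assumptions that are not part of this contribution: the higher-level protocol is a hypothetical `<trema>…</trema>` tag, the private use character is arbitrarily chosen as U+E000, the function names `to_internal` and `sort_key` are invented for illustration, and the "tailored template" is reduced to a toy two-level key with made-up weights for three diacritics only.

```python
import re
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # read as umlaut unless marked up otherwise
COMBINING_ACUTE = "\u0301"
PUA_TREMA = "\ue000"            # hypothetical private use code point for the trema

# Toy second-level weights (assumption): a real profile would tailor the
# 14651 template table instead. 0 is reserved for "no diacritic".
DIACRITIC_WEIGHT = {COMBINING_DIAERESIS: 1, COMBINING_ACUTE: 2, PUA_TREMA: 3}

# Step 1: hypothetical higher-level markup that flags a diaeresis as a trema.
TREMA_MARKUP = re.compile(r"<trema>(.*?)</trema>")


def to_internal(marked_up: str) -> str:
    """Steps 2 and 3: decompose, then map marked-up diaeresis to the PUA character."""
    def mark(match: re.Match) -> str:
        decomposed = unicodedata.normalize("NFD", match.group(1))
        return decomposed.replace(COMBINING_DIAERESIS, PUA_TREMA)

    with_pua = TREMA_MARKUP.sub(mark, marked_up)
    # Decompose everything else, so unmarked U+00E4 becomes U+0061 + U+0308.
    return unicodedata.normalize("NFD", with_pua)


def sort_key(marked_up: str):
    """Step 4 (toy tailoring): base letters first, diacritic weights second."""
    primary, secondary = [], []
    for ch in to_internal(marked_up):
        if ch in DIACRITIC_WEIGHT:
            if secondary:  # attach the weight to the preceding base letter
                secondary[-1] = DIACRITIC_WEIGHT[ch]
        else:
            primary.append(ch.lower())
            secondary.append(0)
    return (primary, secondary)


# Both UCS spellings of the letter are canonically equivalent, which is why
# the markup is needed in the first place.
print(unicodedata.normalize("NFC", "a\u0308") == "\u00e4")

# Order: a, a with umlaut, a with acute, a with trema.
words = ["á", "<trema>ä</trema>", "a", "ä"]
print(sorted(words, key=sort_key))
```

With these toy weights, the unmarked ä sorts directly after a, while the marked-up ä sorts after á, which mirrors the two sequences described under "Traditional ordering". Whether a private use character is an acceptable vehicle for this distinction is exactly the question on which guidance is sought.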