Re: Regarding the swedish locale and sorting rules

Author: Christian Rose (menthos_at_gnu.org)
Date: 2003-07-14 11:26:37

On Sun, 2003-07-13 at 23.34, Petter Reinholdtsen wrote:
> Peter Karlsson suggested I ask on this mailing list about references
> and comments on the swedish locale.  I work with the locale content of
> glibc, trying to keep track of change requests and make sure the
> locales are as good as possible.  I have three issues I want to have
> your opinion on:
> 
>  - swedish locale specification
>  - clock format
>  - correct sorting order for V v W w
> 
> swedish locale specification
> ----------------------------
> 
> I've found the following specification, which appears to be
> authoritative for the swedish locale.  Is the content correct?  Is the
> organisation behind it a well known organisation for such things in
> Sweden?
> 
> <URL:http://std.dkuug.dk/cultreg/registrations/narrative/sv_SE,_1.0.html>

This is actually the first time I've heard of ITS; I only knew about SIS
before. But they (ITS) seem pretty official to me, after reading their
web page and about their relationship with SIS and SSR.
The document cited above seems to be (mostly) correct, but some of its
content is questionable, as discussed below.


> clock format
> ------------
> A Debian user has requested a change in the swedish locale in
> glibc (sv_SE), changing time values from '13.49' to '13:49'.  Are you
> aware of this request?  Check out <URL:http://bugs.debian.org/111268>
> for the discussion and
> <URL:http://sources.redhat.com/ml/libc-alpha/2003-05/msg00155.html>
> for the patch.

In fact, both the "13.49" and "13:49" formats are common in Swedish
writing, and used interchangeably.

The "13.49" format is recommended by "Svenska skrivregler" (ISBN
9121112800, 1999), an authoritative book about rules for the writing of
the Swedish language. It not only recommends this format, it also
specifically recommends against the colon format: the colon is not
needed to distinguish a time from a decimal number, as it is in
English, since the comma is the decimal separator in Sweden.
This is also the format you'll commonly find in timetables for trains,
television etc.

The "13:49" format is an adoption of the European Standard EN 28601,
which in turn is an adoption of the international ISO 8601 standard.
The Swedish counterpart of these standards is called SS-EN 28601, and as
a consequence of the adoption of the original international standard it
mandates the use of the colon format for time.

So there's a direct conflict between these recommendations. My personal
opinion, and this is probably at least part of why the current Swedish
glibc locale also uses the period format, is that since the locale is
there mainly for user presentation and formatting, the main compelling
reason for using the colon format, data interchangeability, is less
relevant. Also, in my perception, the period format, being used in
official timetables and the like, carries more official weight.
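
To make the two candidate formats concrete, here is a small Python
sketch. The helper names are made up for illustration, and the format
strings are hard-coded rather than read from the sv_SE locale, so this
only shows what each convention produces, not what glibc does.

```python
from datetime import time


def format_period(t):
    # Period format, as recommended by "Svenska skrivregler".
    return t.strftime("%H.%M")


def format_colon(t):
    # Colon format, as in ISO 8601 / SS-EN 28601.
    return t.strftime("%H:%M")


def format_decimal_sv(x):
    # Swedish decimal notation uses a comma, so it cannot be
    # confused with the period time format.
    return f"{x:.2f}".replace(".", ",")


t = time(13, 49)
print(format_period(t))
print(format_colon(t))
print(format_decimal_sv(13.49))
```

With a decimal comma, the decimal number and the period-format time
stay visually distinct, which is the argument the book makes.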


> correct sorting order for V v W w
> ---------------------------------
> 
> What is the correct sorting order of the following lines when using
> the sv_SE locale?
> 
>   V
>   v
>   W
>   w
> 
> At the moment, it is sorted 'w v W v'.  Is this correct?  I was
> expecting it to sort like 'w W v V' or 'v V w W', and was a bit
> surprised to discover this.  Reading the specifications, it is a bit
> hard to know how this should be handled.

The current glibc sorting is correct regarding this matter. It should
indeed be sorted "w v W v". The reason behind this can also be read in
"Svenska skrivregler". English translation follows:

        "The letter w is normally not present in the Swedish alphabet.
        It exists in some names in Swedish and foreign words, but is
        accounted for as a variant of 'v'. Words and names with 'w' are
        in Swedish ordered alphabetically among the words and names with
        'v'. If two words or names are only to be distinguished by 'v'
        or 'w', 'v' is placed before 'w'."

This sorting rule is also what's commonly used in libraries, phone
books etc.; you'll find names using "w" sorted together with those
using "v". This is also why we had the glibc collation for the Swedish
locale changed to accommodate this, and I was actually expecting the
above quote to be present as a comment next to the collating rules in
the current Swedish locale specification, since it was added when the
w-v sorting was changed, IIRC.
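
As a rough illustration of the quoted rule, here is a minimal Python
sketch of a sort key. It is not how glibc implements collation; the
function name is hypothetical, and it deliberately ignores everything
else in Swedish ordering (å, ä, ö, and how upper and lower case rank
against each other). It only shows "w" filed among "v", with "v" first
when that is the only difference.

```python
def swedish_vw_key(word):
    # Primary key: treat w as a variant of v, ignoring case, so that
    # w-words are ordered among the v-words.
    primary = word.lower().replace("w", "v")
    # Tie-breaker: position by position, v (0) sorts before w (1).
    secondary = tuple(1 if ch in "wW" else 0 for ch in word)
    return (primary, secondary)


names = ["Wallin", "Vallin", "Wiklund", "Viberg", "Andersson", "Vik"]
print(sorted(names, key=swedish_vw_key))
```

With this key, "Wallin" lands directly after "Vallin" instead of after
all the V names, which is the phone book behaviour described above.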

The ITS document linked to above does actually list W as a part of the
Swedish alphabet, and given the above I'm not sure the document is
correct on that point.


Christian



> BTW: I'm not on this list, so please copy any comments to me.
