gettext-0.9.1

From: Swedish GNU/LI List (sv_at_li.org)
Date: 1995-08-09 16:17:48

     ------
     List:     Swedish GNU/LI List
     Sender:   Ulrich Drepper <drepper@ipd.info.uni-karlsruhe.de>
     Subject:  gettext-0.9.1
     Date:     Wed, 09 Aug 1995 16:17:48 +0200
     ------

Hi folks,

This is not the announcement of yet another official release; rather,
I implemented something I would like to get more comments on.  We
discussed this change a lot in a small group, but now I would like to
know what the real world thinks about it.

The changes eliminate a restriction many people feel uncomfortable
with.  It shows up in many ways.

1.  No way to cleanly have two or more dialects of the same language.
    E.g. Norwegian users might want to have both of their official
    dialects available, but the official name
	no_NO.ISO-8859-1
    leaves no room for the distinction.

2.  Related to the above.  Often the catalog of a dialect (or
    subclass) of a language differs from the general one in only a few
    messages; most are shared.  So it does not make sense to repeat the
    shared translations in the more specific catalog.
    (This is what I call message inheritance.)

3.  Again related.  When there are no differences between the message
    catalogs for, say, de_CH.ISO-8859-1 and de_DE.ISO-8859-1, why should
    I have two catalogs installed?  Both should default to a catalog with
    the simple name `de' or `de.ISO-8859-1'.

4.  The default language is always English.  What I mean here is that
    when the catalog for the needed domain is not available, no
    translation is made and the program speaks English to you (at least
    inside the GNU project this will always be the case).

    I was told that many people in Sweden speak German better than
    English and so would like to have a specification like this:

	If the catalog is available in Swedish, I'll use it.  If not,
	try to find a German catalog.  If even this is not available,
	fall back on the default.

5.  Cryptic names like de_CH.ISO-8859-1 are not user-friendly.
    Instead we just want to say something like `german' or `french'
    to select the right language.


Of course none of these extensions may conflict with the POSIX standard.


The solution we chose is as follows (it is implemented in
gettext-0.9.1):

We have a new environment variable `LANGUAGE' which contains a
colon-separated list of locale specifications of the form

	language[_territory[.codeset]][@modifier]

This syntax comes from the X/Open Portability Guide, volume 3.  The
POSIX standard does not say anything about the form of the values for
LC_ALL, LC_MESSAGES, and LANG, so we can use this form for these
variables, too.

To determine the locale, the following ordered list of possibilities
applies (highest priority at the top):

         Prio   GNU extension       POSIX             value type
         -------------------------------------------------------

         ^      LANGUAGE                              list
         |
         |      LC_ALL              LC_ALL            single
         |
         |      LC_MESSAGES         LC_MESSAGES       single
         |
         |      LANG                LANG              single


As you can see, the full POSIX behaviour is preserved.  Only when the
environment variable LANGUAGE is defined is the new behaviour
selected.
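The priority table can be sketched in a few lines of Python.  This is
only an illustration of the rules described above, not code from
gettext-0.9.1; the function name requested_locales is hypothetical, and
it assumes that empty entries in LANGUAGE (from doubled colons) are
simply skipped.

```python
import os
from typing import List, Mapping


def requested_locales(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the locale names to try for messages, in priority order.

    Hypothetical sketch of the table above: the GNU extension LANGUAGE
    (a colon-separated list) wins; otherwise the first non-empty POSIX
    variable among LC_ALL, LC_MESSAGES and LANG gives a single value.
    """
    # Assumption: empty list entries (e.g. from "a::b") are ignored.
    entries = [e for e in environ.get("LANGUAGE", "").split(":") if e]
    if entries:
        return entries
    for name in ("LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            return [value]
    return []  # nothing set: messages stay untranslated (English)
```

With LANGUAGE unset the result is the single POSIX value, so existing
behaviour is unchanged; with LANGUAGE set to
sv_SE.ISO-8859-1:de_DE.ISO-8859-1 both entries are returned in order.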

XPG3 does not say what the modifier is used for (it only gives a vague
example), so we are free to use it here.  In my proposal I use it to
overcome problem #1 above.  In Norway you could use

	no_NO.ISO-8859-1@bokmal
or
	no_NO.ISO-8859-1@nynorsk

(please forgive me if this is not written correctly).


The second and third problems can be overcome using the structure of
the locale names.  The name no_NO.ISO-8859-1@bokmal can be exploded
into four parts (somewhat like X.400 :-):

	language	= no
	territory	= NO
	codeset		= ISO-8859-1
	modifier	= bokmal
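The explosion into four parts is easy to express with a regular
expression.  A hypothetical sketch (parse_locale and LocaleName are
names made up for this illustration): it also accepts a codeset
without a territory, as in no.ISO-8859-1@bokmal, because such names
appear in the search order described next, even though the XPG3 form
nests the codeset inside the territory part.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# language[_territory][.codeset][@modifier]; each optional part may be absent.
_LOCALE_RE = re.compile(
    r"(?P<language>[^_.@]+)"
    r"(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def parse_locale(name: str) -> LocaleName:
    """Split a locale name into its four parts (missing parts are None)."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a locale name: {name!r}")
    return LocaleName(**match.groupdict())
```

For example, parse_locale("no_NO.ISO-8859-1@bokmal") yields
LocaleName(language='no', territory='NO', codeset='ISO-8859-1',
modifier='bokmal'), and parse_locale("de") leaves the last three
parts as None.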

If we now look for the message catalog for the locale

	no_NO.ISO-8859-1@bokmal

and it is not found, we go on by examining whether any of

	no_NO@bokmal
	no.ISO-8859-1@bokmal
	no@bokmal
	no_NO.ISO-8859-1
	no_NO
	no.ISO-8859-1
	no

is found (in this order).  If even the last catalog is not found, we go
on by examining the next entry in the value of LANGUAGE.  Remember:
this is a list of colon-separated entries.  Now the Swedish user
mentioned above could set LANGUAGE to the value

	sv_SE.ISO-8859-1:de_DE.ISO-8859-1

and would get what the informal specification above describes.  Please
note that this process works on a per-language basis.  It does not seem
reasonable to switch the language when a single message is not
contained in a catalog.  Once the language is chosen (the first one
found in the list), it remains in use.
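The search order above, and the per-language choice along the
LANGUAGE list, can be sketched as follows.  Again this is a
hypothetical illustration: locale_variants and choose_catalog are
invented names, and the set `installed' stands in for whatever check
the real code makes for catalog files on disk.

```python
import re
from typing import Iterable, List, Optional, Set

_LOCALE_RE = re.compile(
    r"(?P<language>[^_.@]+)(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?(?:@(?P<modifier>.+))?"
)


def locale_variants(name: str) -> List[str]:
    """Return name followed by its less specific variants, most specific first."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        return [name]
    lang, terr, codeset, mod = match.group("language", "territory", "codeset", "modifier")
    variants: List[str] = []
    # Names keeping the modifier come first, then the same names without it.
    for modifier in ([mod, None] if mod else [None]):
        for territory, cs in ((terr, codeset), (terr, None), (None, codeset), (None, None)):
            candidate = lang
            if territory:
                candidate += "_" + territory
            if cs:
                candidate += "." + cs
            if modifier:
                candidate += "@" + modifier
            if candidate not in variants:  # skip duplicates when parts are missing
                variants.append(candidate)
    return variants


def choose_catalog(language_list: Iterable[str], installed: Set[str]) -> Optional[str]:
    """Pick the first installed catalog, trying each LANGUAGE entry in turn."""
    for entry in language_list:
        for candidate in locale_variants(entry):
            if candidate in installed:
                return candidate
    return None  # no catalog at all: fall back on the untranslated messages
```

For the Swedish user, choose_catalog(["sv_SE.ISO-8859-1",
"de_DE.ISO-8859-1"], {"de"}) returns "de", since no Swedish catalog is
installed; once a catalog is returned, the choice of language is fixed.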

But point #2 above asks for some inheritance among catalogs of the same
language.  This is also implemented but, as said, only the less specific
variants of the currently used catalog are examined.  Example:

A message is not translated in the catalog for locale

	de_DE.ISO-8859-1

Now, instead of immediately returning the untranslated message, the
function tries to locate the catalogs for

	de_DE
	de.ISO-8859-1
	de

in this order and examines whether they contain the string in
question.  Remember the example mentioned above: most strings have a
common translation (possibly located in de.ISO-8859-1), but some are
special for the Swiss locale

	de_CH.ISO-8859-1

Using this mechanism, only the messages in question have to be
contained in the latter catalog.
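A sketch of this message inheritance, under the same caveats:
lookup_message is a hypothetical name, catalogs are modelled as plain
dictionaries (catalog name to {msgid: translation}), and the search
stays within the variants of the one chosen locale, never moving on to
the next LANGUAGE entry.  The example strings are made up, using the
Swiss spelling `ss' for `ß' as the only difference.

```python
import re
from typing import Dict, List

_LOCALE_RE = re.compile(
    r"(?P<language>[^_.@]+)(?:_(?P<territory>[^.@]+))?"
    r"(?:\.(?P<codeset>[^@]+))?(?:@(?P<modifier>.+))?"
)


def locale_variants(name: str) -> List[str]:
    """name, then its less specific variants (modifier kept first)."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        return [name]
    lang, terr, codeset, mod = match.group("language", "territory", "codeset", "modifier")
    variants: List[str] = []
    for modifier in ([mod, None] if mod else [None]):
        for territory, cs in ((terr, codeset), (terr, None), (None, codeset), (None, None)):
            candidate = lang
            if territory:
                candidate += "_" + territory
            if cs:
                candidate += "." + cs
            if modifier:
                candidate += "@" + modifier
            if candidate not in variants:
                variants.append(candidate)
    return variants


def lookup_message(msgid: str, locale: str, catalogs: Dict[str, Dict[str, str]]) -> str:
    """Translate msgid, inheriting from less specific catalogs of the same language."""
    for name in locale_variants(locale):
        translation = catalogs.get(name, {}).get(msgid)
        if translation is not None:
            return translation
    return msgid  # not translated anywhere in this language: return it unchanged


# Hypothetical catalogs: the Swiss one holds only the message that differs.
catalogs = {
    "de.ISO-8859-1": {"File": "Datei", "Close": "Schließen"},
    "de_CH.ISO-8859-1": {"Close": "Schliessen"},
}
```

Here lookup_message("Close", "de_CH.ISO-8859-1", catalogs) finds the
Swiss form, while lookup_message("File", "de_CH.ISO-8859-1", catalogs)
falls through de_CH to the shared de.ISO-8859-1 catalog.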


Now to point #5.  This problem was already solved in the X Window
System, so I reused the method.  A simple "database" maps alias names
to real locale names.  (Commonly this file is found as
	/usr/lib/X11/locale/locale.alias
on systems using X.)  When this file contains a line like

	french		fr_FR.ISO-8859-1

we could set LANGUAGE to `french'.
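Reading such an alias file is straightforward.  A hypothetical sketch,
assuming the two-column, whitespace-separated format shown above and
`#' comments (the exact syntax of the X11 file may differ between
installations); read_aliases and expand_alias are invented names.

```python
from typing import Dict, Iterable


def read_aliases(lines: Iterable[str]) -> Dict[str, str]:
    """Parse lines like 'french  fr_FR.ISO-8859-1' into {alias: locale}."""
    aliases: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        fields = line.split()
        if len(fields) >= 2:
            aliases[fields[0]] = fields[1]
    return aliases


def expand_alias(name: str, aliases: Dict[str, str]) -> str:
    """Map an alias such as 'french' to a real locale name; others pass through."""
    return aliases.get(name, name)


# Usage: in practice the lines would be read from the alias file.
aliases = read_aliases(["# alias   locale", "french\t\tfr_FR.ISO-8859-1"])
print([expand_alias(e, aliases) for e in "french:de_DE".split(":")])
# prints ['fr_FR.ISO-8859-1', 'de_DE']
```

Expanding each LANGUAGE entry this way before the search keeps the
alias step separate from the fallback logic.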

****

As said, all this is implemented in gettext-0.9.1.  You can find it
on the alpha server of the GNU project (those who know it know
where to look) or else on
	i44ftp.info.uni-karlsruhe.de:/pub/gnu

It is not necessary to report warnings for this version, because it
is an alpha version, not much tested or cleaned up.  Of course I
would like to hear about compilation errors.

The path for the alias file is for now simply hardcoded in the Makefile.
Please change it for your X installation.  I'm also looking forward
to proposals on how to make this portable.


And now to my final wish.  Please let me know what you think.  I need
some facts when I go into the final discussion with the GNU
representatives about these things.  Even saying `I like this' or `What
in hell should this be good for' could help.  A comment is even better...


If there is some interest in discussing these things, we could move
to the GNU newsgroup.

Thanks for reading,

-- Uli
________---------------------------------------------------------------
\      / Ulrich Drepper / Univ. at Karlsruhe, Germany / CS Dept. / IPD
L\inux/  email: drepper@gnu.ai.mit.edu          smail: Rubensstr. 5
  \  /          drepper@ipd.info.uni-karlsruhe.de      76149 Karlsruhe
   \/1.3.16 ------------------------------------------ Germany --------
