mirror of git://sourceware.org/git/glibc.git
parent
964328be73
commit
b8a46c1d5a
|
|
@ -28,6 +28,7 @@
|
||||||
* intl/po2test.sed: New file.
|
* intl/po2test.sed: New file.
|
||||||
* intl/tst-gettext.c: New file.
|
* intl/tst-gettext.c: New file.
|
||||||
* intl/tst-gettext.sh: New file.
|
* intl/tst-gettext.sh: New file.
|
||||||
|
* manual/message.texi: Document new interfaces.
|
||||||
|
|
||||||
* intl/gettext.c: Call __dcgettext directly.
|
* intl/gettext.c: Call __dcgettext directly.
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -226,7 +226,7 @@ When an error occured the global variable @var{errno} is set to
|
||||||
@item EBADF
|
@item EBADF
|
||||||
The catalog does not exist.
|
The catalog does not exist.
|
||||||
@item ENOMSG
|
@item ENOMSG
|
||||||
The set/message ttuple does not name an existing element in the
|
The set/message tuple does not name an existing element in the
|
||||||
message catalog.
|
message catalog.
|
||||||
@end table
|
@end table
|
||||||
|
|
||||||
|
|
@ -470,7 +470,7 @@ This is the interface defined in the X/Open standard. If no
|
||||||
@var{Input-File} parameter is given input will be read from standard
|
@var{Input-File} parameter is given input will be read from standard
|
||||||
input. Multiple input files will be read as if they are concatenated.
|
input. Multiple input files will be read as if they are concatenated.
|
||||||
If @var{Output-File} is also missing, the output will be written to
|
If @var{Output-File} is also missing, the output will be written to
|
||||||
standard output. To provide the interface one is used from other
|
standard output. To provide the interface one is used to from other
|
||||||
programs a second interface is provided.
|
programs a second interface is provided.
|
||||||
|
|
||||||
@smallexample
|
@smallexample
|
||||||
|
|
@ -604,10 +604,10 @@ gencat -H ex.h -o ex.cat ex.msg
|
||||||
This generates a header file with the following content:
|
This generates a header file with the following content:
|
||||||
|
|
||||||
@smallexample
|
@smallexample
|
||||||
#define SetTwoSet 0x2 /* u.msg:8 */
|
#define SetTwoSet 0x2 /* ex.msg:8 */
|
||||||
|
|
||||||
#define SetOneSet 0x1 /* u.msg:4 */
|
#define SetOneSet 0x1 /* ex.msg:4 */
|
||||||
#define SetOnetwo 0x2 /* u.msg:6 */
|
#define SetOnetwo 0x2 /* ex.msg:6 */
|
||||||
@end smallexample
|
@end smallexample
|
||||||
|
|
||||||
As can be seen the various symbols given in the source file are mangled
|
As can be seen the various symbols given in the source file are mangled
|
||||||
|
|
@ -768,6 +768,8 @@ categories:
|
||||||
@menu
|
@menu
|
||||||
* Translation with gettext:: What has to be done to translate a message.
|
* Translation with gettext:: What has to be done to translate a message.
|
||||||
* Locating gettext catalog:: How to determine which catalog to be used.
|
* Locating gettext catalog:: How to determine which catalog to be used.
|
||||||
|
* Advanced gettext functions:: Additional functions for more complicated
|
||||||
|
situations.
|
||||||
* Using gettextized software:: The possibilities of the user to influence
|
* Using gettextized software:: The possibilities of the user to influence
|
||||||
the way @code{gettext} works.
|
the way @code{gettext} works.
|
||||||
@end menu
|
@end menu
|
||||||
|
|
@ -800,6 +802,8 @@ the @file{libintl.h} header file. On systems where these functions are
|
||||||
not part of the C library they can be found in a separate library named
|
not part of the C library they can be found in a separate library named
|
||||||
@file{libintl.a} (or accordingly different for shared libraries).
|
@file{libintl.a} (or accordingly different for shared libraries).
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
@deftypefun {char *} gettext (const char *@var{msgid})
|
@deftypefun {char *} gettext (const char *@var{msgid})
|
||||||
The @code{gettext} function searches the currently selected message
|
The @code{gettext} function searches the currently selected message
|
||||||
catalogs for a string which is equal to @var{msgid}. If there is such a
|
catalogs for a string which is equal to @var{msgid}. If there is such a
|
||||||
|
|
@ -845,6 +849,8 @@ uses the @code{gettext} functions but since it must not depend on a
|
||||||
currently selected default message catalog it must specify all ambiguous
|
currently selected default message catalog it must specify all ambiguous
|
||||||
information.
|
information.
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
|
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
|
||||||
The @code{dgettext} functions acts just like the @code{gettext}
|
The @code{dgettext} functions acts just like the @code{gettext}
|
||||||
function. It only takes an additional first argument @var{domainname}
|
function. It only takes an additional first argument @var{domainname}
|
||||||
|
|
@ -857,6 +863,8 @@ As for @code{gettext} the return value type is @code{char *} which is an
|
||||||
anachronism. The returned string must never be modified.
|
anachronism. The returned string must never be modified.
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
|
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
|
||||||
The @code{dcgettext} adds another argument to those which
|
The @code{dcgettext} adds another argument to those which
|
||||||
@code{dgettext} takes. This argument @var{category} specifies the last
|
@code{dgettext} takes. This argument @var{category} specifies the last
|
||||||
|
|
@ -990,6 +998,8 @@ domain named @code{foo}. The important point is that at any time
|
||||||
exactly one domain is active. This is controlled with the following
|
exactly one domain is active. This is controlled with the following
|
||||||
function.
|
function.
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
@deftypefun {char *} textdomain (const char *@var{domainname})
|
@deftypefun {char *} textdomain (const char *@var{domainname})
|
||||||
The @code{textdomain} function sets the default domain, which is used in
|
The @code{textdomain} function sets the default domain, which is used in
|
||||||
all future @code{gettext} calls, to @var{domainname}. Please note that
|
all future @code{gettext} calls, to @var{domainname}. Please note that
|
||||||
|
|
@ -1019,6 +1029,8 @@ This possibility is questionable to use since the domain @code{messages}
|
||||||
really never should be used.
|
really never should be used.
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
|
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
|
||||||
The @code{bindtextdomain} function can be used to specify the directory
|
The @code{bindtextdomain} function can be used to specify the directory
|
||||||
which contains the message catalogs for domain @var{domainname} for the
|
which contains the message catalogs for domain @var{domainname} for the
|
||||||
|
|
@ -1056,6 +1068,298 @@ variable @var{errno} is set accordingly.
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
|
|
||||||
|
@node Advanced gettext functions
|
||||||
|
@subsubsection Additional functions for more complicated situations
|
||||||
|
|
||||||
|
The functions of the @code{gettext} family described so far (and all the
|
||||||
|
@code{catgets} functions as well) have one problem in the real world
|
||||||
|
which has been neglected completely in all existing approaches.  What
|
||||||
|
is meant here is the handling of plural forms.
|
||||||
|
|
||||||
|
Looking through Unix source code before the time anybody thought about
|
||||||
|
internationalization (and, sadly, even afterwards) one can often find
|
||||||
|
code similar to the following:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
printf ("%d file%s deleted", n, n == 1 ? "" : "s");
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
After the first complaints from people internationalizing the code, people
|
||||||
|
either completely avoided formulations like this or used strings like
|
||||||
|
@code{"file(s)"}. Both look unnatural and should be avoided. First
|
||||||
|
tries to solve the problem correctly looked like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
if (n == 1)
|
||||||
|
printf ("%d file deleted", n);
|
||||||
|
else
|
||||||
|
printf ("%d files deleted", n);
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
But this does not solve the problem. It helps languages where the
|
||||||
|
plural form of a noun is not simply constructed by adding an `s' but
|
||||||
|
that is all. Once again people fell into the trap of believing the
|
||||||
|
rules their language is using are universal. But the handling of plural
|
||||||
|
forms differs widely between the language families. There are two
|
||||||
|
things we can differ between (and even inside language families);
|
||||||
|
|
||||||
|
@itemize @bullet
|
||||||
|
@item
|
||||||
|
The way plural forms are built differs.  This is a problem with
|
||||||
|
languages which have many irregularities.  German, for instance, is a
|
||||||
|
drastic case. Though English and German are part of the same language
|
||||||
|
family (Germanic), the almost regular forming of plural noun forms
|
||||||
|
(appending an `s') is hardly found in German.
|
||||||
|
|
||||||
|
@item
|
||||||
|
The number of plural forms differs.  This is somewhat surprising for
|
||||||
|
those who only have experiences with Romanic and Germanic languages
|
||||||
|
since here the number is the same (there are two).
|
||||||
|
|
||||||
|
But other language families have only one form or many forms. More
|
||||||
|
information on this in an extra section.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
The consequence of this is that application writers should not try to
|
||||||
|
solve the problem in their code. This would be localization since it is
|
||||||
|
only usable for certain, hardcoded language environments. Instead the
|
||||||
|
extended @code{gettext} interface should be used.
|
||||||
|
|
||||||
|
Instead of the one key string, these extra functions take two
|
||||||
|
strings and a numerical argument.  The idea behind this is that using
|
||||||
|
the numerical argument and the first string as a key, the implementation
|
||||||
|
can select using rules specified by the translator the right plural
|
||||||
|
form. The two string arguments then will be used to provide a return
|
||||||
|
value in case no message catalog is found (similar to the normal
|
||||||
|
@code{gettext} behaviour).  In this case the rules for Germanic languages
|
||||||
|
are used and it is assumed that the first string argument is the singular
|
||||||
|
form, the second the plural form.
|
||||||
|
|
||||||
|
This has the consequence that programs without language catalogs can
|
||||||
|
display the correct strings only if the program itself is written using
|
||||||
|
a Germanic language. This is a limitation but since the GNU C library
|
||||||
|
(as well as the GNU @code{gettext} package) are written as part of the
|
||||||
|
GNU package and the coding standards for the GNU project require programs
|
||||||
|
to be written in English, this solution nevertheless fulfills its
|
||||||
|
purpose.
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
|
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
|
||||||
|
The @code{ngettext} function is similar to the @code{gettext} function
|
||||||
|
as it finds the message catalogs in the same way. But it takes two
|
||||||
|
extra arguments. The @var{msgid1} parameter must contain the singular
|
||||||
|
form of the string to be converted. It is also used as the key for the
|
||||||
|
search in the catalog. The @var{msgid2} parameter is the plural form.
|
||||||
|
The parameter @var{n} is used to determine the plural form. If no
|
||||||
|
message catalog is found @var{msgid1} is returned if @code{n == 1},
|
||||||
|
otherwise @code{msgid2}.
|
||||||
|
|
||||||
|
An example of the use of this function is:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
printf (ngettext ("%d file removed", "%d files removed", n), n);
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
Please note that the numeric value @var{n} has to be passed to the
|
||||||
|
@code{printf} function as well. It is not sufficient to pass it only to
|
||||||
|
@code{ngettext}.
|
||||||
|
@end deftypefun
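
The fallback behaviour described above can be illustrated with a minimal sketch. The helper name @code{format_removed} is hypothetical and not part of any library; the sketch also uses @code{%lu} rather than @code{%d} so that the conversion matches the @code{unsigned long int} argument. It assumes no message catalog is installed for the current domain, in which case @code{ngettext} returns one of the two strings passed in.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: write a count-dependent message into BUF.
   N is passed twice: once to ngettext to select the form, and once
   to snprintf to fill in the number itself.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

Without a catalog, a count of one selects the first string and every other count, including zero, selects the second.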
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
|
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
|
||||||
|
The @code{dngettext} is similar to the @code{dgettext} function in the
|
||||||
|
way the message catalog is selected. The difference is that it takes
|
||||||
|
two extra parameters to provide the correct plural form.  These two
|
||||||
|
parameters are handled in the same way @code{ngettext} handles them.
|
||||||
|
@end deftypefun
|
||||||
|
|
||||||
|
@comment libintl.h
|
||||||
|
@comment GNU
|
||||||
|
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
|
||||||
|
The @code{dcngettext} is similar to the @code{dcgettext} function in the
|
||||||
|
way the message catalog is selected. The difference is that it takes
|
||||||
|
two extra parameters to provide the correct plural form.  These two
|
||||||
|
parameters are handled in the same way @code{ngettext} handles them.
|
||||||
|
@end deftypefun
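
As a rough sketch of how library code might use the domain-specific variants, consider the following. The domain name @code{libfoo} and both function names are assumptions made up for this illustration. With no catalog bound to that domain, both calls fall back to the strings given in the source, following the same rule as @code{ngettext}.

```c
#include <libintl.h>
#include <locale.h>

/* "libfoo" is a hypothetical domain name used only in this sketch.  */
#define LIBFOO_DOMAIN "libfoo"

/* Look up the message in the library's own domain, independent of
   whatever default domain the application selected.  */
const char *
libfoo_items_message (unsigned long int n)
{
  return dngettext (LIBFOO_DOMAIN, "%lu item", "%lu items", n);
}

/* Same lookup, but with the locale category given explicitly,
   for example LC_MESSAGES.  */
const char *
libfoo_items_message_in (int category, unsigned long int n)
{
  return dcngettext (LIBFOO_DOMAIN, "%lu item", "%lu items", n, category);
}
```

The returned strings are format strings here; the caller still has to pass the number to @code{printf} separately.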
|
||||||
|
|
||||||
|
@subsubheading The problem of plural forms
|
||||||
|
|
||||||
|
A description of the problem can be found at the beginning of the last
|
||||||
|
section. Now there is the question how to solve it. Without the input
|
||||||
|
of linguists (which was not available) it was not possible to determine
|
||||||
|
whether there are only a few different forms in which plural forms are
|
||||||
|
formed or whether the number can increase with every new supported
|
||||||
|
language.
|
||||||
|
|
||||||
|
Therefore the solution implemented is to allow the translator to specify
|
||||||
|
the rules of how to select the plural form. Since the formula varies
|
||||||
|
with every language this is the only viable solution except for
|
||||||
|
hardcoding the information in the code (which still would require the
|
||||||
|
possibility of extensions to not prevent the use of new languages).  The
|
||||||
|
details are explained in the GNU @code{gettext} manual.  Here only a
|
||||||
|
bit of information is provided.
|
||||||
|
|
||||||
|
The information about the plural form selection has to be stored in the
|
||||||
|
header entry (the one with the empty @code{msgid} string).  There should
|
||||||
|
be something like:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=2; plural=n == 1 ? 0 : 1
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
The @code{nplurals} value must be a decimal number which specifies how
|
||||||
|
many different plural forms exist for this language. The string
|
||||||
|
following @code{plural} is an expression which is using the C language
|
||||||
|
syntax.  Exceptions are that no negative numbers are allowed, numbers
|
||||||
|
must be decimal, and the only variable allowed is @code{n}. This
|
||||||
|
expression will be evaluated whenever one of the functions
|
||||||
|
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called. The
|
||||||
|
numeric value passed to these functions is then substituted for all uses
|
||||||
|
of the variable @code{n} in the expression. The resulting value then
|
||||||
|
must be greater than or equal to zero and smaller than the value given as the
|
||||||
|
value of @code{nplurals}.
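
The evaluation step can be pictured with a small sketch. This is not how the library implements it (the real expression is read from the catalog at run time); here the example header @code{nplurals=2; plural=n == 1 ? 0 : 1} is simply written out as C, and the names @code{NPLURALS}, @code{plural_index} and @code{select_form} are hypothetical.

```c
#include <stddef.h>

/* Hand-written stand-in for the header
   "nplurals=2; plural=n == 1 ? 0 : 1".  */
enum { NPLURALS = 2 };

unsigned long int
plural_index (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Return the form selected for N from FORMS, which must hold
   NPLURALS entries.  An index outside 0 .. NPLURALS-1 would be a
   broken rule, so NULL is returned in that case.  */
const char *
select_form (const char *const forms[], unsigned long int n)
{
  unsigned long int idx = plural_index (n);
  return idx < NPLURALS ? forms[idx] : NULL;
}
```

The range check mirrors the requirement that the result be smaller than @code{nplurals}.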
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
The following rules are known at this point.  The languages are listed
|
||||||
|
with their families.  But this does not necessarily mean the information can be
|
||||||
|
generalized for the whole family (as can be easily seen in the table
|
||||||
|
below).@footnote{Additions are welcome. Send appropriate information to
|
||||||
|
@email{bug-glibc-manual@@gnu.org}.}
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Only one form:
|
||||||
|
Some languages only require one single form. There is no distinction
|
||||||
|
between the singular and plural form.  An appropriate header entry
|
||||||
|
would look like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=1; plural=0
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Finno-Ugric family
|
||||||
|
Hungarian
|
||||||
|
@item Asian family
|
||||||
|
Japanese
|
||||||
|
@item Turkic/Altaic family
|
||||||
|
Turkish
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Two forms, singular used for one only
|
||||||
|
This is the form used in most existing programs since it is what English
|
||||||
|
is using. A header entry would look like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=2; plural=n != 1
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
(Note: this uses the feature of C expressions that boolean expressions
|
||||||
|
have the value zero or one.)
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Germanic family
|
||||||
|
Danish, Dutch, English, German, Norwegian, Swedish
|
||||||
|
@item Finno-Ugric family
|
||||||
|
Finnish
|
||||||
|
@item Latin/Greek family
|
||||||
|
Greek
|
||||||
|
@item Semitic family
|
||||||
|
Hebrew
|
||||||
|
@item Romance family
|
||||||
|
Italian, Spanish
|
||||||
|
@item Artificial
|
||||||
|
Esperanto
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Two forms, singular used for zero and one
|
||||||
|
Exceptional case in the language family. The header entry would be:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=2; plural=n>1
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Romance family
|
||||||
|
French
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Three forms, special cases for one and two
|
||||||
|
The header entry would be:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Celtic
|
||||||
|
Gaeilge
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Three forms, special case for one and all numbers ending in 2, 3, or 4
|
||||||
|
The header entry would look like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Slavic family
|
||||||
|
Russian
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Three forms, special case for one and some numbers ending in 2, 3, or 4
|
||||||
|
The header entry would look like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=3; plural=n==1 ? 0 : \
|
||||||
|
n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
(Continuation in the next line is possible.)
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Slavic family
|
||||||
|
Polish
|
||||||
|
@end table
|
||||||
|
|
||||||
|
@item Four forms, special case for one and all numbers ending in 2, 3, or 4
|
||||||
|
The header entry would look like this:
|
||||||
|
|
||||||
|
@smallexample
|
||||||
|
nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n==3 || n==4 ? 2 : 3
|
||||||
|
@end smallexample
|
||||||
|
|
||||||
|
@noindent
|
||||||
|
Languages with this property include:
|
||||||
|
|
||||||
|
@table @asis
|
||||||
|
@item Slavic family
|
||||||
|
Slovenian
|
||||||
|
@end table
|
||||||
|
@end table
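
To make the behaviour of some of the header expressions in the table above easier to check, they can be transcribed into ordinary C functions. This is only an illustrative sketch: the function names are hypothetical, and each body copies the expression exactly as listed, without any claim that the listed rule is linguistically complete.

```c
/* "Two forms, singular used for one only": plural=n != 1  */
unsigned long int
plural_one_only (unsigned long int n)
{
  return n != 1;
}

/* "Two forms, singular used for zero and one": plural=n>1  */
unsigned long int
plural_zero_and_one (unsigned long int n)
{
  return n > 1;
}

/* "Three forms, special cases for one and two"  */
unsigned long int
plural_one_two (unsigned long int n)
{
  return n == 1 ? 0 : n == 2 ? 1 : 2;
}

/* "Three forms, special case for one and all numbers ending in
   2, 3, or 4", as listed for Russian  */
unsigned long int
plural_all_ending_2_3_4 (unsigned long int n)
{
  return n == 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 ? 1 : 2;
}

/* "Three forms, special case for one and some numbers ending in
   2, 3, or 4", as listed for Polish  */
unsigned long int
plural_some_ending_2_3_4 (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

The last two functions differ only for numbers such as 12, 13, and 14: the expression listed for Russian selects form 1 for them, while the one listed for Polish excludes them through the additional test on @code{n%100} and selects form 2.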
|
||||||
|
|
||||||
|
|
||||||
@node Using gettextized software
|
@node Using gettextized software
|
||||||
@subsubsection User influence on @code{gettext}
|
@subsubsection User influence on @code{gettext}
|
||||||
|
|
||||||
|
|
|
||||||