Update.

* manual/message.texi: Document new interfaces.
2000-01-22 09:20:14 +00:00 · 2000-01-22 09:20:14 +00:00 · b8a46c1d5a
parent 964328be73
commit b8a46c1d5a
2 changed files with 310 additions and 5 deletions
--- a/1
+++ b/1
@ -28,6 +28,7 @@
 	* intl/po2test.sed: New file.
 	* intl/tst-gettext.c: New file.
 	* intl/tst-gettext.sh: New file.
 	* manual/message.texi: Document new interfaces.
 	* intl/gettext.c: Call __dcgettext directly.
--- a/manual/message.texi
+++ b/manual/message.texi
@ -226,7 +226,7 @@ When an error occured the global variable @var{errno} is set to
@item EBADF
 The catalog does not exist.
@item ENOMSG
-The set/message ttuple does not name an existing element in the
+The set/message tuple does not name an existing element in the
 message catalog.
@end table
@ -470,7 +470,7 @@ This is the interface defined in the X/Open standard.  If no
@var{Input-File} parameter is given input will be read from standard
 input.  Multiple input files will be read as if they are concatenated.
 If @var{Output-File} is also missing, the output will be written to
-standard output.  To provide the interface one is used from other
+standard output.  To provide the interface one is used to from other
 programs a second interface is provided.
@smallexample
@ -604,10 +604,10 @@ gencat -H ex.h -o ex.cat ex.msg
 This generates a header file with the following content:
@smallexample
-#define SetTwoSet 0x2   /* u.msg:8 */
+#define SetTwoSet 0x2   /* ex.msg:8 */
-#define SetOneSet 0x1   /* u.msg:4 */
+#define SetOneSet 0x1   /* ex.msg:4 */
-#define SetOnetwo 0x2   /* u.msg:6 */
+#define SetOnetwo 0x2   /* ex.msg:6 */
@end smallexample
 As can be seen the various symbols given in the source file are mangled
@ -768,6 +768,8 @@ categories:
@menu
 * Translation with gettext::    What has to be done to translate a message.
 * Locating gettext catalog::    How to determine which catalog to be used.
 * Advanced gettext functions::  Additional functions for more complicated
                                 situations.
 * Using gettextized software::  The possibilities of the user to influence
                                 the way @code{gettext} works.
@end menu
@ -800,6 +802,8 @@ the @file{libintl.h} header file.  On systems where these functions are
 not part of the C library they can be found in a separate library named
@file{libintl.a} (or accordingly different for shared libraries).
@comment libintl.h
@comment GNU
@deftypefun {char *} gettext (const char *@var{msgid})
 The @code{gettext} function searches the currently selected message
 catalogs for a string which is equal to @var{msgid}.  If there is such a
@ -845,6 +849,8 @@ uses the @code{gettext} functions but since it must not depend on a
 currently selected default message catalog it must specify all ambiguous
 information.
@comment libintl.h
@comment GNU
@deftypefun {char *} dgettext (const char *@var{domainname}, const char *@var{msgid})
 The @code{dgettext} functions acts just like the @code{gettext}
 function.  It only takes an additional first argument @var{domainname}
@ -857,6 +863,8 @@ As for @code{gettext} the return value type is @code{char *} which is an
 anachronism.  The returned string must never be modified.
@end deftypefun
@comment libintl.h
@comment GNU
@deftypefun {char *} dcgettext (const char *@var{domainname}, const char *@var{msgid}, int @var{category})
 The @code{dcgettext} adds another argument to those which
@code{dgettext} takes.  This argument @var{category} specifies the last
@ -990,6 +998,8 @@ domain named @code{foo}.  The important point is that at any time
 exactly one domain is active.  This is controlled with the following
 function.
@comment libintl.h
@comment GNU
@deftypefun {char *} textdomain (const char *@var{domainname})
 The @code{textdomain} function sets the default domain, which is used in
 all future @code{gettext} calls, to @var{domainname}.  Please note that
@ -1019,6 +1029,8 @@ This possibility is questionable to use since the domain @code{messages}
 really never should be used.
@end deftypefun
@comment libintl.h
@comment GNU
@deftypefun {char *} bindtextdomain (const char *@var{domainname}, const char *@var{dirname})
 The @code{bindtextdomain} function can be used to specify the directory
 which contains the message catalogs for domain @var{domainname} for the
@ -1056,6 +1068,298 @@ variable @var{errno} is set accordingly.
@end deftypefun
@node Advanced gettext functions
@subsubsection Additional functions for more complicated situations
 The functions of the @code{gettext} family described so far (and all the
@code{catgets} functions as well) have one problem in the real world
 which has been neglected completely in all existing approaches.  What
 is meant here is the handling of plural forms.
 Looking through Unix source code before the time anybody thought about
 internationalization (and, sadly, even afterwards) one can often find
 code similar to the following:
@smallexample
   printf ("%d file%s deleted", n, n == 1 ? "" : "s");
@end smallexample
@noindent
 After the first complaints from people internationalizing the code,
 people either completely avoided formulations like this or used strings
 like @code{"file(s)"}.  Both look unnatural and should be avoided.  The
 first attempts to solve the problem correctly looked like this:
@smallexample
   if (n == 1)
     printf ("%d file deleted", n);
   else
     printf ("%d files deleted", n);
@end smallexample
 But this does not solve the problem.  It helps languages where the
 plural form of a noun is not simply constructed by adding an `s' but
 that is all.  Once again people fell into the trap of believing the
 rules their language is using are universal.  But the handling of plural
 forms differs widely between the language families.  Two things can
 differ between (and even inside) language families:
@itemize @bullet
@item
 The way plural forms are built differs.  This is a problem with
 languages which have many irregularities.  German, for instance, is a
 drastic case.  Though English and German are part of the same language
 family (Germanic), the almost regular forming of plural noun forms
 (appending an `s') is hardly found in German.
@item
 The number of plural forms differs.  This is somewhat surprising for
 those who only have experience with Romance and Germanic languages
 since here the number is the same (there are two).
 But other language families have only one form or many forms.  More
 information on this follows in a separate section.
@end itemize
 The consequence of this is that application writers should not try to
 solve the problem in their code.  This would be localization since it is
 only usable for certain, hardcoded language environments.  Instead the
 extended @code{gettext} interface should be used.
 Instead of the one key string, these extra functions take two strings
 and a numerical argument.  The idea behind this is that using the
 numerical argument and the first string as a key, the implementation
 can select the right plural form using rules specified by the
 translator.  The two string arguments are then used to provide a return
 value in case no message catalog is found (similar to the normal
@code{gettext} behaviour).  In this case the rules for Germanic
 languages are used and it is assumed that the first string argument is
 the singular form, the second the plural form.
 This has the consequence that programs without language catalogs can
 display the correct strings only if the program itself is written using
 a Germanic language.  This is a limitation but since the GNU C library
 (as well as the GNU @code{gettext} package) is written as part of the
 GNU package and the coding standards for the GNU project require
 programs to be written in English, this solution nevertheless fulfills its
 purpose.
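 The fallback described above can be sketched in a few lines of C.  This
 only illustrates the Germanic rule and is not the library's code; the
 helper name @code{fallback_plural} is made up for this example.

```c
/* Hypothetical helper, not part of libintl: mimics the documented
   fallback that applies when no message catalog is found.  */
static const char *
fallback_plural (const char *msgid1, const char *msgid2,
                 unsigned long int n)
{
  /* Germanic rule: singular for exactly one, plural otherwise
     (so zero also selects the plural string).  */
  return n == 1 ? msgid1 : msgid2;
}
```

 Note that under this rule a count of zero selects the second string.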
@comment libintl.h
@comment GNU
@deftypefun {char *} ngettext (const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
 The @code{ngettext} function is similar to the @code{gettext} function
 as it finds the message catalogs in the same way.  But it takes two
 extra arguments.  The @var{msgid1} parameter must contain the singular
 form of the string to be converted.  It is also used as the key for the
 search in the catalog.  The @var{msgid2} parameter is the plural form.
 The parameter @var{n} is used to determine the plural form.  If no
 message catalog is found @var{msgid1} is returned if @code{n == 1},
 otherwise @var{msgid2}.
 An example for the use of this function is:
@smallexample
  printf (ngettext ("%d file removed", "%d files removed", n), n);
@end smallexample
 Please note that the numeric value @var{n} has to be passed to the
@code{printf} function as well.  It is not sufficient to pass it only to
@code{ngettext}.
@end deftypefun
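 A slightly fuller sketch of the same call follows.  The wrapper
 @code{format_removed} is a name invented for this example, and it uses
 @code{%lu} because the count is declared here as
 @code{unsigned long int}.  It assumes a C library that provides
 @code{ngettext} in @file{libintl.h}.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: writes the "files removed" message for N
   into BUF and returns what snprintf returns.  */
static int
format_removed (char *buf, size_t size, unsigned long int n)
{
  /* N is passed twice: once to ngettext to select the form, and once
     to snprintf so the %lu conversion has a value to print.  */
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}
```

 When no catalog is found the two English strings are used, so the
 helper still produces a usable message.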
@comment libintl.h
@comment GNU
@deftypefun {char *} dngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n})
 The @code{dngettext} function is similar to the @code{dgettext}
 function in the way the message catalog is selected.  The difference is
 that it takes two extra parameters to provide the correct plural form.
 These two parameters are handled in the same way @code{ngettext}
 handles them.
@end deftypefun
@comment libintl.h
@comment GNU
@deftypefun {char *} dcngettext (const char *@var{domain}, const char *@var{msgid1}, const char *@var{msgid2}, unsigned long int @var{n}, int @var{category})
 The @code{dcngettext} function is similar to the @code{dcgettext}
 function in the way the message catalog is selected.  The difference is
 that it takes two extra parameters to provide the correct plural form.
 These two parameters are handled in the same way @code{ngettext}
 handles them.
@end deftypefun
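 As a rough illustration, a library that keeps its messages in its own
 domain might wrap the two functions as shown here.  The domain name
 @code{example-lib} and both helper names are assumptions made for this
 sketch only.

```c
#include <libintl.h>
#include <locale.h>

/* "example-lib" is a made-up domain name used for illustration.  */
#define EXAMPLE_DOMAIN "example-lib"

/* Look up the plural-aware message in the library's own domain.  */
static const char *
lib_items_msg (unsigned long int n)
{
  return dngettext (EXAMPLE_DOMAIN, "%lu item", "%lu items", n);
}

/* Same lookup, but naming the locale category explicitly.  */
static const char *
lib_items_msg_in_category (unsigned long int n)
{
  return dcngettext (EXAMPLE_DOMAIN, "%lu item", "%lu items", n,
                     LC_MESSAGES);
}
```

 Neither helper depends on the domain selected with @code{textdomain}.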
@subsubheading The problem of plural forms
 A description of the problem can be found at the beginning of the last
 section.  Now there is the question how to solve it.  Without the input
 of linguists (which was not available) it was not possible to determine
 whether there are only a few different ways in which plural forms are
 formed or whether the number can increase with every new supported
 language.
 Therefore the solution implemented is to allow the translator to specify
 the rules of how to select the plural form.  Since the formula varies
 with every language this is the only viable solution except for
 hardcoding the information in the code (which still would require the
 possibility of extensions to not prevent the use of new languages).  The
 details are explained in the GNU @code{gettext} manual.  Here only a
 bit of information is provided.
 The information about the plural form selection has to be stored in the
 header entry (the one with the empty @code{msgid} string).  There should
 be something like:
@smallexample
  nplurals=2; plural=n == 1 ? 0 : 1
@end smallexample
 The @code{nplurals} value must be a decimal number which specifies how
 many different plural forms exist for this language.  The string
 following @code{plural} is an expression which is using the C language
 syntax.  Exceptions are that no negative numbers are allowed, numbers
 must be decimal, and the only variable allowed is @code{n}.  This
 expression will be evaluated whenever one of the functions
@code{ngettext}, @code{dngettext}, or @code{dcngettext} is called.  The
 numeric value passed to these functions is then substituted for all uses
 of the variable @code{n} in the expression.  The resulting value then
 must be at least zero and smaller than the value given as the
 value of @code{nplurals}.
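 The range requirement can be pictured with a small sketch.  The helper
 names are invented for this example, and returning a null pointer for
 an out-of-range index is a choice of the sketch, not a statement about
 what the library does.

```c
#include <stddef.h>

/* The rule from the example header entry:
   nplurals=2; plural=n == 1 ? 0 : 1  */
static unsigned long int
example_rule (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Hypothetical helper: FORMS holds NPLURALS translated strings and
   INDEX is the value the plural expression produced for some n.  */
static const char *
select_form (const char *const forms[], unsigned long int nplurals,
             unsigned long int index)
{
  /* INDEX is unsigned, so only the upper bound needs a test.  */
  if (index >= nplurals)
    return NULL;
  return forms[index];
}
```

 A translator's expression that yields a value outside this range
 cannot name any of the stored forms.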
@noindent
 The following rules are known at this point.  The languages are listed
 with their families.  But this does not necessarily mean the
 information can be generalized for the whole family (as can be easily
 seen in the table
 below).@footnote{Additions are welcome.  Send appropriate information to
@email{bug-glibc-manual@@gnu.org}.}
@table @asis
@item Only one form:
 Some languages only require one single form.  There is no distinction
 between the singular and plural form.  An appropriate header entry
 would look like this:
@smallexample
 nplurals=1; plural=0
@end smallexample
@noindent
 Languages with this property include:
@table @asis
@item Finno-Ugric family
 Hungarian
@item Asian family
 Japanese
@item Turkic/Altaic family
 Turkish
@end table
@item Two forms, singular used for one only
 This is the form used in most existing programs since it is what English
 is using.  A header entry would look like this:
@smallexample
 nplurals=2; plural=n != 1
@end smallexample
 (Note: this uses the feature of C expressions that boolean expressions
 have the value zero or one.)
@noindent
 Languages with this property include:
@table @asis
@item Germanic family
 Danish, Dutch, English, German, Norwegian, Swedish
@item Finno-Ugric family
 Finnish
@item Latin/Greek family
 Greek
@item Semitic family
 Hebrew
@item Romance family
 Italian, Spanish
@item Artificial
 Esperanto
@end table
@item Two forms, singular used for zero and one
 Exceptional case in the language family.  The header entry would be:
@smallexample
 nplurals=2; plural=n>1
@end smallexample
@noindent
 Languages with this property include:
@table @asis
@item Romance family
 French
@end table
@item Three forms, special cases for one and two
 The header entry would be:
@smallexample
 nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2
@end smallexample
@noindent
 Languages with this property include:
@table @asis
@item Celtic
 Gaeilge
@end table
@item Three forms, special case for one and all numbers ending in 2, 3, or 4
 The header entry would look like this:
@smallexample
 nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 ? 1 : 2
@end smallexample
@noindent
 Languages with this property include:
@table @asis
@item Slavic family
 Russian
@end table
@item Three forms, special case for one and some numbers ending in 2, 3, or 4
 The header entry would look like this:
@smallexample
 nplurals=3; plural=n==1 ? 0 : \
  n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
@end smallexample
 (Continuation in the next line is possible.)
@noindent
 Languages with this property include:
@table @asis
@item Slavic family
 Polish
@end table
@item Four forms, special case for one and all numbers ending in 2, 3, or 4
 The header entry would look like this:
@smallexample
 nplurals=4; plural=n==1 ? 0 : n%10==2 ? 1 : n%10==3 || n%10==4 ? 2 : 3
@end smallexample
@noindent
 Languages with this property include:
@table @asis
@item Slavic family
 Slovenian
@end table
@end table
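 Because the expressions use C syntax, some of the rules in the table
 above can be written directly as C functions to try them out.  The
 function names are invented for this sketch.  The expressions are
 copied from the header entries as given and each function returns the
 index of the selected form.

```c
/* nplurals=1; plural=0  */
static unsigned long int
plural_one_form (unsigned long int n)
{
  (void) n;                     /* The count never matters here.  */
  return 0;
}

/* nplurals=2; plural=n != 1  */
static unsigned long int
plural_english (unsigned long int n)
{
  return n != 1;
}

/* nplurals=2; plural=n>1  */
static unsigned long int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2  */
static unsigned long int
plural_gaeilge (unsigned long int n)
{
  return n == 1 ? 0 : n == 2 ? 1 : 2;
}

/* nplurals=3; plural=n==1 ? 0 :
   n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2  */
static unsigned long int
plural_polish (unsigned long int n)
{
  return n == 1 ? 0
         : n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1
         : 2;
}
```

 For example, the Polish rule gives index 1 for 22 but index 2 for 12,
 which is the "some numbers ending in 2, 3, or 4" distinction named in
 the table.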
@node Using gettextized software
@subsubsection User influence on @code{gettext}