glibc

History

Mike FABIAN a7b5eb821d Update to Unicode 16.0.0 [BZ #32168]	2024-09-27 14:43:38 +02:00

Unicode 16.0.0 Support: Character encoding, character type info, and transliteration tables are all updated to Unicode 16.0.0, using the generator scripts contributed by Mike FABIAN (Red Hat).

Changes in CHARMAP and WIDTH:

    Total added characters in newly generated CHARMAP: 5185
    Total removed characters in newly generated WIDTH: 1
    Total added characters in newly generated WIDTH: 170

The removed character from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA. It changed like this:

    UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;
    UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;

    EastAsianWidth.txt 15.1.0: 1171D..1171F ; N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
    EastAsianWidth.txt 16.0.0: 1171E ; N # Mc AHOM CONSONANT SIGN MEDIAL RA

I.e., it changed from Mn (Mark, Nonspacing) to Mc (Mark, Spacing Combining). So it should now have width 1 instead of 0. It is therefore OK that it was removed from WIDTH: characters not in WIDTH get width 1 by default.

Nothing suspicious when browsing the list of the 170 added characters.

Changes in ctype:

    alpha: Added 4452 characters in new ctype which were not in old ctype
    combining: Added 51 characters in new ctype which were not in old ctype
    combining_level3: Added 43 characters in new ctype which were not in old ctype
    graph: Added 5185 characters in new ctype which were not in old ctype
    lower: Added 25 characters in new ctype which were not in old ctype
    print: Added 5185 characters in new ctype which were not in old ctype
    punct: Missing 33 characters of old ctype in new ctype
    punct: Added 766 characters in new ctype which were not in old ctype
    tolower: Added 27 characters in new ctype which were not in old ctype
    totitle: Added 27 characters in new ctype which were not in old ctype
    toupper: Added 27 characters in new ctype which were not in old ctype
    upper: Added 27 characters in new ctype which were not in old ctype

Nothing suspicious in the additions.

About the 33 characters removed from `punct`:

U+0363 - U+036F are identical in UnicodeData.txt. The difference is in DerivedCoreProperties.txt:

    DerivedCoreProperties.txt 15.1.0: not there.
    DerivedCoreProperties.txt 16.0.0: 0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X

So that’s the reason why they are added to `alpha` and removed from `punct`.

The same applies to U+1DD3 - U+1DE6: they are identical in UnicodeData.txt, but there is a difference in DerivedCoreProperties.txt:

    DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4 ; Alphabetic # Mn [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
    DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4 ; Alphabetic # Mn [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS

So they became `Alphabetic` and were thus added to `alpha` and removed from `punct`.

Resolves: BZ #32168
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
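
The two arguments in the commit message (the width change for U+1171E, and the count of 33 characters leaving `punct`) can be re-checked with a few lines of Python. This is only a simplified sketch and not the glibc generator scripts: the function names and the rule "Mn and Me get width 0, everything else gets the default width 1" are assumptions made for illustration, and the real scripts handle more cases (for example East Asian wide characters).

```python
# Simplified sketch for illustration; NOT the glibc generator scripts.

# Assumption: only nonspacing (Mn) and enclosing (Me) marks get width 0 here.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me"}


def general_category(unicode_data_line):
    """Return (code point, general category) from one UnicodeData.txt line."""
    fields = unicode_data_line.strip().split(";")
    return int(fields[0], 16), fields[2]


def simplified_width(category):
    """Width 0 for the zero-width mark categories, otherwise the default of 1."""
    return 0 if category in ZERO_WIDTH_CATEGORIES else 1


def range_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


# The two UnicodeData.txt lines quoted in the commit message.
OLD_LINE = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;"
NEW_LINE = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;"

for version, line in (("15.1.0", OLD_LINE), ("16.0.0", NEW_LINE)):
    code_point, category = general_category(line)
    print(f"{version}: U+{code_point:04X} {category} "
          f"width={simplified_width(category)}")

# The two ranges that became Alphabetic and so moved from punct to alpha.
moved = range_size(0x0363, 0x036F) + range_size(0x1DD3, 0x1DE6)
print(f"moved from punct to alpha: {moved}")
```

Under these assumptions the character has width 0 with the 15.1.0 data and width 1 with the 16.0.0 data, and the two ranges contain 13 + 20 = 33 code points, which matches the "Missing 33 characters" line for `punct`.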
..
ANSI_X3.4-1968	…
ANSI_X3.110-1983	…
ARMSCII-8	…
ASMO_449	…
BIG5	…
BIG5-HKSCS	…
BRF	…
BS_4730	…
BS_VIEWDATA	…
CP737	…
CP770	…
CP771	…
CP772	…
CP773	…
CP774	…
CP775	…
CP949	…
CP1125	…
CP1250	…
CP1251	…
CP1252	…
CP1253	…
CP1254	…
CP1255	…
CP1256	…
CP1257	…
CP1258	…
CP10007	…
CSA_Z243.4-1985-1	…
CSA_Z243.4-1985-2	…
CSA_Z243.4-1985-GR	…
CSN_369103	…
CWI	…
DEC-MCS	…
DIN_66003	…
DS_2089	…
EBCDIC-AT-DE	…
EBCDIC-AT-DE-A	…
EBCDIC-CA-FR	…
EBCDIC-DK-NO	…
EBCDIC-DK-NO-A	…
EBCDIC-ES	…
EBCDIC-ES-A	…
EBCDIC-ES-S	…
EBCDIC-FI-SE	…
EBCDIC-FI-SE-A	…
EBCDIC-FR	…
EBCDIC-IS-FRISS	…
EBCDIC-IT	…
EBCDIC-PT	…
EBCDIC-UK	…
EBCDIC-US	…
ECMA-CYRILLIC	…
ES	…
ES2	…
EUC-JISX0213	…
EUC-JP	…
EUC-JP-MS	…
EUC-KR	…
EUC-TW	…
GB2312	…
GB18030	…
GBK	…
GB_1988-80	…
GEORGIAN-ACADEMY	…
GEORGIAN-PS	…
GOST_19768-74	…
GREEK-CCITT	…
GREEK7	…
GREEK7-OLD	…
HP-GREEK8	…
HP-ROMAN8	…
HP-ROMAN9	…
HP-THAI8	…
HP-TURKISH8	…
IBM037	…
IBM038	…
IBM256	…
IBM273	…
IBM274	…
IBM275	…
IBM277	…
IBM278	…
IBM280	…
IBM281	…
IBM284	…
IBM285	…
IBM290	…
IBM297	…
IBM420	…
IBM423	…
IBM424	…
IBM437	…
IBM500	…
IBM850	…
IBM851	…
IBM852	…
IBM855	…
IBM856	…
IBM857	…
IBM858	…
IBM860	…
IBM861	…
IBM862	…
IBM863	…
IBM864	…
IBM865	…
IBM866	…
IBM866NAV	…
IBM868	…
IBM869	…
IBM870	…
IBM871	…
IBM874	…
IBM875	…
IBM880	…
IBM891	…
IBM903	…
IBM904	…
IBM905	…
IBM918	…
IBM922	…
IBM1004	…
IBM1026	…
IBM1047	…
IBM1124	…
IBM1129	…
IBM1132	…
IBM1133	…
IBM1160	…
IBM1161	…
IBM1162	…
IBM1163	…
IBM1164	…
IEC_P27-1	…
INIS	…
INIS-8	…
INIS-CYRILLIC	…
INVARIANT	…
ISIRI-3342	…
ISO-8859-1	…
ISO-8859-2	…
ISO-8859-3	…
ISO-8859-4	…
ISO-8859-5	…
ISO-8859-6	…
ISO-8859-7	…
ISO-8859-8	…
ISO-8859-9	…
ISO-8859-9E	…
ISO-8859-10	…
ISO-8859-11	…
ISO-8859-13	…
ISO-8859-14	…
ISO-8859-15	…
ISO-8859-16	…
ISO-IR-90	…
ISO-IR-197	…
ISO-IR-209	…
ISO_646.BASIC	…
ISO_646.IRV	…
ISO_2033-1983	…
ISO_5427	…
ISO_5427-EXT	…
ISO_5428	…
ISO_6937	…
ISO_6937-2-25	…
ISO_6937-2-ADD	…
ISO_8859-1,GL	…
ISO_8859-SUPP	…
ISO_10367-BOX	…
ISO_10646	…
ISO_11548-1	…
IT	…
JIS_C6220-1969-JP	…
JIS_C6220-1969-RO	…
JIS_C6229-1984-A	…
JIS_C6229-1984-B	…
JIS_C6229-1984-B-ADD	…
JIS_C6229-1984-HAND	…
JIS_C6229-1984-HAND-ADD	…
JIS_C6229-1984-KANA	…
JIS_X0201	…
JOHAB	…
JUS_I.B1.002	…
JUS_I.B1.003-MAC	…
JUS_I.B1.003-SERB	…
KOI-8	…
KOI8-R	…
KOI8-RU	…
KOI8-T	…
KOI8-U	…
KSC5636	…
LATIN-GREEK	…
LATIN-GREEK-1	…
MAC-CENTRALEUROPE	…
MAC-CYRILLIC	…
MAC-IS	…
MAC-SAMI	…
MAC-UK	…
MACINTOSH	…
MIK	…
MSZ_7795.3	…
NATS-DANO	…
NATS-DANO-ADD	…
NATS-SEFI	…
NATS-SEFI-ADD	…
NC_NC00-10	…
NEXTSTEP	…
NF_Z_62-010	…
NF_Z_62-010_1973	…
NS_4551-1	…
NS_4551-2	…
PT	…
PT2	…
PT154	…
RK1048	…
SAMI	…
SAMI-WS2	…
SEN_850200_B	…
SEN_850200_C	…
SHIFT_JIS	…
SHIFT_JISX0213	…
T.61-7BIT	…
T.61-8BIT	…
T.101-G2	…
TCVN5712-1	…
TIS-620	…
TSCII	…
UTF-8	Update to Unicode 16.0.0 [BZ #32168]	2024-09-27 14:43:38 +02:00
VIDEOTEX-SUPPL	…
VISCII	…
WINDOWS-31J	…