mirror of git://sourceware.org/git/glibc.git
Unicode 16.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 16.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Changes in CHARMAP and WIDTH:
Total added characters in newly generated CHARMAP: 5185
Total removed characters in newly generated WIDTH: 1
Total added characters in newly generated WIDTH: 170
The removed character from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA.
It changed like this:
UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;
UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;
EastAsianWidth.txt 15.1.0: 1171D..1171F ; N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
EastAsianWidth.txt 16.0.0: 1171E ; N # Mc AHOM CONSONANT SIGN MEDIAL RA
I.e., it changed from Mn (Mark, Nonspacing) to Mc (Mark, Spacing
Combining). It should therefore now have width 1 instead of 0, so its
removal from WIDTH is correct: characters not listed in WIDTH get
width 1 by default.
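
The width reasoning above can be sketched in Python. This is a simplified illustration, not the actual generator script: the helper names `general_category` and `default_width` are hypothetical, and the rule models only the case discussed here (nonspacing marks, Mn, get width 0; everything else falls back to the default width 1). East Asian wide characters and any other categories the real script handles are deliberately ignored.

```python
# Simplified sketch; helper names are hypothetical, not taken from glibc.
# The two records are the UnicodeData.txt lines quoted in this commit message.
OLD = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;"
NEW = "1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;"


def general_category(record: str) -> str:
    """Return the General_Category field (third field) of a UnicodeData.txt record."""
    return record.split(";")[2]


def default_width(record: str) -> int:
    """Assumed rule: Mn gets width 0; anything else keeps the default width 1."""
    return 0 if general_category(record) == "Mn" else 1


print(general_category(OLD), default_width(OLD))  # Mn 0
print(general_category(NEW), default_width(NEW))  # Mc 1
```

Under this assumed rule the 16.0.0 record no longer needs an explicit WIDTH entry, which matches the single removal reported above.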
Nothing suspicious when browsing the list of the 170 added characters.
Changes in ctype:
alpha: Added 4452 characters in new ctype which were not in old ctype
combining: Added 51 characters in new ctype which were not in old ctype
combining_level3: Added 43 characters in new ctype which were not in old ctype
graph: Added 5185 characters in new ctype which were not in old ctype
lower: Added 25 characters in new ctype which were not in old ctype
print: Added 5185 characters in new ctype which were not in old ctype
punct: Missing 33 characters of old ctype in new ctype
punct: Added 766 characters in new ctype which were not in old ctype
tolower: Added 27 characters in new ctype which were not in old ctype
totitle: Added 27 characters in new ctype which were not in old ctype
toupper: Added 27 characters in new ctype which were not in old ctype
upper: Added 27 characters in new ctype which were not in old ctype
Nothing suspicious in the additions.
About the 33 characters removed from `punct`:
U+0363 - U+036F are identical in UnicodeData.txt 15.1.0 and 16.0.0. The difference is in DerivedCoreProperties.txt:
DerivedCoreProperties.txt 15.1.0: not there.
DerivedCoreProperties.txt 16.0.0: 0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
That is why they were added to `alpha` and removed from `punct`.
Same for U+1DD3 - U+1DE6, they are identical in UnicodeData.txt but there is a difference in DerivedCoreProperties.txt:
DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4 ; Alphabetic # Mn [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4 ; Alphabetic # Mn [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
So they became `Alphabetic` and were thus added to `alpha` and removed from `punct`.
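
The count of 33 can be cross-checked from the DerivedCoreProperties.txt lines quoted above. The following is a minimal sketch, assuming only those quoted lines as input; `parse_alphabetic` is a hypothetical helper, not part of the glibc generator scripts.

```python
# Sketch only: parse_alphabetic is a hypothetical helper, not part of glibc.
# The inputs are the DerivedCoreProperties.txt lines quoted in this commit message.
OLD_LINES = [
    "1DE7..1DF4 ; Alphabetic # Mn [14] COMBINING LATIN SMALL LETTER ALPHA.."
    "COMBINING LATIN SMALL LETTER U WITH DIAERESIS",
]
NEW_LINES = [
    "0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL LETTER A.."
    "COMBINING LATIN SMALL LETTER X",
    "1DD3..1DF4 ; Alphabetic # Mn [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE.."
    "COMBINING LATIN SMALL LETTER U WITH DIAERESIS",
]


def parse_alphabetic(lines):
    """Collect every code point that a line marks as Alphabetic."""
    code_points = set()
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        if ";" not in data:
            continue
        range_part, prop = (field.strip() for field in data.split(";", 1))
        if prop != "Alphabetic":
            continue
        first, _, last = range_part.partition("..")
        # A single code point has no "..", so the range ends where it starts.
        code_points.update(range(int(first, 16), int(last or first, 16) + 1))
    return code_points


newly_alphabetic = parse_alphabetic(NEW_LINES) - parse_alphabetic(OLD_LINES)
print(len(newly_alphabetic))  # 33
print(f"U+{min(newly_alphabetic):04X}", f"U+{max(newly_alphabetic):04X}")  # U+0363 U+1DE6
```

The 13 code points in U+0363 - U+036F plus the 20 in U+1DD3 - U+1DE6 account for all 33 characters that moved from `punct` to `alpha`.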
Resolves: BZ #32168
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
| Name | | |
|---|---|---|
| ANSI_X3.4-1968 | ||
| ANSI_X3.110-1983 | ||
| ARMSCII-8 | ||
| ASMO_449 | ||
| BIG5 | ||
| BIG5-HKSCS | ||
| BRF | ||
| BS_4730 | ||
| BS_VIEWDATA | ||
| CP737 | ||
| CP770 | ||
| CP771 | ||
| CP772 | ||
| CP773 | ||
| CP774 | ||
| CP775 | ||
| CP949 | ||
| CP1125 | ||
| CP1250 | ||
| CP1251 | ||
| CP1252 | ||
| CP1253 | ||
| CP1254 | ||
| CP1255 | ||
| CP1256 | ||
| CP1257 | ||
| CP1258 | ||
| CP10007 | ||
| CSA_Z243.4-1985-1 | ||
| CSA_Z243.4-1985-2 | ||
| CSA_Z243.4-1985-GR | ||
| CSN_369103 | ||
| CWI | ||
| DEC-MCS | ||
| DIN_66003 | ||
| DS_2089 | ||
| EBCDIC-AT-DE | ||
| EBCDIC-AT-DE-A | ||
| EBCDIC-CA-FR | ||
| EBCDIC-DK-NO | ||
| EBCDIC-DK-NO-A | ||
| EBCDIC-ES | ||
| EBCDIC-ES-A | ||
| EBCDIC-ES-S | ||
| EBCDIC-FI-SE | ||
| EBCDIC-FI-SE-A | ||
| EBCDIC-FR | ||
| EBCDIC-IS-FRISS | ||
| EBCDIC-IT | ||
| EBCDIC-PT | ||
| EBCDIC-UK | ||
| EBCDIC-US | ||
| ECMA-CYRILLIC | ||
| ES | ||
| ES2 | ||
| EUC-JISX0213 | ||
| EUC-JP | ||
| EUC-JP-MS | ||
| EUC-KR | ||
| EUC-TW | ||
| GB2312 | ||
| GB18030 | ||
| GBK | ||
| GB_1988-80 | ||
| GEORGIAN-ACADEMY | ||
| GEORGIAN-PS | ||
| GOST_19768-74 | ||
| GREEK-CCITT | ||
| GREEK7 | ||
| GREEK7-OLD | ||
| HP-GREEK8 | ||
| HP-ROMAN8 | ||
| HP-ROMAN9 | ||
| HP-THAI8 | ||
| HP-TURKISH8 | ||
| IBM037 | ||
| IBM038 | ||
| IBM256 | ||
| IBM273 | ||
| IBM274 | ||
| IBM275 | ||
| IBM277 | ||
| IBM278 | ||
| IBM280 | ||
| IBM281 | ||
| IBM284 | ||
| IBM285 | ||
| IBM290 | ||
| IBM297 | ||
| IBM420 | ||
| IBM423 | ||
| IBM424 | ||
| IBM437 | ||
| IBM500 | ||
| IBM850 | ||
| IBM851 | ||
| IBM852 | ||
| IBM855 | ||
| IBM856 | ||
| IBM857 | ||
| IBM858 | ||
| IBM860 | ||
| IBM861 | ||
| IBM862 | ||
| IBM863 | ||
| IBM864 | ||
| IBM865 | ||
| IBM866 | ||
| IBM866NAV | ||
| IBM868 | ||
| IBM869 | ||
| IBM870 | ||
| IBM871 | ||
| IBM874 | ||
| IBM875 | ||
| IBM880 | ||
| IBM891 | ||
| IBM903 | ||
| IBM904 | ||
| IBM905 | ||
| IBM918 | ||
| IBM922 | ||
| IBM1004 | ||
| IBM1026 | ||
| IBM1047 | ||
| IBM1124 | ||
| IBM1129 | ||
| IBM1132 | ||
| IBM1133 | ||
| IBM1160 | ||
| IBM1161 | ||
| IBM1162 | ||
| IBM1163 | ||
| IBM1164 | ||
| IEC_P27-1 | ||
| INIS | ||
| INIS-8 | ||
| INIS-CYRILLIC | ||
| INVARIANT | ||
| ISIRI-3342 | ||
| ISO-8859-1 | ||
| ISO-8859-2 | ||
| ISO-8859-3 | ||
| ISO-8859-4 | ||
| ISO-8859-5 | ||
| ISO-8859-6 | ||
| ISO-8859-7 | ||
| ISO-8859-8 | ||
| ISO-8859-9 | ||
| ISO-8859-9E | ||
| ISO-8859-10 | ||
| ISO-8859-11 | ||
| ISO-8859-13 | ||
| ISO-8859-14 | ||
| ISO-8859-15 | ||
| ISO-8859-16 | ||
| ISO-IR-90 | ||
| ISO-IR-197 | ||
| ISO-IR-209 | ||
| ISO_646.BASIC | ||
| ISO_646.IRV | ||
| ISO_2033-1983 | ||
| ISO_5427 | ||
| ISO_5427-EXT | ||
| ISO_5428 | ||
| ISO_6937 | ||
| ISO_6937-2-25 | ||
| ISO_6937-2-ADD | ||
| ISO_8859-1,GL | ||
| ISO_8859-SUPP | ||
| ISO_10367-BOX | ||
| ISO_10646 | ||
| ISO_11548-1 | ||
| IT | ||
| JIS_C6220-1969-JP | ||
| JIS_C6220-1969-RO | ||
| JIS_C6229-1984-A | ||
| JIS_C6229-1984-B | ||
| JIS_C6229-1984-B-ADD | ||
| JIS_C6229-1984-HAND | ||
| JIS_C6229-1984-HAND-ADD | ||
| JIS_C6229-1984-KANA | ||
| JIS_X0201 | ||
| JOHAB | ||
| JUS_I.B1.002 | ||
| JUS_I.B1.003-MAC | ||
| JUS_I.B1.003-SERB | ||
| KOI-8 | ||
| KOI8-R | ||
| KOI8-RU | ||
| KOI8-T | ||
| KOI8-U | ||
| KSC5636 | ||
| LATIN-GREEK | ||
| LATIN-GREEK-1 | ||
| MAC-CENTRALEUROPE | ||
| MAC-CYRILLIC | ||
| MAC-IS | ||
| MAC-SAMI | ||
| MAC-UK | ||
| MACINTOSH | ||
| MIK | ||
| MSZ_7795.3 | ||
| NATS-DANO | ||
| NATS-DANO-ADD | ||
| NATS-SEFI | ||
| NATS-SEFI-ADD | ||
| NC_NC00-10 | ||
| NEXTSTEP | ||
| NF_Z_62-010 | ||
| NF_Z_62-010_1973 | ||
| NS_4551-1 | ||
| NS_4551-2 | ||
| PT | ||
| PT2 | ||
| PT154 | ||
| RK1048 | ||
| SAMI | ||
| SAMI-WS2 | ||
| SEN_850200_B | ||
| SEN_850200_C | ||
| SHIFT_JIS | ||
| SHIFT_JISX0213 | ||
| T.61-7BIT | ||
| T.61-8BIT | ||
| T.101-G2 | ||
| TCVN5712-1 | ||
| TIS-620 | ||
| TSCII | ||
| UTF-8 | ||
| VIDEOTEX-SUPPL | ||
| VISCII | ||
| WINDOWS-31J | ||