Commit Graph

13 Commits

Author SHA1 Message Date
Lars Knoll 13af1312f7 Add QStringConverter::encodingForData()
Add method that tries to determine the encoding of the data
from an initial byte order mark.

Change-Id: I348c51a3d4db9b434af53359b739a7e17acfc760
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:48:55 +02:00
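The byte order mark check described in 13af1312f7 can be illustrated with a minimal standalone sketch. It is not Qt's implementation: `SketchEncoding` and `sniffBom` are hypothetical names used only here, and the sketch covers only the standard Unicode BOMs.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical enum for this sketch; not Qt's QStringConverter::Encoding.
enum class SketchEncoding { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Returns the encoding implied by a leading byte order mark, if there is one.
std::optional<SketchEncoding> sniffBom(std::string_view data)
{
    using namespace std::string_view_literals;
    auto startsWith = [&](std::string_view bom) {
        return data.size() >= bom.size() && data.compare(0, bom.size(), bom) == 0;
    };
    // UTF-32 must be checked before UTF-16: the UTF-32LE BOM (FF FE 00 00)
    // begins with the UTF-16LE BOM (FF FE).
    if (startsWith("\xFF\xFE\x00\x00"sv)) return SketchEncoding::Utf32LE;
    if (startsWith("\x00\x00\xFE\xFF"sv)) return SketchEncoding::Utf32BE;
    if (startsWith("\xEF\xBB\xBF"sv))     return SketchEncoding::Utf8;
    if (startsWith("\xFF\xFE"sv))         return SketchEncoding::Utf16LE;
    if (startsWith("\xFE\xFF"sv))         return SketchEncoding::Utf16BE;
    return std::nullopt;  // no BOM: the caller has to pick a default
}
```

Returning an optional keeps "no BOM found" distinct from any real encoding, so the caller decides what the fallback should be.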
Lars Knoll a639bcda1e Add methods to convert between encoding and name to QStringConverter
Add static methods that allow converting between a name for an
encoding and the Encoding enum.

Change-Id: I12bc503cf757ea31d3ca8d5e1f1216efddcb16d4
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:48:49 +02:00
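One way to picture the name/enum conversion from a639bcda1e is a small lookup table that is searched in both directions. The sketch below is an assumption-laden illustration, not the Qt code: `SketchEncoding`, `nameFor`, and `encodingFor` are hypothetical, and the case-insensitive matching is a design choice of this sketch.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical enum and table for this sketch only.
enum class SketchEncoding { Utf8, Utf16, Latin1 };

struct NameEntry {
    SketchEncoding encoding;
    std::string_view name;
};

constexpr std::array<NameEntry, 3> kNames{{
    {SketchEncoding::Utf8, "UTF-8"},
    {SketchEncoding::Utf16, "UTF-16"},
    {SketchEncoding::Latin1, "ISO-8859-1"},
}};

// ASCII case-insensitive comparison, so "utf-8" matches "UTF-8".
bool equalsIgnoreCase(std::string_view a, std::string_view b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i]))
            != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Enum to name: every enumerator has an entry in the table.
std::string_view nameFor(SketchEncoding e)
{
    for (const NameEntry& entry : kNames)
        if (entry.encoding == e)
            return entry.name;
    return {};
}

// Name to enum: unknown names yield an empty optional.
std::optional<SketchEncoding> encodingFor(std::string_view name)
{
    for (const NameEntry& entry : kNames)
        if (equalsIgnoreCase(entry.name, name))
            return entry.encoding;
    return std::nullopt;
}
```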
Lars Knoll 3ce9162ab5 Construct a string converter by name
Add a constructor that allows constructing a string converter by
name. This is required in some cases and also makes it possible to
extend the API to 3rd party encodings in the future.

Also add a name() accessor.

Change-Id: I606d6ce9405ee967f76197b803615e27c5b001cf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:48:42 +02:00
Lars Knoll 4b2edde373 Cleanup QUtf32::convertToUnicode
Clean up the implementation and improve performance by
handling the first char outside of the main loop.

Also avoid one copy of the data when using QStringConverter.

Change-Id: Ie698e62de1864352612a4dddc907cb139e7e6407
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:48:07 +02:00
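The idea of handling the first character outside the main loop (4b2edde373) can be sketched as follows: the BOM check happens once on the first code unit, so the loop body never has to test for it. This is a standalone illustration, not the QUtf32 code. The function names are hypothetical, the output is UTF-16, and defaulting to big-endian when no BOM is present is an assumption of this sketch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Reads one 32-bit code unit from four bytes in the given byte order.
static char32_t readUnit(const unsigned char* p, bool bigEndian)
{
    return bigEndian
        ? (char32_t(p[0]) << 24) | (char32_t(p[1]) << 16) | (char32_t(p[2]) << 8) | char32_t(p[3])
        : (char32_t(p[3]) << 24) | (char32_t(p[2]) << 16) | (char32_t(p[1]) << 8) | char32_t(p[0]);
}

// Appends one code point as UTF-16; invalid values become U+FFFD.
static void appendUtf16(std::u16string& out, char32_t c)
{
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        out.push_back(char16_t(0xFFFD));
    } else if (c >= 0x10000) {
        c -= 0x10000;
        out.push_back(char16_t(0xD800 + (c >> 10)));
        out.push_back(char16_t(0xDC00 + (c & 0x3FF)));
    } else {
        out.push_back(char16_t(c));
    }
}

std::u16string decodeUtf32Sketch(std::string_view bytes)
{
    std::u16string out;
    const auto* p = reinterpret_cast<const unsigned char*>(bytes.data());
    const std::size_t units = bytes.size() / 4;  // trailing partial unit is ignored
    if (units == 0)
        return out;

    // First unit, handled once: detect and skip a BOM, fix the byte order.
    bool bigEndian = true;                       // assumption when no BOM is present
    std::size_t i = 0;
    const char32_t first = readUnit(p, true);
    if (first == 0x0000FEFF) {
        i = 1;                                   // big-endian BOM
    } else if (first == 0xFFFE0000) {
        bigEndian = false;                       // little-endian BOM read as big-endian
        i = 1;
    }

    // Main loop: no BOM checks left in here.
    for (; i < units; ++i)
        appendUtf16(out, readUnit(p + 4 * i, bigEndian));
    return out;
}
```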
Lars Knoll b1d8ce32cd Refactor QUtf32::convertFromUnicode
Implement proper state handling, and avoid a copy when using
it through QStringConverter.

Change-Id: I201fe966601c424c337e452e359a2e71f76354ad
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:48:00 +02:00
Lars Knoll d8997ad797 Clean up QUtf16::convertTo/FromUnicode
Clean up the method, and refactor it so we can avoid one
copy of the data when using QStringConverter.

Make the conversion to unicode more efficient by avoiding conditions in
the inner loop and doing a memcpy if endianness matches.

Change-Id: I869daf861f886d69b67a1b223ac2238498b609ac
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:47:53 +02:00
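A rough sketch of the memcpy idea from d8997ad797: when the data's byte order matches the host's, the bytes can be copied straight into the UTF-16 buffer, and otherwise a single unconditional swap pass follows the copy. The names `hostIsLittleEndian` and `decodeUtf16Sketch` are hypothetical, and the byte order is passed in by the caller rather than detected from a BOM.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>

static_assert(sizeof(char16_t) == 2, "sketch assumes 16-bit char16_t storage");

// Detects the host byte order by looking at the first byte of the value 1.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char firstByte = 0;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

std::u16string decodeUtf16Sketch(std::string_view bytes, bool dataIsLittleEndian)
{
    const std::size_t units = bytes.size() / 2;  // an odd trailing byte is ignored
    std::u16string out(units, u'\0');
    if (units == 0)
        return out;

    // Bulk copy; this is already the final result if the byte orders match.
    std::memcpy(out.data(), bytes.data(), units * 2);

    // The endianness decision is made once, outside the loop.
    if (dataIsLittleEndian != hostIsLittleEndian()) {
        for (char16_t& u : out)
            u = static_cast<char16_t>((u >> 8) | (u << 8));
    }
    return out;
}
```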
Lars Knoll 5dcfd0ac2f Cleanup state handling in QUtf8::convertFromUnicode
And optimize the method so we can avoid a copy of
the data.

Change-Id: Ic267150db80358dbc4010bb1db2af5c0eb97dc65
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:47:47 +02:00
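To show why an encoder from UTF-16 to UTF-8 needs state at all, here is a minimal sketch in which a high surrogate at the end of one chunk is remembered until the next chunk arrives. It is not the QUtf8 code: `SketchState` and `encodeChunk` are hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of the sketch.

```cpp
#include <string>
#include <string_view>

// Hypothetical converter state: a high surrogate waiting for its partner.
struct SketchState {
    char16_t pendingHigh = 0;
};

// Appends one code point (assumed valid) as UTF-8.
static void appendUtf8(std::string& out, char32_t c)
{
    if (c < 0x80) {
        out.push_back(char(c));
    } else if (c < 0x800) {
        out.push_back(char(0xC0 | (c >> 6)));
        out.push_back(char(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
        out.push_back(char(0xE0 | (c >> 12)));
        out.push_back(char(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(char(0x80 | (c & 0x3F)));
    } else {
        out.push_back(char(0xF0 | (c >> 18)));
        out.push_back(char(0x80 | ((c >> 12) & 0x3F)));
        out.push_back(char(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(char(0x80 | (c & 0x3F)));
    }
}

// Encodes one chunk; a trailing high surrogate is kept in the state.
std::string encodeChunk(std::u16string_view chunk, SketchState& state)
{
    std::string out;
    for (char16_t u : chunk) {
        if (state.pendingHigh != 0) {
            const char16_t high = state.pendingHigh;
            state.pendingHigh = 0;
            if (u >= 0xDC00 && u <= 0xDFFF) {
                appendUtf8(out, 0x10000 + (char32_t(high - 0xD800) << 10) + (u - 0xDC00));
                continue;
            }
            appendUtf8(out, 0xFFFD);             // unpaired high surrogate
        }
        if (u >= 0xD800 && u <= 0xDBFF)
            state.pendingHigh = u;               // wait for the low surrogate
        else if (u >= 0xDC00 && u <= 0xDFFF)
            appendUtf8(out, 0xFFFD);             // unpaired low surrogate
        else
            appendUtf8(out, u);
    }
    return out;
}
```

Because the function always receives a state object, the loop has no "is there a state?" branch, which is the kind of simplification a guaranteed valid state allows.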
Lars Knoll 618620bc5d Ensure the conversion methods in qstringconverter always get a valid state
Make sure that the conversion methods always get a valid state. This is
already the case when using the new QStringConverter API; ensure the
old QTextCodec API also passes in a valid state.

This helps simplify the logic inside those methods.

Change-Id: I1945e98cdefd46bf1427e11984337f1d62abcaa2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
2020-05-14 07:47:40 +02:00
Lars Knoll cab0d57d1e Clean up the Flag handling in QStringConverter
IgnoreHeader was a rather badly defined enum. In addition, the
utf8 and utf16 codecs were handling BOMs somewhat differently
for stateless decoding.

Fix this by introducing explicit flags for writing a bom when
encoding and not skipping the initial bom when decoding.

Source compatibility for QTextCodec is done with a couple of
static constexpr variables.

Change-Id: I0b2d94f84c937cec1e0494c16ef448c00382691d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:47:33 +02:00
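The two explicit flags described in cab0d57d1e (write a BOM when encoding, do not skip the initial BOM when decoding) can be pictured with this small UTF-8 sketch. The flag names `WriteBom` and `KeepInitialBom` and both functions are hypothetical names for illustration and are not taken from the commit.

```cpp
#include <string>
#include <string_view>

// Hypothetical flags for this sketch; each direction has its own explicit bit.
enum SketchFlags : unsigned {
    Default        = 0,
    WriteBom       = 1u << 0,  // encoding: prepend a BOM to the output
    KeepInitialBom = 1u << 1,  // decoding: do not strip a leading BOM
};

constexpr std::string_view kUtf8Bom = "\xEF\xBB\xBF";

// Encoding side: a BOM is written only when explicitly requested.
std::string frameForOutput(std::string_view utf8, unsigned flags)
{
    std::string out;
    if (flags & WriteBom)
        out.append(kUtf8Bom);
    out.append(utf8);
    return out;
}

// Decoding side: a leading BOM is skipped unless the caller asks to keep it.
std::string_view payloadOfInput(std::string_view utf8, unsigned flags)
{
    const bool hasBom = utf8.substr(0, kUtf8Bom.size()) == kUtf8Bom;
    if (hasBom && !(flags & KeepInitialBom))
        utf8.remove_prefix(kUtf8Bom.size());
    return utf8;
}
```

Separate flags for the two directions avoid the ambiguity of a single "ignore header" switch, which has to mean different things for encoding and decoding.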
Lars Knoll 7b93bedb60 Add Latin1 to the set of supported encodings in QStringConverter
Latin1 is the only non-Unicode encoding that is still being used
to some extent. Current web site statistics show that it is
being used in ~2% of all web sites. An additional 1% of web sites
use Windows1251 (which is almost the same as latin1).

As it's trivial to support this encoding, we keep it supported
in QStringConverter.

Change-Id: I0eff53a490b6c43d3e474107e7823be245d1715a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:46:51 +02:00
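The reason Latin1 support is trivial is that each Latin1 byte value equals its Unicode code point, so decoding is a widening copy and encoding is a narrowing copy with a fallback. The following is a minimal sketch with hypothetical function names; using '?' for characters outside Latin1 is an assumption of the sketch.

```cpp
#include <string>
#include <string_view>

// Latin1 -> UTF-16: every byte maps to the code point with the same value.
std::u16string fromLatin1(std::string_view bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// UTF-16 -> Latin1: code units above 0xFF have no Latin1 representation.
std::string toLatin1(std::u16string_view text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t c : text)
        out.push_back(c <= 0xFF ? static_cast<char>(c) : '?');
    return out;
}
```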
Lars Knoll 94e210faea Move local8bit conversion over to qutfsupport
Local8Bit is always UTF-8 except on Windows platforms.
Also add a Locale encoding to QStringConverter.

Change-Id: I8d729931fd4c1d7fc6857696b6442a44def3fd9d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:46:45 +02:00
Lars Knoll ea0a08c898 Move the UTF conversion methods to qstringconverter
Separate them from the qutfcodec, so that the codec
can later on be moved out of Qt Core.

Fix the QUtf methods to take qsizetype instead of int
for length arguments.

This also makes it possible to not build QTextCodec into
the bootstrap lib anymore.

Change-Id: I0b4f83139d61b19c651520a2f3a5012aa7e85cb8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:46:38 +02:00
Lars Knoll f64a6bd638 Start work on a new API to replace QTextCodec
The new QStringEncoder and QStringDecoder classes
(with a common QStringConverter base class) are
there to replace QTextCodec in Qt 6.

It currently uses a trivial wrapper around the utf
encoding functionality.

Added some autotests, mostly copied from the text codec
tests.

Change-Id: Ib6eeee55fba918b9424be244cbda9dfd5096f7eb
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-05-14 07:46:14 +02:00