Add method that tries to determine the encoding of the data
from an initial byte order mark.
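For illustration, BOM-based detection can be sketched roughly as below. This is a standalone sketch using only the standard library, not the code added by this change; the Encoding enum and the encodingForBom() name are hypothetical.

```cpp
#include <cstddef>

// Hypothetical enum for this sketch; not the actual Qt declaration.
enum class Encoding { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE, Unknown };

// Returns the encoding implied by a leading byte order mark, or
// Encoding::Unknown if the data does not start with one.
Encoding encodingForBom(const unsigned char *data, std::size_t size)
{
    // Check the 4-byte UTF-32 marks first: the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    if (size >= 4) {
        if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            return Encoding::Utf32LE;
        if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
            return Encoding::Utf32BE;
    }
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return Encoding::Utf8;
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE)
            return Encoding::Utf16LE;
        if (data[0] == 0xFE && data[1] == 0xFF)
            return Encoding::Utf16BE;
    }
    return Encoding::Unknown;
}
```

Note that UTF-16LE text starting with a BOM followed by U+0000 is indistinguishable from a UTF-32LE BOM; the sketch resolves that ambiguity in favour of UTF-32.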
Change-Id: I348c51a3d4db9b434af53359b739a7e17acfc760
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add static methods that allow converting between a name for an
encoding and the Encoding enum.
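A name/enum mapping of this kind can be sketched with a small lookup table. The code below is a standalone illustration, not the methods added here: the enum values, the table contents and the function names encodingForName() and nameForEncoding() are assumptions, as is the choice to compare names case-insensitively.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum and table for this sketch.
enum class Encoding { Utf8, Utf16, Latin1, Unknown };

struct EncodingName {
    const char *name;
    Encoding encoding;
};

static const EncodingName encodingNames[] = {
    { "UTF-8", Encoding::Utf8 },
    { "UTF-16", Encoding::Utf16 },
    { "ISO-8859-1", Encoding::Latin1 },
};

// Case-insensitive comparison of two ASCII names.
static bool sameName(const std::string &a, const std::string &b)
{
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i]))
            != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Name to enum; Encoding::Unknown if the name is not in the table.
Encoding encodingForName(const std::string &name)
{
    for (const EncodingName &entry : encodingNames) {
        if (sameName(name, entry.name))
            return entry.encoding;
    }
    return Encoding::Unknown;
}

// Enum to name; nullptr if the encoding has no table entry.
const char *nameForEncoding(Encoding encoding)
{
    for (const EncodingName &entry : encodingNames) {
        if (entry.encoding == encoding)
            return entry.name;
    }
    return nullptr;
}
```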
Change-Id: I12bc503cf757ea31d3ca8d5e1f1216efddcb16d4
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Add a constructor that allows constructing a string converter by
name. This is required in some cases and also makes it possible to
(in the future) extend the API to 3rd party encodings.
Also add a name() accessor.
Change-Id: I606d6ce9405ee967f76197b803615e27c5b001cf
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Clean up the implementation and improve performance by
handling the first char outside of the main loop.
Also avoid one copy of the data when using QStringConverter.
Change-Id: Ie698e62de1864352612a4dddc907cb139e7e6407
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Implement proper state handling, and avoid a copy when using
it through QStringConverter.
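As an illustration of what state handling means for chunked input, the sketch below carries an incomplete code unit from one call to the next. It assumes little-endian UTF-16 input purely for the example; the DecoderState struct and decodeUtf16LeChunk() are hypothetical and do not reflect the codec touched by this change.

```cpp
#include <cstddef>
#include <string>

// Hypothetical state object: remembers a byte left over from the previous chunk.
struct DecoderState {
    bool hasPendingByte = false;
    unsigned char pendingByte = 0;
};

// Decodes little-endian UTF-16 bytes. If a chunk ends in the middle of a
// code unit, the odd byte is stored in the state and completed by the
// first byte of the next chunk.
std::u16string decodeUtf16LeChunk(const unsigned char *bytes, std::size_t size,
                                  DecoderState &state)
{
    std::u16string out;
    std::size_t i = 0;
    if (state.hasPendingByte && size > 0) {
        // Finish the code unit that was split across two chunks.
        out.push_back(static_cast<char16_t>(state.pendingByte | (bytes[0] << 8)));
        state.hasPendingByte = false;
        i = 1;
    }
    for (; i + 1 < size; i += 2)
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    if (i < size) {
        // One byte left: keep it for the next call.
        state.pendingByte = bytes[i];
        state.hasPendingByte = true;
    }
    return out;
}
```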
Change-Id: I201fe966601c424c337e452e359a2e71f76354ad
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Clean up the method, and refactor it so we can avoid one
copy of the data when using QStringConverter.
Make the conversion to Unicode more efficient by avoiding conditions in
the inner loop and doing a memcpy if endianness matches.
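The idea can be sketched as follows, assuming raw UTF-16 bytes as input and a 16-bit char16_t. This is a standalone illustration rather than the refactored method; hostIsLittleEndian() and utf16FromBytes() are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Detects the host byte order at run time.
static bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Converts raw UTF-16 bytes into code units; a trailing odd byte is ignored.
std::vector<char16_t> utf16FromBytes(const unsigned char *bytes, std::size_t size,
                                     bool dataIsLittleEndian)
{
    const std::size_t count = size / 2;
    std::vector<char16_t> out(count);
    if (count == 0)
        return out;
    if (dataIsLittleEndian == hostIsLittleEndian()) {
        // Same byte order: one bulk copy, no per-unit work.
        std::memcpy(out.data(), bytes, count * sizeof(char16_t));
    } else {
        // Different byte order: the position of the high byte is decided
        // once, so the loop body itself contains no condition.
        const std::size_t hi = dataIsLittleEndian ? 1 : 0;
        const std::size_t lo = 1 - hi;
        for (std::size_t i = 0; i < count; ++i)
            out[i] = static_cast<char16_t>((bytes[2 * i + hi] << 8) | bytes[2 * i + lo]);
    }
    return out;
}
```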
Change-Id: I869daf861f886d69b67a1b223ac2238498b609ac
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
And optimize the method so we can avoid a copy of
the data.
Change-Id: Ic267150db80358dbc4010bb1db2af5c0eb97dc65
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Make sure that the conversion methods always get a valid state. This is
already the case when using the new QStringConverter API; ensure the
old QTextCodec API also passes in a valid state.
This helps simplify the logic inside those methods.
Change-Id: I1945e98cdefd46bf1427e11984337f1d62abcaa2
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Reviewed-by: Alex Blasche <alexander.blasche@qt.io>
IgnoreHeader was a rather badly defined enum. In addition, the
utf8 and utf16 codecs were handling BOMs somewhat differently
for stateless decoding.
Fix this by introducing explicit flags for writing a BOM when
encoding and not skipping the initial BOM when decoding.
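The intended behaviour of the two flags can be sketched as below. The flag names WriteBom and KeepInitialBom and both functions are hypothetical and chosen for this example only; the sketch works on host-order UTF-16 code units to keep the BOM handling visible.

```cpp
#include <string>

// Hypothetical flag names for this sketch.
enum ConverterFlag : unsigned {
    DefaultConversion = 0x0,
    WriteBom          = 0x1, // encoding: emit a BOM before the data
    KeepInitialBom    = 0x2  // decoding: do not strip a leading BOM
};

const char16_t ByteOrderMark = 0xFEFF;

// "Encodes" to host-order UTF-16 code units, optionally prefixed by a BOM.
std::u16string encodeUtf16(const std::u16string &text, unsigned flags)
{
    std::u16string out;
    if (flags & WriteBom)
        out.push_back(ByteOrderMark);
    out += text;
    return out;
}

// Decodes host-order UTF-16 code units, stripping a leading BOM unless
// KeepInitialBom is set.
std::u16string decodeUtf16(const std::u16string &data, unsigned flags)
{
    const bool hasBom = !data.empty() && data[0] == ByteOrderMark;
    if (hasBom && !(flags & KeepInitialBom))
        return data.substr(1);
    return data;
}
```

With explicit flags, both directions have a defined default (no BOM written, initial BOM skipped) and each deviation is requested by name.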
Source compatibility for QTextCodec is done with a couple of
static constexpr variables.
Change-Id: I0b2d94f84c937cec1e0494c16ef448c00382691d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Latin1 is the only non-Unicode encoding that is still being used
to some extent. Current web site statistics show that it is
being used in ~2% of all web sites. An additional 1% of web sites
use Windows-1252 (which is almost the same as Latin1).
As it's trivial to support this encoding, we keep it supported
in QStringConverter.
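To show why the support is trivial: every Latin1 byte maps directly to the Unicode code point with the same value. The following standalone sketch is not the QStringConverter code; the function names are hypothetical, and replacing unrepresentable characters with '?' is an assumption of the example.

```cpp
#include <string>

// Latin1 to UTF-16: each byte becomes the code unit with the same value.
std::u16string latin1ToUtf16(const std::string &latin1)
{
    std::u16string out;
    out.reserve(latin1.size());
    for (char c : latin1)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    return out;
}

// UTF-16 to Latin1: code units above 0xFF cannot be represented and are
// replaced by '?' in this sketch.
std::string utf16ToLatin1(const std::u16string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t c : text)
        out.push_back(c <= 0xFF ? static_cast<char>(static_cast<unsigned char>(c)) : '?');
    return out;
}
```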
Change-Id: I0eff53a490b6c43d3e474107e7823be245d1715a
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Local8Bit is UTF-8 on all platforms except Windows.
Also add a Locale encoding to QStringConverter.
Change-Id: I8d729931fd4c1d7fc6857696b6442a44def3fd9d
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Separate them from the qutfcodec, so that the codec
can later on be moved out of Qt Core.
Fix the QUtf methods to take qsizetype instead of int
for length arguments.
This also makes it possible to not build QTextCodec into
the bootstrap lib anymore.
Change-Id: I0b4f83139d61b19c651520a2f3a5012aa7e85cb8
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
The new QStringEncoder and QStringDecoder classes
(with a common QStringConverter base class) are
there to replace QTextCodec in Qt 6.
They currently use a trivial wrapper around the UTF
encoding functionality.
Added some autotests, mostly copied from the text codec
tests.
Change-Id: Ib6eeee55fba918b9424be244cbda9dfd5096f7eb
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>