qtbase/src/corelib/tools/qhash.h

New QHash implementation (commit of 2020-01-17)

A brand new QHash implementation, using a faster and more memory efficient
data structure than the old QHash. Instead of the node based approach of the
old QHash, this implementation uses a two stage lookup table. The total
number of buckets in the table is divided into spans of 128 entries. Inside
each span, an array of chars indexes into a storage area for the span. The
storage area for each span is a simple array that gets (re-)allocated in
increments of 16 items. This gives an average memory overhead of
8*sizeof(struct{ Key; Value; }) + 128*sizeof(char) + 16 per span. To give
good performance and avoid too many collisions, the table keeps its load
factor between .25 and .5 (and grows and rehashes if the load factor goes
above .5). This design keeps the memory overhead of the hash very small while
at the same time giving very good performance. The calculated overhead for a
QHash<int, int> comes to 1.7-3.3 bytes per entry, and to 2.2-4.3 bytes for a
QHash<ptr, ptr>.

The new implementation also completely splits the QHash and QMultiHash
classes. One behavioral change to note is that the new QHash will not provide
stable references to nodes in the hash when the table needs to grow.

Benchmarking using https://github.com/Tessil/hash-table-shootout shows very
nice performance compared to many different hash table implementations.
Numbers shown below are for a hash<int64, int64> with 1 million entries.
These numbers scale nicely (mostly linearly, with some variation due to
varying load factors) to smaller and larger tables. All numbers are in
seconds, measured with gcc on Linux:

Hash table              random     random      random   random   reads     full
                        insertion  insertion   full     full     after     iteration
                                   (reserved)  deletes  reads    deletes
-------------------------------------------------------------------------------------
std::unordered_map      0.3842     0.1969      0.4511   0.1300   0.1169    0.0708
google::dense_hash_map  0.1091     0.0846      0.0550   0.0452   0.0754    0.0160
google::sparse_hash_map 0.2888     0.1582      0.0948   0.1020   0.1348    0.0112
tsl::sparse_map         0.1487     0.1013      0.0735   0.0448   0.0505    0.0042
old QHash               0.2886     0.1798      0.5065   0.0840   0.0717    0.1387
new QHash               0.0940     0.0714      0.1494   0.0579   0.0449    0.0146

Numbers for hash<std::string, int64>, with the string having 15 characters:

Hash table              random     random      random   random   reads
                        insertion  insertion   full     full     after
                                   (reserved)  deletes  reads    deletes
------------------------------------------------------------------------
std::unordered_map      0.4993     0.2563      0.5515   0.2950   0.2153
google::dense_hash_map  0.2691     0.1870      0.1547   0.1125   0.1622
google::sparse_hash_map 0.6979     0.3304      0.1884   0.1822   0.2122
tsl::sparse_map         0.4066     0.2586      0.1929   0.1146   0.1095
old QHash               0.3236     0.2064      0.5986   0.2115   0.1666
new QHash               0.2119     0.1652      0.2390   0.1378   0.0965

Memory usage numbers (in MB for a table with 1M entries) also look very nice:

Hash table              Key:   int64     std::string (15 chars)
                        Value: int64     int64
----------------------------------------------------------------
std::unordered_map             44.63     75.35
google::dense_hash_map         32.32     80.60
google::sparse_hash_map        18.08     44.21
tsl::sparse_map                20.44     45.93
old QHash                      53.95     69.16
new QHash                      23.23     51.32

Fixes: QTBUG-80311
Change-Id: I5679734144bc9bca2102acbe725fcc2fa89f0dff
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
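To make the span layout described above concrete, here is a minimal
illustrative sketch (a simplification for this page, not Qt's actual code):
one span owns 128 buckets, each bucket holds a one-byte offset into a compact
entry array, and that array grows in 16-entry steps. The name MiniSpan is
made up; collision handling, erasure and error checking are omitted, and the
sketch is restricted to trivially copyable int entries.

#include <cstddef>
#include <cstdlib>
#include <cstring>

struct MiniSpan
{
    struct Entry { int key; int value; };
    static constexpr unsigned char UnusedEntry = 0xff;

    unsigned char offsets[128];  // bucket -> index into 'entries', 0xff if empty
    Entry *entries = nullptr;    // compact storage, grown in steps of 16 slots
    unsigned char allocated = 0; // number of allocated entry slots
    unsigned char nextFree = 0;  // next unused slot in 'entries'

    MiniSpan() { std::memset(offsets, UnusedEntry, sizeof(offsets)); }
    ~MiniSpan() { std::free(entries); }

    // Insert into an empty bucket of this span (bucket must be < 128).
    void insert(std::size_t bucket, int key, int value)
    {
        if (nextFree == allocated) {
            allocated += 16;     // (re-)allocate in increments of 16 items
            entries = static_cast<Entry *>(
                    std::realloc(entries, allocated * sizeof(Entry)));
        }
        offsets[bucket] = nextFree;
        entries[nextFree] = Entry{ key, value };
        ++nextFree;
    }

    Entry *at(std::size_t bucket)
    {
        unsigned char o = offsets[bucket];
        return o == UnusedEntry ? nullptr : entries + o;
    }
};

Per span, the bookkeeping is the 128 offset bytes plus whatever the 16-entry
allocation granularity wastes, which is where the small per-entry overhead
quoted above comes from.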

/****************************************************************************
**
** Copyright (C) 2020 The Qt Company Ltd.
** Contact: https://www.qt.io/licensing/
**
** This file is part of the QtCore module of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:LGPL$
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see https://www.qt.io/terms-conditions. For further
** information use the contact form at https://www.qt.io/contact-us.
**
** GNU Lesser General Public License Usage
** Alternatively, this file may be used under the terms of the GNU Lesser
** General Public License version 3 as published by the Free Software
** Foundation and appearing in the file LICENSE.LGPL3 included in the
** packaging of this file. Please review the following information to
** ensure the GNU Lesser General Public License version 3 requirements
** will be met: https://www.gnu.org/licenses/lgpl-3.0.html.
**
** GNU General Public License Usage
** Alternatively, this file may be used under the terms of the GNU
** General Public License version 2.0 or (at your option) the GNU General
** Public license version 3 or any later version approved by the KDE Free
** Qt Foundation. The licenses are as published by the Free Software
** Foundation and appearing in the file LICENSE.GPL2 and LICENSE.GPL3
** included in the packaging of this file. Please review the following
** information to ensure the GNU General Public License requirements will
** be met: https://www.gnu.org/licenses/gpl-2.0.html and
** https://www.gnu.org/licenses/gpl-3.0.html.
**
** $QT_END_LICENSE$
**
****************************************************************************/
#ifndef QHASH_H
#define QHASH_H
#include <QtCore/qcontainertools_impl.h>
#include <QtCore/qhashfunctions.h>
#include <QtCore/qiterator.h>
#include <QtCore/qlist.h>
#include <QtCore/qmath.h>
#include <QtCore/qrefcount.h>
#include <initializer_list>
QT_BEGIN_NAMESPACE
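// Tag type used as the value for key-only hashes (this is what QSet builds
// on): it carries no data, so all instances compare equal.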
struct QHashDummyValue
{
    bool operator==(const QHashDummyValue &) const noexcept { return true; }
};
namespace QHashPrivate {
// QHash uses a power of two growth policy.
namespace GrowthPolicy {
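// The largest power-of-two bucket count that fits in a size_t:
// 1 << 63 on 64-bit platforms, 1 << 31 on 32-bit platforms.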
inline constexpr size_t maxNumBuckets() noexcept
{
    return size_t(1) << (8 * sizeof(size_t) - 1);
}
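// Returns the bucket count to use for a requested capacity: at least 16,
// otherwise the smallest power of two >= 2 * requestedCapacity, so that the
// load factor stays at or below 0.5 (e.g. bucketsForCapacity(100) == 256).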
inline constexpr size_t bucketsForCapacity(size_t requestedCapacity) noexcept
{
    if (requestedCapacity <= 8)
        return 16;
    if (requestedCapacity >= maxNumBuckets())
        return maxNumBuckets();
    return qNextPowerOfTwo(QIntegerForSize<sizeof(size_t)>::Unsigned(2 * requestedCapacity - 1));
}
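// Maps a hash value to a bucket index; relies on nBuckets being a power of two.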
inline constexpr size_t bucketForHash(size_t nBuckets, size_t hash) noexcept
{
    return hash & (nBuckets - 1);
}
}
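// A Node is one key/value entry in a span's storage area. createInPlace()
// placement-constructs a node in uninitialized storage; emplaceValue() and
// takeValue() only touch the value part.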
template <typename Key, typename T>
struct Node
{
    using KeyType = Key;
    using ValueType = T;
    Key key;
    T value;
    template<typename ...Args>
    static void createInPlace(Node *n, Key &&k, Args &&... args)
    { new (n) Node{ std::move(k), T(std::forward<Args>(args)...) }; }
    template<typename ...Args>
    static void createInPlace(Node *n, const Key &k, Args &&... args)
    { new (n) Node{ Key(k), T(std::forward<Args>(args)...) }; }
    template<typename ...Args>
    void emplaceValue(Args &&... args)
    {
        value = T(std::forward<Args>(args)...);
    }
    T &&takeValue() noexcept(std::is_nothrow_move_assignable_v<T>)
    {
        return std::move(value);
    }
    bool valuesEqual(const Node *other) const { return value == other->value; }
};
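// Specialization for key-only hashes: with QHashDummyValue as the value type,
// no value has to be stored, constructed or compared.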
template <typename Key>
struct Node<Key, QHashDummyValue> {
    using KeyType = Key;
    using ValueType = QHashDummyValue;
    Key key;
    template<typename ...Args>
    static void createInPlace(Node *n, Key &&k, Args &&...)
    { new (n) Node{ std::move(k) }; }
    template<typename ...Args>
    static void createInPlace(Node *n, const Key &k, Args &&...)
    { new (n) Node{ k }; }
    template<typename ...Args>
    void emplaceValue(Args &&...)
    {
    }
    ValueType takeValue() { return QHashDummyValue(); }
    bool valuesEqual(const Node *) const { return true; }
};
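// Chain of values sharing the same key, as used by QMultiHash. free() deletes
// the whole chain and returns the number of entries it released.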
template <typename T>
struct MultiNodeChain
{
T value;
MultiNodeChain *next = nullptr;
~MultiNodeChain()
{
}
qsizetype free() noexcept(std::is_nothrow_destructible_v<T>)
{
qsizetype nEntries = 0;
MultiNodeChain *e = this;
while (e) {
MultiNodeChain *n = e->next;
++nEntries;
delete e;
e = n;
}
return nEntries;
}
bool contains(const T &val) const noexcept
{
const MultiNodeChain *e = this;
while (e) {
if (e->value == val)
return true;
e = e->next;
}
return false;
}
};
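// MultiNode is the node type used for QMultiHash: it stores the key together with a pointer to
// a MultiNodeChain holding one or more values for that key. createInPlace() placement-constructs
// a MultiNode in uninitialized node storage (such as an Entry inside a Span below).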
template <typename Key, typename T>
struct MultiNode
{
using KeyType = Key;
using ValueType = T;
using Chain = MultiNodeChain<T>;
Key key;
Chain *value;
template<typename ...Args>
static void createInPlace(MultiNode *n, Key &&k, Args &&... args)
{ new (n) MultiNode(std::move(k), new Chain{ T(std::forward<Args>(args)...), nullptr }); }
template<typename ...Args>
static void createInPlace(MultiNode *n, const Key &k, Args &&... args)
{ new (n) MultiNode(k, new Chain{ T(std::forward<Args>(args)...), nullptr }); }
MultiNode(const Key &k, Chain *c)
: key(k),
value(c)
{}
    MultiNode(Key &&k, Chain *c) noexcept(std::is_nothrow_move_constructible_v<Key>)
        : key(std::move(k)),
          value(c)
    {}
    MultiNode(MultiNode &&other)
        : key(std::move(other.key)),
          value(qExchange(other.value, nullptr))
{
}
    MultiNode(const MultiNode &other)
        : key(other.key),
          value(nullptr)
    {
Chain *c = other.value;
Chain **e = &value;
while (c) {
Chain *chain = new Chain{ c->value, nullptr };
*e = chain;
e = &chain->next;
c = c->next;
}
}
~MultiNode()
{
if (value)
value->free();
}
static qsizetype freeChain(MultiNode *n) noexcept(std::is_nothrow_destructible_v<T>)
{
qsizetype size = n->value->free();
n->value = nullptr;
return size;
}
template<typename ...Args>
void insertMulti(Args &&... args)
{
Chain *e = new Chain{ T(std::forward<Args>(args)...), nullptr };
e->next = qExchange(value, e);
}
template<typename ...Args>
void emplaceValue(Args &&... args)
{
value->value = T(std::forward<Args>(args)...);
}
};
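// Reports whether both the key and the value type are relocatable (as advertised by QTypeInfo),
// i.e. whether Nodes can be moved around in memory with a plain byte copy. The table can use
// this to relocate nodes without running move constructors and destructors when it grows and
// rehashes.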
template<typename Node>
constexpr bool isRelocatable()
{
return QTypeInfo<typename Node::KeyType>::isRelocatable && QTypeInfo<typename Node::ValueType>::isRelocatable;
}
// Regular hash tables consist of a vector of buckets that can store Nodes. But simply allocating one large array of buckets
// would waste a lot of memory. To avoid this, we split the vector of buckets up into a vector of Spans. Each Span represents
// NEntries buckets. To quickly find the correct Span that holds a bucket, NEntries must be a power of two.
//
// Inside each Span, there is an offset array that represents the actual buckets. offsets contains either an index into the
// actual storage space for the Nodes (the 'entries' member) or 0xff (UnusedEntry) to flag that the bucket is empty.
// As there are only NEntries (128) buckets per Span, each offset fits into an unsigned char. This trick keeps the memory
// overhead of the hash table very small compared to many other implementations.
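//
// As a rough sketch of the lookup described above (illustrative only: 'spans' and 'bucket' are
// assumed names, and the actual lookup code lives elsewhere in this file):
//
//     size_t spanIndex  = bucket / Span<Node>::NEntries;         // which Span owns the bucket
//     size_t inSpan     = bucket & Span<Node>::LocalBucketMask;  // bucket index inside that Span
//     unsigned char off = spans[spanIndex].offsets[inSpan];
//     if (off != Span<Node>::UnusedEntry)
//         Node &node = spans[spanIndex].entries[off].node();     // the Node stored in that bucket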
template<typename Node>
struct Span {
enum {
NEntries = 128,
LocalBucketMask = (NEntries - 1),
UnusedEntry = 0xff
};
    static_assert((NEntries & LocalBucketMask) == 0, "NEntries must be a power of two.");
    // Entry is a slot available for storing a Node. The Span holds a pointer to
    // an array of Entries. Upon construction of that array, all entries are
    // unused; nextFree() is used to thread a singly linked list of free entries
    // through them.
    // When a node gets inserted, the first free entry is taken off that list
    // and the Node is constructed in place.
struct Entry {
typename std::aligned_storage<sizeof(Node), alignof(Node)>::type storage;
unsigned char &nextFree() { return *reinterpret_cast<unsigned char *>(&storage); }
Node &node() { return *reinterpret_cast<Node *>(&storage); }
};
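    // A small illustration of how the free list evolves (the numbers are only an example;
    // addStorage() links freshly allocated, unused Entries into the list via nextFree()):
    //
    //   after growing:  nextFree == 3, entries[3].nextFree() == 4, entries[4].nextFree() == 5, ...
    //   insert(b):      takes entry 3 for bucket b (offsets[b] = 3), nextFree becomes 4
    //   erase(b):       destroys the Node in entry 3 and pushes it back: entries[3].nextFree() = 4, nextFree = 3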
    unsigned char offsets[NEntries]; // for each bucket: index into entries[], or UnusedEntry
    Entry *entries = nullptr;        // storage for the Nodes, grown on demand by addStorage()
    unsigned char allocated = 0;     // number of Entries currently allocated in entries[]
    unsigned char nextFree = 0;      // head of the free list threaded through unused Entries
Span() noexcept
{
memset(offsets, UnusedEntry, sizeof(offsets));
}
~Span()
{
freeData();
}
void freeData() noexcept(std::is_nothrow_destructible<Node>::value)
{
if (entries) {
if constexpr (!std::is_trivially_destructible<Node>::value) {
for (auto o : offsets) {
if (o != UnusedEntry)
entries[o].node().~Node();
}
}
delete[] entries;
entries = nullptr;
}
}
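    // insert() claims the first free Entry for bucket i and returns a pointer to its (still
    // uninitialized) Node storage; constructing the Node in place is left to the caller.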
Node *insert(size_t i)
{
        Q_ASSERT(i < NEntries);
Q_ASSERT(offsets[i] == UnusedEntry);
if (nextFree == allocated)
addStorage();
unsigned char entry = nextFree;
Q_ASSERT(entry < allocated);
nextFree = entries[entry].nextFree();
offsets[i] = entry;
return &entries[entry].node();
}
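    // erase() destroys the Node stored in the given bucket and returns its Entry to the free list.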
void erase(size_t bucket) noexcept(std::is_nothrow_destructible<Node>::value)
{
        Q_ASSERT(bucket < NEntries);
Q_ASSERT(offsets[bucket] != UnusedEntry);
unsigned char entry = offsets[bucket];
offsets[bucket] = UnusedEntry;
entries[entry].node().~Node();
entries[entry].nextFree() = nextFree;
nextFree = entry;
}
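    // Bucket-indexed accessors: offset() and hasNode() inspect the offset table,
    // at() returns the node stored for a used bucket, and atOffset() addresses
    // the entry storage directly.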
size_t offset(size_t i) const noexcept
{
return offsets[i];
}
bool hasNode(size_t i) const noexcept
{
return (offsets[i] != UnusedEntry);
}
Node &at(size_t i) noexcept
{
        Q_ASSERT(i < NEntries);
Q_ASSERT(offsets[i] != UnusedEntry);
return entries[offsets[i]].node();
}
const Node &at(size_t i) const noexcept
{
        Q_ASSERT(i < NEntries);
Q_ASSERT(offsets[i] != UnusedEntry);
return entries[offsets[i]].node();
}
Node &atOffset(size_t o) noexcept
{
Q_ASSERT(o < allocated);
return entries[o].node();
}
const Node &atOffset(size_t o) const noexcept
{
Q_ASSERT(o < allocated);
return entries[o].node();
}
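    // Moves the mapping for a bucket to another bucket of the same span; the
    // node's storage entry does not move.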
void moveLocal(size_t from, size_t to) noexcept
{
Q_ASSERT(offsets[from] != UnusedEntry);
Q_ASSERT(offsets[to] == UnusedEntry);
offsets[to] = offsets[from];
offsets[from] = UnusedEntry;
}
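    // Moves the node stored for fromSpan's bucket fromIndex into bucket 'to' of
    // this span, growing the local storage if necessary and returning the source
    // entry to fromSpan's free list.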
void moveFromSpan(Span &fromSpan, size_t fromIndex, size_t to) noexcept(std::is_nothrow_move_constructible_v<Node>)
{
        Q_ASSERT(to < NEntries);
        Q_ASSERT(offsets[to] == UnusedEntry);
        Q_ASSERT(fromIndex < NEntries);
Q_ASSERT(fromSpan.offsets[fromIndex] != UnusedEntry);
if (nextFree == allocated)
addStorage();
Q_ASSERT(nextFree < allocated);
offsets[to] = nextFree;
Entry &toEntry = entries[nextFree];
nextFree = toEntry.nextFree();
size_t fromOffset = fromSpan.offsets[fromIndex];
fromSpan.offsets[fromIndex] = UnusedEntry;
Entry &fromEntry = fromSpan.entries[fromOffset];
if constexpr (isRelocatable<Node>()) {
memcpy(&toEntry, &fromEntry, sizeof(Entry));
} else {
new (&toEntry.node()) Node(std::move(fromEntry.node()));
fromEntry.node().~Node();
}
fromEntry.nextFree() = fromSpan.nextFree;
fromSpan.nextFree = static_cast<unsigned char>(fromOffset);
}
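    // Enlarges the entry storage by one increment, relocating the existing
    // (completely full) storage and chaining the new slots into the free list.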
void addStorage()
{
Q_ASSERT(allocated < NEntries);
Q_ASSERT(nextFree == allocated);
        // the hash table should always be between 25% and 50% full
        // this implies that on average each span holds between 32 and 64 entries.
        // The likelihood of having fewer than 16 entries is very small, so start
        // with 16 and grow the storage in increments of 16 whenever more space is
        // needed
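        // (with NEntries == 128 the increment below works out to 16 entries)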
const size_t increment = NEntries / 8;
size_t alloc = allocated + increment;
Entry *newEntries = new Entry[alloc];
// we only add storage if the previous storage was fully filled, so
// simply copy the old data over
if constexpr (isRelocatable<Node>()) {
if (allocated)
memcpy(newEntries, entries, allocated * sizeof(Entry));
} else {
for (size_t i = 0; i < allocated; ++i) {
new (&newEntries[i].node()) Node(std::move(entries[i].node()));
entries[i].node().~Node();
}
}
for (size_t i = allocated; i < allocated + increment; ++i) {
newEntries[i].nextFree() = uchar(i + 1);
}
delete[] entries;
entries = newEntries;
allocated = uchar(alloc);
}
};
template <typename Node>
struct iterator;
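// Data holds the shared, reference-counted state of a hash table: the array of
// spans plus the element count, bucket count and hash seed.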
template <typename Node>
struct Data
{
using Key = typename Node::KeyType;
using T = typename Node::ValueType;
using Span = QHashPrivate::Span<Node>;
using iterator = QHashPrivate::iterator<Node>;
QtPrivate::RefCount ref = {{1}};
size_t size = 0;
size_t numBuckets = 0;
size_t seed = 0;
Span *spans = nullptr;
Data(size_t reserve = 0)
{
numBuckets = GrowthPolicy::bucketsForCapacity(reserve);
size_t nSpans = (numBuckets + Span::LocalBucketMask) / Span::NEntries;
spans = new Span[nSpans];
seed = qGlobalQHashSeed();
}
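    // Copy-constructs from 'other', optionally reserving extra capacity; when the
    // bucket count differs the nodes are re-hashed into their new buckets,
    // otherwise they are copied into the same positions.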
Data(const Data &other, size_t reserved = 0)
: size(other.size),
numBuckets(other.numBuckets),
seed(other.seed)
{
if (reserved)
numBuckets = GrowthPolicy::bucketsForCapacity(qMax(size, reserved));
bool resized = numBuckets != other.numBuckets;
size_t nSpans = (numBuckets + Span::LocalBucketMask) / Span::NEntries;
spans = new Span[nSpans];
for (size_t s = 0; s < nSpans; ++s) {
const Span &span = other.spans[s];
for (size_t index = 0; index < Span::NEntries; ++index) {
if (!span.hasNode(index))
continue;
const Node &n = span.at(index);
iterator it = resized ? find(n.key) : iterator{ this, s*Span::NEntries + index };
Q_ASSERT(it.isUnused());
Node *newNode = spans[it.span()].insert(it.index());
new (newNode) Node(n);
}
}
}
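    // Returns a Data block the caller owns exclusively: a fresh table when d is
    // null, otherwise a (possibly resized) copy; the reference to d is dropped.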
static Data *detached(Data *d, size_t size = 0)
{
if (!d)
return new Data(size);
Data *dd = new Data(*d, size);
if (!d->ref.deref())
delete d;
return dd;
}
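    // Deletes all spans and resets the table to an empty, zero-bucket state.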
void clear()
{
delete[] spans;
spans = nullptr;
size = 0;
numBuckets = 0;
}
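    // Translates an iterator belonging to another Data instance into one that
    // refers to this table, at the same bucket position.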
iterator detachedIterator(iterator other) const noexcept
{
return iterator{this, other.bucket};
}
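    // Returns an iterator positioned at the first used bucket, or end() for an
    // empty table.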
iterator begin() const noexcept
{
iterator it{ this, 0 };
if (it.isUnused())
++it;
return it;
}
constexpr iterator end() const noexcept
{
return iterator();
}
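    // Re-buckets every node into a table sized for sizeHint (defaulting to the
    // current size): new spans are allocated, each node is moved to its new
    // bucket, and the old spans are freed.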
void rehash(size_t sizeHint = 0)
{
if (sizeHint == 0)
sizeHint = size;
size_t newBucketCount = GrowthPolicy::bucketsForCapacity(sizeHint);
Span *oldSpans = spans;
size_t oldBucketCount = numBuckets;
size_t nSpans = (newBucketCount + Span::LocalBucketMask) / Span::NEntries;
spans = new Span[nSpans];
numBuckets = newBucketCount;
size_t oldNSpans = (oldBucketCount + Span::LocalBucketMask) / Span::NEntries;
for (size_t s = 0; s < oldNSpans; ++s) {
Span &span = oldSpans[s];
for (size_t index = 0; index < Span::NEntries; ++index) {
if (!span.hasNode(index))
continue;
Node &n = span.at(index);
iterator it = find(n.key);
Q_ASSERT(it.isUnused());
Node *newNode = spans[it.span()].insert(it.index());
new (newNode) Node(std::move(n));
}
span.freeData();
}
delete[] oldSpans;
}
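// Advance to the next bucket index, wrapping around to 0 at the end of the
// table (used for linear probing).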
size_t nextBucket(size_t bucket) const noexcept
{
++bucket;
if (bucket == numBuckets)
bucket = 0;
return bucket;
}
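// Ratio of stored entries to the total number of buckets.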
float loadFactor() const noexcept
{
return float(size)/numBuckets;
}
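// The table is grown (and rehashed) once it is at least half full.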
bool shouldGrow() const noexcept
{
return size >= (numBuckets >> 1);
}
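// Returns an iterator pointing either at the node holding the key or, if the
// key is not present, at the first unused bucket in its probe chain.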
iterator find(const Key &key) const noexcept
{
Q_ASSERT(numBuckets > 0);
size_t hash = qHash(key, seed);
size_t bucket = GrowthPolicy::bucketForHash(numBuckets, hash);
// loop over the buckets until we find the entry we are looking for
// or an empty slot, in which case we know the entry doesn't exist
while (true) {
// Split the bucket index into the index of the span array and the
// local offset inside the span
size_t span = bucket / Span::NEntries;
size_t index = bucket & Span::LocalBucketMask;
Span &s = spans[span];
size_t offset = s.offset(index);
if (offset == Span::UnusedEntry) {
return iterator{ this, bucket };
} else {
Node &n = s.atOffset(offset);
if (n.key == key)
return iterator{ this, bucket };
}
bucket = nextBucket(bucket);
}
}
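// Convenience wrapper around find(): returns a pointer to the node for the
// key, or nullptr if the key is not in the hash.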
Node *findNode(const Key &key) const noexcept
{
if (!size)
return nullptr;
iterator it = find(key);
if (it.isUnused())
return nullptr;
return it.node();
}
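// Result of findOrInsert(): 'it' points at the bucket for the key;
// 'initialized' is true when a node for the key already existed, and false
// when only a fresh slot was reserved and the caller still has to construct
// the node in it.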
struct InsertionResult
{
iterator it;
bool initialized;
};
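// Looks up the bucket for the key, growing the table first if it is getting
// too full. If the key is missing, a slot is reserved in the matching span
// and size is bumped; the returned InsertionResult tells the caller whether
// the node still needs to be constructed.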
InsertionResult findOrInsert(const Key &key) noexcept
{
if (shouldGrow())
rehash(size + 1);
iterator it = find(key);
if (it.isUnused()) {
spans[it.span()].insert(it.index());
++size;
return { it, false };
}
return { it, true };
}
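// Removes the node at 'it' and then walks the following probe chain, moving
// entries back into the hole where possible so lookups stay intact (no
// tombstones are left behind). Returns an iterator to the element following
// the erased one.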
iterator erase(iterator it) noexcept(std::is_nothrow_destructible<Node>::value)
{
size_t bucket = it.bucket;
size_t span = bucket / Span::NEntries;
size_t index = bucket & Span::LocalBucketMask;
Q_ASSERT(spans[span].hasNode(index));
spans[span].erase(index);
--size;
// re-insert the following entries to avoid holes
size_t hole = bucket;
size_t next = bucket;
while (true) {
next = nextBucket(next);
size_t nextSpan = next / Span::NEntries;
size_t nextIndex = next & Span::LocalBucketMask;
if (!spans[nextSpan].hasNode(nextIndex))
break;
size_t hash = qHash(spans[nextSpan].at(nextIndex).key, seed);
size_t newBucket = GrowthPolicy::bucketForHash(numBuckets, hash);
while (true) {
if (newBucket == next) {
// nothing to do, item is at the right place
break;
} else if (newBucket == hole) {
// move into hole
size_t holeSpan = hole / Span::NEntries;
size_t holeIndex = hole & Span::LocalBucketMask;
if (nextSpan == holeSpan) {
spans[holeSpan].moveLocal(nextIndex, holeIndex);
} else {
// move between spans, more expensive
spans[holeSpan].moveFromSpan(spans[nextSpan], nextIndex, holeIndex);
}
hole = next;
break;
}
newBucket = nextBucket(newBucket);
}
}
// return correct position of the next element
if (!spans[span].hasNode(index))
++it;
return it;
}
~Data()
{
delete [] spans;
}
};
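// Low-level iterator over Data<Node>: it stores a pointer to the table and
// the current bucket index. A null 'd' together with bucket == 0 is the end
// iterator.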
template <typename Node>
struct iterator {
using Span = QHashPrivate::Span<Node>;
const Data<Node> *d = nullptr;
size_t bucket = 0;
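// Split the global bucket index into the span number and the offset within
// that span.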
size_t span() const noexcept { return bucket / Span::NEntries; }
size_t index() const noexcept { return bucket & Span::LocalBucketMask; }
inline bool isUnused() const noexcept { return !d->spans[span()].hasNode(index()); }
inline Node *node() const noexcept
{
Q_ASSERT(!isUnused());
return &d->spans[span()].at(index());
}
bool atEnd() const noexcept { return !d; }
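// Step to the next occupied bucket; when the end of the table is reached the
// iterator turns into the end iterator (d == nullptr, bucket == 0).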
iterator operator++() noexcept
{
while (true) {
++bucket;
if (bucket == d->numBuckets) {
d = nullptr;
bucket = 0;
break;
}
if (!isUnused())
break;
}
return *this;
}
bool operator==(iterator other) const noexcept
{ return d == other.d && bucket == other.bucket; }
bool operator!=(iterator other) const noexcept
{ return !(*this == other); }
};
} // namespace QHashPrivate
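// Illustrative usage sketch (not part of the API documentation; the key and
// value choices below are arbitrary placeholders):
//
//     QHash<QString, int> ages;
//     ages.insert(QStringLiteral("alice"), 30);    // adds or overwrites the entry
//     int a = ages.value(QStringLiteral("alice")); // 30; default-constructed T if missing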
template <class Key, class T>
class QHash
{
using Node = QHashPrivate::Node<Key, T>;
using Data = QHashPrivate::Data<Node>;
friend class QSet<Key>;
Data *d = nullptr;
public:
using key_type = Key;
using mapped_type = T;
using value_type = T;
using size_type = qsizetype;
using difference_type = qsizetype;
using reference = T &;
using const_reference = const T &;
inline QHash() noexcept = default;
inline QHash(std::initializer_list<std::pair<Key,T> > list)
: d(new Data(list.size()))
{
for (typename std::initializer_list<std::pair<Key,T> >::const_iterator it = list.begin(); it != list.end(); ++it)
insert(it->first, it->second);
}
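// Copying is cheap: the new hash only takes a reference on the other hash's
// shared Data (implicit sharing).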
QHash(const QHash &other) noexcept
: d(other.d)
{
if (d)
d->ref.ref();
}
~QHash()
{
if (d && !d->ref.deref())
delete d;
}
QHash &operator=(const QHash &other) noexcept(std::is_nothrow_destructible<Node>::value)
{
if (d != other.d) {
Data *o = other.d;
if (o)
o->ref.ref();
if (d && !d->ref.deref())
delete d;
d = o;
}
return *this;
}
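// Illustrative sketch, not part of the original header: the copy constructor and
// copy assignment above only bump the atomic reference count on Data, so copies
// stay shallow until a mutating call detaches. Hypothetical snippet, assuming a
// QHash<QString, int>:
//
//     QHash<QString, int> a;
//     a.insert(QStringLiteral("one"), 1);
//     QHash<QString, int> b = a;              // shallow copy: b shares a's data
//     b.insert(QStringLiteral("two"), 2);     // b detaches first; a is unchanged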
QHash(QHash &&other) noexcept
: d(std::exchange(other.d, nullptr))
{
}
QT_MOVE_ASSIGNMENT_OPERATOR_IMPL_VIA_MOVE_AND_SWAP(QHash)
#ifdef Q_QDOC
template <typename InputIterator>
QHash(InputIterator f, InputIterator l);
#else
template <typename InputIterator, QtPrivate::IfAssociativeIteratorHasKeyAndValue<InputIterator> = true>
QHash(InputIterator f, InputIterator l)
: QHash()
{
QtPrivate::reserveIfForwardIterator(this, f, l);
for (; f != l; ++f)
insert(f.key(), f.value());
}
template <typename InputIterator, QtPrivate::IfAssociativeIteratorHasFirstAndSecond<InputIterator> = true>
QHash(InputIterator f, InputIterator l)
: QHash()
{
QtPrivate::reserveIfForwardIterator(this, f, l);
for (; f != l; ++f)
insert(f->first, f->second);
}
#endif
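// Illustrative sketch, not part of the original header: the two range constructors
// above accept either STL-style iterators that dereference to a pair with
// first/second, or Qt-style associative iterators exposing key()/value().
// Hypothetical snippet, assuming <map> and QMap are available:
//
//     std::map<int, int> m = { { 1, 10 }, { 2, 20 } };
//     QHash<int, int> h1(m.begin(), m.end());       // pair-based: uses first/second
//
//     QMap<int, int> qm;
//     qm.insert(3, 30);
//     QHash<int, int> h2(qm.begin(), qm.end());     // Qt-style: uses key()/value()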
void swap(QHash &other) noexcept { qSwap(d, other.d); }
template <typename U = T>
QTypeTraits::compare_eq_result<U> operator==(const QHash &other) const noexcept
{
if (d == other.d)
return true;
if (size() != other.size())
return false;
for (const_iterator it = other.begin(); it != other.end(); ++it) {
const_iterator i = find(it.key());
if (i == end() || !i.i.node()->valuesEqual(it.i.node()))
return false;
}
// sizes match and every entry of other was found here with an equal value,
// so the two hashes contain the same entries
return true;
}
template <typename U = T>
QTypeTraits::compare_eq_result<U> operator!=(const QHash &other) const noexcept
{ return !(*this == other); }
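// Illustrative sketch, not part of the original header: equality does not depend
// on iteration order, and it is only available when T itself is equality
// comparable (QTypeTraits::compare_eq_result removes the operator otherwise).
// Hypothetical snippet:
//
//     QHash<int, QString> a, b;
//     a.insert(1, QStringLiteral("one"));
//     a.insert(2, QStringLiteral("two"));
//     b.insert(2, QStringLiteral("two"));
//     b.insert(1, QStringLiteral("one"));
//     Q_ASSERT(a == b);   // same key/value pairs; insertion order is irrelevant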
inline qsizetype size() const noexcept { return d ? qsizetype(d->size) : 0; }
inline bool isEmpty() const noexcept { return !d || d->size == 0; }
inline qsizetype capacity() const noexcept { return d ? qsizetype(d->numBuckets >> 1) : 0; }
void reserve(qsizetype size)
{
if (isDetached())
d->rehash(size);
else
d = Data::detached(d, size_t(size));
}
inline void squeeze() { reserve(0); }
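// Illustrative sketch, not part of the original header: capacity() reports
// numBuckets >> 1, i.e. the number of entries the table is sized to hold at its
// maximum load factor, so reserving before a bulk insert avoids intermediate
// rehashes and squeeze() rehashes down to fit the current contents. Hypothetical
// snippet:
//
//     QHash<int, int> h;
//     h.reserve(100000);               // single allocation up front
//     for (int i = 0; i < 100000; ++i)
//         h.insert(i, i * i);
//     h.squeeze();                     // shrink again after removing many entries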
inline void detach() { if (!d || d->ref.isShared()) d = Data::detached(d); }
inline bool isDetached() const noexcept { return d && !d->ref.isShared(); }
bool isSharedWith(const QHash &other) const noexcept { return d == other.d; }
void clear() noexcept(std::is_nothrow_destructible<Node>::value)
{
if (d && !d->ref.deref())
delete d;
d = nullptr;
}
bool remove(const Key &key)
{
if (isEmpty()) // prevents detaching shared null
return false;
detach();
auto it = d->find(key);
if (it.isUnused())
return false;
d->erase(it);
return true;
}
T take(const Key &key)
{
if (isEmpty()) // prevents detaching shared null
return T();
detach();
auto it = d->find(key);
if (it.isUnused())
return T();
T value = it.node()->takeValue();
d->erase(it);
return value;
}
bool contains(const Key &key) const noexcept
{
if (!d)
return false;
return d->findNode(key) != nullptr;
}
qsizetype count(const Key &key) const noexcept
{
return contains(key) ? 1 : 0;
}
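// Illustrative sketch, not part of the original header: remove() and take() both
// skip detaching a shared empty hash; take() additionally moves the stored value
// out before erasing the node. Hypothetical snippet:
//
//     QHash<QString, int> h;
//     h.insert(QStringLiteral("a"), 1);
//     int v = h.take(QStringLiteral("a"));            // v == 1, entry erased
//     bool removed = h.remove(QStringLiteral("a"));   // false, already gone
//     Q_ASSERT(!h.contains(QStringLiteral("a")) && h.count(QStringLiteral("a")) == 0);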
Key key(const T &value, const Key &defaultKey = Key()) const noexcept
{
if (d) {
const_iterator i = begin();
while (i != end()) {
if (i.value() == value)
return i.key();
++i;
}
}
return defaultKey;
}
T value(const Key &key, const T &defaultValue = T()) const noexcept
{
if (d) {
Node *n = d->findNode(key);
if (n)
return n->value;
}
return defaultValue;
}
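// Illustrative sketch, not part of the original header: key() is a linear reverse
// lookup over all entries, while value() is an average constant-time forward
// lookup; both return the supplied default when nothing matches. Hypothetical
// snippet:
//
//     QHash<QString, int> h;
//     h.insert(QStringLiteral("answer"), 42);
//     int v = h.value(QStringLiteral("missing"), -1);   // -1, key not present
//     QString k = h.key(42);                            // "answer"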
T &operator[](const Key &key)
{
detach();
auto result = d->findOrInsert(key);
Q_ASSERT(!result.it.atEnd());
if (!result.initialized)
Node::createInPlace(result.it.node(), key, T());
return result.it.node()->value;
}
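// Illustrative sketch, not part of the original header: the non-const operator[]
// detaches and default-constructs a value for a key that is not yet present, so
// even a "read" through it can grow the hash. Hypothetical snippet:
//
//     QHash<QString, int> counts;
//     ++counts[QStringLiteral("apple")];   // inserts ("apple", 0), then increments
//     Q_ASSERT(counts.value(QStringLiteral("apple")) == 1);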
const T operator[](const Key &key) const noexcept
{
return value(key);
}
QList<Key> keys() const { return QList<Key>(keyBegin(), keyEnd()); }
QList<Key> keys(const T &value) const
{
QList<Key> res;
const_iterator i = begin();
while (i != end()) {
if (i.value() == value)
res.append(i.key());
++i;
}
return res;
}
QList<T> values() const { return QList<T>(begin(), end()); }
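// Illustrative sketch, not part of the original header: keys(value) and
// key(value, ...) walk every entry, so a separate inverted hash is preferable
// when reverse lookups are frequent. Hypothetical snippet:
//
//     QHash<QString, int> h;
//     h.insert(QStringLiteral("a"), 1);
//     h.insert(QStringLiteral("b"), 1);
//     const QList<QString> ks = h.keys(1);   // "a" and "b", in arbitrary order
//     const QList<int> vs = h.values();      // contains 1 twice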
class const_iterator;
class iterator
{
using piter = typename QHashPrivate::iterator<Node>;
friend class const_iterator;
friend class QHash<Key, T>;
friend class QSet<Key>;
piter i;
explicit inline iterator(piter it) noexcept : i(it) { }
public:
typedef std::forward_iterator_tag iterator_category;
typedef qptrdiff difference_type;
typedef T value_type;
typedef T *pointer;
typedef T &reference;
constexpr iterator() noexcept = default;
inline const Key &key() const noexcept { return i.node()->key; }
inline T &value() const noexcept { return i.node()->value; }
inline T &operator*() const noexcept { return i.node()->value; }
inline T *operator->() const noexcept { return &i.node()->value; }
inline bool operator==(const iterator &o) const noexcept { return i == o.i; }
inline bool operator!=(const iterator &o) const noexcept { return i != o.i; }
inline iterator &operator++() noexcept
{
++i;
return *this;
}
inline iterator operator++(int) noexcept
{
iterator r = *this;
++i;
return r;
}
inline bool operator==(const const_iterator &o) const noexcept { return i == o.i; }
inline bool operator!=(const const_iterator &o) const noexcept { return i != o.i; }
};
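    // Illustrative usage only (a sketch, not part of the class interface):
    // mutating iteration over a hypothetical QHash<QString, int> named "counts".
    // Inserting into the hash while iterating may rehash the table and invalidate
    // iterators, so the loop below only updates existing values.
    //
    //     QHash<QString, int> counts;
    //     counts.insert(QStringLiteral("a"), 1);
    //     counts.insert(QStringLiteral("b"), 2);
    //     for (auto it = counts.begin(); it != counts.end(); ++it)
    //         it.value() *= 10;   // key() is read-only, value() is mutable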
friend class iterator;
class const_iterator
{
using piter = typename QHashPrivate::iterator<Node>;
friend class iterator;
friend class QHash<Key, T>;
friend class QSet<Key>;
piter i;
        explicit inline const_iterator(piter it) noexcept : i(it) { }
public:
typedef std::forward_iterator_tag iterator_category;
typedef qptrdiff difference_type;
typedef T value_type;
typedef const T *pointer;
typedef const T &reference;
constexpr const_iterator() noexcept = default;
inline const_iterator(const iterator &o) noexcept : i(o.i) { }
inline const Key &key() const noexcept { return i.node()->key; }
inline const T &value() const noexcept { return i.node()->value; }
inline const T &operator*() const noexcept { return i.node()->value; }
inline const T *operator->() const noexcept { return &i.node()->value; }
inline bool operator==(const const_iterator &o) const noexcept { return i == o.i; }
inline bool operator!=(const const_iterator &o) const noexcept { return i != o.i; }
inline const_iterator &operator++() noexcept
{
++i;
return *this;
}
inline const_iterator operator++(int) noexcept
{
const_iterator r = *this;
++i;
return r;
}
};
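    // Illustrative usage only (a sketch): read-only lookup through const_iterator,
    // again assuming the hypothetical "counts" hash from the sketch above
    // (qDebug() needs <QtCore/QDebug>).
    //
    //     const auto it = counts.constFind(QStringLiteral("a"));
    //     if (it != counts.constEnd())
    //         qDebug() << it.key() << it.value();   // both returned as const references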
friend class const_iterator;
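    // key_iterator adapts a const_iterator so that dereferencing yields the key
    // rather than the value; base() exposes the wrapped const_iterator.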
class key_iterator
{
const_iterator i;
public:
typedef typename const_iterator::iterator_category iterator_category;
typedef qptrdiff difference_type;
typedef Key value_type;
typedef const Key *pointer;
typedef const Key &reference;
key_iterator() noexcept = default;
explicit key_iterator(const_iterator o) noexcept : i(o) { }
const Key &operator*() const noexcept { return i.key(); }
const Key *operator->() const noexcept { return &i.key(); }
bool operator==(key_iterator o) const noexcept { return i == o.i; }
bool operator!=(key_iterator o) const noexcept { return i != o.i; }
inline key_iterator &operator++() noexcept { ++i; return *this; }
        inline key_iterator operator++(int) noexcept { return key_iterator(i++); }
const_iterator base() const noexcept { return i; }
};
typedef QKeyValueIterator<const Key&, const T&, const_iterator> const_key_value_iterator;
typedef QKeyValueIterator<const Key&, T&, iterator> key_value_iterator;
// STL style
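    // Note: the mutable begin() detaches from shared data (copy-on-write); the
    // const overloads do not. The end iterators are default-constructed.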
inline iterator begin() { detach(); return iterator(d->begin()); }
inline const_iterator begin() const noexcept { return d ? const_iterator(d->begin()): const_iterator(); }
inline const_iterator cbegin() const noexcept { return d ? const_iterator(d->begin()): const_iterator(); }
inline const_iterator constBegin() const noexcept { return d ? const_iterator(d->begin()): const_iterator(); }
inline iterator end() noexcept { return iterator(); }
inline const_iterator end() const noexcept { return const_iterator(); }
inline const_iterator cend() const noexcept { return const_iterator(); }
inline const_iterator constEnd() const noexcept { return const_iterator(); }
inline key_iterator keyBegin() const noexcept { return key_iterator(begin()); }
inline key_iterator keyEnd() const noexcept { return key_iterator(end()); }
inline key_value_iterator keyValueBegin() { return key_value_iterator(begin()); }
inline key_value_iterator keyValueEnd() { return key_value_iterator(end()); }
inline const_key_value_iterator keyValueBegin() const noexcept { return const_key_value_iterator(begin()); }
inline const_key_value_iterator constKeyValueBegin() const noexcept { return const_key_value_iterator(begin()); }
inline const_key_value_iterator keyValueEnd() const noexcept { return const_key_value_iterator(end()); }
inline const_key_value_iterator constKeyValueEnd() const noexcept { return const_key_value_iterator(end()); }
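    // Erases the element referenced by it and returns an iterator to the next
    // element. The hash detaches first; the passed iterator is remapped into
    // the detached copy so the erase operates on this hash's own data.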
iterator erase(const_iterator it)
{
Q_ASSERT(it != constEnd());
detach();
// ensure a valid iterator across the detach:
iterator i = iterator{d->detachedIterator(it.i)};
i.i = d->erase(i.i);
return i;
}
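    // Keys are unique in QHash, so the range returned by equal_range() contains
    // at most one element; an empty range means the key is absent. Illustrative
    // use (process() stands in for user code):
    //     auto range = hash.equal_range(key);
    //     if (range.first != range.second)   // key present, exactly one entry
    //         process(range.first.value());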
QPair<iterator, iterator> equal_range(const Key &key)
{
auto first = find(key);
auto second = first;
if (second != iterator())
++second;
return qMakePair(first, second);
}
QPair<const_iterator, const_iterator> equal_range(const Key &key) const noexcept
{
auto first = find(key);
auto second = first;
        if (second != const_iterator())
++second;
return qMakePair(first, second);
}
typedef iterator Iterator;
typedef const_iterator ConstIterator;
inline qsizetype count() const noexcept { return d ? qsizetype(d->size) : 0; }
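    // Both find() overloads return end() when the key is absent; the mutable
    // overload detaches first (unless the hash is empty, to avoid detaching
    // the shared null).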
iterator find(const Key &key)
{
if (isEmpty()) // prevents detaching shared null
return end();
detach();
auto it = d->find(key);
if (it.isUnused())
it = d->end();
return iterator(it);
}
const_iterator find(const Key &key) const noexcept
{
if (isEmpty())
return end();
auto it = d->find(key);
if (it.isUnused())
it = d->end();
return const_iterator(it);
}
const_iterator constFind(const Key &key) const noexcept
{
return find(key);
}
iterator insert(const Key &key, const T &value)
{
return emplace(key, value);
}
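    // Inserts every (key, value) element of hash into this hash. Keys present
    // in both keep a single entry whose value is taken from hash, since
    // emplace() below replaces the value of an existing key.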
void insert(const QHash &hash)
{
if (d == hash.d || !hash.d)
return;
if (!d) {
*this = hash;
return;
}
detach();
for (auto it = hash.begin(); it != hash.end(); ++it)
emplace(it.key(), it.value());
}
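    // emplace() constructs the value in place from args. If the key already
    // exists, the stored value is replaced instead of a second entry being
    // added. Illustrative use (arg1, arg2 are placeholders):
    //     hash.emplace(key, arg1, arg2);   // value constructed as T(arg1, arg2)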
template <typename ...Args>
iterator emplace(const Key &key, Args &&... args)
{
Key copy = key; // Needs to be explicit for MSVC 2019
return emplace(std::move(copy), std::forward<Args>(args)...);
}
template <typename ...Args>
iterator emplace(Key &&key, Args &&... args)
{
detach();
auto result = d->findOrInsert(key);
if (!result.initialized)
Node::createInPlace(result.it.node(), std::move(key), std::forward<Args>(args)...);
else
result.it.node()->emplaceValue(std::forward<Args>(args)...);
return iterator(result.it);
}
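    // STL-compatible introspection; the maximum load factor of this table is
    // fixed at 0.5.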
float load_factor() const noexcept { return d ? d->loadFactor() : 0; }
static float max_load_factor() noexcept { return 0.5; }
size_t bucket_count() const noexcept { return d ? d->numBuckets : 0; }
static size_t max_bucket_count() noexcept { return QHashPrivate::GrowthPolicy::maxNumBuckets(); }
inline bool empty() const noexcept { return isEmpty(); }
};
template <class Key, class T>
class QMultiHash
{
using Node = QHashPrivate::MultiNode<Key, T>;
using Data = QHashPrivate::Data<Node>;
using Chain = QHashPrivate::MultiNodeChain<T>;
Data *d = nullptr;
qsizetype m_size = 0;
public:
using key_type = Key;
using mapped_type = T;
using value_type = T;
using size_type = qsizetype;
using difference_type = qsizetype;
using reference = T &;
using const_reference = const T &;
QMultiHash() noexcept = default;
inline QMultiHash(std::initializer_list<std::pair<Key,T> > list)
: d(new Data(list.size()))
{
for (typename std::initializer_list<std::pair<Key,T> >::const_iterator it = list.begin(); it != list.end(); ++it)
insert(it->first, it->second);
}
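// For illustration: each pair of the braced list is inserted individually,
// so duplicate keys keep all of their values. For example,
//   QMultiHash<QString, int> h = { {"a", 1}, {"a", 2} };
// leaves h.size() == 2.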
#ifdef Q_QDOC
template <typename InputIterator>
QMultiHash(InputIterator f, InputIterator l);
#else
template <typename InputIterator, QtPrivate::IfAssociativeIteratorHasKeyAndValue<InputIterator> = true>
QMultiHash(InputIterator f, InputIterator l)
{
QtPrivate::reserveIfForwardIterator(this, f, l);
for (; f != l; ++f)
insert(f.key(), f.value());
}
template <typename InputIterator, QtPrivate::IfAssociativeIteratorHasFirstAndSecond<InputIterator> = true>
QMultiHash(InputIterator f, InputIterator l)
{
QtPrivate::reserveIfForwardIterator(this, f, l);
for (; f != l; ++f)
insert(f->first, f->second);
}
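// The two range constructors are distinguished by SFINAE: the first is
// chosen for Qt-style associative iterators exposing key()/value(), this
// one for std::pair-style iterators (e.g. std::multimap::iterator).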
#endif
QMultiHash(const QMultiHash &other) noexcept
: d(other.d), m_size(other.m_size)
{
if (d)
d->ref.ref();
}
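// Copying is cheap: the copy only references the shared data and bumps its
// reference count; elements are copied lazily on detach().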
~QMultiHash()
{
if (d && !d->ref.deref())
delete d;
}
QMultiHash &operator=(const QMultiHash &other) noexcept(std::is_nothrow_destructible<Node>::value)
{
if (d != other.d) {
Data *o = other.d;
if (o)
o->ref.ref();
if (d && !d->ref.deref())
delete d;
d = o;
m_size = other.m_size;
}
return *this;
}
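// Copy assignment is a shallow, reference-counted copy: other's data is
// referenced, the previously held data released, and no element is copied.
// Self-assignment is covered by the d != other.d check.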
QMultiHash(QMultiHash &&other) noexcept
: d(qExchange(other.d, nullptr)),
m_size(qExchange(other.m_size, 0))
{
}
QMultiHash &operator=(QMultiHash &&other) noexcept(std::is_nothrow_destructible<Node>::value)
{
QMultiHash moved(std::move(other));
swap(moved);
return *this;
}
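// Move assignment goes through a local temporary so that the previously
// held data is released when `moved` goes out of scope after the swap.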
QMultiHash(const QHash<Key, T> &other)
: QMultiHash(other.begin(), other.end())
{}
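// Converting from QHash copies every (key, value) pair via the range
// constructor above; the two containers do not share data afterwards.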
void swap(QMultiHash &other) noexcept { qSwap(d, other.d); qSwap(m_size, other.m_size); }
bool operator==(const QMultiHash &other) const noexcept
{
if (d == other.d)
return true;
if (m_size != other.m_size)
return false;
if (m_size == 0)
return true;
// equal size, and both non-zero size => d pointers allocated for both
Q_ASSERT(d);
Q_ASSERT(other.d);
if (d->size != other.d->size)
return false;
for (auto it = other.d->begin(); it != other.d->end(); ++it) {
auto i = d->find(it.node()->key);
if (i == d->end())
return false;
Chain *e = it.node()->value;
while (e) {
Chain *oe = i.node()->value;
while (oe) {
if (oe->value == e->value)
break;
oe = oe->next;
}
if (!oe)
return false;
e = e->next;
}
}
// every value of other was found in this and the sizes match, so the hashes are equal
return true;
}
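// Note that, as implemented above, equality does not depend on the order in
// which values were inserted under the same key: each value chain of other
// is searched for every value.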
bool operator!=(const QMultiHash &other) const noexcept { return !(*this == other); }
inline qsizetype size() const noexcept { return m_size; }
inline bool isEmpty() const noexcept { return !m_size; }
inline qsizetype capacity() const noexcept { return d ? qsizetype(d->numBuckets >> 1) : 0; }
void reserve(qsizetype size)
{
if (isDetached())
d->rehash(size);
else
d = Data::detached(d, size_t(size));
}
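// For illustration: reserving before a bulk insert avoids intermediate
// rehashes, e.g.
//   QMultiHash<int, QString> h;
//   h.reserve(expectedCount);   // expectedCount: number of insertions planned
//   // ... followed by expectedCount calls to h.insert(...)
// squeeze() below is simply reserve(0), which rehashes to the smallest table
// that still fits the current contents.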
inline void squeeze() { reserve(0); }
inline void detach() { if (!d || d->ref.isShared()) d = Data::detached(d); }
inline bool isDetached() const noexcept { return d && !d->ref.isShared(); }
bool isSharedWith(const QMultiHash &other) const noexcept { return d == other.d; }
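// detach() implements copy-on-write: a deep copy of the shared data is made
// only when it is not yet allocated or is referenced by more than one hash.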
void clear() noexcept(std::is_nothrow_destructible<Node>::value)
{
if (d && !d->ref.deref())
delete d;
d = nullptr;
m_size = 0;
}
qsizetype remove(const Key &key)
{
if (isEmpty()) // prevents detaching shared null
return 0;
detach();
auto it = d->find(key);
if (it.isUnused())
return 0;
qsizetype n = Node::freeChain(it.node());
m_size -= n;
Q_ASSERT(m_size >= 0);
d->erase(it);
return n;
}
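    // Editorial note, not part of the original header: a minimal usage sketch for
    // the remove() overload above. It removes every value stored under the key and
    // returns how many were removed (assuming a QMultiHash<QString, int> named hash):
    //     hash.insert(QStringLiteral("a"), 1);
    //     hash.insert(QStringLiteral("a"), 2);
    //     qsizetype n = hash.remove(QStringLiteral("a")); // n == 2, "a" is gone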
T take(const Key &key)
{
if (isEmpty()) // prevents detaching shared null
return T();
detach();
auto it = d->find(key);
if (it.isUnused())
return T();
Chain *e = it.node()->value;
Q_ASSERT(e);
T t = std::move(e->value);
if (e->next) {
it.node()->value = e->next;
delete e;
} else {
// erase() deletes the values.
d->erase(it);
}
--m_size;
Q_ASSERT(m_size >= 0);
return t;
}
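    // Editorial note, not part of the original header: take() removes and returns a
    // single value for the key (the head of that key's chain), or a default-constructed
    // T when the key is absent. Sketch, assuming QMultiHash<QString, int> hash:
    //     hash.insert(QStringLiteral("a"), 1);
    //     int v = hash.take(QStringLiteral("a")); // v == 1, key "a" now absent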
bool contains(const Key &key) const noexcept
{
if (!d)
return false;
return d->findNode(key) != nullptr;
}
Key key(const T &value, const Key &defaultKey = Key()) const noexcept
{
if (d) {
auto i = d->begin();
while (i != d->end()) {
Chain *e = i.node()->value;
if (e->contains(value))
return i.node()->key;
++i;
}
}
return defaultKey;
}
T value(const Key &key, const T &defaultValue = T()) const noexcept
{
if (d) {
Node *n = d->findNode(key);
if (n) {
Q_ASSERT(n->value);
return n->value->value;
}
}
return defaultValue;
}
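    // Editorial note, not part of the original header: value() is a non-detaching
    // lookup that returns the value at the head of the key's chain, or defaultValue
    // when the key is missing. Sketch, assuming QMultiHash<QString, int> hash:
    //     int v = hash.value(QStringLiteral("missing"), -1); // v == -1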
T &operator[](const Key &key)
{
detach();
auto result = d->findOrInsert(key);
Q_ASSERT(!result.it.atEnd());
if (!result.initialized)
Node::createInPlace(result.it.node(), key, T());
return result.it.node()->value->value;
}
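    // Editorial note, not part of the original header: the non-const operator[]
    // detaches and, if the key is absent, inserts a default-constructed value before
    // returning a reference to the value at the head of the key's chain. Sketch:
    //     hash[QStringLiteral("b")] = 7; // inserts ("b", 7) when "b" was missing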
const T operator[](const Key &key) const noexcept
{
return value(key);
}
QList<Key> uniqueKeys() const
{
QList<Key> res;
if (d) {
auto i = d->begin();
while (i != d->end()) {
res.append(i.node()->key);
++i;
}
}
return res;
}
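    // Editorial note, not part of the original header: uniqueKeys() appends one entry
    // per distinct key, regardless of how many values that key holds. Sketch:
    //     hash.insert(QStringLiteral("a"), 1);
    //     hash.insert(QStringLiteral("a"), 2);
    //     const QList<QString> uk = hash.uniqueKeys(); // uk == {"a"}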
QList<Key> keys() const { return QList<Key>(keyBegin(), keyEnd()); }
QList<Key> keys(const T &value) const
{
QList<Key> res;
const_iterator i = begin();
while (i != end()) {
if (i.value()->contains(value))
res.append(i.key());
++i;
}
return res;
}
QList<T> values() const { return QList<T>(begin(), end()); }
QList<T> values(const Key &key) const
{
QList<T> values;
if (d) {
Node *n = d->findNode(key);
if (n) {
Chain *e = n->value;
while (e) {
values.append(e->value);
e = e->next;
}
}
}
return values;
}
class const_iterator;
class iterator
{
using piter = typename QHashPrivate::iterator<Node>;
friend class const_iterator;
friend class QMultiHash<Key, T>;
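        // Traversal state: 'i' walks the hash nodes (one node per distinct key),
        // while 'e' points at the link holding the current chain entry, i.e.
        // Node::value for the first entry of a chain or a Chain::next further
        // down. A null 'e' marks the end() iterator.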
piter i;
Chain **e = nullptr;
explicit inline iterator(piter it, Chain **entry = nullptr) noexcept : i(it), e(entry)
{
if (!it.atEnd() && !e) {
e = &it.node()->value;
Q_ASSERT(e && *e);
}
}
public:
typedef std::forward_iterator_tag iterator_category;
typedef qptrdiff difference_type;
typedef T value_type;
typedef T *pointer;
typedef T &reference;
constexpr iterator() noexcept = default;
inline const Key &key() const noexcept { return i.node()->key; }
inline T &value() const noexcept { return (*e)->value; }
inline T &operator*() const noexcept { return (*e)->value; }
inline T *operator->() const noexcept { return &(*e)->value; }
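        // Comparing only 'e' identifies a position uniquely: every live entry is
        // reached through a distinct Chain link, and end() is the only iterator
        // with e == nullptr.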
inline bool operator==(const iterator &o) const noexcept { return e == o.e; }
inline bool operator!=(const iterator &o) const noexcept { return e != o.e; }
inline iterator &operator++() noexcept {
Q_ASSERT(e && *e);
e = &(*e)->next;
Q_ASSERT(e);
if (!*e) {
++i;
e = i.atEnd() ? nullptr : &i.node()->value;
}
return *this;
}
inline iterator operator++(int) noexcept {
iterator r = *this;
++(*this);
return r;
}
inline bool operator==(const const_iterator &o) const noexcept { return e == o.e; }
inline bool operator!=(const const_iterator &o) const noexcept { return e != o.e; }
};
friend class iterator;
class const_iterator
{
using piter = typename QHashPrivate::iterator<Node>;
friend class iterator;
friend class QMultiHash<Key, T>;
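        // Same traversal state as iterator above: 'i' walks the nodes, 'e' points
        // at the link holding the current chain entry (nullptr for end()).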
piter i;
Chain **e = nullptr;
explicit inline const_iterator(piter it, Chain **entry = nullptr) noexcept : i(it), e(entry)
{
if (!it.atEnd() && !e) {
e = &it.node()->value;
Q_ASSERT(e && *e);
}
}
public:
typedef std::forward_iterator_tag iterator_category;
typedef qptrdiff difference_type;
typedef T value_type;
typedef const T *pointer;
typedef const T &reference;
constexpr const_iterator() noexcept = default;
inline const_iterator(const iterator &o) noexcept : i(o.i), e(o.e) { }
inline const Key &key() const noexcept { return i.node()->key; }
        inline const T &value() const noexcept { return (*e)->value; }
        inline const T &operator*() const noexcept { return (*e)->value; }
        inline const T *operator->() const noexcept { return &(*e)->value; }
inline bool operator==(const const_iterator &o) const noexcept { return e == o.e; }
inline bool operator!=(const const_iterator &o) const noexcept { return e != o.e; }
inline const_iterator &operator++() noexcept {
Q_ASSERT(e && *e);
e = &(*e)->next;
Q_ASSERT(e);
if (!*e) {
++i;
e = i.atEnd() ? nullptr : &i.node()->value;
}
return *this;
}
inline const_iterator operator++(int) noexcept
{
const_iterator r = *this;
++(*this);
return r;
}
};
friend class const_iterator;
class key_iterator
{
const_iterator i;
public:
typedef typename const_iterator::iterator_category iterator_category;
typedef qptrdiff difference_type;
typedef Key value_type;
typedef const Key *pointer;
typedef const Key &reference;
key_iterator() noexcept = default;
explicit key_iterator(const_iterator o) noexcept : i(o) { }
const Key &operator*() const noexcept { return i.key(); }
const Key *operator->() const noexcept { return &i.key(); }
bool operator==(key_iterator o) const noexcept { return i == o.i; }
bool operator!=(key_iterator o) const noexcept { return i != o.i; }
inline key_iterator &operator++() noexcept { ++i; return *this; }
inline key_iterator operator++(int) noexcept { return key_iterator(i++);}
const_iterator base() const noexcept { return i; }
};
typedef QKeyValueIterator<const Key&, const T&, const_iterator> const_key_value_iterator;
typedef QKeyValueIterator<const Key&, T&, iterator> key_value_iterator;
// STL style
inline iterator begin() { detach(); return iterator(d->begin()); }
    inline const_iterator begin() const noexcept { return d ? const_iterator(d->begin()) : const_iterator(); }
    inline const_iterator cbegin() const noexcept { return d ? const_iterator(d->begin()) : const_iterator(); }
    inline const_iterator constBegin() const noexcept { return d ? const_iterator(d->begin()) : const_iterator(); }
inline iterator end() noexcept { return iterator(); }
inline const_iterator end() const noexcept { return const_iterator(); }
inline const_iterator cend() const noexcept { return const_iterator(); }
inline const_iterator constEnd() const noexcept { return const_iterator(); }
inline key_iterator keyBegin() const noexcept { return key_iterator(begin()); }
inline key_iterator keyEnd() const noexcept { return key_iterator(end()); }
inline key_value_iterator keyValueBegin() noexcept { return key_value_iterator(begin()); }
inline key_value_iterator keyValueEnd() noexcept { return key_value_iterator(end()); }
inline const_key_value_iterator keyValueBegin() const noexcept { return const_key_value_iterator(begin()); }
inline const_key_value_iterator constKeyValueBegin() const noexcept { return const_key_value_iterator(begin()); }
inline const_key_value_iterator keyValueEnd() const noexcept { return const_key_value_iterator(end()); }
inline const_key_value_iterator constKeyValueEnd() const noexcept { return const_key_value_iterator(end()); }
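
    // Re-anchors 'it' into this hash's own data after a copy-on-write detach:
    // when the data is shared, the iterator's position inside the per-key
    // chain is recorded first, the data is copied, and an equivalent iterator
    // into the fresh copy is returned; otherwise 'it' is returned unchanged.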
iterator detach(const_iterator it)
{
auto i = it.i;
Chain **e = it.e;
if (d->ref.isShared()) {
// need to store iterator position before detaching
qsizetype n = 0;
Chain *entry = i.node()->value;
while (entry != *it.e) {
++n;
entry = entry->next;
}
Q_ASSERT(entry);
detach_helper();
i = d->detachedIterator(i);
e = &i.node()->value;
while (n) {
e = &(*e)->next;
--n;
}
Q_ASSERT(e && *e);
}
return iterator(i, e);
}
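
    // Removes the chain entry 'it' points to; when it was the only value left
    // for its key, the whole node is erased. Returns an iterator to the next
    // element. Illustrative erase-while-iterating sketch (the names 'hash' and
    // 'unwanted' are hypothetical):
    //
    //     QMultiHash<QString, int> hash;
    //     // ... fill hash ...
    //     auto it = hash.begin();
    //     while (it != hash.end()) {
    //         if (it.value() == unwanted)
    //             it = hash.erase(it);
    //         else
    //             ++it;
    //     }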
iterator erase(const_iterator it)
{
Q_ASSERT(d);
iterator i = detach(it);
Chain *e = *i.e;
Chain *next = e->next;
*i.e = next;
delete e;
if (!next) {
if (i.e == &i.i.node()->value) {
// last remaining entry, erase
i = iterator(d->erase(i.i));
            } else {
                // step to the next node in our own (possibly detached) data,
                // not in the data that 'it' may still reference
                i = iterator(++i.i);
            }
}
--m_size;
Q_ASSERT(m_size >= 0);
return i;
}
// more Qt
typedef iterator Iterator;
typedef const_iterator ConstIterator;
inline qsizetype count() const noexcept { return size(); }
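    // The mutable find() below detaches shared data before looking up 'key';
    // prefer constFind() when read-only access is enough and the hash may be
    // implicitly shared.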
iterator find(const Key &key)
{
if (isEmpty())
return end();
detach();
auto it = d->find(key);
if (it.isUnused())
it = d->end();
return iterator(it);
}
const_iterator find(const Key &key) const noexcept
{
return constFind(key);
}
const_iterator constFind(const Key &key) const noexcept
{
if (isEmpty())
return end();
auto it = d->find(key);
if (it.isUnused())
it = d->end();
return const_iterator(it);
}
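
    // Illustrative lookup sketch (the names 'hash' and 'key' are hypothetical):
    //
    //     if (hash.constFind(key) != hash.constEnd()) {
    //         // key has at least one value; no detach happened
    //     }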
iterator insert(const Key &key, const T &value)
{
return emplace(key, value);
}
template <typename ...Args>
iterator emplace(const Key &key, Args &&... args)
{
return emplace(Key(key), std::forward<Args>(args)...);
}
template <typename ...Args>
iterator emplace(Key &&key, Args &&... args)
{
detach();
auto result = d->findOrInsert(key);
if (!result.initialized)
Node::createInPlace(result.it.node(), std::move(key), std::forward<Args>(args)...);
else
result.it.node()->insertMulti(std::forward<Args>(args)...);
++m_size;
return iterator(result.it);
}
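
    // For QMultiHash, insert()/emplace() never overwrite: inserting a second
    // value under an existing key keeps both. Illustrative sketch (the
    // variable 'h' is hypothetical):
    //
    //     QMultiHash<QString, int> h;
    //     h.insert(QStringLiteral("k"), 1);
    //     h.insert(QStringLiteral("k"), 2);   // "k" now maps to two values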
float load_factor() const noexcept { return d ? d->loadFactor() : 0; }
static float max_load_factor() noexcept { return 0.5; }
size_t bucket_count() const noexcept { return d ? d->numBuckets : 0; }
static size_t max_bucket_count() noexcept { return QHashPrivate::GrowthPolicy::maxNumBuckets(); }
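    // STL-compatible observers: max_load_factor() is fixed at 0.5 and no
    // setter is provided; bucket_count() is 0 for a default-constructed hash.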
inline bool empty() const noexcept { return isEmpty(); }
inline iterator replace(const Key &key, const T &value)
{
return emplaceReplace(key, value);
}
template <typename ...Args>
iterator emplaceReplace(const Key &key, Args &&... args)
{
return emplaceReplace(Key(key), std::forward<Args>(args)...);
}
template <typename ...Args>
iterator emplaceReplace(Key &&key, Args &&... args)
{
detach();
auto result = d->findOrInsert(key);
if (!result.initialized) {
++m_size;
Node::createInPlace(result.it.node(), std::move(key), std::forward<Args>(args)...);
} else {
result.it.node()->emplaceValue(std::forward<Args>(args)...);
}
return iterator(result.it);
}
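    // Unlike insert(), replace()/emplaceReplace() reuse an existing entry for
    // 'key' and overwrite its value instead of storing an additional one.
    // Illustrative sketch (the variable 'h' is hypothetical):
    //
    //     h.insert(QStringLiteral("k"), 1);
    //     h.replace(QStringLiteral("k"), 2);  // still a single value for "k"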
inline QMultiHash &operator+=(const QMultiHash &other)
{ this->unite(other); return *this; }
inline QMultiHash operator+(const QMultiHash &other) const
{ QMultiHash result = *this; result += other; return result; }
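
    // operator+=()/operator+() merge via unite(): every element of 'other' is
    // inserted as well, so keys present in both hashes accumulate values
    // rather than overwrite them.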
bool contains(const Key &key, const T &value) const noexcept
{
if (isEmpty())
return false;
auto n = d->findNode(key);
if (n == nullptr)
return false;
return n->value->contains(value);
}
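
    // Removes every entry whose key and value both match. The key's chain is
    // walked, matching links are unlinked and deleted, and the node itself is
    // erased once its chain becomes empty. Returns the number of entries removed.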
qsizetype remove(const Key &key, const T &value)
{
if (isEmpty()) // prevents detaching shared null
            return 0;
detach();
auto it = d->find(key);
if (it.isUnused())
return 0;
qsizetype n = 0;
Chain **e = &it.node()->value;
while (*e) {
Chain *entry = *e;
if (entry->value == value) {
*e = entry->next;
delete entry;
++n;
} else {
e = &entry->next;
}
}
if (!it.node()->value)
d->erase(it);
m_size -= n;
Q_ASSERT(m_size >= 0);
return n;
}
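
    // Returns the number of values stored under key, i.e. the length of that
    // key's chain.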
qsizetype count(const Key &key) const noexcept
{
        if (!d)
            return 0;
        auto it = d->find(key);
if (it.isUnused())
return 0;
qsizetype n = 0;
Chain *e = it.node()->value;
while (e) {
++n;
e = e->next;
}
return n;
}
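
    // Returns the number of values stored under key that compare equal to value.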
qsizetype count(const Key &key, const T &value) const noexcept
{
        if (!d)
            return 0;
        auto it = d->find(key);
if (it.isUnused())
return 0;
qsizetype n = 0;
Chain *e = it.node()->value;
while (e) {
if (e->value == value)
++n;
e = e->next;
}
return n;
}
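
    // Returns a mutable iterator to the first entry matching both key and value,
    // or end() if there is none. Detaches first, since the returned iterator
    // allows modification.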
iterator find(const Key &key, const T &value)
{
detach();
auto it = constFind(key, value);
return iterator(it.i, it.e);
}
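
    // Const lookups: walk the entries sharing key (they are adjacent in
    // iteration order) until one with a matching value is found, or the key's
    // range ends.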
const_iterator find(const Key &key, const T &value) const noexcept
{
return constFind(key, value);
}
const_iterator constFind(const Key &key, const T &value) const noexcept
{
const_iterator i(constFind(key));
const_iterator end(constEnd());
while (i != end && i.key() == key) {
if (i.value() == value)
return i;
++i;
}
return end;
}
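
    // Inserts all (key, value) pairs of other into this hash, keeping
    // duplicates. other is copied up front so that uniting with a hash sharing
    // this one's data (including *this itself) stays well-defined across the
    // detach.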
QMultiHash &unite(const QMultiHash &other)
{
if (isEmpty()) {
*this = other;
} else if (other.isEmpty()) {
;
} else {
QMultiHash copy(other);
detach();
for (auto cit = copy.cbegin(); cit != copy.cend(); ++cit)
insert(cit.key(), *cit);
}
return *this;
}
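
    // Returns the range of entries stored under key as mutable iterators;
    // detaches and converts the iterators produced by the const overload below.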
QPair<iterator, iterator> equal_range(const Key &key)
{
detach();
auto pair = qAsConst(*this).equal_range(key);
return qMakePair(iterator(pair.first.i), iterator(pair.second.i));
}
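
    // All values of a key live in a single node's chain, so the const range
    // runs from that node to the next one.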
QPair<const_iterator, const_iterator> equal_range(const Key &key) const noexcept
{
        if (!d)
            return qMakePair(end(), end());
        auto it = d->find(key);
if (it.isUnused())
return qMakePair(end(), end());
auto end = it;
++end;
return qMakePair(const_iterator(it), const_iterator(end));
}
private:
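
    // Helper for detach(): allocates a fresh Data block if none exists yet,
    // otherwise deep-copies the shared block and drops this hash's reference
    // to the old one.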
void detach_helper()
{
if (!d) {
d = new Data;
return;
}
Data *dd = new Data(*d);
if (!d->ref.deref())
delete d;
d = dd;
}
};
Q_DECLARE_ASSOCIATIVE_FORWARD_ITERATOR(Hash)
Q_DECLARE_MUTABLE_ASSOCIATIVE_FORWARD_ITERATOR(Hash)
Q_DECLARE_ASSOCIATIVE_FORWARD_ITERATOR(MultiHash)
Q_DECLARE_MUTABLE_ASSOCIATIVE_FORWARD_ITERATOR(MultiHash)
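
// Hashes an entire QHash by feeding every (key, value) pair into a commutative
// combiner, so the result does not depend on iteration order.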
template <class Key, class T>
size_t qHash(const QHash<Key, T> &key, size_t seed = 0)
noexcept(noexcept(qHash(std::declval<Key&>())) && noexcept(qHash(std::declval<T&>())))
{
QtPrivate::QHashCombineCommutative hash;
for (auto it = key.begin(), end = key.end(); it != end; ++it) {
const Key &k = it.key();
const T &v = it.value();
seed = hash(seed, std::pair<const Key&, const T&>(k, v));
}
return seed;
}
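
// A QMultiHash hashes the same way: every stored (key, value) pair, duplicates
// included, contributes to the commutative combination.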
template <class Key, class T>
inline size_t qHash(const QMultiHash<Key, T> &key, size_t seed = 0)
noexcept(noexcept(qHash(std::declval<Key&>())) && noexcept(qHash(std::declval<T&>())))
{
    QtPrivate::QHashCombineCommutative hash;
    for (auto it = key.begin(), end = key.end(); it != end; ++it) {
        const Key &k = it.key();
        const T &v = it.value();
        seed = hash(seed, std::pair<const Key&, const T&>(k, v));
    }
    return seed;
}
QT_END_NAMESPACE
#endif // QHASH_H