google_hash 0.0.0
- data/README +21 -0
- data/Rakefile +11 -0
- data/VERSION +1 -0
- data/ext/extconf.rb +15 -0
- data/ext/go.cpp +109 -0
- data/ext/sparsehash-1.5.2/AUTHORS +2 -0
- data/ext/sparsehash-1.5.2/COPYING +28 -0
- data/ext/sparsehash-1.5.2/ChangeLog +167 -0
- data/ext/sparsehash-1.5.2/INSTALL +236 -0
- data/ext/sparsehash-1.5.2/Makefile.am +157 -0
- data/ext/sparsehash-1.5.2/Makefile.in +1019 -0
- data/ext/sparsehash-1.5.2/NEWS +0 -0
- data/ext/sparsehash-1.5.2/README +149 -0
- data/ext/sparsehash-1.5.2/README.windows +25 -0
- data/ext/sparsehash-1.5.2/TODO +28 -0
- data/ext/sparsehash-1.5.2/aclocal.m4 +868 -0
- data/ext/sparsehash-1.5.2/compile +99 -0
- data/ext/sparsehash-1.5.2/config.guess +1516 -0
- data/ext/sparsehash-1.5.2/config.sub +1626 -0
- data/ext/sparsehash-1.5.2/configure +8054 -0
- data/ext/sparsehash-1.5.2/configure.ac +74 -0
- data/ext/sparsehash-1.5.2/depcomp +530 -0
- data/ext/sparsehash-1.5.2/doc/dense_hash_map.html +1591 -0
- data/ext/sparsehash-1.5.2/doc/dense_hash_set.html +1445 -0
- data/ext/sparsehash-1.5.2/doc/designstyle.css +115 -0
- data/ext/sparsehash-1.5.2/doc/implementation.html +365 -0
- data/ext/sparsehash-1.5.2/doc/index.html +69 -0
- data/ext/sparsehash-1.5.2/doc/performance.html +96 -0
- data/ext/sparsehash-1.5.2/doc/sparse_hash_map.html +1527 -0
- data/ext/sparsehash-1.5.2/doc/sparse_hash_set.html +1376 -0
- data/ext/sparsehash-1.5.2/doc/sparsetable.html +1393 -0
- data/ext/sparsehash-1.5.2/experimental/Makefile +9 -0
- data/ext/sparsehash-1.5.2/experimental/README +14 -0
- data/ext/sparsehash-1.5.2/experimental/example.c +54 -0
- data/ext/sparsehash-1.5.2/experimental/libchash.c +1537 -0
- data/ext/sparsehash-1.5.2/experimental/libchash.h +252 -0
- data/ext/sparsehash-1.5.2/google-sparsehash.sln +47 -0
- data/ext/sparsehash-1.5.2/install-sh +323 -0
- data/ext/sparsehash-1.5.2/m4/acx_pthread.m4 +363 -0
- data/ext/sparsehash-1.5.2/m4/google_namespace.m4 +42 -0
- data/ext/sparsehash-1.5.2/m4/namespaces.m4 +15 -0
- data/ext/sparsehash-1.5.2/m4/stl_hash.m4 +70 -0
- data/ext/sparsehash-1.5.2/m4/stl_hash_fun.m4 +36 -0
- data/ext/sparsehash-1.5.2/m4/stl_namespace.m4 +25 -0
- data/ext/sparsehash-1.5.2/missing +360 -0
- data/ext/sparsehash-1.5.2/mkinstalldirs +158 -0
- data/ext/sparsehash-1.5.2/packages/deb.sh +74 -0
- data/ext/sparsehash-1.5.2/packages/deb/README +7 -0
- data/ext/sparsehash-1.5.2/packages/deb/changelog +107 -0
- data/ext/sparsehash-1.5.2/packages/deb/compat +1 -0
- data/ext/sparsehash-1.5.2/packages/deb/control +17 -0
- data/ext/sparsehash-1.5.2/packages/deb/copyright +35 -0
- data/ext/sparsehash-1.5.2/packages/deb/docs +16 -0
- data/ext/sparsehash-1.5.2/packages/deb/rules +117 -0
- data/ext/sparsehash-1.5.2/packages/deb/sparsehash.dirs +2 -0
- data/ext/sparsehash-1.5.2/packages/deb/sparsehash.install +2 -0
- data/ext/sparsehash-1.5.2/packages/rpm.sh +86 -0
- data/ext/sparsehash-1.5.2/packages/rpm/rpm.spec +61 -0
- data/ext/sparsehash-1.5.2/src/config.h.in +131 -0
- data/ext/sparsehash-1.5.2/src/config.h.include +23 -0
- data/ext/sparsehash-1.5.2/src/google/dense_hash_map +310 -0
- data/ext/sparsehash-1.5.2/src/google/dense_hash_set +287 -0
- data/ext/sparsehash-1.5.2/src/google/sparse_hash_map +294 -0
- data/ext/sparsehash-1.5.2/src/google/sparse_hash_set +275 -0
- data/ext/sparsehash-1.5.2/src/google/sparsehash/densehashtable.h +1062 -0
- data/ext/sparsehash-1.5.2/src/google/sparsehash/sparsehashtable.h +1015 -0
- data/ext/sparsehash-1.5.2/src/google/sparsetable +1468 -0
- data/ext/sparsehash-1.5.2/src/google/type_traits.h +250 -0
- data/ext/sparsehash-1.5.2/src/hashtable_unittest.cc +1375 -0
- data/ext/sparsehash-1.5.2/src/simple_test.cc +103 -0
- data/ext/sparsehash-1.5.2/src/sparsetable_unittest.cc +696 -0
- data/ext/sparsehash-1.5.2/src/time_hash_map.cc +488 -0
- data/ext/sparsehash-1.5.2/src/type_traits_unittest.cc +492 -0
- data/ext/sparsehash-1.5.2/src/windows/config.h +149 -0
- data/ext/sparsehash-1.5.2/src/windows/google/sparsehash/sparseconfig.h +32 -0
- data/ext/sparsehash-1.5.2/src/windows/port.cc +63 -0
- data/ext/sparsehash-1.5.2/src/windows/port.h +81 -0
- data/ext/sparsehash-1.5.2/src/words +8944 -0
- data/ext/sparsehash-1.5.2/vsprojects/hashtable_unittest/hashtable_unittest.vcproj +187 -0
- data/ext/sparsehash-1.5.2/vsprojects/sparsetable_unittest/sparsetable_unittest.vcproj +172 -0
- data/ext/sparsehash-1.5.2/vsprojects/time_hash_map/time_hash_map.vcproj +187 -0
- data/ext/sparsehash-1.5.2/vsprojects/type_traits_unittest/type_traits_unittest.vcproj +169 -0
- data/ext/test.rb +10 -0
- data/test/spec.go +70 -0
- metadata +147 -0
@@ -0,0 +1,115 @@
|
|
1
|
+
body {
|
2
|
+
background-color: #ffffff;
|
3
|
+
color: black;
|
4
|
+
margin-right: 1in;
|
5
|
+
margin-left: 1in;
|
6
|
+
}
|
7
|
+
|
8
|
+
|
9
|
+
h1, h2, h3, h4, h5, h6 {
|
10
|
+
color: #3366ff;
|
11
|
+
font-family: sans-serif;
|
12
|
+
}
|
13
|
+
@media print {
|
14
|
+
/* Darker version for printing */
|
15
|
+
h1, h2, h3, h4, h5, h6 {
|
16
|
+
color: #000080;
|
17
|
+
font-family: helvetica, sans-serif;
|
18
|
+
}
|
19
|
+
}
|
20
|
+
|
21
|
+
h1 {
|
22
|
+
text-align: center;
|
23
|
+
font-size: 18pt;
|
24
|
+
}
|
25
|
+
h2 {
|
26
|
+
margin-left: -0.5in;
|
27
|
+
}
|
28
|
+
h3 {
|
29
|
+
margin-left: -0.25in;
|
30
|
+
}
|
31
|
+
h4 {
|
32
|
+
margin-left: -0.125in;
|
33
|
+
}
|
34
|
+
hr {
|
35
|
+
margin-left: -1in;
|
36
|
+
}
|
37
|
+
|
38
|
+
/* Definition lists: definition term bold */
|
39
|
+
dt {
|
40
|
+
font-weight: bold;
|
41
|
+
}
|
42
|
+
|
43
|
+
address {
|
44
|
+
text-align: right;
|
45
|
+
}
|
46
|
+
/* Use the <code> tag for bits of code and <var> for variables and objects. */
|
47
|
+
code,pre,samp,var {
|
48
|
+
color: #006000;
|
49
|
+
}
|
50
|
+
/* Use the <file> tag for file and directory paths and names. */
|
51
|
+
file {
|
52
|
+
color: #905050;
|
53
|
+
font-family: monospace;
|
54
|
+
}
|
55
|
+
/* Use the <kbd> tag for stuff the user should type. */
|
56
|
+
kbd {
|
57
|
+
color: #600000;
|
58
|
+
}
|
59
|
+
div.note p {
|
60
|
+
float: right;
|
61
|
+
width: 3in;
|
62
|
+
margin-right: 0%;
|
63
|
+
padding: 1px;
|
64
|
+
border: 2px solid #6060a0;
|
65
|
+
background-color: #fffff0;
|
66
|
+
}
|
67
|
+
|
68
|
+
UL.nobullets {
|
69
|
+
list-style-type: none;
|
70
|
+
list-style-image: none;
|
71
|
+
margin-left: -1em;
|
72
|
+
}
|
73
|
+
|
74
|
+
/*
|
75
|
+
body:after {
|
76
|
+
content: "Google Confidential";
|
77
|
+
}
|
78
|
+
*/
|
79
|
+
|
80
|
+
/* pretty printing styles. See prettify.js */
|
81
|
+
.str { color: #080; }
|
82
|
+
.kwd { color: #008; }
|
83
|
+
.com { color: #800; }
|
84
|
+
.typ { color: #606; }
|
85
|
+
.lit { color: #066; }
|
86
|
+
.pun { color: #660; }
|
87
|
+
.pln { color: #000; }
|
88
|
+
.tag { color: #008; }
|
89
|
+
.atn { color: #606; }
|
90
|
+
.atv { color: #080; }
|
91
|
+
pre.prettyprint { padding: 2px; border: 1px solid #888; }
|
92
|
+
|
93
|
+
.embsrc { background: #eee; }
|
94
|
+
|
95
|
+
@media print {
|
96
|
+
.str { color: #060; }
|
97
|
+
.kwd { color: #006; font-weight: bold; }
|
98
|
+
.com { color: #600; font-style: italic; }
|
99
|
+
.typ { color: #404; font-weight: bold; }
|
100
|
+
.lit { color: #044; }
|
101
|
+
.pun { color: #440; }
|
102
|
+
.pln { color: #000; }
|
103
|
+
.tag { color: #006; font-weight: bold; }
|
104
|
+
.atn { color: #404; }
|
105
|
+
.atv { color: #060; }
|
106
|
+
}
|
107
|
+
|
108
|
+
/* Table Column Headers */
|
109
|
+
.hdr {
|
110
|
+
color: #006;
|
111
|
+
font-weight: bold;
|
112
|
+
background-color: #dddddd; }
|
113
|
+
.hdr2 {
|
114
|
+
color: #006;
|
115
|
+
background-color: #eeeeee; }
|
@@ -0,0 +1,365 @@
|
|
1
|
+
<HTML>
|
2
|
+
|
3
|
+
<HEAD>
|
4
|
+
<title>Implementation notes: sparse_hash, dense_hash, sparsetable</title>
|
5
|
+
</HEAD>
|
6
|
+
|
7
|
+
<BODY>
|
8
|
+
|
9
|
+
<h1>Implementation of sparse_hash_map, dense_hash_map, and
|
10
|
+
sparsetable</h1>
|
11
|
+
|
12
|
+
This document contains a few notes on how the data structures in this
|
13
|
+
package are implemented. This discussion refers at several points to
|
14
|
+
the classic text in this area: Knuth, <i>The Art of Computer
|
15
|
+
Programming</i>, Vol 3, Hashing.
|
16
|
+
|
17
|
+
|
18
|
+
<hr>
|
19
|
+
<h2><tt>sparsetable</tt></h2>
|
20
|
+
|
21
|
+
<p>For specificity, consider the declaration </p>
|
22
|
+
|
23
|
+
<pre>
|
24
|
+
sparsetable<Foo> t(100); // a sparse array with 100 elements
|
25
|
+
</pre>
|
26
|
+
|
27
|
+
<p>A sparsetable is a random-access container that implements a sparse array,
|
28
|
+
that is, an array that uses very little memory to store unassigned
|
29
|
+
indices (in this case, between 1-2 bits per unassigned index). For
|
30
|
+
instance, if you allocate an array of size 5 and assign a[2] = [big
|
31
|
+
struct], then a[2] will take up a lot of memory but a[0], a[1], a[3],
|
32
|
+
and a[4] will not. Array elements that have a value are called
|
33
|
+
"assigned". Array elements that have no value yet, or have had their
|
34
|
+
value cleared using erase() or clear(), are called "unassigned".
|
35
|
+
For assigned elements, lookups return the assigned value; for
|
36
|
+
unassigned elements, they return the default value, which for t is
|
37
|
+
Foo().</p>
|
38
|
+
|
39
|
+
<p>sparsetable is implemented as an array of "groups". Each group is
|
40
|
+
responsible for M array indices. The first group knows about
|
41
|
+
t[0]..t[M-1], the second about t[M]..t[2M-1], and so forth. (M is 48
|
42
|
+
by default.) At construction time, t creates an array of (99/M + 1)
|
43
|
+
groups. From this point on, all operations -- insert, delete, lookup
|
44
|
+
-- are passed to the appropriate group. In particular, any operation
|
45
|
+
on t[i] is actually performed on (t.group[i / M])[i % M].</p>
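<p>The index arithmetic above can be sketched as follows. This is a minimal illustration, not the package's code: the names kGroupSize, GroupSlot, group_count and locate are hypothetical, and only the formulas (N-1)/M + 1, i / M and i % M come from the text.</p>

```cpp
#include <cassert>
#include <cstddef>

// M in the text; 48 is the default quoted above. The name is illustrative.
const std::size_t kGroupSize = 48;

// Where a table index lands: which group, and which offset inside it.
struct GroupSlot {
    std::size_t group;
    std::size_t offset;
};

// Number of groups needed to cover table_size indices: (table_size - 1) / M + 1.
inline std::size_t group_count(std::size_t table_size, std::size_t m = kGroupSize) {
    assert(m > 0);
    return table_size == 0 ? 0 : (table_size - 1) / m + 1;
}

// Map table index i to (i / M, i % M).
inline GroupSlot locate(std::size_t i, std::size_t m = kGroupSize) {
    assert(m > 0);
    GroupSlot slot = {i / m, i % m};
    return slot;
}
```

<p>For the 100-element table in the example, group_count(100) is 99/48 + 1 = 3, and index 50 falls in group 1 at offset 2.</p>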
|
46
|
+
|
47
|
+
<p>Each group consists of a vector, which holds assigned values, and a
|
48
|
+
bitmap of size M, which indicates which indices are assigned. A
|
49
|
+
lookup works as follows: the group is asked to look up index i, where
|
50
|
+
i < M. The group looks at bitmap[i]. If it's 0, the lookup fails.
|
51
|
+
If it's 1, then the group has to find the appropriate value in the
|
52
|
+
vector.</p>
|
53
|
+
|
54
|
+
<h3><tt>find()</tt></h3>
|
55
|
+
|
56
|
+
<p>Finding the appropriate vector element is the most expensive part of
|
57
|
+
the lookup. The code counts all bitmap entries <= i that are set to
|
58
|
+
1. (There's at least 1 of them, since bitmap[i] is 1.) Suppose there
|
59
|
+
are 4 such entries. Then the right value to return is the 4th element
|
60
|
+
of the vector: vector[3]. This takes time O(M), which is a constant
|
61
|
+
since M is a constant.</p>
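<p>A minimal sketch of this lookup, assuming a group is modelled as a std::bitset plus a std::vector (the real class layout may differ; Group, rank and find here are illustrative names). It counts the set bits strictly below i, which gives the same vector position as counting the entries up to and including i and subtracting one.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kGroupSize = 48;  // M in the text

// One group: a bitmap of assigned offsets plus the packed assigned values.
template <typename T>
struct Group {
    std::bitset<kGroupSize> bitmap;
    std::vector<T> values;  // assigned entries only, in offset order

    // Number of assigned offsets strictly below i, i.e. the position in
    // `values` where offset i lives (or would live). O(M) bit tests.
    std::size_t rank(std::size_t i) const {
        assert(i < kGroupSize);
        std::size_t count = 0;
        for (std::size_t j = 0; j < i; ++j) {
            if (bitmap.test(j)) ++count;
        }
        return count;
    }

    // Pointer to the value at offset i, or nullptr if offset i is unassigned.
    const T* find(std::size_t i) const {
        assert(i < kGroupSize);
        if (!bitmap.test(i)) return nullptr;
        return &values[rank(i)];
    }
};
```

<p>With offsets 2, 5 and 9 assigned, find(5) sees one set bit below offset 5 and returns the second vector element.</p>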
|
62
|
+
|
63
|
+
<h3><tt>insert()</tt></h3>
|
64
|
+
|
65
|
+
<p>Insert starts with a lookup. If the lookup succeeds, the code merely
|
66
|
+
replaces vector[3] with the new value. If the lookup fails, then the
|
67
|
+
code must insert a new entry into the middle of the vector. Again, to
|
68
|
+
insert at position i, the code must count all the bitmap entries <= i
|
69
|
+
that are set to 1. This indicates the position to insert into the
|
70
|
+
vector. All vector entries above that position must be moved to make
|
71
|
+
room for the new entry. This takes time, but still constant time
|
72
|
+
since the vector has size at most M.</p>
|
73
|
+
|
74
|
+
<p>(Inserts could be made faster by using a list instead of a vector to
|
75
|
+
hold group values, but this would use much more memory, since each
|
76
|
+
list element requires a full pointer of overhead.)</p>
|
77
|
+
|
78
|
+
<p>The only metadata that needs to be updated, after the actual value is
|
79
|
+
inserted, is to set bitmap[i] to 1. No other counts must be
|
80
|
+
maintained.</p>
|
81
|
+
|
82
|
+
<h3><tt>delete()</tt></h3>
|
83
|
+
|
84
|
+
<p>Deletes are similar to inserts. They start with a lookup. If it
|
85
|
+
fails, the delete is a noop. Otherwise, the appropriate entry is
|
86
|
+
removed from the vector, all the vector elements above it are moved
|
87
|
+
down one, and bitmap[i] is set to 0.</p>
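<p>The insert and delete paths described above can be sketched together. This is an illustration under the same assumption as before (a bitset plus a packed vector per group); the class and method names are hypothetical and do not reproduce the package's own group class.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kGroupSize = 48;  // M in the text

template <typename T>
class Group {
public:
    // Assign value to offset i, overwriting any existing value.
    void set(std::size_t i, const T& value) {
        assert(i < kGroupSize);
        const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(rank(i));
        if (bitmap_.test(i)) {
            values_[static_cast<std::size_t>(pos)] = value;  // lookup succeeded: replace
        } else {
            values_.insert(values_.begin() + pos, value);    // shifts later entries up
            bitmap_.set(i);                                  // the only metadata update
        }
    }

    // Unassign offset i; a no-op if it is not assigned.
    void erase(std::size_t i) {
        assert(i < kGroupSize);
        if (!bitmap_.test(i)) return;
        const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(rank(i));
        values_.erase(values_.begin() + pos);                // shifts later entries down
        bitmap_.reset(i);
    }

    bool assigned(std::size_t i) const { return bitmap_.test(i); }
    std::size_t num_assigned() const { return values_.size(); }

    // Value at an assigned offset.
    const T& get(std::size_t i) const {
        assert(bitmap_.test(i));
        return values_[rank(i)];
    }

private:
    // Number of assigned offsets strictly below i.
    std::size_t rank(std::size_t i) const {
        std::size_t count = 0;
        for (std::size_t j = 0; j < i; ++j) {
            if (bitmap_.test(j)) ++count;
        }
        return count;
    }

    std::bitset<kGroupSize> bitmap_;
    std::vector<T> values_;
};
```

<p>Both operations move at most M - 1 vector elements, which is why they stay constant-time for a fixed M.</p>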
|
88
|
+
|
89
|
+
<h3>iterators</h3>
|
90
|
+
|
91
|
+
<p>Sparsetable iterators pose a special burden. They must iterate over
|
92
|
+
unassigned array values, but the act of iterating should not cause an
|
93
|
+
assignment to happen -- otherwise, iterating over a sparsetable would
|
94
|
+
cause it to take up much more room. For const iterators, the matter
|
95
|
+
is simple: the iterator is merely programmed to return the default
|
96
|
+
value -- Foo() -- when dereferenced while pointing to an unassigned
|
97
|
+
entry.</p>
|
98
|
+
|
99
|
+
<p>For non-const iterators, such simple techniques fail. Instead,
|
100
|
+
dereferencing a sparsetable_iterator returns an opaque object that
|
101
|
+
acts like a Foo in almost all situations, but isn't actually a Foo.
|
102
|
+
(It does this by defining operator=(), operator value_type(), and,
|
103
|
+
most sneakily, operator&().) This works in almost all cases. If it
|
104
|
+
doesn't, an explicit cast to value_type will solve the problem:</p>
|
105
|
+
|
106
|
+
<pre>
|
107
|
+
printf("%d", static_cast<Foo>(*t.find(0)));
|
108
|
+
</pre>
|
109
|
+
|
110
|
+
<p>To avoid such problems, consider using get() and set() instead of an
|
111
|
+
iterator:</p>
|
112
|
+
|
113
|
+
<pre>
|
114
|
+
for (int i = 0; i < t.size(); ++i)
|
115
|
+
if (t.get(i) == ...) t.set(i, ...);
|
116
|
+
</pre>
|
117
|
+
|
118
|
+
<p>Sparsetable also has a special class of iterator, besides normal and
|
119
|
+
const: nonempty_iterator. This only iterates over array values that
|
120
|
+
are assigned. This is particularly fast given the sparsetable
|
121
|
+
implementation, since it can ignore the bitmaps entirely and just
|
122
|
+
iterate over the various group vectors.</p>
|
123
|
+
|
124
|
+
<h3>Resource use</h3>
|
125
|
+
|
126
|
+
<p>The space overhead for a sparsetable of size N is N + 48N/M bits.
|
127
|
+
For the default value of M, this is exactly 2 bits per array entry.
|
128
|
+
(That's for 32-bit pointers; for machines with 64-bit pointers, it's N
|
129
|
+
+ 80N/M bits, or 2.67 bits per entry.)
|
130
|
+
A larger M would use less overhead -- approaching 1 bit per array
|
131
|
+
entry -- but take longer for inserts, deletes, and lookups. A smaller
|
132
|
+
M would use more overhead but make operations somewhat faster.</p>
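<p>As a quick check of the arithmetic, dividing N + 48N/M by N gives 1 + 48/M bits per entry. The helper below is a hypothetical sketch; the 48-bit and 80-bit per-group figures are taken from the paragraph above, not measured.</p>

```cpp
#include <cassert>

// Overhead in bits per array entry: one bitmap bit, plus the per-group
// bookkeeping bits spread over the M entries of a group.
// per_group_bits is 48 with 32-bit pointers and 80 with 64-bit pointers,
// according to the figures quoted in the text.
inline double overhead_bits_per_entry(unsigned m, unsigned per_group_bits) {
    assert(m > 0);
    return 1.0 + static_cast<double>(per_group_bits) / m;
}
```

<p>With M = 48 this gives 2 bits per entry for 32-bit pointers and about 2.67 for 64-bit pointers; doubling M to 96 brings the 32-bit figure down to 1.5.</p>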
|
133
|
+
|
134
|
+
<p>You can also look at some specific <A
|
135
|
+
HREF="performance.html">performance numbers</A>.</p>
|
136
|
+
|
137
|
+
|
138
|
+
<hr>
|
139
|
+
<h2><tt>sparse_hash_set</tt></h2>
|
140
|
+
|
141
|
+
<p>For specificity, consider the declaration </p>
|
142
|
+
|
143
|
+
<pre>
|
144
|
+
sparse_hash_set<Foo> t;
|
145
|
+
</pre>
|
146
|
+
|
147
|
+
<p>sparse_hash_set is a hashtable. For more information on hashtables,
|
148
|
+
see Knuth. Hashtables are basically arrays with complicated logic on
|
149
|
+
top of them. sparse_hash_set uses a sparsetable to implement the
|
150
|
+
underlying array.</p>
|
151
|
+
|
152
|
+
<p>In particular, sparse_hash_set stores its data in a sparsetable using
|
153
|
+
quadratic internal probing (see Knuth). Many hashtable
|
154
|
+
implementations use external probing, so each table element is
|
155
|
+
actually a pointer chain, holding many hashtable values.
|
156
|
+
sparse_hash_set, on the other hand, always stores at most one value in
|
157
|
+
each table location. If the hashtable wants to store a second value
|
158
|
+
at a given table location, it can't; it's forced to look somewhere
|
159
|
+
else.</p>
|
160
|
+
|
161
|
+
<h3><tt>insert()</tt></h3>
|
162
|
+
|
163
|
+
<p>As a specific example, suppose t is a new sparse_hash_set. It then
|
164
|
+
holds a sparsetable of size 32. The code for t.insert(foo) works as
|
165
|
+
follows:</p>
|
166
|
+
|
167
|
+
<p>
|
168
|
+
1) Call hash<Foo>(foo) to convert foo into an integer i. (hash<Foo> is
|
169
|
+
the default hash function; you can specify a different one in the
|
170
|
+
template arguments.)
|
171
|
+
|
172
|
+
</p><p>
|
173
|
+
2a) Look at t.sparsetable[i % 32]. If it's unassigned, assign it to
|
174
|
+
foo. foo is now in the hashtable.
|
175
|
+
|
176
|
+
</p><p>
|
177
|
+
2b) If t.sparsetable[i % 32] is assigned, and its value is foo, then
|
178
|
+
do nothing: foo was already in t and the insert is a noop.
|
179
|
+
|
180
|
+
</p><p>
|
181
|
+
2c) If t.sparsetable[i % 32] is assigned, but to a value other than
|
182
|
+
foo, look at t.sparsetable[(i+1) % 32]. If that also fails, try
|
183
|
+
t.sparsetable[(i+3) % 32], then t.sparsetable[(i+6) % 32]. In
|
184
|
+
general, keep trying the next triangular number.
|
185
|
+
|
186
|
+
</p><p>
|
187
|
+
3) If the table is now "too full" -- say, 25 of the 32 table entries
|
188
|
+
are now assigned -- grow the table by creating a new sparsetable
|
189
|
+
that's twice as big, and rehashing every single element from the
|
190
|
+
old table into the new one. This keeps the table from ever filling
|
191
|
+
up.
|
192
|
+
|
193
|
+
</p><p>
|
194
|
+
4) If the table is now "too empty" -- say, only 3 of the 32 table
|
195
|
+
entries are now assigned -- shrink the table by creating a new
|
196
|
+
sparsetable that's half as big, and rehashing every element as in
|
197
|
+
the growing case. This keeps the table overhead proportional to
|
198
|
+
the number of elements in the table.
|
199
|
+
</p>
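<p>Steps 1 through 2c can be sketched as below. This is a toy over int keys, not the package's implementation: Slot, toy_hash and probe_insert are hypothetical names, toy_hash stands in for the real hash function, and a plain vector stands in for the sparsetable. Resizing (steps 3 and 4) is left out.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One table location: either unassigned or holding a key.
struct Slot {
    bool assigned = false;
    int key = 0;
};

enum class InsertResult { kInserted, kAlreadyPresent, kTableFull };

// Stand-in for the hash function in step 1 (an assumption for the sketch).
inline std::size_t toy_hash(int key) { return static_cast<std::size_t>(key); }

// Insert key using triangular-number probing: the probes visit offsets
// 0, 1, 3, 6, 10, ... from the home slot, all taken modulo the table size.
inline InsertResult probe_insert(std::vector<Slot>& table, int key) {
    const std::size_t size = table.size();
    assert(size > 0 && (size & (size - 1)) == 0);  // power-of-two size
    std::size_t slot = toy_hash(key) % size;       // step 1
    for (std::size_t k = 1; k <= size; ++k) {
        if (!table[slot].assigned) {               // step 2a: claim the slot
            table[slot].assigned = true;
            table[slot].key = key;
            return InsertResult::kInserted;
        }
        if (table[slot].key == key) {              // step 2b: no-op
            return InsertResult::kAlreadyPresent;
        }
        slot = (slot + k) % size;                  // step 2c: next triangular offset
    }
    return InsertResult::kTableFull;
}
```

<p>In a 32-slot table, keys 4, 36 and 68 all start at slot 4 under toy_hash, so they end up in slots 4, 5 and 7 respectively.</p>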
|
200
|
+
|
201
|
+
<p>Instead of using triangular numbers as offsets, one could just use
|
202
|
+
regular integers: try i, then i+1, then i+2, then i+3. This has bad
|
203
|
+
'clumping' behavior, as explored in Knuth. Quadratic probing, using
|
204
|
+
the triangular numbers, avoids the clumping while keeping cache
|
205
|
+
coherency in the common case. As long as the table size is a power of
|
206
|
+
2, the quadratic-probing method described above will explore every
|
207
|
+
table element if necessary, to find a good place to insert.</p>
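<p>The coverage claim is easy to check empirically. The helper below (a hypothetical name, written for this note) counts how many distinct slots the first table-size probes reach when stepping by triangular numbers.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Count the distinct slots reached by the first `size` triangular-number
// probes, starting from slot `start`.
inline std::size_t distinct_slots_probed(std::size_t size, std::size_t start = 0) {
    assert(size > 0);
    std::vector<bool> seen(size, false);
    std::size_t distinct = 0;
    std::size_t slot = start % size;
    for (std::size_t k = 1; k <= size; ++k) {
        if (!seen[slot]) {
            seen[slot] = true;
            ++distinct;
        }
        slot = (slot + k) % size;  // offsets 0, 1, 3, 6, ... from start
    }
    return distinct;
}
```

<p>For a power-of-two size such as 32 the result equals the table size, so every slot is visited exactly once. For a size such as 12 the probe sequence revisits slots and misses others.</p>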
|
208
|
+
|
209
|
+
<p>(As a side note, using a table size that's a power of two has several
|
210
|
+
advantages, including the speed of calculating (i % table_size). On
|
211
|
+
the other hand, power-of-two tables are not very forgiving of a poor
|
212
|
+
hash function. Make sure your hash function is a good one! There are
|
213
|
+
plenty of dos and don'ts on the web (and in Knuth), for writing hash
|
214
|
+
functions.)</p>
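<p>Both halves of this side note show up in a two-line sketch (the function names are illustrative). For a power-of-two size the modulo reduces to a bit mask, and the same mask means only the low bits of the hash value matter.</p>

```cpp
#include <cassert>
#include <cstddef>

inline bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// For a power-of-two table_size, hash_value % table_size equals
// hash_value & (table_size - 1): only the low bits of the hash are used.
inline std::size_t slot_for(std::size_t hash_value, std::size_t table_size) {
    assert(is_power_of_two(table_size));
    return hash_value & (table_size - 1);
}
```

<p>A hash function whose outputs are all multiples of 32 would send every key to slot 0 of a 32-entry table, which is the kind of poor hash function the text warns about.</p>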
|
215
|
+
|
216
|
+
<p>The "too full" value, also called the "maximum occupancy", determines
|
217
|
+
a time-space tradeoff: in general, the higher it is, the less space is
|
218
|
+
wasted but the more probes must be performed for each insert.
|
219
|
+
sparse_hash_set uses a high maximum occupancy, since space is more
|
220
|
+
important than speed for this data structure.</p>
|
221
|
+
|
222
|
+
<p>The "too empty" value is not necessary for performance but helps with
|
223
|
+
space use. It's rare for hashtable implementations to check this
|
224
|
+
value at insert() time -- after all, how will inserting cause a
|
225
|
+
hashtable to get too small? However, the sparse_hash_set
|
226
|
+
implementation never resizes on erase(); it's nice to have an erase()
|
227
|
+
that does not invalidate iterators. Thus, the first insert() after a
|
228
|
+
long string of erase()s could well trigger a hashtable shrink.</p>
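<p>A sketch of the resize decision made at insert() time. The thresholds are assumptions picked only to match the "25 of 32" and "3 of 32" examples above; they are not claimed to be the package's defaults, and ResizePolicy and next_table_size are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>

// Illustrative thresholds (assumptions, not the library's constants).
struct ResizePolicy {
    double max_occupancy = 0.75;  // above this fraction the table is "too full"
    double min_occupancy = 0.1;   // below this fraction the table is "too empty"
};

// Table size to use given the current element count and table size.
// Growing doubles the table; shrinking halves it.
inline std::size_t next_table_size(std::size_t num_elements, std::size_t table_size,
                                   const ResizePolicy& policy = ResizePolicy()) {
    assert(table_size > 0);
    const double n = static_cast<double>(num_elements);
    const double size = static_cast<double>(table_size);
    if (n > policy.max_occupancy * size) return table_size * 2;
    if (n < policy.min_occupancy * size && table_size > 1) return table_size / 2;
    return table_size;
}
```

<p>Because erase() never calls such a check, a table can sit well below the minimum occupancy until the next insert() runs it.</p>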
|
229
|
+
|
230
|
+
<h3><tt>find()</tt></h3>
|
231
|
+
|
232
|
+
<p>find() works similarly to insert. The only difference is in step
|
233
|
+
(2a): if the value is unassigned, then the lookup fails immediately.</p>
|
234
|
+
|
235
|
+
<h3><tt>delete()</tt></h3>
|
236
|
+
|
237
|
+
<p>delete() is tricky in an internal-probing scheme. The obvious
|
238
|
+
implementation of just "unassigning" the relevant table entry doesn't
|
239
|
+
work. Consider the following scenario:</p>
|
240
|
+
|
241
|
+
<pre>
|
242
|
+
t.insert(foo1); // foo1 hashes to 4, is put in table[4]
|
243
|
+
t.insert(foo2); // foo2 hashes to 4, is put in table[5]
|
244
|
+
t.erase(foo1); // table[4] is now 'unassigned'
|
245
|
+
t.find(foo2); // fails since table[hash(foo2)] is unassigned
|
246
|
+
</pre>
|
247
|
+
|
248
|
+
<p>To avoid these failure situations, delete(foo1) is actually
|
249
|
+
implemented by replacing foo1 by a special 'delete' value in the
|
250
|
+
hashtable. This 'delete' value causes the table entry to be
|
251
|
+
considered unassigned for the purposes of insertion -- if foo3 hashes
|
252
|
+
to 4 as well, it can go into table[4] no problem -- but assigned for
|
253
|
+
the purposes of lookup.</p>
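<p>The scenario above, with a 'delete' value in place, can be sketched as a toy set of ints. TombstoneSet and its members are hypothetical names, the identity hash is an assumption that forces the collision, and the deleted key is passed to the constructor here for brevity.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy open-addressed set of ints with a client-chosen 'delete' value.
class TombstoneSet {
public:
    TombstoneSet(std::size_t size, int deleted_key)
        : slots_(size), deleted_key_(deleted_key) {
        assert(size > 0 && (size & (size - 1)) == 0);  // power-of-two size
    }

    bool insert(int key) {
        assert(key != deleted_key_);  // the client must never insert this value
        std::size_t found = 0;
        if (find_slot(key, &found)) return false;  // already present
        std::size_t slot = home(key);
        for (std::size_t k = 1; k <= slots_.size(); ++k) {
            Slot& s = slots_[slot];
            if (!s.assigned || s.key == deleted_key_) {  // free for insertion
                s.assigned = true;
                s.key = key;
                return true;
            }
            slot = (slot + k) % slots_.size();
        }
        return false;  // table full
    }

    bool contains(int key) const {
        std::size_t found = 0;
        return find_slot(key, &found);
    }

    bool erase(int key) {
        std::size_t found = 0;
        if (!find_slot(key, &found)) return false;
        slots_[found].key = deleted_key_;  // stays assigned, so probes continue past it
        return true;
    }

private:
    struct Slot {
        bool assigned = false;
        int key = 0;
    };

    std::size_t home(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    // Probe for key; stop only at a truly unassigned slot.
    bool find_slot(int key, std::size_t* out) const {
        if (key == deleted_key_) return false;
        std::size_t slot = home(key);
        for (std::size_t k = 1; k <= slots_.size(); ++k) {
            const Slot& s = slots_[slot];
            if (!s.assigned) return false;
            if (s.key == key) { *out = slot; return true; }
            slot = (slot + k) % slots_.size();
        }
        return false;
    }

    std::vector<Slot> slots_;
    int deleted_key_;
};
```

<p>With a 32-slot table and -1 as the deleted key, inserting 4 and 36, then erasing 4, still leaves 36 findable; a later insert of 68 reuses slot 4.</p>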
|
254
|
+
|
255
|
+
<p>What is this special 'delete' value? The delete value has to be an
|
256
|
+
element of type Foo, since the table can't hold anything else. It
|
257
|
+
obviously must be an element the client would never want to insert on
|
258
|
+
its own, or else the code couldn't distinguish deleted entries from
|
259
|
+
'real' entries with the same value. There's no way to determine a
|
260
|
+
good value automatically. The client has to specify it explicitly.
|
261
|
+
This is what the set_deleted_key() method does.</p>
|
262
|
+
|
263
|
+
<p>Note that set_deleted_key() is only necessary if the client actually
|
264
|
+
wants to call t.erase(). For insert-only hash-sets, set_deleted_key()
|
265
|
+
is unnecessary.</p>
|
266
|
+
|
267
|
+
<p>When copying the hashtable, either to grow it or shrink it, the
|
268
|
+
special 'delete' values are <b>not</b> copied into the new table. The
|
269
|
+
copy-time rehash makes them unnecessary.</p>
|
270
|
+
|
271
|
+
<h3>Resource use</h3>
|
272
|
+
|
273
|
+
<p>The data is stored in a sparsetable, so space use is the same as
|
274
|
+
for sparsetable. Time use is also determined in large part by the
|
275
|
+
sparsetable implementation. However, there is also an extra probing
|
276
|
+
cost in hashtables, which depends in large part on the "too full"
|
277
|
+
value. It should be rare to need more than 4-5 probes per lookup, and
|
278
|
+
usually significantly fewer will suffice.</p>
|
279
|
+
|
280
|
+
<p>A note on growing and shrinking the hashtable: all hashtable
|
281
|
+
implementations use the most memory when growing a hashtable, since
|
282
|
+
they must have room for both the old table and the new table at the
|
283
|
+
same time. sparse_hash_set is careful to delete entries from the old
|
284
|
+
hashtable as soon as they're copied into the new one, to minimize this
|
285
|
+
space overhead. (It does this efficiently by using its knowledge of
|
286
|
+
the sparsetable class and copying one sparsetable group at a time.)</p>
|
287
|
+
|
288
|
+
<p>You can also look at some specific <A
|
289
|
+
HREF="performance.html">performance numbers</A>.</p>
|
290
|
+
|
291
|
+
|
292
|
+
<hr>
|
293
|
+
<h2><tt>sparse_hash_map</tt></h2>
|
294
|
+
|
295
|
+
<p>sparse_hash_map is implemented identically to sparse_hash_set. The
|
296
|
+
only difference is that, instead of storing just Foo in each table entry,
|
297
|
+
the data structure stores pair<Foo, Value>.</p>
|
298
|
+
|
299
|
+
|
300
|
+
<hr>
|
301
|
+
<h2><tt>dense_hash_set</tt></h2>
|
302
|
+
|
303
|
+
<p>The hashtable aspects of dense_hash_set are identical to
|
304
|
+
sparse_hash_set: it uses quadratic internal probing, and resizes
|
305
|
+
hashtables in exactly the same way. The difference is in the
|
306
|
+
underlying array: instead of using a sparsetable, dense_hash_set uses
|
307
|
+
a C array. This means much more space is used, especially if Foo is
|
308
|
+
big. However, it makes all operations faster, since sparsetable has
|
309
|
+
memory management overhead that C arrays do not.</p>
|
310
|
+
|
311
|
+
<p>The use of C arrays instead of sparsetables points to one immediate
|
312
|
+
complication dense_hash_set has that sparse_hash_set does not: the
|
313
|
+
need to distinguish assigned from unassigned entries. In a
|
314
|
+
sparsetable, this is accomplished by a bitmap. dense_hash_set, on the
|
315
|
+
other hand, uses a dedicated value to specify unassigned entries.
|
316
|
+
Thus, dense_hash_set has two special values: one to indicate deleted
|
317
|
+
table entries, and one to indicate unassigned table entries. At
|
318
|
+
construction time, all table entries are initialized to 'unassigned'.</p>
|
319
|
+
|
320
|
+
<p>dense_hash_set provides the method set_empty_key() to indicate the
|
321
|
+
value that should be used for unassigned entries. Like
|
322
|
+
set_deleted_key(), set_empty_key() requires a value that will not be
|
323
|
+
used by the client for any legitimate purpose. Unlike
|
324
|
+
set_deleted_key(), set_empty_key() is always required, no matter what
|
325
|
+
hashtable operations the client wishes to perform.</p>
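<p>A minimal sketch of the empty-key idea, assuming int keys and an identity hash. ToyDenseSet is a hypothetical toy, not the real dense_hash_set, and it takes the empty key in its constructor for brevity; deletion and resizing are omitted.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy dense set of ints: a flat array in which one reserved key value
// marks unassigned buckets, so no bitmap is needed.
class ToyDenseSet {
public:
    ToyDenseSet(std::size_t size, int empty_key)
        : buckets_(size, empty_key), empty_key_(empty_key) {  // all buckets start unassigned
        assert(size > 0 && (size & (size - 1)) == 0);         // power-of-two size
    }

    bool insert(int key) {
        assert(key != empty_key_);  // the empty key is never a legitimate value
        const std::size_t mask = buckets_.size() - 1;
        std::size_t slot = static_cast<std::size_t>(key) & mask;
        for (std::size_t k = 1; k <= buckets_.size(); ++k) {
            if (buckets_[slot] == empty_key_) {
                buckets_[slot] = key;
                return true;
            }
            if (buckets_[slot] == key) return false;  // already present
            slot = (slot + k) & mask;                 // triangular-number probing
        }
        return false;  // table full
    }

    bool contains(int key) const {
        if (key == empty_key_) return false;
        const std::size_t mask = buckets_.size() - 1;
        std::size_t slot = static_cast<std::size_t>(key) & mask;
        for (std::size_t k = 1; k <= buckets_.size(); ++k) {
            if (buckets_[slot] == empty_key_) return false;  // unassigned: stop
            if (buckets_[slot] == key) return true;
            slot = (slot + k) & mask;
        }
        return false;
    }

private:
    std::vector<int> buckets_;
    int empty_key_;
};
```

<p>Every bucket costs a full key's worth of memory whether or not it is used, which is the space cost the text describes.</p>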
|
326
|
+
|
327
|
+
<h3>Resource use</h3>
|
328
|
+
|
329
|
+
<p>This implementation is fast because even though dense_hash_set may not
|
330
|
+
be space efficient, most lookups are localized: a single lookup may
|
331
|
+
need to access table[i], and maybe table[i+1] and table[i+3], but
|
332
|
+
nothing other than that. For all but the biggest data structures,
|
333
|
+
these will frequently be in a single cache line.</p>
|
334
|
+
|
335
|
+
<p>This implementation takes, for every unused bucket, space as big as
|
336
|
+
the key-type. Usually between half and two-thirds of the buckets are
|
337
|
+
empty.</p>
|
338
|
+
|
339
|
+
<p>The doubling method used by dense_hash_set tends to work poorly
|
340
|
+
with most memory allocators. This is because memory allocators tend
|
341
|
+
to have memory 'buckets' which are a power of two. Since each
|
342
|
+
doubling of a dense_hash_set doubles the memory use, a single
|
343
|
+
hashtable doubling will require a new memory 'bucket' from the memory
|
344
|
+
allocator, leaving the old bucket stranded as fragmented memory.
|
345
|
+
Hence, it's not recommended this data structure be used with many
|
346
|
+
inserts in memory-constrained situations.</p>
|
347
|
+
|
348
|
+
<p>You can also look at some specific <A
|
349
|
+
HREF="performance.html">performance numbers</A>.</p>
|
350
|
+
|
351
|
+
|
352
|
+
<hr>
|
353
|
+
<h2><tt>dense_hash_map</tt></h2>
|
354
|
+
|
355
|
+
<p>dense_hash_map is identical to dense_hash_set except for what values
|
356
|
+
are stored in each table entry.</p>
|
357
|
+
|
358
|
+
<hr>
|
359
|
+
<author>
|
360
|
+
Craig Silverstein<br>
|
361
|
+
Thu Jan 6 20:15:42 PST 2005
|
362
|
+
</author>
|
363
|
+
|
364
|
+
</body>
|
365
|
+
</html>
|