google_hash 0.0.0

Files changed (85)
  1. data/README +21 -0
  2. data/Rakefile +11 -0
  3. data/VERSION +1 -0
  4. data/ext/extconf.rb +15 -0
  5. data/ext/go.cpp +109 -0
  6. data/ext/sparsehash-1.5.2/AUTHORS +2 -0
  7. data/ext/sparsehash-1.5.2/COPYING +28 -0
  8. data/ext/sparsehash-1.5.2/ChangeLog +167 -0
  9. data/ext/sparsehash-1.5.2/INSTALL +236 -0
  10. data/ext/sparsehash-1.5.2/Makefile.am +157 -0
  11. data/ext/sparsehash-1.5.2/Makefile.in +1019 -0
  12. data/ext/sparsehash-1.5.2/NEWS +0 -0
  13. data/ext/sparsehash-1.5.2/README +149 -0
  14. data/ext/sparsehash-1.5.2/README.windows +25 -0
  15. data/ext/sparsehash-1.5.2/TODO +28 -0
  16. data/ext/sparsehash-1.5.2/aclocal.m4 +868 -0
  17. data/ext/sparsehash-1.5.2/compile +99 -0
  18. data/ext/sparsehash-1.5.2/config.guess +1516 -0
  19. data/ext/sparsehash-1.5.2/config.sub +1626 -0
  20. data/ext/sparsehash-1.5.2/configure +8054 -0
  21. data/ext/sparsehash-1.5.2/configure.ac +74 -0
  22. data/ext/sparsehash-1.5.2/depcomp +530 -0
  23. data/ext/sparsehash-1.5.2/doc/dense_hash_map.html +1591 -0
  24. data/ext/sparsehash-1.5.2/doc/dense_hash_set.html +1445 -0
  25. data/ext/sparsehash-1.5.2/doc/designstyle.css +115 -0
  26. data/ext/sparsehash-1.5.2/doc/implementation.html +365 -0
  27. data/ext/sparsehash-1.5.2/doc/index.html +69 -0
  28. data/ext/sparsehash-1.5.2/doc/performance.html +96 -0
  29. data/ext/sparsehash-1.5.2/doc/sparse_hash_map.html +1527 -0
  30. data/ext/sparsehash-1.5.2/doc/sparse_hash_set.html +1376 -0
  31. data/ext/sparsehash-1.5.2/doc/sparsetable.html +1393 -0
  32. data/ext/sparsehash-1.5.2/experimental/Makefile +9 -0
  33. data/ext/sparsehash-1.5.2/experimental/README +14 -0
  34. data/ext/sparsehash-1.5.2/experimental/example.c +54 -0
  35. data/ext/sparsehash-1.5.2/experimental/libchash.c +1537 -0
  36. data/ext/sparsehash-1.5.2/experimental/libchash.h +252 -0
  37. data/ext/sparsehash-1.5.2/google-sparsehash.sln +47 -0
  38. data/ext/sparsehash-1.5.2/install-sh +323 -0
  39. data/ext/sparsehash-1.5.2/m4/acx_pthread.m4 +363 -0
  40. data/ext/sparsehash-1.5.2/m4/google_namespace.m4 +42 -0
  41. data/ext/sparsehash-1.5.2/m4/namespaces.m4 +15 -0
  42. data/ext/sparsehash-1.5.2/m4/stl_hash.m4 +70 -0
  43. data/ext/sparsehash-1.5.2/m4/stl_hash_fun.m4 +36 -0
  44. data/ext/sparsehash-1.5.2/m4/stl_namespace.m4 +25 -0
  45. data/ext/sparsehash-1.5.2/missing +360 -0
  46. data/ext/sparsehash-1.5.2/mkinstalldirs +158 -0
  47. data/ext/sparsehash-1.5.2/packages/deb.sh +74 -0
  48. data/ext/sparsehash-1.5.2/packages/deb/README +7 -0
  49. data/ext/sparsehash-1.5.2/packages/deb/changelog +107 -0
  50. data/ext/sparsehash-1.5.2/packages/deb/compat +1 -0
  51. data/ext/sparsehash-1.5.2/packages/deb/control +17 -0
  52. data/ext/sparsehash-1.5.2/packages/deb/copyright +35 -0
  53. data/ext/sparsehash-1.5.2/packages/deb/docs +16 -0
  54. data/ext/sparsehash-1.5.2/packages/deb/rules +117 -0
  55. data/ext/sparsehash-1.5.2/packages/deb/sparsehash.dirs +2 -0
  56. data/ext/sparsehash-1.5.2/packages/deb/sparsehash.install +2 -0
  57. data/ext/sparsehash-1.5.2/packages/rpm.sh +86 -0
  58. data/ext/sparsehash-1.5.2/packages/rpm/rpm.spec +61 -0
  59. data/ext/sparsehash-1.5.2/src/config.h.in +131 -0
  60. data/ext/sparsehash-1.5.2/src/config.h.include +23 -0
  61. data/ext/sparsehash-1.5.2/src/google/dense_hash_map +310 -0
  62. data/ext/sparsehash-1.5.2/src/google/dense_hash_set +287 -0
  63. data/ext/sparsehash-1.5.2/src/google/sparse_hash_map +294 -0
  64. data/ext/sparsehash-1.5.2/src/google/sparse_hash_set +275 -0
  65. data/ext/sparsehash-1.5.2/src/google/sparsehash/densehashtable.h +1062 -0
  66. data/ext/sparsehash-1.5.2/src/google/sparsehash/sparsehashtable.h +1015 -0
  67. data/ext/sparsehash-1.5.2/src/google/sparsetable +1468 -0
  68. data/ext/sparsehash-1.5.2/src/google/type_traits.h +250 -0
  69. data/ext/sparsehash-1.5.2/src/hashtable_unittest.cc +1375 -0
  70. data/ext/sparsehash-1.5.2/src/simple_test.cc +103 -0
  71. data/ext/sparsehash-1.5.2/src/sparsetable_unittest.cc +696 -0
  72. data/ext/sparsehash-1.5.2/src/time_hash_map.cc +488 -0
  73. data/ext/sparsehash-1.5.2/src/type_traits_unittest.cc +492 -0
  74. data/ext/sparsehash-1.5.2/src/windows/config.h +149 -0
  75. data/ext/sparsehash-1.5.2/src/windows/google/sparsehash/sparseconfig.h +32 -0
  76. data/ext/sparsehash-1.5.2/src/windows/port.cc +63 -0
  77. data/ext/sparsehash-1.5.2/src/windows/port.h +81 -0
  78. data/ext/sparsehash-1.5.2/src/words +8944 -0
  79. data/ext/sparsehash-1.5.2/vsprojects/hashtable_unittest/hashtable_unittest.vcproj +187 -0
  80. data/ext/sparsehash-1.5.2/vsprojects/sparsetable_unittest/sparsetable_unittest.vcproj +172 -0
  81. data/ext/sparsehash-1.5.2/vsprojects/time_hash_map/time_hash_map.vcproj +187 -0
  82. data/ext/sparsehash-1.5.2/vsprojects/type_traits_unittest/type_traits_unittest.vcproj +169 -0
  83. data/ext/test.rb +10 -0
  84. data/test/spec.go +70 -0
  85. metadata +147 -0
@@ -0,0 +1,115 @@
1
+ body {
2
+ background-color: #ffffff;
3
+ color: black;
4
+ margin-right: 1in;
5
+ margin-left: 1in;
6
+ }
7
+
8
+
9
+ h1, h2, h3, h4, h5, h6 {
10
+ color: #3366ff;
11
+ font-family: sans-serif;
12
+ }
13
+ @media print {
14
+ /* Darker version for printing */
15
+ h1, h2, h3, h4, h5, h6 {
16
+ color: #000080;
17
+ font-family: helvetica, sans-serif;
18
+ }
19
+ }
20
+
21
+ h1 {
22
+ text-align: center;
23
+ font-size: 18pt;
24
+ }
25
+ h2 {
26
+ margin-left: -0.5in;
27
+ }
28
+ h3 {
29
+ margin-left: -0.25in;
30
+ }
31
+ h4 {
32
+ margin-left: -0.125in;
33
+ }
34
+ hr {
35
+ margin-left: -1in;
36
+ }
37
+
38
+ /* Definition lists: definition term bold */
39
+ dt {
40
+ font-weight: bold;
41
+ }
42
+
43
+ address {
44
+ text-align: right;
45
+ }
46
+ /* Use the <code> tag for bits of code and <var> for variables and objects. */
47
+ code,pre,samp,var {
48
+ color: #006000;
49
+ }
50
+ /* Use the <file> tag for file and directory paths and names. */
51
+ file {
52
+ color: #905050;
53
+ font-family: monospace;
54
+ }
55
+ /* Use the <kbd> tag for stuff the user should type. */
56
+ kbd {
57
+ color: #600000;
58
+ }
59
+ div.note p {
60
+ float: right;
61
+ width: 3in;
62
+ margin-right: 0%;
63
+ padding: 1px;
64
+ border: 2px solid #6060a0;
65
+ background-color: #fffff0;
66
+ }
67
+
68
+ UL.nobullets {
69
+ list-style-type: none;
70
+ list-style-image: none;
71
+ margin-left: -1em;
72
+ }
73
+
74
+ /*
75
+ body:after {
76
+ content: "Google Confidential";
77
+ }
78
+ */
79
+
80
+ /* pretty printing styles. See prettify.js */
81
+ .str { color: #080; }
82
+ .kwd { color: #008; }
83
+ .com { color: #800; }
84
+ .typ { color: #606; }
85
+ .lit { color: #066; }
86
+ .pun { color: #660; }
87
+ .pln { color: #000; }
88
+ .tag { color: #008; }
89
+ .atn { color: #606; }
90
+ .atv { color: #080; }
91
+ pre.prettyprint { padding: 2px; border: 1px solid #888; }
92
+
93
+ .embsrc { background: #eee; }
94
+
95
+ @media print {
96
+ .str { color: #060; }
97
+ .kwd { color: #006; font-weight: bold; }
98
+ .com { color: #600; font-style: italic; }
99
+ .typ { color: #404; font-weight: bold; }
100
+ .lit { color: #044; }
101
+ .pun { color: #440; }
102
+ .pln { color: #000; }
103
+ .tag { color: #006; font-weight: bold; }
104
+ .atn { color: #404; }
105
+ .atv { color: #060; }
106
+ }
107
+
108
+ /* Table Column Headers */
109
+ .hdr {
110
+ color: #006;
111
+ font-weight: bold;
112
+ background-color: #dddddd; }
113
+ .hdr2 {
114
+ color: #006;
115
+ background-color: #eeeeee; }
@@ -0,0 +1,365 @@
1
+ <HTML>
2
+
3
+ <HEAD>
4
+ <title>Implementation notes: sparse_hash, dense_hash, sparsetable</title>
5
+ </HEAD>
6
+
7
+ <BODY>
8
+
9
+ <h1>Implementation of sparse_hash_map, dense_hash_map, and
10
+ sparsetable</h1>
11
+
12
+ This document contains a few notes on how the data structures in this
13
+ package are implemented. This discussion refers at several points to
14
+ the classic text in this area: Knuth, <i>The Art of Computer
15
+ Programming</i>, Vol 3, Hashing.
16
+
17
+
18
+ <hr>
19
+ <h2><tt>sparsetable</tt></h2>
20
+
21
+ <p>For specificity, consider the declaration </p>
22
+
23
+ <pre>
24
+ sparsetable&lt;Foo&gt; t(100); // a sparse array with 100 elements
25
+ </pre>
26
+
27
+ <p>A sparsetable is a random-access container that implements a sparse array,
28
+ that is, an array that uses very little memory to store unassigned
29
+ indices (in this case, between 1 and 2 bits per unassigned index). For
30
+ instance, if you allocate an array of size 5 and assign a[2] = [big
31
+ struct], then a[2] will take up a lot of memory but a[0], a[1], a[3],
32
+ and a[4] will not. Array elements that have a value are called
33
+ "assigned". Array elements that have no value yet, or have had their
34
+ value cleared using erase() or clear(), are called "unassigned".
35
+ For assigned elements, lookups return the assigned value; for
36
+ unassigned elements, they return the default value, which for t is
37
+ Foo().</p>
38
+
39
+ <p>sparsetable is implemented as an array of "groups". Each group is
40
+ responsible for M array indices. The first group knows about
41
+ t[0]..t[M-1], the second about t[M]..t[2M-1], and so forth. (M is 48
42
+ by default.) At construct time, t creates an array of (99/M + 1)
43
+ groups. From this point on, all operations -- insert, delete, lookup
44
+ -- are passed to the appropriate group. In particular, any operation
45
+ on t[i] is actually performed on (t.group[i / M])[i % M].</p>
46
+
47
+ <p>Each group consists of a vector, which holds assigned values, and a
48
+ bitmap of size M, which indicates which indices are assigned. A
49
+ lookup works as follows: the group is asked to look up index i, where
50
+ i &lt; M. The group looks at bitmap[i]. If it's 0, the lookup fails.
51
+ If it's 1, then the group has to find the appropriate value in the
52
+ vector.</p>
53
+
54
+ <h3><tt>find()</tt></h3>
55
+
56
+ <p>Finding the appropriate vector element is the most expensive part of
57
+ the lookup. The code counts all bitmap entries &lt;= i that are set to
58
+ 1. (There's at least 1 of them, since bitmap[i] is 1.) Suppose there
59
+ are 4 such entries. Then the right value to return is the 4th element
60
+ of the vector: vector[3]. This takes time O(M), which is a constant
61
+ since M is a constant.</p>
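The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's code: `Group`, `kGroupSize`, `rank`, and `find` are hypothetical names, and a `std::bitset` stands in for whatever bitmap representation sparsetable really uses.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical group size; the text says M is 48 by default.
constexpr std::size_t kGroupSize = 48;

// Illustrative stand-in for one sparsetable group (not the library's class).
template <typename T>
struct Group {
  std::bitset<kGroupSize> bitmap;  // bitmap[i] is 1 iff index i is assigned
  std::vector<T> values;           // assigned values only, in index order

  // Number of assigned indices strictly below i. This is the zero-based
  // vector position that index i occupies. The scan is O(M).
  std::size_t rank(std::size_t i) const {
    std::size_t count = 0;
    for (std::size_t j = 0; j < i; ++j) count += bitmap[j];
    return count;
  }

  // Returns a pointer to the value at index i, or nullptr if unassigned.
  const T* find(std::size_t i) const {
    assert(i < kGroupSize);
    if (!bitmap[i]) return nullptr;  // bitmap bit is 0: the lookup fails
    return &values[rank(i)];
  }
};
```

Counting the bits strictly below i yields the zero-based position directly, which is equivalent to counting the entries &lt;= i and subtracting one, as in the "4th element is vector[3]" example.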
62
+
63
+ <h3><tt>insert()</tt></h3>
64
+
65
+ <p>Insert starts with a lookup. If the lookup succeeds, the code merely
66
+ replaces the matching vector element (vector[3] in the example above) with the new value. If the lookup fails, then the
67
+ code must insert a new entry into the middle of the vector. Again, to
68
+ insert at position i, the code must count all the bitmap entries &lt;= i
69
+ that are set to 1. This indicates the position to insert into the
70
+ vector. All vector entries above that position must be moved to make
71
+ room for the new entry. This takes time, but still constant time
72
+ since the vector has size at most M.</p>
73
+
74
+ <p>(Inserts could be made faster by using a list instead of a vector to
75
+ hold group values, but this would use much more memory, since each
76
+ list element requires a full pointer of overhead.)</p>
77
+
78
+ <p>The only metadata that needs to be updated, after the actual value is
79
+ inserted, is to set bitmap[i] to 1. No other counts must be
80
+ maintained.</p>
81
+
82
+ <h3><tt>delete()</tt></h3>
83
+
84
+ <p>Deletes are similar to inserts. They start with a lookup. If it
85
+ fails, the delete is a noop. Otherwise, the appropriate entry is
86
+ removed from the vector, all the vector elements above it are moved
87
+ down one, and bitmap[i] is set to 0.</p>
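A minimal sketch of the insert and delete paths within one group, assuming a bitmap plus a packed vector as described. The names `Group`, `kGroupSize`, `rank`, `set`, and `erase` are hypothetical and chosen for illustration; the real sparsetable code is organized differently.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGroupSize = 48;  // hypothetical name for M

// Illustrative stand-in for one sparsetable group (not the library's class).
template <typename T>
struct Group {
  std::bitset<kGroupSize> bitmap;  // which indices are assigned
  std::vector<T> values;           // assigned values, packed in index order

  // Number of assigned indices strictly below i.
  std::size_t rank(std::size_t i) const {
    std::size_t count = 0;
    for (std::size_t j = 0; j < i; ++j) count += bitmap[j];
    return count;
  }

  // Assigns value to index i, overwriting any existing value.
  void set(std::size_t i, const T& value) {
    assert(i < kGroupSize);
    const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(rank(i));
    if (bitmap[i]) {
      values[static_cast<std::size_t>(pos)] = value;  // replace in place
    } else {
      values.insert(values.begin() + pos, value);  // later elements move up
      bitmap.set(i);                               // the only metadata update
    }
  }

  // Unassigns index i; a no-op if it was already unassigned.
  void erase(std::size_t i) {
    assert(i < kGroupSize);
    if (!bitmap[i]) return;
    const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(rank(i));
    values.erase(values.begin() + pos);  // later elements move down one
    bitmap.reset(i);
  }
};
```

Both operations shift at most M elements, which is why the text treats them as constant time.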
88
+
89
+ <h3>iterators</h3>
90
+
91
+ <p>Sparsetable iterators pose a special burden. They must iterate over
92
+ unassigned array values, but the act of iterating should not cause an
93
+ assignment to happen -- otherwise, iterating over a sparsetable would
94
+ cause it to take up much more room. For const iterators, the matter
95
+ is simple: the iterator is merely programmed to return the default
96
+ value -- Foo() -- when dereferenced while pointing to an unassigned
97
+ entry.</p>
98
+
99
+ <p>For non-const iterators, such simple techniques fail. Instead,
100
+ dereferencing a sparsetable_iterator returns an opaque object that
101
+ acts like a Foo in almost all situations, but isn't actually a Foo.
102
+ (It does this by defining operator=(), operator value_type(), and,
103
+ most sneakily, operator&().) This works in almost all cases. If it
104
+ doesn't, an explicit cast to value_type will solve the problem:</p>
105
+
106
+ <pre>
107
+ printf("%d", static_cast&lt;Foo&gt;(*t.find(0)));
108
+ </pre>
109
+
110
+ <p>To avoid such problems, consider using get() and set() instead of an
111
+ iterator:</p>
112
+
113
+ <pre>
114
+ for (int i = 0; i &lt; t.size(); ++i)
115
+ if (t.get(i) == ...) t.set(i, ...);
116
+ </pre>
117
+
118
+ <p>Sparsetable also has a special class of iterator, besides normal and
119
+ const: nonempty_iterator. This only iterates over array values that
120
+ are assigned. This is particularly fast given the sparsetable
121
+ implementation, since it can ignore the bitmaps entirely and just
122
+ iterate over the various group vectors.</p>
123
+
124
+ <h3>Resource use</h3>
125
+
126
+ <p>The space overhead for a sparsetable of size N is N + 48N/M bits.
127
+ For the default value of M, this is exactly 2 bits per array entry.
128
+ (That's for 32-bit pointers; for machines with 64-bit pointers, it's N
129
+ + 80N/M bits, or 2.67 bits per entry.)
130
+ A larger M would use less overhead -- approaching 1 bit per array
131
+ entry -- but take longer for inserts, deletes, and lookups. A smaller
132
+ M would use more overhead but make operations somewhat faster.</p>
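The arithmetic above can be checked with a small helper. The function name and parameters are hypothetical; the sketch assumes the N term is the one bitmap bit per index and the 48N/M (or 80N/M) term is a fixed per-group cost spread over M indices.

```cpp
#include <cassert>

// per_group_bits: fixed per-group overhead in bits (48 with 32-bit
// pointers, 80 with 64-bit pointers, per the text).
// group_size: M, the number of indices each group covers.
inline double overhead_bits_per_entry(double per_group_bits,
                                      double group_size) {
  assert(group_size > 0);
  // (N + per_group_bits * N / M) / N  ==  1 + per_group_bits / M
  return 1.0 + per_group_bits / group_size;
}
```

With M = 48 this gives 2 bits per entry for the 32-bit case and about 2.67 for the 64-bit case; as M grows the result approaches 1 bit per entry.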
133
+
134
+ <p>You can also look at some specific <A
135
+ HREF="performance.html">performance numbers</A>.</p>
136
+
137
+
138
+ <hr>
139
+ <h2><tt>sparse_hash_set</tt></h2>
140
+
141
+ <p>For specificity, consider the declaration </p>
142
+
143
+ <pre>
144
+ sparse_hash_set&lt;Foo&gt; t;
145
+ </pre>
146
+
147
+ <p>sparse_hash_set is a hashtable. For more information on hashtables,
148
+ see Knuth. Hashtables are basically arrays with complicated logic on
149
+ top of them. sparse_hash_set uses a sparsetable to implement the
150
+ underlying array.</p>
151
+
152
+ <p>In particular, sparse_hash_set stores its data in a sparsetable using
153
+ quadratic internal probing (see Knuth). Many hashtable
154
+ implementations use external probing, so each table element is
155
+ actually a pointer chain, holding many hashtable values.
156
+ sparse_hash_set, on the other hand, always stores at most one value in
157
+ each table location. If the hashtable wants to store a second value
158
+ at a given table location, it can't; it's forced to look somewhere
159
+ else.</p>
160
+
161
+ <h3><tt>insert()</tt></h3>
162
+
163
+ <p>As a specific example, suppose t is a new sparse_hash_set. It then
164
+ holds a sparsetable of size 32. The code for t.insert(foo) works as
165
+ follows:</p>
166
+
167
+ <p>
168
+ 1) Call hash&lt;Foo&gt;(foo) to convert foo into an integer i. (hash&lt;Foo&gt; is
169
+ the default hash function; you can specify a different one in the
170
+ template arguments.)
171
+
172
+ </p><p>
173
+ 2a) Look at t.sparsetable[i % 32]. If it's unassigned, assign it to
174
+ foo. foo is now in the hashtable.
175
+
176
+ </p><p>
177
+ 2b) If t.sparsetable[i % 32] is assigned, and its value is foo, then
178
+ do nothing: foo was already in t and the insert is a noop.
179
+
180
+ </p><p>
181
+ 2c) If t.sparsetable[i % 32] is assigned, but to a value other than
182
+ foo, look at t.sparsetable[(i+1) % 32]. If that also fails, try
183
+ t.sparsetable[(i+3) % 32], then t.sparsetable[(i+6) % 32]. In
184
+ general, keep trying the next triangular number.
185
+
186
+ </p><p>
187
+ 3) If the table is now "too full" -- say, 25 of the 32 table entries
188
+ are now assigned -- grow the table by creating a new sparsetable
189
+ that's twice as big, and rehashing every single element from the
190
+ old table into the new one. This keeps the table from ever filling
191
+ up.
192
+
193
+ </p><p>
194
+ 4) If the table is now "too empty" -- say, only 3 of the 32 table
195
+ entries are now assigned -- shrink the table by creating a new
196
+ sparsetable that's half as big, and rehashing every element as in
197
+ the growing case. This keeps the table overhead proportional to
198
+ the number of elements in the table.
199
+ </p>
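Steps 1 through 3 can be sketched as below. This is a simplified illustration, not the library's implementation: `ProbingSet` is a hypothetical class, it stores plain `int` keys, a `std::vector<bool>` stands in for the sparsetable's bitmap, the 3/4 threshold is an assumption chosen to match the "25 of the 32" example, and the shrink step (4) is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative open-addressing set with triangular-number probing.
class ProbingSet {
 public:
  ProbingSet() : slots_(32), assigned_(32, false), count_(0) {}

  // Returns true if key was newly inserted, false if already present.
  bool insert(int key) {
    const std::size_t mask = slots_.size() - 1;        // size is a power of 2
    std::size_t pos = std::hash<int>()(key) & mask;    // step 1
    for (std::size_t probe = 1; assigned_[pos]; ++probe) {
      if (slots_[pos] == key) return false;            // step 2b: a noop
      pos = (pos + probe) & mask;                      // step 2c: +1, +3, +6...
    }
    slots_[pos] = key;                                 // step 2a
    assigned_[pos] = true;
    ++count_;
    if (count_ * 4 > slots_.size() * 3) grow();        // step 3: "too full"
    return true;
  }

  // Same probe sequence, but an unassigned slot means the key is absent.
  bool contains(int key) const {
    const std::size_t mask = slots_.size() - 1;
    std::size_t pos = std::hash<int>()(key) & mask;
    for (std::size_t probe = 1; assigned_[pos]; ++probe) {
      if (slots_[pos] == key) return true;
      pos = (pos + probe) & mask;
    }
    return false;
  }

  std::size_t size() const { return count_; }
  std::size_t capacity() const { return slots_.size(); }

 private:
  // Doubles the table and rehashes every element into it.
  void grow() {
    std::vector<int> old_slots(slots_.size() * 2);
    std::vector<bool> old_assigned(old_slots.size(), false);
    old_slots.swap(slots_);        // slots_ is now the empty, doubled table
    old_assigned.swap(assigned_);
    count_ = 0;
    for (std::size_t i = 0; i < old_slots.size(); ++i)
      if (old_assigned[i]) insert(old_slots[i]);
  }

  std::vector<int> slots_;
  std::vector<bool> assigned_;
  std::size_t count_;
};
```

Because the table grows before it fills up, every probe loop is guaranteed to reach an unassigned slot and terminate.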
200
+
201
+ <p>Instead of using triangular numbers as offsets, one could just use
202
+ regular integers: try i, then i+1, then i+2, then i+3. This has bad
203
+ 'clumping' behavior, as explored in Knuth. Quadratic probing, using
204
+ the triangular numbers, avoids the clumping while keeping cache
205
+ coherency in the common case. As long as the table size is a power of
206
+ 2, the quadratic-probing method described above will explore every
207
+ table element if necessary, to find a good place to insert.</p>
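The claim about power-of-two table sizes can be checked empirically with a short helper. The function name is hypothetical; it simply walks the probe sequence i, i+1, i+3, i+6, ... and counts how many distinct slots it touches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns how many distinct slots the triangular probe sequence
// start, start+1, start+3, start+6, ... (mod table_size) visits during
// its first table_size probes.
std::size_t distinct_slots_probed(std::size_t start, std::size_t table_size) {
  assert(table_size > 0);
  std::vector<bool> seen(table_size, false);
  std::size_t distinct = 0;
  std::size_t pos = start % table_size;
  for (std::size_t probe = 1; probe <= table_size; ++probe) {
    if (!seen[pos]) {
      seen[pos] = true;
      ++distinct;
    }
    pos = (pos + probe) % table_size;  // next triangular offset
  }
  return distinct;
}
```

For a table of size 32 the function reports all 32 slots, whereas for a size such as 7 the sequence revisits slots and reaches only 4 of them.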
208
+
209
+ <p>(As a side note, using a table size that's a power of two has several
210
+ advantages, including the speed of calculating (i % table_size). On
211
+ the other hand, power-of-two tables are not very forgiving of a poor
212
+ hash function. Make sure your hash function is a good one! There are
213
+ plenty of dos and don'ts on the web (and in Knuth), for writing hash
214
+ functions.)</p>
215
+
216
+ <p>The "too full" value, also called the "maximum occupancy", determines
217
+ a time-space tradeoff: in general, the higher it is, the less space is
218
+ wasted but the more probes must be performed for each insert.
219
+ sparse_hash_set uses a high maximum occupancy, since space is more
220
+ important than speed for this data structure.</p>
221
+
222
+ <p>The "too empty" value is not necessary for performance but helps with
223
+ space use. It's rare for hashtable implementations to check this
224
+ value at insert() time -- after all, how will inserting cause a
225
+ hashtable to get too small? However, the sparse_hash_set
226
+ implementation never resizes on erase(); it's nice to have an erase()
227
+ that does not invalidate iterators. Thus, the first insert() after a
228
+ long string of erase()s could well trigger a hashtable shrink.</p>
229
+
230
+ <h3><tt>find()</tt></h3>
231
+
232
+ <p>find() works similarly to insert. The only difference is in step
233
+ (2a): if the value is unassigned, then the lookup fails immediately.</p>
234
+
235
+ <h3><tt>delete()</tt></h3>
236
+
237
+ <p>delete() is tricky in an internal-probing scheme. The obvious
238
+ implementation of just "unassigning" the relevant table entry doesn't
239
+ work. Consider the following scenario:</p>
240
+
241
+ <pre>
242
+ t.insert(foo1); // foo1 hashes to 4, is put in table[4]
243
+ t.insert(foo2); // foo2 hashes to 4, is put in table[5]
244
+ t.erase(foo1); // table[4] is now 'unassigned'
245
+ t.find(foo2); // fails since table[hash(foo2)] is unassigned
246
+ </pre>
247
+
248
+ <p>To avoid these failure situations, delete(foo1) is actually
249
+ implemented by replacing foo1 by a special 'delete' value in the
250
+ hashtable. This 'delete' value causes the table entry to be
251
+ considered unassigned for the purposes of insertion -- if foo3 hashes
252
+ to 4 as well, it can go into table[4] no problem -- but assigned for
253
+ the purposes of lookup.</p>
254
+
255
+ <p>What is this special 'delete' value? The delete value has to be an
256
+ element of type Foo, since the table can't hold anything else. It
257
+ obviously must be an element the client would never want to insert on
258
+ its own, or else the code couldn't distinguish deleted entries from
259
+ 'real' entries with the same value. There's no way to determine a
260
+ good value automatically. The client has to specify it explicitly.
261
+ This is what the set_deleted_key() method does.</p>
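The role of the 'delete' value can be illustrated with a small fixed-size table. Everything here is a hypothetical sketch: `TombstoneSet` is not a library class, keys are plain `int`s hashed by `key % 32`, a `std::vector<bool>` plays the part of the sparsetable bitmap, and there is no resizing, so it is only meant to hold a handful of keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Illustrative table with a client-chosen deleted key.
class TombstoneSet {
 public:
  explicit TombstoneSet(int deleted_key)
      : slots_(32), assigned_(32, false), deleted_key_(deleted_key) {}

  void insert(int key) {
    assert(key != deleted_key_);  // the client must never insert this value
    if (locate(key) != kNotFound) return;
    std::size_t pos = home(key);
    // For insertion, a slot holding the deleted key counts as free.
    for (std::size_t probe = 1;
         assigned_[pos] && slots_[pos] != deleted_key_; ++probe)
      pos = (pos + probe) % slots_.size();
    slots_[pos] = key;
    assigned_[pos] = true;
  }

  void erase(int key) {
    const std::size_t pos = locate(key);
    if (pos != kNotFound) slots_[pos] = deleted_key_;  // stays 'assigned'
  }

  bool contains(int key) const { return locate(key) != kNotFound; }

 private:
  std::size_t home(int key) const {
    return static_cast<std::size_t>(key) % slots_.size();
  }

  // For lookup, a slot holding the deleted key counts as assigned, so
  // probing continues past it.
  std::size_t locate(int key) const {
    assert(key != deleted_key_);
    std::size_t pos = home(key);
    for (std::size_t probe = 1; assigned_[pos]; ++probe) {
      if (slots_[pos] == key) return pos;
      pos = (pos + probe) % slots_.size();
    }
    return kNotFound;
  }

  std::vector<int> slots_;
  std::vector<bool> assigned_;
  int deleted_key_;
};
```

With a deleted key of -1, inserting 4 and 36 (both of which land on slot 4 first), erasing 4, and then looking up 36 succeeds, which is exactly the scenario that plain unassignment gets wrong.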
262
+
263
+ <p>Note that set_deleted_key() is only necessary if the client actually
264
+ wants to call t.erase(). For insert-only hash-sets, set_deleted_key()
265
+ is unnecessary.</p>
266
+
267
+ <p>When copying the hashtable, either to grow it or shrink it, the
268
+ special 'delete' values are <b>not</b> copied into the new table. The
269
+ copy-time rehash makes them unnecessary.</p>
270
+
271
+ <h3>Resource use</h3>
272
+
273
+ <p>The data is stored in a sparsetable, so space use is the same as
274
+ for sparsetable. Time use is also determined in large part by the
275
+ sparsetable implementation. However, there is also an extra probing
276
+ cost in hashtables, which depends in large part on the "too full"
277
+ value. It should be rare to need more than 4-5 probes per lookup, and
278
+ usually significantly fewer will suffice.</p>
279
+
280
+ <p>A note on growing and shrinking the hashtable: all hashtable
281
+ implementations use the most memory when growing a hashtable, since
282
+ they must have room for both the old table and the new table at the
283
+ same time. sparse_hash_set is careful to delete entries from the old
284
+ hashtable as soon as they're copied into the new one, to minimize this
285
+ space overhead. (It does this efficiently by using its knowledge of
286
+ the sparsetable class and copying one sparsetable group at a time.)</p>
287
+
288
+ <p>You can also look at some specific <A
289
+ HREF="performance.html">performance numbers</A>.</p>
290
+
291
+
292
+ <hr>
293
+ <h2><tt>sparse_hash_map</tt></h2>
294
+
295
+ <p>sparse_hash_map is implemented identically to sparse_hash_set. The
296
+ only difference is instead of storing just Foo in each table entry,
297
+ the data structure stores pair&lt;Foo, Value&gt;.</p>
298
+
299
+
300
+ <hr>
301
+ <h2><tt>dense_hash_set</tt></h2>
302
+
303
+ <p>The hashtable aspects of dense_hash_set are identical to
304
+ sparse_hash_set: it uses quadratic internal probing, and resizes
305
+ hashtables in exactly the same way. The difference is in the
306
+ underlying array: instead of using a sparsetable, dense_hash_set uses
307
+ a C array. This means much more space is used, especially if Foo is
308
+ big. However, it makes all operations faster, since sparsetable has
309
+ memory management overhead that C arrays do not.</p>
310
+
311
+ <p>The use of C arrays instead of sparsetables points to one immediate
312
+ complication dense_hash_set has that sparse_hash_set does not: the
313
+ need to distinguish assigned from unassigned entries. In a
314
+ sparsetable, this is accomplished by a bitmap. dense_hash_set, on the
315
+ other hand, uses a dedicated value to specify unassigned entries.
316
+ Thus, dense_hash_set has two special values: one to indicate deleted
317
+ table entries, and one to indicate unassigned table entries. At
318
+ construct time, all table entries are initialized to 'unassigned'.</p>
319
+
320
+ <p>dense_hash_set provides the method set_empty_key() to indicate the
321
+ value that should be used for unassigned entries. Like
322
+ set_deleted_key(), set_empty_key() requires a value that will not be
323
+ used by the client for any legitimate purpose. Unlike
324
+ set_deleted_key(), set_empty_key() is always required, no matter what
325
+ hashtable operations the client wishes to perform.</p>
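A rough sketch of how two reserved values can replace the bitmap in a dense table. `DenseSketch` is a hypothetical class, not the library's dense_hash_set: it uses a `std::vector<int>` as a stand-in for the C array, hashes with `key % 32`, and never resizes, so it is only suitable for a few keys.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative dense table: no bitmap, only two reserved key values.
class DenseSketch {
 public:
  DenseSketch(int empty_key, int deleted_key)
      : empty_key_(empty_key),
        deleted_key_(deleted_key),
        slots_(32, empty_key) {  // every entry starts out 'unassigned'
    assert(empty_key != deleted_key);
  }

  void insert(int key) {
    assert(key != empty_key_ && key != deleted_key_);
    if (contains(key)) return;
    std::size_t pos = home(key);
    // Both empty and deleted slots are free for insertion.
    for (std::size_t probe = 1;
         slots_[pos] != empty_key_ && slots_[pos] != deleted_key_; ++probe)
      pos = (pos + probe) % slots_.size();
    slots_[pos] = key;
  }

  void erase(int key) {
    assert(key != empty_key_ && key != deleted_key_);
    std::size_t pos = home(key);
    for (std::size_t probe = 1; slots_[pos] != empty_key_; ++probe) {
      if (slots_[pos] == key) {
        slots_[pos] = deleted_key_;  // mark as deleted, not as empty
        return;
      }
      pos = (pos + probe) % slots_.size();
    }
  }

  // Only an empty slot ends the search; deleted slots are probed past.
  bool contains(int key) const {
    assert(key != empty_key_ && key != deleted_key_);
    std::size_t pos = home(key);
    for (std::size_t probe = 1; slots_[pos] != empty_key_; ++probe) {
      if (slots_[pos] == key) return true;
      pos = (pos + probe) % slots_.size();
    }
    return false;
  }

 private:
  std::size_t home(int key) const {
    return static_cast<std::size_t>(key) % slots_.size();
  }

  int empty_key_;
  int deleted_key_;
  std::vector<int> slots_;
};
```

The constructor needs the empty value before any other operation can run, which mirrors why set_empty_key() is always required while set_deleted_key() matters only when erasing.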
326
+
327
+ <h3>Resource use</h3>
328
+
329
+ <p>This implementation is fast because even though dense_hash_set may not
330
+ be space efficient, most lookups are localized: a single lookup may
331
+ need to access table[i], and maybe table[i+1] and table[i+3], but
332
+ nothing other than that. For all but the biggest data structures,
333
+ these will frequently be in a single cache line.</p>
334
+
335
+ <p>This implementation takes, for every unused bucket, space as big as
336
+ the key-type. Usually between half and two-thirds of the buckets are
337
+ empty.</p>
338
+
339
+ <p>The doubling method used by dense_hash_set tends to work poorly
340
+ with most memory allocators. This is because memory allocators tend
341
+ to have memory 'buckets' which are a power of two. Since each
342
+ doubling of a dense_hash_set doubles the memory use, a single
343
+ hashtable doubling will require a new memory 'bucket' from the memory
344
+ allocator, leaving the old bucket stranded as fragmented memory.
345
+ Hence, it's not recommended this data structure be used with many
346
+ inserts in memory-constrained situations.</p>
347
+
348
+ <p>You can also look at some specific <A
349
+ HREF="performance.html">performance numbers</A>.</p>
350
+
351
+
352
+ <hr>
353
+ <h2><tt>dense_hash_map</tt></h2>
354
+
355
+ <p>dense_hash_map is identical to dense_hash_set except for what values
356
+ are stored in each table entry.</p>
357
+
358
+ <hr>
359
+ <author>
360
+ Craig Silverstein<br>
361
+ Thu Jan 6 20:15:42 PST 2005
362
+ </author>
363
+
364
+ </body>
365
+ </html>