google_hash 0.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (85)
  1. data/README +21 -0
  2. data/Rakefile +11 -0
  3. data/VERSION +1 -0
  4. data/ext/extconf.rb +15 -0
  5. data/ext/go.cpp +109 -0
  6. data/ext/sparsehash-1.5.2/AUTHORS +2 -0
  7. data/ext/sparsehash-1.5.2/COPYING +28 -0
  8. data/ext/sparsehash-1.5.2/ChangeLog +167 -0
  9. data/ext/sparsehash-1.5.2/INSTALL +236 -0
  10. data/ext/sparsehash-1.5.2/Makefile.am +157 -0
  11. data/ext/sparsehash-1.5.2/Makefile.in +1019 -0
  12. data/ext/sparsehash-1.5.2/NEWS +0 -0
  13. data/ext/sparsehash-1.5.2/README +149 -0
  14. data/ext/sparsehash-1.5.2/README.windows +25 -0
  15. data/ext/sparsehash-1.5.2/TODO +28 -0
  16. data/ext/sparsehash-1.5.2/aclocal.m4 +868 -0
  17. data/ext/sparsehash-1.5.2/compile +99 -0
  18. data/ext/sparsehash-1.5.2/config.guess +1516 -0
  19. data/ext/sparsehash-1.5.2/config.sub +1626 -0
  20. data/ext/sparsehash-1.5.2/configure +8054 -0
  21. data/ext/sparsehash-1.5.2/configure.ac +74 -0
  22. data/ext/sparsehash-1.5.2/depcomp +530 -0
  23. data/ext/sparsehash-1.5.2/doc/dense_hash_map.html +1591 -0
  24. data/ext/sparsehash-1.5.2/doc/dense_hash_set.html +1445 -0
  25. data/ext/sparsehash-1.5.2/doc/designstyle.css +115 -0
  26. data/ext/sparsehash-1.5.2/doc/implementation.html +365 -0
  27. data/ext/sparsehash-1.5.2/doc/index.html +69 -0
  28. data/ext/sparsehash-1.5.2/doc/performance.html +96 -0
  29. data/ext/sparsehash-1.5.2/doc/sparse_hash_map.html +1527 -0
  30. data/ext/sparsehash-1.5.2/doc/sparse_hash_set.html +1376 -0
  31. data/ext/sparsehash-1.5.2/doc/sparsetable.html +1393 -0
  32. data/ext/sparsehash-1.5.2/experimental/Makefile +9 -0
  33. data/ext/sparsehash-1.5.2/experimental/README +14 -0
  34. data/ext/sparsehash-1.5.2/experimental/example.c +54 -0
  35. data/ext/sparsehash-1.5.2/experimental/libchash.c +1537 -0
  36. data/ext/sparsehash-1.5.2/experimental/libchash.h +252 -0
  37. data/ext/sparsehash-1.5.2/google-sparsehash.sln +47 -0
  38. data/ext/sparsehash-1.5.2/install-sh +323 -0
  39. data/ext/sparsehash-1.5.2/m4/acx_pthread.m4 +363 -0
  40. data/ext/sparsehash-1.5.2/m4/google_namespace.m4 +42 -0
  41. data/ext/sparsehash-1.5.2/m4/namespaces.m4 +15 -0
  42. data/ext/sparsehash-1.5.2/m4/stl_hash.m4 +70 -0
  43. data/ext/sparsehash-1.5.2/m4/stl_hash_fun.m4 +36 -0
  44. data/ext/sparsehash-1.5.2/m4/stl_namespace.m4 +25 -0
  45. data/ext/sparsehash-1.5.2/missing +360 -0
  46. data/ext/sparsehash-1.5.2/mkinstalldirs +158 -0
  47. data/ext/sparsehash-1.5.2/packages/deb.sh +74 -0
  48. data/ext/sparsehash-1.5.2/packages/deb/README +7 -0
  49. data/ext/sparsehash-1.5.2/packages/deb/changelog +107 -0
  50. data/ext/sparsehash-1.5.2/packages/deb/compat +1 -0
  51. data/ext/sparsehash-1.5.2/packages/deb/control +17 -0
  52. data/ext/sparsehash-1.5.2/packages/deb/copyright +35 -0
  53. data/ext/sparsehash-1.5.2/packages/deb/docs +16 -0
  54. data/ext/sparsehash-1.5.2/packages/deb/rules +117 -0
  55. data/ext/sparsehash-1.5.2/packages/deb/sparsehash.dirs +2 -0
  56. data/ext/sparsehash-1.5.2/packages/deb/sparsehash.install +2 -0
  57. data/ext/sparsehash-1.5.2/packages/rpm.sh +86 -0
  58. data/ext/sparsehash-1.5.2/packages/rpm/rpm.spec +61 -0
  59. data/ext/sparsehash-1.5.2/src/config.h.in +131 -0
  60. data/ext/sparsehash-1.5.2/src/config.h.include +23 -0
  61. data/ext/sparsehash-1.5.2/src/google/dense_hash_map +310 -0
  62. data/ext/sparsehash-1.5.2/src/google/dense_hash_set +287 -0
  63. data/ext/sparsehash-1.5.2/src/google/sparse_hash_map +294 -0
  64. data/ext/sparsehash-1.5.2/src/google/sparse_hash_set +275 -0
  65. data/ext/sparsehash-1.5.2/src/google/sparsehash/densehashtable.h +1062 -0
  66. data/ext/sparsehash-1.5.2/src/google/sparsehash/sparsehashtable.h +1015 -0
  67. data/ext/sparsehash-1.5.2/src/google/sparsetable +1468 -0
  68. data/ext/sparsehash-1.5.2/src/google/type_traits.h +250 -0
  69. data/ext/sparsehash-1.5.2/src/hashtable_unittest.cc +1375 -0
  70. data/ext/sparsehash-1.5.2/src/simple_test.cc +103 -0
  71. data/ext/sparsehash-1.5.2/src/sparsetable_unittest.cc +696 -0
  72. data/ext/sparsehash-1.5.2/src/time_hash_map.cc +488 -0
  73. data/ext/sparsehash-1.5.2/src/type_traits_unittest.cc +492 -0
  74. data/ext/sparsehash-1.5.2/src/windows/config.h +149 -0
  75. data/ext/sparsehash-1.5.2/src/windows/google/sparsehash/sparseconfig.h +32 -0
  76. data/ext/sparsehash-1.5.2/src/windows/port.cc +63 -0
  77. data/ext/sparsehash-1.5.2/src/windows/port.h +81 -0
  78. data/ext/sparsehash-1.5.2/src/words +8944 -0
  79. data/ext/sparsehash-1.5.2/vsprojects/hashtable_unittest/hashtable_unittest.vcproj +187 -0
  80. data/ext/sparsehash-1.5.2/vsprojects/sparsetable_unittest/sparsetable_unittest.vcproj +172 -0
  81. data/ext/sparsehash-1.5.2/vsprojects/time_hash_map/time_hash_map.vcproj +187 -0
  82. data/ext/sparsehash-1.5.2/vsprojects/type_traits_unittest/type_traits_unittest.vcproj +169 -0
  83. data/ext/test.rb +10 -0
  84. data/test/spec.go +70 -0
  85. metadata +147 -0
@@ -0,0 +1,115 @@
1
+ body {
2
+ background-color: #ffffff;
3
+ color: black;
4
+ margin-right: 1in;
5
+ margin-left: 1in;
6
+ }
7
+
8
+
9
+ h1, h2, h3, h4, h5, h6 {
10
+ color: #3366ff;
11
+ font-family: sans-serif;
12
+ }
13
+ @media print {
14
+ /* Darker version for printing */
15
+ h1, h2, h3, h4, h5, h6 {
16
+ color: #000080;
17
+ font-family: helvetica, sans-serif;
18
+ }
19
+ }
20
+
21
+ h1 {
22
+ text-align: center;
23
+ font-size: 18pt;
24
+ }
25
+ h2 {
26
+ margin-left: -0.5in;
27
+ }
28
+ h3 {
29
+ margin-left: -0.25in;
30
+ }
31
+ h4 {
32
+ margin-left: -0.125in;
33
+ }
34
+ hr {
35
+ margin-left: -1in;
36
+ }
37
+
38
+ /* Definition lists: definition term bold */
39
+ dt {
40
+ font-weight: bold;
41
+ }
42
+
43
+ address {
44
+ text-align: right;
45
+ }
46
+ /* Use the <code> tag for bits of code and <var> for variables and objects. */
47
+ code,pre,samp,var {
48
+ color: #006000;
49
+ }
50
+ /* Use the <file> tag for file and directory paths and names. */
51
+ file {
52
+ color: #905050;
53
+ font-family: monospace;
54
+ }
55
+ /* Use the <kbd> tag for stuff the user should type. */
56
+ kbd {
57
+ color: #600000;
58
+ }
59
+ div.note p {
60
+ float: right;
61
+ width: 3in;
62
+ margin-right: 0%;
63
+ padding: 1px;
64
+ border: 2px solid #6060a0;
65
+ background-color: #fffff0;
66
+ }
67
+
68
+ UL.nobullets {
69
+ list-style-type: none;
70
+ list-style-image: none;
71
+ margin-left: -1em;
72
+ }
73
+
74
+ /*
75
+ body:after {
76
+ content: "Google Confidential";
77
+ }
78
+ */
79
+
80
+ /* pretty printing styles. See prettify.js */
81
+ .str { color: #080; }
82
+ .kwd { color: #008; }
83
+ .com { color: #800; }
84
+ .typ { color: #606; }
85
+ .lit { color: #066; }
86
+ .pun { color: #660; }
87
+ .pln { color: #000; }
88
+ .tag { color: #008; }
89
+ .atn { color: #606; }
90
+ .atv { color: #080; }
91
+ pre.prettyprint { padding: 2px; border: 1px solid #888; }
92
+
93
+ .embsrc { background: #eee; }
94
+
95
+ @media print {
96
+ .str { color: #060; }
97
+ .kwd { color: #006; font-weight: bold; }
98
+ .com { color: #600; font-style: italic; }
99
+ .typ { color: #404; font-weight: bold; }
100
+ .lit { color: #044; }
101
+ .pun { color: #440; }
102
+ .pln { color: #000; }
103
+ .tag { color: #006; font-weight: bold; }
104
+ .atn { color: #404; }
105
+ .atv { color: #060; }
106
+ }
107
+
108
+ /* Table Column Headers */
109
+ .hdr {
110
+ color: #006;
111
+ font-weight: bold;
112
+ background-color: #dddddd; }
113
+ .hdr2 {
114
+ color: #006;
115
+ background-color: #eeeeee; }
@@ -0,0 +1,365 @@
1
+ <HTML>
2
+
3
+ <HEAD>
4
+ <title>Implementation notes: sparse_hash, dense_hash, sparsetable</title>
5
+ </HEAD>
6
+
7
+ <BODY>
8
+
9
+ <h1>Implementation of sparse_hash_map, dense_hash_map, and
10
+ sparsetable</h1>
11
+
12
+ This document contains a few notes on how the data structures in this
13
+ package are implemented. This discussion refers at several points to
14
+ the classic text in this area: Knuth, <i>The Art of Computer
15
+ Programming</i>, Vol 3, Hashing.
16
+
17
+
18
+ <hr>
19
+ <h2><tt>sparsetable</tt></h2>
20
+
21
+ <p>For specificity, consider the declaration </p>
22
+
23
+ <pre>
24
+ sparsetable&lt;Foo&gt; t(100); // a sparse array with 100 elements
25
+ </pre>
26
+
27
+ <p>A sparsetable is a random-access container that implements a sparse array,
28
+ that is, an array that uses very little memory to store unassigned
29
+ indices (in this case, between 1 and 2 bits per unassigned index). For
30
+ instance, if you allocate an array of size 5 and assign a[2] = [big
31
+ struct], then a[2] will take up a lot of memory but a[0], a[1], a[3],
32
+ and a[4] will not. Array elements that have a value are called
33
+ "assigned". Array elements that have no value yet, or have had their
34
+ value cleared using erase() or clear(), are called "unassigned".
35
+ For assigned elements, lookups return the assigned value; for
36
+ unassigned elements, they return the default value, which for t is
37
+ Foo().</p>
38
+
39
+ <p>sparsetable is implemented as an array of "groups". Each group is
40
+ responsible for M array indices. The first group knows about
41
+ t[0]..t[M-1], the second about t[M]..t[2M-1], and so forth. (M is 48
42
+ by default.) At construction time, t creates an array of (99/M + 1)
43
+ groups. From this point on, all operations -- insert, delete, lookup
44
+ -- are passed to the appropriate group. In particular, any operation
45
+ on t[i] is actually performed on (t.group[i / M])[i % M].</p>
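<p>The index arithmetic described above can be sketched as follows. This is a minimal illustration, not the library's code: <tt>GroupSlot</tt>, <tt>locate()</tt>, and <tt>num_groups()</tt> are hypothetical names, and the group size M is passed in as a parameter.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the group that owns index i and the slot inside it.
struct GroupSlot {
  std::size_t group;   // i / M
  std::size_t offset;  // i % M
};

// Maps a table index to its group and in-group offset for group size m.
inline GroupSlot locate(std::size_t i, std::size_t m) {
  assert(m > 0);
  return GroupSlot{i / m, i % m};
}

// Number of groups needed for a table of `size` elements: (size - 1) / m + 1.
inline std::size_t num_groups(std::size_t size, std::size_t m) {
  assert(m > 0);
  return size == 0 ? 0 : (size - 1) / m + 1;
}
```

<p>With M = 48, a table of 100 elements needs 3 groups, and index 99 lands in group 2 at offset 3.</p>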
46
+
47
+ <p>Each group consists of a vector, which holds assigned values, and a
48
+ bitmap of size M, which indicates which indices are assigned. A
49
+ lookup works as follows: the group is asked to look up index i, where
50
+ i &lt; M. The group looks at bitmap[i]. If it's 0, the lookup fails.
51
+ If it's 1, then the group has to find the appropriate value in the
52
+ vector.</p>
53
+
54
+ <h3><tt>find()</tt></h3>
55
+
56
+ <p>Finding the appropriate vector element is the most expensive part of
57
+ the lookup. The code counts all bitmap entries &lt;= i that are set to
58
+ 1. (There's at least 1 of them, since bitmap[i] is 1.) Suppose there
59
+ are 4 such entries. Then the right value to return is the 4th element
60
+ of the vector: vector[3]. This takes time O(M), which is a constant
61
+ since M is a constant.</p>
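<p>A minimal sketch of the group lookup just described, assuming <tt>int</tt> values and a plain loop for the bit count. The <tt>Group</tt> struct and its member names are made up for illustration and do not mirror the real sparsetable internals.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGroupSize = 48;  // M, using the default from the text

// Toy group: a bitmap of assigned slots plus a packed vector of values.
struct Group {
  std::bitset<kGroupSize> bitmap;
  std::vector<int> values;  // values of the assigned slots, in index order

  // Counts the bitmap entries at positions <= i that are set to 1. O(M).
  std::size_t rank(std::size_t i) const {
    assert(i < kGroupSize);
    std::size_t count = 0;
    for (std::size_t pos = 0; pos <= i; ++pos) {
      if (bitmap[pos]) ++count;
    }
    return count;
  }

  // Returns a pointer to the value at slot i, or nullptr if i is unassigned.
  const int* find(std::size_t i) const {
    assert(i < kGroupSize);
    if (!bitmap[i]) return nullptr;  // bitmap[i] == 0: the lookup fails
    return &values[rank(i) - 1];     // the rank-th element, zero-based
  }
};
```

<p>If slots 2, 5, 7, and 9 are assigned, a lookup of slot 9 counts 4 set bits and returns the 4th element, <tt>values[3]</tt>.</p>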
62
+
63
+ <h3><tt>insert()</tt></h3>
64
+
65
+ <p>Insert starts with a lookup. If the lookup succeeds, the code merely
66
+ replaces the matching vector element (vector[3] in the example above) with the new value. If the lookup fails, then the
67
+ code must insert a new entry into the middle of the vector. Again, to
68
+ insert at position i, the code must count all the bitmap entries &lt;= i
69
+ that are set to 1. This indicates the position to insert into the
70
+ vector. All vector entries above that position must be moved to make
71
+ room for the new entry. This takes time, but still constant time
72
+ since the vector has size at most M.</p>
73
+
74
+ <p>(Inserts could be made faster by using a list instead of a vector to
75
+ hold group values, but this would use much more memory, since each
76
+ list element requires a full pointer of overhead.)</p>
77
+
78
+ <p>The only metadata that needs to be updated, after the actual value is
79
+ inserted, is to set bitmap[i] to 1. No other counts must be
80
+ maintained.</p>
81
+
82
+ <h3><tt>delete()</tt></h3>
83
+
84
+ <p>Deletes are similar to inserts. They start with a lookup. If it
85
+ fails, the delete is a noop. Otherwise, the appropriate entry is
86
+ removed from the vector, all the vector elements above it are moved
87
+ down one, and bitmap[i] is set to 0.</p>
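<p>The insert and delete steps for a single group might look like the sketch below. It assumes <tt>int</tt> values; <tt>Group</tt>, <tt>position()</tt>, <tt>set()</tt>, and <tt>erase()</tt> are illustrative names rather than the library's own.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kGroupSize = 48;  // M, using the default from the text

struct Group {
  std::bitset<kGroupSize> bitmap;
  std::vector<int> values;  // values of the assigned slots, in index order

  // Number of assigned slots strictly below i, which is where slot i's
  // value lives (or would be inserted) in the vector.
  std::size_t position(std::size_t i) const {
    assert(i < kGroupSize);
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < i; ++pos) {
      if (bitmap[pos]) ++count;
    }
    return count;
  }

  void set(std::size_t i, int value) {
    const std::ptrdiff_t pos = static_cast<std::ptrdiff_t>(position(i));
    if (bitmap[i]) {
      values[static_cast<std::size_t>(pos)] = value;  // lookup succeeded: overwrite
    } else {
      values.insert(values.begin() + pos, value);  // shifts later entries up
      bitmap.set(i);                               // the only metadata update
    }
  }

  void erase(std::size_t i) {
    assert(i < kGroupSize);
    if (!bitmap[i]) return;  // lookup failed: nothing to do
    values.erase(values.begin() + static_cast<std::ptrdiff_t>(position(i)));
    bitmap.reset(i);
  }
};
```

<p>Both operations shift at most M vector entries, which is why the text treats them as constant time.</p>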
88
+
89
+ <h3>iterators</h3>
90
+
91
+ <p>Sparsetable iterators pose a special burden. They must iterate over
92
+ unassigned array values, but the act of iterating should not cause an
93
+ assignment to happen -- otherwise, iterating over a sparsetable would
94
+ cause it to take up much more room. For const iterators, the matter
95
+ is simple: the iterator is merely programmed to return the default
96
+ value -- Foo() -- when dereferenced while pointing to an unassigned
97
+ entry.</p>
98
+
99
+ <p>For non-const iterators, such simple techniques fail. Instead,
100
+ dereferencing a sparsetable_iterator returns an opaque object that
101
+ acts like a Foo in almost all situations, but isn't actually a Foo.
102
+ (It does this by defining operator=(), operator value_type(), and,
103
+ most sneakily, operator&().) This works in almost all cases. If it
104
+ doesn't, an explicit cast to value_type will solve the problem:</p>
105
+
106
+ <pre>
107
+ printf("%d", static_cast&lt;Foo&gt;(*t.find(0)));
108
+ </pre>
109
+
110
+ <p>To avoid such problems, consider using get() and set() instead of an
111
+ iterator:</p>
112
+
113
+ <pre>
114
+ for (int i = 0; i &lt; t.size(); ++i)
115
+ if (t.get(i) == ...) t.set(i, ...);
116
+ </pre>
117
+
118
+ <p>Sparsetable also has a special class of iterator, besides normal and
119
+ const: nonempty_iterator. This only iterates over array values that
120
+ are assigned. This is particularly fast given the sparsetable
121
+ implementation, since it can ignore the bitmaps entirely and just
122
+ iterate over the various group vectors.</p>
123
+
124
+ <h3>Resource use</h3>
125
+
126
+ <p>The space overhead for a sparsetable of size N is N + 48N/M bits.
127
+ For the default value of M, this is exactly 2 bits per array entry.
128
+ (That's for 32-bit pointers; for machines with 64-bit pointers, it's N
129
+ + 80N/M bits, or 2.67 bits per entry.)
130
+ A larger M would use less overhead -- approaching 1 bit per array
131
+ entry -- but take longer for inserts, deletes, and lookups. A smaller
132
+ M would use more overhead but make operations somewhat faster.</p>
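<p>The arithmetic behind these figures can be checked with a small helper. The function name is hypothetical; the 48-bit and 80-bit per-group figures are taken from the paragraph above.</p>

```cpp
#include <cassert>

// Overhead in bits per array entry: one bitmap bit per entry, plus the
// per-group overhead (48 or 80 bits in the text) spread over m entries.
inline double overhead_bits_per_entry(double per_group_bits, double m) {
  assert(m > 0);
  return 1.0 + per_group_bits / m;
}
```

<p>For M = 48 this gives 1 + 48/48 = 2 bits with 32-bit pointers and 1 + 80/48, roughly 2.67 bits, with 64-bit pointers. As M grows, the result approaches 1 bit per entry.</p>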
133
+
134
+ <p>You can also look at some specific <A
135
+ HREF="performance.html">performance numbers</A>.</p>
136
+
137
+
138
+ <hr>
139
+ <h2><tt>sparse_hash_set</tt></h2>
140
+
141
+ <p>For specificity, consider the declaration </p>
142
+
143
+ <pre>
144
+ sparse_hash_set&lt;Foo&gt; t;
145
+ </pre>
146
+
147
+ <p>sparse_hash_set is a hashtable. For more information on hashtables,
148
+ see Knuth. Hashtables are basically arrays with complicated logic on
149
+ top of them. sparse_hash_set uses a sparsetable to implement the
150
+ underlying array.</p>
151
+
152
+ <p>In particular, sparse_hash_set stores its data in a sparsetable using
153
+ quadratic internal probing (see Knuth). Many hashtable
154
+ implementations use external probing, so each table element is
155
+ actually a pointer chain, holding many hashtable values.
156
+ sparse_hash_set, on the other hand, always stores at most one value in
157
+ each table location. If the hashtable wants to store a second value
158
+ at a given table location, it can't; it's forced to look somewhere
159
+ else.</p>
160
+
161
+ <h3><tt>insert()</tt></h3>
162
+
163
+ <p>As a specific example, suppose t is a new sparse_hash_set. It then
164
+ holds a sparsetable of size 32. The code for t.insert(foo) works as
165
+ follows:</p>
166
+
167
+ <p>
168
+ 1) Call hash&lt;Foo&gt;(foo) to convert foo into an integer i. (hash&lt;Foo&gt; is
169
+ the default hash function; you can specify a different one in the
170
+ template arguments.)
171
+
172
+ </p><p>
173
+ 2a) Look at t.sparsetable[i % 32]. If it's unassigned, assign it to
174
+ foo. foo is now in the hashtable.
175
+
176
+ </p><p>
177
+ 2b) If t.sparsetable[i % 32] is assigned, and its value is foo, then
178
+ do nothing: foo was already in t and the insert is a noop.
179
+
180
+ </p><p>
181
+ 2c) If t.sparsetable[i % 32] is assigned, but to a value other than
182
+ foo, look at t.sparsetable[(i+1) % 32]. If that also fails, try
183
+ t.sparsetable[(i+3) % 32], then t.sparsetable[(i+6) % 32]. In
184
+ general, keep trying the next triangular number.
185
+
186
+ </p><p>
187
+ 3) If the table is now "too full" -- say, 25 of the 32 table entries
188
+ are now assigned -- grow the table by creating a new sparsetable
189
+ that's twice as big, and rehashing every single element from the
190
+ old table into the new one. This keeps the table from ever filling
191
+ up.
192
+
193
+ </p><p>
194
+ 4) If the table is now "too empty" -- say, only 3 of the 32 table
195
+ entries are now assigned -- shrink the table by creating a new
196
+ sparsetable that's half as big, and rehashing every element as in
197
+ the growing case. This keeps the table overhead proportional to
198
+ the number of elements in the table.
199
+ </p>
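<p>Steps 1 and 2 above can be sketched with a toy insert-only set. This is an illustration under stated assumptions, not the sparse_hash_set code: it stores <tt>int</tt> values in a flat vector, has a fixed power-of-two capacity, skips the resizing of steps 3 and 4, and takes the hash function as a constructor argument so the probe sequence is easy to trace. <tt>ToyHashSet</tt> is a made-up name.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct ToyHashSet {
  std::vector<bool> assigned;  // stands in for "is this slot assigned?"
  std::vector<int> slots;
  std::function<std::size_t(int)> hash;

  ToyHashSet(std::size_t capacity, std::function<std::size_t(int)> h)
      : assigned(capacity, false), slots(capacity, 0), hash(h) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Returns true if value was newly stored, false if it was already present
  // (or if every slot is taken, which resizing would normally prevent).
  bool insert(int value) {
    const std::size_t size = slots.size();
    std::size_t index = hash(value) % size;  // step 1
    for (std::size_t probe = 1; probe <= size; ++probe) {
      if (!assigned[index]) {  // step 2a: free slot, store here
        assigned[index] = true;
        slots[index] = value;
        return true;
      }
      if (slots[index] == value) return false;  // step 2b: already present
      index = (index + probe) % size;  // step 2c: offsets 1, 3, 6, ... from the start
    }
    return false;
  }
};
```

<p>With a hash that sends every value to slot 4 in a table of 32, four successive inserts land in slots 4, 5, 7, and 10.</p>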
200
+
201
+ <p>Instead of using triangular numbers as offsets, one could just use
202
+ regular integers: try i, then i+1, then i+2, then i+3. This has bad
203
+ 'clumping' behavior, as explored in Knuth. Quadratic probing, using
204
+ the triangular numbers, avoids the clumping while keeping cache
205
+ coherency in the common case. As long as the table size is a power of
206
+ 2, the quadratic-probing method described above will explore every
207
+ table element if necessary, to find a good place to insert.</p>
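<p>The claim that triangular-number probing reaches every slot of a power-of-two table can be checked by brute force. The helper below is a hypothetical test function, not part of the library.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the probe sequence start, start+1, start+3, start+6, ...
// (all mod table_size) touches every slot within table_size probes.
inline bool probes_cover_table(std::size_t table_size, std::size_t start) {
  assert(table_size > 0);
  std::vector<bool> seen(table_size, false);
  std::size_t index = start % table_size;
  std::size_t visited = 0;
  for (std::size_t probe = 1; probe <= table_size; ++probe) {
    if (!seen[index]) {
      seen[index] = true;
      ++visited;
    }
    index = (index + probe) % table_size;
  }
  return visited == table_size;
}
```

<p>For sizes such as 32 or 1024 the function returns true from any starting slot. For a size that is not a power of two, such as 12, the sequence revisits slots and misses others.</p>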
208
+
209
+ <p>(As a side note, using a table size that's a power of two has several
210
+ advantages, including the speed of calculating (i % table_size). On
211
+ the other hand, power-of-two tables are not very forgiving of a poor
212
+ hash function. Make sure your hash function is a good one! There are
213
+ plenty of dos and don'ts on the web (and in Knuth), for writing hash
214
+ functions.)</p>
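<p>The speed advantage mentioned here comes from replacing the modulo with a bit mask. A minimal sketch, with hypothetical helper names:</p>

```cpp
#include <cassert>
#include <cstddef>

// True if n is a power of two (nonzero with exactly one bit set).
inline bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Equivalent to hash % table_size when table_size is a power of two.
inline std::size_t slot_for(std::size_t hash, std::size_t table_size) {
  assert(is_power_of_two(table_size));
  return hash & (table_size - 1);
}
```

<p>The mask keeps only the low bits of the hash, which is also why a hash function with poorly distributed low bits behaves badly in such tables.</p>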
215
+
216
+ <p>The "too full" value, also called the "maximum occupancy", determines
217
+ a time-space tradeoff: in general, the higher it is, the less space is
218
+ wasted but the more probes must be performed for each insert.
219
+ sparse_hash_set uses a high maximum occupancy, since space is more
220
+ important than speed for this data structure.</p>
221
+
222
+ <p>The "too empty" value is not necessary for performance but helps with
223
+ space use. It's rare for hashtable implementations to check this
224
+ value at insert() time -- after all, how will inserting cause a
225
+ hashtable to get too small? However, the sparse_hash_set
226
+ implementation never resizes on erase(); it's nice to have an erase()
227
+ that does not invalidate iterators. Thus, the first insert() after a
228
+ long string of erase()s could well trigger a hashtable shrink.</p>
229
+
230
+ <h3><tt>find()</tt></h3>
231
+
232
+ <p>find() works similarly to insert. The only difference is in step
233
+ (2a): if the value is unassigned, then the lookup fails immediately.</p>
234
+
235
+ <h3><tt>delete()</tt></h3>
236
+
237
+ <p>delete() is tricky in an internal-probing scheme. The obvious
238
+ implementation of just "unassigning" the relevant table entry doesn't
239
+ work. Consider the following scenario:</p>
240
+
241
+ <pre>
242
+ t.insert(foo1); // foo1 hashes to 4, is put in table[4]
243
+ t.insert(foo2); // foo2 hashes to 4, is put in table[5]
244
+ t.erase(foo1); // table[4] is now 'unassigned'
245
+ t.find(foo2); // fails since table[hash(foo2)] is unassigned
246
+ </pre>
247
+
248
+ <p>To avoid these failure situations, delete(foo1) is actually
249
+ implemented by replacing foo1 by a special 'delete' value in the
250
+ hashtable. This 'delete' value causes the table entry to be
251
+ considered unassigned for the purposes of insertion -- if foo3 hashes
252
+ to 4 as well, it can go into table[4] no problem -- but assigned for
253
+ the purposes of lookup.</p>
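<p>The 'delete' value scheme can be sketched with a toy set. As before, this is an assumption-laden illustration: <tt>TombstoneSet</tt> is a made-up name, values are <tt>int</tt>, capacity is a fixed power of two, and the hash function and deleted key are supplied by the caller.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct TombstoneSet {
  std::vector<bool> assigned;
  std::vector<int> slots;
  std::function<std::size_t(int)> hash;
  int deleted_key;  // a value the client promises never to insert

  TombstoneSet(std::size_t capacity, std::function<std::size_t(int)> h, int deleted)
      : assigned(capacity, false), slots(capacity, 0), hash(h), deleted_key(deleted) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  // Returns the slot holding value, or -1. A deleted slot counts as assigned
  // here, so probing continues past it.
  std::ptrdiff_t find(int value) const {
    assert(value != deleted_key);
    const std::size_t size = slots.size();
    std::size_t index = hash(value) % size;
    for (std::size_t probe = 1; probe <= size; ++probe) {
      if (!assigned[index]) return -1;
      if (slots[index] == value) return static_cast<std::ptrdiff_t>(index);
      index = (index + probe) % size;
    }
    return -1;
  }

  // For insertion, a deleted slot counts as unassigned and can be reused.
  bool insert(int value) {
    if (find(value) >= 0) return false;
    const std::size_t size = slots.size();
    std::size_t index = hash(value) % size;
    for (std::size_t probe = 1; probe <= size; ++probe) {
      if (!assigned[index] || slots[index] == deleted_key) {
        assigned[index] = true;
        slots[index] = value;
        return true;
      }
      index = (index + probe) % size;
    }
    return false;
  }

  // Overwrites the entry with the deleted key instead of unassigning it.
  bool erase(int value) {
    const std::ptrdiff_t index = find(value);
    if (index < 0) return false;
    slots[static_cast<std::size_t>(index)] = deleted_key;
    return true;
  }
};
```

<p>Replaying the scenario above with a hash that always returns 4: after erasing the first value, the second is still found in slot 5, and a third value is free to reuse slot 4.</p>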
254
+
255
+ <p>What is this special 'delete' value? The delete value has to be an
256
+ element of type Foo, since the table can't hold anything else. It
257
+ obviously must be an element the client would never want to insert on
258
+ its own, or else the code couldn't distinguish deleted entries from
259
+ 'real' entries with the same value. There's no way to determine a
260
+ good value automatically. The client has to specify it explicitly.
261
+ This is what the set_deleted_key() method does.</p>
262
+
263
+ <p>Note that set_deleted_key() is only necessary if the client actually
264
+ wants to call t.erase(). For insert-only hash-sets, set_deleted_key()
265
+ is unnecessary.</p>
266
+
267
+ <p>When copying the hashtable, either to grow it or shrink it, the
268
+ special 'delete' values are <b>not</b> copied into the new table. The
269
+ copy-time rehash makes them unnecessary.</p>
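<p>A rehash that drops the 'delete' values might look like the sketch below. <tt>Table</tt>, <tt>rehash()</tt>, and <tt>live_count()</tt> are hypothetical names, and the code assumes the new capacity is a power of two larger than the number of live entries.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Table {
  std::vector<bool> assigned;
  std::vector<int> slots;
};

// Copies every live entry of old_table into a fresh table of new_capacity.
// Unassigned slots and slots holding deleted_key are skipped.
inline Table rehash(const Table& old_table, std::size_t new_capacity, int deleted_key,
                    const std::function<std::size_t(int)>& hash) {
  assert(new_capacity > 0 && (new_capacity & (new_capacity - 1)) == 0);
  Table fresh{std::vector<bool>(new_capacity, false),
              std::vector<int>(new_capacity, 0)};
  for (std::size_t i = 0; i < old_table.slots.size(); ++i) {
    if (!old_table.assigned[i] || old_table.slots[i] == deleted_key) continue;
    std::size_t index = hash(old_table.slots[i]) % new_capacity;
    for (std::size_t probe = 1; fresh.assigned[index]; ++probe) {
      index = (index + probe) % new_capacity;  // triangular probing
    }
    fresh.assigned[index] = true;
    fresh.slots[index] = old_table.slots[i];
  }
  return fresh;
}

// Number of assigned slots in a table.
inline std::size_t live_count(const Table& table) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < table.assigned.size(); ++i) {
    if (table.assigned[i]) ++count;
  }
  return count;
}
```

<p>Because the new table contains no deleted markers, every probe chain in it is built from live entries only.</p>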
270
+
271
+ <h3>Resource use</h3>
272
+
273
+ <p>The data is stored in a sparsetable, so space use is the same as
274
+ for sparsetable. Time use is also determined in large part by the
275
+ sparsetable implementation. However, there is also an extra probing
276
+ cost in hashtables, which depends in large part on the "too full"
277
+ value. It should be rare to need more than 4-5 probes per lookup, and
278
+ usually significantly fewer will suffice.</p>
279
+
280
+ <p>A note on growing and shrinking the hashtable: all hashtable
281
+ implementations use the most memory when growing a hashtable, since
282
+ they must have room for both the old table and the new table at the
283
+ same time. sparse_hash_set is careful to delete entries from the old
284
+ hashtable as soon as they're copied into the new one, to minimize this
285
+ space overhead. (It does this efficiently by using its knowledge of
286
+ the sparsetable class and copying one sparsetable group at a time.)</p>
287
+
288
+ <p>You can also look at some specific <A
289
+ HREF="performance.html">performance numbers</A>.</p>
290
+
291
+
292
+ <hr>
293
+ <h2><tt>sparse_hash_map</tt></h2>
294
+
295
+ <p>sparse_hash_map is implemented identically to sparse_hash_set. The
296
+ only difference is instead of storing just Foo in each table entry,
297
+ the data structure stores pair&lt;Foo, Value&gt;.</p>
298
+
299
+
300
+ <hr>
301
+ <h2><tt>dense_hash_set</tt></h2>
302
+
303
+ <p>The hashtable aspects of dense_hash_set are identical to
304
+ sparse_hash_set: it uses quadratic internal probing, and resizes
305
+ hashtables in exactly the same way. The difference is in the
306
+ underlying array: instead of using a sparsetable, dense_hash_set uses
307
+ a C array. This means much more space is used, especially if Foo is
308
+ big. However, it makes all operations faster, since sparsetable has
309
+ memory management overhead that C arrays do not.</p>
310
+
311
+ <p>The use of C arrays instead of sparsetables points to one immediate
312
+ complication dense_hash_set has that sparse_hash_set does not: the
313
+ need to distinguish assigned from unassigned entries. In a
314
+ sparsetable, this is accomplished by a bitmap. dense_hash_set, on the
315
+ other hand, uses a dedicated value to specify unassigned entries.
316
+ Thus, dense_hash_set has two special values: one to indicate deleted
317
+ table entries, and one to indicate unassigned table entries. At
318
+ construction time, all table entries are initialized to 'unassigned'.</p>
319
+
320
+ <p>dense_hash_set provides the method set_empty_key() to indicate the
321
+ value that should be used for unassigned entries. Like
322
+ set_deleted_key(), set_empty_key() requires a value that will not be
323
+ used by the client for any legitimate purpose. Unlike
324
+ set_deleted_key(), set_empty_key() is always required, no matter what
325
+ hashtable operations the client wishes to perform.</p>
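<p>The role of the empty key can be sketched with a toy flat-array set. This is not the dense_hash_set implementation: <tt>ToyDenseSet</tt> is a made-up name, values are <tt>int</tt>, there is no resizing or deletion, and the empty key is passed to the constructor rather than through a separate set_empty_key() call.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct ToyDenseSet {
  std::vector<int> table;  // flat array; every slot starts out holding empty_key
  int empty_key;           // a value the client promises never to insert
  std::function<std::size_t(int)> hash;

  ToyDenseSet(std::size_t capacity, int empty, std::function<std::size_t(int)> h)
      : table(capacity, empty), empty_key(empty), hash(h) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  }

  bool insert(int value) {
    assert(value != empty_key);
    const std::size_t size = table.size();
    std::size_t index = hash(value) % size;
    for (std::size_t probe = 1; probe <= size; ++probe) {
      if (table[index] == empty_key) {  // unassigned: no bitmap needed
        table[index] = value;
        return true;
      }
      if (table[index] == value) return false;
      index = (index + probe) % size;
    }
    return false;
  }

  bool contains(int value) const {
    const std::size_t size = table.size();
    std::size_t index = hash(value) % size;
    for (std::size_t probe = 1; probe <= size; ++probe) {
      if (table[index] == empty_key) return false;
      if (table[index] == value) return true;
      index = (index + probe) % size;
    }
    return false;
  }
};
```

<p>Every slot costs a full key's worth of memory whether or not it is used, which is the space cost the text attributes to the C-array layout.</p>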
326
+
327
+ <h3>Resource use</h3>
328
+
329
+ <p>This implementation is fast because even though dense_hash_set may not
330
+ be space efficient, most lookups are localized: a single lookup may
331
+ need to access table[i], and maybe table[i+1] and table[i+3], but
332
+ nothing other than that. For all but the biggest data structures,
333
+ these will frequently be in a single cache line.</p>
334
+
335
+ <p>This implementation takes, for every unused bucket, space as big as
336
+ the key-type. Usually between half and two-thirds of the buckets are
337
+ empty.</p>
338
+
339
+ <p>The doubling method used by dense_hash_set tends to work poorly
340
+ with most memory allocators. This is because memory allocators tend
341
+ to have memory 'buckets' which are a power of two. Since each
342
+ doubling of a dense_hash_set doubles the memory use, a single
343
+ hashtable doubling will require a new memory 'bucket' from the memory
344
+ allocator, leaving the old bucket stranded as fragmented memory.
345
+ Hence, it's not recommended this data structure be used with many
346
+ inserts in memory-constrained situations.</p>
347
+
348
+ <p>You can also look at some specific <A
349
+ HREF="performance.html">performance numbers</A>.</p>
350
+
351
+
352
+ <hr>
353
+ <h2><tt>dense_hash_map</tt></h2>
354
+
355
+ <p>dense_hash_map is identical to dense_hash_set except for what values
356
+ are stored in each table entry.</p>
357
+
358
+ <hr>
359
+ <author>
360
+ Craig Silverstein<br>
361
+ Thu Jan 6 20:15:42 PST 2005
362
+ </author>
363
+
364
+ </body>
365
+ </html>