d_heap 0.5.0 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 3dd1049e0a8041a328da4ed65622c2f0589475bc386a0eb6f20e466c79587bc5
-  data.tar.gz: ec44970feaa5ce6aef37f511e71e55342ec93b7e1a0b2a3d40f249afa4e9ac25
+  metadata.gz: 35213e5ac430b07cf2b43a7f065ff5c409506022835a5326cb2bfa25daa7f210
+  data.tar.gz: e87a64fb9fd6eb8bdd281d8fe289b7f4993f4bde9a671b0b414aca194b724691
 SHA512:
-  metadata.gz: a6b6e192dbe5980b2b79728e4b4bf413151b3e193733d1435119482edb977f0a2edd692fd156614fdfbc86f4fa1dc6ac9f7907ca640c21ca23050d07b9a1caa6
-  data.tar.gz: 27a987139a1fd14f73c16459f72be2bf1059dadbd212e250a28b3691dff2372e708cdf740155e2d2aa21de78cc789816a9d83f80f42d3c396510cd6fce6e6bf2
+  metadata.gz: 77518eb11bf8dd5fa8a29ad88f48c30650ff375ae9b74001fa59d30f634feddf5c4f3ef8d791dc4064afc1d9d68b9b463eaf0e778e927655e0da0c0a9da6fdee
+  data.tar.gz: 2911b20a882d8b6f577bda9a388a9a755a093cfd3c1aaaaa355aa5be97ffe506935620043adb115158565ead6941e52f789cdd714e4932aff4916412c60a3aee
@@ -1,4 +1,4 @@
-name: Ruby
+name: CI

 on: [push,pull_request]

@@ -7,7 +7,7 @@ jobs:
   strategy:
     fail-fast: false
     matrix:
-      ruby: [2.5, 2.6, 2.7, 3.0]
+      ruby: [2.4, 2.5, 2.6, 2.7, 3.0]
       os: [ubuntu, macos]
       experimental: [false]
   runs-on: ${{ matrix.os }}-latest
data/.gitignore CHANGED
@@ -10,6 +10,7 @@
 *.so
 *.o
 *.a
+compile_commands.json
 mkmf.log

 # rspec failure tracking
@@ -3,7 +3,7 @@ inherit_mode:
   - Exclude

 AllCops:
-  TargetRubyVersion: 2.5
+  TargetRubyVersion: 2.4
   NewCops: disable
   Exclude:
     - bin/benchmark-driver
@@ -0,0 +1,10 @@
+-o doc
+--embed-mixins
+--hide-void-return
+--no-private
+--asset images:images
+--exclude lib/benchmark_driver
+--exclude lib/d_heap/benchmarks*
+-
+CHANGELOG.md
+CODE_OF_CONDUCT.md
@@ -1,5 +1,17 @@
 ## Current/Unreleased

+## Release v0.6.0 (2021-01-24)
+
+* 🔥 **Breaking**: `#initialize` uses a keyword argument for `d`
+* ✨ Added `#initialize(capacity: capa)` to set initial capacity.
+* ✨ Added `peek_with_score` and `peek_score`
+* ✨ Added `pop_with_score` and `each_pop(with_score: true)`
+* ✨ Added `pop_all_below(max_score, array = [])`
+* ✨ Added aliases for `shift` and `next`
+* 📈 Added benchmark charts to README, and `bin/bench_charts` to generate them.
+  * requires `gruff` which requires `rmagick` which requires `imagemagick`
+* 📝 Many documentation updates and fixes.
+
 ## Release v0.5.0 (2021-01-17)

 * 🔥 **Breaking**: reversed order of `#push` arguments to `value, score`.
@@ -8,19 +20,20 @@
 * ✨ Added aliases for `deq`, `enq`, `first`, `pop_below`, `length`, and
   `count`, to mimic other classes in ruby's stdlib.
 * ⚡️♻️ More performance improvements:
-* Created an `ENTRY` struct and store both the score and the value pointer in
-  the same `ENTRY *entries` array.
-* Reduced unnecessary allocations or copies in both sift loops. A similar
-  refactoring also sped up the pure ruby benchmark implementation.
-* Compiling with `-O3`.
+  * Created an `ENTRY` struct and store both the score and the value pointer in
+    the same `ENTRY *entries` array.
+  * Reduced unnecessary allocations or copies in both sift loops. A similar
+    refactoring also sped up the pure ruby benchmark implementation.
+  * Compiling with `-O3`.
 * 📝 Updated (and in some cases, fixed) yardoc
 * ♻️ Moved aliases and less performance sensitive code into ruby.
 * ♻️ DRY up push/insert methods

 ## Release v0.4.0 (2021-01-12)

+* 🔥 **Breaking**: Scores must be `Integer` or convertible to `Float`
+  * ⚠️ `Integer` scores must fit in `-ULONG_LONG_MAX` to `+ULONG_LONG_MAX`.
 * ⚡️ Big performance improvements, by using C `long double *cscores` array
-* ⚡️ Scores must be `Integer` in `-uint64..+uint64`, or convertable to `Float`
 * ⚡️ many many (so many) updates to benchmarks
 * ✨ Added `DHeap#clear`
 * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
data/Gemfile CHANGED
@@ -11,6 +11,10 @@ gem "rake-compiler"
 gem "rspec", "~> 3.10"
 gem "rubocop", "~> 1.0"

+install_if -> { RUBY_PLATFORM !~ /darwin/ } do
+  gem "benchmark_driver-output-gruff"
+end
+
 gem "perf"
 gem "priority_queue_cxx"
 gem "stackprof"
@@ -1,15 +1,22 @@
 PATH
   remote: .
   specs:
-    d_heap (0.5.0)
+    d_heap (0.6.0)

 GEM
   remote: https://rubygems.org/
   specs:
     ast (2.4.1)
     benchmark_driver (0.15.16)
+    benchmark_driver-output-gruff (0.3.1)
+      benchmark_driver (>= 0.12.0)
+      gruff
     coderay (1.1.3)
     diff-lcs (1.4.4)
+    gruff (0.12.1)
+      histogram
+      rmagick
+    histogram (0.2.4.1)
     method_source (1.0.0)
     parallel (1.19.2)
     parser (2.7.2.0)
@@ -25,6 +32,7 @@ GEM
     rake
     regexp_parser (1.8.2)
     rexml (3.2.3)
+    rmagick (4.1.2)
     rspec (3.10.0)
       rspec-core (~> 3.10.0)
       rspec-expectations (~> 3.10.0)
@@ -59,6 +67,7 @@ PLATFORMS

 DEPENDENCIES
   benchmark_driver
+  benchmark_driver-output-gruff
   d_heap!
   perf
   priority_queue_cxx
data/N ADDED
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -eu
+
+export BENCH_N="$1"
+shift
+
+exec ruby "$@"
data/README.md CHANGED
@@ -1,8 +1,21 @@
-# DHeap
+# DHeap - Fast d-ary heap for ruby
+
+[![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
+[![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
+[![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)

 A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
 implemented as a C extension.

+From [wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
+> A heap is a specialized tree-based data structure which is essentially an
+> almost complete tree that satisfies the heap property: in a min heap, for any
+> given node C, if P is a parent node of C, then the key (the value) of P is
+> less than or equal to the key of C. The node at the "top" of the heap (with no
+> parents) is called the root node.
+
+![tree representation of a min heap](images/wikipedia-min-heap.png)
+
 With a regular queue, you expect "FIFO" behavior: first in, first out. With a
 stack you expect "LIFO": last in first out. A priority queue has a score for
 each element and elements are popped in order by score. Priority queues are
@@ -13,14 +26,16 @@ management, for [Huffman coding], and various graph search algorithms such as
 The _d_-ary heap data structure is a generalization of the [binary heap], in
 which the nodes have _d_ children instead of 2. This allows for "insert" and
 "decrease priority" operations to be performed more quickly with the tradeoff of
-slower delete minimum. Additionally, _d_-ary heaps can have better memory cache
-behavior than binary heaps, allowing them to run more quickly in practice
-despite slower worst-case time complexity. In the worst case, a _d_-ary heap
-requires only `O(log n / log d)` operations to push, with the tradeoff that pop
-requires `O(d log n / log d)`.
+slower delete minimum or "increase priority". Additionally, _d_-ary heaps can
+have better memory cache behavior than binary heaps, allowing them to run more
+quickly in practice despite slower worst-case time complexity. In the worst
+case, a _d_-ary heap requires only `O(log n / log d)` operations to push, with
+the tradeoff that pop requires `O(d log n / log d)`.

 Although you should probably just use the default _d_ value of `4` (see the
-analysis below), it's always advisable to benchmark your specific use-case.
+analysis below), it's always advisable to benchmark your specific use-case. In
+particular, if you push items more often than you pop, higher values for _d_ can
+give a faster total runtime.

 [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
 [priority queue]: https://en.wikipedia.org/wiki/Priority_queue
@@ -33,26 +48,43 @@ analysis below), it's always advisable to benchmark your specific use-case.

 ## Usage

-Quick reference:
+The basic API is `#push(object, score)` and `#pop`. Please read the
+[gem documentation] for more details and other methods.
+
+Quick reference for some common methods:

 * `heap << object` adds a value, with `Float(object)` as its score.
 * `heap.push(object, score)` adds a value with an extrinsic score.
 * `heap.pop` removes and returns the value with the minimum score.
-* `heap.pop_lte(score)` pops if the minimum score is `<=` the provided score.
+* `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
 * `heap.peek` to view the minimum value without popping it.
 * `heap.clear` to remove all items from the heap.
 * `heap.empty?` returns true if the heap is empty.
 * `heap.size` returns the number of items in the heap.

-The basic API is `#push(object, score)` and `pop`. If your values behave as
-their own score, then you can push with `#<<`. If the score changes while the
-object is still in the heap, it will not be re-evaluated again. The score must
-either be `Integer` or `Float` or convertable to a `Float` via `Float(score)`
-(i.e. it should implement `#to_f`).
+If the score changes while the object is still in the heap, it will not be
+re-evaluated again.

-```ruby
-require "d_heap"
+The score must either be `Integer` or `Float` or convertible to a `Float` via
+`Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+numeric values gives more than 50% speedup under some benchmarks! _n.b._
+`Integer` _scores must have an absolute value that fits into_ `unsigned long
+long`. This is compiler and architecture dependent, but with gcc on an IA-64
+system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
++18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
+in nanoseconds.

+_Comparing arbitrary objects via_ `a <=> b` _was the original design and may be
+added back in a future version,_ if (and only if) _it can be done without
+impacting the speed of numeric comparisons. The speedup from this constraint is
+huge!_
+
+[gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
+
+### Examples
+
+```ruby
+# create some example objects to place in our heap
 Task = Struct.new(:id, :time) do
   def to_f; time.to_f end
 end
@@ -61,72 +93,42 @@ t2 = Task.new(2, Time.now + 50)
 t3 = Task.new(3, Time.now + 60)
 t4 = Task.new(4, Time.now + 5)

-# if the object returns its own score via #to_f, "<<" is the simplest API
-heap << t1 << t2
+# create the heap
+require "d_heap"
+heap = DHeap.new
+
+# push with an explicit score (which might be extrinsic to the value)
+heap.push t1, t1.to_f
+
+# the score will be implicitly cast with Float, so any object with #to_f
+heap.push t2, t2

-# or push with an explicit score
-heap.push t3, t4.to_f
-heap.push t4, t4 # score can be implicitly cast with Float
+# if the object has an intrinsic score via #to_f, "<<" is the simplest API
+heap << t3 << t4

-# peek and pop
+# pop returns the lowest scored item, and removes it from the heap
 heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
 heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
+
+# peek returns the lowest scored item, without removing it from the heap
 heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
 heap.pop # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
-heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
-heap.empty? # => true
-heap.pop # => nil
-```

-Constraining scores to numeric values gives more than 50% speedup under some
-benchmarks! _n.b._ `Integer` _scores must have an absolute value that fits
-into_ `unsigned long long`. _This is architecture dependant but on an IA-64
-system this is 64 bits, which gives a range of -18,446,744,073,709,551,615 to
-+18446744073709551615. Comparing arbitary objects via_ `a <=> b` _was the
-original design and may be added back in a future version,_ if (and only if) _it
-can be done without impacting the speed of numeric comparisons._
+# pop_lte handles the common "h.pop if h.peek_score < max" pattern
+heap.pop_lte(Time.now + 65) # => nil

-```ruby
-heap.clear
-
-# The score can be derived from the value by using to_f.
-# "a <=> b" is *much* slower than comparing numbers, so it isn't used.
-class Event
-  include Comparable
-  attr_reader :time, :payload
-  alias_method :to_time, :time
-
-  def initialize(time, payload)
-    @time = time.to_time
-    @payload = payload
-    freeze
-  end
-
-  def to_f
-    time.to_f
-  end
-
-  def <=>(other)
-    to_f <=> other.to_f
-  end
-end
-
-heap << comparable_max # sorts last, using <=>
-heap << comparable_min # sorts first, using <=>
-heap << comparable_mid # sorts in the middle, using <=>
-heap.pop # => comparable_min
-heap.pop # => comparable_mid
-heap.pop # => comparable_max
+# the heap size can be inspected with size and empty?
+heap.empty? # => false
+heap.size # => 1
+heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
 heap.empty? # => true
+heap.size # => 0
+
+# popping from an empty heap returns nil
 heap.pop # => nil
 ```

-You can also pass a value into `#pop(max)` which will only pop if the minimum
-score is less than or equal to `max`.
-
-Read the [API documentation] for more detailed documentation and examples.
-
-[API documentation]: https://rubydoc.info/gems/d_heap/DHeap
+Please see the [gem documentation] for more methods and more examples.
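+As a rough illustration of what `push` and `pop` do under the hood, here is a
+minimal pure-ruby sketch of a _d_-ary min-heap with the same push/pop contract.
+The class name `PureRubyDHeap` is ours for illustration, not part of the gem,
+and it skips the gem's score-type checks:
+
+```ruby
+# A simplified pure-ruby d-ary min-heap (the gem implements this in C).
+class PureRubyDHeap
+  def initialize(d: 4)
+    @d = d
+    @entries = [] # [score, value] pairs in heap order
+  end
+
+  def push(value, score = value)
+    @entries << [score, value]
+    sift_up(@entries.size - 1)
+    self
+  end
+  alias << push
+
+  def pop
+    return nil if @entries.empty?
+    top = @entries.first
+    last = @entries.pop
+    unless @entries.empty?
+      @entries[0] = last
+      sift_down(0)
+    end
+    top[1]
+  end
+
+  def peek;   @entries.empty? ? nil : @entries.first[1]; end
+  def size;   @entries.size;  end
+  def empty?; @entries.empty? end
+
+  private
+
+  # one comparison per layer: O(log n / log d)
+  def sift_up(index)
+    while index.positive?
+      parent = (index - 1) / @d
+      break if @entries[parent][0] <= @entries[index][0]
+      @entries[parent], @entries[index] = @entries[index], @entries[parent]
+      index = parent
+    end
+  end
+
+  # up to d comparisons per layer: O(d log n / log d)
+  def sift_down(index)
+    loop do
+      first_child = index * @d + 1
+      break if first_child >= @entries.size
+      last_child = [first_child + @d - 1, @entries.size - 1].min
+      min_child = (first_child..last_child).min_by { |i| @entries[i][0] }
+      break if @entries[index][0] <= @entries[min_child][0]
+      @entries[index], @entries[min_child] = @entries[min_child], @entries[index]
+      index = min_child
+    end
+  end
+end
+
+heap = PureRubyDHeap.new(d: 4)
+heap.push(:c, 30).push(:a, 10).push(:b, 20)
+heap.pop # => :a
+```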

 ## Installation

@@ -153,104 +155,74 @@ for insert is `O(n)` because it may need to `memcpy` a significant portion of
 the array.

 The standard way to implement a priority queue is with a binary heap. Although
-this increases the time for `pop`, it converts the amortized time per push + pop
-from `O(n)` to `O(d log n / log d)`.
-
-However, I was surprised to find that—at least for some benchmarks—my pure ruby
-heap implementation was much slower than inserting into and popping from a fully
-sorted array. The reasons for this surprising result: Although it is `O(n)`,
-`memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
-has relatively _much_ larger constant factors. If your queue contains only a
-few thousand items, the overhead of those extra calls to `<=>` is _far_ more
-than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
-`d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
-sorted array.
-
-Moving the sift-up and sift-down code into C helps some. But much more helpful
-is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
-called. I'm hopeful that MJIT will eventually obsolete this C-extension. This
-can be hotspot code, and a basic ruby implementation could perform well if `<=>`
-had much lower overhead.
+this increases the time complexity for `pop` alone, it reduces the combined time
+complexity for `push` + `pop`. Using a d-ary heap with d > 2 makes the tree
+shorter but broader, which reduces push to `O(log n / log d)` while increasing
+the comparisons needed by sift-down to `O(d log n / log d)`.

-## Analysis
+However, I was disappointed when my best ruby heap implementation ran much more
+slowly than the naive approach—even for heaps containing ten thousand items.
+Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
+has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
+for each push + pop than `bsearch` + `insert`.

-### Time complexity
+Additionally, when researching how other systems handle their scheduling, I was
+inspired by reading go's "timer.go" implementation to experiment with a 4-ary
+heap instead of the traditional binary heap.

-There are two fundamental heap operations: sift-up (used by push) and sift-down
-(used by pop).
+## Benchmarks

-* Both sift operations can perform as many as `log n / log d` swaps, as the
-  element may sift from the bottom of the tree to the top, or vice versa.
-* Sift-up performs a single comparison per swap: `O(1)`.
-  So pushing a new element is `O(log n / log d)`.
-* Swap down performs as many as d comparions per swap: `O(d)`.
-  So popping the min element is `O(d log n / log d)`.
-
-Assuming every inserted element is eventually deleted from the root, d=4
-requires the fewest comparisons for combined insert and delete:
-
-* (1 + 2) lg 2 = 4.328085
-* (1 + 3) lg 3 = 3.640957
-* (1 + 4) lg 4 = 3.606738
-* (1 + 5) lg 5 = 3.728010
-* (1 + 6) lg 6 = 3.906774
-* etc...
+_See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
+`docs/profile.txt` for much more detail or updated results. These benchmarks
+were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._

-Leaf nodes require no comparisons to shift down, and higher values for d have
-higher percentage of leaf nodes:
+These benchmarks use very simple implementations for a pure-ruby heap and an
+array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
+comparison, I also compare to the [priority_queue_cxx gem] which uses the [C++
+STL priority_queue], and another naive implementation that uses `Array#min` and
+`Array#delete_at` with an unsorted array.

-* d=2 has ~50% leaves,
-* d=3 has ~67% leaves,
-* d=4 has ~75% leaves,
-* and so on...
+In these benchmarks, `DHeap` runs faster than all other implementations for
+every scenario and every value of N, although the difference is usually more
+noticeable at higher values of N. The pure ruby heap implementation is
+competitive for `push` alone at every value of N, but is significantly slower
+than bsearch + insert for push + pop, until N is _very_ large (somewhere between
+10k and 100k)!

-See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
+[priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
+[C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/

-### Space complexity
+Three different scenarios are measured:

-Space usage is linear, regardless of d. However higher d values may
-provide better cache locality. Because the heap is a complete binary tree, the
-elements can be stored in an array, without the need for tree or list pointers.
+### push N items onto an empty heap

-Ruby can compare Numeric values _much_ faster than other ruby objects, even if
-those objects simply delegate comparison to internal Numeric values. And it is
-often useful to use external scores for otherwise uncomparable values. So
-`DHeap` uses twice as many entries (one for score and one for value)
-as an array which only stores values.
+...but never pop (clearing between each set of pushes).

-## Benchmarks
+![bar graph for push_n benchmarks](./images/push_n.png)

-_See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
-`docs/profile.txt` for more details or updated results. These benchmarks were
-measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
+### push N items onto an empty heap then pop all N

-These benchmarks use very simple implementations for a pure-ruby heap and an
-array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
-comparison, an alternate implementation `Array#min` and `Array#delete_at` is
-also shown.
+Although this could be used for heap sort, we're unlikely to choose heap sort
+over Ruby's quick sort implementation. I'm using this scenario to represent
+the amortized cost of creating a heap and (eventually) draining it.

-Three different scenarios are measured:
-* push N values but never pop (clearing between each set of pushes).
-* push N values and then pop N values.
-  Although this could be used for heap sort, we're unlikely to choose heap sort
-  over Ruby's quick sort implementation. I'm using this scenario to represent
-  the amortized cost of creating a heap and (eventually) draining it.
-* For a heap of size N, repeatedly push and pop while keeping a stable size.
-  This is a _very simple_ approximation for how most scheduler/timer heaps
-  would be used. Usually when a timer fires it will be quickly replaced by a
-  new timer, and the overall count of timers will remain roughly stable.
+![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)

-In these benchmarks, `DHeap` runs faster than all other implementations for
-every scenario and every value of N, although the difference is much more
-noticable at higher values of N. The pure ruby heap implementation is
-competitive for `push` alone at every value of N, but is significantly slower
-than bsearch + insert for push + pop until N is _very_ large (somewhere between
-10k and 100k)!
+### push and pop on a heap with N values
+
+Repeatedly push and pop while keeping a stable heap size. This is a _very
+simplistic_ approximation for how most scheduler/timer heaps might be used.
+Usually when a timer fires it will be quickly replaced by a new timer, and the
+overall count of timers will remain roughly stable.
+
+![bar graph for push_pop benchmarks](./images/push_pop.png)

-For very small N values the benchmark implementations, `DHeap` runs faster than
-the other implementations for each scenario, although the difference is still
-relatively small. The pure ruby binary heap is 2x or more slower than bsearch +
-insert for common common push/pop scenario.
+### numbers
+
+Even for very small N values, `DHeap` runs faster than the other benchmark
+implementations for each scenario, although the difference is still relatively
+small. The pure ruby binary heap is 2x or more slower than bsearch + insert for
+the common push/pop scenario.

 == push N (N=5) ==========================================================
 push N (c_dheap): 1969700.7 i/s
@@ -341,78 +313,68 @@ the linear time complexity to keep a sorted array dominates.
 queue size = 5000000: 2664897.7 i/s - 2.74x slower
 queue size = 10000000: 2137927.6 i/s - 3.42x slower

-## Profiling
-
-_n.b. `Array#fetch` is reading the input data, external to heap operations.
-These benchmarks use integers for all scores, which enables significantly faster
-comparisons. If `a <=> b` were used instead, then the difference between push
-and pop would be much larger. And ruby's `Tracepoint` impacts these different
-implementations differently. So we can't use these profiler results for
-comparisons between implementations. A sampling profiler would be needed for
-more accurate relative measurements._
-
-It's informative to look at the `ruby-prof` results for a simple binary search +
-insert implementation, repeatedly pushing and popping to a large heap. In
-particular, even with 1000 members, the linear `Array#insert` is _still_ faster
-than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are
-still (relatively) slow and `memcpy` is (relatively) quite fast!
-
-    %self  total  self   wait  child  calls    name                                            location
-    34.79  2.222  2.222  0.000 0.000  1000000  Array#insert
-    32.59  2.081  2.081  0.000 0.000  1000000  Array#bsearch_index
-    12.84  6.386  0.820  0.000 5.566        1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
-    10.38  4.966  0.663  0.000 4.303  1000000  DHeap::Benchmarks::BinarySearchAndInsert#<<     d_heap/benchmarks/implementations.rb:61
-     5.38  0.468  0.343  0.000 0.125  1000000  DHeap::Benchmarks::BinarySearchAndInsert#pop    d_heap/benchmarks/implementations.rb:70
-     2.06  0.132  0.132  0.000 0.000  1000000  Array#fetch
-     1.95  0.125  0.125  0.000 0.000  1000000  Array#pop
-
-Contrast this with a simplistic pure-ruby implementation of a binary heap:
-
-    %self  total   self   wait  child   calls    name                                            location
-    48.52   8.487  8.118  0.000  0.369  1000000  DHeap::Benchmarks::NaiveBinaryHeap#pop          d_heap/benchmarks/implementations.rb:96
-    42.94   7.310  7.184  0.000  0.126  1000000  DHeap::Benchmarks::NaiveBinaryHeap#<<           d_heap/benchmarks/implementations.rb:80
-     4.80  16.732  0.803  0.000 15.929        1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
-
-You can see that it spends almost more time in pop than it does in push. That
-is expected behavior for a heap: although both are O(log n), pop is
-significantly more complex, and has _d_ comparisons per layer.
-
-And `DHeap` shows a similar comparison between push and pop, although it spends
-half of its time in the benchmark code (which is written in ruby):
-
-    %self  total  self   wait  child  calls    name                                            location
-    43.09  1.685  0.726  0.000 0.959        1  DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
-    26.05  0.439  0.439  0.000 0.000  1000000  DHeap#<<
-    23.57  0.397  0.397  0.000 0.000  1000000  DHeap#pop
-     7.29  0.123  0.123  0.000 0.000  1000000  Array#fetch
-
-### Timers
-
-Additionally, when used to sort timers, we can reasonably assume that:
-* New timers usually sort after most existing timers.
-* Most timers will be canceled before executing.
-* Canceled timers usually sort after most existing timers.
-
-So, if we are able to delete an item without searching for it, by keeping a map
-of positions within the heap, most timers can be inserted and deleted in O(1)
-time. Canceling a non-leaf timer can be further optimized by marking it as
-canceled without immediately removing it from the heap. If the timer is
-rescheduled before we garbage collect, adjusting its position will usually be
-faster than a delete and re-insert.
+## Analysis
+
+### Time complexity
+
+There are two fundamental heap operations: sift-up (used by push) and sift-down
+(used by pop).
+
+* A _d_-ary heap will have `log n / log d` layers, so both sift operations can
+  perform as many as `log n / log d` writes, when a member sifts the entire
+  length of the tree.
+* Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
+* Sift-down makes d comparisons per layer, so pop runs in `O(d log n / log d)`.
+
+So, in the simplest case of running balanced push/pop while maintaining the same
+heap size, `(1 + d) log n / log d` comparisons are made. In the worst case,
+when every sift traverses every layer of the tree, `d=4` requires the fewest
+comparisons for combined insert and delete:
+
+* (1 + 2) lg n / lg 2 ≈ 4.328085 lg n
+* (1 + 3) lg n / lg 3 ≈ 3.640957 lg n
+* (1 + 4) lg n / lg 4 ≈ 3.606738 lg n
+* (1 + 5) lg n / lg 5 ≈ 3.728010 lg n
+* (1 + 6) lg n / lg 6 ≈ 3.906774 lg n
+* (1 + 7) lg n / lg 7 ≈ 4.111187 lg n
+* (1 + 8) lg n / lg 8 ≈ 4.328085 lg n
+* (1 + 9) lg n / lg 9 ≈ 4.551196 lg n
+* (1 + 10) lg n / lg 10 ≈ 4.777239 lg n
+* etc...
+
+See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
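The table above can be reproduced with a few lines of Ruby. Here `lg` is the
natural logarithm, so the per-operation cost as a multiple of `ln n` is
`(1 + d) / ln d`, and the minimum does fall at `d=4`:

```ruby
# Worst-case comparisons for one push + one pop on a d-ary heap, as a
# multiple of ln n: sift-up does 1 comparison per layer, sift-down does d,
# and the tree has ln n / ln d layers, giving (1 + d) / ln d per ln n.
costs = (2..10).map { |d| [d, (1 + d) / Math.log(d)] }
costs.each { |d, c| puts format("d=%-2d  (1 + d) / ln d = %.6f", d, c) }
puts "minimum at d=#{costs.min_by { |_, c| c }.first}" # d=4
```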
+
+### Space complexity
+
+Space usage is linear, regardless of d. However, higher d values may
+provide better cache locality. Because the heap is a complete tree, the
+elements can be stored in an array, without the need for tree or list pointers.
+
+Ruby can compare Numeric values _much_ faster than other ruby objects, even if
+those objects simply delegate comparison to internal Numeric values. And it is
+often useful to use external scores for otherwise uncomparable values. So
+`DHeap` uses twice as many entries (one for score and one for value)
+as an array which only stores values.
+
+## Thread safety
+
+`DHeap` is _not_ thread-safe, so concurrent access from multiple threads must
+take precautions such as locking access behind a mutex.
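+As a sketch of one way to apply that advice, the hypothetical
+`SynchronizedQueue` below guards every operation with a `Mutex`. `NaiveQueue`
+is a pure-ruby `Array#min`-style stand-in (not a class from this gem) so the
+sketch runs without the C extension; a `DHeap` instance could be wrapped the
+same way:
+
+```ruby
+# Pure-ruby stand-in with the same push/pop contract as DHeap:
+# it scans for the minimum score on every pop.
+class NaiveQueue
+  def initialize
+    @entries = [] # [value, score] pairs, unsorted
+  end
+
+  def push(value, score)
+    @entries << [value, score]
+  end
+
+  def pop
+    return nil if @entries.empty?
+    index = @entries.each_index.min_by { |i| @entries[i][1] }
+    @entries.delete_at(index)[0]
+  end
+end
+
+# Hypothetical thread-safe wrapper: every operation holds the lock.
+class SynchronizedQueue
+  def initialize(queue)
+    @queue = queue
+    @mutex = Mutex.new
+  end
+
+  def push(value, score)
+    @mutex.synchronize { @queue.push(value, score) }
+  end
+
+  def pop
+    @mutex.synchronize { @queue.pop }
+  end
+end
+
+# concurrent pushes from several threads are now safe
+queue = SynchronizedQueue.new(NaiveQueue.new)
+threads = 4.times.map do |t|
+  Thread.new { 25.times { |i| queue.push("task-#{t}-#{i}", rand) } }
+end
+threads.each(&:join)
+```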

 ## Alternative data structures

 As always, you should run benchmarks with your expected scenarios to determine
-which is right.
+which is best for your application.

-Depending on what you're doing, maintaining a sorted `Array` using
-`#bsearch_index` and `#insert` might be just fine! As discussed above, although
-it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this
-may not matter. Also, if you can arrange for insertions to occur near the end
-of the array, that could significantly reduce the `memcpy` overhead even more.
+Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
+and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
+array can be very fast on small queues. Although insertions run in `O(n)`,
+`memcpy` is so fast on modern hardware that your dataset might not be large
+enough for it to matter.

-More complex heap varients, e.g. [Fibonacci heap], can allow heaps to be merged
-as well as lower amortized time.
+More complex heap variants, e.g. [Fibonacci heap], allow heaps to be split and
+merged, which gives some graph algorithms a lower amortized time complexity. But
+in practice, _d_-ary heaps have much lower overhead and often run faster.

 [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap

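The sorted-`Array` alternative described above takes only a few lines;
`SortedArrayQueue` is a hypothetical name for illustration, not part of the gem:

```ruby
# A tiny priority queue backed by an Array kept sorted by score.
# bsearch_index finds the insertion point in O(log n); Array#insert then
# shifts the tail, which is O(n) but backed by a fast memmove.
class SortedArrayQueue
  def initialize
    @entries = [] # [value, score] pairs, ascending by score
  end

  def push(value, score)
    index = @entries.bsearch_index { |(_, s)| s > score } || @entries.size
    @entries.insert(index, [value, score])
    self
  end

  # pops the value with the minimum score, or nil when empty
  def pop
    entry = @entries.shift
    entry && entry[0]
  end
end

q = SortedArrayQueue.new
q.push(:b, 2).push(:a, 1).push(:c, 3)
q.pop # => :a
```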
@@ -432,25 +394,17 @@

 ## TODOs...

-_TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
-basic heap with an internal `Hash`, which maps a set of values to scores.
-loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
-and adjustments, to achieve faster average runtime for *add* and *cancel*
-operations.
+_TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
+basic heap with an internal `Hash`, which maps objects to their position in the
+heap. This enforces a uniqueness constraint on items on the heap, and also
+allows items to be more efficiently deleted or adjusted. However, maintaining
+the hash does lead to a small drop in normal `#push` and `#pop` performance.

 _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
 features that are loosely inspired by go's timers. e.g: It lazily sifts its
 heap after deletion and adjustments, to achieve faster average runtime for *add*
 and *cancel* operations.

-Additionally, I was inspired by reading go's "timer.go" implementation to
-experiment with a 4-ary heap instead of the traditional binary heap. In the
-case of timers, new timers are usually scheduled to run after most of the
-existing timers. And timers are usually canceled before they have a chance to
-run. While a binary heap holds 50% of its elements in its last layer, 75% of a
-4-ary heap will have no children. That diminishes the extra comparison overhead
-during sift-down.
-
 ## Development

 After checking out the repo, run `bin/setup` to install dependencies. Then, run