d_heap 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 3dd1049e0a8041a328da4ed65622c2f0589475bc386a0eb6f20e466c79587bc5
- data.tar.gz: ec44970feaa5ce6aef37f511e71e55342ec93b7e1a0b2a3d40f249afa4e9ac25
+ metadata.gz: 35213e5ac430b07cf2b43a7f065ff5c409506022835a5326cb2bfa25daa7f210
+ data.tar.gz: e87a64fb9fd6eb8bdd281d8fe289b7f4993f4bde9a671b0b414aca194b724691
  SHA512:
- metadata.gz: a6b6e192dbe5980b2b79728e4b4bf413151b3e193733d1435119482edb977f0a2edd692fd156614fdfbc86f4fa1dc6ac9f7907ca640c21ca23050d07b9a1caa6
- data.tar.gz: 27a987139a1fd14f73c16459f72be2bf1059dadbd212e250a28b3691dff2372e708cdf740155e2d2aa21de78cc789816a9d83f80f42d3c396510cd6fce6e6bf2
+ metadata.gz: 77518eb11bf8dd5fa8a29ad88f48c30650ff375ae9b74001fa59d30f634feddf5c4f3ef8d791dc4064afc1d9d68b9b463eaf0e778e927655e0da0c0a9da6fdee
+ data.tar.gz: 2911b20a882d8b6f577bda9a388a9a755a093cfd3c1aaaaa355aa5be97ffe506935620043adb115158565ead6941e52f789cdd714e4932aff4916412c60a3aee
@@ -1,4 +1,4 @@
- name: Ruby
+ name: CI
 
  on: [push,pull_request]
 
@@ -7,7 +7,7 @@ jobs:
  strategy:
  fail-fast: false
  matrix:
- ruby: [2.5, 2.6, 2.7, 3.0]
+ ruby: [2.4, 2.5, 2.6, 2.7, 3.0]
  os: [ubuntu, macos]
  experimental: [false]
  runs-on: ${{ matrix.os }}-latest
data/.gitignore CHANGED
@@ -10,6 +10,7 @@
  *.so
  *.o
  *.a
+ compile_commands.json
  mkmf.log
 
  # rspec failure tracking
data/.rubocop.yml CHANGED
@@ -3,7 +3,7 @@ inherit_mode:
  - Exclude
 
  AllCops:
- TargetRubyVersion: 2.5
+ TargetRubyVersion: 2.4
  NewCops: disable
  Exclude:
  - bin/benchmark-driver
data/.yardopts ADDED
@@ -0,0 +1,10 @@
+ -o doc
+ --embed-mixins
+ --hide-void-return
+ --no-private
+ --asset images:images
+ --exclude lib/benchmark_driver
+ --exclude lib/d_heap/benchmarks*
+ -
+ CHANGELOG.md
+ CODE_OF_CONDUCT.md
data/CHANGELOG.md CHANGED
@@ -1,5 +1,17 @@
  ## Current/Unreleased
 
+ ## Release v0.6.0 (2021-01-24)
+
+ * 🔥 **Breaking**: `#initialize` uses a keyword argument for `d`
+ * ✨ Added `#initialize(capacity: capa)` to set initial capacity.
+ * ✨ Added `peek_with_score` and `peek_score`
+ * ✨ Added `pop_with_score` and `each_pop(with_score: true)`
+ * ✨ Added `pop_all_below(max_score, array = [])`
+ * ✨ Added aliases for `shift` and `next`
+ * 📈 Added benchmark charts to README, and `bin/bench_charts` to generate them.
+   * requires `gruff` which requires `rmagick` which requires `imagemagick`
+ * 📝 Many documentation updates and fixes.
+
  ## Release v0.5.0 (2021-01-17)
 
  * 🔥 **Breaking**: reversed order of `#push` arguments to `value, score`.
@@ -8,19 +20,20 @@
  * ✨ Added aliases for `deq`, `enq`, `first`, `pop_below`, `length`, and
    `count`, to mimic other classes in ruby's stdlib.
  * ⚡️♻️ More performance improvements:
-   * Created an `ENTRY` struct and store both the score and the value pointer in
-     the same `ENTRY *entries` array.
-   * Reduced unnecessary allocations or copies in both sift loops. A similar
-     refactoring also sped up the pure ruby benchmark implementation.
-   * Compiling with `-O3`.
+   * Created an `ENTRY` struct and store both the score and the value pointer in
+     the same `ENTRY *entries` array.
+   * Reduced unnecessary allocations or copies in both sift loops. A similar
+     refactoring also sped up the pure ruby benchmark implementation.
+   * Compiling with `-O3`.
  * 📝 Updated (and in some cases, fixed) yardoc
  * ♻️ Moved aliases and less performance sensitive code into ruby.
  * ♻️ DRY up push/insert methods
 
  ## Release v0.4.0 (2021-01-12)
 
+ * 🔥 **Breaking**: Scores must be `Integer` or convertible to `Float`
+   * ⚠️ `Integer` scores must fit in `-ULONG_LONG_MAX` to `+ULONG_LONG_MAX`.
  * ⚡️ Big performance improvements, by using C `long double *cscores` array
- * ⚡️ Scores must be `Integer` in `-uint64..+uint64`, or convertable to `Float`
  * ⚡️ many many (so many) updates to benchmarks
  * ✨ Added `DHeap#clear`
  * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
data/Gemfile CHANGED
@@ -11,6 +11,10 @@ gem "rake-compiler"
  gem "rspec", "~> 3.10"
  gem "rubocop", "~> 1.0"
 
+ install_if -> { RUBY_PLATFORM !~ /darwin/ } do
+   gem "benchmark_driver-output-gruff"
+ end
+
  gem "perf"
  gem "priority_queue_cxx"
  gem "stackprof"
data/Gemfile.lock CHANGED
@@ -1,15 +1,22 @@
  PATH
    remote: .
    specs:
-     d_heap (0.5.0)
+     d_heap (0.6.0)
 
  GEM
    remote: https://rubygems.org/
    specs:
      ast (2.4.1)
      benchmark_driver (0.15.16)
+     benchmark_driver-output-gruff (0.3.1)
+       benchmark_driver (>= 0.12.0)
+       gruff
      coderay (1.1.3)
      diff-lcs (1.4.4)
+     gruff (0.12.1)
+       histogram
+       rmagick
+     histogram (0.2.4.1)
      method_source (1.0.0)
      parallel (1.19.2)
      parser (2.7.2.0)
@@ -25,6 +32,7 @@ GEM
      rake
      regexp_parser (1.8.2)
      rexml (3.2.3)
+     rmagick (4.1.2)
      rspec (3.10.0)
        rspec-core (~> 3.10.0)
        rspec-expectations (~> 3.10.0)
@@ -59,6 +67,7 @@ PLATFORMS
 
  DEPENDENCIES
    benchmark_driver
+   benchmark_driver-output-gruff
    d_heap!
    perf
    priority_queue_cxx
data/N ADDED
@@ -0,0 +1,7 @@
+ #!/bin/sh
+ set -eu
+
+ export BENCH_N="$1"
+ shift
+
+ exec ruby "$@"
data/README.md CHANGED
@@ -1,8 +1,21 @@
- # DHeap
+ # DHeap - Fast d-ary heap for ruby
+
+ [![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
+ [![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
+ [![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)
 
  A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
  implemented as a C extension.
 
+ From [wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
+ > A heap is a specialized tree-based data structure which is essentially an
+ > almost complete tree that satisfies the heap property: in a min heap, for any
+ > given node C, if P is a parent node of C, then the key (the value) of P is
+ > less than or equal to the key of C. The node at the "top" of the heap (with no
+ > parents) is called the root node.
+
+ ![tree representation of a min heap](images/wikipedia-min-heap.png)
+
  With a regular queue, you expect "FIFO" behavior: first in, first out. With a
  stack you expect "LIFO": last in first out. A priority queue has a score for
  each element and elements are popped in order by score. Priority queues are
@@ -13,14 +26,16 @@ management, for [Huffman coding], and various graph search algorithms such as
  The _d_-ary heap data structure is a generalization of the [binary heap], in
  which the nodes have _d_ children instead of 2. This allows for "insert" and
  "decrease priority" operations to be performed more quickly with the tradeoff of
- slower delete minimum. Additionally, _d_-ary heaps can have better memory cache
- behavior than binary heaps, allowing them to run more quickly in practice
- despite slower worst-case time complexity. In the worst case, a _d_-ary heap
- requires only `O(log n / log d)` operations to push, with the tradeoff that pop
- requires `O(d log n / log d)`.
+ slower delete minimum or "increase priority". Additionally, _d_-ary heaps can
+ have better memory cache behavior than binary heaps, allowing them to run more
+ quickly in practice despite slower worst-case time complexity. In the worst
+ case, a _d_-ary heap requires only `O(log n / log d)` operations to push, with
+ the tradeoff that pop requires `O(d log n / log d)`.
 
  Although you should probably just use the default _d_ value of `4` (see the
- analysis below), it's always advisable to benchmark your specific use-case.
+ analysis below), it's always advisable to benchmark your specific use-case. In
+ particular, if you push items more than you pop, higher values for _d_ can give
+ a faster total runtime.
 
  [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
@@ -33,26 +48,43 @@ analysis below), it's always advisable to benchmark your specific use-case.
 
  ## Usage
 
- Quick reference:
+ The basic API is `#push(object, score)` and `#pop`. Please read the
+ [gem documentation] for more details and other methods.
+
+ Quick reference for some common methods:
 
  * `heap << object` adds a value, with `Float(object)` as its score.
  * `heap.push(object, score)` adds a value with an extrinsic score.
  * `heap.pop` removes and returns the value with the minimum score.
- * `heap.pop_lte(score)` pops if the minimum score is `<=` the provided score.
+ * `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
  * `heap.peek` to view the minimum value without popping it.
  * `heap.clear` to remove all items from the heap.
  * `heap.empty?` returns true if the heap is empty.
  * `heap.size` returns the number of items in the heap.
 
- The basic API is `#push(object, score)` and `pop`. If your values behave as
- their own score, then you can push with `#<<`. If the score changes while the
- object is still in the heap, it will not be re-evaluated again. The score must
- either be `Integer` or `Float` or convertable to a `Float` via `Float(score)`
- (i.e. it should implement `#to_f`).
+ If the score changes while the object is still in the heap, it will not be
+ re-evaluated again.
 
- ```ruby
- require "d_heap"
+ The score must either be `Integer` or `Float` or convertible to a `Float` via
+ `Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+ numeric values gives more than 50% speedup under some benchmarks! _n.b._
+ `Integer` _scores must have an absolute value that fits into_ `unsigned long
+ long`. This is compiler and architecture dependent, but with gcc on an IA-64
+ system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
+ +18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
+ in nanoseconds.
 
+ _Comparing arbitrary objects via_ `a <=> b` _was the original design and may be
+ added back in a future version,_ if (and only if) _it can be done without
+ impacting the speed of numeric comparisons. The speedup from this constraint is
+ huge!_
+
+ [gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
+
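To illustrate the score rule, here is a standalone sketch (no gem required; `Deadline` is a made-up example class, not part of d_heap): `Float()` passes `Integer` and `Float` through directly, and converts any other object via its `#to_f`.

```ruby
# Kernel#Float is the conversion described above: Integer and Float pass
# through, and other objects are converted via #to_f.
# `Deadline` is a hypothetical example class, not part of the gem.
Deadline = Struct.new(:seconds) do
  def to_f
    seconds.to_f
  end
end

Float(42)                 # Integer score => 42.0
Float(Deadline.new(1.5))  # converted via #to_f => 1.5
```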
+ ### Examples
+
+ ```ruby
+ # create some example objects to place in our heap
  Task = Struct.new(:id, :time) do
    def to_f; time.to_f end
  end
@@ -61,72 +93,42 @@ t2 = Task.new(2, Time.now + 50)
  t3 = Task.new(3, Time.now + 60)
  t4 = Task.new(4, Time.now + 5)
 
- # if the object returns its own score via #to_f, "<<" is the simplest API
- heap << t1 << t2
+ # create the heap
+ require "d_heap"
+ heap = DHeap.new
+
+ # push with an explicit score (which might be extrinsic to the value)
+ heap.push t1, t1.to_f
+
+ # the score will be implicitly cast with Float, so any object with #to_f
+ heap.push t2, t2
 
- # or push with an explicit score
- heap.push t3, t4.to_f
- heap.push t4, t4 # score can be implicitly cast with Float
+ # if the object has an intrinsic score via #to_f, "<<" is the simplest API
+ heap << t3 << t4
 
- # peek and pop
+ # pop returns the lowest scored item, and removes it from the heap
  heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
  heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
+
+ # peek returns the lowest scored item, without removing it from the heap
  heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
  heap.pop # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
- heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
- heap.empty? # => true
- heap.pop # => nil
- ```
 
- Constraining scores to numeric values gives more than 50% speedup under some
- benchmarks! _n.b._ `Integer` _scores must have an absolute value that fits
- into_ `unsigned long long`. _This is architecture dependant but on an IA-64
- system this is 64 bits, which gives a range of -18,446,744,073,709,551,615 to
- +18446744073709551615. Comparing arbitary objects via_ `a <=> b` _was the
- original design and may be added back in a future version,_ if (and only if) _it
- can be done without impacting the speed of numeric comparisons._
+ # pop_lte handles the common "h.pop if h.peek_score < max" pattern
+ heap.pop_lte(Time.now + 65) # => nil
 
- ```ruby
- heap.clear
-
- # The score can be derived from the value by using to_f.
- # "a <=> b" is *much* slower than comparing numbers, so it isn't used.
- class Event
-   include Comparable
-   attr_reader :time, :payload
-   alias_method :to_time, :time
-
-   def initialize(time, payload)
-     @time = time.to_time
-     @payload = payload
-     freeze
-   end
-
-   def to_f
-     time.to_f
-   end
-
-   def <=>(other)
-     to_f <=> other.to_f
-   end
- end
-
- heap << comparable_max # sorts last, using <=>
- heap << comparable_min # sorts first, using <=>
- heap << comparable_mid # sorts in the middle, using <=>
- heap.pop # => comparable_min
- heap.pop # => comparable_mid
- heap.pop # => comparable_max
+ # the heap size can be inspected with size and empty?
+ heap.empty? # => false
+ heap.size # => 1
+ heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
  heap.empty? # => true
+ heap.size # => 0
+
+ # popping from an empty heap returns nil
  heap.pop # => nil
  ```
 
- You can also pass a value into `#pop(max)` which will only pop if the minimum
- score is less than or equal to `max`.
-
- Read the [API documentation] for more detailed documentation and examples.
-
- [API documentation]: https://rubydoc.info/gems/d_heap/DHeap
+ Please see the [gem documentation] for more methods and more examples.
 
  ## Installation
 
@@ -153,104 +155,74 @@ for insert is `O(n)` because it may need to `memcpy` a significant portion of
  the array.
 
  The standard way to implement a priority queue is with a binary heap. Although
- this increases the time for `pop`, it converts the amortized time per push + pop
- from `O(n)` to `O(d log n / log d)`.
-
- However, I was surprised to find that—at least for some benchmarks—my pure ruby
- heap implementation was much slower than inserting into and popping from a fully
- sorted array. The reasons for this surprising result: Although it is `O(n)`,
- `memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
- has relatively _much_ larger constant factors. If your queue contains only a
- few thousand items, the overhead of those extra calls to `<=>` is _far_ more
- than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
- `d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
- sorted array.
-
- Moving the sift-up and sift-down code into C helps some. But much more helpful
- is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
- called. I'm hopeful that MJIT will eventually obsolete this C-extension. This
- can be hotspot code, and a basic ruby implementation could perform well if `<=>`
- had much lower overhead.
+ this increases the time complexity for `pop` alone, it reduces the combined time
+ complexity for `push` + `pop`. Using a d-ary heap with d > 2 makes the tree
+ shorter but broader, which reduces push to `O(log n / log d)` while
+ increasing the comparisons needed by sift-down to `O(d log n / log d)`.
 
- ## Analysis
+ However, I was disappointed when my best ruby heap implementation ran much more
+ slowly than the naive approach—even for heaps containing ten thousand items.
+ Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
+ has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
+ for each push + pop than `bsearch` + `insert`.
 
- ### Time complexity
+ Additionally, when researching how other systems handle their scheduling, I was
+ inspired by reading go's "timer.go" implementation to experiment with a 4-ary
+ heap instead of the traditional binary heap.
 
- There are two fundamental heap operations: sift-up (used by push) and sift-down
- (used by pop).
+ ## Benchmarks
 
- * Both sift operations can perform as many as `log n / log d` swaps, as the
-   element may sift from the bottom of the tree to the top, or vice versa.
- * Sift-up performs a single comparison per swap: `O(1)`.
-   So pushing a new element is `O(log n / log d)`.
- * Swap down performs as many as d comparions per swap: `O(d)`.
-   So popping the min element is `O(d log n / log d)`.
-
- Assuming every inserted element is eventually deleted from the root, d=4
- requires the fewest comparisons for combined insert and delete:
-
- * (1 + 2) lg 2 = 4.328085
- * (1 + 3) lg 3 = 3.640957
- * (1 + 4) lg 4 = 3.606738
- * (1 + 5) lg 5 = 3.728010
- * (1 + 6) lg 6 = 3.906774
- etc...
+ _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
+ `docs/profile.txt` for much more detail or updated results. These benchmarks
+ were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
 
- Leaf nodes require no comparisons to shift down, and higher values for d have
- higher percentage of leaf nodes:
+ These benchmarks use very simple implementations for a pure-ruby heap and an
+ array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
+ comparison, I also compare to the [priority_queue_cxx gem] which uses the [C++
+ STL priority_queue], and another naive implementation that uses `Array#min` and
+ `Array#delete_at` with an unsorted array.
 
- * d=2 has ~50% leaves,
- * d=3 has ~67% leaves,
- * d=4 has ~75% leaves,
- * and so on...
+ In these benchmarks, `DHeap` runs faster than all other implementations for
+ every scenario and every value of N, although the difference is usually more
+ noticeable at higher values of N. The pure ruby heap implementation is
+ competitive for `push` alone at every value of N, but is significantly slower
+ than bsearch + insert for push + pop, until N is _very_ large (somewhere between
+ 10k and 100k)!
 
- See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
+ [priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
+ [C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/
 
- ### Space complexity
+ Three different scenarios are measured:
 
- Space usage is linear, regardless of d. However higher d values may
- provide better cache locality. Because the heap is a complete binary tree, the
- elements can be stored in an array, without the need for tree or list pointers.
+ ### push N items onto an empty heap
 
- Ruby can compare Numeric values _much_ faster than other ruby objects, even if
- those objects simply delegate comparison to internal Numeric values. And it is
- often useful to use external scores for otherwise uncomparable values. So
- `DHeap` uses twice as many entries (one for score and one for value)
- as an array which only stores values.
+ ...but never pop (clearing between each set of pushes).
 
- ## Benchmarks
+ ![bar graph for push_n benchmarks](./images/push_n.png)
 
- _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
- `docs/profile.txt` for more details or updated results. These benchmarks were
- measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
+ ### push N items onto an empty heap then pop all N
 
- These benchmarks use very simple implementations for a pure-ruby heap and an
- array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
- comparison, an alternate implementation `Array#min` and `Array#delete_at` is
- also shown.
+ Although this could be used for heap sort, we're unlikely to choose heap sort
+ over Ruby's quick sort implementation. I'm using this scenario to represent
+ the amortized cost of creating a heap and (eventually) draining it.
 
- Three different scenarios are measured:
- * push N values but never pop (clearing between each set of pushes).
- * push N values and then pop N values.
-   Although this could be used for heap sort, we're unlikely to choose heap sort
-   over Ruby's quick sort implementation. I'm using this scenario to represent
-   the amortized cost of creating a heap and (eventually) draining it.
- * For a heap of size N, repeatedly push and pop while keeping a stable size.
-   This is a _very simple_ approximation for how most scheduler/timer heaps
-   would be used. Usually when a timer fires it will be quickly replaced by a
-   new timer, and the overall count of timers will remain roughly stable.
+ ![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)
 
- In these benchmarks, `DHeap` runs faster than all other implementations for
- every scenario and every value of N, although the difference is much more
- noticable at higher values of N. The pure ruby heap implementation is
- competitive for `push` alone at every value of N, but is significantly slower
- than bsearch + insert for push + pop until N is _very_ large (somewhere between
- 10k and 100k)!
+ ### push and pop on a heap with N values
+
+ Repeatedly push and pop while keeping a stable heap size. This is a _very
+ simplistic_ approximation for how most scheduler/timer heaps might be used.
+ Usually when a timer fires it will be quickly replaced by a new timer, and the
+ overall count of timers will remain roughly stable.
+
+ ![bar graph for push_pop benchmarks](./images/push_pop.png)
 
- For very small N values the benchmark implementations, `DHeap` runs faster than
- the other implementations for each scenario, although the difference is still
- relatively small. The pure ruby binary heap is 2x or more slower than bsearch +
- insert for common common push/pop scenario.
+ ### numbers
+
+ Even for very small N values, `DHeap` runs faster
+ than the other implementations for each scenario, although the difference is
+ still relatively small. The pure ruby binary heap is 2x or more slower than
+ bsearch + insert for the common push/pop scenario.
 
  == push N (N=5) ==========================================================
  push N (c_dheap): 1969700.7 i/s
@@ -341,78 +313,68 @@ the linear time compexity to keep a sorted array dominates.
  queue size = 5000000: 2664897.7 i/s - 2.74x slower
  queue size = 10000000: 2137927.6 i/s - 3.42x slower
 
- ## Profiling
-
- _n.b. `Array#fetch` is reading the input data, external to heap operations.
- These benchmarks use integers for all scores, which enables significantly faster
- comparisons. If `a <=> b` were used instead, then the difference between push
- and pop would be much larger. And ruby's `Tracepoint` impacts these different
- implementations differently. So we can't use these profiler results for
- comparisons between implementations. A sampling profiler would be needed for
- more accurate relative measurements._
-
- It's informative to look at the `ruby-prof` results for a simple binary search +
- insert implementation, repeatedly pushing and popping to a large heap. In
- particular, even with 1000 members, the linear `Array#insert` is _still_ faster
- than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are
- still (relatively) slow and `memcpy` is (relatively) quite fast!
-
- %self total self wait child calls name location
- 34.79 2.222 2.222 0.000 0.000 1000000 Array#insert
- 32.59 2.081 2.081 0.000 0.000 1000000 Array#bsearch_index
- 12.84 6.386 0.820 0.000 5.566 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
- 10.38 4.966 0.663 0.000 4.303 1000000 DHeap::Benchmarks::BinarySearchAndInsert#<< d_heap/benchmarks/implementations.rb:61
- 5.38 0.468 0.343 0.000 0.125 1000000 DHeap::Benchmarks::BinarySearchAndInsert#pop d_heap/benchmarks/implementations.rb:70
- 2.06 0.132 0.132 0.000 0.000 1000000 Array#fetch
- 1.95 0.125 0.125 0.000 0.000 1000000 Array#pop
-
- Contrast this with a simplistic pure-ruby implementation of a binary heap:
-
- %self total self wait child calls name location
- 48.52 8.487 8.118 0.000 0.369 1000000 DHeap::Benchmarks::NaiveBinaryHeap#pop d_heap/benchmarks/implementations.rb:96
- 42.94 7.310 7.184 0.000 0.126 1000000 DHeap::Benchmarks::NaiveBinaryHeap#<< d_heap/benchmarks/implementations.rb:80
- 4.80 16.732 0.803 0.000 15.929 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
-
- You can see that it spends almost more time in pop than it does in push. That
- is expected behavior for a heap: although both are O(log n), pop is
- significantly more complex, and has _d_ comparisons per layer.
-
- And `DHeap` shows a similar comparison between push and pop, although it spends
- half of its time in the benchmark code (which is written in ruby):
-
- %self total self wait child calls name location
- 43.09 1.685 0.726 0.000 0.959 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
- 26.05 0.439 0.439 0.000 0.000 1000000 DHeap#<<
- 23.57 0.397 0.397 0.000 0.000 1000000 DHeap#pop
- 7.29 0.123 0.123 0.000 0.000 1000000 Array#fetch
-
- ### Timers
-
- Additionally, when used to sort timers, we can reasonably assume that:
- * New timers usually sort after most existing timers.
- * Most timers will be canceled before executing.
- * Canceled timers usually sort after most existing timers.
-
- So, if we are able to delete an item without searching for it, by keeping a map
- of positions within the heap, most timers can be inserted and deleted in O(1)
- time. Canceling a non-leaf timer can be further optimized by marking it as
- canceled without immediately removing it from the heap. If the timer is
- rescheduled before we garbage collect, adjusting its position will usually be
- faster than a delete and re-insert.
+ ## Analysis
+
+ ### Time complexity
+
+ There are two fundamental heap operations: sift-up (used by push) and sift-down
+ (used by pop).
+
+ * A _d_-ary heap will have `log n / log d` layers, so both sift operations can
+   perform as many as `log n / log d` writes, when a member sifts the entire
+   length of the tree.
+ * Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
+ * Sift-down makes d comparisons per layer, so pop runs in `O(d log n / log d)`.
+
+ So, in the simplest case of running balanced push/pop while maintaining the same
+ heap size, `(1 + d) log n / log d` comparisons are made. In the worst case,
+ when every sift traverses every layer of the tree, `d=4` requires the fewest
+ comparisons for combined insert and delete:
+
+ * (1 + 2) lg n / lg 2 ≈ 4.328085 lg n
+ * (1 + 3) lg n / lg 3 ≈ 3.640957 lg n
+ * (1 + 4) lg n / lg 4 ≈ 3.606738 lg n
+ * (1 + 5) lg n / lg 5 ≈ 3.728010 lg n
+ * (1 + 6) lg n / lg 6 ≈ 3.906774 lg n
+ * (1 + 7) lg n / lg 7 ≈ 4.111187 lg n
+ * (1 + 8) lg n / lg 8 ≈ 4.328085 lg n
+ * (1 + 9) lg n / lg 9 ≈ 4.551196 lg n
+ * (1 + 10) lg n / lg 10 ≈ 4.777239 lg n
+ * etc...
+
345
+ See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
346
+
347
+ ### Space complexity
348
+
349
+ Space usage is linear, regardless of d. However higher d values may
350
+ provide better cache locality. Because the heap is a complete binary tree, the
351
+ elements can be stored in an array, without the need for tree or list pointers.
352
+
353
+ Ruby can compare Numeric values _much_ faster than other ruby objects, even if
354
+ those objects simply delegate comparison to internal Numeric values. And it is
355
+ often useful to use external scores for otherwise uncomparable values. So
356
+ `DHeap` uses twice as many entries (one for score and one for value)
357
+ as an array which only stores values.
358
+
359
+ ## Thread safety
360
+
361
+ `DHeap` is _not_ thread-safe, so concurrent access from multiple threads need to
362
+ take precautions such as locking access behind a mutex.
402
363
 
403
364
  ## Alternative data structures
404
365
 
405
366
  As always, you should run benchmarks with your expected scenarios to determine
406
- which is right.
367
+ which is best for your application.
407
368
 
408
- Depending on what you're doing, maintaining a sorted `Array` using
409
- `#bsearch_index` and `#insert` might be just fine! As discussed above, although
410
- it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this
411
- may not matter. Also, if you can arrange for insertions to occur near the end
412
- of the array, that could significantly reduce the `memcpy` overhead even more.
369
+ Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
370
+ and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
371
+ array can be very fast on small queues. Although insertions run with `O(n)`,
372
+ `memcpy` is so fast on modern hardware that your dataset might not be large
373
+ enough for it to matter.
413
374
 
414
- More complex heap varients, e.g. [Fibonacci heap], can allow heaps to be merged
415
- as well as lower amortized time.
375
+ More complex heap varients, e.g. [Fibonacci heap], allow heaps to be split and
376
+ merged which gives some graph algorithms a lower amortized time complexity. But
377
+ in practice, _d_-ary heaps have much lower overhead and often run faster.
416
378
 
417
379
  [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
418
380
 
@@ -432,25 +394,17 @@ complex than a heap, it may be necessary for enormous values of N.
432
394
 
433
395
  ## TODOs...
434
396
 
435
- _TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
436
- basic heap with an internal `Hash`, which maps a set of values to scores.
437
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
438
- and adjustments, to achieve faster average runtime for *add* and *cancel*
439
- operations.
397
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
398
+ basic heap with an internal `Hash`, which maps objects to their position in the
399
+ heap. This enforces a uniqueness constraint on items on the heap, and also
400
+ allows items to be more efficiently deleted or adjusted. However maintaining
401
+ the hash does lead to a small drop in normal `#push` and `#pop` performance.
440
402
 
441
403
  _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
442
404
  features that are loosely inspired by go's timers. e.g: It lazily sifts its
443
405
  heap after deletion and adjustments, to achieve faster average runtime for *add*
444
406
  and *cancel* operations.
445
407
 
446
- Additionally, I was inspired by reading go's "timer.go" implementation to
447
- experiment with a 4-ary heap instead of the traditional binary heap. In the
448
- case of timers, new timers are usually scheduled to run after most of the
449
- existing timers. And timers are usually canceled before they have a chance to
450
- run. While a binary heap holds 50% of its elements in its last layer, 75% of a
451
- 4-ary heap will have no children. That diminishes the extra comparison overhead
452
- during sift-down.
453
-
454
408
  ## Development
455
409
 
456
410
  After checking out the repo, run `bin/setup` to install dependencies. Then, run