d_heap 0.5.0 → 0.6.0
- checksums.yaml +4 -4
- data/.github/workflows/main.yml +2 -2
- data/.gitignore +1 -0
- data/.rubocop.yml +1 -1
- data/.yardopts +10 -0
- data/CHANGELOG.md +19 -6
- data/Gemfile +4 -0
- data/Gemfile.lock +10 -1
- data/N +7 -0
- data/README.md +185 -231
- data/benchmarks/push_n.yml +10 -6
- data/benchmarks/push_n_pop_n.yml +27 -10
- data/benchmarks/push_pop.yml +5 -0
- data/bin/bench_charts +13 -0
- data/d_heap.gemspec +1 -1
- data/ext/d_heap/d_heap.c +435 -140
- data/ext/d_heap/extconf.rb +3 -4
- data/images/push_n.png +0 -0
- data/images/push_n_pop_n.png +0 -0
- data/images/push_pop.png +0 -0
- data/images/wikipedia-min-heap.png +0 -0
- data/lib/benchmark_driver/runner/ips_zero_fail.rb +89 -51
- data/lib/d_heap.rb +81 -18
- data/lib/d_heap/benchmarks/implementations.rb +30 -28
- data/lib/d_heap/benchmarks/rspec_matchers.rb +29 -51
- data/lib/d_heap/version.rb +1 -1
- metadata +10 -4
- data/ext/d_heap/d_heap.h +0 -50
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 35213e5ac430b07cf2b43a7f065ff5c409506022835a5326cb2bfa25daa7f210
+  data.tar.gz: e87a64fb9fd6eb8bdd281d8fe289b7f4993f4bde9a671b0b414aca194b724691
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 77518eb11bf8dd5fa8a29ad88f48c30650ff375ae9b74001fa59d30f634feddf5c4f3ef8d791dc4064afc1d9d68b9b463eaf0e778e927655e0da0c0a9da6fdee
+  data.tar.gz: 2911b20a882d8b6f577bda9a388a9a755a093cfd3c1aaaaa355aa5be97ffe506935620043adb115158565ead6941e52f789cdd714e4932aff4916412c60a3aee
data/.github/workflows/main.yml
CHANGED
@@ -1,4 +1,4 @@
-name:
+name: CI

 on: [push,pull_request]

@@ -7,7 +7,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        ruby: [2.5, 2.6, 2.7, 3.0]
+        ruby: [2.4, 2.5, 2.6, 2.7, 3.0]
         os: [ubuntu, macos]
         experimental: [false]
     runs-on: ${{ matrix.os }}-latest
data/.gitignore
CHANGED
data/.rubocop.yml
CHANGED
data/.yardopts
ADDED
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,17 @@
 ## Current/Unreleased

+## Release v0.6.0 (2021-01-24)
+
+* 🔥 **Breaking**: `#initialize` uses a keyword argument for `d`
+* ✨ Added `#initialize(capacity: capa)` to set initial capacity.
+* ✨ Added `peek_with_score` and `peek_score`
+* ✨ Added `pop_with_score` and `each_pop(with_score: true)`
+* ✨ Added `pop_all_below(max_score, array = [])`
+* ✨ Added aliases for `shift` and `next`
+* 📈 Added benchmark charts to README, and `bin/bench_charts` to generate them.
+  * requires `gruff` which requires `rmagick` which requires `imagemagick`
+* 📝 Many documentation updates and fixes.
+
 ## Release v0.5.0 (2021-01-17)

 * 🔥 **Breaking**: reversed order of `#push` arguments to `value, score`.
@@ -8,19 +20,20 @@
 * ✨ Added aliases for `deq`, `enq`, `first`, `pop_below`, `length`, and
   `count`, to mimic other classes in ruby's stdlib.
 * ⚡️♻️ More performance improvements:
+  * Created an `ENTRY` struct and store both the score and the value pointer in
+    the same `ENTRY *entries` array.
+  * Reduced unnecessary allocations or copies in both sift loops. A similar
+    refactoring also sped up the pure ruby benchmark implementation.
+  * Compiling with `-O3`.
 * 📝 Updated (and in some cases, fixed) yardoc
 * ♻️ Moved aliases and less performance sensitive code into ruby.
 * ♻️ DRY up push/insert methods

 ## Release v0.4.0 (2021-01-12)

+* 🔥 **Breaking**: Scores must be `Integer` or convertable to `Float`
+  * ⚠️ `Integer` scores must fit in `-ULONG_LONG_MAX` to `+ULONG_LONG_MAX`.
 * ⚡️ Big performance improvements, by using C `long double *cscores` array
-* ⚡️ Scores must be `Integer` in `-uint64..+uint64`, or convertable to `Float`
 * ⚡️ many many (so many) updates to benchmarks
 * ✨ Added `DHeap#clear`
 * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
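As a quick illustration of the new v0.6.0 methods listed above, a minimal sketch of how they might be used; the return shapes shown in the comments are assumptions based on the method names, not taken from this changelog:

```ruby
require "d_heap"

# d: and capacity: are the new v0.6.0 keyword arguments
heap = DHeap.new(d: 4, capacity: 1024)

heap.push :a, 1
heap.push :b, 2
heap.push :c, 3

heap.peek_score       # => 1        (assumed: the minimum score)
heap.peek_with_score  # => [:a, 1]  (assumed: value and score)
heap.pop_with_score   # => [:a, 1]

# pop_all_below pops every remaining entry scored below max_score
heap.pop_all_below(3) # => [:b]     (assumed return shape)
heap.pop              # => :c
```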
data/Gemfile
CHANGED
data/Gemfile.lock
CHANGED
@@ -1,15 +1,22 @@
 PATH
   remote: .
   specs:
-    d_heap (0.5.0)
+    d_heap (0.6.0)

 GEM
   remote: https://rubygems.org/
   specs:
     ast (2.4.1)
     benchmark_driver (0.15.16)
+    benchmark_driver-output-gruff (0.3.1)
+      benchmark_driver (>= 0.12.0)
+      gruff
     coderay (1.1.3)
     diff-lcs (1.4.4)
+    gruff (0.12.1)
+      histogram
+      rmagick
+    histogram (0.2.4.1)
     method_source (1.0.0)
     parallel (1.19.2)
     parser (2.7.2.0)
@@ -25,6 +32,7 @@ GEM
     rake
     regexp_parser (1.8.2)
     rexml (3.2.3)
+    rmagick (4.1.2)
     rspec (3.10.0)
       rspec-core (~> 3.10.0)
       rspec-expectations (~> 3.10.0)
@@ -59,6 +67,7 @@ PLATFORMS

 DEPENDENCIES
   benchmark_driver
+  benchmark_driver-output-gruff
   d_heap!
   perf
   priority_queue_cxx
data/N
ADDED
data/README.md
CHANGED
@@ -1,8 +1,21 @@
-# DHeap
+# DHeap - Fast d-ary heap for ruby
+
+[![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
+[![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
+[![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)

 A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
 implemented as a C extension.

+From [wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
+> A heap is a specialized tree-based data structure which is essentially an
+> almost complete tree that satisfies the heap property: in a min heap, for any
+> given node C, if P is a parent node of C, then the key (the value) of P is
+> less than or equal to the key of C. The node at the "top" of the heap (with no
+> parents) is called the root node.
+
+![tree representation of a min heap](images/wikipedia-min-heap.png)
+
 With a regular queue, you expect "FIFO" behavior: first in, first out. With a
 stack you expect "LIFO": last in first out. A priority queue has a score for
 each element and elements are popped in order by score. Priority queues are
@@ -13,14 +26,16 @@ management, for [Huffman coding], and various graph search algorithms such as
 The _d_-ary heap data structure is a generalization of the [binary heap], in
 which the nodes have _d_ children instead of 2. This allows for "insert" and
 "decrease priority" operations to be performed more quickly with the tradeoff of
-slower delete minimum. Additionally, _d_-ary heaps can
-behavior than binary heaps, allowing them to run more
-despite slower worst-case time complexity. In the worst
-requires only `O(log n / log d)` operations to push, with
-requires `O(d log n / log d)`.
+slower delete minimum or "increase priority". Additionally, _d_-ary heaps can
+have better memory cache behavior than binary heaps, allowing them to run more
+quickly in practice despite slower worst-case time complexity. In the worst
+case, a _d_-ary heap requires only `O(log n / log d)` operations to push, with
+the tradeoff that pop requires `O(d log n / log d)`.

 Although you should probably just use the default _d_ value of `4` (see the
-analysis below), it's always advisable to benchmark your specific use-case.
+analysis below), it's always advisable to benchmark your specific use-case. In
+particular, if you push items more than you pop, higher values for _d_ can give
+a faster total runtime.

 [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
 [priority queue]: https://en.wikipedia.org/wiki/Priority_queue
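To make the "_d_ children instead of 2" generalization above concrete, here is a small sketch in plain ruby of the index arithmetic for a _d_-ary heap laid out in a flat array. DHeap itself implements this in C; these are just the standard formulas, shown for illustration:

```ruby
# Illustrative only: parent/child index math for a d-ary heap stored in a
# flat array, using d = 4.
D = 4

def parent_index(i)
  (i - 1) / D
end

def child_indexes(i)
  first = (D * i) + 1
  (first...(first + D)).to_a
end

child_indexes(0) # => [1, 2, 3, 4]  children of the root
child_indexes(1) # => [5, 6, 7, 8]
parent_index(7)  # => 1
```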
@@ -33,26 +48,43 @@ analysis below), it's always advisable to benchmark your specific use-case.

 ## Usage

+The basic API is `#push(object, score)` and `#pop`. Please read the
+[gem documentation] for more details and other methods.
+
+Quick reference for some common methods:

 * `heap << object` adds a value, with `Float(object)` as its score.
 * `heap.push(object, score)` adds a value with an extrinsic score.
 * `heap.pop` removes and returns the value with the minimum score.
-* `heap.pop_lte(
+* `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
 * `heap.peek` to view the minimum value without popping it.
 * `heap.clear` to remove all items from the heap.
 * `heap.empty?` returns true if the heap is empty.
 * `heap.size` returns the number of items in the heap.

-object is still in the heap, it will not be re-evaluated again. The score must
-either be `Integer` or `Float` or convertable to a `Float` via `Float(score)`
-(i.e. it should implement `#to_f`).
+If the score changes while the object is still in the heap, it will not be
+re-evaluated again.

+The score must either be `Integer` or `Float` or convertable to a `Float` via
+`Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+numeric values gives more than 50% speedup under some benchmarks! _n.b._
+`Integer` _scores must have an absolute value that fits into_ `unsigned long
+long`. This is compiler and architecture dependant but with gcc on an IA-64
+system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
++18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
+in nanoseconds.

+_Comparing arbitary objects via_ `a <=> b` _was the original design and may be
+added back in a future version,_ if (and only if) _it can be done without
+impacting the speed of numeric comparisons. The speedup from this constraint is
+huge!_
+
+[gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
+
+### Examples
+
+```ruby
+# create some example objects to place in our heap
 Task = Struct.new(:id, :time) do
   def to_f; time.to_f end
 end
@@ -61,72 +93,42 @@ t2 = Task.new(2, Time.now + 50)
 t3 = Task.new(3, Time.now + 60)
 t4 = Task.new(4, Time.now + 5)

-#
+# create the heap
+require "d_heap"
+heap = DHeap.new
+
+# push with an explicit score (which might be extrinsic to the value)
+heap.push t1, t1.to_f
+
+# the score will be implicitly cast with Float, so any object with #to_f
+heap.push t2, t2

-#
-heap
-heap.push t4, t4 # score can be implicitly cast with Float
+# if the object has an intrinsic score via #to_f, "<<" is the simplest API
+heap << t3 << t4

-#
+# pop returns the lowest scored item, and removes it from the heap
 heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
 heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
+
+# peek returns the lowest scored item, without removing it from the heap
 heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
 heap.pop # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
-heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
-heap.empty? # => true
-heap.pop # => nil
-```

-into_ `unsigned long long`. _This is architecture dependant but on an IA-64
-system this is 64 bits, which gives a range of -18,446,744,073,709,551,615 to
-+18446744073709551615. Comparing arbitary objects via_ `a <=> b` _was the
-original design and may be added back in a future version,_ if (and only if) _it
-can be done without impacting the speed of numeric comparisons._
+# pop_lte handles the common "h.pop if h.peek_score < max" pattern
+heap.pop_lte(Time.now + 65) # => nil

-heap.
-#
-# "a <=> b" is *much* slower than comparing numbers, so it isn't used.
-class Event
-  include Comparable
-  attr_reader :time, :payload
-  alias_method :to_time, :time
-
-  def initialize(time, payload)
-    @time = time.to_time
-    @payload = payload
-    freeze
-  end
-
-  def to_f
-    time.to_f
-  end
-
-  def <=>(other)
-    to_f <=> other.to_f
-  end
-end
-
-heap << comparable_max # sorts last, using <=>
-heap << comparable_min # sorts first, using <=>
-heap << comparable_mid # sorts in the middle, using <=>
-heap.pop # => comparable_min
-heap.pop # => comparable_mid
-heap.pop # => comparable_max
+# the heap size can be inspected with size and empty?
+heap.empty? # => false
+heap.size # => 1
+heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
 heap.empty? # => true
+heap.size # => 0
+
+# popping from an empty heap returns nil
 heap.pop # => nil
 ```

-score is less than or equal to `max`.
-
-Read the [API documentation] for more detailed documentation and examples.
-
-[API documentation]: https://rubydoc.info/gems/d_heap/DHeap
+Please see the [gem documentation] for more methods and more examples.

 ## Installation

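Since the README text above notes that integer scores are wide enough to hold POSIX time in nanoseconds, a timer-style queue could, for example, use raw nanosecond timestamps as scores. This is a hedged sketch of that idea, not an example taken from the gem's documentation:

```ruby
require "d_heap"

heap = DHeap.new

# Integer nanosecond timestamps fit comfortably within the documented
# unsigned long long score range.
now_ns = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
heap.push :job_a, now_ns + 50_000_000 # due in 50ms
heap.push :job_b, now_ns + 5_000_000  # due in 5ms

# pop only the jobs that are already due
due_now = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
heap.pop_lte(due_now) # => nil, nothing is due yet
```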
@@ -153,104 +155,74 @@ for insert is `O(n)` because it may need to `memcpy` a significant portion of
 the array.

 The standard way to implement a priority queue is with a binary heap. Although
-this increases the time for `pop
-heap implementation was much slower than inserting into and popping from a fully
-sorted array. The reasons for this surprising result: Although it is `O(n)`,
-`memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
-has relatively _much_ larger constant factors. If your queue contains only a
-few thousand items, the overhead of those extra calls to `<=>` is _far_ more
-than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
-`d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
-sorted array.
-
-Moving the sift-up and sift-down code into C helps some. But much more helpful
-is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
-called. I'm hopeful that MJIT will eventually obsolete this C-extension. This
-can be hotspot code, and a basic ruby implementation could perform well if `<=>`
-had much lower overhead.
+this increases the time complexity for `pop` alone, it reduces the combined time
+compexity for the combined `push` + `pop`. Using a d-ary heap with d > 2
+makes the tree shorter but broader, which reduces to `O(log n / log d)` while
+increasing the comparisons needed by sift-down to `O(d log n/ log d)`.

+However, I was disappointed when my best ruby heap implementation ran much more
+slowly than the naive approach—even for heaps containing ten thousand items.
+Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
+has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
+for each push + pop than `bsearch` + `insert`.

+Additionally, when researching how other systems handle their scheduling, I was
+inspired by reading go's "timer.go" implementation to experiment with a 4-ary
+heap instead of the traditional binary heap.

-(used by pop).
+## Benchmarks

-So pushing a new element is `O(log n / log d)`.
-* Swap down performs as many as d comparions per swap: `O(d)`.
-So popping the min element is `O(d log n / log d)`.
-
-Assuming every inserted element is eventually deleted from the root, d=4
-requires the fewest comparisons for combined insert and delete:
-
-* (1 + 2) lg 2 = 4.328085
-* (1 + 3) lg 3 = 3.640957
-* (1 + 4) lg 4 = 3.606738
-* (1 + 5) lg 5 = 3.728010
-* (1 + 6) lg 6 = 3.906774
-* etc...
+_See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
+`docs/profile.txt` for much more detail or updated results. These benchmarks
+were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._

+These benchmarks use very simple implementations for a pure-ruby heap and an
+array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
+comparison, I also compare to the [priority_queue_cxx gem] which uses the [C++
+STL priority_queue], and another naive implementation that uses `Array#min` and
+`Array#delete_at` with an unsorted array.

+In these benchmarks, `DHeap` runs faster than all other implementations for
+every scenario and every value of N, although the difference is usually more
+noticable at higher values of N. The pure ruby heap implementation is
+competitive for `push` alone at every value of N, but is significantly slower
+than bsearch + insert for push + pop, until N is _very_ large (somewhere between
+10k and 100k)!

+[priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
+[C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/

+Three different scenarios are measured:

-provide better cache locality. Because the heap is a complete binary tree, the
-elements can be stored in an array, without the need for tree or list pointers.
+### push N items onto an empty heap

-those objects simply delegate comparison to internal Numeric values. And it is
-often useful to use external scores for otherwise uncomparable values. So
-`DHeap` uses twice as many entries (one for score and one for value)
-as an array which only stores values.
+...but never pop (clearing between each set of pushes).

+![bar graph for push_n_pop_n benchmarks](./images/push_n.png)

-`docs/profile.txt` for more details or updated results. These benchmarks were
-measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
+### push N items onto an empty heap then pop all N

-also shown.
+Although this could be used for heap sort, we're unlikely to choose heap sort
+over Ruby's quick sort implementation. I'm using this scenario to represent
+the amortized cost of creating a heap and (eventually) draining it.

-* push N values but never pop (clearing between each set of pushes).
-* push N values and then pop N values.
-  Although this could be used for heap sort, we're unlikely to choose heap sort
-  over Ruby's quick sort implementation. I'm using this scenario to represent
-  the amortized cost of creating a heap and (eventually) draining it.
-* For a heap of size N, repeatedly push and pop while keeping a stable size.
-  This is a _very simple_ approximation for how most scheduler/timer heaps
-  would be used. Usually when a timer fires it will be quickly replaced by a
-  new timer, and the overall count of timers will remain roughly stable.
+![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)

+### push and pop on a heap with N values
+
+Repeatedly push and pop while keeping a stable heap size. This is a _very
+simplistic_ approximation for how most scheduler/timer heaps might be used.
+Usually when a timer fires it will be quickly replaced by a new timer, and the
+overall count of timers will remain roughly stable.
+
+![bar graph for push_pop benchmarks](./images/push_pop.png)

+### numbers
+
+Even for very small N values the benchmark implementations, `DHeap` runs faster
+than the other implementations for each scenario, although the difference is
+still relatively small. The pure ruby binary heap is 2x or more slower than
+bsearch + insert for common push/pop scenario.

 == push N (N=5) ==========================================================
 push N (c_dheap): 1969700.7 i/s
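For reference, the "sorted array" baseline described in the hunk above can be approximated in a few lines of ruby. This is only a sketch in the same spirit as the gem's benchmark implementations, not the gem's own code (which lives in `lib/d_heap/benchmarks/implementations.rb`):

```ruby
# A minimal bsearch + insert priority queue, kept sorted by score.
class SortedArrayPQ
  def initialize
    @entries = [] # [score, value] pairs in ascending score order
  end

  def push(value, score)
    # find the first entry whose score is >= the new score, or append
    index = @entries.bsearch_index { |(s, _)| score <= s } || @entries.size
    @entries.insert(index, [score, value])
    self
  end

  def pop
    entry = @entries.shift
    entry && entry.last
  end
end

q = SortedArrayPQ.new
q.push(:b, 2).push(:a, 1).push(:c, 3)
q.pop # => :a
```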
@@ -341,78 +313,68 @@ the linear time compexity to keep a sorted array dominates.
 queue size = 5000000: 2664897.7 i/s - 2.74x slower
 queue size = 10000000: 2137927.6 i/s - 3.42x slower

-##
-Additionally, when used to sort timers, we can reasonably assume that:
-* New timers usually sort after most existing timers.
-* Most timers will be canceled before executing.
-* Canceled timers usually sort after most existing timers.
-
-So, if we are able to delete an item without searching for it, by keeping a map
-of positions within the heap, most timers can be inserted and deleted in O(1)
-time. Canceling a non-leaf timer can be further optimized by marking it as
-canceled without immediately removing it from the heap. If the timer is
-rescheduled before we garbage collect, adjusting its position will usually be
-faster than a delete and re-insert.
+## Analysis
+
+### Time complexity
+
+There are two fundamental heap operations: sift-up (used by push) and sift-down
+(used by pop).
+
+* A _d_-ary heap will have `log n / log d` layers, so both sift operations can
+  perform as many as `log n / log d` writes, when a member sifts the entire
+  length of the tree.
+* Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
+* Sift-down makes d comparions per layer, so pop runs in `O(d log n / log d)`.
+
+So, in the simplest case of running balanced push/pop while maintaining the same
+heap size, `(1 + d) log n / log d` comparisons are made. In the worst case,
+when every sift traverses every layer of the tree, `d=4` requires the fewest
+comparisons for combined insert and delete:
+
+* (1 + 2) lg n / lg d ≈ 4.328085 lg n
+* (1 + 3) lg n / lg d ≈ 3.640957 lg n
+* (1 + 4) lg n / lg d ≈ 3.606738 lg n
+* (1 + 5) lg n / lg d ≈ 3.728010 lg n
+* (1 + 6) lg n / lg d ≈ 3.906774 lg n
+* (1 + 7) lg n / lg d ≈ 4.111187 lg n
+* (1 + 8) lg n / lg d ≈ 4.328085 lg n
+* (1 + 9) lg n / lg d ≈ 4.551196 lg n
+* (1 + 10) lg n / lg d ≈ 4.777239 lg n
+* etc...
+
+See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
+
+### Space complexity
+
+Space usage is linear, regardless of d. However higher d values may
+provide better cache locality. Because the heap is a complete binary tree, the
+elements can be stored in an array, without the need for tree or list pointers.
+
+Ruby can compare Numeric values _much_ faster than other ruby objects, even if
+those objects simply delegate comparison to internal Numeric values. And it is
+often useful to use external scores for otherwise uncomparable values. So
+`DHeap` uses twice as many entries (one for score and one for value)
+as an array which only stores values.
+
+## Thread safety
+
+`DHeap` is _not_ thread-safe, so concurrent access from multiple threads need to
+take precautions such as locking access behind a mutex.

 ## Alternative data structures

 As always, you should run benchmarks with your expected scenarios to determine
-which is
+which is best for your application.

-Depending on
+Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
+and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
+array can be very fast on small queues. Although insertions run with `O(n)`,
+`memcpy` is so fast on modern hardware that your dataset might not be large
+enough for it to matter.

-More complex heap varients, e.g. [Fibonacci heap],
+More complex heap varients, e.g. [Fibonacci heap], allow heaps to be split and
+merged which gives some graph algorithms a lower amortized time complexity. But
+in practice, _d_-ary heaps have much lower overhead and often run faster.

 [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap

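Because the README text above documents that `DHeap` is not thread-safe, one straightforward precaution is to serialize access behind a `Mutex`. A minimal sketch of that idea, not provided by the gem itself:

```ruby
require "d_heap"

# Serialize all access to a shared DHeap behind a single Mutex.
class SynchronizedHeap
  def initialize
    @heap  = DHeap.new
    @mutex = Mutex.new
  end

  def push(value, score)
    @mutex.synchronize { @heap.push(value, score) }
  end

  def pop
    @mutex.synchronize { @heap.pop }
  end

  def pop_lte(max_score)
    @mutex.synchronize { @heap.pop_lte(max_score) }
  end
end
```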
@@ -432,25 +394,17 @@ complex than a heap, it may be necessary for enormous values of N.

 ## TODOs...

-_TODO:_ Also ~~included is~~ _will include_ `DHeap::
-basic heap with an internal `Hash`, which maps
+_TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
+basic heap with an internal `Hash`, which maps objects to their position in the
+heap. This enforces a uniqueness constraint on items on the heap, and also
+allows items to be more efficiently deleted or adjusted. However maintaining
+the hash does lead to a small drop in normal `#push` and `#pop` performance.

 _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
 features that are loosely inspired by go's timers. e.g: It lazily sifts its
 heap after deletion and adjustments, to achieve faster average runtime for *add*
 and *cancel* operations.

-Additionally, I was inspired by reading go's "timer.go" implementation to
-experiment with a 4-ary heap instead of the traditional binary heap. In the
-case of timers, new timers are usually scheduled to run after most of the
-existing timers. And timers are usually canceled before they have a chance to
-run. While a binary heap holds 50% of its elements in its last layer, 75% of a
-4-ary heap will have no children. That diminishes the extra comparison overhead
-during sift-down.
-
 ## Development

 After checking out the repo, run `bin/setup` to install dependencies. Then, run
|