d_heap 0.2.2 → 0.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: f549e01dd83eb6b48c1190443495a628ed1dd64ead7eb94851e661aef2607e14
-  data.tar.gz: 492ce5c17ace9ecc9deaccf8fcd47883835da28d4e5f32328aef60989186b2ff
+  metadata.gz: 1ad095ff29343f83c8bbe6fd0bc7f4acd79fa9c298aa4f8d007acf02ebedba30
+  data.tar.gz: b2806a066a173a83d12259342c3f7d90900c83dc628063955d861f05acc98796
 SHA512:
-  metadata.gz: 85521dee7f2a9992980935756571e87afbe8ae13347b5b3fbad17b501b5709111972b98ad0c9e1fca6d318c4be20ce2983086dfd84f7c0e73636ac9e4f11f253
-  data.tar.gz: e1daac5b02fcc817b3c6c6a99395e3ca0b92f42bb14bd813fedbb3037eed698bda335c2898d5b0131b48fecd73e4ccf1615943a52287c36f73764b08bf8b1969
+  metadata.gz: 297aad8a8b4c7845fbea64808a2beaf4aa66b8431a23841c3d17952aaf85f41a3377c2dadc7651858e038adc69a35b2fe8e6ca484d45999f026efb41817e281b
+  data.tar.gz: 1e3f123c7f723c752b2e8326c70b4208188ad09c275574bd0cee3dc7a119c7e3f07173f4ad4ed32035d2103a10b1a979400dfa35bdc1dd55272b53bcc8eaa2b9
@@ -1,4 +1,4 @@
-name: Ruby
+name: CI

 on: [push,pull_request]

@@ -7,7 +7,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        ruby: [2.5, 2.6, 2.7, 3.0]
+        ruby: [2.4, 2.5, 2.6, 2.7, 3.0]
         os: [ubuntu, macos]
         experimental: [false]
     runs-on: ${{ matrix.os }}-latest
data/.gitignore CHANGED
@@ -10,6 +10,7 @@
 *.so
 *.o
 *.a
+compile_commands.json
 mkmf.log

 # rspec failure tracking
@@ -3,9 +3,10 @@ inherit_mode:
   - Exclude

 AllCops:
-  TargetRubyVersion: 2.5
+  TargetRubyVersion: 2.4
   NewCops: disable
   Exclude:
+    - bin/benchmark-driver
     - bin/rake
     - bin/rspec
     - bin/rubocop
@@ -44,6 +45,7 @@ Layout/EmptyLineBetweenDefs:
 Layout/EmptyLinesAroundAttributeAccessor:
   inherit_mode:
     merge:
+      - Exclude
       - AllowedMethods
   Enabled: true
   AllowedMethods:
@@ -105,26 +107,49 @@ Naming/RescuedExceptionsVariableName: { Enabled: false }
 ###########################################################################
 # Metrics:

+Metrics/CyclomaticComplexity:
+  Max: 10
+
 # Although it may be better to split specs into multiple files...?
 Metrics/BlockLength:
   Exclude:
     - "spec/**/*_spec.rb"
+  CountAsOne:
+    - array
+    - hash
+    - heredoc
+
+Metrics/ClassLength:
+  Max: 200
+  CountAsOne:
+    - array
+    - hash
+    - heredoc

 ###########################################################################
 # Style...

 Style/AccessorGrouping: { Enabled: false }
 Style/AsciiComments: { Enabled: false } # 👮 can't stop our 🎉🥳🎊🥳!
+Style/ClassAndModuleChildren: { Enabled: false }
 Style/EachWithObject: { Enabled: false }
 Style/FormatStringToken: { Enabled: false }
 Style/FloatDivision: { Enabled: false }
+Style/IfUnlessModifier: { Enabled: false }
+Style/IfWithSemicolon: { Enabled: false }
 Style/Lambda: { Enabled: false }
 Style/LineEndConcatenation: { Enabled: false }
 Style/MixinGrouping: { Enabled: false }
+Style/MultilineBlockChain: { Enabled: false }
 Style/PerlBackrefs: { Enabled: false } # use occasionally/sparingly
 Style/RescueStandardError: { Enabled: false }
+Style/Semicolon: { Enabled: false }
 Style/SingleLineMethods: { Enabled: false }
 Style/StabbyLambdaParentheses: { Enabled: false }
+Style/WhenThen: { Enabled: false }
+
+# I require trailing commas elsewhere, but these are optional
+Style/TrailingCommaInArguments: { Enabled: false }

 # If rubocop had an option to only enforce this on constants and literals (e.g.
 # strings, regexp, range), I'd agree.
@@ -139,8 +164,19 @@ Style/TernaryParentheses:
   Enabled: false

 Style/BlockDelimiters:
+  inherit_mode:
+    merge:
+      - Exclude
+      - ProceduralMethods
+      - IgnoredMethods
+      - FunctionalMethods
   EnforcedStyle: semantic
   AllowBracesOnProceduralOneLiners: true
+  IgnoredMethods:
+    - expect  # rspec
+    - profile # ruby-prof
+    - ips     # benchmark-ips
+

 Style/FormatString:
   EnforcedStyle: percent
@@ -158,3 +194,6 @@ Style/TrailingCommaInHashLiteral:

 Style/TrailingCommaInArrayLiteral:
   EnforcedStyleForMultiline: consistent_comma
+
+Style/YodaCondition:
+  EnforcedStyle: forbid_for_equality_operators_only
@@ -0,0 +1,10 @@
+-o doc
+--embed-mixins
+--hide-void-return
+--no-private
+--asset images:images
+--exclude lib/benchmark_driver
+--exclude lib/d_heap/benchmarks*
+-
+CHANGELOG.md
+CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
+## Current/Unreleased
+
+## Release v0.6.1 (2021-01-24)
+
+* 📝 Fix link to CHANGELOG.md in gemspec
+
+## Release v0.6.0 (2021-01-24)
+
+* 🔥 **Breaking**: `#initialize` uses a keyword argument for `d`
+* ✨ Added `#initialize(capacity: capa)` to set initial capacity.
+* ✨ Added `peek_with_score` and `peek_score`
+* ✨ Added `pop_with_score` and `each_pop(with_score: true)`
+* ✨ Added `pop_all_below(max_score, array = [])`
+* ✨ Added aliases for `shift` and `next`
+* 📈 Added benchmark charts to README, and `bin/bench_charts` to generate them.
+  * requires `gruff` which requires `rmagick` which requires `imagemagick`
+* 📝 Many documentation updates and fixes.
+
+## Release v0.5.0 (2021-01-17)
+
+* 🔥 **Breaking**: reversed order of `#push` arguments to `value, score`.
+* ✨ Added `#insert(score, value)` to replace earlier version of `#push`.
+* ✨ Added `#each_pop` enumerator.
+* ✨ Added aliases for `deq`, `enq`, `first`, `pop_below`, `length`, and
+  `count`, to mimic other classes in ruby's stdlib.
+* ⚡️♻️ More performance improvements:
+  * Created an `ENTRY` struct and store both the score and the value pointer in
+    the same `ENTRY *entries` array.
+  * Reduced unnecessary allocations or copies in both sift loops. A similar
+    refactoring also sped up the pure ruby benchmark implementation.
+  * Compiling with `-O3`.
+* 📝 Updated (and in some cases, fixed) yardoc
+* ♻️ Moved aliases and less performance-sensitive code into ruby.
+* ♻️ DRY up push/insert methods
+
+## Release v0.4.0 (2021-01-12)
+
+* 🔥 **Breaking**: Scores must be `Integer` or convertible to `Float`
+  * ⚠️ `Integer` scores must fit in `-ULONG_LONG_MAX` to `+ULONG_LONG_MAX`.
+* ⚡️ Big performance improvements, by using C `long double *cscores` array
+* ⚡️ many many (so many) updates to benchmarks
+* ✨ Added `DHeap#clear`
+* 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
+* ♻️ significant refactoring
+* 📝 Updated docs (mostly adding benchmarks)
+
+## Release v0.3.0 (2020-12-29)
+
+* 🔥 **Breaking**: Removed class methods that operated directly on an array.
+  They weren't compatible with the performance improvements.
+* ⚡️ Big performance improvements, by converting to a `T_DATA` struct.
+* ♻️ Major refactoring/rewriting of dheap.c
+* ✅ Added benchmark specs
+
+## Release v0.2.2 (2020-12-27)
+
+* 🐛 fix `optimized_cmp`, avoiding internal symbols
+* 📝 Update documentation
+* 💚 fix macos CI
+* ➕ Add rubocop 👮🎨
+
+## Release v0.2.1 (2020-12-26)
+
+* ⬆️ Upgraded rake (and bundler) to support ruby 3.0
+
+## Release v0.2.0 (2020-12-24)
+
+* ✨ Add ability to push separate score and value
+* ⚡️ Big performance gain, by storing scores separately and using ruby's
+  internal `OPTIMIZED_CMP` instead of always directly calling `<=>`
+
+## Release v0.1.0 (2020-12-22)
+
+🎉 initial release 🎉
+
+* ✨ Add basic d-ary Heap implementation
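The v0.5.0 entry above reverses `#push` to take `value, score` while the new `#insert` keeps the older `score, value` order. As a quick illustration of those documented semantics (this is a hypothetical pure-Ruby stand-in using a sorted array, not the gem's C implementation), the two argument orders can be modeled like this:

```ruby
# Minimal stand-in modeling the v0.5.0 argument orders described in the
# changelog. NOT the d_heap C extension; just a sorted-array sketch.
class TinyQueue
  def initialize
    @entries = [] # [score, value] pairs, kept sorted by score ascending
  end

  # v0.5.0+ order: value first, then its score
  def push(value, score)
    index = @entries.bsearch_index { |(s, _)| s > score } || @entries.size
    @entries.insert(index, [score, value])
    self
  end

  # #insert keeps the older score-first order
  def insert(score, value)
    push(value, score)
  end

  # removes and returns the value with the minimum score, or nil when empty
  def pop
    entry = @entries.shift
    entry && entry[1]
  end
end

q = TinyQueue.new
q.push("b", 2).push("a", 1).insert(3, "c")
q.pop # => "a"
```

Either entry point feeds the same sorted structure; only the argument order differs, which is exactly the migration hazard the 🔥 **Breaking** marker flags.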
data/Gemfile CHANGED
@@ -5,7 +5,16 @@ source "https://rubygems.org"
 # Specify your gem's dependencies in d_heap.gemspec
 gemspec

+gem "pry"
 gem "rake", "~> 13.0"
 gem "rake-compiler"
 gem "rspec", "~> 3.10"
 gem "rubocop", "~> 1.0"
+
+install_if -> { RUBY_PLATFORM !~ /darwin/ } do
+  gem "benchmark_driver-output-gruff"
+end
+
+gem "perf"
+gem "priority_queue_cxx"
+gem "stackprof"
@@ -1,22 +1,38 @@
 PATH
   remote: .
   specs:
-    d_heap (0.2.2)
+    d_heap (0.6.1)

 GEM
   remote: https://rubygems.org/
   specs:
     ast (2.4.1)
+    benchmark_driver (0.15.16)
+    benchmark_driver-output-gruff (0.3.1)
+      benchmark_driver (>= 0.12.0)
+      gruff
+    coderay (1.1.3)
     diff-lcs (1.4.4)
+    gruff (0.12.1)
+      histogram
+      rmagick
+    histogram (0.2.4.1)
+    method_source (1.0.0)
     parallel (1.19.2)
     parser (2.7.2.0)
       ast (~> 2.4.1)
+    perf (0.1.2)
+    priority_queue_cxx (0.3.4)
+    pry (0.13.1)
+      coderay (~> 1.1)
+      method_source (~> 1.0)
     rainbow (3.0.0)
     rake (13.0.3)
     rake-compiler (1.1.1)
       rake
     regexp_parser (1.8.2)
     rexml (3.2.3)
+    rmagick (4.1.2)
     rspec (3.10.0)
       rspec-core (~> 3.10.0)
       rspec-expectations (~> 3.10.0)
@@ -41,18 +57,27 @@ GEM
       unicode-display_width (>= 1.4.0, < 2.0)
     rubocop-ast (1.1.1)
       parser (>= 2.7.1.5)
+    ruby-prof (1.4.2)
     ruby-progressbar (1.10.1)
+    stackprof (0.2.16)
     unicode-display_width (1.7.0)

 PLATFORMS
   ruby

 DEPENDENCIES
+  benchmark_driver
+  benchmark_driver-output-gruff
   d_heap!
+  perf
+  priority_queue_cxx
+  pry
   rake (~> 13.0)
   rake-compiler
   rspec (~> 3.10)
   rubocop (~> 1.0)
+  ruby-prof
+  stackprof

 BUNDLED WITH
   2.2.3
data/N ADDED
@@ -0,0 +1,7 @@
+#!/bin/sh
+set -eu
+
+export BENCH_N="$1"
+shift
+
+exec ruby "$@"
data/README.md CHANGED
@@ -1,53 +1,134 @@
-# DHeap
+# DHeap - Fast d-ary heap for ruby
+
+[![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
+[![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
+[![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)
+
+A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
+implemented as a C extension.
+
+From [wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
+> A heap is a specialized tree-based data structure which is essentially an
+> almost complete tree that satisfies the heap property: in a min heap, for any
+> given node C, if P is a parent node of C, then the key (the value) of P is
+> less than or equal to the key of C. The node at the "top" of the heap (with no
+> parents) is called the root node.
+
+![tree representation of a min heap](images/wikipedia-min-heap.png)
+
+With a regular queue, you expect "FIFO" behavior: first in, first out. With a
+stack you expect "LIFO": last in, first out. A priority queue has a score for
+each element, and elements are popped in order by score. Priority queues are
+often used in algorithms for e.g. [scheduling] of timers or bandwidth
+management, for [Huffman coding], and various graph search algorithms such as
+[Dijkstra's algorithm], [A* search], or [Prim's algorithm].
+
+The _d_-ary heap data structure is a generalization of the [binary heap], in
+which the nodes have _d_ children instead of 2. This allows for "insert" and
+"decrease priority" operations to be performed more quickly, with the tradeoff
+of slower "delete minimum" or "increase priority". Additionally, _d_-ary heaps
+can have better memory cache behavior than binary heaps, allowing them to run
+more quickly in practice despite slower worst-case time complexity. In the
+worst case, a _d_-ary heap requires only `O(log n / log d)` operations to push,
+with the tradeoff that pop requires `O(d log n / log d)`.
+
+Although you should probably just use the default _d_ value of `4` (see the
+analysis below), it's always advisable to benchmark your specific use-case. In
+particular, if you push items more than you pop, higher values for _d_ can give
+a faster total runtime.
+
+[d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
+[priority queue]: https://en.wikipedia.org/wiki/Priority_queue
+[binary heap]: https://en.wikipedia.org/wiki/Binary_heap
+[scheduling]: https://en.wikipedia.org/wiki/Scheduling_(computing)
+[Huffman coding]: https://en.wikipedia.org/wiki/Huffman_coding#Compression
+[Dijkstra's algorithm]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Using_a_priority_queue
+[A* search]: https://en.wikipedia.org/wiki/A*_search_algorithm#Description
+[Prim's algorithm]: https://en.wikipedia.org/wiki/Prim%27s_algorithm

-A fast _d_-ary heap implementation for ruby, useful in priority queues and graph
-algorithms.
+## Usage

-The _d_-ary heap data structure is a generalization of the binary heap, in which
-the nodes have _d_ children instead of 2. This allows for "decrease priority"
-operations to be performed more quickly with the tradeoff of slower delete
-minimum. Additionally, _d_-ary heaps can have better memory cache behavior than
-binary heaps, allowing them to run more quickly in practice despite slower
-worst-case time complexity. In the worst case, a _d_-ary heap requires only
-`O(log n / log d)` to push, with the tradeoff that pop is `O(d log n / log d)`.
+The basic API is `#push(object, score)` and `#pop`. Please read the
+[gem documentation] for more details and other methods.

-Although you should probably just stick with the default _d_ value of `4`, it
-may be worthwhile to benchmark your specific scenario.
+Quick reference for some common methods:

-## Motivation
+* `heap << object` adds a value, with `Float(object)` as its score.
+* `heap.push(object, score)` adds a value with an extrinsic score.
+* `heap.pop` removes and returns the value with the minimum score.
+* `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
+* `heap.peek` to view the minimum value without popping it.
+* `heap.clear` to remove all items from the heap.
+* `heap.empty?` returns true if the heap is empty.
+* `heap.size` returns the number of items in the heap.
+
+If the score changes while the object is still in the heap, it will not be
+re-evaluated again.
+
+The score must either be `Integer` or `Float` or convertible to a `Float` via
+`Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+numeric values gives more than 50% speedup under some benchmarks! _n.b._
+`Integer` _scores must have an absolute value that fits into_ `unsigned long
+long`. This is compiler and architecture dependent, but with gcc on an IA-64
+system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
++18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
+in nanoseconds.
+
+_Comparing arbitrary objects via_ `a <=> b` _was the original design and may be
+added back in a future version,_ if (and only if) _it can be done without
+impacting the speed of numeric comparisons. The speedup from this constraint is
+huge!_
+
+[gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
+
+### Examples
+
+```ruby
+# create some example objects to place in our heap
+Task = Struct.new(:id, :time) do
+  def to_f; time.to_f end
+end
+t1 = Task.new(1, Time.now + 5*60)
+t2 = Task.new(2, Time.now + 50)
+t3 = Task.new(3, Time.now + 60)
+t4 = Task.new(4, Time.now + 5)
+
+# create the heap
+require "d_heap"
+heap = DHeap.new
+
+# push with an explicit score (which might be extrinsic to the value)
+heap.push t1, t1.to_f
+
+# the score will be implicitly cast with Float, so any object with #to_f
+heap.push t2, t2

-Sometimes you just need a priority queue, right? With a regular queue, you
-expect "FIFO" behavior: first in, first out. With a priority queue, you push
-with a score (or your elements are comparable), and you want to be able to
-efficiently pop off the minimum (or maximum) element.
-
-One obvious approach is to simply maintain an array in sorted order. And
-ruby's Array class makes it simple to maintain a sorted array by combining
-`#bsearch_index` with `#insert`. With certain insert/remove workloads that can
-perform very well, but in the worst-case an insert or delete can result in O(n),
-since `#insert` may need to `memcpy` or `memmove` a significant portion of the
-array.
-
-But the standard way to efficiently and simply solve this problem is using a
-binary heap. Although it increases the time for `pop`, it converts the
-amortized time per push + pop from `O(n)` to `O(d log n / log d)`.
-
-I was surprised to find that, at least under certain benchmarks, my pure ruby
-heap implementation was usually slower than inserting into a fully sorted
-array. While this is a testament to ruby's fine-tuned Array implementationw, a
-heap implementated in C should easily peform faster than `Array#insert`.
-
-The biggest issue is that it just takes far too much time to call `<=>` from
-ruby code: A sorted array only requires `log n / log 2` comparisons to insert
-and no comparisons to pop. However a _d_-ary heap requires `log n / log d` to
-insert plus an additional `d log n / log d` to pop. If your queue contains only
-a few hundred items at once, the overhead of those extra calls to `<=>` is far
-more than occasionally calling `memcpy`.
-
-It's likely that MJIT will eventually make the C-extension completely
-unnecessary. This is definitely hotspot code, and the basic ruby implementation
-would work fine, if not for that `<=>` overhead. Until then... this gem gets
-the job done.
+# if the object has an intrinsic score via #to_f, "<<" is the simplest API
+heap << t3 << t4
+
+# pop returns the lowest scored item, and removes it from the heap
+heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
+heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
+
+# peek returns the lowest scored item, without removing it from the heap
+heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
+heap.pop  # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
+
+# pop_lte handles the common "h.pop if h.peek_score < max" pattern
+heap.pop_lte(Time.now + 65) # => nil
+
+# the heap size can be inspected with size and empty?
+heap.empty? # => false
+heap.size   # => 1
+heap.pop    # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
+heap.empty? # => true
+heap.size   # => 0
+
+# popping from an empty heap returns nil
+heap.pop # => nil
+```
+
+Please see the [gem documentation] for more methods and more examples.

 ## Installation

@@ -65,134 +146,264 @@ Or install it yourself as:
65
146
 
66
147
  $ gem install d_heap
67
148
 
68
- ## Usage
149
+ ## Motivation
69
150
 
70
- The simplest way to use it is simply with `#push` and `#pop`. Push will
151
+ One naive approach to a priority queue is to maintain an array in sorted order.
152
+ This can be very simply implemented in ruby with `Array#bseach_index` +
153
+ `Array#insert`. This can be very fast—`Array#pop` is `O(1)`—but the worst-case
154
+ for insert is `O(n)` because it may need to `memcpy` a significant portion of
155
+ the array.
71
156
 
72
- ```ruby
73
- require "d_heap"
157
+ The standard way to implement a priority queue is with a binary heap. Although
158
+ this increases the time complexity for `pop` alone, it reduces the combined time
159
+ compexity for the combined `push` + `pop`. Using a d-ary heap with d > 2
160
+ makes the tree shorter but broader, which reduces to `O(log n / log d)` while
161
+ increasing the comparisons needed by sift-down to `O(d log n/ log d)`.
74
162
 
75
- heap = DHeap.new # defaults to a 4-ary heap
163
+ However, I was disappointed when my best ruby heap implementation ran much more
164
+ slowly than the naive approach—even for heaps containing ten thousand items.
165
+ Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
166
+ has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
167
+ for each push + pop than `bsearch` + `insert`.
76
168
 
77
- # storing [time, task] tuples
78
- heap << [Time.now + 5*60, Task.new(1)]
79
- heap << [Time.now + 30, Task.new(2)]
80
- heap << [Time.now + 60, Task.new(3)]
81
- heap << [Time.now + 5, Task.new(4)]
169
+ Additionally, when researching how other systems handle their scheduling, I was
170
+ inspired by reading go's "timer.go" implementation to experiment with a 4-ary
171
+ heap instead of the traditional binary heap.
82
172
 
83
- # peeking and popping (using last to get the task and ignore the time)
84
- heap.pop.last # => Task[4]
85
- heap.pop.last # => Task[2]
86
- heap.peak.last # => Task[3]
87
- heap.pop.last # => Task[3]
88
- heap.pop.last # => Task[1]
89
- ```
173
+ ## Benchmarks
90
174
 
91
- Read the `rdoc` for more detailed documentation and examples.
175
+ _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
176
+ `docs/profile.txt` for much more detail or updated results. These benchmarks
177
+ were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
178
+
179
+ These benchmarks use very simple implementations for a pure-ruby heap and an
180
+ array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
181
+ comparison, I also compare to the [priority_queue_cxx gem] which uses the [C++
182
+ STL priority_queue], and another naive implementation that uses `Array#min` and
183
+ `Array#delete_at` with an unsorted array.
184
+
185
+ In these benchmarks, `DHeap` runs faster than all other implementations for
186
+ every scenario and every value of N, although the difference is usually more
187
+ noticable at higher values of N. The pure ruby heap implementation is
188
+ competitive for `push` alone at every value of N, but is significantly slower
189
+ than bsearch + insert for push + pop, until N is _very_ large (somewhere between
190
+ 10k and 100k)!
191
+
192
+ [priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
193
+ [C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/
194
+
195
+ Three different scenarios are measured:
196
+
197
+ ### push N items onto an empty heap
198
+
199
+ ...but never pop (clearing between each set of pushes).
200
+
201
+ ![bar graph for push_n_pop_n benchmarks](./images/push_n.png)
202
+
203
+ ### push N items onto an empty heap then pop all N
204
+
205
+ Although this could be used for heap sort, we're unlikely to choose heap sort
206
+ over Ruby's quick sort implementation. I'm using this scenario to represent
207
+ the amortized cost of creating a heap and (eventually) draining it.
208
+
209
+ ![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)
210
+
211
+ ### push and pop on a heap with N values
212
+
213
+ Repeatedly push and pop while keeping a stable heap size. This is a _very
214
+ simplistic_ approximation for how most scheduler/timer heaps might be used.
215
+ Usually when a timer fires it will be quickly replaced by a new timer, and the
216
+ overall count of timers will remain roughly stable.
217
+
218
+ ![bar graph for push_pop benchmarks](./images/push_pop.png)
219
+
220
+ ### numbers
221
+
222
+ Even for very small N values the benchmark implementations, `DHeap` runs faster
223
+ than the other implementations for each scenario, although the difference is
224
+ still relatively small. The pure ruby binary heap is 2x or more slower than
225
+ bsearch + insert for common push/pop scenario.
226
+
227
+ == push N (N=5) ==========================================================
228
+ push N (c_dheap): 1969700.7 i/s
229
+ push N (c++ stl): 1049738.1 i/s - 1.88x slower
230
+ push N (rb_heap): 928435.2 i/s - 2.12x slower
231
+ push N (bsearch): 921060.0 i/s - 2.14x slower
232
+
233
+ == push N then pop N (N=5) ===============================================
234
+ push N + pop N (c_dheap): 1375805.0 i/s
235
+ push N + pop N (c++ stl): 1134997.5 i/s - 1.21x slower
236
+ push N + pop N (findmin): 862913.1 i/s - 1.59x slower
237
+ push N + pop N (bsearch): 762887.1 i/s - 1.80x slower
238
+ push N + pop N (rb_heap): 506890.4 i/s - 2.71x slower
239
+
240
+ == Push/pop with pre-filled queue of size=N (N=5) ========================
241
+ push + pop (c_dheap): 9044435.5 i/s
242
+ push + pop (c++ stl): 7534583.4 i/s - 1.20x slower
243
+ push + pop (findmin): 5026155.1 i/s - 1.80x slower
244
+ push + pop (bsearch): 4300260.0 i/s - 2.10x slower
245
+ push + pop (rb_heap): 2299499.7 i/s - 3.93x slower
246
+
247
+ By N=21, `DHeap` has pulled significantly ahead of bsearch + insert for all
248
+ scenarios, but the pure ruby heap is still slower than every other
249
+ implementation—even resorting the array after every `#push`—in any scenario that
250
+ uses `#pop`.
251
+
252
+ == push N (N=21) =========================================================
253
+ push N (c_dheap): 464231.4 i/s
254
+ push N (c++ stl): 305546.7 i/s - 1.52x slower
255
+ push N (rb_heap): 202803.7 i/s - 2.29x slower
256
+ push N (bsearch): 168678.7 i/s - 2.75x slower
257
+
258
+ == push N then pop N (N=21) ==============================================
259
+ push N + pop N (c_dheap): 298350.3 i/s
260
+ push N + pop N (c++ stl): 252227.1 i/s - 1.18x slower
261
+ push N + pop N (findmin): 161998.7 i/s - 1.84x slower
262
+ push N + pop N (bsearch): 143432.3 i/s - 2.08x slower
263
+ push N + pop N (rb_heap): 79622.1 i/s - 3.75x slower
264
+
265
+ == Push/pop with pre-filled queue of size=N (N=21) =======================
266
+ push + pop (c_dheap): 8855093.4 i/s
267
+ push + pop (c++ stl): 7223079.5 i/s - 1.23x slower
268
+ push + pop (findmin): 4542913.7 i/s - 1.95x slower
269
+ push + pop (bsearch): 3461802.4 i/s - 2.56x slower
270
+ push + pop (rb_heap): 1845488.7 i/s - 4.80x slower
271
+
272
+ At higher values of N, a heaps logarithmic growth leads to only a little
273
+ slowdown of `#push`, while insert's linear growth causes it to run noticably
274
+ slower and slower. But because `#pop` is `O(1)` for a sorted array and `O(d log
275
+ n / log d)` for a heap, scenarios involving both `#push` and `#pop` remain
276
+ relatively close, and bsearch + insert still runs faster than a pure ruby heap,
277
+ even up to queues with 10k items. But as queue size increases beyond than that,
278
+ the linear time compexity to keep a sorted array dominates.
279
+
280
+ == push + pop (rb_heap)
281
+ queue size = 10000: 736618.2 i/s
282
+ queue size = 25000: 670186.8 i/s - 1.10x slower
283
+ queue size = 50000: 618156.7 i/s - 1.19x slower
284
+ queue size = 100000: 579250.7 i/s - 1.27x slower
285
+ queue size = 250000: 572795.0 i/s - 1.29x slower
286
+ queue size = 500000: 543648.3 i/s - 1.35x slower
287
+ queue size = 1000000: 513523.4 i/s - 1.43x slower
288
+ queue size = 2500000: 460848.9 i/s - 1.60x slower
289
+ queue size = 5000000: 445234.5 i/s - 1.65x slower
290
+ queue size = 10000000: 423119.0 i/s - 1.74x slower
291
+
292
+ == push + pop (bsearch)
293
+ queue size = 10000: 786334.2 i/s
294
+ queue size = 25000: 364963.8 i/s - 2.15x slower
295
+ queue size = 50000: 200520.6 i/s - 3.92x slower
296
+ queue size = 100000: 88607.0 i/s - 8.87x slower
297
+ queue size = 250000: 34530.5 i/s - 22.77x slower
298
+ queue size = 500000: 17965.4 i/s - 43.77x slower
299
+ queue size = 1000000: 5638.7 i/s - 139.45x slower
300
+ queue size = 2500000: 1302.0 i/s - 603.93x slower
301
+ queue size = 5000000: 592.0 i/s - 1328.25x slower
302
+ queue size = 10000000: 288.8 i/s - 2722.66x slower
303
+
304
+ == push + pop (c_dheap)
305
+ queue size = 10000: 7311366.6 i/s
306
+ queue size = 50000: 6737824.5 i/s - 1.09x slower
307
+ queue size = 25000: 6407340.6 i/s - 1.14x slower
308
+ queue size = 100000: 6254396.3 i/s - 1.17x slower
309
+ queue size = 250000: 5917684.5 i/s - 1.24x slower
310
+ queue size = 500000: 5126307.6 i/s - 1.43x slower
311
+ queue size = 1000000: 4403494.1 i/s - 1.66x slower
312
+ queue size = 2500000: 3304088.2 i/s - 2.21x slower
313
+ queue size = 5000000: 2664897.7 i/s - 2.74x slower
314
+ queue size = 10000000: 2137927.6 i/s - 3.42x slower
92
315
 
93
- ## TODOs...
316
+ ## Analysis
94
317
 
95
- _TODO:_ In addition to a basic _d_-ary heap class (`DHeap`), this library
96
- ~~includes~~ _will include_ extensions to `Array`, allowing an Array to be
97
- directly handled as a priority queue. These extension methods are meant to be
98
- used similarly to how `#bsearch` and `#bsearch_index` might be used.
318
+ ### Time complexity
99
319
 
100
- _TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
101
- basic heap with an internal `Hash`, which maps a set of values to scores.
102
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
103
- and adjustments, to achieve faster average runtime for *add* and *cancel*
104
- operations.
320
+ There are two fundamental heap operations: sift-up (used by push) and sift-down
321
+ (used by pop).
322
+
323
+ * A _d_-ary heap will have `log n / log d` layers, so both sift operations can
324
+ perform as many as `log n / log d` writes, when a member sifts the entire
325
+ length of the tree.
326
+ * Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
327
+ * Sift-down makes d comparions per layer, so pop runs in `O(d log n / log d)`.
328
+
329
+ So, in the simplest case of running balanced push/pop while maintaining the same
330
+ heap size, `(1 + d) log n / log d` comparisons are made. In the worst case,
331
+ when every sift traverses every layer of the tree, `d=4` requires the fewest
332
+ comparisons for combined insert and delete:
333
+
334
+ * (1 + 2) lg n / lg d ≈ 4.328085 lg n
335
+ * (1 + 3) lg n / lg d ≈ 3.640957 lg n
336
+ * (1 + 4) lg n / lg d ≈ 3.606738 lg n
337
+ * (1 + 5) lg n / lg d ≈ 3.728010 lg n
338
+ * (1 + 6) lg n / lg d ≈ 3.906774 lg n
339
+ * (1 + 7) lg n / lg d ≈ 4.111187 lg n
340
+ * (1 + 8) lg n / lg d ≈ 4.328085 lg n
341
+ * (1 + 9) lg n / lg d ≈ 4.551196 lg n
342
+ * (1 + 10) lg n / lg d ≈ 4.777239 lg n
343
+ * etc...
105
344
 
106
- _TODO:_ Also ~~included is~~ _will include_ `DHeap::Timers`, which contains some
107
- features that are loosely inspired by go's timers. e.g: It lazily sifts its
108
- heap after deletion and adjustments, to achieve faster average runtime for *add*
109
- and *cancel* operations.
345
+ See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
110
346
 
111
- Additionally, I was inspired by reading go's "timer.go" implementation to
112
- experiment with a 4-ary heap instead of the traditional binary heap. In the
113
- case of timers, new timers are usually scheduled to run after most of the
114
- existing timers. And timers are usually canceled before they have a chance to
115
- run. While a binary heap holds 50% of its elements in its last layer, 75% of a
116
- 4-ary heap will have no children. That diminishes the extra comparison overhead
117
- during sift-down.
347
+ ### Space complexity
118
348
 
119
- ## Benchmarks
349
+ Space usage is linear, regardless of d. However, higher d values may
350
+ provide better cache locality. Because the heap is a complete d-ary tree, its
351
+ elements can be stored in an array, without the need for tree or list pointers.
120
352
 
121
- _TODO: put benchmarks here._
353
+ Ruby can compare Numeric values _much_ faster than other Ruby objects, even if
354
+ those objects simply delegate comparison to internal Numeric values. And it is
355
+ often useful to use external scores for otherwise incomparable values. So
356
+ `DHeap` uses twice as many entries (one for score and one for value)
357
+ as an array which only stores values.
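The flat score/value layout can be pictured with a plain-Ruby sketch (illustration only, not DHeap's actual C internals):

```ruby
# Scores and values interleaved in one flat array: entry i occupies
# indexes 2*i (score) and 2*i + 1 (value). No per-entry objects needed.
entries = []

push  = ->(score, value) { entries.push(score, value) }
score = ->(i) { entries[2 * i] }
value = ->(i) { entries[2 * i + 1] }

push.call(3.5, :later)
push.call(1.0, :sooner)
# entries == [3.5, :later, 1.0, :sooner]
```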
122
358
 
123
- ## Analysis
359
+ ## Thread safety
124
360
 
125
- ### Time complexity
361
+ `DHeap` is _not_ thread-safe, so code that accesses a heap from multiple
362
+ threads needs to take precautions, such as locking access behind a mutex.
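For example, one simple precaution is to wrap every operation in a `Mutex` (a sketch; the wrapped object is any non-thread-safe queue, with a plain `Array` standing in for a heap here):

```ruby
# Guard a non-thread-safe queue-like object behind a single Mutex.
class SynchronizedQueue
  def initialize(queue)
    @queue = queue
    @lock = Mutex.new
  end

  def push(item)
    @lock.synchronize { @queue.push(item) }
  end

  def pop
    @lock.synchronize { @queue.pop }
  end
end

# Usage: an Array stands in for the heap.
queue = SynchronizedQueue.new([])
4.times.map { Thread.new { 100.times { |i| queue.push(i) } } }.each(&:join)
```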
126
363
 
127
- Both sift operations can perform (log[d] n = log n / log d) swaps.
128
- Swap up performs only a single comparison per swap: O(1).
129
- Swap down performs as many as d comparions per swap: O(d).
130
-
131
- Inserting an item is O(log n / log d).
132
- Deleting the root is O(d log n / log d).
133
-
134
- Assuming every inserted item is eventually deleted from the root, d=4 requires
135
- the fewest comparisons for combined insert and delete:
136
- * (1 + 2) lg 2 = 4.328085
137
- * (1 + 3) lg 3 = 3.640957
138
- * (1 + 4) lg 4 = 3.606738
139
- * (1 + 5) lg 5 = 3.728010
140
- * (1 + 6) lg 6 = 3.906774
141
- * etc...
142
-
143
- Leaf nodes require no comparisons to shift down, and higher values for d have
144
- higher percentage of leaf nodes:
145
- * d=2 has ~50% leaves,
146
- * d=3 has ~67% leaves,
147
- * d=4 has ~75% leaves,
148
- * and so on...
364
+ ## Alternative data structures
149
365
 
150
- See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
366
+ As always, you should run benchmarks with your expected scenarios to determine
367
+ which is best for your application.
151
368
 
152
- ### Space complexity
369
+ Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
370
+ and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
371
+ array can be very fast on small queues. Although insertions run in `O(n)`,
372
+ `memcpy` is so fast on modern hardware that your dataset might not be large
373
+ enough for it to matter.
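For instance, a sorted-`Array` priority queue takes only a few lines (a sketch of the approach described above):

```ruby
# Keep an Array sorted on insert; the minimum is always at index 0.
sorted = []

add = lambda do |x|
  # bsearch_index (find-minimum mode) returns the first index whose
  # element is >= x, or nil when x sorts after everything.
  i = sorted.bsearch_index { |y| y >= x } || sorted.size
  sorted.insert(i, x) # O(n) memmove, but fast in practice
end

[5, 1, 4, 2].each(&add)
sorted             # => [1, 2, 4, 5]
min = sorted.shift # pop-min
```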
153
374
 
154
- Because the heap is a complete binary tree, space usage is linear, regardless
155
- of d. However higher d values may provide better cache locality.
375
+ More complex heap variants, e.g. the [Fibonacci heap], allow heaps to be split
376
+ and merged, which gives some graph algorithms a lower amortized time complexity.
377
+ But in practice, _d_-ary heaps have much lower overhead and often run faster.
156
378
 
157
- We can run comparisons much much faster for Numeric or String objects than for
158
- ruby objects which delegate comparison to internal Numeric or String objects.
159
- And it is often advantageous to use extrinsic scores for uncomparable items.
160
- For this, our internal array uses twice as many entries (one for score and one
161
- for value) as it would if it only supported intrinsic comparison or used an
162
- un-memoized "sort_by" proc.
379
+ [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
163
380
 
164
- ### Timers
381
+ If it is important to be able to quickly enumerate the set or find the ranking
382
+ of values in it, then you may want to use a self-balancing binary search tree
383
+ (e.g. a [red-black tree]) or a [skip-list].
165
384
 
166
- Additionally, when used to sort timers, we can reasonably assume that:
167
- * New timers usually sort after most existing timers.
168
- * Most timers will be canceled before executing.
169
- * Canceled timers usually sort after most existing timers.
385
+ [red-black tree]: https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
386
+ [skip-list]: https://en.wikipedia.org/wiki/Skip_list
170
387
 
171
- So, if we are able to delete an item without searching for it, by keeping a map
172
- of positions within the heap, most timers can be inserted and deleted in O(1)
173
- time. Canceling a non-leaf timer can be further optimized by marking it as
174
- canceled without immediately removing it from the heap. If the timer is
175
- rescheduled before we garbage collect, adjusting its position will usually be
176
- faster than a delete and re-insert.
388
+ [Hashed and Hierarchical Timing Wheels][timing wheels] (or some variant in that
389
+ family of data structures) can be constructed to have effectively `O(1)` running
390
+ time in most cases. Although the implementation for that data structure is more
391
+ complex than a heap, it may be necessary for enormous values of N.
177
392
 
178
- ## Alternative data structures
393
+ [timing wheels]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
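A single-level hashed wheel can be sketched in a few lines (a toy illustration of the idea, not the paper's full hierarchical design):

```ruby
# Toy hashed timing wheel: timers hash into slots by expiry tick, and
# each advance drains the timers that are due in the current slot.
class TimingWheel
  def initialize(slot_count)
    @slots = Array.new(slot_count) { [] }
    @tick = 0
  end

  def schedule(delay_ticks, &callback)
    at = @tick + delay_ticks
    @slots[at % @slots.size] << [at, callback]
  end

  # Advance the clock one tick, firing any timers that are now due.
  def advance
    @tick += 1
    slot = @slots[@tick % @slots.size]
    due, pending = slot.partition { |at, _| at <= @tick }
    @slots[@tick % @slots.size] = pending
    due.each { |_, callback| callback.call }
  end
end
```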
179
394
 
180
- Depending on what you're doing, maintaining a sorted `Array` using
181
- `#bsearch_index` and `#insert` might be faster! Although it is technically
182
- O(n) for insertions, the implementations for `memcpy` or `memmove` can be *very*
183
- fast on modern architectures. Also, it can be faster O(n) on average, if
184
- insertions are usually near the end of the array. You should run benchmarks
185
- with your expected scenarios to determine which is right.
395
+ ## TODOs...
186
396
 
187
- If it is important to be able to quickly enumerate the set or find the ranking
188
- of values in it, then you probably want to use a self-balancing binary search
189
- tree (e.g. a red-black tree) or a skip-list.
190
-
191
- A Hashed Timing Wheel or Heirarchical Timing Wheels (or some variant in that
192
- family of data structures) can be constructed to have effectively O(1) running
193
- time in most cases. However, the implementation for that data structure is more
194
- complex than a heap. If a 4-ary heap is good enough for go's timers, it should
195
- be suitable for many use cases.
397
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
398
+ basic heap with an internal `Hash`, which maps objects to their position in the
399
+ heap. This enforces a uniqueness constraint on the heap's members, and also
400
+ allows items to be more efficiently deleted or adjusted. However, maintaining
401
+ the hash does lead to a small drop in normal `#push` and `#pop` performance.
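The position-map idea can be sketched with a simplified binary min-heap (a hypothetical illustration; `DHeap::Map`'s real interface may differ):

```ruby
# Binary min-heap of unique items plus a Hash from item to its array
# index, kept in sync on every swap, so delete-by-value is O(log n).
class IndexedHeap
  def initialize
    @items = []
    @index = {} # item => position in @items
  end

  def push(item)
    raise ArgumentError, "duplicate item" if @index.key?(item)
    @items << item
    @index[item] = @items.size - 1
    sift_up(@items.size - 1)
    self
  end

  def pop
    delete(@items.first)
  end

  # O(log n): the index hash replaces an O(n) scan for the item.
  def delete(item)
    i = @index[item]
    return nil unless i
    swap(i, @items.size - 1)
    @items.pop
    @index.delete(item)
    if i < @items.size
      sift_up(i)
      sift_down(i)
    end
    item
  end

  private

  def swap(i, j)
    @items[i], @items[j] = @items[j], @items[i]
    @index[@items[i]] = i
    @index[@items[j]] = j
  end

  def sift_up(i)
    while i > 0 && @items[i] < @items[(i - 1) / 2]
      swap(i, (i - 1) / 2)
      i = (i - 1) / 2
    end
  end

  def sift_down(i)
    loop do
      smallest = i
      [2 * i + 1, 2 * i + 2].each do |c|
        smallest = c if c < @items.size && @items[c] < @items[smallest]
      end
      break if smallest == i
      swap(smallest, i)
      i = smallest
    end
  end
end
```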
402
+
403
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
404
+ features that are loosely inspired by go's timers, e.g. it lazily sifts its
405
+ heap after deletion and adjustments, to achieve faster average runtime for *add*
406
+ and *cancel* operations.
196
407
 
197
408
  ## Development
198
409