d_heap 0.1.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 585cdfa768be7f778b92fa500d952f4207b98ea46378d7289f2ff936bc1f06f0
- data.tar.gz: d306234243d78a9dfd0ef9beb06e569ddd9915f2c733ebd9109953fc84584e11
+ metadata.gz: 413c0a93e2c3cbdbb86ee433df47a310034d453e441a150d8317dc055b4a9a90
+ data.tar.gz: 4bf67447021da03b07da7f44bcf97a66f13fa42f6f67bcfe9a49d0866c8b8167
  SHA512:
- metadata.gz: dd379eecd4bc218b62729d2fdd39773ffbf1e248fb83eeffda910f439644ad5ed8588cf1e8fd4de807ca074f3165e073300d584eb64fae2c54aeed5206374cfc
- data.tar.gz: 7fac571f7d14bb2a7b6d2773fe531a803474587a83508710c4898c15649117558aaa399151d5be3dd49f07afbb06ffd38fe55161e07e483cee729c4f1efdbb20
+ metadata.gz: 5e55bf53c1062686e0863fb9c3b09f3c2b8b936b0cf83985092e1e906b0b24f40e02a42eada048dcea732a60ec4e3695bb861943b424a8cd3152b227abad8a4e
+ data.tar.gz: e021616d6dcdcec943fec11783f2147a2d175aa3a0caf668c2339e05795e32475cdf7be90201c38d7108c74087e22140a9da9d3f211d7f0335e3c173ae83893b
@@ -0,0 +1,26 @@
+ name: Ruby
+
+ on: [push,pull_request]
+
+ jobs:
+   build:
+     strategy:
+       fail-fast: false
+       matrix:
+         ruby: [2.5, 2.6, 2.7, 3.0]
+         os: [ubuntu, macos]
+         experimental: [false]
+     runs-on: ${{ matrix.os }}-latest
+     continue-on-error: ${{ matrix.experimental }}
+     steps:
+       - uses: actions/checkout@v2
+       - name: Set up Ruby
+         uses: ruby/setup-ruby@v1
+         with:
+           ruby-version: ${{ matrix.ruby }}
+           bundler-cache: true
+       - name: Run the default task
+         run: |
+           gem install bundler -v 2.2.3
+           bundle install
+           bundle exec rake
@@ -0,0 +1,199 @@
+ inherit_mode:
+   merge:
+     - Exclude
+
+ AllCops:
+   TargetRubyVersion: 2.5
+   NewCops: disable
+   Exclude:
+     - bin/benchmark-driver
+     - bin/rake
+     - bin/rspec
+     - bin/rubocop
+
+ ###########################################################################
+ # rubocop defaults are simply WRONG about many rules... Sorry. It's true.
+
+ ###########################################################################
+ # Layout: Alignment. I want these to work, I really do...
+
+ # I wish this worked with "table". but that goes wrong sometimes.
+ Layout/HashAlignment: { Enabled: false }
+
+ # This needs to be configurable so parenthesized calls are aligned with first
+ # parameter, and non-parenthesized calls are aligned with fixed indentation.
+ Layout/ParameterAlignment: { Enabled: false }
+
+ ###########################################################################
+ # Layout: Empty lines
+
+ Layout/EmptyLineAfterGuardClause: { Enabled: false }
+ Layout/EmptyLineAfterMagicComment: { Enabled: true }
+ Layout/EmptyLineAfterMultilineCondition: { Enabled: false }
+ Layout/EmptyLines: { Enabled: true }
+ Layout/EmptyLinesAroundAccessModifier: { Enabled: true }
+ Layout/EmptyLinesAroundArguments: { Enabled: true }
+ Layout/EmptyLinesAroundBeginBody: { Enabled: true }
+ Layout/EmptyLinesAroundBlockBody: { Enabled: false }
+ Layout/EmptyLinesAroundExceptionHandlingKeywords: { Enabled: true }
+ Layout/EmptyLinesAroundMethodBody: { Enabled: true }
+
+ Layout/EmptyLineBetweenDefs:
+   Enabled: true
+   AllowAdjacentOneLineDefs: true
+
+ Layout/EmptyLinesAroundAttributeAccessor:
+   inherit_mode:
+     merge:
+       - Exclude
+       - AllowedMethods
+   Enabled: true
+   AllowedMethods:
+     - delegate
+     - def_delegator
+     - def_delegators
+     - def_instance_delegators
+
+ # "empty_lines_special" sometimes does the wrong thing and annoys me.
+ # But I've mostly learned to live with it... mostly. 🙁
+
+ Layout/EmptyLinesAroundClassBody:
+   Enabled: true
+   EnforcedStyle: empty_lines_special
+
+ Layout/EmptyLinesAroundModuleBody:
+   Enabled: true
+   EnforcedStyle: empty_lines_special
+
+ ###########################################################################
+ # Layout: Space around, before, inside, etc
+
+ Layout/SpaceAroundEqualsInParameterDefault: { Enabled: false }
+ Layout/SpaceBeforeBlockBraces: { Enabled: false }
+ Layout/SpaceBeforeFirstArg: { Enabled: false }
+ Layout/SpaceInLambdaLiteral: { Enabled: false }
+ Layout/SpaceInsideArrayLiteralBrackets: { Enabled: false }
+ Layout/SpaceInsideHashLiteralBraces: { Enabled: false }
+
+ Layout/SpaceInsideBlockBraces:
+   EnforcedStyle: space
+   EnforcedStyleForEmptyBraces: space
+   SpaceBeforeBlockParameters: false
+
+ # I would enable this if it were a bit better at handling alignment.
+ Layout/ExtraSpacing:
+   Enabled: false
+   AllowForAlignment: true
+   AllowBeforeTrailingComments: true
+
+ ###########################################################################
+ # Layout: Misc
+
+ Layout/LineLength:
+   Max: 90 # should stay under 80, but we'll allow a little wiggle-room
+
+ Layout/MultilineOperationIndentation: { Enabled: false }
+
+ Layout/MultilineMethodCallIndentation:
+   EnforcedStyle: indented
+
+ ###########################################################################
+ # Lint and Naming: rubocop defaults are mostly good, but...
+
+ Lint/UnusedMethodArgument: { Enabled: false }
+ Naming/BinaryOperatorParameterName: { Enabled: false } # def /(denominator)
+ Naming/RescuedExceptionsVariableName: { Enabled: false }
+
+ ###########################################################################
+ # Metrics:
+
+ Metrics/CyclomaticComplexity:
+   Max: 10
+
+ # Although it may be better to split specs into multiple files...?
+ Metrics/BlockLength:
+   Exclude:
+     - "spec/**/*_spec.rb"
+   CountAsOne:
+     - array
+     - hash
+     - heredoc
+
+ Metrics/ClassLength:
+   Max: 200
+   CountAsOne:
+     - array
+     - hash
+     - heredoc
+
+ ###########################################################################
+ # Style...
+
+ Style/AccessorGrouping: { Enabled: false }
+ Style/AsciiComments: { Enabled: false } # 👮 can't stop our 🎉🥳🎊🥳!
+ Style/ClassAndModuleChildren: { Enabled: false }
+ Style/EachWithObject: { Enabled: false }
+ Style/FormatStringToken: { Enabled: false }
+ Style/FloatDivision: { Enabled: false }
+ Style/IfUnlessModifier: { Enabled: false }
+ Style/IfWithSemicolon: { Enabled: false }
+ Style/Lambda: { Enabled: false }
+ Style/LineEndConcatenation: { Enabled: false }
+ Style/MixinGrouping: { Enabled: false }
+ Style/MultilineBlockChain: { Enabled: false }
+ Style/PerlBackrefs: { Enabled: false } # use occasionally/sparingly
+ Style/RescueStandardError: { Enabled: false }
+ Style/Semicolon: { Enabled: false }
+ Style/SingleLineMethods: { Enabled: false }
+ Style/StabbyLambdaParentheses: { Enabled: false }
+ Style/WhenThen: { Enabled: false }
+
+ # I require trailing commas elsewhere, but these are optional
+ Style/TrailingCommaInArguments: { Enabled: false }
+
+ # If rubocop had an option to only enforce this on constants and literals (e.g.
+ # strings, regexp, range), I'd agree.
+ #
+ # But if you are using it e.g. on method arguments of unknown type, in the same
+ # style that ruby uses it with grep, then you are doing exactly the right thing.
+ Style/CaseEquality: { Enabled: false }
+
+ # I'd enable if "require_parentheses_when_complex" considered unary '!' simple.
+ Style/TernaryParentheses:
+   EnforcedStyle: require_parentheses_when_complex
+   Enabled: false
+
+ Style/BlockDelimiters:
+   inherit_mode:
+     merge:
+       - Exclude
+       - ProceduralMethods
+       - IgnoredMethods
+       - FunctionalMethods
+   EnforcedStyle: semantic
+   AllowBracesOnProceduralOneLiners: true
+   IgnoredMethods:
+     - expect # rspec
+     - profile # ruby-prof
+     - ips # benchmark-ips
+
+
+ Style/FormatString:
+   EnforcedStyle: percent
+
+ Style/StringLiterals:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Style/StringLiteralsInInterpolation:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Style/TrailingCommaInHashLiteral:
+   EnforcedStyleForMultiline: consistent_comma
+
+ Style/TrailingCommaInArrayLiteral:
+   EnforcedStyleForMultiline: consistent_comma
+
+ Style/YodaCondition:
+   EnforcedStyle: forbid_for_equality_operators_only
@@ -0,0 +1,42 @@
+ ## Current/Unreleased
+
+ ## Release v0.4.0 (2021-01-12)
+
+ * ⚡️ Big performance improvements, by using C `long double *cscores` array
+ * ⚡️ Scores must be `Integer` in `-uint64..+uint64`, or convertible to `Float`
+ * ⚡️ many many (so many) updates to benchmarks
+ * ✨ Added `DHeap#clear`
+ * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
+ * ♻️ significant refactoring
+ * 📝 Updated docs (mostly adding benchmarks)
+
+ ## Release v0.3.0 (2020-12-29)
+
+ * ⚡️ Big performance improvements, by converting to a `T_DATA` struct.
+ * ♻️ Major refactoring/rewriting of dheap.c
+ * ✅ Added benchmark specs
+ * 🔥 Removed class methods that operated directly on an array. They weren't
+   compatible with the performance improvements.
+
+ ## Release v0.2.2 (2020-12-27)
+
+ * 🐛 fix `optimized_cmp`, avoiding internal symbols
+ * 📝 Update documentation
+ * 💚 fix macos CI
+ * ➕ Add rubocop 👮🎨
+
+ ## Release v0.2.1 (2020-12-26)
+
+ * ⬆️ Upgraded rake (and bundler) to support ruby 3.0
+
+ ## Release v0.2.0 (2020-12-24)
+
+ * ✨ Add ability to push separate score and value
+ * ⚡️ Big performance gain, by storing scores separately and using ruby's
+   internal `OPTIMIZED_CMP` instead of always directly calling `<=>`
+
+ ## Release v0.1.0 (2020-12-22)
+
+ 🎉 initial release 🎉
+
+ * ✨ Add basic d-ary Heap implementation
data/Gemfile CHANGED
@@ -1,8 +1,12 @@
+ # frozen_string_literal: true
+
  source "https://rubygems.org"

  # Specify your gem's dependencies in d_heap.gemspec
  gemspec

- gem "rake", "~> 12.0"
+ gem "pry"
+ gem "rake", "~> 13.0"
  gem "rake-compiler"
- gem "rspec", "~> 3.0"
+ gem "rspec", "~> 3.10"
+ gem "rubocop", "~> 1.0"
@@ -1,15 +1,28 @@
  PATH
    remote: .
    specs:
-     d_heap (0.1.0)
+     d_heap (0.4.0)

  GEM
    remote: https://rubygems.org/
    specs:
+     ast (2.4.1)
+     benchmark_driver (0.15.16)
+     coderay (1.1.3)
      diff-lcs (1.4.4)
-     rake (12.3.3)
+     method_source (1.0.0)
+     parallel (1.19.2)
+     parser (2.7.2.0)
+       ast (~> 2.4.1)
+     pry (0.13.1)
+       coderay (~> 1.1)
+       method_source (~> 1.0)
+     rainbow (3.0.0)
+     rake (13.0.3)
      rake-compiler (1.1.1)
        rake
+     regexp_parser (1.8.2)
+     rexml (3.2.3)
      rspec (3.10.0)
        rspec-core (~> 3.10.0)
        rspec-expectations (~> 3.10.0)
@@ -23,15 +36,33 @@ GEM
        diff-lcs (>= 1.2.0, < 2.0)
        rspec-support (~> 3.10.0)
      rspec-support (3.10.0)
+     rubocop (1.2.0)
+       parallel (~> 1.10)
+       parser (>= 2.7.1.5)
+       rainbow (>= 2.2.2, < 4.0)
+       regexp_parser (>= 1.8)
+       rexml
+       rubocop-ast (>= 1.0.1)
+       ruby-progressbar (~> 1.7)
+       unicode-display_width (>= 1.4.0, < 2.0)
+     rubocop-ast (1.1.1)
+       parser (>= 2.7.1.5)
+     ruby-prof (1.4.2)
+     ruby-progressbar (1.10.1)
+     unicode-display_width (1.7.0)

  PLATFORMS
    ruby

  DEPENDENCIES
+   benchmark_driver
    d_heap!
-   rake (~> 12.0)
+   pry
+   rake (~> 13.0)
    rake-compiler
-   rspec (~> 3.0)
+   rspec (~> 3.10)
+   rubocop (~> 1.0)
+   ruby-prof

  BUNDLED WITH
-    2.1.4
+    2.2.3
data/README.md CHANGED
@@ -1,53 +1,127 @@
  # DHeap

- A fast _d_-ary heap implementation for ruby, useful in priority queues and graph
- algorithms.
-
- The _d_-ary heap data structure is a generalization of the binary heap, in which
- the nodes have _d_ children instead of 2. This allows for "decrease priority"
- operations to be performed more quickly with the tradeoff of slower delete
- minimum. Additionally, _d_-ary heaps can have better memory cache behavior than
- binary heaps, allowing them to run more quickly in practice despite slower
- worst-case time complexity.
-
- _TODO:_ In addition to a basic _d_-ary heap class (`DHeap`), this library
- ~~includes~~ _will include_ extensions to `Array`, allowing an Array to be
- directly handled as a priority queue. These extension methods are meant to be
- used similarly to how `#bsearch` and `#bsearch_index` might be used.
-
- _TODO:_ Also included is `DHeap::Set`, which augments the basic heap with an
- internal `Hash`, which maps a set of values to scores.
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
- and adjustments, to achieve faster average runtime for *add* and *cancel*
- operations.
+ A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
+ implemented as a C extension.
+
+ With a regular queue, you expect "FIFO" behavior: first in, first out. With a
+ stack you expect "LIFO": last in, first out. With a priority queue, you push
+ elements along with a score and the lowest scored element is the first to be
+ popped. Priority queues are often used in algorithms for e.g. [scheduling] of
+ timers or bandwidth management, [Huffman coding], and various graph search
+ algorithms such as [Dijkstra's algorithm], [A* search], or [Prim's algorithm].
+
+ The _d_-ary heap data structure is a generalization of the [binary heap], in
+ which the nodes have _d_ children instead of 2. This allows for "decrease
+ priority" operations to be performed more quickly with the tradeoff of slower
+ delete minimum. Additionally, _d_-ary heaps can have better memory cache
+ behavior than binary heaps, allowing them to run more quickly in practice
+ despite slower worst-case time complexity. In the worst case, a _d_-ary heap
+ requires only `O(log n / log d)` to push, with the tradeoff that pop is `O(d log
+ n / log d)`.
+
+ Although you should probably just use the default _d_ value of `4` (see the
+ analysis below), it's always advisable to benchmark your specific use-case.
+
+ [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
+ [priority queue]: https://en.wikipedia.org/wiki/Priority_queue
+ [binary heap]: https://en.wikipedia.org/wiki/Binary_heap
+ [scheduling]: https://en.wikipedia.org/wiki/Scheduling_(computing)
+ [Huffman coding]: https://en.wikipedia.org/wiki/Huffman_coding#Compression
+ [Dijkstra's algorithm]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Using_a_priority_queue
+ [A* search]: https://en.wikipedia.org/wiki/A*_search_algorithm#Description
+ [Prim's algorithm]: https://en.wikipedia.org/wiki/Prim%27s_algorithm

- _TODO:_ Also included is `DHeap::Timers`, which contains some features that are
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
- and adjustments, to achieve faster average runtime for *add* and *cancel*
- operations.
+ ## Usage

- ## Motivation
+ The basic API is:
+ * `heap << object` adds a value as its own score.
+ * `heap.push(score, value)` adds a value with an extrinsic score.
+ * `heap.pop` removes and returns the value with the minimum score.
+ * `heap.pop_lte(score)` pops if the minimum score is `<=` the provided score.
+ * `heap.peek` to view the minimum value without popping it.

- Ruby's Array class comes with some helpful methods for maintaining a sorted
- array, by combining `#bsearch_index` with `#insert`. With certain insert/remove
- workloads that can perform very well, but in the worst-case an insert or delete
- can result in O(n), since it may need to memcopy a significant portion of the
- array. Knowing that priority queues are usually implemented with a heap, and
- that the heap is a relatively simple data structure, I set out to replace my
- `#bsearch_index` and `#insert` code with a one. I was surprised to find that,
- at least under certain benchmarks, my ruby Heap implementation was tied with or
- slower than inserting into a fully sorted array. On the one hand, this is a
- testament to ruby's fine-tuned Array implementation. On the other hand, it
- seemed like a heap implementated in C should easily match the speed of ruby's
- bsearch + insert.
+ The score must be `Integer` or `Float` or convertible to a `Float` via
+ `Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+ numeric values gives a 40+% speedup under some benchmarks!

- Additionally, I was inspired by reading go's "timer.go" implementation to
- experiment with a 4-ary heap, instead of the traditional binary heap. In the
- case of timers, new timers are usually scheduled to run after most of the
- existing timers and timers are usually canceled before they have a chance to
- run. While a binary heap holds 50% of its elements in its last layer, 75% of a
- 4-ary heap will have no children. That diminishes the extra comparison
- overhead during sift-down.
+ _n.b._ `Integer` _scores must have an absolute value that fits into_ `unsigned
+ long long`. _This is architecture dependent, but on a 64-bit system it is 64
+ bits, which gives a range of -18,446,744,073,709,551,615 to
+ +18,446,744,073,709,551,615._
+
+ _Comparing arbitrary objects via_ `a <=> b` _was the original design and may
+ be added back in a future version,_ if (and only if) _it can be done without
+ impacting the speed of numeric comparisons._
+
+ ```ruby
+ require "d_heap"
+
+ Task = Struct.new(:id) # for demonstration
+
+ heap = DHeap.new # defaults to a 4-heap
+
+ # storing [score, value] tuples
+ heap.push Time.now + 5*60, Task.new(1)
+ heap.push Time.now + 30,   Task.new(2)
+ heap.push Time.now + 60,   Task.new(3)
+ heap.push Time.now + 5,    Task.new(4)
+
+ # peeking and popping
+ heap.pop    # => Task[4]
+ heap.pop    # => Task[2]
+ heap.peek   # => Task[3], but don't pop it from the heap
+ heap.pop    # => Task[3]
+ heap.pop    # => Task[1]
+ heap.empty? # => true
+ heap.pop    # => nil
+ ```
+
+ If your values behave as their own score, by being convertible via
+ `Float(value)`, then you can use `#<<` for implicit scoring. The score should
+ not change for as long as the value remains in the heap, since it will not be
+ re-evaluated after being pushed.
+
+ ```ruby
+ heap.clear
+
+ # The score can be derived from the value by using to_f.
+ # "a <=> b" is *much* slower than comparing numbers, so it isn't used.
+ class Event
+   include Comparable
+   attr_reader :time, :payload
+   alias_method :to_time, :time
+
+   def initialize(time, payload)
+     @time = time.to_time
+     @payload = payload
+     freeze
+   end
+
+   def to_f
+     time.to_f
+   end
+
+   def <=>(other)
+     to_f <=> other.to_f
+   end
+ end
+
+ heap << comparable_max # sorts last, scored via to_f
+ heap << comparable_min # sorts first, scored via to_f
+ heap << comparable_mid # sorts in the middle, scored via to_f
+ heap.pop    # => comparable_min
+ heap.pop    # => comparable_mid
+ heap.pop    # => comparable_max
+ heap.empty? # => true
+ heap.pop    # => nil
+ ```
+
+ You can also pass a value into `#pop(max)` which will only pop if the minimum
+ score is less than or equal to `max`.
+
+ Read the [API documentation] for more detailed documentation and examples.
+
+ [API documentation]: https://rubydoc.info/gems/d_heap/DHeap

  ## Installation

@@ -65,49 +139,293 @@ Or install it yourself as:

  $ gem install d_heap

- ## Usage
+ ## Motivation

- The simplest way to use it is simply with `#push` and `#pop`. Push will
+ One naive approach to a priority queue is to maintain an array in sorted order.
+ This can be very simply implemented using `Array#bsearch_index` + `Array#insert`.
+ This can be very fast—`Array#pop` is `O(1)`—but the worst-case for insert is
+ `O(n)` because it may need to `memcpy` a significant portion of the array.
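The sorted-array approach described here is easy to sketch in pure ruby. This is an illustrative toy (the class name `SortedArrayPQ` is hypothetical, not part of the gem); it keeps the array in descending score order so that the minimum is at the end and `Array#pop` is `O(1)`:

```ruby
# Naive priority queue: a sorted Array via bsearch_index + insert.
# Descending score order keeps the minimum at the end, so pop is O(1).
class SortedArrayPQ
  def initialize
    @a = []
  end

  def push(score, value)
    index = @a.bsearch_index { |(s, _)| s <= score } || @a.length
    @a.insert(index, [score, value]) # O(n) worst case: may move the tail
    self
  end

  def pop
    entry = @a.pop # O(1)
    entry && entry.last
  end
end

pq = SortedArrayPQ.new
pq.push(5, "b").push(1, "a").push(9, "c")
pq.pop # => "a"
```

The `bsearch_index` block is in find-minimum mode: it locates the first entry whose score is `<=` the new score, which is exactly where the new tuple belongs in a descending array.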
+
+ The standard way to implement a priority queue is with a binary heap. Although
+ this increases the time for `pop`, it converts the amortized time per push + pop
+ from `O(n)` to `O(d log n / log d)`.
+
+ However, I was surprised to find that—at least for some benchmarks—my pure ruby
+ heap implementation was much slower than inserting into and popping from a fully
+ sorted array. The reason for this surprising result: Although it is `O(n)`,
+ `memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
+ has relatively _much_ larger constant factors. If your queue contains only a
+ few thousand items, the overhead of those extra calls to `<=>` is _far_ more
+ than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
+ `d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
+ sorted array.
+
+ Moving the sift-up and sift-down code into C helps some. But much more helpful
+ is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
+ called. I'm hopeful that MJIT will eventually obsolete this C-extension. JRuby
+ or TruffleRuby may already run the pure ruby version at high speed. This can be
+ hotspot code, and the basic ruby implementation should perform well if not for
+ the high overhead of `<=>`.
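For reference, a pure-ruby heap of the kind discussed above looks roughly like this (an illustrative binary min-heap; the class name is hypothetical and this is not the gem's actual benchmark class). Every sift step costs one or two ruby-level comparisons, which is where the overhead comes from:

```ruby
# Minimal pure-ruby binary min-heap storing [score, value] pairs.
class NaiveBinaryHeap
  def initialize
    @heap = []
  end

  def push(score, value)
    @heap << [score, value]
    index = @heap.size - 1
    while index > 0 # sift-up: one comparison per level
      parent = (index - 1) / 2
      break if @heap[parent].first <= @heap[index].first
      @heap[parent], @heap[index] = @heap[index], @heap[parent]
      index = parent
    end
    self
  end

  def pop
    return nil if @heap.empty?
    min = @heap.first
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      index = 0
      loop do # sift-down: up to two comparisons per level
        child = index * 2 + 1
        break if child >= @heap.size
        right = child + 1
        child = right if right < @heap.size && @heap[right].first < @heap[child].first
        break if @heap[index].first <= @heap[child].first
        @heap[index], @heap[child] = @heap[child], @heap[index]
        index = child
      end
    end
    min.last
  end
end
```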
+
+ ## Analysis
+
+ ### Time complexity
+
+ There are two fundamental heap operations: sift-up (used by push) and sift-down
+ (used by pop).
+
+ * Both sift operations can perform as many as `log n / log d` swaps, as the
+   element may sift from the bottom of the tree to the top, or vice versa.
+ * Sift-up performs a single comparison per swap: `O(1)`.
+   So pushing a new element is `O(log n / log d)`.
+ * Sift-down performs as many as d comparisons per swap: `O(d)`.
+   So popping the min element is `O(d log n / log d)`.
+
+ Assuming every inserted element is eventually deleted from the root, d=4
+ requires the fewest comparisons for combined insert and delete:
+
+ * (1 + 2) / ln 2 = 4.328085
+ * (1 + 3) / ln 3 = 3.640957
+ * (1 + 4) / ln 4 = 3.606738
+ * (1 + 5) / ln 5 = 3.728010
+ * (1 + 6) / ln 6 = 3.906774
+ * etc...
+
+ Leaf nodes require no comparisons to sift down, and higher values for d have a
+ higher percentage of leaf nodes:
+
+ * d=2 has ~50% leaves,
+ * d=3 has ~67% leaves,
+ * d=4 has ~75% leaves,
+ * and so on...
+
+ See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
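The figures in the list above come from minimizing the n-independent factor `(1 + d) / ln d` of the worst-case comparison count, which is easy to check directly:

```ruby
# Worst-case comparisons per push + pop scale with (1 + d) * log(n) / log(d).
# Dropping the n-dependent part leaves (1 + d) / ln(d), minimized at d = 4.
(2..6).each do |d|
  printf "(1 + %d) / ln %d = %.6f\n", d, d, (1 + d) / Math.log(d)
end
```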
+
+ ### Space complexity
+
+ Space usage is linear, regardless of d. However, higher d values may
+ provide better cache locality. Because the heap is a complete tree, the
+ elements can be stored in an array, without the need for tree or list pointers.
+
+ Ruby can compare Numeric values _much_ faster than other ruby objects, even if
+ those objects simply delegate comparison to internal Numeric values. And it is
+ often useful to use external scores for otherwise uncomparable values. So
+ `DHeap` uses twice as many entries (one for score and one for value)
+ as an array which only stores values.

- ```ruby
- require "d_heap"
+ ## Benchmarks

- heap = DHeap.new # defaults to a 4-ary heap
+ _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
+ `docs/profile.txt` for more details or updated results. These benchmarks were
+ measured with v0.4.0 and ruby 2.7.2 without MJIT enabled._
+
+ These benchmarks use very simple implementations for a pure-ruby heap and an
+ array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
+ comparison, an alternate implementation using `Array#min` and `Array#delete_at`
+ is also shown.
+
+ Three different scenarios are measured:
+ * push N values but never pop (clearing between each set of pushes).
+ * push N values and then pop N values.
+   Although this could be used for heap sort, we're unlikely to choose heap sort
+   over Ruby's quick sort implementation. I'm using this scenario to represent
+   the amortized cost of creating a heap and (eventually) draining it.
+ * For a heap of size N, repeatedly push and pop while keeping a stable size.
+   This is a _very simple_ approximation for how most scheduler/timer heaps
+   would be used. Usually when a timer fires it will be quickly replaced by a
+   new timer, and the overall count of timers will remain roughly stable.
+
+ In these benchmarks, `DHeap` runs faster than all other implementations for
+ every scenario and every value of N, although the difference is much more
+ noticeable at higher values of N. The pure ruby heap implementation is
+ competitive for `push` alone at every value of N, but is significantly slower
+ than bsearch + insert for push + pop until N is _very_ large (somewhere between
+ 10k and 100k)!
+
+ For very small values of N, `DHeap` runs faster than the other implementations
+ in each scenario, although the difference is still relatively small. The pure
+ ruby binary heap is 2x or more slower than bsearch + insert for the common
+ push/pop scenario.
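The stable-size scenario can be approximated with the stdlib `Benchmark` module (a simplified sketch; the repo's real benchmarks use the `benchmark_driver` gem, and the `N`/`ITERATIONS` values here are arbitrary):

```ruby
require "benchmark"

# Stable-size scenario: pre-fill with N items, then repeatedly push one
# item and pop the minimum, keeping the queue size roughly constant.
N = 1000
ITERATIONS = 100_000

queue = Array.new(N) { rand }.sort # ascending: minimum is first
Benchmark.bm(16) do |x|
  x.report("bsearch+insert:") do
    ITERATIONS.times do
      score = rand
      queue.insert(queue.bsearch_index { |v| v >= score } || queue.length, score)
      queue.shift # pop the minimum
    end
  end
end
```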
249
+
250
+ == push N (N=5) ==========================================================
251
+ push N (c_dheap): 1701338.1 i/s
252
+ push N (rb_heap): 971614.1 i/s - 1.75x slower
253
+ push N (bsearch): 946363.7 i/s - 1.80x slower
254
+
255
+ == push N then pop N (N=5) ===============================================
256
+ push N + pop N (c_dheap): 1087944.8 i/s
257
+ push N + pop N (findmin): 841708.1 i/s - 1.29x slower
258
+ push N + pop N (bsearch): 773252.7 i/s - 1.41x slower
259
+ push N + pop N (rb_heap): 471852.9 i/s - 2.31x slower
260
+
261
+ == Push/pop with pre-filled queue of size=N (N=5) ========================
262
+ push + pop (c_dheap): 5525418.8 i/s
263
+ push + pop (findmin): 5003904.8 i/s - 1.10x slower
264
+ push + pop (bsearch): 4320581.8 i/s - 1.28x slower
265
+ push + pop (rb_heap): 2207042.0 i/s - 2.50x slower
266
+
267
+ By N=21, `DHeap` has pulled significantly ahead of bsearch + insert for all
268
+ scenarios, but the pure ruby heap is still slower than every other
269
+ implementation—even resorting the array after every `#push`—in any scenario that
270
+ uses `#pop`.
271
+
272
+ == push N (N=21) =========================================================
273
+ push N (c_dheap): 408307.0 i/s
274
+ push N (rb_heap): 212275.2 i/s - 1.92x slower
275
+ push N (bsearch): 169583.2 i/s - 2.41x slower
276
+
277
+ == push N then pop N (N=21) ==============================================
278
+ push N + pop N (c_dheap): 199435.5 i/s
279
+ push N + pop N (findmin): 162024.5 i/s - 1.23x slower
280
+ push N + pop N (bsearch): 146284.3 i/s - 1.36x slower
281
+ push N + pop N (rb_heap): 72289.0 i/s - 2.76x slower
282
+
283
+ == Push/pop with pre-filled queue of size=N (N=21) =======================
284
+ push + pop (c_dheap): 4836860.0 i/s
285
+ push + pop (findmin): 4467453.9 i/s - 1.08x slower
286
+ push + pop (bsearch): 3345458.4 i/s - 1.45x slower
287
+ push + pop (rb_heap): 1560476.3 i/s - 3.10x slower
288
+
289
+ At higher values of N, `DHeap`'s logarithmic growth leads to little slowdown
290
+ of `DHeap#push`, while insert's linear growth causes it to run slower and
291
+ slower. But because `#pop` is O(1) for a sorted array and O(d log n / log d)
292
+ for a _d_-heap, scenarios involving `#pop` remain relatively close even as N
293
+ increases to 5k:
294
+
295
+ == Push/pop with pre-filled queue of size=N (N=5461) ==============
296
+ push + pop (c_dheap): 2718225.1 i/s
297
+ push + pop (bsearch): 1793546.4 i/s - 1.52x slower
298
+ push + pop (rb_heap): 707139.9 i/s - 3.84x slower
299
+ push + pop (findmin): 122316.0 i/s - 22.22x slower
300
+
301
+ Somewhat surprisingly, bsearch + insert still runs faster than a pure ruby heap
302
+ for the repeated push/pop scenario, all the way up to N as high as 87k:
303
+
304
+ == push N (N=87381) ======================================================
305
+ push N (c_dheap): 92.8 i/s
306
+ push N (rb_heap): 43.5 i/s - 2.13x slower
307
+ push N (bsearch): 2.9 i/s - 31.70x slower
308
+
309
+ == push N then pop N (N=87381) ===========================================
310
+ push N + pop N (c_dheap): 22.6 i/s
311
+ push N + pop N (rb_heap): 5.5 i/s - 4.08x slower
312
+ push N + pop N (bsearch): 2.9 i/s - 7.90x slower
313
+
314
+ == Push/pop with pre-filled queue of size=N (N=87381) ====================
315
+ push + pop (c_dheap): 1815277.3 i/s
316
+ push + pop (bsearch): 762343.2 i/s - 2.38x slower
317
+ push + pop (rb_heap): 535913.6 i/s - 3.39x slower
318
+ push + pop (findmin): 2262.8 i/s - 802.24x slower
319
+
320
+ ## Profiling
321
+
322
+ _n.b. `Array#fetch` is reading the input data, external to heap operations.
323
+ These benchmarks use integers for all scores, which enables significantly faster
324
+ comparisons. If `a <=> b` were used instead, then the difference between push
325
+ and pop would be much larger. And ruby's `Tracepoint` impacts these different
326
+ implementations differently. So we can't use these profiler results for
327
+ comparisons between implementations. A sampling profiler would be needed for
328
+ more accurate relative measurements._
329
+
330
+ It's informative to look at the `ruby-prof` results for a simple binary search +
331
+ insert implementation, repeatedly pushing and popping to a large heap. In
332
+ particular, even with 1000 members, the linear `Array#insert` is _still_ faster
333
+ than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are
334
+ still (relatively) slow and `memcpy` is (relatively) quite fast!
335
+
336
+ %self total self wait child calls name location
337
+ 34.79 2.222 2.222 0.000 0.000 1000000 Array#insert
338
+ 32.59 2.081 2.081 0.000 0.000 1000000 Array#bsearch_index
339
+ 12.84 6.386 0.820 0.000 5.566 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
340
+ 10.38 4.966 0.663 0.000 4.303 1000000 DHeap::Benchmarks::BinarySearchAndInsert#<< d_heap/benchmarks/implementations.rb:61
341
+ 5.38 0.468 0.343 0.000 0.125 1000000 DHeap::Benchmarks::BinarySearchAndInsert#pop d_heap/benchmarks/implementations.rb:70
342
+ 2.06 0.132 0.132 0.000 0.000 1000000 Array#fetch
343
+ 1.95 0.125 0.125 0.000 0.000 1000000 Array#pop
344
+
345
+ Contrast this with a simplistic pure-ruby implementation of a binary heap:
346
+
347
+ %self total self wait child calls name location
348
+ 48.52 8.487 8.118 0.000 0.369 1000000 DHeap::Benchmarks::NaiveBinaryHeap#pop d_heap/benchmarks/implementations.rb:96
349
+ 42.94 7.310 7.184 0.000 0.126 1000000 DHeap::Benchmarks::NaiveBinaryHeap#<< d_heap/benchmarks/implementations.rb:80
350
+ 4.80 16.732 0.803 0.000 15.929 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
351
+
352
+ You can see that it spends almost more time in pop than it does in push. That
353
+ is expected behavior for a heap: although both are O(log n), pop is
354
+ significantly more complex, and has _d_ comparisons per layer.
355
+
+ And `DHeap` shows a similar comparison between push and pop, although it spends
+ half of its time in the benchmark code (which is written in ruby):
+
+      %self     total     self     wait    child     calls  name                                            location
+      43.09     1.685    0.726    0.000    0.959         1   DHeap::Benchmarks::Scenarios#repeated_push_pop  d_heap/benchmarks.rb:77
+      26.05     0.439    0.439    0.000    0.000   1000000   DHeap#<<
+      23.57     0.397    0.397    0.000    0.000   1000000   DHeap#pop
+       7.29     0.123    0.123    0.000    0.000   1000000   Array#fetch
+
+ ### Timers
+
+ Additionally, when used to sort timers, we can reasonably assume that:
+ * New timers usually sort after most existing timers.
+ * Most timers will be canceled before executing.
+ * Canceled timers usually sort after most existing timers.
+
+ So, if we are able to delete an item without searching for it, by keeping a map
+ of positions within the heap, most timers can be inserted and deleted in O(1)
+ time. Canceling a non-leaf timer can be further optimized by marking it as
+ canceled without immediately removing it from the heap. If the timer is
+ rescheduled before we garbage collect, adjusting its position will usually be
+ faster than a delete and re-insert.
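The position-map idea can be sketched as follows (class and method names here are hypothetical, not the gem's API): a binary min-heap of `[time, task]` entries plus a `Hash` from task to heap index, so `cancel` can locate an entry without an O(n) scan.

```ruby
# Illustrative sketch of a timer heap with a position map.
class TimerHeap
  def initialize
    @heap  = []  # [time, task] entries, ordered as a binary min-heap
    @index = {}  # task => current position of its entry in @heap
  end

  def push(time, task)
    @heap << [time, task]
    @index[task] = @heap.size - 1
    sift_up(@heap.size - 1)
    self
  end

  # Removes and returns the task with the earliest time.
  def pop
    return nil if @heap.empty?
    task = @heap.first[1]
    delete_at(0)
    task
  end

  # The position map makes the lookup O(1); removal is O(log n) worst
  # case, and O(1) when the entry is a leaf that needs no sifting.
  def cancel(task)
    i = @index[task]
    return false unless i
    delete_at(i)
    true
  end

  private

  def delete_at(i)
    @index.delete(@heap[i][1])
    last = @heap.pop
    if i < @heap.size
      @heap[i] = last
      @index[last[1]] = i
      sift_down(i)
      sift_up(i)
    end
  end

  def sift_up(i)
    while i > 0 && @heap[i][0] < @heap[(i - 1) / 2][0]
      swap(i, (i - 1) / 2)
      i = (i - 1) / 2
    end
  end

  def sift_down(i)
    while (c = 2 * i + 1) < @heap.size
      c += 1 if c + 1 < @heap.size && @heap[c + 1][0] < @heap[c][0]
      break if @heap[i][0] <= @heap[c][0]
      swap(i, c)
      i = c
    end
  end

  # Every swap keeps the position map in sync with the heap array.
  def swap(i, j)
    @heap[i], @heap[j] = @heap[j], @heap[i]
    @index[@heap[i][1]] = i
    @index[@heap[j][1]] = j
  end
end
```

Since most canceled timers sort after most existing timers, the replacement entry in `delete_at` usually needs little or no sifting, which is what makes the average cancel so cheap.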
 
- # storing [time, task] tuples
- heap << [Time.now + 5*60, Task.new(1)]
- heap << [Time.now + 30, Task.new(2)]
- heap << [Time.now + 60, Task.new(3)]
- heap << [Time.now + 5, Task.new(4)]
+ ## Alternative data structures
 
- # peeking and popping (using last to get the task and ignore the time)
- heap.pop.last # => Task[4]
- heap.pop.last # => Task[2]
- heap.peak.last # => Task[3]
- heap.pop.last # => Task[3]
- heap.pop.last # => Task[1]
- ```
+ As always, you should run benchmarks with your expected scenarios to determine
+ which is right for your use case.
 
- Read the `rdoc` for more detailed documentation and examples.
+ Depending on what you're doing, maintaining a sorted `Array` using
+ `#bsearch_index` and `#insert` might be just fine! As discussed above, although
+ it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this
+ may not matter. Also, if you can arrange for insertions to occur near the end
+ of the array, that could significantly reduce the `memcpy` overhead even more.
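For illustration, a sorted-`Array` priority queue along those lines might look like this (a sketch, not the gem's benchmark code). Keeping the array sorted in descending order puts the minimum at the end, so pops shrink the array in place with no `memcpy` at all:

```ruby
# A sorted-Array priority queue of scores, kept in DESCENDING order.
class SortedArrayQueue
  def initialize
    @array = []
  end

  def push(score)
    # bsearch_index (find-minimum mode): first index whose element is <= score
    i = @array.bsearch_index { |x| x <= score } || @array.size
    @array.insert(i, score)  # O(n) memcpy, but fast on modern hardware
    self
  end
  alias << push

  # Removes and returns the smallest score, from the end of the array.
  def pop
    @array.pop
  end
end
```

Pushing scores that sort near the tail of the queue inserts near the end of the array, which is exactly the cheap case for `Array#insert`.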
 
- ## Benchmarks
+ More complex heap variants, e.g. the [Fibonacci heap], can allow heaps to be
+ merged as well as provide lower amortized running time.
 
- _TODO: put benchmarks here._
+ [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
 
- ## Alternative data structures
+ If it is important to be able to quickly enumerate the set or find the ranking
+ of values in it, then you may want to use a self-balancing binary search tree
+ (e.g. a [red-black tree]) or a [skip-list].
 
- Depending on what you're doing, maintaining a sorted `Array` using
- `#bsearch_index` and `#insert` might be faster!
+ [red-black tree]: https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
+ [skip-list]: https://en.wikipedia.org/wiki/Skip_list
 
- If it is important to be able to quickly enumerate the set or find the ranking
- of values in it, then you probably want to use a self-balancing binary search
- tree (e.g. a red-black tree) or a skip-list.
-
- A Hashed Timing Wheel or Heirarchical Timing Wheels (or some variant in that
- family of data structures) can be constructed to have effectively O(1) running
- time in most cases. However, the implementation for that data structure is much
- more complex than a heap. If a 4-ary heap is good enough for go's timers,
- it should be suitable for many use cases.
+ [Hashed and Hierarchical Timing Wheels][timing wheels] (or some variant in that
+ family of data structures) can be constructed to have effectively `O(1)` running
+ time in most cases. Although the implementation for that data structure is more
+ complex than a heap, it may be necessary for enormous values of N.
+
+ [timing wheels]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
+
+ ## TODOs...
+
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
+ basic heap with an internal `Hash`, which maps a set of values to scores.
+
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
+ features that are loosely inspired by go's timers, e.g. it lazily sifts its
+ heap after deletion and adjustments, to achieve faster average runtime for *add*
+ and *cancel* operations.
+
+ Additionally, I was inspired by reading go's "timer.go" implementation to
+ experiment with a 4-ary heap instead of the traditional binary heap. In the
+ case of timers, new timers are usually scheduled to run after most of the
+ existing timers. And timers are usually canceled before they have a chance to
+ run. While a binary heap holds 50% of its elements in its last layer, 75% of a
+ 4-ary heap's elements will have no children. That diminishes the extra
+ comparison overhead during sift-down.
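Those leaf percentages are easy to verify: in an array-backed d-ary heap, node `i` is a parent only if its first child, at index `d*i + 1`, exists. A small sketch (hypothetical helper, for illustration):

```ruby
# Fraction of nodes with no children in an n-element d-ary heap (n >= 2).
# Node i has at least one child iff d*i + 1 <= n - 1, i.e. i <= (n - 2) / d.
def leaf_fraction(n, d)
  parents = (n - 2) / d + 1       # indexes 0..(n-2)/d each have a child
  (n - parents).fdiv(n)
end

leaf_fraction(1_000_000, 2)  # => 0.5   (binary heap: half the nodes are leaves)
leaf_fraction(1_000_000, 4)  # => 0.75  (4-ary heap: three quarters are leaves)
```

Pops that land in that leaf layer terminate sift-down immediately, so a larger share of leaves means fewer sift-down comparisons on average.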
 
 ## Development