d_heap 0.1.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 585cdfa768be7f778b92fa500d952f4207b98ea46378d7289f2ff936bc1f06f0
- data.tar.gz: d306234243d78a9dfd0ef9beb06e569ddd9915f2c733ebd9109953fc84584e11
+ metadata.gz: 413c0a93e2c3cbdbb86ee433df47a310034d453e441a150d8317dc055b4a9a90
+ data.tar.gz: 4bf67447021da03b07da7f44bcf97a66f13fa42f6f67bcfe9a49d0866c8b8167
  SHA512:
- metadata.gz: dd379eecd4bc218b62729d2fdd39773ffbf1e248fb83eeffda910f439644ad5ed8588cf1e8fd4de807ca074f3165e073300d584eb64fae2c54aeed5206374cfc
- data.tar.gz: 7fac571f7d14bb2a7b6d2773fe531a803474587a83508710c4898c15649117558aaa399151d5be3dd49f07afbb06ffd38fe55161e07e483cee729c4f1efdbb20
+ metadata.gz: 5e55bf53c1062686e0863fb9c3b09f3c2b8b936b0cf83985092e1e906b0b24f40e02a42eada048dcea732a60ec4e3695bb861943b424a8cd3152b227abad8a4e
+ data.tar.gz: e021616d6dcdcec943fec11783f2147a2d175aa3a0caf668c2339e05795e32475cdf7be90201c38d7108c74087e22140a9da9d3f211d7f0335e3c173ae83893b
@@ -0,0 +1,26 @@
+ name: Ruby
+
+ on: [push, pull_request]
+
+ jobs:
+   build:
+     strategy:
+       fail-fast: false
+       matrix:
+         ruby: [2.5, 2.6, 2.7, 3.0]
+         os: [ubuntu, macos]
+         experimental: [false]
+     runs-on: ${{ matrix.os }}-latest
+     continue-on-error: ${{ matrix.experimental }}
+     steps:
+     - uses: actions/checkout@v2
+     - name: Set up Ruby
+       uses: ruby/setup-ruby@v1
+       with:
+         ruby-version: ${{ matrix.ruby }}
+         bundler-cache: true
+     - name: Run the default task
+       run: |
+         gem install bundler -v 2.2.3
+         bundle install
+         bundle exec rake
@@ -0,0 +1,199 @@
+ inherit_mode:
+   merge:
+     - Exclude
+
+ AllCops:
+   TargetRubyVersion: 2.5
+   NewCops: disable
+   Exclude:
+     - bin/benchmark-driver
+     - bin/rake
+     - bin/rspec
+     - bin/rubocop
+
+ ###########################################################################
+ # rubocop defaults are simply WRONG about many rules... Sorry. It's true.
+
+ ###########################################################################
+ # Layout: Alignment. I want these to work, I really do...
+
+ # I wish this worked with "table", but that goes wrong sometimes.
+ Layout/HashAlignment: { Enabled: false }
+
+ # This needs to be configurable so parenthesized calls are aligned with the
+ # first parameter, and non-parenthesized calls use fixed indentation.
+ Layout/ParameterAlignment: { Enabled: false }
+
+ ###########################################################################
+ # Layout: Empty lines
+
+ Layout/EmptyLineAfterGuardClause: { Enabled: false }
+ Layout/EmptyLineAfterMagicComment: { Enabled: true }
+ Layout/EmptyLineAfterMultilineCondition: { Enabled: false }
+ Layout/EmptyLines: { Enabled: true }
+ Layout/EmptyLinesAroundAccessModifier: { Enabled: true }
+ Layout/EmptyLinesAroundArguments: { Enabled: true }
+ Layout/EmptyLinesAroundBeginBody: { Enabled: true }
+ Layout/EmptyLinesAroundBlockBody: { Enabled: false }
+ Layout/EmptyLinesAroundExceptionHandlingKeywords: { Enabled: true }
+ Layout/EmptyLinesAroundMethodBody: { Enabled: true }
+
+ Layout/EmptyLineBetweenDefs:
+   Enabled: true
+   AllowAdjacentOneLineDefs: true
+
+ Layout/EmptyLinesAroundAttributeAccessor:
+   inherit_mode:
+     merge:
+       - Exclude
+       - AllowedMethods
+   Enabled: true
+   AllowedMethods:
+     - delegate
+     - def_delegator
+     - def_delegators
+     - def_instance_delegators
+
+ # "empty_lines_special" sometimes does the wrong thing and annoys me.
+ # But I've mostly learned to live with it... mostly. 🙁
+
+ Layout/EmptyLinesAroundClassBody:
+   Enabled: true
+   EnforcedStyle: empty_lines_special
+
+ Layout/EmptyLinesAroundModuleBody:
+   Enabled: true
+   EnforcedStyle: empty_lines_special
+
+ ###########################################################################
+ # Layout: Space around, before, inside, etc
+
+ Layout/SpaceAroundEqualsInParameterDefault: { Enabled: false }
+ Layout/SpaceBeforeBlockBraces: { Enabled: false }
+ Layout/SpaceBeforeFirstArg: { Enabled: false }
+ Layout/SpaceInLambdaLiteral: { Enabled: false }
+ Layout/SpaceInsideArrayLiteralBrackets: { Enabled: false }
+ Layout/SpaceInsideHashLiteralBraces: { Enabled: false }
+
+ Layout/SpaceInsideBlockBraces:
+   EnforcedStyle: space
+   EnforcedStyleForEmptyBraces: space
+   SpaceBeforeBlockParameters: false
+
+ # I would enable this if it were a bit better at handling alignment.
+ Layout/ExtraSpacing:
+   Enabled: false
+   AllowForAlignment: true
+   AllowBeforeTrailingComments: true
+
+ ###########################################################################
+ # Layout: Misc
+
+ Layout/LineLength:
+   Max: 90 # should stay under 80, but we'll allow a little wiggle-room
+
+ Layout/MultilineOperationIndentation: { Enabled: false }
+
+ Layout/MultilineMethodCallIndentation:
+   EnforcedStyle: indented
+
+ ###########################################################################
+ # Lint and Naming: rubocop defaults are mostly good, but...
+
+ Lint/UnusedMethodArgument: { Enabled: false }
+ Naming/BinaryOperatorParameterName: { Enabled: false } # def /(denominator)
+ Naming/RescuedExceptionsVariableName: { Enabled: false }
+
+ ###########################################################################
+ # Metrics:
+
+ Metrics/CyclomaticComplexity:
+   Max: 10
+
+ # Although it may be better to split specs into multiple files...?
+ Metrics/BlockLength:
+   Exclude:
+     - "spec/**/*_spec.rb"
+   CountAsOne:
+     - array
+     - hash
+     - heredoc
+
+ Metrics/ClassLength:
+   Max: 200
+   CountAsOne:
+     - array
+     - hash
+     - heredoc
+
+ ###########################################################################
+ # Style...
+
+ Style/AccessorGrouping: { Enabled: false }
+ Style/AsciiComments: { Enabled: false } # 👮 can't stop our 🎉🥳🎊🥳!
+ Style/ClassAndModuleChildren: { Enabled: false }
+ Style/EachWithObject: { Enabled: false }
+ Style/FormatStringToken: { Enabled: false }
+ Style/FloatDivision: { Enabled: false }
+ Style/IfUnlessModifier: { Enabled: false }
+ Style/IfWithSemicolon: { Enabled: false }
+ Style/Lambda: { Enabled: false }
+ Style/LineEndConcatenation: { Enabled: false }
+ Style/MixinGrouping: { Enabled: false }
+ Style/MultilineBlockChain: { Enabled: false }
+ Style/PerlBackrefs: { Enabled: false } # use occasionally/sparingly
+ Style/RescueStandardError: { Enabled: false }
+ Style/Semicolon: { Enabled: false }
+ Style/SingleLineMethods: { Enabled: false }
+ Style/StabbyLambdaParentheses: { Enabled: false }
+ Style/WhenThen: { Enabled: false }
+
+ # I require trailing commas elsewhere, but these are optional
+ Style/TrailingCommaInArguments: { Enabled: false }
+
+ # If rubocop had an option to only enforce this on constants and literals (e.g.
+ # strings, regexp, range), I'd agree.
+ #
+ # But if you are using it e.g. on method arguments of unknown type, in the same
+ # style that ruby uses it with grep, then you are doing exactly the right thing.
+ Style/CaseEquality: { Enabled: false }
+
+ # I'd enable this if "require_parentheses_when_complex" considered unary '!' simple.
+ Style/TernaryParentheses:
+   EnforcedStyle: require_parentheses_when_complex
+   Enabled: false
+
+ Style/BlockDelimiters:
+   inherit_mode:
+     merge:
+       - Exclude
+       - ProceduralMethods
+       - IgnoredMethods
+       - FunctionalMethods
+   EnforcedStyle: semantic
+   AllowBracesOnProceduralOneLiners: true
+   IgnoredMethods:
+     - expect # rspec
+     - profile # ruby-prof
+     - ips # benchmark-ips
+
+ Style/FormatString:
+   EnforcedStyle: percent
+
+ Style/StringLiterals:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Style/StringLiteralsInInterpolation:
+   Enabled: true
+   EnforcedStyle: double_quotes
+
+ Style/TrailingCommaInHashLiteral:
+   EnforcedStyleForMultiline: consistent_comma
+
+ Style/TrailingCommaInArrayLiteral:
+   EnforcedStyleForMultiline: consistent_comma
+
+ Style/YodaCondition:
+   EnforcedStyle: forbid_for_equality_operators_only
@@ -0,0 +1,42 @@
+ ## Current/Unreleased
+
+ ## Release v0.4.0 (2021-01-12)
+
+ * ⚡️ Big performance improvements, by using a C `long double *cscores` array
+ * ⚡️ Scores must be `Integer` in `-uint64..+uint64`, or convertible to `Float`
+ * ⚡️ Many, many (so many) updates to benchmarks
+ * ✨ Added `DHeap#clear`
+ * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
+ * ♻️ Significant refactoring
+ * 📝 Updated docs (mostly adding benchmarks)
+
+ ## Release v0.3.0 (2020-12-29)
+
+ * ⚡️ Big performance improvements, by converting to a `T_DATA` struct.
+ * ♻️ Major refactoring/rewriting of dheap.c
+ * ✅ Added benchmark specs
+ * 🔥 Removed class methods that operated directly on an array. They weren't
+   compatible with the performance improvements.
+
+ ## Release v0.2.2 (2020-12-27)
+
+ * 🐛 Fixed `optimized_cmp`, avoiding internal symbols
+ * 📝 Updated documentation
+ * 💚 Fixed macos CI
+ * ➕ Added rubocop 👮🎨
+
+ ## Release v0.2.1 (2020-12-26)
+
+ * ⬆️ Upgraded rake (and bundler) to support ruby 3.0
+
+ ## Release v0.2.0 (2020-12-24)
+
+ * ✨ Added the ability to push separate score and value
+ * ⚡️ Big performance gain, by storing scores separately and using ruby's
+   internal `OPTIMIZED_CMP` instead of always directly calling `<=>`
+
+ ## Release v0.1.0 (2020-12-22)
+
+ 🎉 initial release 🎉
+
+ * ✨ Added a basic d-ary heap implementation
data/Gemfile CHANGED
@@ -1,8 +1,12 @@
+ # frozen_string_literal: true
+
  source "https://rubygems.org"

  # Specify your gem's dependencies in d_heap.gemspec
  gemspec

- gem "rake", "~> 12.0"
+ gem "pry"
+ gem "rake", "~> 13.0"
  gem "rake-compiler"
- gem "rspec", "~> 3.0"
+ gem "rspec", "~> 3.10"
+ gem "rubocop", "~> 1.0"
@@ -1,15 +1,28 @@
  PATH
    remote: .
    specs:
-     d_heap (0.1.0)
+     d_heap (0.4.0)

  GEM
    remote: https://rubygems.org/
    specs:
+     ast (2.4.1)
+     benchmark_driver (0.15.16)
+     coderay (1.1.3)
      diff-lcs (1.4.4)
-     rake (12.3.3)
+     method_source (1.0.0)
+     parallel (1.19.2)
+     parser (2.7.2.0)
+       ast (~> 2.4.1)
+     pry (0.13.1)
+       coderay (~> 1.1)
+       method_source (~> 1.0)
+     rainbow (3.0.0)
+     rake (13.0.3)
      rake-compiler (1.1.1)
        rake
+     regexp_parser (1.8.2)
+     rexml (3.2.3)
      rspec (3.10.0)
        rspec-core (~> 3.10.0)
        rspec-expectations (~> 3.10.0)
@@ -23,15 +36,33 @@ GEM
        diff-lcs (>= 1.2.0, < 2.0)
        rspec-support (~> 3.10.0)
      rspec-support (3.10.0)
+     rubocop (1.2.0)
+       parallel (~> 1.10)
+       parser (>= 2.7.1.5)
+       rainbow (>= 2.2.2, < 4.0)
+       regexp_parser (>= 1.8)
+       rexml
+       rubocop-ast (>= 1.0.1)
+       ruby-progressbar (~> 1.7)
+       unicode-display_width (>= 1.4.0, < 2.0)
+     rubocop-ast (1.1.1)
+       parser (>= 2.7.1.5)
+     ruby-prof (1.4.2)
+     ruby-progressbar (1.10.1)
+     unicode-display_width (1.7.0)

  PLATFORMS
    ruby

  DEPENDENCIES
+   benchmark_driver
    d_heap!
-   rake (~> 12.0)
+   pry
+   rake (~> 13.0)
    rake-compiler
-   rspec (~> 3.0)
+   rspec (~> 3.10)
+   rubocop (~> 1.0)
+   ruby-prof

  BUNDLED WITH
-    2.1.4
+    2.2.3
data/README.md CHANGED
@@ -1,53 +1,127 @@
  # DHeap

- A fast _d_-ary heap implementation for ruby, useful in priority queues and graph
- algorithms.
-
- The _d_-ary heap data structure is a generalization of the binary heap, in which
- the nodes have _d_ children instead of 2. This allows for "decrease priority"
- operations to be performed more quickly with the tradeoff of slower delete
- minimum. Additionally, _d_-ary heaps can have better memory cache behavior than
- binary heaps, allowing them to run more quickly in practice despite slower
- worst-case time complexity.
-
- _TODO:_ In addition to a basic _d_-ary heap class (`DHeap`), this library
- ~~includes~~ _will include_ extensions to `Array`, allowing an Array to be
- directly handled as a priority queue. These extension methods are meant to be
- used similarly to how `#bsearch` and `#bsearch_index` might be used.
-
- _TODO:_ Also included is `DHeap::Set`, which augments the basic heap with an
- internal `Hash`, which maps a set of values to scores.
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
- and adjustments, to achieve faster average runtime for *add* and *cancel*
- operations.
+ A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
+ implemented as a C extension.
+
+ With a regular queue, you expect "FIFO" behavior: first in, first out. With a
+ stack you expect "LIFO": last in, first out. With a priority queue, you push
+ elements along with a score, and the lowest-scored element is the first to be
+ popped. Priority queues are often used in algorithms for e.g. [scheduling] of
+ timers or bandwidth management, [Huffman coding], and various graph search
+ algorithms such as [Dijkstra's algorithm], [A* search], or [Prim's algorithm].
+
+ The _d_-ary heap data structure is a generalization of the [binary heap], in
+ which the nodes have _d_ children instead of 2. This allows for "decrease
+ priority" operations to be performed more quickly with the tradeoff of slower
+ delete minimum. Additionally, _d_-ary heaps can have better memory cache
+ behavior than binary heaps, allowing them to run more quickly in practice
+ despite slower worst-case time complexity. In the worst case, a _d_-ary heap
+ requires only `O(log n / log d)` to push, with the tradeoff that pop is
+ `O(d log n / log d)`.
+
+ Although you should probably just use the default _d_ value of `4` (see the
+ analysis below), it's always advisable to benchmark your specific use-case.
+
+ [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
+ [priority queue]: https://en.wikipedia.org/wiki/Priority_queue
+ [binary heap]: https://en.wikipedia.org/wiki/Binary_heap
+ [scheduling]: https://en.wikipedia.org/wiki/Scheduling_(computing)
+ [Huffman coding]: https://en.wikipedia.org/wiki/Huffman_coding#Compression
+ [Dijkstra's algorithm]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Using_a_priority_queue
+ [A* search]: https://en.wikipedia.org/wiki/A*_search_algorithm#Description
+ [Prim's algorithm]: https://en.wikipedia.org/wiki/Prim%27s_algorithm

- _TODO:_ Also included is `DHeap::Timers`, which contains some features that are
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
- and adjustments, to achieve faster average runtime for *add* and *cancel*
- operations.
+ ## Usage

- ## Motivation
+ The basic API is:
+ * `heap << object` adds a value as its own score.
+ * `heap.push(score, value)` adds a value with an extrinsic score.
+ * `heap.pop` removes and returns the value with the minimum score.
+ * `heap.pop_lte(score)` pops only if the minimum score is `<=` the provided score.
+ * `heap.peek` views the minimum value without popping it.

- Ruby's Array class comes with some helpful methods for maintaining a sorted
- array, by combining `#bsearch_index` with `#insert`. With certain insert/remove
- workloads that can perform very well, but in the worst-case an insert or delete
- can result in O(n), since it may need to memcopy a significant portion of the
- array. Knowing that priority queues are usually implemented with a heap, and
- that the heap is a relatively simple data structure, I set out to replace my
- `#bsearch_index` and `#insert` code with a one. I was surprised to find that,
- at least under certain benchmarks, my ruby Heap implementation was tied with or
- slower than inserting into a fully sorted array. On the one hand, this is a
- testament to ruby's fine-tuned Array implementation. On the other hand, it
- seemed like a heap implementated in C should easily match the speed of ruby's
- bsearch + insert.
+ The score must be an `Integer` or a `Float`, or convertible to a `Float` via
+ `Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
+ numeric values gives a 40+% speedup under some benchmarks!

- Additionally, I was inspired by reading go's "timer.go" implementation to
- experiment with a 4-ary heap, instead of the traditional binary heap. In the
- case of timers, new timers are usually scheduled to run after most of the
- existing timers and timers are usually canceled before they have a chance to
- run. While a binary heap holds 50% of its elements in its last layer, 75% of a
- 4-ary heap will have no children. That diminishes the extra comparison
- overhead during sift-down.
+ _n.b._ `Integer` _scores must have an absolute value that fits into_
+ `unsigned long long`. _This is architecture-dependent, but on a typical
+ 64-bit system it gives a range of -18,446,744,073,709,551,615 to
+ +18,446,744,073,709,551,615._
+
+ _Comparing arbitrary objects via_ `a <=> b` _was the original design and may
+ be added back in a future version,_ if (and only if) _it can be done without
+ impacting the speed of numeric comparisons._
+
+ ```ruby
+ require "d_heap"
+
+ Task = Struct.new(:id) # for demonstration
+
+ heap = DHeap.new # defaults to a 4-heap
+
+ # pushing [score, value] pairs
+ heap.push Time.now + 5*60, Task.new(1)
+ heap.push Time.now + 30,   Task.new(2)
+ heap.push Time.now + 60,   Task.new(3)
+ heap.push Time.now + 5,    Task.new(4)
+
+ # peeking and popping
+ heap.pop    # => Task[4]
+ heap.pop    # => Task[2]
+ heap.peek   # => Task[3], but don't pop it from the heap
+ heap.pop    # => Task[3]
+ heap.pop    # => Task[1]
+ heap.empty? # => true
+ heap.pop    # => nil
+ ```
+
+ If your values behave as their own score, by being convertible via
+ `Float(value)`, then you can use `#<<` for implicit scoring. The score should
+ not change for as long as the value remains in the heap, since it will not be
+ re-evaluated after being pushed.
+
+ ```ruby
+ heap.clear
+
+ # The score can be derived from the value by using to_f.
+ # "a <=> b" is *much* slower than comparing numbers, so it isn't used.
+ class Event
+   include Comparable
+   attr_reader :time, :payload
+   alias_method :to_time, :time
+
+   def initialize(time, payload)
+     @time = time.to_time
+     @payload = payload
+     freeze
+   end
+
+   def to_f
+     time.to_f
+   end
+
+   def <=>(other)
+     to_f <=> other.to_f
+   end
+ end
+
+ # illustrative events, scored by their time via to_f
+ comparable_min = Event.new(Time.now + 1, "min")
+ comparable_mid = Event.new(Time.now + 2, "mid")
+ comparable_max = Event.new(Time.now + 3, "max")
+
+ heap << comparable_max # sorts last, scored via to_f
+ heap << comparable_min # sorts first, scored via to_f
+ heap << comparable_mid # sorts in the middle, scored via to_f
+ heap.pop # => comparable_min
+ heap.pop # => comparable_mid
+ heap.pop # => comparable_max
+ heap.empty? # => true
+ heap.pop # => nil
+ ```
+
+ You can also pass a value into `#pop(max)`, which will only pop if the minimum
+ score is less than or equal to `max`.
+
+ Read the [API documentation] for more detailed documentation and examples.
+
+ [API documentation]: https://rubydoc.info/gems/d_heap/DHeap

  ## Installation

@@ -65,49 +139,293 @@ Or install it yourself as:

  $ gem install d_heap

- ## Usage
+ ## Motivation

- The simplest way to use it is simply with `#push` and `#pop`. Push will
+ One naive approach to a priority queue is to maintain an array in sorted order.
+ This can be very simply implemented using `Array#bsearch_index` + `Array#insert`.
+ This can be very fast—`Array#pop` is `O(1)`—but the worst-case for insert is
+ `O(n)` because it may need to `memcpy` a significant portion of the array.
+
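The naive approach above can be sketched in a few lines of pure ruby. The `SortedArrayPQ` name and its API are illustrative only, not part of this gem:

```ruby
# Keep entries sorted by descending score, so Array#pop (from the end of the
# array, with no memmove) always returns the minimum-scored entry.
class SortedArrayPQ
  def initialize
    @entries = [] # [score, value] pairs, highest score first
  end

  # O(log n) bsearch + O(n) worst-case insert (Array#insert may memmove)
  def push(score, value)
    index = @entries.bsearch_index { |(s, _)| s <= score } || @entries.length
    @entries.insert(index, [score, value])
    self
  end

  # O(1): the minimum-scored entry is always last
  def pop
    entry = @entries.pop
    entry && entry[1]
  end

  def empty?
    @entries.empty?
  end
end

pq = SortedArrayPQ.new
pq.push(5, :e).push(1, :a).push(3, :c)
p pq.pop # => :a
p pq.pop # => :c
```

Note that only `push` ever pays the `O(n)` cost; popping from the tail of the array is constant time, which is exactly the asymmetry the benchmarks below explore.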
+ The standard way to implement a priority queue is with a binary heap. Although
+ this increases the time for `pop`, it converts the amortized time per push + pop
+ from `O(n)` to `O(d log n / log d)`.
+
+ However, I was surprised to find that—at least for some benchmarks—my pure ruby
+ heap implementation was much slower than inserting into and popping from a fully
+ sorted array. The reason for this surprising result: Although it is `O(n)`,
+ `memcpy` has a _very_ small constant factor, and calling `<=>` from ruby code
+ has relatively _much_ larger constant factors. If your queue contains only a
+ few thousand items, the overhead of those extra calls to `<=>` is _far_ more
+ than occasionally calling `memcpy`. In the worst case, a _d_-heap will require
+ `d + 1` times more comparisons for each push + pop than a `bsearch` + `insert`
+ sorted array.
+
+ Moving the sift-up and sift-down code into C helps some. But much more helpful
+ is optimizing the comparison of numeric scores, so `a <=> b` never needs to be
+ called. I'm hopeful that MJIT will eventually obsolete this C extension. JRuby
+ or TruffleRuby may already run the pure ruby version at high speed. This can be
+ hotspot code, and the basic ruby implementation should perform well if not for
+ the high overhead of `<=>`.
+
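For reference, the "basic ruby implementation" discussed above can look like the following minimal sketch. The `PureRubyDHeap` name is illustrative; this is not the gem's benchmark implementation, and it pays the `<=>`-style comparison costs on every sift:

```ruby
# A minimal pure-ruby d-ary min-heap storing [score, value] pairs in a flat array.
class PureRubyDHeap
  def initialize(d = 4)
    @d = d
    @heap = []
  end

  def push(score, value)
    @heap << [score, value]
    sift_up(@heap.length - 1)
    self
  end

  def pop
    return nil if @heap.empty?
    min = @heap[0]
    last = @heap.pop
    unless @heap.empty?
      @heap[0] = last
      sift_down(0)
    end
    min[1]
  end

  private

  # one comparison per level: O(log n / log d)
  def sift_up(i)
    while i > 0
      parent = (i - 1) / @d
      break if @heap[parent][0] <= @heap[i][0]
      @heap[parent], @heap[i] = @heap[i], @heap[parent]
      i = parent
    end
  end

  # up to d comparisons per level: O(d log n / log d)
  def sift_down(i)
    loop do
      first_child = i * @d + 1
      break if first_child >= @heap.length
      last_child = [first_child + @d - 1, @heap.length - 1].min
      min_child = (first_child..last_child).min_by { |c| @heap[c][0] }
      break if @heap[i][0] <= @heap[min_child][0]
      @heap[i], @heap[min_child] = @heap[min_child], @heap[i]
      i = min_child
    end
  end
end

heap = PureRubyDHeap.new
[3, 1, 4, 1.5, 9, 2.6].each { |n| heap.push(n, "task-#{n}") }
p heap.pop # => "task-1"
p heap.pop # => "task-1.5"
```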
+ ## Analysis
+
+ ### Time complexity
+
+ There are two fundamental heap operations: sift-up (used by push) and sift-down
+ (used by pop).
+
+ * Both sift operations can perform as many as `log n / log d` swaps, as the
+   element may sift from the bottom of the tree to the top, or vice versa.
+ * Sift-up performs a single comparison per swap: `O(1)`.
+   So pushing a new element is `O(log n / log d)`.
+ * Sift-down performs as many as d comparisons per swap: `O(d)`.
+   So popping the min element is `O(d log n / log d)`.
+
+ Assuming every inserted element is eventually deleted from the root, d=4
+ requires the fewest comparisons for combined insert and delete, since it
+ minimizes `(1 + d) / ln d`:
+
+ * (1 + 2) / ln 2 ≈ 4.328085
+ * (1 + 3) / ln 3 ≈ 3.640957
+ * (1 + 4) / ln 4 ≈ 3.606738
+ * (1 + 5) / ln 5 ≈ 3.728010
+ * (1 + 6) / ln 6 ≈ 3.906774
+ * etc...
+
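The comparison-count table above can be reproduced directly; this sketch confirms that the combined per-level cost `(1 + d) / ln d` is minimized at d=4:

```ruby
# Relative comparisons per push + pop, up to a common factor of log n:
# one comparison per level to sift up, plus d per level to sift down,
# over log n / ln d levels.
costs = (2..16).to_h { |d| [d, (1 + d) / Math.log(d)] }

costs.first(5).each { |d, cost| printf("d=%d: %.6f\n", d, cost) }

best_d, _ = costs.min_by { |_d, cost| cost }
puts "minimum at d=#{best_d}" # => minimum at d=4
```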
+ Leaf nodes require no comparisons to sift down, and higher values of d have a
+ higher percentage of leaf nodes:
+
+ * d=2 has ~50% leaves,
+ * d=3 has ~67% leaves,
+ * d=4 has ~75% leaves,
+ * and so on...
+
+ See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
+
+ ### Space complexity
+
+ Space usage is linear, regardless of d. However, higher d values may
+ provide better cache locality. Because the heap is a complete _d_-ary tree,
+ the elements can be stored in an array, without the need for tree or list
+ pointers.
+
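The pointer-free array layout works because parent and child positions are pure index arithmetic. A sketch for the default d=4 (method names here are illustrative):

```ruby
D = 4 # number of children per node

def parent_index(i)
  (i - 1) / D
end

def child_indexes(i)
  first = i * D + 1
  (first..(first + D - 1)).to_a
end

# The root's children sit contiguously at 1..4, node 2's children at 9..12,
# and so on: a node's whole set of children tends to share a cache line.
p child_indexes(0) # => [1, 2, 3, 4]
p child_indexes(2) # => [9, 10, 11, 12]
p parent_index(11) # => 2
```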
210
+ Ruby can compare Numeric values _much_ faster than other ruby objects, even if
211
+ those objects simply delegate comparison to internal Numeric values. And it is
212
+ often useful to use external scores for otherwise uncomparable values. So
213
+ `DHeap` uses twice as many entries (one for score and one for value)
214
+ as an array which only stores values.
71
215
 
72
- ```ruby
73
- require "d_heap"
216
+ ## Benchmarks
74
217
 
75
- heap = DHeap.new # defaults to a 4-ary heap
218
+ _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
219
+ `docs/profile.txt` for more details or updated results. These benchmarks were
220
+ measured with v0.4.0 and ruby 2.7.2 without MJIT enabled._
221
+
222
+ These benchmarks use very simple implementations for a pure-ruby heap and an
223
+ array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
224
+ comparison, an alternate implementation `Array#min` and `Array#delete_at` is
225
+ also shown.
226
+
227
+ Three different scenarios are measured:
228
+ * push N values but never pop (clearing between each set of pushes).
229
+ * push N values and then pop N values.
230
+ Although this could be used for heap sort, we're unlikely to choose heap sort
231
+ over Ruby's quick sort implementation. I'm using this scenario to represent
232
+ the amortized cost of creating a heap and (eventually) draining it.
233
+ * For a heap of size N, repeatedly push and pop while keeping a stable size.
234
+ This is a _very simple_ approximation for how most scheduler/timer heaps
235
+ would be used. Usually when a timer fires it will be quickly replaced by a
236
+ new timer, and the overall count of timers will remain roughly stable.
237
+
238
+ In these benchmarks, `DHeap` runs faster than all other implementations for
239
+ every scenario and every value of N, although the difference is much more
240
+ noticable at higher values of N. The pure ruby heap implementation is
241
+ competitive for `push` alone at every value of N, but is significantly slower
242
+ than bsearch + insert for push + pop until N is _very_ large (somewhere between
243
+ 10k and 100k)!
244
+
245
+ For very small N values the benchmark implementations, `DHeap` runs faster than
246
+ the other implementations for each scenario, although the difference is still
247
+ relatively small. The pure ruby binary heap is 2x or more slower than bsearch +
248
+ insert for common common push/pop scenario.
249
+
250
+ == push N (N=5) ==========================================================
251
+ push N (c_dheap): 1701338.1 i/s
252
+ push N (rb_heap): 971614.1 i/s - 1.75x slower
253
+ push N (bsearch): 946363.7 i/s - 1.80x slower
254
+
255
+ == push N then pop N (N=5) ===============================================
256
+ push N + pop N (c_dheap): 1087944.8 i/s
257
+ push N + pop N (findmin): 841708.1 i/s - 1.29x slower
258
+ push N + pop N (bsearch): 773252.7 i/s - 1.41x slower
259
+ push N + pop N (rb_heap): 471852.9 i/s - 2.31x slower
260
+
261
+ == Push/pop with pre-filled queue of size=N (N=5) ========================
262
+ push + pop (c_dheap): 5525418.8 i/s
263
+ push + pop (findmin): 5003904.8 i/s - 1.10x slower
264
+ push + pop (bsearch): 4320581.8 i/s - 1.28x slower
265
+ push + pop (rb_heap): 2207042.0 i/s - 2.50x slower
266
+
267
+ By N=21, `DHeap` has pulled significantly ahead of bsearch + insert for all
268
+ scenarios, but the pure ruby heap is still slower than every other
269
+ implementation—even resorting the array after every `#push`—in any scenario that
270
+ uses `#pop`.
271
+
272
+ == push N (N=21) =========================================================
273
+ push N (c_dheap): 408307.0 i/s
274
+ push N (rb_heap): 212275.2 i/s - 1.92x slower
275
+ push N (bsearch): 169583.2 i/s - 2.41x slower
276
+
277
+ == push N then pop N (N=21) ==============================================
278
+ push N + pop N (c_dheap): 199435.5 i/s
279
+ push N + pop N (findmin): 162024.5 i/s - 1.23x slower
280
+ push N + pop N (bsearch): 146284.3 i/s - 1.36x slower
281
+ push N + pop N (rb_heap): 72289.0 i/s - 2.76x slower
282
+
283
+ == Push/pop with pre-filled queue of size=N (N=21) =======================
284
+ push + pop (c_dheap): 4836860.0 i/s
285
+ push + pop (findmin): 4467453.9 i/s - 1.08x slower
286
+ push + pop (bsearch): 3345458.4 i/s - 1.45x slower
287
+ push + pop (rb_heap): 1560476.3 i/s - 3.10x slower
288
+
289
+ At higher values of N, `DHeap`'s logarithmic growth leads to little slowdown
290
+ of `DHeap#push`, while insert's linear growth causes it to run slower and
291
+ slower. But because `#pop` is O(1) for a sorted array and O(d log n / log d)
292
+ for a _d_-heap, scenarios involving `#pop` remain relatively close even as N
293
+ increases to 5k:
294
+
295
+ == Push/pop with pre-filled queue of size=N (N=5461) ==============
296
+ push + pop (c_dheap): 2718225.1 i/s
297
+ push + pop (bsearch): 1793546.4 i/s - 1.52x slower
298
+ push + pop (rb_heap): 707139.9 i/s - 3.84x slower
299
+ push + pop (findmin): 122316.0 i/s - 22.22x slower
300
+
301
+ Somewhat surprisingly, bsearch + insert still runs faster than a pure ruby heap
302
+ for the repeated push/pop scenario, all the way up to N as high as 87k:
303
+
304
+ == push N (N=87381) ======================================================
305
+ push N (c_dheap): 92.8 i/s
306
+ push N (rb_heap): 43.5 i/s - 2.13x slower
307
+ push N (bsearch): 2.9 i/s - 31.70x slower
308
+
309
+ == push N then pop N (N=87381) ===========================================
310
+ push N + pop N (c_dheap): 22.6 i/s
311
+ push N + pop N (rb_heap): 5.5 i/s - 4.08x slower
312
+ push N + pop N (bsearch): 2.9 i/s - 7.90x slower
313
+
314
+ == Push/pop with pre-filled queue of size=N (N=87381) ====================
315
+ push + pop (c_dheap): 1815277.3 i/s
316
+ push + pop (bsearch): 762343.2 i/s - 2.38x slower
317
+ push + pop (rb_heap): 535913.6 i/s - 3.39x slower
318
+ push + pop (findmin): 2262.8 i/s - 802.24x slower
319
+
320
+ ## Profiling
321
+
322
+ _n.b. `Array#fetch` is reading the input data, external to heap operations.
323
+ These benchmarks use integers for all scores, which enables significantly faster
324
+ comparisons. If `a <=> b` were used instead, then the difference between push
325
+ and pop would be much larger. And ruby's `Tracepoint` impacts these different
326
+ implementations differently. So we can't use these profiler results for
327
+ comparisons between implementations. A sampling profiler would be needed for
328
+ more accurate relative measurements._
329
+
330
+ It's informative to look at the `ruby-prof` results for a simple binary search +
331
+ insert implementation, repeatedly pushing and popping to a large heap. In
332
+ particular, even with 1000 members, the linear `Array#insert` is _still_ faster
333
+ than the logarithmic `Array#bsearch_index`. At this scale, ruby comparisons are
334
+ still (relatively) slow and `memcpy` is (relatively) quite fast!
335
+
336
+ %self total self wait child calls name location
337
+ 34.79 2.222 2.222 0.000 0.000 1000000 Array#insert
338
+ 32.59 2.081 2.081 0.000 0.000 1000000 Array#bsearch_index
339
+ 12.84 6.386 0.820 0.000 5.566 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
340
+ 10.38 4.966 0.663 0.000 4.303 1000000 DHeap::Benchmarks::BinarySearchAndInsert#<< d_heap/benchmarks/implementations.rb:61
341
+ 5.38 0.468 0.343 0.000 0.125 1000000 DHeap::Benchmarks::BinarySearchAndInsert#pop d_heap/benchmarks/implementations.rb:70
342
+ 2.06 0.132 0.132 0.000 0.000 1000000 Array#fetch
343
+ 1.95 0.125 0.125 0.000 0.000 1000000 Array#pop
344
+
345
+ Contrast this with a simplistic pure-ruby implementation of a binary heap:
346
+
347
+ %self total self wait child calls name location
348
+ 48.52 8.487 8.118 0.000 0.369 1000000 DHeap::Benchmarks::NaiveBinaryHeap#pop d_heap/benchmarks/implementations.rb:96
349
+ 42.94 7.310 7.184 0.000 0.126 1000000 DHeap::Benchmarks::NaiveBinaryHeap#<< d_heap/benchmarks/implementations.rb:80
350
+ 4.80 16.732 0.803 0.000 15.929 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
351
+
352
+ You can see that it spends more time in pop than it does in push. That is
353
+ expected behavior for a heap: although both are O(log n), pop is significantly
354
+ more complex, and makes _d_ comparisons per layer during sift-down.
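To make the asymmetry concrete, here is a minimal pure-ruby binary min-heap sketch (an illustration with hypothetical names, not the benchmarked `NaiveBinaryHeap` class): push compares against one parent per layer on the way up, while pop must find the smaller of two children at every layer on the way down.

```ruby
# Minimal binary min-heap sketch: push sifts up with one comparison
# per layer; pop sifts down with two comparisons per layer.
class TinyBinaryHeap
  def initialize
    @a = []
  end

  # push: compare against the parent at (i - 1) / 2 and swap upward
  def <<(score)
    @a << score
    i = @a.size - 1
    while i > 0 && @a[(p = (i - 1) / 2)] > @a[i]
      @a[p], @a[i] = @a[i], @a[p]
      i = p
    end
    self
  end

  # pop: move the last element to the root, then sift down, picking
  # the smaller child at each layer
  def pop
    top = @a[0]
    last = @a.pop
    unless @a.empty?
      @a[0] = last
      i = 0
      while (c = 2 * i + 1) < @a.size
        c += 1 if c + 1 < @a.size && @a[c + 1] < @a[c]
        break if @a[i] <= @a[c]
        @a[i], @a[c] = @a[c], @a[i]
        i = c
      end
    end
    top
  end
end

heap = TinyBinaryHeap.new
[5, 1, 4, 2, 3].each { |n| heap << n }
p 5.times.map { heap.pop } # => [1, 2, 3, 4, 5]
```

Even in this toy version, pop performs roughly twice as many comparisons as push for the same tree height.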
355
+
356
+ And `DHeap` shows a similar comparison between push and pop, although it spends
357
+ half of its time in the benchmark code (which is written in ruby):
358
+
359
+ %self total self wait child calls name location
360
+ 43.09 1.685 0.726 0.000 0.959 1 DHeap::Benchmarks::Scenarios#repeated_push_pop d_heap/benchmarks.rb:77
361
+ 26.05 0.439 0.439 0.000 0.000 1000000 DHeap#<<
362
+ 23.57 0.397 0.397 0.000 0.000 1000000 DHeap#pop
363
+ 7.29 0.123 0.123 0.000 0.000 1000000 Array#fetch
364
+
365
+ ### Timers
366
+
367
+ Additionally, when a heap is used to schedule timers, we can reasonably assume:
368
+ * New timers usually sort after most existing timers.
369
+ * Most timers will be canceled before executing.
370
+ * Canceled timers usually sort after most existing timers.
371
+
372
+ So, if we are able to delete an item without searching for it, by keeping a map
373
+ of positions within the heap, most timers can be inserted and deleted in O(1)
374
+ time. Canceling a non-leaf timer can be further optimized by marking it as
375
+ canceled without immediately removing it from the heap. If the timer is
376
+ rescheduled before we garbage collect, adjusting its position will usually be
377
+ faster than a delete and re-insert.
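As a sketch of the lazy-cancellation idea (hypothetical class and method names, not part of this gem), a timer queue can hand the caller back the entry itself as a handle; `cancel` then just flips a flag with no heap traversal, and canceled entries are discarded only when they surface at the top during pop.

```ruby
# Sketch: a timer heap where cancel is O(1) because canceled entries
# are merely marked, and are skipped lazily during pop.
class LazyTimerQueue
  Entry = Struct.new(:time, :task, :canceled)

  def initialize
    @heap = [] # array-backed binary min-heap, ordered by Entry#time
  end

  def schedule(time, task)
    entry = Entry.new(time, task, false)
    @heap << entry
    sift_up(@heap.size - 1)
    entry # the caller keeps this handle, so cancel needs no search
  end

  def cancel(entry)
    entry.canceled = true # O(1): no heap traversal at all
  end

  def pop
    while (top = @heap.first)
      extract_top
      return top.task unless top.canceled
    end
    nil
  end

  private

  def sift_up(i)
    while i > 0 && @heap[(p = (i - 1) / 2)].time > @heap[i].time
      @heap[p], @heap[i] = @heap[i], @heap[p]
      i = p
    end
  end

  def extract_top
    last = @heap.pop
    return if @heap.empty?
    @heap[0] = last
    i = 0
    while (c = 2 * i + 1) < @heap.size
      c += 1 if c + 1 < @heap.size && @heap[c + 1].time < @heap[c].time
      break if @heap[i].time <= @heap[c].time
      @heap[i], @heap[c] = @heap[c], @heap[i]
      i = c
    end
  end
end

q = LazyTimerQueue.new
handle = q.schedule(5, :a)
q.schedule(1, :b)
q.schedule(3, :c)
q.cancel(handle)
p [q.pop, q.pop, q.pop] # => [:b, :c, nil]
```

Under the assumptions above, most canceled timers sit near the bottom of the heap and simply fall out of the array before they ever reach the top.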
76
378
 
77
- # storing [time, task] tuples
78
- heap << [Time.now + 5*60, Task.new(1)]
79
- heap << [Time.now + 30, Task.new(2)]
80
- heap << [Time.now + 60, Task.new(3)]
81
- heap << [Time.now + 5, Task.new(4)]
379
+ ## Alternative data structures
82
380
 
83
- # peeking and popping (using last to get the task and ignore the time)
84
- heap.pop.last # => Task[4]
85
- heap.pop.last # => Task[2]
86
- heap.peak.last # => Task[3]
87
- heap.pop.last # => Task[3]
88
- heap.pop.last # => Task[1]
89
- ```
381
+ As always, you should run benchmarks against your own expected scenarios to
382
+ determine which approach is right for your workload.
90
383
 
91
- Read the `rdoc` for more detailed documentation and examples.
384
+ Depending on what you're doing, maintaining a sorted `Array` using
385
+ `#bsearch_index` and `#insert` might be just fine! As discussed above, although
386
+ it is `O(n)` for insertions, `memcpy` is so fast on modern hardware that this
387
+ may not matter. Also, if you can arrange for insertions to occur near the end
388
+ of the array, that could reduce the `memcpy` overhead even further.
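For illustration (a sketch with hypothetical names, not the gem's benchmark class), a sorted-array priority queue can keep its array in descending order, so `#pop` takes the minimum from the end in O(1) while `#<<` pays the O(n) `memcpy` inside `Array#insert`:

```ruby
# Sorted-array priority queue, kept in descending order so the
# minimum is always at the end of the array.
class SortedArrayQueue
  def initialize
    @a = []
  end

  # O(log n) search via bsearch_index, then O(n) memcpy inside insert
  def <<(score)
    index = @a.bsearch_index { |other| other < score } || @a.size
    @a.insert(index, score)
    self
  end

  # the minimum lives at the end of the descending array: O(1)
  def pop
    @a.pop
  end
end

q = SortedArrayQueue.new
[3, 1, 4, 1, 5].each { |n| q << n }
p 5.times.map { q.pop } # => [1, 1, 3, 4, 5]
```

Note that `bsearch_index` is used in find-minimum mode: the block is false for every element larger than or equal to the new score and true afterward, which yields the first valid insertion point.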
92
389
 
93
- ## Benchmarks
390
+ More complex heap variants, e.g. the [Fibonacci heap], can support merging two
391
+ heaps as well as better amortized running time for some operations.
94
392
 
95
- _TODO: put benchmarks here._
393
+ [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
96
394
 
97
- ## Alternative data structures
395
+ If it is important to be able to quickly enumerate the set or find the ranking
396
+ of values in it, then you may want to use a self-balancing binary search tree
397
+ (e.g. a [red-black tree]) or a [skip-list].
98
398
 
99
- Depending on what you're doing, maintaining a sorted `Array` using
100
- `#bsearch_index` and `#insert` might be faster!
399
+ [red-black tree]: https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
400
+ [skip-list]: https://en.wikipedia.org/wiki/Skip_list
101
401
 
102
- If it is important to be able to quickly enumerate the set or find the ranking
103
- of values in it, then you probably want to use a self-balancing binary search
104
- tree (e.g. a red-black tree) or a skip-list.
105
-
106
- A Hashed Timing Wheel or Heirarchical Timing Wheels (or some variant in that
107
- family of data structures) can be constructed to have effectively O(1) running
108
- time in most cases. However, the implementation for that data structure is much
109
- more complex than a heap. If a 4-ary heap is good enough for go's timers,
110
- it should be suitable for many use cases.
402
+ [Hashed and Hierarchical Timing Wheels][timing wheels] (or some variant in that
403
+ family of data structures) can be constructed to have effectively `O(1)` running
404
+ time in most cases. Although the implementation for that data structure is more
405
+ complex than a heap, it may be necessary for enormous values of N.
406
+
407
+ [timing wheels]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
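For a flavor of the idea (a heavily simplified sketch with hypothetical names, nowhere near a production implementation), a hashed timing wheel buckets timers by expiry tick modulo the wheel size, so add and cancel are O(1) hash operations and each clock tick scans only one bucket:

```ruby
# Hashed timing wheel sketch: timers hash into a bucket by expiry
# tick; add and cancel touch only that bucket's Hash.
class HashedTimingWheel
  def initialize(slots = 64)
    @slots = slots
    @wheel = Array.new(slots) { {} } # one Hash of id => [tick, task] per slot
    @now = 0
    @next_id = 0
  end

  def add(delay_ticks, task)
    id = (@next_id += 1)
    tick = @now + delay_ticks
    @wheel[tick % @slots][id] = [tick, task]
    [tick, id] # handle for cancel
  end

  def cancel((tick, id))
    @wheel[tick % @slots].delete(id) # O(1), no search
  end

  # advance the clock one tick, returning tasks that expired on it
  def tick
    @now += 1
    bucket = @wheel[@now % @slots]
    due = bucket.select { |_, (tick, _)| tick == @now }
    due.each_key { |id| bucket.delete(id) }
    due.values.map { |(_, task)| task }
  end
end

wheel = HashedTimingWheel.new
wheel.add(2, :later)
handle = wheel.add(1, :soon)
wheel.cancel(handle)
p wheel.tick # => []  (:soon was canceled)
p wheel.tick # => [:later]
```

A hierarchical wheel layers several of these at coarser resolutions, cascading timers downward as they approach expiry; that bookkeeping is where most of the extra complexity lives.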
408
+
409
+ ## TODOs...
410
+
411
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Set`, which augments the
412
+ basic heap with an internal `Hash` that maps a set of values to scores.
416
+
417
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
418
+ features loosely inspired by go's timers. For example, it lazily sifts its
419
+ heap after deletion and adjustments, to achieve faster average runtime for *add*
420
+ and *cancel* operations.
421
+
422
+ Additionally, I was inspired by reading go's "timer.go" implementation to
423
+ experiment with a 4-ary heap instead of the traditional binary heap. In the
424
+ case of timers, new timers are usually scheduled to run after most of the
425
+ existing timers. And timers are usually canceled before they have a chance to
426
+ run. While roughly 50% of a binary heap's elements are leaves, roughly 75% of a
427
+ 4-ary heap's elements have no children. That diminishes the extra comparison
428
+ overhead during sift-down.
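Those leaf percentages are easy to verify: in an array-backed d-ary heap, node `i`'s first child lives at index `d * i + 1`, so `i` is a leaf whenever that index falls past the end of the array.

```ruby
# Fraction of leaf nodes in an array-backed d-ary heap of n elements:
# node i is a leaf when its first child index, d * i + 1, is >= n.
def leaf_fraction(d, n)
  leaves = (0...n).count { |i| d * i + 1 >= n }
  leaves.to_f / n
end

p leaf_fraction(2, 1_000) # => 0.5
p leaf_fraction(4, 1_000) # => 0.75
```

In general the leaf fraction of a d-ary heap approaches `(d - 1) / d`, which is why a wider heap benefits workloads where most elements are pushed and then canceled without ever sifting down.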
111
429
 
112
430
  ## Development
113
431