d_heap 0.2.1 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 89808cd668688e16b5dd1e7e6b1ce9e57651217089b0686b4aeb83d41c565020
- data.tar.gz: 43e11d72c7143061d0b424f8290e4ef7f66dfaedc8df2f5eff11e20cce2f7796
+ metadata.gz: 35213e5ac430b07cf2b43a7f065ff5c409506022835a5326cb2bfa25daa7f210
+ data.tar.gz: e87a64fb9fd6eb8bdd281d8fe289b7f4993f4bde9a671b0b414aca194b724691
  SHA512:
- metadata.gz: 7ce7b4a755692a99fdee3d30e27f7dd02f6f90c12fe478a5d20f5e224183f82c5d7f82141c80d229edee48a1dcf6a90878c5648b3bb2107d093dcef5884abf59
- data.tar.gz: 03daf597a17aee15f67f2bf3aa37625e84c9d2f759c22313a9e1010b9f3878c1e8d2dc005b0f82fca5f1792ee57abf7f93d3ab176edad5e29686ca7dedaae905
+ metadata.gz: 77518eb11bf8dd5fa8a29ad88f48c30650ff375ae9b74001fa59d30f634feddf5c4f3ef8d791dc4064afc1d9d68b9b463eaf0e778e927655e0da0c0a9da6fdee
+ data.tar.gz: 2911b20a882d8b6f577bda9a388a9a755a093cfd3c1aaaaa355aa5be97ffe506935620043adb115158565ead6941e52f789cdd714e4932aff4916412c60a3aee
@@ -0,0 +1,26 @@
+ name: CI
+
+ on: [push,pull_request]
+
+ jobs:
+   build:
+     strategy:
+       fail-fast: false
+       matrix:
+         ruby: [2.4, 2.5, 2.6, 2.7, 3.0]
+         os: [ubuntu, macos]
+         experimental: [false]
+     runs-on: ${{ matrix.os }}-latest
+     continue-on-error: ${{ matrix.experimental }}
+     steps:
+       - uses: actions/checkout@v2
+       - name: Set up Ruby
+         uses: ruby/setup-ruby@v1
+         with:
+           ruby-version: ${{ matrix.ruby }}
+           bundler-cache: true
+       - name: Run the default task
+         run: |
+           gem install bundler -v 2.2.3
+           bundle install
+           bundle exec rake
data/.gitignore CHANGED
@@ -10,6 +10,7 @@
  *.so
  *.o
  *.a
+ compile_commands.json
  mkmf.log

  # rspec failure tracking
@@ -0,0 +1,199 @@
1
+ inherit_mode:
2
+ merge:
3
+ - Exclude
4
+
5
+ AllCops:
6
+ TargetRubyVersion: 2.4
7
+ NewCops: disable
8
+ Exclude:
9
+ - bin/benchmark-driver
10
+ - bin/rake
11
+ - bin/rspec
12
+ - bin/rubocop
13
+
14
+ ###########################################################################
15
+ # rubocop defaults are simply WRONG about many rules... Sorry. It's true.
16
+
17
+ ###########################################################################
18
+ # Layout: Alignment. I want these to work, I really do...
19
+
20
+ # I wish this worked with "table". but that goes wrong sometimes.
21
+ Layout/HashAlignment: { Enabled: false }
22
+
23
+ # This needs to be configurable so parenthesis calls are aligned with first
24
+ # parameter, and non-parenthesis calls are aligned with fixed indentation.
25
+ Layout/ParameterAlignment: { Enabled: false }
26
+
27
+ ###########################################################################
28
+ # Layout: Empty lines
29
+
30
+ Layout/EmptyLineAfterGuardClause: { Enabled: false }
31
+ Layout/EmptyLineAfterMagicComment: { Enabled: true }
32
+ Layout/EmptyLineAfterMultilineCondition: { Enabled: false }
33
+ Layout/EmptyLines: { Enabled: true }
34
+ Layout/EmptyLinesAroundAccessModifier: { Enabled: true }
35
+ Layout/EmptyLinesAroundArguments: { Enabled: true }
36
+ Layout/EmptyLinesAroundBeginBody: { Enabled: true }
37
+ Layout/EmptyLinesAroundBlockBody: { Enabled: false }
38
+ Layout/EmptyLinesAroundExceptionHandlingKeywords: { Enabled: true }
39
+ Layout/EmptyLinesAroundMethodBody: { Enabled: true }
40
+
41
+ Layout/EmptyLineBetweenDefs:
42
+ Enabled: true
43
+ AllowAdjacentOneLineDefs: true
44
+
45
+ Layout/EmptyLinesAroundAttributeAccessor:
46
+ inherit_mode:
47
+ merge:
48
+ - Exclude
49
+ - AllowedMethods
50
+ Enabled: true
51
+ AllowedMethods:
52
+ - delegate
53
+ - def_delegator
54
+ - def_delegators
55
+ - def_instance_delegators
56
+
57
+ # "empty_lines_special" sometimes does the wrong thing and annoys me.
58
+ # But I've mostly learned to live with it... mostly. 🙁
59
+
60
+ Layout/EmptyLinesAroundClassBody:
61
+ Enabled: true
62
+ EnforcedStyle: empty_lines_special
63
+
64
+ Layout/EmptyLinesAroundModuleBody:
65
+ Enabled: true
66
+ EnforcedStyle: empty_lines_special
67
+
68
+ ###########################################################################
69
+ # Layout: Space around, before, inside, etc
70
+
71
+ Layout/SpaceAroundEqualsInParameterDefault: { Enabled: false }
72
+ Layout/SpaceBeforeBlockBraces: { Enabled: false }
73
+ Layout/SpaceBeforeFirstArg: { Enabled: false }
74
+ Layout/SpaceInLambdaLiteral: { Enabled: false }
75
+ Layout/SpaceInsideArrayLiteralBrackets: { Enabled: false }
76
+ Layout/SpaceInsideHashLiteralBraces: { Enabled: false }
77
+
78
+ Layout/SpaceInsideBlockBraces:
79
+ EnforcedStyle: space
80
+ EnforcedStyleForEmptyBraces: space
81
+ SpaceBeforeBlockParameters: false
82
+
83
+ # I would enable this if it were a bit better at handling alignment.
84
+ Layout/ExtraSpacing:
85
+ Enabled: false
86
+ AllowForAlignment: true
87
+ AllowBeforeTrailingComments: true
88
+
89
+ ###########################################################################
90
+ # Layout: Misc
91
+
92
+ Layout/LineLength:
93
+ Max: 90 # should stay under 80, but we'll allow a little wiggle-room
94
+
95
+ Layout/MultilineOperationIndentation: { Enabled: false }
96
+
97
+ Layout/MultilineMethodCallIndentation:
98
+ EnforcedStyle: indented
99
+
100
+ ###########################################################################
101
+ # Lint and Naming: rubocop defaults are mostly good, but...
102
+
103
+ Lint/UnusedMethodArgument: { Enabled: false }
104
+ Naming/BinaryOperatorParameterName: { Enabled: false } # def /(denominator)
105
+ Naming/RescuedExceptionsVariableName: { Enabled: false }
106
+
107
+ ###########################################################################
108
+ # Metrics:
109
+
110
+ Metrics/CyclomaticComplexity:
111
+ Max: 10
112
+
113
+ # Although it may be better to split specs into multiple files...?
114
+ Metrics/BlockLength:
115
+ Exclude:
116
+ - "spec/**/*_spec.rb"
117
+ CountAsOne:
118
+ - array
119
+ - hash
120
+ - heredoc
121
+
122
+ Metrics/ClassLength:
123
+ Max: 200
124
+ CountAsOne:
125
+ - array
126
+ - hash
127
+ - heredoc
128
+
129
+ ###########################################################################
130
+ # Style...
131
+
132
+ Style/AccessorGrouping: { Enabled: false }
133
+ Style/AsciiComments: { Enabled: false } # 👮 can't stop our 🎉🥳🎊🥳!
134
+ Style/ClassAndModuleChildren: { Enabled: false }
135
+ Style/EachWithObject: { Enabled: false }
136
+ Style/FormatStringToken: { Enabled: false }
137
+ Style/FloatDivision: { Enabled: false }
138
+ Style/IfUnlessModifier: { Enabled: false }
139
+ Style/IfWithSemicolon: { Enabled: false }
140
+ Style/Lambda: { Enabled: false }
141
+ Style/LineEndConcatenation: { Enabled: false }
142
+ Style/MixinGrouping: { Enabled: false }
143
+ Style/MultilineBlockChain: { Enabled: false }
144
+ Style/PerlBackrefs: { Enabled: false } # use occasionally/sparingly
145
+ Style/RescueStandardError: { Enabled: false }
146
+ Style/Semicolon: { Enabled: false }
147
+ Style/SingleLineMethods: { Enabled: false }
148
+ Style/StabbyLambdaParentheses: { Enabled: false }
149
+ Style/WhenThen : { Enabled: false }
150
+
151
+ # I require trailing commas elsewhere, but these are optional
152
+ Style/TrailingCommaInArguments: { Enabled: false }
153
+
154
+ # If rubocop had an option to only enforce this on constants and literals (e.g.
155
+ # strings, regexp, range), I'd agree.
156
+ #
157
+ # But if you are using it e.g. on method arguments of unknown type, in the same
158
+ # style that ruby uses it with grep, then you are doing exactly the right thing.
159
+ Style/CaseEquality: { Enabled: false }
160
+
161
+ # I'd enable if "require_parentheses_when_complex" considered unary '!' simple.
162
+ Style/TernaryParentheses:
163
+ EnforcedStyle: require_parentheses_when_complex
164
+ Enabled: false
165
+
166
+ Style/BlockDelimiters:
167
+ inherit_mode:
168
+ merge:
169
+ - Exclude
170
+ - ProceduralMethods
171
+ - IgnoredMethods
172
+ - FunctionalMethods
173
+ EnforcedStyle: semantic
174
+ AllowBracesOnProceduralOneLiners: true
175
+ IgnoredMethods:
176
+ - expect # rspec
177
+ - profile # ruby-prof
178
+ - ips # benchmark-ips
179
+
180
+
181
+ Style/FormatString:
182
+ EnforcedStyle: percent
183
+
184
+ Style/StringLiterals:
185
+ Enabled: true
186
+ EnforcedStyle: double_quotes
187
+
188
+ Style/StringLiteralsInInterpolation:
189
+ Enabled: true
190
+ EnforcedStyle: double_quotes
191
+
192
+ Style/TrailingCommaInHashLiteral:
193
+ EnforcedStyleForMultiline: consistent_comma
194
+
195
+ Style/TrailingCommaInArrayLiteral:
196
+ EnforcedStyleForMultiline: consistent_comma
197
+
198
+ Style/YodaCondition:
199
+ EnforcedStyle: forbid_for_equality_operators_only
@@ -0,0 +1,10 @@
+ -o doc
+ --embed-mixins
+ --hide-void-return
+ --no-private
+ --asset images:images
+ --exclude lib/benchmark_driver
+ --exclude lib/d_heap/benchmarks*
+ -
+ CHANGELOG.md
+ CODE_OF_CONDUCT.md
@@ -0,0 +1,72 @@
+ ## Current/Unreleased
+
+ ## Release v0.6.0 (2021-01-24)
+
+ * 🔥 **Breaking**: `#initialize` uses a keyword argument for `d`
+ * ✨ Added `#initialize(capacity: capa)` to set initial capacity.
+ * ✨ Added `peek_with_score` and `peek_score`
+ * ✨ Added `pop_with_score` and `each_pop(with_score: true)`
+ * ✨ Added `pop_all_below(max_score, array = [])`
+ * ✨ Added aliases for `shift` and `next`
+ * 📈 Added benchmark charts to README, and `bin/bench_charts` to generate them.
+   * requires `gruff` which requires `rmagick` which requires `imagemagick`
+ * 📝 Many documentation updates and fixes.
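A rough sketch of how the v0.6.0 additions fit together, using only the method names listed above (the return values shown are assumptions, not documented output):

```ruby
require "d_heap"

# `d` and the initial capacity are now keyword arguments (per the entries above)
heap = DHeap.new(d: 4, capacity: 100)

heap.push "task-b", 2.0
heap.push "task-a", 1.0

heap.peek_score         # => 1.0
heap.peek_with_score    # e.g. ["task-a", 1.0] (assumed return shape)
heap.pop_with_score     # e.g. ["task-a", 1.0] (assumed return shape)
heap.pop_all_below(5.0) # e.g. ["task-b"]      (assumed return shape)
```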
+
+ ## Release v0.5.0 (2021-01-17)
+
+ * 🔥 **Breaking**: reversed order of `#push` arguments to `value, score`.
+ * ✨ Added `#insert(score, value)` to replace earlier version of `#push`.
+ * ✨ Added `#each_pop` enumerator.
+ * ✨ Added aliases for `deq`, `enq`, `first`, `pop_below`, `length`, and
+   `count`, to mimic other classes in ruby's stdlib.
+ * ⚡️♻️ More performance improvements:
+   * Created an `ENTRY` struct and store both the score and the value pointer in
+     the same `ENTRY *entries` array.
+   * Reduced unnecessary allocations or copies in both sift loops. A similar
+     refactoring also sped up the pure ruby benchmark implementation.
+   * Compiling with `-O3`.
+ * 📝 Updated (and in some cases, fixed) yardoc
+ * ♻️ Moved aliases and less performance sensitive code into ruby.
+ * ♻️ DRY up push/insert methods
+
+ ## Release v0.4.0 (2021-01-12)
+
+ * 🔥 **Breaking**: Scores must be `Integer` or convertible to `Float`
+ * ⚠️ `Integer` scores must fit in `-ULONG_LONG_MAX` to `+ULONG_LONG_MAX`.
+ * ⚡️ Big performance improvements, by using C `long double *cscores` array
+ * ⚡️ many many (so many) updates to benchmarks
+ * ✨ Added `DHeap#clear`
+ * 🐛 Fixed `DHeap#initialize_copy` and `#freeze`
+ * ♻️ significant refactoring
+ * 📝 Updated docs (mostly adding benchmarks)
+
+ ## Release v0.3.0 (2020-12-29)
+
+ * 🔥 **Breaking**: Removed class methods that operated directly on an array.
+   They weren't compatible with the performance improvements.
+ * ⚡️ Big performance improvements, by converting to a `T_DATA` struct.
+ * ♻️ Major refactoring/rewriting of dheap.c
+ * ✅ Added benchmark specs
+
+ ## Release v0.2.2 (2020-12-27)
+
+ * 🐛 fix `optimized_cmp`, avoiding internal symbols
+ * 📝 Update documentation
+ * 💚 fix macos CI
+ * ➕ Add rubocop 👮🎨
+
+ ## Release v0.2.1 (2020-12-26)
+
+ * ⬆️ Upgraded rake (and bundler) to support ruby 3.0
+
+ ## Release v0.2.0 (2020-12-24)
+
+ * ✨ Add ability to push separate score and value
+ * ⚡️ Big performance gain, by storing scores separately and using ruby's
+   internal `OPTIMIZED_CMP` instead of always directly calling `<=>`
+
+ ## Release v0.1.0 (2020-12-22)
+
+ 🎉 initial release 🎉
+
+ * ✨ Add basic d-ary Heap implementation
data/Gemfile CHANGED
@@ -1,8 +1,20 @@
+ # frozen_string_literal: true
+
  source "https://rubygems.org"

  # Specify your gem's dependencies in d_heap.gemspec
  gemspec

+ gem "pry"
  gem "rake", "~> 13.0"
  gem "rake-compiler"
  gem "rspec", "~> 3.10"
+ gem "rubocop", "~> 1.0"
+
+ install_if -> { RUBY_PLATFORM !~ /darwin/ } do
+   gem "benchmark_driver-output-gruff"
+ end
+
+ gem "perf"
+ gem "priority_queue_cxx"
+ gem "stackprof"
@@ -1,15 +1,38 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- d_heap (0.2.1)
4
+ d_heap (0.6.0)
5
5
 
6
6
  GEM
7
7
  remote: https://rubygems.org/
8
8
  specs:
9
+ ast (2.4.1)
10
+ benchmark_driver (0.15.16)
11
+ benchmark_driver-output-gruff (0.3.1)
12
+ benchmark_driver (>= 0.12.0)
13
+ gruff
14
+ coderay (1.1.3)
9
15
  diff-lcs (1.4.4)
16
+ gruff (0.12.1)
17
+ histogram
18
+ rmagick
19
+ histogram (0.2.4.1)
20
+ method_source (1.0.0)
21
+ parallel (1.19.2)
22
+ parser (2.7.2.0)
23
+ ast (~> 2.4.1)
24
+ perf (0.1.2)
25
+ priority_queue_cxx (0.3.4)
26
+ pry (0.13.1)
27
+ coderay (~> 1.1)
28
+ method_source (~> 1.0)
29
+ rainbow (3.0.0)
10
30
  rake (13.0.3)
11
31
  rake-compiler (1.1.1)
12
32
  rake
33
+ regexp_parser (1.8.2)
34
+ rexml (3.2.3)
35
+ rmagick (4.1.2)
13
36
  rspec (3.10.0)
14
37
  rspec-core (~> 3.10.0)
15
38
  rspec-expectations (~> 3.10.0)
@@ -23,15 +46,38 @@ GEM
23
46
  diff-lcs (>= 1.2.0, < 2.0)
24
47
  rspec-support (~> 3.10.0)
25
48
  rspec-support (3.10.0)
49
+ rubocop (1.2.0)
50
+ parallel (~> 1.10)
51
+ parser (>= 2.7.1.5)
52
+ rainbow (>= 2.2.2, < 4.0)
53
+ regexp_parser (>= 1.8)
54
+ rexml
55
+ rubocop-ast (>= 1.0.1)
56
+ ruby-progressbar (~> 1.7)
57
+ unicode-display_width (>= 1.4.0, < 2.0)
58
+ rubocop-ast (1.1.1)
59
+ parser (>= 2.7.1.5)
60
+ ruby-prof (1.4.2)
61
+ ruby-progressbar (1.10.1)
62
+ stackprof (0.2.16)
63
+ unicode-display_width (1.7.0)
26
64
 
27
65
  PLATFORMS
28
66
  ruby
29
67
 
30
68
  DEPENDENCIES
69
+ benchmark_driver
70
+ benchmark_driver-output-gruff
31
71
  d_heap!
72
+ perf
73
+ priority_queue_cxx
74
+ pry
32
75
  rake (~> 13.0)
33
76
  rake-compiler
34
77
  rspec (~> 3.10)
78
+ rubocop (~> 1.0)
79
+ ruby-prof
80
+ stackprof
35
81
 
36
82
  BUNDLED WITH
37
83
  2.2.3
data/N ADDED
@@ -0,0 +1,7 @@
+ #!/bin/sh
+ set -eu
+
+ export BENCH_N="$1"
+ shift
+
+ exec ruby "$@"
data/README.md CHANGED
@@ -1,53 +1,134 @@
1
- # DHeap
2
-
3
- A fast _d_-ary heap implementation for ruby, useful in priority queues and graph
4
- algorithms.
5
-
6
- The _d_-ary heap data structure is a generalization of the binary heap, in which
7
- the nodes have _d_ children instead of 2. This allows for "decrease priority"
8
- operations to be performed more quickly with the tradeoff of slower delete
9
- minimum. Additionally, _d_-ary heaps can have better memory cache behavior than
10
- binary heaps, allowing them to run more quickly in practice despite slower
11
- worst-case time complexity.
12
-
13
- _TODO:_ In addition to a basic _d_-ary heap class (`DHeap`), this library
14
- ~~includes~~ _will include_ extensions to `Array`, allowing an Array to be
15
- directly handled as a priority queue. These extension methods are meant to be
16
- used similarly to how `#bsearch` and `#bsearch_index` might be used.
17
-
18
- _TODO:_ Also included is `DHeap::Set`, which augments the basic heap with an
19
- internal `Hash`, which maps a set of values to scores.
20
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
21
- and adjustments, to achieve faster average runtime for *add* and *cancel*
22
- operations.
23
-
24
- _TODO:_ Also included is `DHeap::Timers`, which contains some features that are
25
- loosely inspired by go's timers. e.g: It lazily sifts its heap after deletion
26
- and adjustments, to achieve faster average runtime for *add* and *cancel*
27
- operations.
1
+ # DHeap - Fast d-ary heap for ruby
2
+
3
+ [![Gem Version](https://badge.fury.io/rb/d_heap.svg)](https://badge.fury.io/rb/d_heap)
4
+ [![Build Status](https://github.com/nevans/d_heap/workflows/CI/badge.svg)](https://github.com/nevans/d_heap/actions?query=workflow%3ACI)
5
+ [![Maintainability](https://api.codeclimate.com/v1/badges/ff274acd0683c99c03e1/maintainability)](https://codeclimate.com/github/nevans/d_heap/maintainability)
6
+
7
+ A fast [_d_-ary heap][d-ary heap] [priority queue] implementation for ruby,
8
+ implemented as a C extension.
9
+
10
+ From [wikipedia](https://en.wikipedia.org/wiki/Heap_(data_structure)):
11
+ > A heap is a specialized tree-based data structure which is essentially an
12
+ > almost complete tree that satisfies the heap property: in a min heap, for any
13
+ > given node C, if P is a parent node of C, then the key (the value) of P is
14
+ > less than or equal to the key of C. The node at the "top" of the heap (with no
15
+ > parents) is called the root node.
16
+
17
+ ![tree representation of a min heap](images/wikipedia-min-heap.png)
18
+
19
+ With a regular queue, you expect "FIFO" behavior: first in, first out. With a
20
+ stack you expect "LIFO": last in first out. A priority queue has a score for
21
+ each element and elements are popped in order by score. Priority queues are
22
+ often used in algorithms for e.g. [scheduling] of timers or bandwidth
23
+ management, for [Huffman coding], and various graph search algorithms such as
24
+ [Dijkstra's algorithm], [A* search], or [Prim's algorithm].
25
+
26
+ The _d_-ary heap data structure is a generalization of the [binary heap], in
27
+ which the nodes have _d_ children instead of 2. This allows for "insert" and
28
+ "decrease priority" operations to be performed more quickly with the tradeoff of
29
+ slower delete minimum or "increase priority". Additionally, _d_-ary heaps can
30
+ have better memory cache behavior than binary heaps, allowing them to run more
31
+ quickly in practice despite slower worst-case time complexity. In the worst
32
+ case, a _d_-ary heap requires only `O(log n / log d)` operations to push, with
33
+ the tradeoff that pop requires `O(d log n / log d)`.
34
+
35
+ Although you should probably just use the default _d_ value of `4` (see the
36
+ analysis below), it's always advisable to benchmark your specific use-case. In
37
+ particular, if you push more often than you pop, higher values for _d_ can give
38
+ a faster total runtime.
39
+
40
+ [d-ary heap]: https://en.wikipedia.org/wiki/D-ary_heap
41
+ [priority queue]: https://en.wikipedia.org/wiki/Priority_queue
42
+ [binary heap]: https://en.wikipedia.org/wiki/Binary_heap
43
+ [scheduling]: https://en.wikipedia.org/wiki/Scheduling_(computing)
44
+ [Huffman coding]: https://en.wikipedia.org/wiki/Huffman_coding#Compression
45
+ [Dijkstra's algorithm]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm#Using_a_priority_queue
46
+ [A* search]: https://en.wikipedia.org/wiki/A*_search_algorithm#Description
47
+ [Prim's algorithm]: https://en.wikipedia.org/wiki/Prim%27s_algorithm
28
48
 
29
- ## Motivation
49
+ ## Usage
50
+
51
+ The basic API is `#push(object, score)` and `#pop`. Please read the
52
+ [gem documentation] for more details and other methods.
53
+
54
+ Quick reference for some common methods:
55
+
56
+ * `heap << object` adds a value, with `Float(object)` as its score.
57
+ * `heap.push(object, score)` adds a value with an extrinsic score.
58
+ * `heap.pop` removes and returns the value with the minimum score.
59
+ * `heap.pop_lte(max_score)` pops only if the next score is `<=` the argument.
60
+ * `heap.peek` to view the minimum value without popping it.
61
+ * `heap.clear` to remove all items from the heap.
62
+ * `heap.empty?` returns true if the heap is empty.
63
+ * `heap.size` returns the number of items in the heap.
64
+
65
+ If the score changes while the object is still in the heap, it will not be
66
+ re-evaluated again.
67
+
68
+ The score must either be `Integer` or `Float` or convertable to a `Float` via
69
+ `Float(score)` (i.e. it should implement `#to_f`). Constraining scores to
70
+ numeric values gives more than 50% speedup under some benchmarks! _n.b._
71
+ `Integer` _scores must have an absolute value that fits into_ `unsigned long
72
+ long`. This is compiler and architecture dependant but with gcc on an IA-64
73
+ system it's 64 bits, which gives a range of -18,446,744,073,709,551,615 to
74
+ +18,446,744,073,709,551,615, which is more than enough to store e.g. POSIX time
75
+ in nanoseconds.
76
+
77
+ _Comparing arbitrary objects via_ `a <=> b` _was the original design and may be
78
+ added back in a future version,_ if (and only if) _it can be done without
79
+ impacting the speed of numeric comparisons. The speedup from this constraint is
80
+ huge!_
81
+
82
+ [gem documentation]: https://rubydoc.info/gems/d_heap/DHeap
83
+
84
+ ### Examples
30
85
 
31
- Ruby's Array class comes with some helpful methods for maintaining a sorted
32
- array, by combining `#bsearch_index` with `#insert`. With certain insert/remove
33
- workloads that can perform very well, but in the worst-case an insert or delete
34
- can result in O(n), since it may need to memcopy a significant portion of the
35
- array. Knowing that priority queues are usually implemented with a heap, and
36
- that the heap is a relatively simple data structure, I set out to replace my
37
- `#bsearch_index` and `#insert` code with a one. I was surprised to find that,
38
- at least under certain benchmarks, my ruby Heap implementation was tied with or
39
- slower than inserting into a fully sorted array. On the one hand, this is a
40
- testament to ruby's fine-tuned Array implementation. On the other hand, it
41
- seemed like a heap implementated in C should easily match the speed of ruby's
42
- bsearch + insert.
43
-
44
- Additionally, I was inspired by reading go's "timer.go" implementation to
45
- experiment with a 4-ary heap, instead of the traditional binary heap. In the
46
- case of timers, new timers are usually scheduled to run after most of the
47
- existing timers and timers are usually canceled before they have a chance to
48
- run. While a binary heap holds 50% of its elements in its last layer, 75% of a
49
- 4-ary heap will have no children. That diminishes the extra comparison
50
- overhead during sift-down.
86
+ ```ruby
87
+ # create some example objects to place in our heap
88
+ Task = Struct.new(:id, :time) do
89
+ def to_f; time.to_f end
90
+ end
91
+ t1 = Task.new(1, Time.now + 5*60)
92
+ t2 = Task.new(2, Time.now + 50)
93
+ t3 = Task.new(3, Time.now + 60)
94
+ t4 = Task.new(4, Time.now + 5)
95
+
96
+ # create the heap
97
+ require "d_heap"
98
+ heap = DHeap.new
99
+
100
+ # push with an explicit score (which might be extrinsic to the value)
101
+ heap.push t1, t1.to_f
102
+
103
+ # the score will be implicitly cast with Float, so any object with #to_f
104
+ heap.push t2, t2
105
+
106
+ # if the object has an intrinsic score via #to_f, "<<" is the simplest API
107
+ heap << t3 << t4
108
+
109
+ # pop returns the lowest scored item, and removes it from the heap
110
+ heap.pop # => #<struct Task id=4, time=2021-01-17 17:02:22.5574 -0500>
111
+ heap.pop # => #<struct Task id=2, time=2021-01-17 17:03:07.5574 -0500>
112
+
113
+ # peek returns the lowest scored item, without removing it from the heap
114
+ heap.peek # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
115
+ heap.pop # => #<struct Task id=3, time=2021-01-17 17:03:17.5574 -0500>
116
+
117
+ # pop_lte handles the common "h.pop if h.peek_score < max" pattern
118
+ heap.pop_lte(Time.now + 65) # => nil
119
+
120
+ # the heap size can be inspected with size and empty?
121
+ heap.empty? # => false
122
+ heap.size # => 1
123
+ heap.pop # => #<struct Task id=1, time=2021-01-17 17:07:17.5574 -0500>
124
+ heap.empty? # => true
125
+ heap.size # => 0
126
+
127
+ # popping from an empty heap returns nil
128
+ heap.pop # => nil
129
+ ```
130
+
131
+ Please see the [gem documentation] for more methods and more examples.
51
132
 
52
133
  ## Installation
53
134
 
@@ -65,108 +146,264 @@ Or install it yourself as:
65
146
 
66
147
  $ gem install d_heap
67
148
 
68
- ## Usage
69
-
70
- The simplest way to use it is simply with `#push` and `#pop`. Push will
71
-
72
- ```ruby
73
- require "d_heap"
149
+ ## Motivation
74
150
 
75
- heap = DHeap.new # defaults to a 4-ary heap
151
+ One naive approach to a priority queue is to maintain an array in sorted order.
152
+ This can be very simply implemented in ruby with `Array#bsearch_index` +
153
+ `Array#insert`. This can be very fast—`Array#pop` is `O(1)`—but the worst-case
154
+ for insert is `O(n)` because it may need to `memcpy` a significant portion of
155
+ the array.
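For illustration, a naive sorted-array queue along these lines might look like the following sketch (`SortedArrayQueue` is a hypothetical name and not part of this gem):

```ruby
# Keep [score, value] pairs sorted by descending score, so the minimum
# stays at the end of the array and Array#pop removes it in O(1).
class SortedArrayQueue
  def initialize
    @entries = [] # [score, value] pairs, sorted by descending score
  end

  # O(log n) bsearch plus an O(n) memmove in the worst case
  def push(value, score)
    index = @entries.bsearch_index { |(s, _)| s <= score } || @entries.size
    @entries.insert(index, [score, value])
    self
  end

  # the minimum score is kept at the end, so this is O(1)
  def pop
    _score, value = @entries.pop
    value
  end
end

q = SortedArrayQueue.new
q.push("b", 2).push("a", 1)
q.pop # => "a"
```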
76
156
 
77
- # storing [time, task] tuples
78
- heap << [Time.now + 5*60, Task.new(1)]
79
- heap << [Time.now + 30, Task.new(2)]
80
- heap << [Time.now + 60, Task.new(3)]
81
- heap << [Time.now + 5, Task.new(4)]
157
+ The standard way to implement a priority queue is with a binary heap. Although
158
+ this increases the time complexity for `pop` alone, it reduces the combined time
+ complexity for `push` + `pop`. Using a d-ary heap with d > 2
+ makes the tree shorter but broader, which reduces push to `O(log n / log d)` while
+ increasing the comparisons needed by sift-down to `O(d log n / log d)`.
82
162
 
83
- # peeking and popping (using last to get the task and ignore the time)
84
- heap.pop.last # => Task[4]
85
- heap.pop.last # => Task[2]
86
- heap.peak.last # => Task[3]
87
- heap.pop.last # => Task[3]
88
- heap.pop.last # => Task[1]
89
- ```
163
+ However, I was disappointed when my best ruby heap implementation ran much more
164
+ slowly than the naive approach—even for heaps containing ten thousand items.
165
+ Although it _is_ `O(n)`, `memcpy` is _very_ fast, while calling `<=>` from ruby
166
+ has _much_ higher overhead. And a _d_-heap needs `d + 1` times more comparisons
167
+ for each push + pop than `bsearch` + `insert`.
90
168
 
91
- Read the `rdoc` for more detailed documentation and examples.
169
+ Additionally, when researching how other systems handle their scheduling, I was
170
+ inspired by reading go's "timer.go" implementation to experiment with a 4-ary
171
+ heap instead of the traditional binary heap.
92
172
 
93
173
  ## Benchmarks
94
174
 
95
- _TODO: put benchmarks here._
175
+ _See `bin/benchmarks` and `docs/benchmarks.txt`, as well as `bin/profile` and
176
+ `docs/profile.txt` for much more detail or updated results. These benchmarks
177
+ were measured with v0.5.0 and ruby 2.7.2 without MJIT enabled._
178
+
179
+ These benchmarks use very simple implementations for a pure-ruby heap and an
180
+ array that is kept sorted using `Array#bsearch_index` and `Array#insert`. For
181
+ comparison, I also compare to the [priority_queue_cxx gem] which uses the [C++
182
+ STL priority_queue], and another naive implementation that uses `Array#min` and
183
+ `Array#delete_at` with an unsorted array.
184
+
185
+ In these benchmarks, `DHeap` runs faster than all other implementations for
186
+ every scenario and every value of N, although the difference is usually more
187
+ noticeable at higher values of N. The pure ruby heap implementation is
188
+ competitive for `push` alone at every value of N, but is significantly slower
189
+ than bsearch + insert for push + pop, until N is _very_ large (somewhere between
190
+ 10k and 100k)!
191
+
192
+ [priority_queue_cxx gem]: https://rubygems.org/gems/priority_queue_cxx
193
+ [C++ STL priority_queue]: http://www.cplusplus.com/reference/queue/priority_queue/
194
+
195
+ Three different scenarios are measured:
196
+
197
+ ### push N items onto an empty heap
198
+
199
+ ...but never pop (clearing between each set of pushes).
200
+
201
+ ![bar graph for push_n benchmarks](./images/push_n.png)
202
+
203
+ ### push N items onto an empty heap then pop all N
204
+
205
+ Although this could be used for heap sort, we're unlikely to choose heap sort
206
+ over Ruby's quick sort implementation. I'm using this scenario to represent
207
+ the amortized cost of creating a heap and (eventually) draining it.
208
+
209
+ ![bar graph for push_n_pop_n benchmarks](./images/push_n_pop_n.png)
210
+
211
+ ### push and pop on a heap with N values
212
+
213
+ Repeatedly push and pop while keeping a stable heap size. This is a _very
214
+ simplistic_ approximation for how most scheduler/timer heaps might be used.
215
+ Usually when a timer fires it will be quickly replaced by a new timer, and the
216
+ overall count of timers will remain roughly stable.
217
+
218
+ ![bar graph for push_pop benchmarks](./images/push_pop.png)
219
+
220
+ ### numbers
221
+
222
+ Even for very small values of N, `DHeap` runs faster
+ than the other benchmarked implementations for each scenario, although the
+ difference is still relatively small. The pure ruby binary heap is 2x or more
+ slower than bsearch + insert for the common push/pop scenario.
226
+
227
+ == push N (N=5) ==========================================================
228
+ push N (c_dheap): 1969700.7 i/s
229
+ push N (c++ stl): 1049738.1 i/s - 1.88x slower
230
+ push N (rb_heap): 928435.2 i/s - 2.12x slower
231
+ push N (bsearch): 921060.0 i/s - 2.14x slower
232
+
233
+ == push N then pop N (N=5) ===============================================
234
+ push N + pop N (c_dheap): 1375805.0 i/s
235
+ push N + pop N (c++ stl): 1134997.5 i/s - 1.21x slower
236
+ push N + pop N (findmin): 862913.1 i/s - 1.59x slower
237
+ push N + pop N (bsearch): 762887.1 i/s - 1.80x slower
238
+ push N + pop N (rb_heap): 506890.4 i/s - 2.71x slower
239
+
240
+ == Push/pop with pre-filled queue of size=N (N=5) ========================
241
+ push + pop (c_dheap): 9044435.5 i/s
242
+ push + pop (c++ stl): 7534583.4 i/s - 1.20x slower
243
+ push + pop (findmin): 5026155.1 i/s - 1.80x slower
244
+ push + pop (bsearch): 4300260.0 i/s - 2.10x slower
245
+ push + pop (rb_heap): 2299499.7 i/s - 3.93x slower
246
+
247
+ By N=21, `DHeap` has pulled significantly ahead of bsearch + insert for all
248
+ scenarios, but the pure ruby heap is still slower than every other
249
+ implementation—even resorting the array after every `#push`—in any scenario that
250
+ uses `#pop`.
251
+
252
+ == push N (N=21) =========================================================
253
+ push N (c_dheap): 464231.4 i/s
254
+ push N (c++ stl): 305546.7 i/s - 1.52x slower
255
+ push N (rb_heap): 202803.7 i/s - 2.29x slower
256
+ push N (bsearch): 168678.7 i/s - 2.75x slower
257
+
258
+ == push N then pop N (N=21) ==============================================
259
+ push N + pop N (c_dheap): 298350.3 i/s
260
+ push N + pop N (c++ stl): 252227.1 i/s - 1.18x slower
261
+ push N + pop N (findmin): 161998.7 i/s - 1.84x slower
262
+ push N + pop N (bsearch): 143432.3 i/s - 2.08x slower
263
+ push N + pop N (rb_heap): 79622.1 i/s - 3.75x slower
264
+
265
+ == Push/pop with pre-filled queue of size=N (N=21) =======================
266
+ push + pop (c_dheap): 8855093.4 i/s
267
+ push + pop (c++ stl): 7223079.5 i/s - 1.23x slower
268
+ push + pop (findmin): 4542913.7 i/s - 1.95x slower
269
+ push + pop (bsearch): 3461802.4 i/s - 2.56x slower
270
+ push + pop (rb_heap): 1845488.7 i/s - 4.80x slower
271
+
272
+ At higher values of N, a heap's logarithmic growth leads to only a little
+ slowdown of `#push`, while insert's linear growth causes it to run noticeably
274
+ slower and slower. But because `#pop` is `O(1)` for a sorted array and `O(d log
275
+ n / log d)` for a heap, scenarios involving both `#push` and `#pop` remain
276
+ relatively close, and bsearch + insert still runs faster than a pure ruby heap,
277
+ even up to queues with 10k items. But as queue size increases beyond that,
+ the linear time complexity to keep a sorted array dominates.
279
+
280
+ == push + pop (rb_heap)
281
+ queue size = 10000: 736618.2 i/s
282
+ queue size = 25000: 670186.8 i/s - 1.10x slower
283
+ queue size = 50000: 618156.7 i/s - 1.19x slower
284
+ queue size = 100000: 579250.7 i/s - 1.27x slower
285
+ queue size = 250000: 572795.0 i/s - 1.29x slower
286
+ queue size = 500000: 543648.3 i/s - 1.35x slower
287
+ queue size = 1000000: 513523.4 i/s - 1.43x slower
288
+ queue size = 2500000: 460848.9 i/s - 1.60x slower
289
+ queue size = 5000000: 445234.5 i/s - 1.65x slower
290
+ queue size = 10000000: 423119.0 i/s - 1.74x slower
291
+
292
+ == push + pop (bsearch)
293
+ queue size = 10000: 786334.2 i/s
294
+ queue size = 25000: 364963.8 i/s - 2.15x slower
295
+ queue size = 50000: 200520.6 i/s - 3.92x slower
296
+ queue size = 100000: 88607.0 i/s - 8.87x slower
297
+ queue size = 250000: 34530.5 i/s - 22.77x slower
298
+ queue size = 500000: 17965.4 i/s - 43.77x slower
299
+ queue size = 1000000: 5638.7 i/s - 139.45x slower
300
+ queue size = 2500000: 1302.0 i/s - 603.93x slower
301
+ queue size = 5000000: 592.0 i/s - 1328.25x slower
302
+ queue size = 10000000: 288.8 i/s - 2722.66x slower
303
+
304
+ == push + pop (c_dheap)
305
+ queue size = 10000: 7311366.6 i/s
306
+ queue size = 50000: 6737824.5 i/s - 1.09x slower
307
+ queue size = 25000: 6407340.6 i/s - 1.14x slower
308
+ queue size = 100000: 6254396.3 i/s - 1.17x slower
309
+ queue size = 250000: 5917684.5 i/s - 1.24x slower
310
+ queue size = 500000: 5126307.6 i/s - 1.43x slower
311
+ queue size = 1000000: 4403494.1 i/s - 1.66x slower
312
+ queue size = 2500000: 3304088.2 i/s - 2.21x slower
313
+ queue size = 5000000: 2664897.7 i/s - 2.74x slower
314
+ queue size = 10000000: 2137927.6 i/s - 3.42x slower
96
315
 
97
316
  ## Analysis
98
317
 
99
318
  ### Time complexity
100
319
 
101
- Both sift operations can perform (log[d] n = log n / log d) swaps.
102
- Swap up performs only a single comparison per swap: O(1).
103
- Swap down performs as many as d comparions per swap: O(d).
104
-
105
- Inserting an item is O(log n / log d).
106
- Deleting the root is O(d log n / log d).
107
-
108
- Assuming every inserted item is eventually deleted from the root, d=4 requires
109
- the fewest comparisons for combined insert and delete:
110
- * (1 + 2) lg 2 = 4.328085
111
- * (1 + 3) lg 3 = 3.640957
112
- * (1 + 4) lg 4 = 3.606738
113
- * (1 + 5) lg 5 = 3.728010
114
- * (1 + 6) lg 6 = 3.906774
115
- * etc...
116
-
117
- Leaf nodes require no comparisons to shift down, and higher values for d have
118
- higher percentage of leaf nodes:
119
- * d=2 has ~50% leaves,
120
- * d=3 has ~67% leaves,
121
- * d=4 has ~75% leaves,
122
- * and so on...
320
+ There are two fundamental heap operations: sift-up (used by push) and sift-down
321
+ (used by pop).
322
+
323
+ * A _d_-ary heap will have `log n / log d` layers, so both sift operations can
324
+ perform as many as `log n / log d` writes, when a member sifts the entire
325
+ length of the tree.
326
+ * Sift-up makes one comparison per layer, so push runs in `O(log n / log d)`.
327
+ * Sift-down makes d comparisons per layer, so pop runs in `O(d log n / log d)`.
328
+
329
+ So, in the simplest case of running balanced push/pop while maintaining the same
330
+ heap size, `(1 + d) log n / log d` comparisons are made. In the worst case,
331
+ when every sift traverses every layer of the tree, `d=4` requires the fewest
332
+ comparisons for combined insert and delete:
333
+
334
+ * (1 + 2) lg n / lg d ≈ 4.328085 lg n
335
+ * (1 + 3) lg n / lg d ≈ 3.640957 lg n
336
+ * (1 + 4) lg n / lg d ≈ 3.606738 lg n
+ * (1 + 5) lg n / lg d ≈ 3.728010 lg n
+ * (1 + 6) lg n / lg d ≈ 3.906774 lg n
+ * (1 + 7) lg n / lg d ≈ 4.111187 lg n
+ * (1 + 8) lg n / lg d ≈ 4.328085 lg n
341
+ * (1 + 9) lg n / lg d ≈ 4.551196 lg n
342
+ * (1 + 10) lg n / lg d ≈ 4.777239 lg n
343
+ * etc...
123
344
 
124
345
  See https://en.wikipedia.org/wiki/D-ary_heap#Analysis for deeper analysis.
125
346
 
126
347
  ### Space complexity
127
348
 
128
- Because the heap is a complete binary tree, space usage is linear, regardless
129
- of d. However higher d values may provide better cache locality.
130
-
131
- We can run comparisons much much faster for Numeric or String objects than for
132
- ruby objects which delegate comparison to internal Numeric or String objects.
133
- And it is often advantageous to use extrinsic scores for uncomparable items.
134
- For this, our internal array uses twice as many entries (one for score and one
135
- for value) as it would if it only supported intrinsic comparison or used an
136
- un-memoized "sort_by" proc.
349
+ Space usage is linear, regardless of d. However higher d values may
350
+ provide better cache locality. Because the heap is a complete _d_-ary tree, the
351
+ elements can be stored in an array, without the need for tree or list pointers.
137
352
 
138
- ### Timers
353
+ Ruby can compare Numeric values _much_ faster than other ruby objects, even if
354
+ those objects simply delegate comparison to internal Numeric values. And it is
355
+ often useful to use external scores for otherwise uncomparable values. So
356
+ `DHeap` uses twice as many entries (one for score and one for value)
357
+ as an array which only stores values.
139
358
 
140
- Additionally, when used to sort timers, we can reasonably assume that:
141
- * New timers usually sort after most existing timers.
142
- * Most timers will be canceled before executing.
143
- * Canceled timers usually sort after most existing timers.
359
+ ## Thread safety
144
360
 
145
- So, if we are able to delete an item without searching for it, by keeping a map
146
- of positions within the heap, most timers can be inserted and deleted in O(1)
147
- time. Canceling a non-leaf timer can be further optimized by marking it as
148
- canceled without immediately removing it from the heap. If the timer is
149
- rescheduled before we garbage collect, adjusting its position will usually be
150
- faster than a delete and re-insert.
361
+ `DHeap` is _not_ thread-safe, so concurrent access from multiple threads needs to
+ be protected, for example by locking access behind a mutex.
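For example, a minimal wrapper that serializes every call with a `Mutex` might look like this sketch (`ThreadSafeHeap` is hypothetical and not part of the gem's API):

```ruby
require "d_heap"

# Serialize all access to a shared heap behind a single Mutex.
class ThreadSafeHeap
  def initialize(d: 4)
    @heap  = DHeap.new(d: d)
    @mutex = Mutex.new
  end

  def push(value, score)
    @mutex.synchronize { @heap.push(value, score) }
  end

  def pop
    @mutex.synchronize { @heap.pop }
  end

  def peek
    @mutex.synchronize { @heap.peek }
  end
end
```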
151
363
 
152
364
  ## Alternative data structures
153
365
 
154
- Depending on what you're doing, maintaining a sorted `Array` using
155
- `#bsearch_index` and `#insert` might be faster! Although it is technically
156
- O(n) for insertions, the implementations for `memcpy` or `memmove` can be *very*
157
- fast on modern architectures. Also, it can be faster O(n) on average, if
158
- insertions are usually near the end of the array. You should run benchmarks
159
- with your expected scenarios to determine which is right.
366
+ As always, you should run benchmarks with your expected scenarios to determine
367
+ which is best for your application.
368
+
369
+ Depending on your use-case, maintaining a sorted `Array` using `#bsearch_index`
370
+ and `#insert` might be just fine! Even `min` plus `delete` with an unsorted
371
+ array can be very fast on small queues. Although insertions run in `O(n)`,
372
+ `memcpy` is so fast on modern hardware that your dataset might not be large
373
+ enough for it to matter.
374
+
375
+ More complex heap variants, e.g. [Fibonacci heap], allow heaps to be split and
+ merged, which gives some graph algorithms a lower amortized time complexity. But
377
+ in practice, _d_-ary heaps have much lower overhead and often run faster.
378
+
379
+ [Fibonacci heap]: https://en.wikipedia.org/wiki/Fibonacci_heap
160
380
 
161
381
  If it is important to be able to quickly enumerate the set or find the ranking
162
- of values in it, then you probably want to use a self-balancing binary search
163
- tree (e.g. a red-black tree) or a skip-list.
164
-
165
- A Hashed Timing Wheel or Heirarchical Timing Wheels (or some variant in that
166
- family of data structures) can be constructed to have effectively O(1) running
167
- time in most cases. However, the implementation for that data structure is more
168
- complex than a heap. If a 4-ary heap is good enough for go's timers, it should
169
- be suitable for many use cases.
382
+ of values in it, then you may want to use a self-balancing binary search tree
383
+ (e.g. a [red-black tree]) or a [skip-list].
384
+
385
+ [red-black tree]: https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
386
+ [skip-list]: https://en.wikipedia.org/wiki/Skip_list
387
+
388
+ [Hashed and Hierarchical Timing Wheels][timing wheels] (or some variant in that
389
+ family of data structures) can be constructed to have effectively `O(1)` running
390
+ time in most cases. Although the implementation for that data structure is more
391
+ complex than a heap, it may be necessary for enormous values of N.
392
+
393
+ [timing wheels]: http://www.cs.columbia.edu/~nahum/w6998/papers/ton97-timing-wheels.pdf
394
+
395
+ ## TODOs...
396
+
397
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Map`, which augments the
398
+ basic heap with an internal `Hash`, which maps objects to their position in the
399
+ heap. This enforces a uniqueness constraint on items on the heap, and also
400
+ allows items to be more efficiently deleted or adjusted. However maintaining
401
+ the hash does lead to a small drop in normal `#push` and `#pop` performance.
402
+
403
+ _TODO:_ Also ~~included is~~ _will include_ `DHeap::Lazy`, which contains some
404
+ features that are loosely inspired by go's timers. e.g: It lazily sifts its
405
+ heap after deletion and adjustments, to achieve faster average runtime for *add*
406
+ and *cancel* operations.
170
407
 
171
408
  ## Development
172
409