tdigest 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: 13d323c00e8ecc6fc72db137c6160f28c23a8615
4
- data.tar.gz: efad7cb62f79a512e4fc68dceca2ae5a55a693c3
2
+ SHA256:
3
+ metadata.gz: 2b5f999c42d1051e26230facc685f22b8d7cc1c6c31af9e5680a202c7cb4f653
4
+ data.tar.gz: 5580752fe6f646fdc1774e4f5eb04c6170f2ce9dbdb325de3daae950fe4895eb
5
5
  SHA512:
6
- metadata.gz: a948d7d63a22957a34e9e1cf71d4e6904a325b1b7e4cd93de12611332c41c62be932a7fd69a6f356122c7da033bc24abaa86db29db5b03a8d1dc79031f603e59
7
- data.tar.gz: 6fb668b7bbe9f1885af98036843095f03d0a74034462c7fdd8fdde970c95b606be556074c41c042a1e02ce9c60bb29fddd0177d4eefdc6da5a32bac857e391ff
6
+ metadata.gz: 8d0eb2f8fd8a1645b035cb9245ca5659de2059e0fcc4899159fd8a600b8d23f13b919638d53787622956b11ced97fefd4898b6caa264f47429bfcc35a6a1d214
7
+ data.tar.gz: cbf7ce26a01bfcf8cff1496b31921227a97ebf21f74a7877a8c978475f69da5dc409763d308fb852488f05fa9cd3d4537ea309c05614d6c81c09db4488d81cc9
@@ -0,0 +1,22 @@
1
+ name: CI
2
+
3
+ on: [push, pull_request]
4
+
5
+ jobs:
6
+   build:
7
+     runs-on: ubuntu-latest
8
+     strategy:
9
+       matrix:
10
+         ruby-version: [2.7, 3.1]
11
+     steps:
12
+       - uses: actions/checkout@v3
13
+       - name: Set up Ruby ${{ matrix.ruby-version }}
14
+         uses: ruby/setup-ruby@v1
15
+         with:
16
+           ruby-version: ${{ matrix.ruby-version }}
17
+       - name: Install dependencies
18
+         run: |
19
+           gem update --system
20
+           bundle install
21
+       - name: Run tests
22
+         run: bundle exec rake test
data/.ruby-version CHANGED
@@ -1 +1 @@
1
- 2.2.3
1
+ 3.1.3
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  source 'https://rubygems.org'
2
4
 
3
5
  # Specify your gem's dependencies in tdigest.gemspec
data/README.md CHANGED
@@ -1,8 +1,7 @@
1
- # Tdigest
1
+ # t-digest Ruby
2
2
 
3
+ [![Ruby CI](https://github.com/castle/tdigest/actions/workflows/specs.yml/badge.svg?branch=master)](https://github.com/castle/tdigest/actions/workflows/specs.yml)
3
4
  [![Gem Version](https://badge.fury.io/rb/tdigest.svg)](https://badge.fury.io/rb/tdigest)
4
- [![Build Status](https://travis-ci.org/castle/tdigest.svg?branch=master)](https://travis-ci.org/castle/tdigest)
5
- [![Coverage Status](https://coveralls.io/repos/castle/tdigest/badge.svg?branch=master&service=github)](https://coveralls.io/github/castle/tdigest?branch=master)
6
5
 
7
6
  Ruby implementation of Ted Dunning's [t-digest](https://github.com/tdunning/t-digest) data structure.
8
7
 
@@ -37,12 +36,11 @@ puts td.p_rank(0.95)
37
36
 
38
37
  #### Serialization
39
38
 
40
- This gem offers the same serialization options as the original [Java implementation](https://github.com/tdunning/t-digest). You can read more about T-digest persistance in [Chapter 3 in the paper](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf).
39
+ This gem offers the same serialization options as the original [Java implementation](https://github.com/tdunning/t-digest). You can read more about T-digest persistence in [Chapter 3 in the paper](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf).
41
40
 
42
41
  **Standard encoding**
43
42
 
44
- This encoding uses 8-byte Double for the means and a 4-byte integers for counts.
45
- Size per centroid is a fixed 12-bytes.
43
+ This encoding uses 8-byte Double for the means and a 4-byte integer for counts. Size per centroid is a fixed 12-bytes.
46
44
 
47
45
  ```ruby
48
46
  bytes = tdigest.as_bytes
@@ -50,8 +48,7 @@ bytes = tdigest.as_bytes
50
48
 
51
49
  **Compressed encoding**
52
50
 
53
- This encoding uses delta encoding with 4-byte floats for the means and variable
54
- length encoding for the counts. Size per centroid is between 5-12 bytes.
51
+ This encoding uses delta encoding with 4-byte floats for the means and variable length encoding for the counts. Size per centroid is between 5-12 bytes.
55
52
 
56
53
  ```ruby
57
54
  bytes = tdigest.as_small_bytes
@@ -79,4 +76,3 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/castle
79
76
  ## License
80
77
 
81
78
  The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
82
-
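Note on the README hunk above: the "fixed 12 bytes per centroid" figure for the standard encoding follows from one 8-byte double plus one 4-byte integer. The sketch below illustrates that arithmetic with plain `Array#pack`, using the same `d`/`L` directives that appear in the decoder's `unpack("d#{size}L#{size}")` call. It is only an illustration under assumptions: the sample means and counts are made up, and any header the gem writes before the centroid data is left out.

```ruby
# Hypothetical sample data: three centroids (means and their counts).
means  = [1.5, 2.5, 10.0]
counts = [3, 1, 7]
size   = means.size

# All means first as 8-byte native doubles ('d'),
# then all counts as 4-byte unsigned integers ('L').
payload = (means + counts).pack("d#{size}L#{size}")

puts payload.bytesize # => 36, i.e. 3 centroids * (8 + 4) bytes

# Decoding mirrors the unpack/each_slice pattern seen in the diff.
decoded = payload.unpack("d#{size}L#{size}")
decoded_means, decoded_counts = decoded.each_slice(size).to_a
```

The `d` and `L` directives use native byte order, so a payload packed this way is only portable between machines with the same endianness.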
data/Rakefile CHANGED
@@ -1,10 +1,12 @@
1
- require "bundler/gem_tasks"
2
- require "rake/testtask"
1
+ # frozen_string_literal: true
2
+
3
+ require 'bundler/gem_tasks'
4
+ require 'rake/testtask'
3
5
 
4
6
  Rake::TestTask.new(:test) do |t|
5
- t.libs << "test"
6
- t.libs << "lib"
7
+ t.libs << 'test'
8
+ t.libs << 'lib'
7
9
  t.test_files = FileList['test/**/*_test.rb']
8
10
  end
9
11
 
10
- task :default => :test
12
+ task default: :test
data/bin/console CHANGED
@@ -1,7 +1,8 @@
1
1
  #!/usr/bin/env ruby
2
+ # frozen_string_literal: true
2
3
 
3
- require "bundler/setup"
4
- require "tdigest"
4
+ require 'bundler/setup'
5
+ require 'tdigest'
5
6
 
6
7
  # You can add fixtures and/or initialization code here to make experimenting
7
8
  # with your gem easier. You can also use a different console, if you like.
@@ -10,5 +11,5 @@ require "tdigest"
10
11
  # require "pry"
11
12
  # Pry.start
12
13
 
13
- require "irb"
14
+ require 'irb'
14
15
  IRB.start
@@ -1,10 +1,13 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module TDigest
2
4
  class Centroid
3
5
  attr_accessor :mean, :n, :cumn, :mean_cumn
4
- def initialize(params = {})
5
- params.each do |p, value|
6
- send("#{p}=", value)
7
- end
6
+ def initialize(mean, n, cumn, mean_cumn = nil)
7
+ @mean = mean
8
+ @n = n
9
+ @cumn = cumn
10
+ @mean_cumn = mean_cumn
8
11
  end
9
12
 
10
13
  def as_json(_ = nil)
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'rbtree'
2
4
  require 'tdigest/centroid'
3
5
 
@@ -13,16 +15,15 @@ module TDigest
13
15
  @cx = cx
14
16
  @centroids = RBTree.new
15
17
  @nreset = 0
18
+ @n = 0
16
19
  reset!
17
20
  end
18
21
 
19
22
  def +(other)
20
23
  # Uses delta, k and cx from the caller
21
24
  t = self.class.new(@delta, @k, @cx)
22
- data = self.centroids.values + other.centroids.values
23
- while data.length > 0
24
- t.push_centroid(data.delete_at(rand(data.length)))
25
- end
25
+ data = centroids.values + other.centroids.values
26
+ t.push_centroid(data.delete_at(rand(data.length))) until data.empty?
26
27
  t
27
28
  end
28
29
 
@@ -55,7 +56,7 @@ module TDigest
55
56
  arr << b
56
57
  n = n >> 7
57
58
  k += 1
58
- fail 'Unreasonable large number' if k > 6
59
+ raise 'Unreasonable large number' if k > 6
59
60
  end
60
61
  arr << n
61
62
  end
@@ -76,7 +77,7 @@ module TDigest
76
77
  def bound_mean_cumn(cumn)
77
78
  last_c = nil
78
79
  bounds = []
79
- matches = @centroids.each do |k, v|
80
+ @centroids.each do |_k, v|
80
81
  if v.mean_cumn == cumn
81
82
  bounds << v
82
83
  break
@@ -97,10 +98,8 @@ module TDigest
97
98
  def compress!
98
99
  points = to_a
99
100
  reset!
100
- while points.length > 0
101
- push_centroid(points.delete_at(rand(points.length)))
102
- end
103
- _cumulate(true)
101
+ push_centroid(points.shuffle)
102
+ _cumulate(true, true)
104
103
  nil
105
104
  end
106
105
 
@@ -128,10 +127,8 @@ module TDigest
128
127
  end
129
128
 
130
129
  def merge!(other)
131
- # Uses delta, k and cx from the caller
132
- t = self + other
133
- @centroids = t.centroids
134
- compress!
130
+ push_centroid(other.centroids.values.shuffle)
131
+ self
135
132
  end
136
133
 
137
134
  def p_rank(x)
@@ -167,8 +164,9 @@ module TDigest
167
164
  p = [p] unless is_array
168
165
  p.map! do |item|
169
166
  unless (0..1).include? item
170
- fail ArgumentError, "p should be in [0,1], got #{item}"
167
+ raise ArgumentError, "p should be in [0,1], got #{item}"
171
168
  end
169
+
172
170
  if size == 0
173
171
  nil
174
172
  else
@@ -222,7 +220,7 @@ module TDigest
222
220
  case format
223
221
  when VERBOSE_ENCODING
224
222
  array = bytes[start_idx..-1].unpack("d#{size}L#{size}")
225
- means, counts = array.each_slice(size).to_a if array.size > 0
223
+ means, counts = array.each_slice(size).to_a unless array.empty?
226
224
  when SMALL_ENCODING
227
225
  means = bytes[start_idx..(start_idx + 4 * size)].unpack("f#{size}")
228
226
  # Decode delta encoding of means
@@ -240,7 +238,8 @@ module TDigest
240
238
  z = 0x7f & v
241
239
  shift = 7
242
240
  while (v & 0x80) != 0
243
- fail 'Shift too large in decode' if shift > 28
241
+ raise 'Shift too large in decode' if shift > 28
242
+
244
243
  v = counts_bytes.shift || 0
245
244
  z += (v & 0x7f) << shift
246
245
  shift += 7
@@ -248,9 +247,9 @@ module TDigest
248
247
  counts << z
249
248
  end
250
249
  # This shouldn't happen
251
- fail 'Mismatch' unless counts.size == means.size
250
+ raise 'Mismatch' unless counts.size == means.size
252
251
  else
253
- fail 'Unknown compression format'
252
+ raise 'Unknown compression format'
254
253
  end
255
254
  if means && counts
256
255
  means.zip(counts).each { |val| tdigest.push(val[0], val[1]) }
@@ -277,7 +276,6 @@ module TDigest
277
276
  nearest.cumn += n
278
277
  nearest.mean_cumn += n / 2.0
279
278
  nearest.n += n
280
- @n += n
281
279
 
282
280
  nil
283
281
  end
@@ -285,11 +283,11 @@ module TDigest
285
283
  def _cumulate(exact = false, force = false)
286
284
  unless force
287
285
  factor = if @last_cumulate == 0
288
- Float::INFINITY
289
- else
290
- (@n.to_f / @last_cumulate)
291
- end
292
- return if @n == @last_cumulate || (!exact && @cx && @cx > (factor))
286
+ Float::INFINITY
287
+ else
288
+ (@n.to_f / @last_cumulate)
289
+ end
290
+ return if @n == @last_cumulate || (!exact && @cx && @cx > factor)
293
291
  end
294
292
 
295
293
  cumn = 0
@@ -311,6 +309,8 @@ module TDigest
311
309
  max = max.nil? ? nil : max[1]
312
310
  nearest = find_nearest(x)
313
311
 
312
+ @n += n
313
+
314
314
  if nearest && nearest.mean == x
315
315
  _add_weight(nearest, x, n)
316
316
  elsif nearest == min
@@ -320,7 +320,7 @@ module TDigest
320
320
  else
321
321
  p = nearest.mean_cumn.to_f / @n
322
322
  max_n = (4 * @n * @delta * p * (1 - p)).floor
323
- if (max_n - nearest.n >= n)
323
+ if max_n - nearest.n >= n
324
324
  _add_weight(nearest, x, n)
325
325
  else
326
326
  _new_centroid(x, n, nearest.cumn)
@@ -333,17 +333,14 @@ module TDigest
333
333
  # it may be due to values being inserted in sorted order.
334
334
  # We combat that by replaying the centroids in random order,
335
335
  # which is what compress! does
336
- if @centroids.size > (@k / @delta)
337
- compress!
338
- end
336
+ compress! if @centroids.size > (@k / @delta)
339
337
 
340
338
  nil
341
339
  end
342
340
 
343
341
  def _new_centroid(x, n, cumn)
344
- c = Centroid.new({ mean: x, n: n, cumn: cumn })
342
+ c = Centroid.new(x, n, cumn)
345
343
  @centroids[x] = c
346
- @n += n
347
344
  c
348
345
  end
349
346
  end
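Note on the count encoding touched by the hunks above: the `n >> 7`, `0x7f` and `0x80` fragments are consistent with a base-128 variable-length integer, where each byte carries 7 payload bits and the high bit signals that more bytes follow. The sketch below is a minimal standalone version of that technique, not the gem's code. The method names `encode_varint` and `decode_varint` are hypothetical, and the gem's own size guards (`k > 6`, `shift > 28`) are omitted.

```ruby
# Encode a non-negative integer as little-endian base-128 bytes.
# Every byte except the last has its high bit (0x80) set.
def encode_varint(n)
  raise ArgumentError, 'n must be non-negative' if n.negative?

  bytes = []
  while n > 0x7f
    bytes << ((n & 0x7f) | 0x80)
    n >>= 7
  end
  bytes << n
end

# Decode one integer from the front of a byte array, consuming the bytes it reads.
def decode_varint(bytes)
  value = 0
  shift = 0
  loop do
    byte = bytes.shift
    raise ArgumentError, 'truncated varint' if byte.nil?

    value |= (byte & 0x7f) << shift
    return value if (byte & 0x80).zero?

    shift += 7
  end
end

p encode_varint(300)      # => [172, 2]
p decode_varint([172, 2]) # => 300
```

Counts below 128 take a single byte this way, which would explain the lower end of the 5-12 byte range quoted for the compressed encoding (4 bytes for the float mean plus 1 byte for the count).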
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module TDigest
2
- VERSION = "0.1.0"
4
+ VERSION = '0.2.0'
3
5
  end
data/lib/tdigest.rb CHANGED
@@ -1,6 +1,6 @@
1
- require "tdigest/version"
2
- require "tdigest/tdigest"
1
+ # frozen_string_literal: true
3
2
 
4
3
  module TDigest
5
- # Your code goes here...
4
+ require 'tdigest/version'
5
+ require 'tdigest/tdigest'
6
6
  end
data/tdigest.gemspec CHANGED
@@ -1,29 +1,37 @@
1
- # coding: utf-8
2
- lib = File.expand_path('../lib', __FILE__)
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path('lib', __dir__)
3
4
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
5
  require 'tdigest/version'
5
6
 
7
+ java = (ENV['RUBY_PLATFORM'] == 'java')
8
+
6
9
  Gem::Specification.new do |spec|
7
- spec.name = "tdigest"
10
+ spec.name = 'tdigest'
8
11
  spec.version = TDigest::VERSION
9
- spec.authors = ["Sebastian Wallin"]
10
- spec.email = ["sebastian.wallin@gmail.com"]
12
+ spec.authors = ['Sebastian Wallin']
13
+ spec.email = ['sebastian.wallin@gmail.com']
11
14
 
12
- spec.summary = %q{Ruby implementation of Dunning's T-Digest for streaming quantile approximation}
13
- spec.description = %q{Ruby implementation of Dunning's T-Digest for streaming quantile approximation}
14
- spec.homepage = "https://github.com/castle/tdigest"
15
- spec.license = "MIT"
15
+ spec.summary = 'TDigest for Ruby'
16
+ spec.description = "Ruby implementation of Dunning's T-Digest for streaming quantile approximation"
17
+ spec.homepage = 'https://github.com/castle/tdigest'
18
+ spec.license = 'MIT'
16
19
 
17
20
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
- spec.bindir = "exe"
21
+ spec.bindir = 'exe'
19
22
  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
- spec.require_paths = ["lib"]
23
+ spec.require_paths = ['lib']
24
+ spec.platform = java ? 'java' : 'ruby'
21
25
 
22
- spec.add_runtime_dependency 'rbtree', '~> 0.4.2'
26
+ if java
27
+ spec.add_runtime_dependency 'rbtree-jruby', '~> 0.2.1'
28
+ else
29
+ spec.add_runtime_dependency 'rbtree3', '~> 0.6.0'
30
+ end
23
31
 
24
- spec.add_development_dependency 'bundler', '~> 1.10'
25
- spec.add_development_dependency 'rake', '~> 10.0'
26
- spec.add_development_dependency 'minitest', '~> 5.8.3'
32
+ spec.add_development_dependency 'bundler', '~> 2.1'
27
33
  spec.add_development_dependency 'coveralls', '~> 0.8.10'
28
- spec.add_development_dependency 'simplecov', '~> 0.11.1'
34
+ spec.add_development_dependency 'minitest', '~> 5.8'
35
+ spec.add_development_dependency 'rake', '>= 12.3.3'
36
+ spec.add_development_dependency 'simplecov', '~> 0.16.0'
29
37
  end
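Note on the dependency constraints changed in the gemspec hunk above: the `~>` (pessimistic) operator allows only the last listed segment to grow, so dropping a segment widens the allowed range. The sketch below checks a few of the constraints from the diff with RubyGems' own `Gem::Requirement`. The candidate version numbers are arbitrary examples chosen for illustration, not versions the gem is known to have been tested against.

```ruby
require 'rubygems'

old_minitest = Gem::Requirement.new('~> 5.8.3') # >= 5.8.3, < 5.9
new_minitest = Gem::Requirement.new('~> 5.8')   # >= 5.8,   < 6.0
candidate    = Gem::Version.new('5.18.0')       # arbitrary later 5.x release

old_ok = old_minitest.satisfied_by?(candidate) # => false
new_ok = new_minitest.satisfied_by?(candidate) # => true

# '~> 0.6.0' on rbtree3 stays within the 0.6.x series.
rbtree_ok = Gem::Requirement.new('~> 0.6.0')
                            .satisfied_by?(Gem::Version.new('0.7.0')) # => false

# '>= 12.3.3' on rake has no upper bound at all.
rake_ok = Gem::Requirement.new('>= 12.3.3')
                          .satisfied_by?(Gem::Version.new('13.0.6')) # => true

puts [old_ok, new_ok, rbtree_ok, rake_ok].inspect
```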
metadata CHANGED
@@ -1,99 +1,99 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: tdigest
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sebastian Wallin
8
- autorequire:
8
+ autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-01-19 00:00:00.000000000 Z
11
+ date: 2023-03-02 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
- name: rbtree
14
+ name: rbtree3
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
17
  - - "~>"
18
18
  - !ruby/object:Gem::Version
19
- version: 0.4.2
19
+ version: 0.6.0
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
24
  - - "~>"
25
25
  - !ruby/object:Gem::Version
26
- version: 0.4.2
26
+ version: 0.6.0
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: bundler
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
31
  - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.10'
33
+ version: '2.1'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.10'
40
+ version: '2.1'
41
41
  - !ruby/object:Gem::Dependency
42
- name: rake
42
+ name: coveralls
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
45
  - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '10.0'
47
+ version: 0.8.10
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
52
  - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '10.0'
54
+ version: 0.8.10
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: minitest
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
59
  - - "~>"
60
60
  - !ruby/object:Gem::Version
61
- version: 5.8.3
61
+ version: '5.8'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
66
  - - "~>"
67
67
  - !ruby/object:Gem::Version
68
- version: 5.8.3
68
+ version: '5.8'
69
69
  - !ruby/object:Gem::Dependency
70
- name: coveralls
70
+ name: rake
71
71
  requirement: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - "~>"
73
+ - - ">="
74
74
  - !ruby/object:Gem::Version
75
- version: 0.8.10
75
+ version: 12.3.3
76
76
  type: :development
77
77
  prerelease: false
78
78
  version_requirements: !ruby/object:Gem::Requirement
79
79
  requirements:
80
- - - "~>"
80
+ - - ">="
81
81
  - !ruby/object:Gem::Version
82
- version: 0.8.10
82
+ version: 12.3.3
83
83
  - !ruby/object:Gem::Dependency
84
84
  name: simplecov
85
85
  requirement: !ruby/object:Gem::Requirement
86
86
  requirements:
87
87
  - - "~>"
88
88
  - !ruby/object:Gem::Version
89
- version: 0.11.1
89
+ version: 0.16.0
90
90
  type: :development
91
91
  prerelease: false
92
92
  version_requirements: !ruby/object:Gem::Requirement
93
93
  requirements:
94
94
  - - "~>"
95
95
  - !ruby/object:Gem::Version
96
- version: 0.11.1
96
+ version: 0.16.0
97
97
  description: Ruby implementation of Dunning's T-Digest for streaming quantile approximation
98
98
  email:
99
99
  - sebastian.wallin@gmail.com
@@ -101,9 +101,9 @@ executables: []
101
101
  extensions: []
102
102
  extra_rdoc_files: []
103
103
  files:
104
+ - ".github/workflows/specs.yml"
104
105
  - ".gitignore"
105
106
  - ".ruby-version"
106
- - ".travis.yml"
107
107
  - Gemfile
108
108
  - LICENSE.txt
109
109
  - README.md
@@ -119,7 +119,7 @@ homepage: https://github.com/castle/tdigest
119
119
  licenses:
120
120
  - MIT
121
121
  metadata: {}
122
- post_install_message:
122
+ post_install_message:
123
123
  rdoc_options: []
124
124
  require_paths:
125
125
  - lib
@@ -134,9 +134,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
134
134
  - !ruby/object:Gem::Version
135
135
  version: '0'
136
136
  requirements: []
137
- rubyforge_project:
138
- rubygems_version: 2.4.5.1
139
- signing_key:
137
+ rubygems_version: 3.3.26
138
+ signing_key:
140
139
  specification_version: 4
141
- summary: Ruby implementation of Dunning's T-Digest for streaming quantile approximation
140
+ summary: TDigest for Ruby
142
141
  test_files: []
data/.travis.yml DELETED
@@ -1,6 +0,0 @@
1
- language: ruby
2
- rvm:
3
- - 1.9.3
4
- - 2.1.0
5
- - 2.2.3
6
- before_install: gem install bundler -v 1.10.6