running_stat 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +12 -0
- data/.rspec +2 -0
- data/.travis.yml +10 -0
- data/CHANGELOG.md +10 -0
- data/Gemfile +3 -0
- data/LICENSE.txt +21 -0
- data/README.md +81 -0
- data/Rakefile +6 -0
- data/bin/console +7 -0
- data/bin/setup +5 -0
- data/lib/running_stat/insufficient_data_error.rb +6 -0
- data/lib/running_stat/invalid_data_error.rb +5 -0
- data/lib/running_stat/lua/push_datum.rb +31 -0
- data/lib/running_stat/lua/variance.rb +30 -0
- data/lib/running_stat/version.rb +3 -0
- data/lib/running_stat.rb +78 -0
- data/running_stat.gemspec +27 -0
- metadata +118 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 8ec11d286e66a056fc873beecd760b3916008ea2
+  data.tar.gz: 99915adeadbbe43c59c190a5392f46c85f438126
+SHA512:
+  metadata.gz: dc655fd6ca351e6bf5a3d9fb4331b2e704b22cd0a3551dd914ae025ad558b2a261dad16edbf7964578fc4945a7d3b54da013fc9040e83c8fb2929483f4e1e416
+  data.tar.gz: 5ed907c0cc481824640b8df911556c509616132e991ab1a9fecf347cb54c22398132990906bff31d6a6fac7cb1d5f7848b92509486dc3b9e620043867cd6178d
data/.gitignore
ADDED
data/.rspec
ADDED
data/.travis.yml
ADDED
data/CHANGELOG.md
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2015 Anuj Das
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,81 @@
+# RunningStat
+
+[![Build Status](https://travis-ci.org/anujdas/running_stat.png?branch=master)](https://travis-ci.org/anujdas/running_stat)
+
+[![Gem Version](https://badge.fury.io/rb/running_stat.png)](http://badge.fury.io/rb/running_stat)
+
+RunningStat provides distributed redis-backed data buckets on which various statistics are calculated live without storing the datapoints themselves: cardinality, average (arithmetic mean), standard deviation, and variance. Numbers (integer or float) can be pushed into buckets atomically. The space and time overhead for each metric is constant and invariant under data cardinality.
+
+The algorithm used is based on Knuth's TAOCP and is numerically stable; a brief writeup is available on Wikipedia under [Algorithms for calculating online variances](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm).
+
+Disclaimer: Because it uses atomic Lua scripting, RunningStat requires redis 2.6+.
+
+
+## Quickstart
+
+RunningStat is easy to use. Since it depends on redis, you need to have a configured [Redis](https://github.com/redis/redis-rb) instance ([ConnectionPool](https://github.com/mperham/connection_pool) wrapper highly recommended/practically mandatory if your app is multithreaded). By default, RunningStat will use `Redis.current`, but you can pass a redis instance to the constructor in the `:redis` key, for example if you're using ConnectionPool:
+
+```ruby
+stat = RunningStat.instance('my_bucket') # uses Redis.current
+# or
+stat = RunningStat.instance('my_bucket', redis: Redis.new(db: 1)) # uses db 1 for stats
+# or
+stat = $redis_pool.with { |redis| RunningStat.new('my_bucket', redis: redis) } # checks out from pool
+```
+
+Now, for anything you want to measure, pick a bucket name and push your data:
+
+```ruby
+stat = RunningStat.instance('my_bucket')
+stat.push(1)
+stat.push(100.0)
+stat.push(-10)
+```
+
+At any point, you can obtain stats about the data seen so far in a given bucket:
+
+```ruby
+> stat = RunningStat.instance('my_bucket')
+=> #<RunningStat instance>
+> stat.cardinality
+=> 3
+> stat.mean
+=> 30.333333333333
+> stat.std_dev
+=> 60.58327602014685
+> stat.variance
+=> 3670.3333333333
+```
+
+Reads and writes are both O(1); stats are calculated on insert, so reads are fast.
+
+Note that by definition, none of the statistics except cardinality are defined for datasets of cardinality < 2; you'll see a RunningStat::InsufficientDataError raised instead.
+
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'running_stat'
+```
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install running_stat
+
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`.
+
+
+## License
+
+The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
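The figures in the README's console session can be checked by hand. The following is a minimal two-pass sketch in plain Ruby, independent of the gem and of redis; the method name `two_pass_stats` is an assumption made for illustration and does not appear in the gem.

```ruby
# Two-pass check of the README example; independent of the gem and of redis.
# `two_pass_stats` is a hypothetical helper, not part of RunningStat.
def two_pass_stats(data)
  count = data.size
  mean = data.sum / count.to_f
  # Sample variance: sum of squared deviations divided by (n - 1).
  variance = data.sum { |x| (x - mean)**2 } / (count - 1)
  { count: count, mean: mean, variance: variance, std_dev: Math.sqrt(variance) }
end

stats = two_pass_stats([1, 100.0, -10])
puts stats
```

Unlike the gem, this keeps every datapoint in memory, so it is only useful as a cross-check on small inputs.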
data/Rakefile
ADDED
data/bin/console
ADDED
data/bin/setup
ADDED
data/lib/running_stat/lua/push_datum.rb
ADDED
@@ -0,0 +1,31 @@
+require 'digest/sha1'
+
+class RunningStat
+  module Lua
+    # Keys:
+    # count - the dataset cardinality
+    # mean - the mean of the dataset so far
+    # m2 - the running sum of the squares of the differences of the dataset
+    # Arguments:
+    # datum - the number to be added to the dataset
+    # Effects:
+    # calculates running stats (mean, variance, std_dev)
+    # Returns:
+    # nothing
+    PUSH_DATUM = <<-EOLUA
+      local count_key = KEYS[1]
+      local mean_key = KEYS[2]
+      local m2_key = KEYS[3]
+      local datum = ARGV[1]
+
+      local mean = tonumber(redis.call("GET", mean_key)) or 0.0
+      local delta = datum - mean
+
+      local count = redis.call("INCR", count_key)
+      mean = redis.call("INCRBYFLOAT", mean_key, tostring(delta / count))
+      redis.call("INCRBYFLOAT", m2_key, tostring(delta * (datum - mean)))
+    EOLUA
+
+    PUSH_DATUM_SHA1 = Digest::SHA1.hexdigest(PUSH_DATUM).freeze
+  end
+end
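The update performed by `PUSH_DATUM` can be traced without redis. The sketch below mirrors the script's steps in plain Ruby, with a Hash standing in for the three redis keys; the `push_datum` method and the `state` layout are assumptions for illustration only.

```ruby
# In-memory mirror of the PUSH_DATUM steps. A Hash stands in for the three
# redis keys; `push_datum` and `state` are hypothetical, not part of the gem.
def push_datum(state, datum)
  datum = Float(datum)
  delta = datum - state[:mean]                   # distance from the old mean
  state[:count] += 1                             # INCR count
  state[:mean] += delta / state[:count]          # INCRBYFLOAT mean
  state[:m2] += delta * (datum - state[:mean])   # INCRBYFLOAT m2, using the new mean
  state
end

state = { count: 0, mean: 0.0, m2: 0.0 }
[1, 100.0, -10].each { |x| push_datum(state, x) }
puts state
```

Each push touches only the three stored values, which is why the cost per datum stays constant regardless of how much data has been seen.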
data/lib/running_stat/lua/variance.rb
ADDED
@@ -0,0 +1,30 @@
+require 'digest/sha1'
+require 'running_stat/insufficient_data_error'
+
+class RunningStat
+  module Lua
+    # Keys:
+    # count - the dataset cardinality
+    # m2 - the running sum of the squares of the differences of the dataset
+    # Arguments:
+    # nothing
+    # Effects:
+    # nothing
+    # Returns:
+    # the sample variance of the dataset, as a stringified float
+    VARIANCE = <<-EOLUA
+      local count_key = KEYS[1]
+      local m2_key = KEYS[2]
+
+      local count = tonumber(redis.call("GET", count_key)) or 0
+      if count < 2 then
+        return redis.error_reply("#{InsufficientDataError::ERROR_STRING}")
+      else
+        local m2 = tonumber(redis.call("GET", m2_key)) or 0.0
+        return tostring(m2 / (count - 1))
+      end
+    EOLUA
+
+    VARIANCE_SHA1 = Digest::SHA1.hexdigest(VARIANCE).freeze
+  end
+end
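The read path in `VARIANCE` divides the running sum of squared differences by `count - 1` and refuses to answer for fewer than two datapoints. A plain-Ruby sketch of the same rule follows; `SketchInsufficientDataError` is a hypothetical stand-in for the gem's own error class, whose source is not shown in this diff.

```ruby
# Hypothetical stand-in for the gem's InsufficientDataError (source not shown here).
class SketchInsufficientDataError < StandardError; end

# Sample variance from the two stored values, with the same guard as the Lua script.
def sample_variance(count, m2)
  raise SketchInsufficientDataError, 'need at least 2 datapoints' if count < 2

  m2 / (count - 1).to_f
end

puts sample_variance(3, 7340.666666666667)
```

Dividing by `count - 1` rather than `count` gives the sample variance, which matches the comment in the Lua source.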
data/lib/running_stat.rb
ADDED
@@ -0,0 +1,78 @@
+require 'running_stat/version'
+
+require 'redis'
+
+require 'running_stat/lua/push_datum'
+require 'running_stat/lua/variance'
+require 'running_stat/insufficient_data_error'
+require 'running_stat/invalid_data_error'
+
+class RunningStat
+  BASE_KEY = 'running_stat:v1'
+
+  # Returns an instance of RunningStat for the given dataset
+  def self.instance(data_bucket, opts = {})
+    new(data_bucket, opts)
+  end
+
+  def initialize(data_bucket, opts = {})
+    @data_bucket = data_bucket
+    @redis = opts[:redis]
+  end
+
+  # Adds a piece of numerical data to the dataset's stats
+  def push(datum)
+    redis.eval(Lua::PUSH_DATUM, [count_key, mean_key, sum_sq_diff_key], [Float(datum)])
+  rescue ArgumentError => e
+    raise InvalidDataError.new(e) # datum was non-numerical
+  end
+
+  # Returns the number of data points seen, or 0 if the stat does not exist
+  def cardinality
+    redis.get(count_key).to_i
+  end
+
+  # Returns the arithmetic mean of data points seen, or 0.0 if non-existent
+  def mean
+    redis.get(mean_key).to_f
+  end
+
+  # Returns the sample variance of the dataset so far, or raises
+  # an InsufficientDataError if insufficient data (< 2 datapoints)
+  # has been pushed
+  def variance
+    redis.eval(Lua::VARIANCE, [count_key, sum_sq_diff_key], []).to_f
+  rescue Redis::CommandError => e
+    raise InsufficientDataError.new(e) # only CommandError possible
+  end
+
+  # Returns the standard deviation of the dataset so far, or raises
+  # an InsufficientDataError if insufficient data (< 2 datapoints)
+  # has been pushed
+  def std_dev
+    Math.sqrt(variance)
+  end
+
+  # Resets the stat to reflect an empty dataset
+  def flush
+    redis.del(count_key, mean_key, sum_sq_diff_key)
+  end
+
+  private
+
+  def redis
+    @redis || Redis.current
+  end
+
+  def count_key
+    "#{BASE_KEY}:#{@data_bucket}:count"
+  end
+
+  def mean_key
+    "#{BASE_KEY}:#{@data_bucket}:mean"
+  end
+
+  def sum_sq_diff_key
+    "#{BASE_KEY}:#{@data_bucket}:sum_sq_diff"
+  end
+end
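`push` relies on `Kernel#Float` to validate its argument and rescues only `ArgumentError`. The sketch below shows how `Float` treats a few kinds of input; the helper name `float_outcome` is an assumption for illustration. One consequence worth noting is that `nil` raises `TypeError`, which the `rescue` in `push` as written would not convert into an `InvalidDataError`.

```ruby
# How Kernel#Float treats the kinds of input `push` might receive.
# `float_outcome` is a hypothetical helper, not part of the gem.
def float_outcome(datum)
  Float(datum)
rescue ArgumentError, TypeError => e
  e.class # report which exception class Float raised
end

[3, '12.5', 'abc', nil].each do |datum|
  puts "#{datum.inspect} -> #{float_outcome(datum).inspect}"
end
```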
data/running_stat.gemspec
ADDED
@@ -0,0 +1,27 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'running_stat/version'
+
+Gem::Specification.new do |spec|
+  spec.name = 'running_stat'
+  spec.version = RunningStat::VERSION
+  spec.authors = ['Anuj Das']
+  spec.email = ['anujdas@gmail.com']
+
+  spec.summary = %q{A distributed streaming mean, variance, and standard deviation metric}
+  spec.description = %q{Using redis, allows statistics calculations on a streaming set of data without storing every value}
+  spec.homepage = 'https://www.github.com/anujdas/running_stat'
+  spec.license = 'MIT'
+
+  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir = 'exe'
+  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ['lib']
+
+  spec.add_dependency 'redis', '~> 3.0'
+
+  spec.add_development_dependency 'bundler', '~> 1.10'
+  spec.add_development_dependency 'rake', '~> 10.0'
+  spec.add_development_dependency 'rspec', '~> 3.0'
+end
metadata
ADDED
@@ -0,0 +1,118 @@
+--- !ruby/object:Gem::Specification
+name: running_stat
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+platform: ruby
+authors:
+- Anuj Das
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2016-01-11 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: redis
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+description: Using redis, allows statistics calculations on a streaming set of data
+  without storing every value
+email:
+- anujdas@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".rspec"
+- ".travis.yml"
+- CHANGELOG.md
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- lib/running_stat.rb
+- lib/running_stat/insufficient_data_error.rb
+- lib/running_stat/invalid_data_error.rb
+- lib/running_stat/lua/push_datum.rb
+- lib/running_stat/lua/variance.rb
+- lib/running_stat/version.rb
+- running_stat.gemspec
+homepage: https://www.github.com/anujdas/running_stat
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.8
+signing_key:
+specification_version: 4
+summary: A distributed streaming mean, variance, and standard deviation metric
+test_files: []