running_stat 0.0.1

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 8ec11d286e66a056fc873beecd760b3916008ea2
+   data.tar.gz: 99915adeadbbe43c59c190a5392f46c85f438126
+ SHA512:
+   metadata.gz: dc655fd6ca351e6bf5a3d9fb4331b2e704b22cd0a3551dd914ae025ad558b2a261dad16edbf7964578fc4945a7d3b54da013fc9040e83c8fb2929483f4e1e416
+   data.tar.gz: 5ed907c0cc481824640b8df911556c509616132e991ab1a9fecf347cb54c22398132990906bff31d6a6fac7cb1d5f7848b92509486dc3b9e620043867cd6178d
data/.gitignore ADDED
@@ -0,0 +1,12 @@
+ .ruby-version
+ .ruby-gemset
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ .sw?
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --format documentation
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
+ language: ruby
+ rvm:
+   - jruby-1.7
+   - jruby-9k
+   - 1.8.7
+   - 1.9.3
+   - 2.2
+ before_install: gem install bundler -v 1.10.6
+ services:
+   - redis-server
data/CHANGELOG.md ADDED
@@ -0,0 +1,10 @@
+ # Change Log
+ All notable changes to this project will be documented in this file.
+ This project adheres to [Semantic Versioning](http://semver.org/).
+
+ ## [0.0.1] - 2016-01-11
+ ### Added
+ - Initial release
+
+ ### Changed
+ - n/a
data/Gemfile ADDED
@@ -0,0 +1,3 @@
+ source 'https://rubygems.org'
+
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Anuj Das
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,81 @@
+ # RunningStat
+
+ [![Build Status](https://travis-ci.org/anujdas/running_stat.png?branch=master)](https://travis-ci.org/anujdas/running_stat)
+
+ [![Gem Version](https://badge.fury.io/rb/running_stat.png)](http://badge.fury.io/rb/running_stat)
+
+ RunningStat provides distributed, Redis-backed data buckets whose statistics are updated live without storing the data points themselves: cardinality, average (arithmetic mean), standard deviation, and variance. Numbers (integer or float) are pushed into buckets atomically. Each bucket occupies a constant amount of space (three Redis keys), and each push or read takes constant time, regardless of how many data points the bucket has seen.
+
+ The algorithm is Welford's online method as presented in Knuth's TAOCP, and it is numerically stable; a brief writeup is available on Wikipedia under [Algorithms for calculating variance](https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm).
+
+ Note: because it relies on atomic Lua scripting (`EVAL`) and `INCRBYFLOAT`, RunningStat requires Redis 2.6 or later.
+
+
+ ## Quickstart
+
+ RunningStat is easy to use. Since it depends on Redis, you need a configured [Redis](https://github.com/redis/redis-rb) client; if your app is multithreaded, wrapping it in a [ConnectionPool](https://github.com/mperham/connection_pool) is strongly recommended (practically mandatory). By default, RunningStat uses `Redis.current`, but you can pass a client to the constructor under the `:redis` key, for example to use a separate database or a connection checked out from a pool:
+
+ ```ruby
+ stat = RunningStat.instance('my_bucket') # uses Redis.current
+ # or
+ stat = RunningStat.instance('my_bucket', redis: Redis.new(db: 1)) # uses db 1 for stats
+ # or
+ $redis_pool.with { |redis| RunningStat.new('my_bucket', redis: redis).push(42) } # use the stat only while the connection is checked out
+ ```
+
+ Now, for anything you want to measure, pick a bucket name and push your data:
+
+ ```ruby
+ stat = RunningStat.instance('my_bucket')
+ stat.push(1)
+ stat.push(100.0)
+ stat.push(-10)
+ ```
+
+ At any point, you can obtain stats about the data seen so far in a given bucket:
+
+ ```ruby
+ > stat = RunningStat.instance('my_bucket')
+ => #<RunningStat instance>
+ > stat.cardinality
+ => 3
+ > stat.mean
+ => 30.333333333333
+ > stat.std_dev
+ => 60.58327602014685
+ > stat.variance
+ => 3670.3333333333
+ ```
+
+ Reads and writes are both O(1): the running aggregates (count, mean, and sum of squared deviations) are updated on every push, so reading a statistic only does a constant amount of arithmetic on them.
+
+ Note that the sample variance, and therefore the standard deviation, is undefined for datasets of cardinality < 2; in that case `variance` and `std_dev` raise a `RunningStat::InsufficientDataError`. `cardinality` and `mean` never raise: for an empty bucket they return `0` and `0.0` respectively.
+
+
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'running_stat'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install running_stat
+
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`.
+
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
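The README's claim of numerical stability can be illustrated without Redis. The plain-Ruby sketch below is not part of the gem: `naive_variance` and `welford_variance` are hypothetical helpers that compare the textbook sum-of-squares formula with a Welford-style update like the one in the gem's `PUSH_DATUM` Lua script, on four values that share a large offset (their sample variance is 30).

```ruby
# Hypothetical helper: textbook one-pass formula (sum_sq - sum^2 / n) / (n - 1).
# It subtracts two huge, nearly equal numbers, so most significant digits cancel.
def naive_variance(values)
  n = values.size
  sum = 0.0
  sum_sq = 0.0
  values.each do |x|
    sum += x
    sum_sq += x * x
  end
  (sum_sq - sum * sum / n) / (n - 1)
end

# Hypothetical helper: Welford-style update, keeping only count, mean and
# m2 (the running sum of squared deviations from the mean).
def welford_variance(values)
  count = 0
  mean = 0.0
  m2 = 0.0
  values.each do |x|
    count += 1
    delta = x - mean          # distance from the old mean
    mean += delta / count
    m2 += delta * (x - mean)  # times distance from the new mean
  end
  m2 / (count - 1)
end

# The sample variance of [4, 7, 13, 16] is 30; shift every value by 1e9.
data = [4, 7, 13, 16].map { |a| 1e9 + a }
puts naive_variance(data)
puts welford_variance(data)
```

On this input the naive formula is dominated by rounding error (it can even return a negative variance), while the Welford update recovers 30.0. That is the reason to keep a running mean and `m2` rather than a raw sum of squares.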
data/Rakefile ADDED
@@ -0,0 +1,6 @@
+ require 'bundler/gem_tasks'
+ require 'rspec/core/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,7 @@
+ #!/usr/bin/env ruby
+
+ require 'bundler/setup'
+ require 'running_stat'
+
+ require 'irb'
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,5 @@
+ #!/bin/bash
+ set -euo pipefail
+ IFS=$'\n\t'
+
+ bundle install
data/lib/running_stat/insufficient_data_error.rb ADDED
@@ -0,0 +1,6 @@
+ class RunningStat
+   # Raised when variance or std_dev is requested with < 2 data points
+   class InsufficientDataError < RuntimeError
+     ERROR_STRING = 'Insufficient Data'.freeze
+   end
+ end
data/lib/running_stat/invalid_data_error.rb ADDED
@@ -0,0 +1,5 @@
+ class RunningStat
+   # Raised when a non-numerical value is pushed to a stat
+   class InvalidDataError < ArgumentError
+   end
+ end
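`RunningStat#push` (in `lib/running_stat.rb`) coerces its argument with Ruby's `Kernel#Float` and reports failures as `InvalidDataError`. `Float` raises `ArgumentError` for malformed strings but `TypeError` for `nil` and other non-numeric objects, so a wrapper that should always raise `InvalidDataError` needs to rescue both. A Redis-free sketch of that coercion; `coerce_datum` and the local `InvalidDataError` class are hypothetical stand-ins, not the gem's code.

```ruby
# Hypothetical stand-in for RunningStat::InvalidDataError, so this runs without the gem.
class InvalidDataError < ArgumentError; end

# Hypothetical helper mirroring the coercion in RunningStat#push.
# Kernel#Float accepts numerics and numeric strings; it raises ArgumentError
# for malformed strings and TypeError for nil, so both are rescued here.
def coerce_datum(datum)
  Float(datum)
rescue ArgumentError, TypeError => e
  raise InvalidDataError, "not a number: #{datum.inspect} (#{e.class})"
end

puts coerce_datum(1)     # prints 1.0
puts coerce_datum('2.5') # prints 2.5

[nil, 'abc'].each do |bad|
  begin
    coerce_datum(bad)
  rescue InvalidDataError => e
    puts e.message
  end
end
```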
data/lib/running_stat/lua/push_datum.rb ADDED
@@ -0,0 +1,31 @@
+ require 'digest/sha1'
+
+ class RunningStat
+   module Lua
+     # Keys:
+     #   count - the dataset cardinality
+     #   mean - the mean of the dataset so far
+     #   m2 - the running sum of squared deviations from the mean
+     # Arguments:
+     #   datum - the number to be added to the dataset
+     # Effects:
+     #   updates count, mean and m2 in place (Welford's online update); variance and std_dev are derived from them on read
+     # Returns:
+     #   nothing
+     PUSH_DATUM = <<-EOLUA
+       local count_key = KEYS[1]
+       local mean_key = KEYS[2]
+       local m2_key = KEYS[3]
+       local datum = ARGV[1]
+
+       local mean = tonumber(redis.call("GET", mean_key)) or 0.0
+       local delta = datum - mean
+
+       local count = redis.call("INCR", count_key)
+       mean = redis.call("INCRBYFLOAT", mean_key, tostring(delta / count))
+       redis.call("INCRBYFLOAT", m2_key, tostring(delta * (datum - mean)))
+     EOLUA
+
+     PUSH_DATUM_SHA1 = Digest::SHA1.hexdigest(PUSH_DATUM).freeze
+   end
+ end
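`PUSH_DATUM` is Welford's online update spread across three Redis keys. As a Redis-free illustration, the hypothetical `push_datum` below replays the same steps against a Ruby Hash standing in for one bucket's `count`, `mean` and `m2` keys. It skips the string round-trips (`tostring`, `INCRBYFLOAT`) that the real script performs, so its floats may differ from Redis's in the last digits.

```ruby
# Hypothetical stand-in: a Hash plays the role of one bucket's three Redis keys.
# Each step mirrors a line of the PUSH_DATUM Lua script.
def push_datum(store, datum)
  datum = Float(datum)
  mean = store.fetch(:mean, 0.0)                      # GET mean_key, missing -> 0.0
  delta = datum - mean                                # distance from the old mean
  count = store[:count] = store.fetch(:count, 0) + 1  # INCR count_key
  mean = store[:mean] = mean + delta / count          # INCRBYFLOAT mean_key
  # INCRBYFLOAT m2_key: old-mean delta times distance from the new mean
  store[:m2] = store.fetch(:m2, 0.0) + delta * (datum - mean)
  nil
end

store = {}
[1, 100.0, -10].each { |x| push_datum(store, x) }
p store # count 3, mean about 30.333, m2 about 7340.667
```

Dividing `m2` by `count - 1` then gives the README's variance of about 3670.33.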
data/lib/running_stat/lua/variance.rb ADDED
@@ -0,0 +1,30 @@
+ require 'digest/sha1'
+ require 'running_stat/insufficient_data_error'
+
+ class RunningStat
+   module Lua
+     # Keys:
+     #   count - the dataset cardinality
+     #   m2 - the running sum of squared deviations from the mean
+     # Arguments:
+     #   nothing
+     # Effects:
+     #   nothing (read-only)
+     # Returns:
+     #   the sample variance m2 / (count - 1) as a stringified float, or an error reply if count < 2
+     VARIANCE = <<-EOLUA
+       local count_key = KEYS[1]
+       local m2_key = KEYS[2]
+
+       local count = tonumber(redis.call("GET", count_key)) or 0
+       if count < 2 then
+         return redis.error_reply("#{InsufficientDataError::ERROR_STRING}")
+       else
+         local m2 = tonumber(redis.call("GET", m2_key)) or 0.0
+         return tostring(m2 / (count - 1))
+       end
+     EOLUA
+
+     VARIANCE_SHA1 = Digest::SHA1.hexdigest(VARIANCE).freeze
+   end
+ end
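The `VARIANCE` script only reads the aggregates: it divides `m2` by `count - 1` (Bessel's correction, which gives the sample rather than the population variance) and returns an error reply below two data points. Below is a hypothetical plain-Ruby mirror of that logic, plus the square root that `RunningStat#std_dev` applies. The local error class only stands in for `RunningStat::InsufficientDataError`.

```ruby
# Hypothetical stand-in for RunningStat::InsufficientDataError.
class InsufficientDataError < RuntimeError; end

# Mirrors the VARIANCE Lua script: sample variance m2 / (count - 1),
# refused when fewer than two data points have been pushed.
def sample_variance(count, m2)
  raise InsufficientDataError, 'Insufficient Data' if count < 2
  m2.to_f / (count - 1)
end

# Mirrors RunningStat#std_dev: the square root of the sample variance.
def sample_std_dev(count, m2)
  Math.sqrt(sample_variance(count, m2))
end

# Aggregates for the README example [1, 100.0, -10]: count = 3, m2 = 22022 / 3.
m2 = 22022.0 / 3
puts sample_variance(3, m2) # about 3670.33
puts sample_std_dev(3, m2)  # about 60.58

begin
  sample_variance(1, 0.0)
rescue InsufficientDataError => e
  puts e.message # prints Insufficient Data
end
```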
data/lib/running_stat/version.rb ADDED
@@ -0,0 +1,3 @@
+ class RunningStat
+   VERSION = '0.0.1'
+ end
data/lib/running_stat.rb ADDED
@@ -0,0 +1,78 @@
+ require 'running_stat/version'
+
+ require 'redis'
+
+ require 'running_stat/lua/push_datum'
+ require 'running_stat/lua/variance'
+ require 'running_stat/insufficient_data_error'
+ require 'running_stat/invalid_data_error'
+
+ class RunningStat
+   BASE_KEY = 'running_stat:v1'
+
+   # Returns an instance of RunningStat for the given dataset
+   def self.instance(data_bucket, opts = {})
+     new(data_bucket, opts)
+   end
+
+   def initialize(data_bucket, opts = {})
+     @data_bucket = data_bucket
+     @redis = opts[:redis]
+   end
+
+   # Adds a piece of numerical data to the dataset's stats
+   def push(datum)
+     redis.eval(Lua::PUSH_DATUM, [count_key, mean_key, sum_sq_diff_key], [Float(datum)])
+   rescue ArgumentError, TypeError => e
+     raise InvalidDataError.new(e) # datum was non-numerical (Float raises TypeError for nil)
+   end
+
+   # Returns the number of data points seen, or 0 if the stat does not exist
+   def cardinality
+     redis.get(count_key).to_i
+   end
+
+   # Returns the arithmetic mean of data points seen, or 0.0 if non-existent
+   def mean
+     redis.get(mean_key).to_f
+   end
+
+   # Returns the sample variance of the dataset so far, or raises
+   # an InsufficientDataError if insufficient data (< 2 datapoints)
+   # has been pushed
+   def variance
+     redis.eval(Lua::VARIANCE, [count_key, sum_sq_diff_key], []).to_f
+   rescue Redis::CommandError => e
+     raise InsufficientDataError.new(e) # the script's only error reply is ERROR_STRING
+   end
+
+   # Returns the standard deviation of the dataset so far, or raises
+   # an InsufficientDataError if insufficient data (< 2 datapoints)
+   # has been pushed
+   def std_dev
+     Math.sqrt(variance)
+   end
+
+   # Resets the stat to reflect an empty dataset
+   def flush
+     redis.del(count_key, mean_key, sum_sq_diff_key)
+   end
+
+   private
+
+   def redis
+     @redis || Redis.current
+   end
+
+   def count_key
+     "#{BASE_KEY}:#{@data_bucket}:count"
+   end
+
+   def mean_key
+     "#{BASE_KEY}:#{@data_bucket}:mean"
+   end
+
+   def sum_sq_diff_key
+     "#{BASE_KEY}:#{@data_bucket}:sum_sq_diff"
+   end
+ end
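`RunningStat#push` and `#variance` send the full Lua source with EVAL on every call, even though the `Lua` module already precomputes `PUSH_DATUM_SHA1` and `VARIANCE_SHA1`. A common Redis pattern is to try EVALSHA first and fall back to EVAL when the server replies `NOSCRIPT`. The sketch below shows how that could look; it is an assumption, not what the gem does. `eval_cached` and `FakeScriptClient` are hypothetical, and the fake mimics a redis-rb 3.x client's positional `(script_or_sha, keys, argv)` arguments so the snippet runs without a server.

```ruby
require 'digest/sha1'

# Hypothetical helper: run a Lua script by its SHA1 digest (EVALSHA) and fall
# back to sending the full source (EVAL) when the server replies NOSCRIPT.
# `redis` is any client with redis-rb style #evalsha and #eval methods.
def eval_cached(redis, script, sha, keys, argv)
  redis.evalsha(sha, keys, argv)
rescue StandardError => e
  raise unless e.message.start_with?('NOSCRIPT')
  redis.eval(script, keys, argv) # EVAL also adds the script to the server's cache
end

# Hypothetical fake client for illustration only: caches scripts by digest
# and raises a NOSCRIPT error for unknown digests, as a real server would.
class FakeScriptClient
  attr_reader :calls

  def initialize
    @scripts = {}
    @calls = []
  end

  def evalsha(sha, _keys, argv)
    @calls << :evalsha
    raise 'NOSCRIPT No matching script. Please use EVAL.' unless @scripts.key?(sha)
    "ran with #{argv.inspect}"
  end

  def eval(script, _keys, argv)
    @calls << :eval
    @scripts[Digest::SHA1.hexdigest(script)] = script
    "ran with #{argv.inspect}"
  end
end

script = 'return ARGV[1]'
sha = Digest::SHA1.hexdigest(script)
client = FakeScriptClient.new
2.times { eval_cached(client, script, sha, ['k'], [1.0]) }
p client.calls # prints [:evalsha, :eval, :evalsha]
```

With redis-rb the NOSCRIPT reply surfaces as a `Redis::CommandError`, which is a `StandardError`, so any other error is re-raised unchanged.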
data/running_stat.gemspec ADDED
@@ -0,0 +1,27 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'running_stat/version'
+
+ Gem::Specification.new do |spec|
+   spec.name = 'running_stat'
+   spec.version = RunningStat::VERSION
+   spec.authors = ['Anuj Das']
+   spec.email = ['anujdas@gmail.com']
+
+   spec.summary = %q{A distributed streaming mean, variance, and standard deviation metric}
+   spec.description = %q{Using redis, allows statistics calculations on a streaming set of data without storing every value}
+   spec.homepage = 'https://www.github.com/anujdas/running_stat'
+   spec.license = 'MIT'
+
+   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+   spec.bindir = 'exe'
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ['lib']
+
+   spec.add_dependency 'redis', '~> 3.0'
+
+   spec.add_development_dependency 'bundler', '~> 1.10'
+   spec.add_development_dependency 'rake', '~> 10.0'
+   spec.add_development_dependency 'rspec', '~> 3.0'
+ end
metadata ADDED
@@ -0,0 +1,118 @@
+ --- !ruby/object:Gem::Specification
+ name: running_stat
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ platform: ruby
+ authors:
+ - Anuj Das
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2016-01-11 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: redis
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.10'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.10'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ description: Using redis, allows statistics calculations on a streaming set of data
+   without storing every value
+ email:
+ - anujdas@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - ".rspec"
+ - ".travis.yml"
+ - CHANGELOG.md
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - bin/console
+ - bin/setup
+ - lib/running_stat.rb
+ - lib/running_stat/insufficient_data_error.rb
+ - lib/running_stat/invalid_data_error.rb
+ - lib/running_stat/lua/push_datum.rb
+ - lib/running_stat/lua/variance.rb
+ - lib/running_stat/version.rb
+ - running_stat.gemspec
+ homepage: https://www.github.com/anujdas/running_stat
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.8
+ signing_key:
+ specification_version: 4
+ summary: A distributed streaming mean, variance, and standard deviation metric
+ test_files: []