anomaly 0.1.0 → 0.3.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 7cf7957fa3d0c9ddf4fb5dbc59c6b153ae0122d6777dffc4b088156a34b44c04
4
+ data.tar.gz: a4130a661e44bf2363ef024c8e5296803cab479d9bc69cff7661a5d702feccb7
5
+ SHA512:
6
+ metadata.gz: 3c13a829fdc03e54b408b7bfc8ab0ea5ccb6b7092a12146ee092071ccf81e73da278b7910b0dd44d846b2a3619580567da5c2116968d0d73b4bd13d5df414c1c
7
+ data.tar.gz: e7b7aff28acbaf2c9c3a95d62ccf8c10829689896a98e32a5f85ce22114754d42eeb0e5ccf70bbf49a4fda850fadcb367747caabc41b944db5e82bfd7686f351
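`checksums.yaml` records SHA-256 and SHA-512 digests of the two archives packed inside the `.gem` file, `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking one of them with Ruby's standard `digest` library. It assumes the gem has already been unpacked. A `.gem` is a plain tar archive, so `tar -xf anomaly-0.3.0.gem` should work, although that file name is an assumption. The helper `checksum_matches?` is hypothetical.

```ruby
require "digest/sha2"

# Hypothetical helper: returns true when the file's hex digest equals the value
# copied from checksums.yaml. algorithm is :SHA256 or :SHA512.
def checksum_matches?(path, expected_hex, algorithm: :SHA256)
  digest_class = Digest.const_get(algorithm)
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage, assuming data.tar.gz was extracted from the .gem into the current directory.
if File.exist?("data.tar.gz")
  puts checksum_matches?(
    "data.tar.gz",
    "a4130a661e44bf2363ef024c8e5296803cab479d9bc69cff7661a5d702feccb7"
  )
end
```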
data/CHANGELOG.md ADDED
@@ -0,0 +1,18 @@
1
+ ## 0.3.0 (2022-09-05)
2
+
3
+ - Dropped support for `narray` (use `numo-narray` instead)
4
+ - Dropped support for Ruby < 2.7
5
+
6
+ ## 0.2.1 (2020-04-16)
7
+
8
+ - Added support for multiple predictions
9
+
10
+ ## 0.2.0 (2019-10-27)
11
+
12
+ - Switched to Ruby `sum` for performance
13
+ - Added support for Numo::NArray
14
+ - Use keyword arguments
15
+
16
+ ## 0.1.0 (2011-12-18)
17
+
18
+ - Started changelog
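The 0.2.0 entry "Use keyword arguments" corresponds to the `initialize(examples = nil, **opts)` and `train(examples, eps: 0)` signatures in `detector.rb`. The sketch below shows what that means for callers. It uses a hypothetical stand-in method with the same parameter list, so it runs without the gem. On Ruby 3.0 and later, the 0.1.0 style of passing an explicit hash in braces is treated as an extra positional argument and raises `ArgumentError`. The keyword form `eps: 0.01` is accepted.

```ruby
# Hypothetical stand-in mirroring Anomaly::Detector#initialize in 0.3.0:
# one optional positional argument plus keyword options.
def new_detector(examples = nil, **opts)
  { examples: examples, opts: opts }
end

data = [[1.0, 0], [2.0, 0]]

# 0.2.0+ style: eps is passed as a keyword argument.
p new_detector(data, eps: 0.01)

# 0.1.0 style: an explicit hash in braces. On Ruby 3.0+ this is a second
# positional argument, which the signature does not accept.
begin
  new_detector(data, { eps: 0.01 })
rescue ArgumentError => e
  puts "old-style call rejected: #{e.message}"
end
```

`train` declares only `eps:`. As a result, a misspelled option such as `epsilon: 0.01` is now rejected with an `ArgumentError` for an unknown keyword. The old `opts[:eps]` lookup silently ignored such options.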
data/LICENSE.txt RENAMED (was data/LICENSE)
@@ -1,4 +1,4 @@
1
- Copyright (c) 2011 Andrew Kane
1
+ Copyright (c) 2011-2022 Andrew Kane
2
2
 
3
3
  MIT License
4
4
 
@@ -19,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
19
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
20
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
21
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,32 +1,20 @@
1
1
  # Anomaly
2
2
 
3
- Easy-to-use anomaly detection
3
+ Easy-to-use anomaly detection for Ruby
4
+
5
+ [![Build Status](https://github.com/ankane/anomaly/workflows/build/badge.svg?branch=master)](https://github.com/ankane/anomaly/actions)
4
6
 
5
7
  ## Installation
6
8
 
7
- Add this line to your application's Gemfile:
9
+ Add this line to your application’s Gemfile:
8
10
 
9
11
  ```ruby
10
12
  gem "anomaly"
11
13
  ```
12
14
 
13
- And then execute:
15
+ ## Getting Started
14
16
 
15
- ```sh
16
- bundle install
17
- ```
18
-
19
- For max performance (trains ~3x faster for large datasets), also install the NArray gem:
20
-
21
- ```ruby
22
- gem "narray"
23
- ```
24
-
25
- Anomaly will automatically detect it and use it.
26
-
27
- ## How to Use
28
-
29
- Say we have weather data and we want to predict if it's sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
17
+ Say we have weather data and we want to predict if it’s sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
30
18
 
31
19
  ```ruby
32
20
  # [temperature(°F), humidity(%), pressure(in), sunny?(y=0, n=1)]
@@ -41,57 +29,78 @@ weather_data = [
41
29
 
42
30
  The last column **must** be 0 for non-anomalies, 1 for anomalies. Non-anomalies are used to train the detector, and both anomalies and non-anomalies are used to find the best value of ε.
43
31
 
44
- To train the detector and test for anomalies, run:
32
+ Train the detector
45
33
 
46
34
  ```ruby
47
- ad = Anomaly::Detector.new(weather_data)
35
+ detector = Anomaly::Detector.new(weather_data)
36
+ ```
37
+
38
+ Test for anomalies
48
39
 
40
+ ```ruby
49
41
  # 85°F, 42% humidity, 12.3 in. pressure
50
- ad.anomaly?([85, 42, 12.3])
51
- # => true
42
+ detector.anomaly?([85, 42, 12.3])
52
43
  ```
53
44
 
54
45
  Anomaly automatically finds the best value for ε, which you can access with:
55
46
 
56
47
  ```ruby
57
- ad.eps
48
+ detector.eps
58
49
  ```
59
50
 
60
51
  If you already know you want ε = 0.01, initialize the detector with:
61
52
 
62
53
  ```ruby
63
- ad = Anomaly::Detector.new(weather_data, {:eps => 0.01})
54
+ detector = Anomaly::Detector.new(weather_data, eps: 0.01)
64
55
  ```
65
56
 
66
- ### Persistence
57
+ ## Persistence
67
58
 
68
- You can easily persist the detector to a file or database - it's very tiny.
59
+ You can easily persist the detector to a file or database - its very tiny.
69
60
 
70
61
  ```ruby
71
- serialized_ad = Marshal.dump(ad)
72
-
73
- # Save to a file
74
- File.open("anomaly_detector.dump", "w") {|f| f.write(serialized_ad) }
62
+ bin = Marshal.dump(detector)
63
+ File.binwrite("detector.bin", bin)
64
+ ```
75
65
 
76
- # ...
66
+ Then read it later:
77
67
 
78
- # Read it later
79
- ad2 = Marshal.load(File.open("anomaly_detector.dump", "r").read)
68
+ ```ruby
69
+ bin = File.binread("detector.bin")
70
+ detector = Marshal.load(bin)
80
71
  ```
81
72
 
82
- ## TODO
73
+ ## Related Projects
74
+
75
+ - [AnomalyDetection.rb](https://github.com/ankane/AnomalyDetection.rb) - Time series anomaly detection for Ruby
76
+ - [Prophet.rb](https://github.com/ankane/prophet-ruby) - Time series forecasting (and anomaly detection) for Ruby
77
+ - [IsoTree](https://github.com/ankane/isotree-ruby) - Outlier/anomaly detection for Ruby using Isolation Forest
78
+ - [OutlierTree](https://github.com/ankane/outliertree-ruby) - Explainable outlier/anomaly detection for Ruby
79
+ - [MIDAS](https://github.com/ankane/midas-ruby) - Edge stream anomaly detection for Ruby
80
+ - [Trend](https://github.com/ankane/trend-ruby) - Anomaly detection and forecasting for Ruby
81
+
82
+ ## Credits
83
83
 
84
- - Train in chunks (for very large datasets)
85
- - Multivariate normal distribution (possibly)
84
+ A special thanks to [Andrew Ng](https://www.coursera.org/learn/machine-learning).
85
+
86
+ ## History
87
+
88
+ View the [changelog](https://github.com/ankane/anomaly/blob/master/CHANGELOG.md)
86
89
 
87
90
  ## Contributing
88
91
 
89
- 1. Fork it
90
- 2. Create your feature branch (`git checkout -b my-new-feature`)
91
- 3. Commit your changes (`git commit -am 'Added some feature'`)
92
- 4. Push to the branch (`git push origin my-new-feature`)
93
- 5. Create new Pull Request
92
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
93
+
94
+ - [Report bugs](https://github.com/ankane/anomaly/issues)
95
+ - Fix bugs and [submit pull requests](https://github.com/ankane/anomaly/pulls)
96
+ - Write, clarify, or fix documentation
97
+ - Suggest or add new features
94
98
 
95
- ## Thanks
99
+ To get started with development:
96
100
 
97
- A special thanks to [Andrew Ng](http://www.ml-class.org).
101
+ ```sh
102
+ git clone https://github.com/ankane/anomaly.git
103
+ cd anomaly
104
+ bundle install
105
+ bundle exec rake spec
106
+ ```
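The README says Anomaly finds the best ε automatically from the labeled examples. The sketch below shows that kind of search, assuming the probabilities for the labeled examples have already been computed. It reuses the threshold grid from `train` in `detector.rb` and scores each candidate with the textbook F1 (precision/recall) definition. The helper names `f1_score` and `best_eps` are hypothetical. The body of the gem's `compute_f1_score` is only partly visible in this diff, so this sketch is not a copy of it.

```ruby
# Threshold grid built the same way as in Detector#train:
# 0.1, 0.3, 0.5, 0.7, 0.9, 0.01, ..., 9.0e-09 (45 candidates).
EPS_GRID = (1..9).map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }.flatten

# Textbook F1 for the rule "probability < eps means anomaly".
# probs: probabilities for labeled examples; labels: 1 = anomaly, 0 = normal.
def f1_score(probs, labels, eps)
  tp = fp = fn = 0
  probs.zip(labels).each do |prob, label|
    predicted = prob < eps
    actual = label != 0
    if predicted && actual
      tp += 1
    elsif predicted
      fp += 1
    elsif actual
      fn += 1
    end
  end
  return 0.0 if tp.zero?

  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  2 * precision * recall / (precision + recall)
end

# Return a grid value with the highest F1 score.
def best_eps(probs, labels)
  EPS_GRID.max_by { |eps| f1_score(probs, labels, eps) }
end

# Usage with made-up probabilities: two normal examples, two anomalies.
probs = [0.5, 0.4, 0.002, 0.0005]
labels = [0, 0, 1, 1]
eps = best_eps(probs, labels)
puts "eps = #{eps}, F1 = #{f1_score(probs, labels, eps)}"
```

Ties are common on a coarse grid. Here every candidate above 0.002 and up to 0.4 scores F1 = 1, so which tied value is returned depends on the grid order and `max_by`.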
data/lib/anomaly/detector.rb CHANGED
@@ -1,13 +1,18 @@
1
1
  module Anomaly
2
2
  class Detector
3
+ attr_reader :mean, :std
3
4
  attr_accessor :eps
4
5
 
5
- def initialize(examples = nil, opts = {})
6
+ def initialize(examples = nil, **opts)
6
7
  @m = 0
7
- train(examples, opts) if examples
8
+ train(examples, **opts) if examples
8
9
  end
9
10
 
10
- def train(examples, opts = {})
11
+ def train(examples, eps: 0)
12
+ # for Numo::NArray
13
+ # TODO make more efficient when possible
14
+ examples = examples.to_a
15
+
11
16
  raise "No examples" if examples.empty?
12
17
  raise "Must have at least two columns" if examples.first.size < 2
13
18
 
@@ -24,7 +29,7 @@ module Anomaly
24
29
 
25
30
  raise "Must have at least one non-anomaly" if non_anomalies.empty?
26
31
 
27
- @eps = (opts[:eps] || 0).to_f
32
+ @eps = eps
28
33
  if @eps > 0
29
34
  # Use all non-anomalies to train.
30
35
  training_examples = non_anomalies
@@ -33,28 +38,33 @@ module Anomaly
33
38
  test_examples.concat(anomalies)
34
39
  end
35
40
  # Remove last column.
36
- training_examples = training_examples.map{|e| e[0..-2]}
41
+ training_examples = training_examples.map { |e| e[0..-2] }
37
42
  @m = training_examples.size
38
43
  @n = training_examples.first.size
39
44
 
40
- if defined?(NMatrix)
45
+ if defined?(Numo::SFloat)
46
+ training_examples = Numo::SFloat.cast(training_examples)
47
+ # Convert these to an Array for Marshal.dump
48
+ @mean = training_examples.mean(0).to_a
49
+ @std = training_examples.stddev(0).to_a
50
+ elsif defined?(NMatrix)
41
51
  training_examples = NMatrix.to_na(training_examples)
42
52
  # Convert these to an Array for Marshal.dump
43
53
  @mean = training_examples.mean(1).to_a
44
54
  @std = training_examples.stddev(1).to_a
45
55
  else
46
56
  # Default to Array, since built-in Matrix does not give us a big performance advantage.
47
- cols = @n.times.map{|i| training_examples.map{|r| r[i]}}
48
- @mean = cols.map{|c| mean(c)}
49
- @std = cols.each_with_index.map{|c,i| std(c, @mean[i])}
57
+ cols = @n.times.map { |i| training_examples.map { |r| r[i] } }
58
+ @mean = cols.map { |c| alt_mean(c) }
59
+ @std = cols.each_with_index.map { |c, i| alt_std(c, @mean[i]) }
50
60
  end
51
- @std.map!{|std| (std == 0 or std.nan?) ? Float::MIN : std}
61
+ @std.map! { |std| (std == 0 || std.nan?) ? 1e-10 : std }
52
62
 
53
63
  if @eps == 0
54
64
  # Find the best eps.
55
- epss = (1..9).map{|i| [1,3,5,7,9].map{|j| (j*10**(-i)).to_f }}.flatten
56
- f1_scores = epss.map{|eps| [eps, compute_f1_score(test_examples, eps)] }
57
- @eps, best_f1 = f1_scores.max_by{|v| v[1]}
65
+ epss = (1..9).map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }.flatten
66
+ f1_scores = epss.map { |eps| [eps, compute_f1_score(test_examples, eps)] }
67
+ @eps, _ = f1_scores.max_by { |v| v[1] }
58
68
  end
59
69
  end
60
70
 
@@ -64,25 +74,44 @@ module Anomaly
64
74
 
65
75
  # Limit the probability of features to [0,1]
66
76
  # to keep probabilities at same scale.
77
+ # Use log to prevent underflow
67
78
  def probability(x)
68
79
  raise "Train me first" unless trained?
69
- raise ArgumentError, "First argument must have #{@n} elements" if x.size != @n
70
- @n.times.map do |i|
71
- p = normal_pdf(x[i], @mean[i], @std[i])
72
- (p.nan? or p > 1) ? 1 : p
73
- end.reduce(1, :*)
80
+
81
+ singular = !x.first.is_a?(Array)
82
+ x = [x] if singular
83
+
84
+ y =
85
+ x.map do |xi|
86
+ prob = 0
87
+ @n.times.map do |i|
88
+ pi = normal_pdf(xi[i], @mean[i], @std[i])
89
+ prob += Math.log(pi > 1 ? 1 : pi)
90
+ end
91
+ Math.exp(prob)
92
+ end
93
+
94
+ singular ? y.first : y
74
95
  end
75
96
 
76
97
  def anomaly?(x, eps = @eps)
77
- probability(x) < eps
98
+ y = probability(x)
99
+
100
+ if y.is_a?(Array)
101
+ y.map do |yi|
102
+ yi < eps
103
+ end
104
+ else
105
+ y < eps
106
+ end
78
107
  end
79
108
 
80
109
  protected
81
110
 
82
- SQRT2PI = Math.sqrt(2*Math::PI)
111
+ SQRT2PI = Math.sqrt(2 * Math::PI)
83
112
 
84
113
  def normal_pdf(x, mean = 0, std = 1)
85
- 1/(SQRT2PI*std)*Math.exp(-((x - mean)**2/(2.0*(std**2))))
114
+ 1 / (SQRT2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * (std**2))))
86
115
  end
87
116
 
88
117
  # Find best eps.
@@ -99,8 +128,8 @@ module Anomaly
99
128
  fn = 0
100
129
  examples.each do |example|
101
130
  act = example.last != 0
102
- pred = self.anomaly?(example[0..-2], eps)
103
- if act and pred
131
+ pred = anomaly?(example[0..-2], eps)
132
+ if act && pred
104
133
  tp += 1
105
134
  elsif pred # and !act
106
135
  fp += 1
@@ -120,13 +149,12 @@ module Anomaly
120
149
 
121
150
  # Not used for NArray
122
151
 
123
- def mean(x)
124
- x.inject(0.0){|a, i| a + i}/x.size
152
+ def alt_mean(x)
153
+ x.sum / x.size
125
154
  end
126
155
 
127
- def std(x, mean)
128
- Math.sqrt(x.inject(0.0){|a, i| a + (i - mean) ** 2}/(x.size - 1))
156
+ def alt_std(x, mean)
157
+ Math.sqrt(x.sum { |i| (i - mean)**2 }.to_f / (x.size - 1))
129
158
  end
130
-
131
159
  end
132
160
  end
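The new `probability` sums per-feature log densities instead of multiplying densities. It also accepts either one row or an array of rows, which matches the 0.2.1 "multiple predictions" entry. Below is a minimal, self-contained sketch of the same computation that runs without the gem. `fit` and `probability` are hypothetical free functions, not the gem's API. The sketch converts the column sum to Float before dividing. As far as this diff shows, `alt_mean` computes `x.sum / x.size`, which is Integer division when every value in a column is an Integer and Numo is not loaded.

```ruby
SQRT2PI = Math.sqrt(2 * Math::PI)

# Gaussian density, same formula as normal_pdf in detector.rb.
def normal_pdf(x, mean, std)
  1 / (SQRT2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * std**2)))
end

# Per-feature mean and sample standard deviation (n - 1 denominator).
# A zero or NaN std is replaced with 1e-10, as in the 0.3.0 code.
def fit(rows)
  cols = rows.transpose
  mean = cols.map { |c| c.sum.to_f / c.size } # .to_f avoids Integer division
  std = cols.each_with_index.map do |c, i|
    s = Math.sqrt(c.sum { |v| (v - mean[i])**2 }.to_f / (c.size - 1))
    (s == 0 || s.nan?) ? 1e-10 : s
  end
  [mean, std]
end

# Accepts one row or an array of rows. Each density is capped at 1,
# the logs are summed, and the sum is exponentiated back.
def probability(x, mean, std)
  singular = !x.first.is_a?(Array)
  rows = singular ? [x] : x
  probs = rows.map do |row|
    log_p = (0...row.size).reduce(0.0) do |acc, i|
      acc + Math.log([normal_pdf(row[i], mean[i], std[i]), 1.0].min)
    end
    Math.exp(log_p) # a density of 0.0 gives log -Infinity, so exp gives 0.0
  end
  singular ? probs.first : probs
end

mean, std = fit([[-1, -2], [0, 0], [1, 2]]) # mean [0.0, 0.0], std [1.0, 2.0]
p probability([0, 0], mean, std)
p probability([[0, 0], [100, 100]], mean, std)
```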
data/lib/anomaly/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Anomaly
2
- VERSION = "0.1.0"
2
+ VERSION = "0.3.0"
3
3
  end
metadata CHANGED
@@ -1,98 +1,48 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: anomaly
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
5
- prerelease:
4
+ version: 0.3.0
6
5
  platform: ruby
7
6
  authors:
8
7
  - Andrew Kane
9
- autorequire:
8
+ autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2011-12-19 00:00:00.000000000Z
13
- dependencies:
14
- - !ruby/object:Gem::Dependency
15
- name: rake
16
- requirement: &2155813680 !ruby/object:Gem::Requirement
17
- none: false
18
- requirements:
19
- - - ! '>='
20
- - !ruby/object:Gem::Version
21
- version: '0'
22
- type: :development
23
- prerelease: false
24
- version_requirements: *2155813680
25
- - !ruby/object:Gem::Dependency
26
- name: rspec
27
- requirement: &2155813180 !ruby/object:Gem::Requirement
28
- none: false
29
- requirements:
30
- - - ! '>='
31
- - !ruby/object:Gem::Version
32
- version: 2.0.0
33
- type: :development
34
- prerelease: false
35
- version_requirements: *2155813180
36
- - !ruby/object:Gem::Dependency
37
- name: narray
38
- requirement: &2155812760 !ruby/object:Gem::Requirement
39
- none: false
40
- requirements:
41
- - - ! '>='
42
- - !ruby/object:Gem::Version
43
- version: '0'
44
- type: :development
45
- prerelease: false
46
- version_requirements: *2155812760
47
- description: Easy-to-use anomaly detection
48
- email:
49
- - andrew@getformidable.com
11
+ date: 2022-09-05 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description:
14
+ email: andrew@ankane.org
50
15
  executables: []
51
16
  extensions: []
52
17
  extra_rdoc_files: []
53
18
  files:
54
- - .gitignore
55
- - .rspec
56
- - Gemfile
57
- - LICENSE
19
+ - CHANGELOG.md
20
+ - LICENSE.txt
58
21
  - README.md
59
- - Rakefile
60
- - anomaly.gemspec
61
22
  - lib/anomaly.rb
62
23
  - lib/anomaly/detector.rb
63
24
  - lib/anomaly/version.rb
64
- - spec/anomaly/detector_spec.rb
65
- - spec/spec_helper.rb
66
25
  homepage: https://github.com/ankane/anomaly
67
- licenses: []
68
- post_install_message:
26
+ licenses:
27
+ - MIT
28
+ metadata: {}
29
+ post_install_message:
69
30
  rdoc_options: []
70
31
  require_paths:
71
32
  - lib
72
33
  required_ruby_version: !ruby/object:Gem::Requirement
73
- none: false
74
34
  requirements:
75
- - - ! '>='
35
+ - - ">="
76
36
  - !ruby/object:Gem::Version
77
- version: '0'
78
- segments:
79
- - 0
80
- hash: 1886385059125072633
37
+ version: '2.7'
81
38
  required_rubygems_version: !ruby/object:Gem::Requirement
82
- none: false
83
39
  requirements:
84
- - - ! '>='
40
+ - - ">="
85
41
  - !ruby/object:Gem::Version
86
42
  version: '0'
87
- segments:
88
- - 0
89
- hash: 1886385059125072633
90
43
  requirements: []
91
- rubyforge_project:
92
- rubygems_version: 1.8.11
93
- signing_key:
94
- specification_version: 3
95
- summary: Easy-to-use anomaly detection
96
- test_files:
97
- - spec/anomaly/detector_spec.rb
98
- - spec/spec_helper.rb
44
+ rubygems_version: 3.3.7
45
+ signing_key:
46
+ specification_version: 4
47
+ summary: Easy-to-use anomaly detection for Ruby
48
+ test_files: []
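The `required_ruby_version` change from `'0'` to `'2.7'` (with the `>=` operator) is how the gem declares the CHANGELOG's "Dropped support for Ruby < 2.7" to RubyGems. The small sketch below performs the same comparison by hand with RubyGems' `Gem::Requirement` and `Gem::Version` classes, for example so a script can fail fast. The requirement string is copied from the metadata above.

```ruby
require "rubygems"

# Requirement copied from required_ruby_version in the gem metadata.
RUBY_REQUIREMENT = Gem::Requirement.new(">= 2.7")

current = Gem::Version.new(RUBY_VERSION)
if RUBY_REQUIREMENT.satisfied_by?(current)
  puts "Ruby #{RUBY_VERSION} meets anomaly 0.3.0's requirement (#{RUBY_REQUIREMENT})"
else
  warn "anomaly 0.3.0 requires Ruby #{RUBY_REQUIREMENT}; running #{RUBY_VERSION}"
end
```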
data/.gitignore DELETED
@@ -1,17 +0,0 @@
1
- *.gem
2
- *.rbc
3
- .bundle
4
- .config
5
- .yardoc
6
- Gemfile.lock
7
- InstalledFiles
8
- _yardoc
9
- coverage
10
- doc/
11
- lib/bundler/man
12
- pkg
13
- rdoc
14
- spec/reports
15
- test/tmp
16
- test/version_tmp
17
- tmp
data/.rspec DELETED
@@ -1 +0,0 @@
1
- --color
data/Gemfile DELETED
@@ -1,4 +0,0 @@
1
- source 'https://rubygems.org'
2
-
3
- # Specify your gem's dependencies in anomaly.gemspec
4
- gemspec
data/Rakefile DELETED
@@ -1,25 +0,0 @@
1
- #!/usr/bin/env rake
2
- require "bundler/gem_tasks"
3
- require "rspec/core/rake_task"
4
- RSpec::Core::RakeTask.new("spec")
5
-
6
- require "benchmark"
7
- require "anomaly"
8
-
9
- task :benchmark do
10
- examples = 1_000_000.times.map{ [rand, rand, rand, 0] }
11
-
12
- Benchmark.bm do |x|
13
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
14
- require "narray"
15
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
16
- end
17
- end
18
-
19
- task :random_examples do
20
- examples = 10_000.times.map{ [rand, rand(10), rand(100), 0] } +
21
- 100.times.map{ [rand + 1, rand(10) + 2, rand(100) + 20, 1] }
22
-
23
- ad = Anomaly::Detector.new(examples)
24
- puts ad.eps
25
- end
data/anomaly.gemspec DELETED
@@ -1,21 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- require File.expand_path('../lib/anomaly/version', __FILE__)
3
-
4
- Gem::Specification.new do |gem|
5
- gem.authors = ["Andrew Kane"]
6
- gem.email = ["andrew@getformidable.com"]
7
- gem.description = %q{Easy-to-use anomaly detection}
8
- gem.summary = %q{Easy-to-use anomaly detection}
9
- gem.homepage = "https://github.com/ankane/anomaly"
10
-
11
- gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
12
- gem.files = `git ls-files`.split("\n")
13
- gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
14
- gem.name = "anomaly"
15
- gem.require_paths = ["lib"]
16
- gem.version = Anomaly::VERSION
17
-
18
- gem.add_development_dependency "rake"
19
- gem.add_development_dependency "rspec", ">= 2.0.0"
20
- gem.add_development_dependency "narray"
21
- end
data/spec/anomaly/detector_spec.rb DELETED
@@ -1,71 +0,0 @@
1
- require "spec_helper"
2
-
3
- describe Anomaly::Detector do
4
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
5
- let(:ad) { Anomaly::Detector.new(examples) }
6
-
7
- # mean = [0, 0], std = [1, 2]
8
- it "computes the right probability" do
9
- ad.probability([0,0]).should == 0.079577471545947667
10
- end
11
-
12
- it "marshalizes" do
13
- expect{ Marshal.dump(ad) }.to_not raise_error
14
- end
15
-
16
- context "when standard deviation is 0" do
17
- let(:examples) { [[0,0],[0,0]] }
18
-
19
- it "returns infinity for mean" do
20
- ad.probability([0]).should == 1
21
- end
22
-
23
- it "returns 0 for not mean" do
24
- ad.probability([1]).should == 0
25
- end
26
- end
27
-
28
- context "when examples is an array" do
29
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
30
- let(:sample) { [rand, rand] }
31
-
32
- it "returns the same probability as an NMatrix" do
33
- prob = ad.probability(sample)
34
- Object.send(:remove_const, :NMatrix)
35
- prob.should == Anomaly::Detector.new(examples).probability(sample)
36
- end
37
- end
38
-
39
- context "when lots of samples" do
40
- let(:examples) { m.times.map{[0,0]} }
41
- let(:m) { rand(100) + 1 }
42
-
43
- it { ad.trained?.should be_true }
44
- end
45
-
46
- context "when no samples" do
47
- let(:examples) { nil }
48
-
49
- it { ad.trained?.should be_false }
50
- end
51
-
52
- context "when pdf is greater than 1" do
53
- let(:examples) { 100.times.map{[0,0]}.push([1,0]) }
54
-
55
- it { ad.probability([0]).should == 1 }
56
- end
57
-
58
- context "when only anomalies" do
59
- let(:examples) { [[0,1]] }
60
-
61
- it "raises error" do
62
- expect{ ad }.to raise_error RuntimeError, "Must have at least one non-anomaly"
63
- end
64
- end
65
-
66
- context "when only one non-anomaly" do
67
- let(:examples) { [[0,0]] }
68
-
69
- it { ad.eps.should == 1e-1 }
70
- end
71
- end
data/spec/spec_helper.rb DELETED
@@ -1,8 +0,0 @@
1
- require "rubygems"
2
- require "bundler/setup"
3
-
4
- require "anomaly"
5
- require "narray"
6
-
7
- RSpec.configure do |config|
8
- end
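The deleted spec asserted `ad.probability([0,0]).should == 0.079577471545947667` for the examples `[[-1,-2,0],[0,0,0],[1,2,0]]`, with the comment `mean = [0, 0], std = [1, 2]`. Under those stated values, the expected number is the product of two normal densities evaluated at their means: 1/(√(2π)·1) · 1/(√(2π)·2) = 1/(4π). The sketch below reproduces that value as plain Ruby, without RSpec. It compares with a tolerance rather than `==`, because summing logs and exponentiating (as the 0.3.0 `probability` does) need not reproduce the last bits of a direct product.

```ruby
SQRT2PI = Math.sqrt(2 * Math::PI)

# Same density formula as normal_pdf in detector.rb.
def normal_pdf(x, mean, std)
  1 / (SQRT2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * std**2)))
end

mean = [0.0, 0.0] # values stated in the deleted spec's comment
std = [1.0, 2.0]
x = [0, 0]

densities = x.each_index.map { |i| normal_pdf(x[i], mean[i], std[i]) }
direct = densities.reduce(1, :*)                       # 0.1.0 style: product
via_logs = Math.exp(densities.sum { |d| Math.log(d) }) # 0.3.0 style: sum of logs
expected = 1 / (4 * Math::PI)                          # closed form, about 0.0795775

tolerance = 1e-12
puts((direct - expected).abs < tolerance)
puts((via_logs - expected).abs < tolerance)
```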