anomaly 0.1.0 → 0.2.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 301e930624aaaf4c39f66458571f05b9e09cd06015c7d2682444a93d0424f2a7
4
+ data.tar.gz: 38841ca328edf7405bf0ae7b631538485ca7e226416d03b6977591897847bb3c
5
+ SHA512:
6
+ metadata.gz: 26ccb948b275b2a2787e1368abe909e7ef9f78793743cf14065b6be830861c1def870a49d862ba466995fbdea4a6a012479cc51fcc4c2f9ea99ff341ad4ecd87
7
+ data.tar.gz: a4bc731756ce68306d681433cc251af40e06cf1bd1ac21edd21ebfd7b8ef45ab05cffe3e557b2b731354bb377ade915f6e0e0f8b9b4fd945217bac3a278240c0
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
1
+ # 0.2.0
2
+
3
+ - Switched to Ruby `sum` for performance
4
+ - Added support for Numo::NArray
5
+ - Use keyword arguments
6
+
7
+ # 0.1.0
8
+
9
+ - Started changelog
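The "Use keyword arguments" entry refers to the constructor and `train` moving from an options hash to an `eps:` keyword. The following is a minimal sketch of that calling convention. `SketchDetector` is a hypothetical stand-in written for this note, not the gem's `Anomaly::Detector`, and it skips all training logic.

```ruby
# Hypothetical stand-in for illustration only; not the gem's Detector class.
class SketchDetector
  attr_reader :eps

  # 0.2.0-style signature: options arrive as keywords and are forwarded as-is.
  def initialize(examples = nil, **opts)
    train(examples, **opts) if examples
  end

  # eps defaults to 0, the value the detector uses to mean "search for eps".
  def train(examples, eps: 0)
    @examples = examples.to_a
    @eps = eps
  end
end

explicit = SketchDetector.new([[85, 42, 12.3, 0]], eps: 0.01)
default  = SketchDetector.new([[85, 42, 12.3, 0]])

puts explicit.eps
puts default.eps
```

With this signature a misspelled keyword such as `epsilon:` raises `ArgumentError`, whereas the old options hash would have silently ignored the unknown key.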
data/README.md CHANGED
@@ -2,31 +2,19 @@
2
2
 
3
3
  Easy-to-use anomaly detection
4
4
 
5
+ [![Build Status](https://travis-ci.org/ankane/anomaly.svg?branch=master)](https://travis-ci.org/ankane/anomaly)
6
+
5
7
  ## Installation
6
8
 
7
- Add this line to your application's Gemfile:
9
+ Add this line to your application's Gemfile:
8
10
 
9
11
  ```ruby
10
12
  gem "anomaly"
11
13
  ```
12
14
 
13
- And then execute:
14
-
15
- ```sh
16
- bundle install
17
- ```
18
-
19
- For max performance (trains ~3x faster for large datasets), also install the NArray gem:
20
-
21
- ```ruby
22
- gem "narray"
23
- ```
24
-
25
- Anomaly will automatically detect it and use it.
26
-
27
- ## How to Use
15
+ ## Getting Started
28
16
 
29
- Say we have weather data and we want to predict if it's sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
17
+ Say we have weather data and we want to predict if it's sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
30
18
 
31
19
  ```ruby
32
20
  # [temperature(°F), humidity(%), pressure(in), sunny?(y=0, n=1)]
@@ -44,54 +32,53 @@ The last column **must** be 0 for non-anomalies, 1 for anomalies. Non-anomalies
44
32
  To train the detector and test for anomalies, run:
45
33
 
46
34
  ```ruby
47
- ad = Anomaly::Detector.new(weather_data)
35
+ detector = Anomaly::Detector.new(weather_data)
48
36
 
49
37
  # 85°F, 42% humidity, 12.3 in. pressure
50
- ad.anomaly?([85, 42, 12.3])
51
- # => true
38
+ detector.anomaly?([85, 42, 12.3])
52
39
  ```
53
40
 
54
41
  Anomaly automatically finds the best value for ε, which you can access with:
55
42
 
56
43
  ```ruby
57
- ad.eps
44
+ detector.eps
58
45
  ```
59
46
 
60
47
  If you already know you want ε = 0.01, initialize the detector with:
61
48
 
62
49
  ```ruby
63
- ad = Anomaly::Detector.new(weather_data, {:eps => 0.01})
50
+ detector = Anomaly::Detector.new(weather_data, eps: 0.01)
64
51
  ```
65
52
 
66
- ### Persistence
53
+ ## Persistence
67
54
 
68
- You can easily persist the detector to a file or database - it's very tiny.
55
+ You can easily persist the detector to a file or database - it's very tiny.
69
56
 
70
57
  ```ruby
71
- serialized_ad = Marshal.dump(ad)
72
-
73
- # Save to a file
74
- File.open("anomaly_detector.dump", "w") {|f| f.write(serialized_ad) }
58
+ dump = Marshal.dump(detector)
59
+ File.binwrite("detector.dump", dump)
60
+ ```
75
61
 
76
- # ...
62
+ Then read it later:
77
63
 
78
- # Read it later
79
- ad2 = Marshal.load(File.open("anomaly_detector.dump", "r").read)
64
+ ```ruby
65
+ dump = File.binread("detector.dump")
66
+ detector = Marshal.load(dump)
80
67
  ```
81
68
 
82
- ## TODO
69
+ ## Credits
70
+
71
+ A special thanks to [Andrew Ng](http://www.ml-class.org).
83
72
 
84
- - Train in chunks (for very large datasets)
85
- - Multivariate normal distribution (possibly)
73
+ ## History
86
74
 
87
- ## Contributing
75
+ View the [changelog](https://github.com/ankane/anomaly/blob/master/CHANGELOG.md)
88
76
 
89
- 1. Fork it
90
- 2. Create your feature branch (`git checkout -b my-new-feature`)
91
- 3. Commit your changes (`git commit -am 'Added some feature'`)
92
- 4. Push to the branch (`git push origin my-new-feature`)
93
- 5. Create new Pull Request
77
+ ## Contributing
94
78
 
95
- ## Thanks
79
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
96
80
 
97
- A special thanks to [Andrew Ng](http://www.ml-class.org).
81
+ - [Report bugs](https://github.com/ankane/anomaly/issues)
82
+ - Fix bugs and [submit pull requests](https://github.com/ankane/anomaly/pulls)
83
+ - Write, clarify, or fix documentation
84
+ - Suggest or add new features
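The Persistence section of the README now writes and reads the dump with `File.binwrite` and `File.binread` instead of text-mode `File.open`. A minimal sketch of that round trip is below. `SketchModel` is a hypothetical Struct standing in for a trained detector, and the values in it are made up. It only assumes the detector's state is plain Ruby data, as the "Convert these to an Array for Marshal.dump" comments in the code suggest.

```ruby
require "tmpdir"

# Hypothetical stand-in holding the kind of state a trained detector keeps:
# per-column mean and std as plain Arrays, plus eps. Values are made up.
SketchModel = Struct.new(:mean, :std, :eps)

model = SketchModel.new([0.0, 0.0], [1.0, 2.0], 0.01)

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, "detector.dump")

  # Marshal output is binary, so write and read it in binary mode.
  File.binwrite(path, Marshal.dump(model))
  Marshal.load(File.binread(path))
end

puts restored == model
```

Binary mode avoids newline or encoding translation of the dump on platforms where text mode alters bytes. `Marshal.load` should only be given data from a trusted source, since loading a crafted dump can run arbitrary code.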
data/lib/anomaly/detector.rb CHANGED
@@ -1,13 +1,18 @@
1
1
  module Anomaly
2
2
  class Detector
3
+ attr_reader :mean, :std
3
4
  attr_accessor :eps
4
5
 
5
- def initialize(examples = nil, opts = {})
6
+ def initialize(examples = nil, **opts)
6
7
  @m = 0
7
- train(examples, opts) if examples
8
+ train(examples, **opts) if examples
8
9
  end
9
10
 
10
- def train(examples, opts = {})
11
+ def train(examples, eps: 0)
12
+ # for Numo::NArray
13
+ # TODO make more efficient when possible
14
+ examples = examples.to_a
15
+
11
16
  raise "No examples" if examples.empty?
12
17
  raise "Must have at least two columns" if examples.first.size < 2
13
18
 
@@ -24,7 +29,7 @@ module Anomaly
24
29
 
25
30
  raise "Must have at least one non-anomaly" if non_anomalies.empty?
26
31
 
27
- @eps = (opts[:eps] || 0).to_f
32
+ @eps = eps
28
33
  if @eps > 0
29
34
  # Use all non-anomalies to train.
30
35
  training_examples = non_anomalies
@@ -33,28 +38,33 @@ module Anomaly
33
38
  test_examples.concat(anomalies)
34
39
  end
35
40
  # Remove last column.
36
- training_examples = training_examples.map{|e| e[0..-2]}
41
+ training_examples = training_examples.map { |e| e[0..-2] }
37
42
  @m = training_examples.size
38
43
  @n = training_examples.first.size
39
44
 
40
- if defined?(NMatrix)
45
+ if defined?(Numo::SFloat)
46
+ training_examples = Numo::SFloat.cast(training_examples)
47
+ # Convert these to an Array for Marshal.dump
48
+ @mean = training_examples.mean(0).to_a
49
+ @std = training_examples.stddev(0).to_a
50
+ elsif defined?(NMatrix)
41
51
  training_examples = NMatrix.to_na(training_examples)
42
52
  # Convert these to an Array for Marshal.dump
43
53
  @mean = training_examples.mean(1).to_a
44
54
  @std = training_examples.stddev(1).to_a
45
55
  else
46
56
  # Default to Array, since built-in Matrix does not give us a big performance advantage.
47
- cols = @n.times.map{|i| training_examples.map{|r| r[i]}}
48
- @mean = cols.map{|c| mean(c)}
49
- @std = cols.each_with_index.map{|c,i| std(c, @mean[i])}
57
+ cols = @n.times.map { |i| training_examples.map { |r| r[i] } }
58
+ @mean = cols.map { |c| alt_mean(c) }
59
+ @std = cols.each_with_index.map { |c, i| alt_std(c, @mean[i]) }
50
60
  end
51
- @std.map!{|std| (std == 0 or std.nan?) ? Float::MIN : std}
61
+ @std.map! { |std| (std == 0 || std.nan?) ? Float::MIN : std }
52
62
 
53
63
  if @eps == 0
54
64
  # Find the best eps.
55
- epss = (1..9).map{|i| [1,3,5,7,9].map{|j| (j*10**(-i)).to_f }}.flatten
56
- f1_scores = epss.map{|eps| [eps, compute_f1_score(test_examples, eps)] }
57
- @eps, best_f1 = f1_scores.max_by{|v| v[1]}
65
+ epss = (1..9).map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }.flatten
66
+ f1_scores = epss.map { |eps| [eps, compute_f1_score(test_examples, eps)] }
67
+ @eps, _ = f1_scores.max_by { |v| v[1] }
58
68
  end
59
69
  end
60
70
 
@@ -69,7 +79,7 @@ module Anomaly
69
79
  raise ArgumentError, "First argument must have #{@n} elements" if x.size != @n
70
80
  @n.times.map do |i|
71
81
  p = normal_pdf(x[i], @mean[i], @std[i])
72
- (p.nan? or p > 1) ? 1 : p
82
+ (p.nan? || p > 1) ? 1 : p
73
83
  end.reduce(1, :*)
74
84
  end
75
85
 
@@ -79,10 +89,10 @@ module Anomaly
79
89
 
80
90
  protected
81
91
 
82
- SQRT2PI = Math.sqrt(2*Math::PI)
92
+ SQRT2PI = Math.sqrt(2 * Math::PI)
83
93
 
84
94
  def normal_pdf(x, mean = 0, std = 1)
85
- 1/(SQRT2PI*std)*Math.exp(-((x - mean)**2/(2.0*(std**2))))
95
+ 1 / (SQRT2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * (std**2))))
86
96
  end
87
97
 
88
98
  # Find best eps.
@@ -99,8 +109,8 @@ module Anomaly
99
109
  fn = 0
100
110
  examples.each do |example|
101
111
  act = example.last != 0
102
- pred = self.anomaly?(example[0..-2], eps)
103
- if act and pred
112
+ pred = anomaly?(example[0..-2], eps)
113
+ if act && pred
104
114
  tp += 1
105
115
  elsif pred # and !act
106
116
  fp += 1
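The hunk above tallies true and false positives for one candidate ε, and `train` picks the candidate with the highest F1 score. The body of `compute_f1_score` is not fully shown in this diff, so the sketch below uses the standard F1 definition, 2·tp / (2·tp + fp + fn), as an assumption. `sketch_f1`, the zero guard, and the tallies are hypothetical and exist only to show the selection step.

```ruby
# Candidate grid as written in train: 1, 3, 5, 7, 9 times 10^-1 through 10^-9.
candidates = (1..9).flat_map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }

# Hypothetical helper: standard F1 from the tallies. The zero guard is an
# assumption of this sketch, not taken from the gem.
def sketch_f1(tp, fp, fn)
  denom = 2 * tp + fp + fn
  denom.zero? ? 0.0 : 2.0 * tp / denom
end

# Made-up [tp, fp, fn] tallies for three candidates, for illustration only.
tallies = {
  0.1   => [8, 6, 0],
  0.01  => [7, 1, 1],
  0.001 => [3, 0, 5]
}

scores = tallies.map { |eps, (tp, fp, fn)| [eps, sketch_f1(tp, fp, fn)] }
best_eps, best_f1 = scores.max_by { |pair| pair[1] }

puts candidates.size
puts best_eps
puts best_f1
```

`max_by` returns the first element among equal maxima, so when several candidates tie, the one listed earliest in the grid is kept.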
@@ -120,13 +130,12 @@ module Anomaly
120
130
 
121
131
  # Not used for NArray
122
132
 
123
- def mean(x)
124
- x.inject(0.0){|a, i| a + i}/x.size
133
+ def alt_mean(x)
134
+ x.sum / x.size
125
135
  end
126
136
 
127
- def std(x, mean)
128
- Math.sqrt(x.inject(0.0){|a, i| a + (i - mean) ** 2}/(x.size - 1))
137
+ def alt_std(x, mean)
138
+ Math.sqrt(x.sum { |i| (i - mean)**2 }.to_f / (x.size - 1))
129
139
  end
130
-
131
140
  end
132
141
  end
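The Array fallback above (`alt_mean`, `alt_std`) and `normal_pdf` can be checked by hand against the numbers in the removed spec, which notes `mean = [0, 0], std = [1, 2]` for the examples `[[-1,-2,0],[0,0,0],[1,2,0]]`. The stand-alone sketch below uses hypothetical `sketch_*` helpers rather than the gem's protected methods. One deliberate difference: it calls `to_f` before dividing in the mean, because `Integer / Integer` truncates in Ruby and `x.sum / x.size` would do so for all-integer columns.

```ruby
# Hypothetical helpers mirroring the Array fallback in the diff.
# The to_f in sketch_mean is this sketch's choice to avoid integer division.
def sketch_mean(x)
  x.sum.to_f / x.size
end

# Sample standard deviation (divides by n - 1), as in the diff.
def sketch_std(x, mean)
  Math.sqrt(x.sum { |i| (i - mean)**2 }.to_f / (x.size - 1))
end

def sketch_normal_pdf(x, mean, std)
  1 / (Math.sqrt(2 * Math::PI) * std) * Math.exp(-((x - mean)**2 / (2.0 * std**2)))
end

# Feature columns of the rows [-1, -2], [0, 0], [1, 2] (label column removed).
cols = [[-1, 0, 1], [-2, 0, 2]]
means = cols.map { |c| sketch_mean(c) }
stds = cols.each_with_index.map { |c, i| sketch_std(c, means[i]) }

# Joint probability under independent per-feature normals; each factor is
# capped at 1, as in the probability method shown in the diff.
point = [0, 0]
probability = point.each_index.map do |i|
  p = sketch_normal_pdf(point[i], means[i], stds[i])
  (p.nan? || p > 1) ? 1 : p
end.reduce(1, :*)

puts means.inspect
puts stds.inspect
puts probability
```

For this input the product equals 1 / (4π), which matches the `0.0795774715...` value asserted in the removed spec up to floating-point rounding.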
data/lib/anomaly/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Anomaly
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
metadata CHANGED
@@ -1,98 +1,117 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: anomaly
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
5
- prerelease:
4
+ version: 0.2.0
6
5
  platform: ruby
7
6
  authors:
8
7
  - Andrew Kane
9
8
  autorequire:
10
9
  bindir: bin
11
10
  cert_chain: []
12
- date: 2011-12-19 00:00:00.000000000Z
11
+ date: 2019-10-28 00:00:00.000000000 Z
13
12
  dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
14
27
  - !ruby/object:Gem::Dependency
15
28
  name: rake
16
- requirement: &2155813680 !ruby/object:Gem::Requirement
17
- none: false
29
+ requirement: !ruby/object:Gem::Requirement
18
30
  requirements:
19
- - - ! '>='
31
+ - - ">="
20
32
  - !ruby/object:Gem::Version
21
33
  version: '0'
22
34
  type: :development
23
35
  prerelease: false
24
- version_requirements: *2155813680
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
25
41
  - !ruby/object:Gem::Dependency
26
42
  name: rspec
27
- requirement: &2155813180 !ruby/object:Gem::Requirement
28
- none: false
43
+ requirement: !ruby/object:Gem::Requirement
29
44
  requirements:
30
- - - ! '>='
45
+ - - ">="
31
46
  - !ruby/object:Gem::Version
32
- version: 2.0.0
47
+ version: '2'
33
48
  type: :development
34
49
  prerelease: false
35
- version_requirements: *2155813180
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '2'
36
55
  - !ruby/object:Gem::Dependency
37
56
  name: narray
38
- requirement: &2155812760 !ruby/object:Gem::Requirement
39
- none: false
57
+ requirement: !ruby/object:Gem::Requirement
40
58
  requirements:
41
- - - ! '>='
59
+ - - ">="
42
60
  - !ruby/object:Gem::Version
43
61
  version: '0'
44
62
  type: :development
45
63
  prerelease: false
46
- version_requirements: *2155812760
47
- description: Easy-to-use anomaly detection
48
- email:
49
- - andrew@getformidable.com
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: numo-narray
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ description:
84
+ email: andrew@chartkick.com
50
85
  executables: []
51
86
  extensions: []
52
87
  extra_rdoc_files: []
53
88
  files:
54
- - .gitignore
55
- - .rspec
56
- - Gemfile
57
- - LICENSE
89
+ - CHANGELOG.md
58
90
  - README.md
59
- - Rakefile
60
- - anomaly.gemspec
61
91
  - lib/anomaly.rb
62
92
  - lib/anomaly/detector.rb
63
93
  - lib/anomaly/version.rb
64
- - spec/anomaly/detector_spec.rb
65
- - spec/spec_helper.rb
66
94
  homepage: https://github.com/ankane/anomaly
67
- licenses: []
95
+ licenses:
96
+ - MIT
97
+ metadata: {}
68
98
  post_install_message:
69
99
  rdoc_options: []
70
100
  require_paths:
71
101
  - lib
72
102
  required_ruby_version: !ruby/object:Gem::Requirement
73
- none: false
74
103
  requirements:
75
- - - ! '>='
104
+ - - ">="
76
105
  - !ruby/object:Gem::Version
77
- version: '0'
78
- segments:
79
- - 0
80
- hash: 1886385059125072633
106
+ version: '2.4'
81
107
  required_rubygems_version: !ruby/object:Gem::Requirement
82
- none: false
83
108
  requirements:
84
- - - ! '>='
109
+ - - ">="
85
110
  - !ruby/object:Gem::Version
86
111
  version: '0'
87
- segments:
88
- - 0
89
- hash: 1886385059125072633
90
112
  requirements: []
91
- rubyforge_project:
92
- rubygems_version: 1.8.11
113
+ rubygems_version: 3.0.3
93
114
  signing_key:
94
- specification_version: 3
115
+ specification_version: 4
95
116
  summary: Easy-to-use anomaly detection
96
- test_files:
97
- - spec/anomaly/detector_spec.rb
98
- - spec/spec_helper.rb
117
+ test_files: []
data/.gitignore DELETED
@@ -1,17 +0,0 @@
1
- *.gem
2
- *.rbc
3
- .bundle
4
- .config
5
- .yardoc
6
- Gemfile.lock
7
- InstalledFiles
8
- _yardoc
9
- coverage
10
- doc/
11
- lib/bundler/man
12
- pkg
13
- rdoc
14
- spec/reports
15
- test/tmp
16
- test/version_tmp
17
- tmp
data/.rspec DELETED
@@ -1 +0,0 @@
1
- --color
data/Gemfile DELETED
@@ -1,4 +0,0 @@
1
- source 'https://rubygems.org'
2
-
3
- # Specify your gem's dependencies in anomaly.gemspec
4
- gemspec
data/LICENSE DELETED
@@ -1,22 +0,0 @@
1
- Copyright (c) 2011 Andrew Kane
2
-
3
- MIT License
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining
6
- a copy of this software and associated documentation files (the
7
- "Software"), to deal in the Software without restriction, including
8
- without limitation the rights to use, copy, modify, merge, publish,
9
- distribute, sublicense, and/or sell copies of the Software, and to
10
- permit persons to whom the Software is furnished to do so, subject to
11
- the following conditions:
12
-
13
- The above copyright notice and this permission notice shall be
14
- included in all copies or substantial portions of the Software.
15
-
16
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile DELETED
@@ -1,25 +0,0 @@
1
- #!/usr/bin/env rake
2
- require "bundler/gem_tasks"
3
- require "rspec/core/rake_task"
4
- RSpec::Core::RakeTask.new("spec")
5
-
6
- require "benchmark"
7
- require "anomaly"
8
-
9
- task :benchmark do
10
- examples = 1_000_000.times.map{ [rand, rand, rand, 0] }
11
-
12
- Benchmark.bm do |x|
13
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
14
- require "narray"
15
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
16
- end
17
- end
18
-
19
- task :random_examples do
20
- examples = 10_000.times.map{ [rand, rand(10), rand(100), 0] } +
21
- 100.times.map{ [rand + 1, rand(10) + 2, rand(100) + 20, 1] }
22
-
23
- ad = Anomaly::Detector.new(examples)
24
- puts ad.eps
25
- end
data/anomaly.gemspec DELETED
@@ -1,21 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- require File.expand_path('../lib/anomaly/version', __FILE__)
3
-
4
- Gem::Specification.new do |gem|
5
- gem.authors = ["Andrew Kane"]
6
- gem.email = ["andrew@getformidable.com"]
7
- gem.description = %q{Easy-to-use anomaly detection}
8
- gem.summary = %q{Easy-to-use anomaly detection}
9
- gem.homepage = "https://github.com/ankane/anomaly"
10
-
11
- gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
12
- gem.files = `git ls-files`.split("\n")
13
- gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
14
- gem.name = "anomaly"
15
- gem.require_paths = ["lib"]
16
- gem.version = Anomaly::VERSION
17
-
18
- gem.add_development_dependency "rake"
19
- gem.add_development_dependency "rspec", ">= 2.0.0"
20
- gem.add_development_dependency "narray"
21
- end
data/spec/anomaly/detector_spec.rb DELETED
@@ -1,71 +0,0 @@
1
- require "spec_helper"
2
-
3
- describe Anomaly::Detector do
4
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
5
- let(:ad) { Anomaly::Detector.new(examples) }
6
-
7
- # mean = [0, 0], std = [1, 2]
8
- it "computes the right probability" do
9
- ad.probability([0,0]).should == 0.079577471545947667
10
- end
11
-
12
- it "marshalizes" do
13
- expect{ Marshal.dump(ad) }.to_not raise_error
14
- end
15
-
16
- context "when standard deviation is 0" do
17
- let(:examples) { [[0,0],[0,0]] }
18
-
19
- it "returns infinity for mean" do
20
- ad.probability([0]).should == 1
21
- end
22
-
23
- it "returns 0 for not mean" do
24
- ad.probability([1]).should == 0
25
- end
26
- end
27
-
28
- context "when examples is an array" do
29
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
30
- let(:sample) { [rand, rand] }
31
-
32
- it "returns the same probability as an NMatrix" do
33
- prob = ad.probability(sample)
34
- Object.send(:remove_const, :NMatrix)
35
- prob.should == Anomaly::Detector.new(examples).probability(sample)
36
- end
37
- end
38
-
39
- context "when lots of samples" do
40
- let(:examples) { m.times.map{[0,0]} }
41
- let(:m) { rand(100) + 1 }
42
-
43
- it { ad.trained?.should be_true }
44
- end
45
-
46
- context "when no samples" do
47
- let(:examples) { nil }
48
-
49
- it { ad.trained?.should be_false }
50
- end
51
-
52
- context "when pdf is greater than 1" do
53
- let(:examples) { 100.times.map{[0,0]}.push([1,0]) }
54
-
55
- it { ad.probability([0]).should == 1 }
56
- end
57
-
58
- context "when only anomalies" do
59
- let(:examples) { [[0,1]] }
60
-
61
- it "raises error" do
62
- expect{ ad }.to raise_error RuntimeError, "Must have at least one non-anomaly"
63
- end
64
- end
65
-
66
- context "when only one non-anomaly" do
67
- let(:examples) { [[0,0]] }
68
-
69
- it { ad.eps.should == 1e-1 }
70
- end
71
- end
data/spec/spec_helper.rb DELETED
@@ -1,8 +0,0 @@
1
- require "rubygems"
2
- require "bundler/setup"
3
-
4
- require "anomaly"
5
- require "narray"
6
-
7
- RSpec.configure do |config|
8
- end