anomaly 0.1.0 → 0.2.0

checksums.yaml.gz ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: 301e930624aaaf4c39f66458571f05b9e09cd06015c7d2682444a93d0424f2a7
+ data.tar.gz: 38841ca328edf7405bf0ae7b631538485ca7e226416d03b6977591897847bb3c
+ SHA512:
+ metadata.gz: 26ccb948b275b2a2787e1368abe909e7ef9f78793743cf14065b6be830861c1def870a49d862ba466995fbdea4a6a012479cc51fcc4c2f9ea99ff341ad4ecd87
+ data.tar.gz: a4bc731756ce68306d681433cc251af40e06cf1bd1ac21edd21ebfd7b8ef45ab05cffe3e557b2b731354bb377ade915f6e0e0f8b9b4fd945217bac3a278240c0
data/CHANGELOG.md ADDED
@@ -0,0 +1,9 @@
+ # 0.2.0
+
+ - Switched to Ruby `sum` for performance
+ - Added support for Numo::NArray
+ - Use keyword arguments
+
+ # 0.1.0
+
+ - Started changelog
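The "Use keyword arguments" entry changes how options such as `eps` are passed. A minimal sketch of the difference, using two hypothetical stand-in methods (`train_with_hash` and `train_with_keywords` are illustrative names, not part of the gem):

```ruby
# Hypothetical stand-ins for the two calling conventions; not the gem's code.

# 0.1.0 style: options arrive as a trailing Hash.
def train_with_hash(examples, opts = {})
  (opts[:eps] || 0).to_f
end

# 0.2.0 style: options are declared as keyword arguments.
def train_with_keywords(examples, eps: 0)
  eps
end

rows = [[1, 0], [2, 0]]
old_style = train_with_hash(rows, { eps: 0.01 })
new_style = train_with_keywords(rows, eps: 0.01)

# A misspelled option is silently ignored by the Hash form...
ignored = train_with_hash(rows, { epsilon: 0.01 })

# ...but rejected with ArgumentError by the keyword form.
rejected = begin
  train_with_keywords(rows, epsilon: 0.01)
  false
rescue ArgumentError
  true
end
```

One practical consequence is that typos in option names surface as errors instead of falling back to the default.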
data/README.md CHANGED
@@ -2,31 +2,19 @@
 
  Easy-to-use anomaly detection
 
+ [![Build Status](https://travis-ci.org/ankane/anomaly.svg?branch=master)](https://travis-ci.org/ankane/anomaly)
+
  ## Installation
 
- Add this line to your application's Gemfile:
+ Add this line to your application’s Gemfile:
 
  ```ruby
  gem "anomaly"
  ```
 
- And then execute:
-
- ```sh
- bundle install
- ```
-
- For max performance (trains ~3x faster for large datasets), also install the NArray gem:
-
- ```ruby
- gem "narray"
- ```
-
- Anomaly will automatically detect it and use it.
-
- ## How to Use
+ ## Getting Started
 
- Say we have weather data and we want to predict if it's sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
+ Say we have weather data and we want to predict if it’s sunny. In this example, sunny days are non-anomalies, and days with other types of weather (rain, snow, etc.) are anomalies. The data looks like:
 
  ```ruby
  # [temperature(°F), humidity(%), pressure(in), sunny?(y=0, n=1)]
@@ -44,54 +32,53 @@ The last column **must** be 0 for non-anomalies, 1 for anomalies. Non-anomalies
  To train the detector and test for anomalies, run:
 
  ```ruby
- ad = Anomaly::Detector.new(weather_data)
+ detector = Anomaly::Detector.new(weather_data)
 
  # 85°F, 42% humidity, 12.3 in. pressure
- ad.anomaly?([85, 42, 12.3])
- # => true
+ detector.anomaly?([85, 42, 12.3])
  ```
 
  Anomaly automatically finds the best value for ε, which you can access with:
 
  ```ruby
- ad.eps
+ detector.eps
  ```
 
  If you already know you want ε = 0.01, initialize the detector with:
 
  ```ruby
- ad = Anomaly::Detector.new(weather_data, {:eps => 0.01})
+ detector = Anomaly::Detector.new(weather_data, eps: 0.01)
  ```
 
- ### Persistence
+ ## Persistence
 
- You can easily persist the detector to a file or database - it's very tiny.
+ You can easily persist the detector to a file or database - it’s very tiny.
 
  ```ruby
- serialized_ad = Marshal.dump(ad)
-
- # Save to a file
- File.open("anomaly_detector.dump", "w") {|f| f.write(serialized_ad) }
+ dump = Marshal.dump(detector)
+ File.binwrite("detector.dump", dump)
+ ```
 
- # ...
+ Then read it later:
 
- # Read it later
- ad2 = Marshal.load(File.open("anomaly_detector.dump", "r").read)
+ ```ruby
+ dump = File.binread("detector.dump")
+ detector = Marshal.load(dump)
  ```
 
- ## TODO
+ ## Credits
+
+ A special thanks to [Andrew Ng](http://www.ml-class.org).
 
- - Train in chunks (for very large datasets)
- - Multivariate normal distribution (possibly)
+ ## History
 
- ## Contributing
+ View the [changelog](https://github.com/ankane/anomaly/blob/master/CHANGELOG.md)
 
- 1. Fork it
- 2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Added some feature'`)
- 4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
+ ## Contributing
 
- ## Thanks
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
 
- A special thanks to [Andrew Ng](http://www.ml-class.org).
+ - [Report bugs](https://github.com/ankane/anomaly/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/anomaly/pulls)
+ - Write, clarify, or fix documentation
+ - Suggest or add new features
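The README's Persistence section switches from text-mode `File.open(..., "w")` to `File.binwrite` and `File.binread`. Marshal output is binary, so binary-mode I/O avoids newline translation on platforms that perform it. A minimal round-trip sketch, assuming a hypothetical `ToyDetector` struct in place of a trained `Anomaly::Detector` so that it runs without the gem:

```ruby
require "tmpdir"

# Hypothetical stand-in for a trained detector: plain Ruby state only.
ToyDetector = Struct.new(:mean, :std, :eps)

detector = ToyDetector.new([70.0, 40.0], [5.0, 10.0], 0.01)

restored = Dir.mktmpdir do |dir|
  path = File.join(dir, "detector.dump")

  # Write and read the serialized bytes in binary mode.
  File.binwrite(path, Marshal.dump(detector))
  Marshal.load(File.binread(path))
end

puts restored == detector
```

As with any use of `Marshal.load`, only load dumps that come from a trusted source.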
data/lib/anomaly/detector.rb CHANGED
@@ -1,13 +1,18 @@
  module Anomaly
  class Detector
+ attr_reader :mean, :std
  attr_accessor :eps
 
- def initialize(examples = nil, opts = {})
+ def initialize(examples = nil, **opts)
  @m = 0
- train(examples, opts) if examples
+ train(examples, **opts) if examples
  end
 
- def train(examples, opts = {})
+ def train(examples, eps: 0)
+ # for Numo::NArray
+ # TODO make more efficient when possible
+ examples = examples.to_a
+
  raise "No examples" if examples.empty?
  raise "Must have at least two columns" if examples.first.size < 2
 
@@ -24,7 +29,7 @@ module Anomaly
 
  raise "Must have at least one non-anomaly" if non_anomalies.empty?
 
- @eps = (opts[:eps] || 0).to_f
+ @eps = eps
  if @eps > 0
  # Use all non-anomalies to train.
  training_examples = non_anomalies
@@ -33,28 +38,33 @@ module Anomaly
  test_examples.concat(anomalies)
  end
  # Remove last column.
- training_examples = training_examples.map{|e| e[0..-2]}
+ training_examples = training_examples.map { |e| e[0..-2] }
  @m = training_examples.size
  @n = training_examples.first.size
 
- if defined?(NMatrix)
+ if defined?(Numo::SFloat)
+ training_examples = Numo::SFloat.cast(training_examples)
+ # Convert these to an Array for Marshal.dump
+ @mean = training_examples.mean(0).to_a
+ @std = training_examples.stddev(0).to_a
+ elsif defined?(NMatrix)
  training_examples = NMatrix.to_na(training_examples)
  # Convert these to an Array for Marshal.dump
  @mean = training_examples.mean(1).to_a
  @std = training_examples.stddev(1).to_a
  else
  # Default to Array, since built-in Matrix does not give us a big performance advantage.
- cols = @n.times.map{|i| training_examples.map{|r| r[i]}}
- @mean = cols.map{|c| mean(c)}
- @std = cols.each_with_index.map{|c,i| std(c, @mean[i])}
+ cols = @n.times.map { |i| training_examples.map { |r| r[i] } }
+ @mean = cols.map { |c| alt_mean(c) }
+ @std = cols.each_with_index.map { |c, i| alt_std(c, @mean[i]) }
  end
- @std.map!{|std| (std == 0 or std.nan?) ? Float::MIN : std}
+ @std.map! { |std| (std == 0 || std.nan?) ? Float::MIN : std }
 
  if @eps == 0
  # Find the best eps.
- epss = (1..9).map{|i| [1,3,5,7,9].map{|j| (j*10**(-i)).to_f }}.flatten
- f1_scores = epss.map{|eps| [eps, compute_f1_score(test_examples, eps)] }
- @eps, best_f1 = f1_scores.max_by{|v| v[1]}
+ epss = (1..9).map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }.flatten
+ f1_scores = epss.map { |eps| [eps, compute_f1_score(test_examples, eps)] }
+ @eps, _ = f1_scores.max_by { |v| v[1] }
  end
  end
 
@@ -69,7 +79,7 @@ module Anomaly
  raise ArgumentError, "First argument must have #{@n} elements" if x.size != @n
  @n.times.map do |i|
  p = normal_pdf(x[i], @mean[i], @std[i])
- (p.nan? or p > 1) ? 1 : p
+ (p.nan? || p > 1) ? 1 : p
  end.reduce(1, :*)
  end
 
@@ -79,10 +89,10 @@ module Anomaly
 
  protected
 
- SQRT2PI = Math.sqrt(2*Math::PI)
+ SQRT2PI = Math.sqrt(2 * Math::PI)
 
  def normal_pdf(x, mean = 0, std = 1)
- 1/(SQRT2PI*std)*Math.exp(-((x - mean)**2/(2.0*(std**2))))
+ 1 / (SQRT2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * (std**2))))
  end
 
  # Find best eps.
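The two hunks above show how a probability is formed: each feature gets a normal density, any factor that is NaN or greater than 1 is replaced by 1, and the factors are multiplied. A standalone sketch of that calculation follows; `gaussian_pdf` and `joint_probability` are illustrative names, and the means and standard deviations are the `[0, 0]` and `[1, 2]` values mentioned in the deleted spec file.

```ruby
SQRT_2PI = Math.sqrt(2 * Math::PI)

# Density of a normal distribution with the given mean and standard deviation.
def gaussian_pdf(x, mean, std)
  1 / (SQRT_2PI * std) * Math.exp(-((x - mean)**2 / (2.0 * std**2)))
end

# Product of per-feature densities, replacing NaN or > 1 factors with 1.
def joint_probability(x, means, stds)
  x.each_index.map do |i|
    p = gaussian_pdf(x[i], means[i], stds[i])
    (p.nan? || p > 1) ? 1 : p
  end.reduce(1, :*)
end

means = [0.0, 0.0]
stds = [1.0, 2.0]

at_mean = joint_probability([0, 0], means, stds)    # equals 1 / (4 * pi)
far_away = joint_probability([5, 10], means, stds)  # much smaller
```

Treating the features as independent is what lets the joint value be a simple product of one-dimensional densities.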
@@ -99,8 +109,8 @@ module Anomaly
  fn = 0
  examples.each do |example|
  act = example.last != 0
- pred = self.anomaly?(example[0..-2], eps)
- if act and pred
+ pred = anomaly?(example[0..-2], eps)
+ if act && pred
  tp += 1
  elsif pred # and !act
  fp += 1
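When no `eps` is supplied, the training hunk builds a grid of candidate thresholds and keeps the one with the highest F1 score. The sketch below illustrates that search in isolation. It assumes that a point is flagged as an anomaly when its probability falls below `eps` (the body of `anomaly?` is not shown in this diff), and the probabilities in `scored` are made up for illustration.

```ruby
# Candidate thresholds: 1, 3, 5, 7, 9 times each power of ten from 1e-1 to 1e-9.
candidates = (1..9).flat_map { |i| [1, 3, 5, 7, 9].map { |j| (j * 10**(-i)).to_f } }

# F1 score over pairs of [probability, actually_anomalous].
# Assumption: a point is predicted anomalous when probability < eps.
def f1_score(scored, eps)
  tp = fp = fn = 0
  scored.each do |prob, actual|
    predicted = prob < eps
    if actual && predicted
      tp += 1
    elsif predicted
      fp += 1
    elsif actual
      fn += 1
    end
  end
  return 0.0 if tp.zero?

  precision = tp.to_f / (tp + fp)
  recall = tp.to_f / (tp + fn)
  2 * precision * recall / (precision + recall)
end

# Made-up probabilities: three normal points and two anomalies.
scored = [[0.08, false], [0.05, false], [0.02, false], [0.0004, true], [0.00001, true]]

best_eps = candidates.max_by { |eps| f1_score(scored, eps) }
```

Several candidates can tie for the best score, so the exact threshold returned depends on how ties are broken.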
@@ -120,13 +130,12 @@ module Anomaly
 
  # Not used for NArray
 
- def mean(x)
- x.inject(0.0){|a, i| a + i}/x.size
+ def alt_mean(x)
+ x.sum / x.size
  end
 
- def std(x, mean)
- Math.sqrt(x.inject(0.0){|a, i| a + (i - mean) ** 2}/(x.size - 1))
+ def alt_std(x, mean)
+ Math.sqrt(x.sum { |i| (i - mean)**2 }.to_f / (x.size - 1))
  end
-
  end
  end
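The fallback statistics now use `sum`, which Ruby added to Enumerable in 2.4, in line with the `required_ruby_version` of `>= 2.4` in the gem metadata. One thing to watch: the removed `mean` started its accumulator at `0.0`, so it always produced a Float, whereas `x.sum / x.size` divides two Integers when every value in the column is an Integer, and Ruby's Integer division truncates. The sketch below uses hypothetical helpers (`column_mean`, `column_std`) that convert to Float explicitly; that conversion is an assumption about the intended behavior, not the gem's code.

```ruby
# Mean of a column, forcing Float division.
def column_mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (divides by n - 1).
def column_std(values, mean)
  Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1).to_f)
end

rows = [[-1, -2], [0, 0], [1, 2]]
columns = rows.transpose

means = columns.map { |c| column_mean(c) }
stds = columns.each_with_index.map { |c, i| column_std(c, means[i]) }

# Integer division truncates; Float division does not.
truncated = [1, 2].sum / [1, 2].size
exact = column_mean([1, 2])
```

For the three rows above this gives means of `[0.0, 0.0]` and standard deviations of `[1.0, 2.0]`, matching the comment in the deleted spec file.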
data/lib/anomaly/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Anomaly
- VERSION = "0.1.0"
+ VERSION = "0.2.0"
  end
metadata CHANGED
@@ -1,98 +1,117 @@
  --- !ruby/object:Gem::Specification
  name: anomaly
  version: !ruby/object:Gem::Version
- version: 0.1.0
- prerelease:
+ version: 0.2.0
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2011-12-19 00:00:00.000000000Z
+ date: 2019-10-28 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: bundler
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
- requirement: &2155813680 !ruby/object:Gem::Requirement
- none: false
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *2155813680
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rspec
- requirement: &2155813180 !ruby/object:Gem::Requirement
- none: false
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
- version: 2.0.0
+ version: '2'
  type: :development
  prerelease: false
- version_requirements: *2155813180
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '2'
  - !ruby/object:Gem::Dependency
  name: narray
- requirement: &2155812760 !ruby/object:Gem::Requirement
- none: false
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
- version_requirements: *2155812760
- description: Easy-to-use anomaly detection
- email:
- - andrew@getformidable.com
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: numo-narray
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ description:
+ email: andrew@chartkick.com
  executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - .gitignore
- - .rspec
- - Gemfile
- - LICENSE
+ - CHANGELOG.md
  - README.md
- - Rakefile
- - anomaly.gemspec
  - lib/anomaly.rb
  - lib/anomaly/detector.rb
  - lib/anomaly/version.rb
- - spec/anomaly/detector_spec.rb
- - spec/spec_helper.rb
  homepage: https://github.com/ankane/anomaly
- licenses: []
+ licenses:
+ - MIT
+ metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
- version: '0'
- segments:
- - 0
- hash: 1886385059125072633
+ version: '2.4'
  required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- segments:
- - 0
- hash: 1886385059125072633
  requirements: []
- rubyforge_project:
- rubygems_version: 1.8.11
+ rubygems_version: 3.0.3
  signing_key:
- specification_version: 3
+ specification_version: 4
  summary: Easy-to-use anomaly detection
- test_files:
- - spec/anomaly/detector_spec.rb
- - spec/spec_helper.rb
+ test_files: []
data/.gitignore DELETED
@@ -1,17 +0,0 @@
- *.gem
- *.rbc
- .bundle
- .config
- .yardoc
- Gemfile.lock
- InstalledFiles
- _yardoc
- coverage
- doc/
- lib/bundler/man
- pkg
- rdoc
- spec/reports
- test/tmp
- test/version_tmp
- tmp
data/.rspec DELETED
@@ -1 +0,0 @@
- --color
data/Gemfile DELETED
@@ -1,4 +0,0 @@
- source 'https://rubygems.org'
-
- # Specify your gem's dependencies in anomaly.gemspec
- gemspec
data/LICENSE DELETED
@@ -1,22 +0,0 @@
- Copyright (c) 2011 Andrew Kane
-
- MIT License
-
- Permission is hereby granted, free of charge, to any person obtaining
- a copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sublicense, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
-
- The above copyright notice and this permission notice shall be
- included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
- NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
- LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
- OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/Rakefile DELETED
@@ -1,25 +0,0 @@
- #!/usr/bin/env rake
- require "bundler/gem_tasks"
- require "rspec/core/rake_task"
- RSpec::Core::RakeTask.new("spec")
-
- require "benchmark"
- require "anomaly"
-
- task :benchmark do
- examples = 1_000_000.times.map{ [rand, rand, rand, 0] }
-
- Benchmark.bm do |x|
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
- require "narray"
- x.report { Anomaly::Detector.new(examples, {:eps => 0.5}) }
- end
- end
-
- task :random_examples do
- examples = 10_000.times.map{ [rand, rand(10), rand(100), 0] } +
- 100.times.map{ [rand + 1, rand(10) + 2, rand(100) + 20, 1] }
-
- ad = Anomaly::Detector.new(examples)
- puts ad.eps
- end
data/anomaly.gemspec DELETED
@@ -1,21 +0,0 @@
- # -*- encoding: utf-8 -*-
- require File.expand_path('../lib/anomaly/version', __FILE__)
-
- Gem::Specification.new do |gem|
- gem.authors = ["Andrew Kane"]
- gem.email = ["andrew@getformidable.com"]
- gem.description = %q{Easy-to-use anomaly detection}
- gem.summary = %q{Easy-to-use anomaly detection}
- gem.homepage = "https://github.com/ankane/anomaly"
-
- gem.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
- gem.files = `git ls-files`.split("\n")
- gem.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
- gem.name = "anomaly"
- gem.require_paths = ["lib"]
- gem.version = Anomaly::VERSION
-
- gem.add_development_dependency "rake"
- gem.add_development_dependency "rspec", ">= 2.0.0"
- gem.add_development_dependency "narray"
- end
data/spec/anomaly/detector_spec.rb DELETED
@@ -1,71 +0,0 @@
- require "spec_helper"
-
- describe Anomaly::Detector do
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
- let(:ad) { Anomaly::Detector.new(examples) }
-
- # mean = [0, 0], std = [1, 2]
- it "computes the right probability" do
- ad.probability([0,0]).should == 0.079577471545947667
- end
-
- it "marshalizes" do
- expect{ Marshal.dump(ad) }.to_not raise_error
- end
-
- context "when standard deviation is 0" do
- let(:examples) { [[0,0],[0,0]] }
-
- it "returns infinity for mean" do
- ad.probability([0]).should == 1
- end
-
- it "returns 0 for not mean" do
- ad.probability([1]).should == 0
- end
- end
-
- context "when examples is an array" do
- let(:examples) { [[-1,-2,0],[0,0,0],[1,2,0]] }
- let(:sample) { [rand, rand] }
-
- it "returns the same probability as an NMatrix" do
- prob = ad.probability(sample)
- Object.send(:remove_const, :NMatrix)
- prob.should == Anomaly::Detector.new(examples).probability(sample)
- end
- end
-
- context "when lots of samples" do
- let(:examples) { m.times.map{[0,0]} }
- let(:m) { rand(100) + 1 }
-
- it { ad.trained?.should be_true }
- end
-
- context "when no samples" do
- let(:examples) { nil }
-
- it { ad.trained?.should be_false }
- end
-
- context "when pdf is greater than 1" do
- let(:examples) { 100.times.map{[0,0]}.push([1,0]) }
-
- it { ad.probability([0]).should == 1 }
- end
-
- context "when only anomalies" do
- let(:examples) { [[0,1]] }
-
- it "raises error" do
- expect{ ad }.to raise_error RuntimeError, "Must have at least one non-anomaly"
- end
- end
-
- context "when only one non-anomaly" do
- let(:examples) { [[0,0]] }
-
- it { ad.eps.should == 1e-1 }
- end
- end
data/spec/spec_helper.rb DELETED
@@ -1,8 +0,0 @@
- require "rubygems"
- require "bundler/setup"
-
- require "anomaly"
- require "narray"
-
- RSpec.configure do |config|
- end