rroc 0.1

Files changed (3)
  1. data/README.markdown +57 -0
  2. data/lib/rroc.rb +67 -0
  3. metadata +57 -0
data/README.markdown ADDED
@@ -0,0 +1,57 @@
+ RROC: Dead-simple ROC analysis in Ruby
+ ==========
+
+ Ported from the [ML-Mathematica](http://www.bioinf.jku.at/software/ML-Math/) set of machine learning demos.
+
+ RROC provides methods for Receiver Operating Characteristic (ROC) analysis. The algorithm is copied from the [ML-Mathematica](http://www.bioinf.jku.at/software/ML-Math/) Mathematica library; the Ruby port is by Steven Bedrick ([steve@bedrick.org](mailto:steve@bedrick.org)). For an excellent overview of ROC analysis, see:
+
+ Fawcett, T. "An introduction to ROC analysis." _Pattern Recognition Letters_ 27 (2006): 861-874. [(pdf)](https://cours.etsmtl.ca/sys828/REFS/A1/Fawcett_PRL2006.pdf)
+
+ Installation
+ --------
+
+     gem install rroc
+
+ Usage
+ --------
+
+ Using RROC is very simple. It expects data in the form of an _n_ x 2 matrix (i.e., an Array of 2-element Arrays) representing the output of a binary classifier. Each row represents a "case" (document, data point, etc.): the first column holds the classifier's discriminant value and the second column holds the ground-truth class label for that case. Class labels must be either 1 or -1, and larger discriminant values should indicate membership in class 1. You can compute the area under the ROC curve (AUC) as follows:
+
+     require 'rroc'
+
+     my_data = File.readlines('some_data.csv').map { |l| l.strip.split(",").map(&:to_f) }
+     auc = ROC.auc(my_data)
+     puts auc
+
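The one-line CSV reader above trusts its input: a label other than 1 or -1 (say `0`) is not rejected, and the backward walk in `ROC.calc` would treat such a row as a negative without counting it in the negative total. A hedged sketch of a stricter parser follows; `parse_roc_rows` is a hypothetical helper, not part of RROC, and it assumes a plain two-column file with no header row.

```ruby
# Hypothetical helper (not part of RROC): converts CSV lines into the
# [discriminant, label] rows that ROC.auc expects, rejecting malformed input.
def parse_roc_rows(lines)
  lines.map.with_index(1) do |line, lineno|
    fields = line.strip.split(",").map(&:strip)
    unless fields.size == 2
      raise ArgumentError, "line #{lineno}: expected 2 columns, got #{fields.size}"
    end

    score = Float(fields[0]) # Float() raises ArgumentError on non-numeric text
    label = Float(fields[1])
    unless label == 1 || label == -1
      raise ArgumentError, "line #{lineno}: label must be 1 or -1, got #{fields[1]}"
    end

    [score, label.to_i] # keep labels as the integers 1 / -1
  end
end

rows = parse_roc_rows(["0.3,-1", "0.7,1", "0.1,-1", "0.9,1"])
p rows # => [[0.3, -1], [0.7, 1], [0.1, -1], [0.9, 1]]
```

With the gem installed, `ROC.auc(parse_roc_rows(File.readlines('some_data.csv')))` would take the place of the parsing line in the example.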
+ To obtain the points describing the ROC curve itself, call the `ROC.curve_points` method:
+
+     require 'rroc'
+
+     my_data = File.readlines('some_data.csv').map { |l| l.strip.split(",").map(&:to_f) }
+     pts = ROC.curve_points(my_data) # returns something like: [[0.0, 0.0], [0.1, 0.01], ...]
+
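To make the shape of this result concrete, here is a minimal sketch that repeats the backward walk from `lib/rroc.rb` on four hand-made rows and then measures the area under the resulting points with the trapezoidal rule. `roc_points` and `trapezoid_area` are hypothetical names, not part of RROC. Because each row moves the curve either straight up (a positive) or straight right (a negative), the trapezoidal area equals the AUC accumulated by the same walk.

```ruby
# Hypothetical helper mirroring the backward walk in lib/rroc.rb.
# rows: [discriminant, label] pairs with labels 1 or -1.
def roc_points(rows)
  labels = rows.sort_by { |score, _label| score }.map { |_score, label| label }
  pos = labels.count(1).to_f
  neg = labels.count(-1).to_f
  return [[0.0, 0.0]] if pos.zero? || neg.zero? # rates undefined without both classes

  tp = 0.0 # positives seen so far
  fp = 0.0 # negatives seen so far
  points = [[0.0, 0.0]]
  labels.reverse_each do |label| # highest discriminant value first
    if label > 0
      tp += 1
    else
      fp += 1
    end
    points << [fp / neg, tp / pos] # [FPR, TPR]
  end
  points
end

# Area under a polyline given as [x, y] points, by the trapezoidal rule.
def trapezoid_area(points)
  points.each_cons(2).sum { |(x0, y0), (x1, y1)| (x1 - x0) * (y0 + y1) / 2.0 }
end

rows = [[0.2, -1], [0.4, 1], [0.6, -1], [0.8, 1]]
pts = roc_points(rows)
p pts                 # => [[0.0, 0.0], [0.0, 0.5], [0.5, 0.5], [0.5, 1.0], [1.0, 1.0]]
p trapezoid_area(pts) # => 0.75
```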
+ The points are returned as an Array of two-element Arrays, each holding an X coordinate (the false positive rate) and a Y coordinate (the true positive rate), which you can then export or plot with the tool of your choice. For example, together with the `googlecharts` gem, you can obtain a Google Chart link that displays your ROC curve:
+
+     require 'gchart'
+     require 'rroc'
+
+     my_data = File.readlines('some_data.csv').map { |l| l.strip.split(",").map(&:to_f) }
+     pts = ROC.curve_points(my_data)
+     puts Gchart.scatter(:data => [pts.collect { |x| x[0] }, pts.collect { |x| x[1] }])
+
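The same points can also be written out for a spreadsheet or another plotting tool. A minimal sketch using only core Ruby, with sample points hard-coded in place of a real `ROC.curve_points` result; the output file name in the comment is illustrative:

```ruby
# Sample [FPR, TPR] pairs, standing in for the result of ROC.curve_points.
pts = [[0.0, 0.0], [0.0, 0.5], [0.5, 0.5], [0.5, 1.0], [1.0, 1.0]]

# A header line followed by one "fpr,tpr" line per point.
lines = ["fpr,tpr"] + pts.map { |fpr, tpr| "#{fpr},#{tpr}" }
csv_text = lines.join("\n") + "\n"

puts csv_text
# File.write('roc_points.csv', csv_text) would save it to disk instead.
```

Plain string joining is enough for purely numeric rows; Ruby's standard `csv` library adds proper quoting if text columns are ever involved.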
+ Conclusion
+ ---------
+ RROC is designed to be a tool for bare-bones ROC analysis in Ruby; take it for what it is and use it at your own risk. Bug reports, forks, and patches are more than welcome!
+
+ License
+ ----------
+ Copyright (c) 2011, Steven Bedrick
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+ * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+ * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
data/lib/rroc.rb ADDED
@@ -0,0 +1,67 @@
+ # This class provides methods for Receiver Operating Characteristic (ROC) curve calculation; the algorithm is copied from the ML-Mathematica[http://www.bioinf.jku.at/software/ML-Math/] Mathematica library. Ruby port by Steven Bedrick (steve@bedrick.org). For an excellent overview of ROC analysis, see:
+ #
+ # Fawcett, T. "An introduction to ROC analysis." Pattern Recognition Letters 27 (2006): 861-874 (pdf[https://cours.etsmtl.ca/sys828/REFS/A1/Fawcett_PRL2006.pdf])
+ #
+ # The first column of the input matrix is the discriminant value and the second column is the label (+1 -> positive, -1 -> negative); higher discriminant values indicate a more positive case.
+ #
+ # e.g.: <tt>[[0.3, -1], [0.7, 1], [0.1, -1], ...]</tt>
+ #
+ # Only the ordering of the discriminant values matters, not their scale; the labels in the second column must be exactly 1 or -1.
+ #
+ # Curve points plot the false positive rate (FPR, x-axis) against the true positive rate (TPR, y-axis).
+
+ class ROC
+
+   # Calculates the "area under the ROC curve" (AUC) for the output of a binary classifier.
+   #
+   # @param [Array] dat Classifier output for which the AUC should be calculated, in the form of an n x 2 matrix. Each row represents a "case" (document, example, etc.); the first column is the discriminant value, and the second column _must_ be either -1 (if the ground truth class of the case is "negative") or 1 (if the ground truth is "positive").
+   # @return [Float] The area under the ROC curve.
+   def self.auc(dat)
+     return calc(dat, false)
+   end
+
+   # Returns a set of x/y coordinates describing an ROC curve for +dat+, plotting the FPR on the abscissa and the TPR on the ordinate.
+   # @param dat (see ROC.auc)
+   # @return [Array] x/y coordinates that, when plotted, illustrate an ROC curve for +dat+. Each element in the array is an array containing an x and a y coordinate.
+   def self.curve_points(dat)
+     return calc(dat, true)[:points]
+   end
+
+   def self.calc(mat, inc_pts = false)
+     # sort by first col, ascending, and take labels
+     sorted_by_disc = mat.sort_by { |row| row[0] }
+     sorted_labels = sorted_by_disc.map { |row| row[1] }
+
+     # now let's count the number of positives and negatives:
+     pos = sorted_labels.count { |l| l == 1 }.to_f
+     neg = sorted_labels.count { |l| l == -1 }.to_f
+
+     auc = 0.0
+
+     plotlist = [[0.0, 0.0]]
+
+     if pos > 0 && neg > 0 # we need at least one positive and one negative; otherwise the rates below divide by zero
+       c = 0.0 # how many positives we've seen thus far
+       n = 0.0 # how many negatives we've seen thus far
+       (sorted_labels.length - 1).downto(0) do |i| # walk backwards, from the highest discriminant value down
+         if sorted_labels[i] > 0 # pos?
+           c += 1.0
+         else
+           n += 1.0
+           auc += (c / (pos * neg)) # each negative adds the positives ranked above it, normalized by pos * neg
+         end
+         plotlist << [n / neg, c / pos]
+       end
+       # plotlist << [1,0] # the original Mathematica version has this, but I don't think we really need it.
+     end
+
+     if inc_pts # does the caller want x/y points describing the curve?
+       return {:auc => auc, :points => plotlist}
+     else # if not, just return the auc
+       return auc
+     end
+   end
+   # A bare `private` does not affect singleton (self.) methods, so make calc private explicitly.
+   private_class_method :calc
+ end
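Ruby's bare `private` keyword only changes the visibility of instance methods defined after it; a class-level method such as `calc`, defined with `def self.calc`, stays public unless it is passed to `private_class_method`. A minimal illustration, using a hypothetical `VisibilityDemo` class that is not part of RROC:

```ruby
# Hypothetical class used only to illustrate method visibility.
class VisibilityDemo
  private

  # Still public: `private` above does not apply to singleton (self.) methods.
  def self.helper
    :helper
  end

  # Made private explicitly; callers outside the class get NoMethodError.
  def self.hidden
    :hidden
  end
  private_class_method :hidden

  # Public entry point that can still call the private method internally.
  def self.run
    hidden
  end
end

p VisibilityDemo.helper               # => :helper
p VisibilityDemo.run                  # => :hidden
p VisibilityDemo.respond_to?(:hidden) # => false
```

Calling the private method with an implicit receiver (`hidden`, not `self.hidden`) keeps this working on older Rubies as well.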
+
+
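The single backward pass in `calc` handles rows one at a time, so when several rows share a discriminant value its result can depend on how the sort happens to order those tied rows (Ruby's sort is not guaranteed to be stable). For the two rows `[[0.5, 1], [0.5, -1]]`, for instance, the pass yields 0.0 or 1.0 depending on that order. As a cross-check, here is a hedged sketch of a different technique, the pairwise (Wilcoxon-Mann-Whitney) form of the AUC: the fraction of positive/negative pairs in which the positive case scores higher, with ties counted as one half. `pairwise_auc` is a hypothetical name, not part of RROC.

```ruby
# Pairwise (Wilcoxon-Mann-Whitney) AUC: the fraction of positive/negative
# pairs in which the positive case has the higher discriminant value, with
# tied pairs counted as one half. Runs in O(pos * neg) time.
def pairwise_auc(rows)
  pos_scores = rows.select { |_score, label| label == 1 }.map(&:first)
  neg_scores = rows.select { |_score, label| label == -1 }.map(&:first)
  return nil if pos_scores.empty? || neg_scores.empty? # AUC undefined

  wins = 0.0
  pos_scores.each do |pos_score|
    neg_scores.each do |neg_score|
      if pos_score > neg_score
        wins += 1.0
      elsif pos_score == neg_score
        wins += 0.5
      end
    end
  end
  wins / (pos_scores.size * neg_scores.size)
end

p pairwise_auc([[0.2, -1], [0.4, 1], [0.6, -1], [0.8, 1]]) # => 0.75
p pairwise_auc([[0.5, 1], [0.5, -1]])                      # => 0.5
```

On data without ties the two methods should agree; the quadratic pair loop makes this version practical only as a check on small inputs.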
metadata ADDED
@@ -0,0 +1,57 @@
+ --- !ruby/object:Gem::Specification
+ name: rroc
+ version: !ruby/object:Gem::Version
+   prerelease:
+   version: "0.1"
+ platform: ruby
+ authors:
+ - Steven Bedrick
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2011-06-30 00:00:00 -07:00
+ default_executable:
+ dependencies: []
+
+ description:
+ email: steve@bedrick.org
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - README.markdown
+ - lib/rroc.rb
+ has_rdoc: true
+ homepage: https://github.com/stevenbedrick/RROC
+ licenses: []
+
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.6.2
+ signing_key:
+ specification_version: 3
+ summary: Dead-simple ROC analysis in Ruby.
+ test_files: []
+
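The YAML above is the serialized `Gem::Specification` that RubyGems stores inside the packaged gem. As a hedged sketch, a hand-written `.gemspec` along these lines would carry the same core fields; the file is hypothetical and not part of this diff, and legacy fields such as `has_rdoc` and `rubyforge_project` are left out.

```ruby
require 'rubygems'

# Hypothetical rroc.gemspec equivalent; values are copied from the YAML metadata.
spec = Gem::Specification.new do |s|
  s.name          = 'rroc'
  s.version       = '0.1'
  s.platform      = Gem::Platform::RUBY
  s.authors       = ['Steven Bedrick']
  s.email         = 'steve@bedrick.org'
  s.homepage      = 'https://github.com/stevenbedrick/RROC'
  s.summary       = 'Dead-simple ROC analysis in Ruby.'
  s.files         = ['README.markdown', 'lib/rroc.rb']
  s.require_paths = ['lib']
end

puts spec.full_name # => rroc-0.1
```

In an actual `.gemspec` file, the `Gem::Specification.new` block would be the file's final expression so that `gem build` can pick it up.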