rroc 0.1
- data/README.markdown +57 -0
- data/lib/rroc.rb +67 -0
- metadata +57 -0
data/README.markdown
ADDED
@@ -0,0 +1,57 @@
+RROC: Dead-simple ROC analysis in Ruby
+==========
+
+Ported from the [ML-Mathematica](http://www.bioinf.jku.at/software/ML-Math/) set of machine learning demos.
+
+This class provides methods for Receiver Operating Characteristic (ROC) calculation; the algorithm was ported from the [ML-Mathematica](http://www.bioinf.jku.at/software/ML-Math/) Mathematica library by Steven Bedrick ([steve@bedrick.org](mailto:steve@bedrick.org)). For an excellent overview of ROC analysis, see:
+
+Fawcett, T. "An introduction to ROC analysis" Pattern Recognition Letters 27 (2006) 861-874 [(pdf)](https://cours.etsmtl.ca/sys828/REFS/A1/Fawcett_PRL2006.pdf)
+
+Installation
+--------
+
+    gem install rroc
+
+Usage
+--------
+
+Using RROC is very simple. It expects data in the form of an _n_ x 2 matrix (i.e., an Array of 2-element Arrays) representing the output of a binary classifier. Each row represents a "case" (document, data point, etc.); the first column holds the classifier's discriminant value and the second column holds the ground-truth class label for that case. RROC expects class labels of either 1 or -1; larger discriminant values should be associated with membership in class 1. You can determine the area under the ROC curve as follows:
+
+    require 'rroc'
+
+    my_data = File.readlines('some_data.csv').collect { |l| l.strip.split(",").map(&:to_f) }
+    auc = ROC.auc(my_data)
+    puts auc
+
+If you want a set of points describing the ROC curve itself, call the `ROC.curve_points` method:
+
+    require 'rroc'
+
+    my_data = File.readlines('some_data.csv').collect { |l| l.strip.split(",").map(&:to_f) }
+    pts = ROC.curve_points(my_data) # returns something like: [[0.0, 0.0], [0.1, 0.01]... ]
+
+The points are returned as an Array of two-element Arrays, each of which contains an X and a Y coordinate that you may then export or plot using the utility of your choice. For example, together with the `googlecharts` gem, you can obtain a Google Chart link that will display your ROC curve:
+
+
+    require 'gchart'
+    require 'rroc'
+
+    my_data = File.readlines('some_data.csv').collect { |l| l.strip.split(",").map(&:to_f) }
+    pts = ROC.curve_points(my_data)
+    puts Gchart.scatter(:data => [pts.collect { |x| x[0] }, pts.collect { |x| x[1] }])
+
+Conclusion
+---------
+RROC is designed to be a tool for bare-bones ROC analysis in Ruby; take it for what it is and use it at your own risk. Bug reports, forks, and patches are more than welcome!
+
+License
+----------
+Copyright (c) 2011, Steven Bedrick
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+
+Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
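The README's usage section requires labels of 1 or -1, while many classifiers and data sets store labels as 1 and 0. The sketch below shows one way to build the expected n x 2 matrix from such data. The inline sample data and the variable names `raw` and `rows` are assumptions made for illustration; nothing here is part of RROC itself.

```ruby
# Hypothetical sample input: "discriminant,label" with 0/1 labels,
# which is NOT the -1/1 convention RROC expects.
raw = "0.91,1\n0.15,0\n0.64,1\n0.42,0\n"

# Build the n x 2 matrix: [discriminant value, label], mapping 0 -> -1 and 1 -> 1.
rows = raw.each_line.map do |line|
  score, label = line.strip.split(",")
  [Float(score), Integer(label) == 1 ? 1 : -1]
end

p rows
```

The resulting `rows` array has the shape that `ROC.auc` and `ROC.curve_points` are documented to accept.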
data/lib/rroc.rb
ADDED
@@ -0,0 +1,67 @@
+# This class provides methods for Receiver Operating Characteristic (ROC) curve calculation; the algorithm was ported from the ML-Mathematica[http://www.bioinf.jku.at/software/ML-Math/] Mathematica library by Steven Bedrick (steve@bedrick.org). For an excellent overview of ROC analysis, see:
+#
+# Fawcett, T. "An introduction to ROC analysis" Pattern Recognition Letters 27 (2006) 861-874 (pdf[https://cours.etsmtl.ca/sys828/REFS/A1/Fawcett_PRL2006.pdf])
+#
+# The first column of the matrix is the discriminant value; the second column is the label (+1 -> positive, -1 -> negative; a higher discriminant value means "more positive").
+#
+# e.g.: +[[.3, -1], [.7, 1], [.1, -1] . . . ]+
+#
+# The scale of the first column is not important. The labels in the second column are.
+#
+# The returned points plot the FPR (x-axis) against the TPR (y-axis).
+
+class ROC
+
+  # Calculates the "area under the ROC curve" for the output of a binary classifier.
+  #
+  # @param [Array] dat Classifier output for which the AUC should be calculated, in the form of an n x 2 matrix. Each row represents a "case" (document, example, etc.); the first column is the discriminant value, and the second column _must_ be either -1 (if the ground truth class of the case is "negative") or 1 (if the ground truth is "positive").
+  # @return [Float] The area under the ROC curve.
+  def self.auc(dat)
+    return calc(dat, false)
+  end
+
+  # Returns a set of x/y coordinates describing an ROC curve for +dat+, plotting the FPR on the abscissa and the TPR on the ordinate.
+  # @param dat (see ROC.auc)
+  # @return [Array] x/y coordinates that, when plotted, illustrate an ROC curve for +dat+. Each element in the array is an array containing an x and y coordinate.
+  def self.curve_points(dat)
+    return calc(dat, true)[:points]
+  end
+
+  # Internal helper; a bare `private` has no effect on `def self.` methods, so it is hidden with private_class_method below.
+  def self.calc(mat, inc_pts = false)
+    # sort by first col, ascending, and take labels
+    sorted_by_disc = mat.sort { |a,b| a[0] <=> b[0] }
+    sorted_labels = sorted_by_disc.collect { |d| d[1] }
+
+    # now let's count the number of positives and negatives:
+    pos = sorted_labels.count { |l| l == 1 }.to_f
+    neg = sorted_labels.count { |l| l == -1 }.to_f
+
+    auc = 0.0
+
+    plotlist = [[0.0, 0.0]]
+
+    if pos > 0 && neg > 0 # we need at least one positive and one negative; otherwise the TPR or FPR is undefined
+      c = 0.0 # how many positives we've seen thus far
+      n = 0.0 # how many negatives we've seen thus far
+      (sorted_labels.length - 1).downto(0) do |i| # walk backwards through the data, i.e. from the highest discriminant value down
+        if sorted_labels[i] > 0 # pos?
+          c += 1.0
+        else
+          n += 1.0
+          auc += (c / (pos * neg)) # each negative contributes the fraction of pairs in which a positive outranks it
+        end
+        plotlist << [n / neg, c / pos]
+      end
+      # plotlist << [1,0] # the original MMA has this, but I don't think we really need it.
+    end
+
+    if inc_pts # does the caller want x/y points describing the curve?
+      return {:auc => auc, :points => plotlist}
+    else # if not, just return the auc
+      return auc
+    end
+  end
+  private_class_method :calc
+end
+
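The accumulation in `calc` adds, for every negative case, the number of positives already seen with a higher discriminant value, divided by `pos * neg`. Summed over all negatives, that is the fraction of positive/negative pairs in which the positive case outranks the negative one. The brute-force sketch below computes the same quantity directly and can serve as a cross-check on small inputs. It assumes there are no tied discriminant values, and `pairwise_auc` is a hypothetical helper name, not part of the rroc gem.

```ruby
# Hypothetical helper, not part of rroc: AUC by counting correctly ranked pairs.
# Assumes labels are 1 / -1 and that no two discriminant values are tied.
def pairwise_auc(data)
  pos = data.select { |_, label| label == 1 }.map(&:first)
  neg = data.select { |_, label| label == -1 }.map(&:first)
  return 0.0 if pos.empty? || neg.empty?

  # Count the (positive, negative) pairs where the positive scores higher.
  wins = pos.sum { |p| neg.count { |n| p > n } }
  wins.to_f / (pos.length * neg.length)
end

data = [[0.9, 1], [0.8, 1], [0.7, -1], [0.6, 1], [0.2, -1], [0.1, -1]]
puts pairwise_auc(data) # 8 of the 9 pairs are ranked correctly
```

This runs in O(pos x neg) time, whereas the sorted walk in `calc` needs only one sort and one pass, which is presumably why the library takes the latter approach.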
metadata
ADDED
@@ -0,0 +1,57 @@
+--- !ruby/object:Gem::Specification
+name: rroc
+version: !ruby/object:Gem::Version
+  prerelease:
+  version: "0.1"
+platform: ruby
+authors:
+- Steven Bedrick
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2011-06-30 00:00:00 -07:00
+default_executable:
+dependencies: []
+
+description:
+email: steve@bedrick.org
+executables: []
+
+extensions: []
+
+extra_rdoc_files: []
+
+files:
+- README.markdown
+- lib/rroc.rb
+has_rdoc: true
+homepage: https://github.com/stevenbedrick/RROC
+licenses: []
+
+post_install_message:
+rdoc_options: []
+
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.6.2
+signing_key:
+specification_version: 3
+summary: Dead-simple ROC analysis in Ruby.
+test_files: []
+