evalir 0.0.1

data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in evalir.gemspec
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,16 @@
+ PATH
+   remote: .
+   specs:
+     evalir (0.0.1)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     rake (0.9.2)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   evalir!
+   rake
data/README.md ADDED
@@ -0,0 +1,57 @@
+ What is Evalir?
+ ===============
+ Evalir is a library for evaluating IR systems. It implements a number of standard measurements, from basic precision and recall to single-value summaries such as NDCG and MAP.
+
+ For a good reference on the theory behind this, please check out Manning, Raghavan & Schütze's excellent [Introduction to Information Retrieval, ch. 8](http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-in-information-retrieval-1.html).
+
+ What can Evalir do?
+ -------------------
+ * [Precision](http://en.wikipedia.org/wiki/Information_retrieval#Precision)
+ * [Recall](http://en.wikipedia.org/wiki/Information_retrieval#Recall)
+ * Precision at recall (e.g. precision at 20% recall)
+ * Precision at rank k
+ * Average precision
+ * Precision-recall curve
+ * [Mean Average Precision (MAP)](http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision)
+ * [F-measure](http://en.wikipedia.org/wiki/Information_retrieval#F-measure)
+ * [R-Precision](http://en.wikipedia.org/wiki/Information_retrieval#R-Precision)
+ * [Discounted Cumulative Gain (DCG)](http://en.wikipedia.org/wiki/Discounted_cumulative_gain)
+ * [Normalized DCG](http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
+
+ How does Evalir work?
+ ---------------------
+ The goal of an Information Retrieval system is to provide the user with relevant information -- relevant with respect to the user's *information need*. For example, an information need might be:
+
+ > Information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
+
+ Note that this is *not* the query. The user will try to encode her need as a query, for instance:
+
+ > red white wine reducing "heart attack"
+
+ To evaluate an IR system with Evalir, we need human-annotated test data, where each data point consists of:
+
+ * An explicit information need
+ * A query
+ * A list of documents that are relevant w.r.t. the information need (*not* the query)
+
+ For example, say we have the aforementioned information need and query, and a list of documents that have been judged relevant: { 123, 654, 29, 1029 }. If the actual query results are in an array named *results*, we can use an Evalirator like this:
+
+     relevant = [123, 654, 29, 1029]
+     e = Evalir::Evalirator.new(relevant, results)
+     puts "Precision: #{e.precision}"
+     puts "Recall: #{e.recall}"
+     puts "F-1: #{e.f1}"
+     puts "F-3: #{e.f_measure(3)}"
+     puts "Precision at rank 10: #{e.precision_at_rank(10)}"
+     puts "Average precision: #{e.average_precision}"
+
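+ The weighted F-measure used above is the standard F(β) = ((β² + 1)·P·R) / (β²·P + R). As a quick, hand-computed illustration (using the same one-relevant-in-three setup as the unranked test in this gem):
+
+     # P = 1/3 and R = 1.0 for this evalirator:
+     e = Evalir::Evalirator.new([1], [1, 4, 8])
+     p, r, beta = e.precision, e.recall, 1.0
+     ((beta**2 + 1) * p * r) / ((beta**2 * p) + r)
+     # => 0.5, the same value e.f1 returns
+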
+ When you have several information needs and want to compute aggregate statistics, use an EvaliratorCollection like this:
+
+     e = Evalir::EvaliratorCollection.new
+     queries.each do |query|
+       relevant = get_relevant_docids(query)
+       results = get_results(query)
+       e << Evalir::Evalirator.new(relevant, results)
+     end
+     puts "MAP: #{e.mean_average_precision}"
+     puts "Precision-recall curve: #{e.precision_recall_curve}"
data/Rakefile ADDED
@@ -0,0 +1,20 @@
+ require "bundler/gem_tasks"
+ require 'rubygems'
+
+ begin
+   Bundler.setup(:default, :development)
+ rescue Bundler::BundlerError => e
+   $stderr.puts e.message
+   $stderr.puts "Run `bundle install` to install missing gems"
+   exit e.status_code
+ end
+
+ require 'rake/testtask'
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/test_*.rb'
+   test.verbose = true
+ end
+
+ task :default => :test
data/evalir.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "evalir/version"
+
+ Gem::Specification.new do |s|
+   s.name        = "evalir"
+   s.version     = Evalir::VERSION
+   s.platform    = Gem::Platform::RUBY
+   s.authors     = ["Alexander Mossin"]
+   s.email       = ["alexander@companybook.no", "alexander.mossin@gmail.com"]
+   s.homepage    = "http://github.com/companybook/Evalir"
+   s.summary     = %q{A library for evaluation of IR systems.}
+   s.description = %q{Evalir is used to measure search relevance at Companybook, and offers a number of standard measurements, from the basic precision and recall to single value summaries such as NDCG and MAP.}
+
+   s.files         = `git ls-files`.split("\n")
+   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   s.add_development_dependency "rake"
+ end
data/lib/evalir/evalirator.rb ADDED
@@ -0,0 +1,171 @@
+ require 'set'
+
+ module Evalir
+   class Evalirator
+     # Gets the number of retrieved results
+     # that were indeed relevant.
+     attr_reader :true_positives
+
+     # Gets the number of retrieved results
+     # that were in fact irrelevant.
+     attr_reader :false_positives
+
+     # Instantiates a new Evalirator, using
+     # the provided judgements as the basis
+     # for later calculations.
+     def initialize(relevant_docids, retrieved_docids = [])
+       @relevant_docids = relevant_docids.to_set
+       @true_positives = @false_positives = 0
+       @search_hits = []
+
+       retrieved_docids.each do |docid|
+         if @relevant_docids.include? docid
+           @true_positives += 1
+         else
+           @false_positives += 1
+         end
+         @search_hits << docid
+       end
+     end
+
+     # Gets the size of the evaluated set,
+     # i.e. the number of search hits added.
+     def size
+       @search_hits.size.to_f
+     end
+
+     # Calculates the number of false negatives.
+     # Divide by #size to get the rate, e.g.:
+     #   fn_rate = e.false_negatives / e.size
+     def false_negatives
+       @relevant_docids.size - @true_positives
+     end
+
+     # Calculates the precision, i.e. the
+     # fraction of retrieved documents that
+     # were relevant.
+     def precision
+       @true_positives / size
+     end
+
+     # Calculates the recall, i.e. the
+     # fraction of relevant documents that
+     # were retrieved.
+     def recall
+       fn = false_negatives
+       @true_positives / (@true_positives + fn + 0.0)
+     end
+
+     # Calculates the evenly weighted
+     # harmonic mean of #precision and
+     # #recall. This is equivalent to
+     # calling #f_measure with a parameter
+     # of 1.0.
+     def f1
+       f_measure(1.0)
+     end
+
+     # Calculates the weighted harmonic
+     # mean of precision and recall -
+     # β > 1 emphasizes recall,
+     # β < 1 emphasizes precision, and
+     # β = 1 is equivalent to #f1.
+     def f_measure(beta)
+       beta_squared = beta ** 2
+       n = (beta_squared + 1) * (precision * recall)
+       d = (beta_squared * precision) + recall
+       n / d
+     end
+
+     # Returns the top p percent of the hits
+     # that were added to this evalirator.
+     def top_percent(p)
+       k = size * (p / 100.0)
+       @search_hits[0, k.ceil]
+     end
+
+     # The precision at rank k, i.e. the
+     # precision after exactly k documents
+     # have been retrieved.
+     def precision_at_rank(k)
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # Returns the precision at r percent
+     # recall. Used to plot the precision
+     # vs. recall curve.
+     def precision_at_recall(r)
+       return 1.0 if r == 0.0
+       k = (size * r).ceil
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # A single-value summary obtained by
+     # computing the precision at the R-th
+     # position in the ranking, where R is
+     # the total number of relevant
+     # documents for the current query.
+     def r_precision
+       r = @relevant_docids.size
+       top_r = @search_hits[0, r].to_set
+       (@relevant_docids & top_r).size.to_f / r
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval
+     # [<em>from</em>, <em>to</em>], with a
+     # step size of <em>step</em>.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       raise "From must be in the interval [0, 100)" unless (from >= 0 and from < 100)
+       raise "To must be in the interval (from, 100]" unless (to > from and to <= 100)
+       raise "Invalid step size - (to-from) must be divisible by step." unless ((to - from) % step) == 0
+
+       data = []
+       range = from..to
+       range.step(step).each do |recall|
+         data << self.precision_at_recall(recall / 100.0)
+       end
+       data
+     end
+
+     # The average precision. This is
+     # equivalent to averaging
+     # #precision_at_rank over the ranks at
+     # which relevant documents were
+     # retrieved, normalized by the total
+     # number of relevant documents.
+     def average_precision
+       n = 0
+       avg = 0.0
+       relevant = 0
+
+       @search_hits.each do |h|
+         n += 1
+         if @relevant_docids.include? h
+           relevant += 1
+           avg += (relevant.to_f / n) / @relevant_docids.size
+         end
+       end
+       avg
+     end
+
+     # Discounted Cumulative Gain at rank k.
+     # A relevant search result at position x
+     # contributes 1.0/Math.log(x, logbase)
+     # to the DCG, so a higher logbase means
+     # a steeper discount for results further
+     # down the ranking.
+     def dcg_at(k, logbase = 2)
+       i = 1
+       dcg = 0.0
+       @search_hits[0, k].each do |h|
+         if @relevant_docids.include? h
+           dcg += i == 1 ? 1.0 : 1.0 / Math.log(i, logbase)
+         end
+         i += 1
+       end
+       dcg
+     end
+   end
+ end
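
To make the discounting in dcg_at concrete, here is a minimal sketch using the same fixture as the ranked test further down; only the relevant hits at ranks 1 and 3 of the top five contribute:

    require 'evalir'

    relevant  = [3, 5, 9, 25, 39, 44, 56, 71, 89, 123]
    retrieved = [123, 84, 56, 6, 8, 9, 511, 129, 187, 25, 38, 48, 250, 113, 3]
    e = Evalir::Evalirator.new(relevant, retrieved)

    # Rank 1 (docid 123) contributes 1.0, rank 3 (docid 56) contributes
    # 1/log2(3); ranks 2, 4 and 5 hold irrelevant documents.
    e.dcg_at(5) # => 1.0 + 1.0 / Math.log(3, 2), roughly 1.63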
data/lib/evalir/evalirator_collection.rb ADDED
@@ -0,0 +1,71 @@
+ module Evalir
+   class EvaliratorCollection
+     include Enumerable
+
+     def initialize
+       @evalirators = []
+     end
+
+     def size
+       @evalirators.size
+     end
+
+     # Calls block once for each evalirator
+     # in self, passing that evalirator as
+     # a parameter.
+     def each(&block)
+       @evalirators.each(&block)
+     end
+
+     # Adds an evalirator to the set over
+     # which calculations are done.
+     def <<(evalirator)
+       @evalirators << evalirator
+     end
+
+     # Lazily maps over all evalirators,
+     # executing <em>blk</em> on each one.
+     def lazy_map(&blk)
+       Enumerator.new do |yielder|
+         self.each do |e|
+           yielder << blk[e]
+         end
+       end
+     end
+
+     # Adds a list of relevant documents and
+     # a list of retrieved documents. This
+     # represents one information need.
+     def add(relevant_docids, retrieved_docids)
+       @evalirators << Evalirator.new(relevant_docids, retrieved_docids)
+     end
+
+     # Mean Average Precision - this is just
+     # a fancy way of saying 'average average
+     # precision'!
+     def mean_average_precision
+       avg = 0.0
+       @evalirators.each do |e|
+         avg += (e.average_precision / @evalirators.size)
+       end
+       avg
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval
+     # [<em>from</em>, <em>to</em>], with a
+     # step size of <em>step</em>. This is
+     # the average over all evalirators.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       return nil if @evalirators.empty?
+
+       x = 1
+       curves = self.lazy_map { |e| e.precision_recall_curve(from, to, step) }
+       curves.reduce do |acc, data|
+         x += 1
+         # Incremental mean, so values that are
+         # already averages are not divided again.
+         data.each_with_index.map do |d, i|
+           acc[i] + (d - acc[i]) / x
+         end
+       end
+     end
+   end
+ end
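
MAP is just the mean of each evalirator's average precision, so with the two information needs from the collection test below we get (0.62 + 0.44) / 2 ≈ 0.53:

    require 'evalir'

    e = Evalir::EvaliratorCollection.new
    e.add([1, 3, 6, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    e.add([2, 5, 7],        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    e.mean_average_precision.round(2) # => 0.53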
data/lib/evalir/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Evalir
+   VERSION = "0.0.1"
+ end
data/lib/evalir.rb ADDED
@@ -0,0 +1,2 @@
+ require 'evalir/evalirator'
+ require 'evalir/evalirator_collection'
data/test/test_evalirator_collection.rb ADDED
@@ -0,0 +1,24 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorCollectionTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::EvaliratorCollection.new
+     @e.add([1,3,6,9,10], [1,2,3,4,5,6,7,8,9,10])
+     @e.add([2,5,7], [1,2,3,4,5,6,7,8,9,10])
+   end
+
+   def test_map
+     assert_equal(0.53, @e.mean_average_precision.round(2))
+   end
+
+   def test_simple_enumeration
+     assert_equal(2, @e.count)
+   end
+
+   def test_precision_recall_curve
+     expected = [1.0, 0.5, 0.5, 0.5, 0.375, 0.4, 0.417, 0.429, 0.375, 0.389, 0.4]
+     actual = @e.precision_recall_curve.collect { |f| f.round(3) }
+     assert_equal(expected, actual)
+   end
+ end
data/test/test_evalirator_ranked.rb ADDED
@@ -0,0 +1,52 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorRankedTest < Test::Unit::TestCase
+   def setup
+     relevant = [3, 5, 9, 25, 39, 44, 56, 71, 89, 123]
+     retrieved = [123,84,56,6,8,9,511,129,187,25,38,48,250,113,3]
+     @e = Evalir::Evalirator.new(relevant, retrieved)
+   end
+
+   def test_top_10_percent
+     assert_equal([123, 84], @e.top_percent(10))
+   end
+
+   def test_precision_at_rank_6
+     assert_equal(0.5, @e.precision_at_rank(6))
+   end
+
+   def test_precision_at_recall_0_1
+     assert_equal(0.5, @e.precision_at_recall(0.1))
+   end
+
+   def test_precision_at_recall_0
+     assert_equal(1.0, @e.precision_at_recall(0.0))
+   end
+
+   def test_precision_recall_curve
+     relevant = [1,3,5,7,9]
+     retrieved = [1,2,3,4,5,6,7,8,9,10]
+     expected = [1.0,1/1.0,1/2.0,2/3.0,2/4.0,3/5.0,3/6.0,4/7.0,4/8.0,5/9.0,5/10.0]
+     evalirator = Evalir::Evalirator.new(relevant, retrieved)
+     assert_equal(expected, evalirator.precision_recall_curve)
+   end
+
+   def test_r_precision
+     assert_equal(0.4, @e.r_precision)
+   end
+
+   def test_average_precision
+     e1 = Evalir::Evalirator.new([1,3,4,5,6,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.78, e1.average_precision.round(2))
+
+     e2 = Evalir::Evalirator.new([2,5,6,7,9,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.52, e2.average_precision.round(2))
+   end
+
+   def test_dcg_at_5
+     expected = 1.0 + (1.0/Math.log(3,2))
+     assert_equal(expected, @e.dcg_at(5))
+   end
+ end
data/test/test_evalirator_unranked.rb ADDED
@@ -0,0 +1,57 @@
+ require 'test/unit'
+ require_relative '../lib/evalir'
+
+ class EvaliratorUnrankedTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::Evalirator.new([1], [1,4,8])
+   end
+
+   def test_precision_on_empty
+     assert(Evalir::Evalirator.new([1]).precision.nan?)
+   end
+
+   def test_recall_on_empty
+     assert_equal(0, Evalir::Evalirator.new([1]).recall)
+   end
+
+   def test_precision
+     assert_equal(1.0/3, @e.precision)
+   end
+
+   def test_recall
+     assert_equal(1.0, @e.recall)
+   end
+
+   def test_size
+     assert_equal(3.0, @e.size)
+   end
+
+   def test_false_negatives
+     assert_equal(0, @e.false_negatives)
+   end
+
+   def test_true_positives
+     assert_equal(1, @e.true_positives)
+   end
+
+   def test_false_positives
+     assert_equal(2, @e.false_positives)
+   end
+
+   def test_f1
+     assert_equal(0.5, @e.f1)
+   end
+
+   def test_f_measure_1
+     assert_equal(0.5, @e.f_measure(1.0))
+   end
+
+   def test_f05
+     assert_equal(0.38, @e.f_measure(0.5).round(2))
+   end
+
+   def test_f3
+     assert_equal(0.833, @e.f_measure(3.0).round(3))
+   end
+ end
metadata ADDED
@@ -0,0 +1,80 @@
+ --- !ruby/object:Gem::Specification
+ name: evalir
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ prerelease:
+ platform: ruby
+ authors:
+ - Alexander Mossin
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2011-09-30 00:00:00.000000000Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: &70244820710980 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: *70244820710980
+ description: Evalir is used to measure search relevance at Companybook, and offers
+   a number of standard measurements, from the basic precision and recall to single
+   value summaries such as NDCG and MAP.
+ email:
+ - alexander@companybook.no
+ - alexander.mossin@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - Gemfile
+ - Gemfile.lock
+ - README.md
+ - Rakefile
+ - evalir.gemspec
+ - lib/evalir.rb
+ - lib/evalir/evalirator.rb
+ - lib/evalir/evalirator_collection.rb
+ - lib/evalir/version.rb
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb
+ homepage: http://github.com/companybook/Evalir
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.10
+ signing_key:
+ specification_version: 3
+ summary: A library for evaluation of IR systems.
+ test_files:
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb