evalir 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in evalir.gemspec
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,16 @@
+ PATH
+   remote: .
+   specs:
+     evalir (0.0.1)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     rake (0.9.2)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   evalir!
+   rake
data/README.md ADDED
@@ -0,0 +1,57 @@
+ What is Evalir?
+ ===============
+ Evalir is a library for evaluation of IR systems. It incorporates a number of standard measurements, from the basic precision and recall to single value summaries such as NDCG and MAP.
+
+ For a good reference on the theory behind this, please check out Manning, Raghavan & Schütze's excellent [Introduction to Information Retrieval, ch.8](http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-in-information-retrieval-1.html).
+
+ What can Evalir do?
+ -------------------
+ * [Precision](http://en.wikipedia.org/wiki/Information_retrieval#Precision)
+ * [Recall](http://en.wikipedia.org/wiki/Information_retrieval#Recall)
+ * Precision at Recall (e.g. Precision at 20%)
+ * Precision at rank k
+ * Average Precision
+ * Precision-Recall curve
+ * [Mean Average Precision (MAP)](http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision)
+ * [F-measure](http://en.wikipedia.org/wiki/Information_retrieval#F-measure)
+ * [R-Precision](http://en.wikipedia.org/wiki/Information_retrieval#R-Precision)
+ * [Discounted Cumulative Gain (DCG)](http://en.wikipedia.org/wiki/Discounted_cumulative_gain)
+ * [Normalized DCG](http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
+
+ How does Evalir work?
+ ---------------------
+ The goal of an Information Retrieval system is to provide the user with relevant information -- relevant w.r.t. the user's *information need*. For example, an information need might be:
+
+ > Information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
+
+ However, this is *not* the query. A user will try to encode her need as a query, for instance:
+
+ > red white wine reducing "heart attack"
+
+ To evaluate an IR system with Evalir, we need human-annotated test data, each data point consisting of the following:
+
+ * An explicit information need
+ * A query
+ * A list of documents that are relevant w.r.t. the information need (*not* the query)
+
+ For example, say we have the aforementioned information need and query, and a list of documents that have been found to be relevant: { 123, 654, 29, 1029 }. If we had the actual query results in an array named *results*, we could use an Evalirator like this:
+
+     relevant = [123, 654, 29, 1029]
+     e = Evalir::Evalirator.new(relevant, results)
+     puts "Precision: #{e.precision}"
+     puts "Recall: #{e.recall}"
+     puts "F-1: #{e.f1}"
+     puts "F-3: #{e.f_measure(3)}"
+     puts "Precision at rank 10: #{e.precision_at_rank(10)}"
+     puts "Average Precision: #{e.average_precision}"
+
+ When you have several information needs and want to compute aggregate statistics, use an EvaliratorCollection like this:
+
+     e = Evalir::EvaliratorCollection.new
+     queries.each do |query|
+       relevant = get_relevant_docids(query)
+       results = get_results(query)
+       e << Evalir::Evalirator.new(relevant, results)
+     end
+     puts "MAP: #{e.mean_average_precision}"
+     puts "Precision-Recall Curve: #{e.precision_recall_curve}"
data/Rakefile ADDED
@@ -0,0 +1,20 @@
+ require "bundler/gem_tasks"
+ require 'rubygems'
+ #require 'bundler'
+
+ begin
+   Bundler.setup(:default, :development)
+ rescue Bundler::BundlerError => e
+   $stderr.puts e.message
+   $stderr.puts "Run `bundle install` to install missing gems"
+   exit e.status_code
+ end
+
+ require 'rake/testtask'
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/test_*.rb'
+   test.verbose = true
+ end
+
+ task :default => :test
data/evalir.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "evalir/version"
+
+ Gem::Specification.new do |s|
+   s.name = "evalir"
+   s.version = Evalir::VERSION
+   s.platform = Gem::Platform::RUBY
+   s.authors = ["Alexander Mossin"]
+   s.email = ["alexander@companybook.no", "alexander.mossin@gmail.com"]
+   s.homepage = "http://github.com/companybook/Evalir"
+   s.summary = %q{A library for evaluation of IR systems.}
+   s.description = %q{Evalir is used to measure search relevance at Companybook, and offers a number of standard measurements, from the basic precision and recall to single value summaries such as NDCG and MAP.}
+
+   s.files = `git ls-files`.split("\n")
+   s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   s.add_development_dependency "rake"
+ end
data/lib/evalir/evalirator.rb ADDED
@@ -0,0 +1,171 @@
+ require 'set'
+
+ module Evalir
+   class Evalirator
+     # Gets the number of retrieved results
+     # that were indeed relevant.
+     attr_reader :true_positives
+
+     # Gets the number of retrieved results
+     # that were in fact irrelevant.
+     attr_reader :false_positives
+
+     # Instantiates a new instance of the
+     # Evalirator, using the provided judgements
+     # as a basis for later calculations.
+     def initialize(relevant_docids, retrieved_docids = [])
+       @relevant_docids = relevant_docids.to_set
+       @true_positives = @false_positives = 0
+       @search_hits = []
+
+       retrieved_docids.each do |docid|
+         if @relevant_docids.include? docid
+           @true_positives = @true_positives + 1
+         else
+           @false_positives = @false_positives + 1
+         end
+         @search_hits << docid
+       end
+     end
+
+     # Gets the size of the evaluated set,
+     # i.e. the number of search hits added.
+     def size
+       @search_hits.size.to_f
+     end
+
+     # Calculate the number of false negatives.
+     # Divide by #size to get the rate, e.g.:
+     #   fn_rate = e.false_negatives / e.size
+     def false_negatives
+       @relevant_docids.size - @true_positives
+     end
+
+     # Calculate the precision, i.e. the
+     # fraction of retrieved documents that
+     # were relevant.
+     def precision
+       @true_positives / size
+     end
+
+     # Calculate the recall, i.e. the
+     # fraction of relevant documents that
+     # were retrieved.
+     def recall
+       fn = false_negatives
+       @true_positives / (@true_positives + fn + 0.0)
+     end
+
+     # Calculate the evenly weighted
+     # harmonic mean of #precision and
+     # #recall. This is equivalent to
+     # calling #f_measure with a parameter
+     # of 1.0.
+     def f1
+       f_measure(1.0)
+     end
+
+     # Calculate the weighted harmonic
+     # mean of precision and recall -
+     # β > 1 means emphasizing recall,
+     # β < 1 means emphasizing precision.
+     # β = 1 is equivalent to #f1.
+     def f_measure(beta)
+       betaSquared = beta ** 2
+       n = (betaSquared + 1) * (precision * recall)
+       d = (betaSquared * precision) + recall
+       n / d
+     end
+
+     # Returns the top p percent hits
+     # that were added to this evalirator.
+     def top_percent(p)
+       k = size * (p / 100.0)
+       @search_hits[0, k.ceil]
+     end
+
+     # The precision at rank k,
+     # meaning the precision after exactly
+     # k documents have been retrieved.
+     def precision_at_rank(k)
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # Returns the precision at r percent
+     # recall. Used to plot the Precision
+     # vs. Recall curve.
+     def precision_at_recall(r)
+       return 1.0 if r == 0.0
+       k = (size * r).ceil
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # A single value summary which is
+     # obtained by computing the precision
+     # at the R-th position in the ranking.
+     # Here, R is the total number of
+     # relevant documents for the current
+     # query.
+     def r_precision
+       r = @relevant_docids.size
+       top_r = @search_hits[0, r].to_set
+       (@relevant_docids & top_r).size.to_f / r
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval [<em>from</em>,
+     # <em>to</em>], with a step size of <em>step</em>.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       raise "From must be in the interval [0, 100)" unless (from >= 0 and from < 100)
+       raise "To must be in the interval (from, 100]" unless (to > from and to <= 100)
+       raise "Invalid step size - (to-from) must be divisible by step." unless ((to - from) % step) == 0
+
+       data = []
+       range = from..to
+       range.step(step).each do |recall|
+         data << self.precision_at_recall(recall/100.0)
+       end
+       data
+     end
+
+     # The average precision - the precision
+     # at the rank of each relevant result,
+     # averaged over the total number of
+     # relevant documents.
+     def average_precision
+       n = 0
+       avg = 0.0
+       relevant = 0
+
+       @search_hits.each do |h|
+         n = n + 1
+         if @relevant_docids.include? h
+           relevant = relevant + 1
+           avg += (relevant.to_f / n) / @relevant_docids.size
+         end
+       end
+       avg
+     end
+
+     # Discounted Cumulative Gain at
+     # rank k. For a relevant search
+     # result at position x, its
+     # contribution to the DCG is
+     # 1.0/Math.log(x, logbase). A
+     # higher logbase means more
+     # discounting of results further out.
+     def dcg_at(k, logbase=2)
+       i = 1
+       dcg = 0.0
+       @search_hits[0, k].each do |h|
+         if @relevant_docids.include? h
+           dcg += i == 1 ? 1.0 : 1.0 / Math.log(i, logbase)
+         end
+         i += 1
+       end
+       dcg
+     end
+   end
+ end
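The comments in evalirator.rb describe the F-measure and DCG computations in prose; the following sketch (hypothetical document ids, not part of the gem) spells out the arithmetic those methods perform:

    require 'evalir'

    relevant  = [3, 9, 123]
    retrieved = [123, 84, 9, 6, 3]   # hypothetical ranking

    e = Evalir::Evalirator.new(relevant, retrieved)

    # Relevant hits sit at ranks 1, 3 and 5, so with logbase 2:
    # DCG@5 = 1.0 + 1/log2(3) + 1/log2(5) ~= 2.06
    e.dcg_at(5)

    # F2 (beta = 2) emphasizes recall over precision:
    # (4 + 1) * P * R / (4 * P + R) = 5 * 0.6 * 1.0 / 3.4 ~= 0.88
    e.f_measure(2)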
data/lib/evalir/evalirator_collection.rb ADDED
@@ -0,0 +1,71 @@
+ module Evalir
+   class EvaliratorCollection
+     include Enumerable
+
+     def initialize
+       @evalirators = []
+     end
+
+     def size
+       @evalirators.size
+     end
+
+     # Calls block once for each element in self,
+     # passing that element as a parameter.
+     def each(&block)
+       @evalirators.each(&block)
+     end
+
+     # Adds an evalirator to the set over which
+     # calculations are done.
+     def <<(evalirator)
+       @evalirators << evalirator
+     end
+
+     # Maps over all elements, executing
+     # <em>blk</em> on every evalirator.
+     def lazy_map(&blk)
+       Enumerator.new do |yielder|
+         self.each do |e|
+           yielder << blk[e]
+         end
+       end
+     end
+
+     # Adds a list of relevant documents and
+     # a list of retrieved documents. This
+     # represents one information need.
+     def add(relevant_docids, retrieved_docids)
+       @evalirators << Evalirator.new(relevant_docids, retrieved_docids)
+     end
+
+     # Mean Average Precision - this is just
+     # a fancy way of saying 'average average
+     # precision'!
+     def mean_average_precision
+       avg = 0.0
+       @evalirators.each do |e|
+         avg += (e.average_precision / @evalirators.size)
+       end
+       avg
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval [<em>from</em>,
+     # <em>to</em>], with a step size of <em>step</em>.
+     # This is the average over all evalirators.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       return nil if @evalirators.empty?
+
+       #n = self.size.to_f
+       x = 1
+       curves = self.lazy_map { |e| e.precision_recall_curve(from, to, step) }
+       return curves.reduce do |acc, data|
+         x += 1
+         data.each_with_index.map do |d,i|
+           acc[i] = (acc[i] + d) / x
+         end
+       end
+     end
+   end
+ end
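To make the "average average precision" remark concrete, here is a small hypothetical example (not part of the gem) of how the collection turns per-query average precision into MAP:

    require 'evalir'

    c = Evalir::EvaliratorCollection.new
    c.add([1, 3], [1, 2, 3, 4])   # AP = (1/1 + 2/3) / 2 ~= 0.833
    c.add([2],    [1, 2, 3, 4])   # AP = (1/2) / 1      =  0.5

    c.mean_average_precision      # => (0.833 + 0.5) / 2 ~= 0.667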
data/lib/evalir/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Evalir
+   VERSION = "0.0.1"
+ end
data/lib/evalir.rb ADDED
@@ -0,0 +1,2 @@
+ require 'evalir/evalirator'
+ require 'evalir/evalirator_collection'
data/test/test_evalirator_collection.rb ADDED
@@ -0,0 +1,24 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorCollectionTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::EvaliratorCollection.new()
+     @e.add([1,3,6,9,10], [1,2,3,4,5,6,7,8,9,10])
+     @e.add([2,5,7], [1,2,3,4,5,6,7,8,9,10])
+   end
+
+   def test_map
+     assert_equal(0.53, @e.mean_average_precision.round(2))
+   end
+
+   def test_simple_enumeration
+     assert_equal(2, @e.count)
+   end
+
+   def test_precision_recall_curve
+     expected = [1.0, 0.5, 0.5, 0.5, 0.375, 0.4, 0.417, 0.429, 0.375, 0.389, 0.4]
+     actual = @e.precision_recall_curve.collect { |f| f.round(3) }
+     assert_equal(expected, actual)
+   end
+ end
data/test/test_evalirator_ranked.rb ADDED
@@ -0,0 +1,52 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorRankedTest < Test::Unit::TestCase
+   def setup
+     relevant = [3, 5, 9, 25, 39, 44, 56, 71, 89, 123]
+     retrieved = [123,84,56,6,8,9,511,129,187,25,38,48,250,113,3]
+     @e = Evalir::Evalirator.new(relevant, retrieved)
+   end
+
+   def test_top_10_percent
+     assert_equal([123, 84], @e.top_percent(10))
+   end
+
+   def test_precision_at_rank_6
+     assert_equal(0.5, @e.precision_at_rank(6))
+   end
+
+   def test_precision_at_recall_0_1
+     assert_equal(0.5, @e.precision_at_recall(0.1))
+   end
+
+   def test_precision_at_recall_0
+     assert_equal(1.0, @e.precision_at_recall(0.0))
+   end
+
+   def test_precision_recall_curve
+     relevant = [1,3,5,7,9]
+     retrieved = [1,2,3,4,5,6,7,8,9,10]
+     expected = [1.0,1/1.0,1/2.0,2/3.0,2/4.0,3/5.0,3/6.0,4/7.0,4/8.0,5/9.0,5/10.0]
+     evalirator = Evalir::Evalirator.new(relevant, retrieved)
+     assert_equal(expected, evalirator.precision_recall_curve)
+   end
+
+   def test_r_precision
+     assert_equal(0.4, @e.r_precision)
+   end
+
+   def test_average_precision
+     e1 = Evalir::Evalirator.new([1,3,4,5,6,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.78, e1.average_precision.round(2))
+
+     e2 = Evalir::Evalirator.new([2,5,6,7,9,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.52, e2.average_precision.round(2))
+   end
+
+   def test_dcg_at_5
+     expected = 1.0 + (1.0/Math.log(3,2))
+     assert_equal(expected, @e.dcg_at(5))
+   end
+ end
+
data/test/test_evalirator_unranked.rb ADDED
@@ -0,0 +1,57 @@
+ require 'test/unit'
+ require_relative '../lib/evalir'
+
+ class EvaliratorUnrankedTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::Evalirator.new([1], [1,4,8])
+   end
+
+   def test_precision_on_empty
+     assert(Evalir::Evalirator.new([1]).precision.nan?)
+   end
+
+   def test_recall_on_empty
+     assert_equal(0, Evalir::Evalirator.new([1]).recall)
+   end
+
+   def test_precision
+     assert_equal(1.0/3, @e.precision)
+   end
+
+   def test_recall
+     assert_equal(1.0, @e.recall)
+   end
+
+   def test_size
+     assert_equal(3.0, @e.size)
+   end
+
+   def test_false_negatives
+     assert_equal(0, @e.false_negatives)
+   end
+
+   def test_true_positives
+     assert_equal(1, @e.true_positives)
+   end
+
+   def test_false_positives
+     assert_equal(2, @e.false_positives)
+   end
+
+   def test_f1
+     assert_equal(0.5, @e.f1)
+   end
+
+   def test_f_measure_1
+     assert_equal(0.5, @e.f_measure(1.0))
+   end
+
+   def test_f05
+     assert_equal(0.38, @e.f_measure(0.5).round(2))
+   end
+
+   def test_f3
+     assert_equal(0.833, @e.f_measure(3.0).round(3))
+   end
+ end
+
metadata ADDED
@@ -0,0 +1,80 @@
+ --- !ruby/object:Gem::Specification
+ name: evalir
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ prerelease:
+ platform: ruby
+ authors:
+ - Alexander Mossin
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2011-09-30 00:00:00.000000000Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: &70244820710980 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: *70244820710980
+ description: Evalir is used to measure search relevance at Companybook, and offers
+   a number of standard measurements, from the basic precision and recall to single
+   value summaries such as NDCG and MAP.
+ email:
+ - alexander@companybook.no
+ - alexander.mossin@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - Gemfile
+ - Gemfile.lock
+ - README.md
+ - Rakefile
+ - evalir.gemspec
+ - lib/evalir.rb
+ - lib/evalir/evalirator.rb
+ - lib/evalir/evalirator_collection.rb
+ - lib/evalir/version.rb
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb
+ homepage: http://github.com/companybook/Evalir
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.10
+ signing_key:
+ specification_version: 3
+ summary: A library for evaluation of IR systems.
+ test_files:
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb