evalir 0.0.1

data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source "http://rubygems.org"
+
+ # Specify your gem's dependencies in evalir.gemspec
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,16 @@
+ PATH
+   remote: .
+   specs:
+     evalir (0.0.1)
+
+ GEM
+   remote: http://rubygems.org/
+   specs:
+     rake (0.9.2)
+
+ PLATFORMS
+   ruby
+
+ DEPENDENCIES
+   evalir!
+   rake
data/README.md ADDED
@@ -0,0 +1,57 @@
+ What is Evalir?
+ ===============
+ Evalir is a library for evaluating IR systems. It implements a number of standard measurements, from basic precision and recall to single-value summaries such as NDCG and MAP.
+
+ For a good reference on the theory behind this, please check out Manning, Raghavan & Schütze's excellent [Introduction to Information Retrieval, ch. 8](http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-in-information-retrieval-1.html).
+
+ What can Evalir do?
+ -------------------
+ * [Precision](http://en.wikipedia.org/wiki/Information_retrieval#Precision)
+ * [Recall](http://en.wikipedia.org/wiki/Information_retrieval#Recall)
+ * Precision at recall (e.g. precision at 20% recall)
+ * Precision at rank k
+ * Average precision
+ * Precision-recall curve
+ * [Mean Average Precision (MAP)](http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision)
+ * [F-measure](http://en.wikipedia.org/wiki/Information_retrieval#F-measure)
+ * [R-Precision](http://en.wikipedia.org/wiki/Information_retrieval#R-Precision)
+ * [Discounted Cumulative Gain (DCG)](http://en.wikipedia.org/wiki/Discounted_cumulative_gain)
+ * [Normalized DCG](http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG)
+
+ How does Evalir work?
+ ---------------------
+ The goal of an Information Retrieval system is to provide the user with relevant information -- relevant with respect to the user's *information need*. For example, an information need might be:
+
+ > Information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
+
+ Note that this is *not* the query. The user will try to encode her need as a query, for instance:
+
+ > red white wine reducing "heart attack"
+
+ To evaluate an IR system with Evalir, we need human-annotated test data, where each data point consists of:
+
+ * An explicit information need
+ * A query
+ * A list of documents that are relevant w.r.t. the information need (*not* the query)
+
+ For example, say we have the aforementioned information need and query, and a list of documents that have been judged relevant: { 123, 654, 29, 1029 }. If the actual query results are in an array named *results*, we can use an Evalirator like this:
+
+     relevant = [123, 654, 29, 1029]
+     e = Evalir::Evalirator.new(relevant, results)
+     puts "Precision: #{e.precision}"
+     puts "Recall: #{e.recall}"
+     puts "F-1: #{e.f1}"
+     puts "F-3: #{e.f_measure(3)}"
+     puts "Precision at rank 10: #{e.precision_at_rank(10)}"
+     puts "Average precision: #{e.average_precision}"
+
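+ The weighted F-measure used above is the standard F(β) = ((β² + 1)·P·R) / (β²·P + R). As a quick, hand-computed illustration (using the same one-relevant-in-three setup as the unranked test in this gem):
+
+     # P = 1/3 and R = 1.0 for this evalirator:
+     e = Evalir::Evalirator.new([1], [1, 4, 8])
+     p, r, beta = e.precision, e.recall, 1.0
+     ((beta**2 + 1) * p * r) / ((beta**2 * p) + r)
+     # => 0.5, the same value e.f1 returns
+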
+ When you have several information needs and want to compute aggregate statistics, use an EvaliratorCollection like this:
+
+     e = Evalir::EvaliratorCollection.new
+     queries.each do |query|
+       relevant = get_relevant_docids(query)
+       results = get_results(query)
+       e << Evalir::Evalirator.new(relevant, results)
+     end
+     puts "MAP: #{e.mean_average_precision}"
+     puts "Precision-recall curve: #{e.precision_recall_curve}"
data/Rakefile ADDED
@@ -0,0 +1,20 @@
+ require "bundler/gem_tasks"
+ require 'rubygems'
+
+ begin
+   Bundler.setup(:default, :development)
+ rescue Bundler::BundlerError => e
+   $stderr.puts e.message
+   $stderr.puts "Run `bundle install` to install missing gems"
+   exit e.status_code
+ end
+
+ require 'rake/testtask'
+ Rake::TestTask.new(:test) do |test|
+   test.libs << 'lib' << 'test'
+   test.pattern = 'test/**/test_*.rb'
+   test.verbose = true
+ end
+
+ task :default => :test
data/evalir.gemspec ADDED
@@ -0,0 +1,21 @@
+ # -*- encoding: utf-8 -*-
+ $:.push File.expand_path("../lib", __FILE__)
+ require "evalir/version"
+
+ Gem::Specification.new do |s|
+   s.name        = "evalir"
+   s.version     = Evalir::VERSION
+   s.platform    = Gem::Platform::RUBY
+   s.authors     = ["Alexander Mossin"]
+   s.email       = ["alexander@companybook.no", "alexander.mossin@gmail.com"]
+   s.homepage    = "http://github.com/companybook/Evalir"
+   s.summary     = %q{A library for evaluation of IR systems.}
+   s.description = %q{Evalir is used to measure search relevance at Companybook, and offers a number of standard measurements, from the basic precision and recall to single value summaries such as NDCG and MAP.}
+
+   s.files         = `git ls-files`.split("\n")
+   s.test_files    = `git ls-files -- {test,spec,features}/*`.split("\n")
+   s.executables   = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
+   s.require_paths = ["lib"]
+
+   s.add_development_dependency "rake"
+ end
data/lib/evalir/evalirator.rb ADDED
@@ -0,0 +1,171 @@
+ require 'set'
+
+ module Evalir
+   class Evalirator
+     # Gets the number of retrieved results
+     # that were indeed relevant.
+     attr_reader :true_positives
+
+     # Gets the number of retrieved results
+     # that were in fact irrelevant.
+     attr_reader :false_positives
+
+     # Instantiates a new Evalirator, using
+     # the provided judgements as the basis
+     # for later calculations.
+     def initialize(relevant_docids, retrieved_docids = [])
+       @relevant_docids = relevant_docids.to_set
+       @true_positives = @false_positives = 0
+       @search_hits = []
+
+       retrieved_docids.each do |docid|
+         if @relevant_docids.include? docid
+           @true_positives += 1
+         else
+           @false_positives += 1
+         end
+         @search_hits << docid
+       end
+     end
+
+     # Gets the size of the evaluated set,
+     # i.e. the number of search hits added.
+     def size
+       @search_hits.size.to_f
+     end
+
+     # Calculates the number of false negatives.
+     # Divide by #size to get the rate, e.g.:
+     #   fn_rate = e.false_negatives / e.size
+     def false_negatives
+       @relevant_docids.size - @true_positives
+     end
+
+     # Calculates the precision, i.e. the
+     # fraction of retrieved documents that
+     # were relevant.
+     def precision
+       @true_positives / size
+     end
+
+     # Calculates the recall, i.e. the
+     # fraction of relevant documents that
+     # were retrieved.
+     def recall
+       fn = false_negatives
+       @true_positives / (@true_positives + fn + 0.0)
+     end
+
+     # Calculates the evenly weighted
+     # harmonic mean of #precision and
+     # #recall. This is equivalent to
+     # calling #f_measure with a parameter
+     # of 1.0.
+     def f1
+       f_measure(1.0)
+     end
+
+     # Calculates the weighted harmonic
+     # mean of precision and recall -
+     # β > 1 emphasizes recall,
+     # β < 1 emphasizes precision, and
+     # β = 1 is equivalent to #f1.
+     def f_measure(beta)
+       beta_squared = beta ** 2
+       n = (beta_squared + 1) * (precision * recall)
+       d = (beta_squared * precision) + recall
+       n / d
+     end
+
+     # Returns the top p percent of the hits
+     # that were added to this evalirator.
+     def top_percent(p)
+       k = size * (p / 100.0)
+       @search_hits[0, k.ceil]
+     end
+
+     # The precision at rank k, i.e. the
+     # precision after exactly k documents
+     # have been retrieved.
+     def precision_at_rank(k)
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # Returns the precision at r percent
+     # recall. Used to plot the precision
+     # vs. recall curve.
+     def precision_at_recall(r)
+       return 1.0 if r == 0.0
+       k = (size * r).ceil
+       top_k = @search_hits[0, k].to_set
+       (@relevant_docids & top_k).size.to_f / k
+     end
+
+     # A single-value summary obtained by
+     # computing the precision at the R-th
+     # position in the ranking, where R is
+     # the total number of relevant
+     # documents for the current query.
+     def r_precision
+       r = @relevant_docids.size
+       top_r = @search_hits[0, r].to_set
+       (@relevant_docids & top_r).size.to_f / r
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval
+     # [<em>from</em>, <em>to</em>], with a
+     # step size of <em>step</em>.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       raise "From must be in the interval [0, 100)" unless (from >= 0 and from < 100)
+       raise "To must be in the interval (from, 100]" unless (to > from and to <= 100)
+       raise "Invalid step size - (to-from) must be divisible by step." unless ((to - from) % step) == 0
+
+       data = []
+       range = from..to
+       range.step(step).each do |recall|
+         data << self.precision_at_recall(recall / 100.0)
+       end
+       data
+     end
+
+     # The average precision. This is
+     # equivalent to averaging
+     # #precision_at_rank over the ranks at
+     # which relevant documents were
+     # retrieved, normalized by the total
+     # number of relevant documents.
+     def average_precision
+       n = 0
+       avg = 0.0
+       relevant = 0
+
+       @search_hits.each do |h|
+         n += 1
+         if @relevant_docids.include? h
+           relevant += 1
+           avg += (relevant.to_f / n) / @relevant_docids.size
+         end
+       end
+       avg
+     end
+
+     # Discounted Cumulative Gain at rank k.
+     # A relevant search result at position x
+     # contributes 1.0/Math.log(x, logbase)
+     # to the DCG, so a higher logbase means
+     # a steeper discount for results further
+     # down the ranking.
+     def dcg_at(k, logbase = 2)
+       i = 1
+       dcg = 0.0
+       @search_hits[0, k].each do |h|
+         if @relevant_docids.include? h
+           dcg += i == 1 ? 1.0 : 1.0 / Math.log(i, logbase)
+         end
+         i += 1
+       end
+       dcg
+     end
+   end
+ end
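
To make the discounting in dcg_at concrete, here is a minimal sketch using the same fixture as the ranked test further down; only the relevant hits at ranks 1 and 3 of the top five contribute:

    require 'evalir'

    relevant  = [3, 5, 9, 25, 39, 44, 56, 71, 89, 123]
    retrieved = [123, 84, 56, 6, 8, 9, 511, 129, 187, 25, 38, 48, 250, 113, 3]
    e = Evalir::Evalirator.new(relevant, retrieved)

    # Rank 1 (docid 123) contributes 1.0, rank 3 (docid 56) contributes
    # 1/log2(3); ranks 2, 4 and 5 hold irrelevant documents.
    e.dcg_at(5) # => 1.0 + 1.0 / Math.log(3, 2), roughly 1.63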
data/lib/evalir/evalirator_collection.rb ADDED
@@ -0,0 +1,71 @@
+ module Evalir
+   class EvaliratorCollection
+     include Enumerable
+
+     def initialize
+       @evalirators = []
+     end
+
+     def size
+       @evalirators.size
+     end
+
+     # Calls block once for each evalirator
+     # in self, passing that evalirator as
+     # a parameter.
+     def each(&block)
+       @evalirators.each(&block)
+     end
+
+     # Adds an evalirator to the set over
+     # which calculations are done.
+     def <<(evalirator)
+       @evalirators << evalirator
+     end
+
+     # Lazily maps over all evalirators,
+     # executing <em>blk</em> on each one.
+     def lazy_map(&blk)
+       Enumerator.new do |yielder|
+         self.each do |e|
+           yielder << blk[e]
+         end
+       end
+     end
+
+     # Adds a list of relevant documents and
+     # a list of retrieved documents. This
+     # represents one information need.
+     def add(relevant_docids, retrieved_docids)
+       @evalirators << Evalirator.new(relevant_docids, retrieved_docids)
+     end
+
+     # Mean Average Precision - this is just
+     # a fancy way of saying 'average average
+     # precision'!
+     def mean_average_precision
+       avg = 0.0
+       @evalirators.each do |e|
+         avg += (e.average_precision / @evalirators.size)
+       end
+       avg
+     end
+
+     # Gets the data for the precision-recall
+     # curve, ranging over the interval
+     # [<em>from</em>, <em>to</em>], with a
+     # step size of <em>step</em>. This is
+     # the average over all evalirators.
+     def precision_recall_curve(from = 0, to = 100, step = 10)
+       return nil if @evalirators.empty?
+
+       x = 1
+       curves = self.lazy_map { |e| e.precision_recall_curve(from, to, step) }
+       curves.reduce do |acc, data|
+         x += 1
+         # Incremental mean, so values that are
+         # already averages are not divided again.
+         data.each_with_index.map do |d, i|
+           acc[i] + (d - acc[i]) / x
+         end
+       end
+     end
+   end
+ end
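
MAP is just the mean of each evalirator's average precision, so with the two information needs from the collection test below we get (0.62 + 0.44) / 2 ≈ 0.53:

    require 'evalir'

    e = Evalir::EvaliratorCollection.new
    e.add([1, 3, 6, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    e.add([2, 5, 7],        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    e.mean_average_precision.round(2) # => 0.53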
data/lib/evalir/version.rb ADDED
@@ -0,0 +1,3 @@
+ module Evalir
+   VERSION = "0.0.1"
+ end
data/lib/evalir.rb ADDED
@@ -0,0 +1,2 @@
+ require 'evalir/evalirator'
+ require 'evalir/evalirator_collection'
data/test/test_evalirator_collection.rb ADDED
@@ -0,0 +1,24 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorCollectionTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::EvaliratorCollection.new
+     @e.add([1,3,6,9,10], [1,2,3,4,5,6,7,8,9,10])
+     @e.add([2,5,7], [1,2,3,4,5,6,7,8,9,10])
+   end
+
+   def test_map
+     assert_equal(0.53, @e.mean_average_precision.round(2))
+   end
+
+   def test_simple_enumeration
+     assert_equal(2, @e.count)
+   end
+
+   def test_precision_recall_curve
+     expected = [1.0, 0.5, 0.5, 0.5, 0.375, 0.4, 0.417, 0.429, 0.375, 0.389, 0.4]
+     actual = @e.precision_recall_curve.collect { |f| f.round(3) }
+     assert_equal(expected, actual)
+   end
+ end
data/test/test_evalirator_ranked.rb ADDED
@@ -0,0 +1,52 @@
+ require 'test/unit'
+ require 'evalir'
+
+ class EvaliratorRankedTest < Test::Unit::TestCase
+   def setup
+     relevant = [3, 5, 9, 25, 39, 44, 56, 71, 89, 123]
+     retrieved = [123,84,56,6,8,9,511,129,187,25,38,48,250,113,3]
+     @e = Evalir::Evalirator.new(relevant, retrieved)
+   end
+
+   def test_top_10_percent
+     assert_equal([123, 84], @e.top_percent(10))
+   end
+
+   def test_precision_at_rank_6
+     assert_equal(0.5, @e.precision_at_rank(6))
+   end
+
+   def test_precision_at_recall_0_1
+     assert_equal(0.5, @e.precision_at_recall(0.1))
+   end
+
+   def test_precision_at_recall_0
+     assert_equal(1.0, @e.precision_at_recall(0.0))
+   end
+
+   def test_precision_recall_curve
+     relevant = [1,3,5,7,9]
+     retrieved = [1,2,3,4,5,6,7,8,9,10]
+     expected = [1.0,1/1.0,1/2.0,2/3.0,2/4.0,3/5.0,3/6.0,4/7.0,4/8.0,5/9.0,5/10.0]
+     evalirator = Evalir::Evalirator.new(relevant, retrieved)
+     assert_equal(expected, evalirator.precision_recall_curve)
+   end
+
+   def test_r_precision
+     assert_equal(0.4, @e.r_precision)
+   end
+
+   def test_average_precision
+     e1 = Evalir::Evalirator.new([1,3,4,5,6,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.78, e1.average_precision.round(2))
+
+     e2 = Evalir::Evalirator.new([2,5,6,7,9,10], [1,2,3,4,5,6,7,8,9,10])
+     assert_equal(0.52, e2.average_precision.round(2))
+   end
+
+   def test_dcg_at_5
+     expected = 1.0 + (1.0/Math.log(3,2))
+     assert_equal(expected, @e.dcg_at(5))
+   end
+ end
data/test/test_evalirator_unranked.rb ADDED
@@ -0,0 +1,57 @@
+ require 'test/unit'
+ require_relative '../lib/evalir'
+
+ class EvaliratorUnrankedTest < Test::Unit::TestCase
+   def setup
+     @e = Evalir::Evalirator.new([1], [1,4,8])
+   end
+
+   def test_precision_on_empty
+     assert(Evalir::Evalirator.new([1]).precision.nan?)
+   end
+
+   def test_recall_on_empty
+     assert_equal(0, Evalir::Evalirator.new([1]).recall)
+   end
+
+   def test_precision
+     assert_equal(1.0/3, @e.precision)
+   end
+
+   def test_recall
+     assert_equal(1.0, @e.recall)
+   end
+
+   def test_size
+     assert_equal(3.0, @e.size)
+   end
+
+   def test_false_negatives
+     assert_equal(0, @e.false_negatives)
+   end
+
+   def test_true_positives
+     assert_equal(1, @e.true_positives)
+   end
+
+   def test_false_positives
+     assert_equal(2, @e.false_positives)
+   end
+
+   def test_f1
+     assert_equal(0.5, @e.f1)
+   end
+
+   def test_f_measure_1
+     assert_equal(0.5, @e.f_measure(1.0))
+   end
+
+   def test_f05
+     assert_equal(0.38, @e.f_measure(0.5).round(2))
+   end
+
+   def test_f3
+     assert_equal(0.833, @e.f_measure(3.0).round(3))
+   end
+ end
metadata ADDED
@@ -0,0 +1,80 @@
+ --- !ruby/object:Gem::Specification
+ name: evalir
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+ prerelease:
+ platform: ruby
+ authors:
+ - Alexander Mossin
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2011-09-30 00:00:00.000000000Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: &70244820710980 !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: *70244820710980
+ description: Evalir is used to measure search relevance at Companybook, and offers
+   a number of standard measurements, from the basic precision and recall to single
+   value summaries such as NDCG and MAP.
+ email:
+ - alexander@companybook.no
+ - alexander.mossin@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - Gemfile
+ - Gemfile.lock
+ - README.md
+ - Rakefile
+ - evalir.gemspec
+ - lib/evalir.rb
+ - lib/evalir/evalirator.rb
+ - lib/evalir/evalirator_collection.rb
+ - lib/evalir/version.rb
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb
+ homepage: http://github.com/companybook/Evalir
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+       segments:
+       - 0
+       hash: 1697956995838933814
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.10
+ signing_key:
+ specification_version: 3
+ summary: A library for evaluation of IR systems.
+ test_files:
+ - test/test_evalirator_collection.rb
+ - test/test_evalirator_ranked.rb
+ - test/test_evalirator_unranked.rb