retreval 0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ CHANGELOG
2
+ =========
3
+
4
+ Version 0.1
5
+ -----------
6
+
7
+ - First public version released.
@@ -0,0 +1,321 @@
1
+ README
2
+ ======
3
+
4
+ This is a simple API to evaluate information retrieval results. It allows you to load ranked and unranked query results and calculate various evaluation metrics (precision, recall, MAP, kappa) against a previously loaded gold standard.
5
+
6
+ Start this program from the command line with:
7
+
8
+ retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix>
9
+
10
+ The options are outlined when you pass no arguments and just call
11
+
12
+ retreval
13
+
14
+ You will find further information in the RDOC documentation and the HOWTO section below.
15
+
16
+ If you want to see an example, use this command:
17
+
18
+ retreval -l example/gold_standard.yml -q example/query_results.yml -f yaml -v
19
+
20
+
21
+ INSTALLATION
22
+ ============
23
+
24
+ You can manually download the sources and build the Gem from there by `cd`ing to the folder where this README is saved and calling
25
+
26
+ gem build retreval.gemspec
27
+
28
+ This will create a gem file called `retreval-0.1.gem` which you just have to install:
29
+
30
+ gem install retreval-0.1.gem
31
+
32
+ And you're done.
33
+
34
+
35
+ HOWTO
36
+ =====
37
+
38
+ This API supports the following evaluation tasks:
39
+
40
+ - Loading a Gold Standard that takes a set of documents, queries and corresponding judgements of relevancy (i.e. "Is this document relevant for this query?")
41
+ - Calculation of the _kappa measure_ for the given gold standard
42
+
43
+ - Loading ranked or unranked query results for a certain query
44
+ - Calculation of _precision_ and _recall_ for each result
45
+ - Calculation of the _F-measure_ for weighing precision and recall
46
+ - Calculation of _mean average precision_ for multiple query results
47
+ - Calculation of the _11-point precision_ and _average precision_ for ranked query results
48
+
49
+ - Printing of summary tables and results
50
+
51
+ Typically, you will want to use this Gem either standalone or within another application's context.
52
+
53
+ Standalone Usage
54
+ ================
55
+
56
+ Call parameters
57
+ ---------------
58
+
59
+ After installing the Gem (see INSTALLATION), you can always call `retreval` from the command line. The typical call is:
60
+
61
+ retreval -l <gold-standard-file> -q <query-results> -f <format> -o <output-prefix>
62
+
63
+ Where you have to define the following options (a complete example call follows the list):
64
+
65
+ - `gold-standard-file` is a file in a specified format that includes all the judgements
66
+ - `query-results` is a file in a specified format that includes all the query results in a single file
67
+ - `format` is the format that the files will use (either "yaml" or "plain")
68
+ - `output-prefix` is the prefix of output files that will be created
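+
+ For example, a complete call with a gold standard in `gold_standard.yml` and query results in `query_results.yml` (hypothetical file names) could look like this:
+
+ retreval -l gold_standard.yml -q query_results.yml -f yaml -o output
+
+ With the output prefix `output`, the result files will be named accordingly (see "Interpreting the output files" below).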
69
+
70
+ Formats
71
+ -------
72
+
73
+ Right now, we focus on the formats you can use to load data into the API. Currently, we support YAML files that must adhere to a special syntax. So, in order to load a gold standard, we need a file in the following format:
74
+
75
+ * "query" denotes the query
76
+ * "documents" these are the documents judged for this query
77
+ * "id" the ID of the document (e.g. its filename, etc.)
78
+ * "judgements" an array of judgements, each one with:
79
+ * "relevant" a boolean value of the judgment (relevant or not)
80
+ * "user" an optional identifier of the user
81
+
82
+ Example file, with one query, two documents, and one judgement:
83
+
84
+ - query: 12th air force germany 1957
85
+ documents:
86
+ - id: g5701s.ict21311
87
+ judgements: []
88
+
89
+ - id: g5701s.ict21313
90
+ judgements:
91
+ - relevant: false
92
+ user: 2
93
+
94
+ So, when calling the program, specify the format as `yaml`.
95
+ For the query results, a similar format is used. Note that it is necessary to specify whether the result sets are ranked or not, as this heavily influences the calculations. You can optionally specify a score for each document, i.e. the score your retrieval algorithm assigned to it, but this is not required: the documents are always ranked in the order of their appearance, regardless of their score. Thus, in the following example, the document ending in "07" is ranked first and the one ending in "25" last, regardless of the scores.
96
+
97
+ ---
98
+ query: 12th air force germany 1957
99
+ ranked: true
100
+ documents:
101
+ - score: 0.44034874
102
+ document: g5701s.ict21307
103
+ - score: 0.44034874
104
+ document: g5701s.ict21309
105
+ - score: 0.44034874
106
+ document: g5701s.ict21311
107
+ - score: 0.44034874
108
+ document: g5701s.ict21313
109
+ - score: 0.44034874
110
+ document: g5701s.ict21315
111
+ - score: 0.44034874
112
+ document: g5701s.ict21317
113
+ - score: 0.44034874
114
+ document: g5701s.ict21319
115
+ - score: 0.44034874
116
+ document: g5701s.ict21321
117
+ - score: 0.44034874
118
+ document: g5701s.ict21323
119
+ - score: 0.44034874
120
+ document: g5701s.ict21325
121
+ ---
122
+ query: 1612
123
+ ranked: true
124
+ documents:
125
+ - score: 1.0174774
126
+ document: g3290.np000144
127
+ - score: 0.763108
128
+ document: g3201b.ct000726
129
+ - score: 0.763108
130
+ document: g3400.ct000886
131
+ - score: 0.6359234
132
+ document: g3201s.ct000130
133
+ ---
134
+
135
+ **Note**: You can also use the `plain` format, which loads the gold standard from tab-separated lines (this format is not available for the query results):
136
+
137
+ my_query my_document_1 false
138
+ my_query my_document_2 true
139
+
140
+ Note that the fields of each query/document/relevancy triple are separated by a tab character. You can also add the user's ID in a fourth column if necessary.
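+
+ For example, a line that also carries the user's ID in the fourth column could look like this (tab-separated, values are made up):
+
+ my_query	my_document_2	true	2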
141
+
142
+ Running the evaluation
143
+ -----------------------
144
+
145
+ After you have specified the input files and the format, you can run the program. If needed, the `-v` switch will turn on verbose messages, such as information on how many judgements, documents and users there are, but this shouldn't be necessary.
146
+
147
+ The program will first load the gold standard and then calculate the statistics for each result set. The output files are automatically created and contain a YAML representation of the results.
148
+
149
+ Calculations may take a while, depending on the number of judgements and documents. With a thousand judgements, expect a few seconds per result set.
150
+
151
+ Interpreting the output files
152
+ ------------------------------
153
+
154
+ Two output files will be created:
155
+
156
+ - `output_avg_precision.yml`
157
+ - `output_statistics.yml`
158
+
159
+ The first lists the average precision for each query in the query result file. The second file lists all supported statistics for each query in the query results file.
160
+
161
+ For example, for a ranked evaluation, the first two entries of such a query result statistic look like this:
162
+
163
+ ---
164
+ 12th air force germany 1957:
165
+ - :precision: 0.0
166
+ :recall: 0.0
167
+ :false_negatives: 1
168
+ :false_positives: 1
169
+ :true_negatives: 2516
170
+ :true_positives: 0
171
+ :document: g5701s.ict21313
172
+ :relevant: false
173
+ - :precision: 0.0
174
+ :recall: 0.0
175
+ :false_negatives: 1
176
+ :false_positives: 2
177
+ :true_negatives: 2515
178
+ :true_positives: 0
179
+ :document: g5701s.ict21317
180
+ :relevant: false
181
+
182
+ Each entry shows the precision and recall at that point, the counts for the contingency table (true/false positives/negatives), the identifier of the document returned at that rank, and whether it was relevant.
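+
+ If you want to post-process these files in your own Ruby code, you can simply load them again with YAML. A minimal sketch (note that the `:precision`-style keys deserialize as Ruby Symbols, which very recent Psych versions only permit via an option such as `permitted_classes: [Symbol]`):
+
+ require 'yaml'
+
+ # Load the per-query statistics written by retreval
+ statistics = YAML.load_file "output_statistics.yml"
+ rows = statistics["12th air force germany 1957"]
+ puts rows.last[:recall]   # recall at the last rank of this result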
183
+
184
+ API Usage
185
+ =========
186
+
187
+ Using this API in another Ruby application is probably the more common use case. All you have to do is include the Gem in your Ruby or Ruby on Rails application. For details about the available methods, please refer to the API documentation generated by RDoc.
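+
+ Since all classes live in the `Retreval` module (see the sources), you will either refer to them with the `Retreval::` prefix or `include Retreval` first. A minimal sketch (the exact `require` path is an assumption):
+
+ require 'retreval'   # assumed entry point of the Gem
+
+ gold_standard = Retreval::GoldStandard.new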
188
+
189
+ **Important**: For this implementation, we use the document ID, the query and the user ID as the primary keys for matching objects. This means that your documents and queries are identified by a string and thus the strings should be sanitized first.
190
+
191
+ Loading the Gold Standard
192
+ -------------------------
193
+
194
+ Once you have loaded the Gem, you will probably start by creating a new gold standard.
195
+
196
+ gold_standard = GoldStandard.new
197
+
198
+ Then, you can load judgements into this standard, either from a file, or manually:
199
+
200
+ gold_standard.load_from_yaml_file "my-file.yml"
201
+ gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => "John"
202
+
203
+ There is a nice shortcut for the `add_judgement` method. Both lines are essentially the same:
204
+
205
+ gold_standard.add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => "John"
206
+ gold_standard << { :document => doc_id, :query => query_string, :relevant => boolean, :user => "John" }
207
+
208
+ Note the use of Rails-style options hashes for better readability (this Gem was originally developed for use within a Rails webapp).
209
+
210
+ Now that you have loaded the gold standard, you can do things like:
211
+
212
+ gold_standard.contains_judgement? :document => "a document", :query => "the query"
213
+ gold_standard.relevant? :document => "a document", :query => "the query"
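+
+ Both methods take the same hash-style arguments and return a boolean. A minimal sketch (document and query values are made up):
+
+ gold_standard.add_judgement :document => "doc1", :query => "some query", :relevant => true, :user => "John"
+
+ gold_standard.contains_judgement? :document => "doc1", :query => "some query"   # => true
+ gold_standard.relevant? :document => "doc1", :query => "some query"             # => true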
214
+
215
+
216
+ Loading the Query Results
217
+ -------------------------
218
+
219
+ Now we want to create a new `QueryResultSet`. A query result set can contain more than one result, which is what we normally want. It is important that you specify the gold standard it belongs to.
220
+
221
+ query_result_set = QueryResultSet.new :gold_standard => gold_standard
222
+
223
+ Just like the Gold Standard, you can read a query result set from a file:
224
+
225
+ query_result_set.load_from_yaml_file "my-results-file.yml"
226
+
227
+ Alternatively, you can load the query results one by one. To do this, you have to create the results (either ranked or unranked) and then add documents:
228
+
229
+ my_result = RankedQueryResult.new :query => "the query"
230
+ my_result.add_document :document => "test_document 1", :score => 13
231
+ my_result.add_document :document => "test_document 2", :score => 11
232
+ my_result.add_document :document => "test_document 3", :score => 3
233
+
234
+ This result would be ranked, obviously, and contain three documents. Documents can have a score, but this is optional. You can also create an Array of documents first and add them all at once:
235
+
236
+ documents = Array.new
237
+ documents << ResultDocument.new :id => "test_document 1", :score => 20
238
+ documents << ResultDocument.new :id => "test_document 2", :score => 21
239
+ my_result = RankedQueryResult.new :query => "the query", :documents => documents
240
+
241
+ The same applies to `UnrankedQueryResult`s, obviously. The order of ranked documents is the same as the order in which they were added to the result.
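+
+ For example, a minimal unranked result could be built like this (a sketch; the interface mirrors `RankedQueryResult`):
+
+ my_unranked_result = UnrankedQueryResult.new :query => "the query"
+ my_unranked_result.add_document :document => "test_document 1"
+ my_unranked_result.add_document :document => "test_document 2"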
242
+
243
+ The `QueryResultSet` will now contain all the results. They are stored in an array called `query_results`, which you can access. So, to iterate over each result, you might want to use the following code:
244
+
245
+ query_result_set.query_results.each_with_index do |result, index|
246
+ # ...
247
+ end
248
+
249
+ Or, more simply:
250
+
251
+ for result in query_result_set.query_results
252
+ # ...
253
+ end
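+
+ For example, to print the precision and recall of every result in the set (a sketch; the `statistics` method is described in the next section):
+
+ query_result_set.query_results.each_with_index do |result, index|
+   stats = result.statistics
+   stats = stats.last if stats.is_a?(Array)   # ranked results yield one Hash per rank
+   puts "Result #{index}: precision #{stats[:precision]}, recall #{stats[:recall]}"
+ end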
254
+
255
+ Calculating statistics
256
+ ----------------------
257
+
258
+ Now to the interesting part: Calculating statistics. As mentioned before, there is a conceptual difference between ranked and unranked results. Unranked results are much easier to calculate and thus take less CPU time.
259
+
260
+ No matter if unranked or ranked, you can get the most important statistics by just calling the `statistics` method.
261
+
262
+ statistics = my_result.statistics
263
+
264
+ In the simple case of an unranked result, you will receive a hash with the following information:
265
+
266
+ * `precision` - the precision of the results
267
+ * `recall` - the recall of the results
268
+ * `false_negatives` - number of not retrieved but relevant items
269
+ * `false_positives` - number of retrieved but nonrelevant items
270
+ * `true_negatives` - number of not retrieved and nonrelevant items
271
+ * `true_positives` - number of retrieved and relevant items
272
+
273
+ In the case of a ranked result, you will receive an Array consisting of _n_ such Hashes, one per document. Each Hash gives you the information at a certain rank, e.g. the following two lines return the recall at the fourth rank.
274
+
275
+ statistics = my_ranked_result.statistics
276
+ statistics[3][:recall]
277
+
278
+ In addition to the information mentioned above, you can also get the following for each rank (see the example after this list):
279
+
280
+ * `document` - the ID of the document that was returned at this rank
281
+ * `relevant` - whether the document was relevant or not
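+
+ For example, to list which document was returned at each rank and whether it was relevant (a minimal sketch):
+
+ my_ranked_result.statistics.each_with_index do |row, rank|
+   puts "#{rank + 1}. #{row[:document]} relevant=#{row[:relevant]} precision=#{row[:precision]}"
+ end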
282
+
283
+ Calculating statistics with missing judgements
284
+ ----------------------------------------------
285
+
286
+ Sometimes, you don't have judgements for all document/query pairs in the gold standard. If this happens, the results will be cleaned up first: every document in the results that has no judgement in the gold standard is removed temporarily.
287
+
288
+ As an example, take the following results:
289
+
290
+ * A
291
+ * B
292
+ * C
293
+ * D
294
+
295
+ Our gold standard only contains judgements for A and C. The results will be cleaned up first, thus leading to:
296
+
297
+ * A
298
+ * C
299
+
300
+ With this approach, we can still provide meaningful results (for precision and recall).
301
+
302
+ Other statistics
303
+ ----------------
304
+
305
+ There are several other statistics that can be calculated, for example the **F measure**. The F measure weighs precision and recall and has one parameter, either "alpha" or "beta". Get the F measure like so:
306
+
307
+ my_result.f_measure :beta => 1
308
+
309
+ If you don't specify either alpha or beta, we will assume that beta = 1.
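+
+ For reference, with precision P and recall R the F measure is commonly defined as follows; beta = 1 gives the balanced harmonic mean of precision and recall (check the RDoc documentation for the exact parameterization used here):
+
+ F(beta) = (1 + beta^2) * P * R / (beta^2 * P + R)
+ F(alpha) = 1 / (alpha / P + (1 - alpha) / R),   with   alpha = 1 / (beta^2 + 1)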
310
+
311
+ Another interesting measure is **Cohen's Kappa**, which tells us about the agreement between assessors. Get the kappa statistic like this:
312
+
313
+ gold_standard.kappa
314
+
315
+ This will calculate the pairwise kappa for every combination of two users in the gold standard and return the average of these values.
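+
+ Each pairwise kappa follows the usual definition, based on the observed agreement P(A) of the two assessors and the agreement P(E) expected by chance (computed from the pooled marginals):
+
+ kappa = (P(A) - P(E)) / (1 - P(E))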
316
+
317
+ For ranked results one might also want to calculate an **11-point precision**. Just call the following:
318
+
319
+ my_ranked_result.eleven_point_precision
320
+
321
+ This will return a Hash whose keys are the 11 recall levels from 0 to 1 (in steps of 0.1) and whose values are the corresponding precision at each recall level.
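+
+ For example, to print the precision at each of the 11 recall levels (a minimal sketch, assuming the Hash keys are the numeric recall levels as described above):
+
+ my_ranked_result.eleven_point_precision.sort.each do |recall_level, precision|
+   puts "Precision at recall #{recall_level}: #{precision}"
+ end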
data/TODO ADDED
@@ -0,0 +1,5 @@
1
+ TODO
2
+ ====
3
+
4
+ - Find a suitable plaintext format for retrieval results
5
+ - Use Hashes instead of Arrays to store classes
@@ -0,0 +1,5 @@
1
+ #!/usr/bin/env ruby
2
+ require 'retreval/runner'
3
+
4
+ runner = Retreval::Runner.new(ARGV)
5
+ runner.run
@@ -0,0 +1,48 @@
1
+ - query: Example Query
2
+ documents:
3
+ - id: ict21307
4
+ judgements:
5
+ - relevant: true
6
+ user: Bob
7
+ - id: ict21309
8
+ judgements:
9
+ - relevant: false
10
+ user: Bob
11
+ - id: ict21311
12
+ judgements:
13
+ - relevant: false
14
+ user: Bob
15
+ - id: ict21313
16
+ judgements:
17
+ - relevant: false
18
+ user: Bob
19
+ - id: ict21315
20
+ judgements:
21
+ - relevant: true
22
+ user: Bob
23
+ - relevant: true
24
+ user: John
25
+ - id: ict21317
26
+ judgements:
27
+ - relevant: false
28
+ user: Bob
29
+ - relevant: false
30
+ user: John
31
+ - id: ict21319
32
+ judgements:
33
+ - relevant: false
34
+ user: Bob
35
+ - relevant: false
36
+ user: John
37
+ - id: ict21321
38
+ judgements:
39
+ - relevant: false
40
+ user: John
41
+ - id: ict21323
42
+ judgements:
43
+ - relevant: true
44
+ user: John
45
+ - id: ict21325
46
+ judgements:
47
+ - relevant: true
48
+ user: John
@@ -0,0 +1,23 @@
1
+ - query: Example Query
2
+ ranked: true
3
+ documents:
4
+ - score: 0.24921744
5
+ id: ict21307
6
+ - score: 0.1623808
7
+ id: ict21309
8
+ - score: 0.13997056
9
+ id: ict21311
10
+ - score: 0.12525019
11
+ id: ict21313
12
+ - score: 0.11482056
13
+ id: ict21315
14
+ - score: 0.1131133
15
+ id: ict21317
16
+ - score: 0.09897413
17
+ id: ict21319
18
+ - score: 0.09897413
19
+ id: ict21321
20
+ - score: 0.09742848
21
+ id: ict21323
22
+ - score: 0.09742848
23
+ id: ict21325
@@ -0,0 +1,424 @@
1
+ module Retreval
2
+
3
+ # A gold standard is composed of several judgements for the
4
+ # cartesian product of documents and queries
5
+ class GoldStandard
6
+
7
+ attr_reader :documents, :judgements, :queries, :users
8
+
9
+ # Creates a new gold standard. One can optionally construct the gold
10
+ # standard with triples given. This would be a hash like:
11
+ # triples = {
12
+ # :document => "Document ID",
13
+ # :query => "Some query",
14
+ # :relevant => "true"
15
+ # }
16
+ #
17
+ # Called via:
18
+ # GoldStandard.new :triples => an_array_of_triples
19
+ def initialize(args = {})
20
+ @documents = Hash.new
21
+ @queries = Array.new
22
+ @judgements = Array.new
23
+ @users = Hash.new
24
+
25
+ # one can also construct a Gold Standard with everything already loaded
26
+ unless args[:triples].nil?
27
+ args[:triples].each do |triple|
28
+ add_judgement(triple)
29
+ end
30
+ end
31
+ end
32
+
33
+
34
+ # Parses a YAML file adhering to the following generic standard:
35
+ #
36
+ # * "query" denotes the query
37
+ # * "documents" these are the documents judged for this query
38
+ # * "id" the ID of the document (e.g. its filename, etc.)
39
+ # * "judgements" an array of judgements, each one with:
40
+ # * "relevant" a boolean value of the judgment (relevant or not)
41
+ # * "user" an optional identifier of the user
42
+ #
43
+ # Example file:
44
+ # - query: 12th air force germany 1957
45
+ # documents:
46
+ # - id: g5701s.ict21311
47
+ # judgements: []
48
+ #
49
+ # - id: g5701s.ict21313
50
+ # judgements:
51
+ # - relevant: false
52
+ # user: 2
53
+ def load_from_yaml_file(file)
54
+ begin
55
+ ydoc = YAML.load(File.open(file, "r"))
56
+ ydoc.each do |entry|
57
+
58
+ # The query is first in the hierarchy
59
+ query = entry["query"]
60
+
61
+ # Every query contains several documents
62
+ documents = entry["documents"]
63
+ documents.each do |doc|
64
+
65
+ document = doc["id"]
66
+
67
+ # If there are no judgements, record the document/query pair anyway (without a relevancy judgement)
68
+ if doc["judgements"].empty?
69
+ add_judgement :document => document, :query => query, :relevant => nil, :user => nil
70
+ else
71
+ doc["judgements"].each do |judgement|
72
+ relevant = judgement["relevant"]
73
+ user = judgement["user"]
74
+
75
+ add_judgement :document => document, :query => query, :relevant => relevant, :user => user
76
+ end
77
+ end
78
+
79
+ end
80
+ end
81
+
82
+ rescue Exception => e
83
+ raise "Error while parsing the YAML document: " + e.message
84
+ end
85
+ end
86
+
87
+
88
+ # Parses a plaintext file adhering to the following standard:
89
+ # Every line of text should include a triple that designates the judgement.
90
+ # The symbols should be separated by a tabulator.
91
+ # E.g.
92
+ # my_query my_document_1 false
93
+ # my_query my_document_2 true
94
+ #
95
+ # You can also add the user's ID in the fourth column.
96
+ def load_from_plaintext_file(file)
97
+ begin
98
+ File.open(file).each do |line|
99
+ line.chomp!
100
+ info = line.split("\t")
101
+ if info.length == 3
102
+ add_judgement :query => info[0], :document => info[1], :relevant => (info[2] == "true")
103
+ elsif info.length == 4
104
+ add_judgement :query => info[0], :document => info[1], :relevant => (info[2] == "true"), :user => info[3]
105
+ end
106
+ end
107
+ rescue Exception => e
108
+ raise "Error while parsing the document: " + e.message
109
+ end
110
+ end
111
+
112
+
113
+ # Adds a judgement (document, query, relevancy) to the gold standard.
114
+ # All of those are strings in the public interface.
115
+ # The user ID is an optional parameter that can be used to measure kappa later.
116
+ # Call this with:
117
+ # add_judgement :document => doc_id, :query => query_string, :relevant => boolean, :user => John
118
+ def add_judgement(args)
119
+ document_id = args[:document]
120
+ query_string = args[:query]
121
+ relevant = args[:relevant]
122
+ user_id = args[:user]
123
+
124
+
125
+ unless document_id.nil? or query_string.nil?
126
+ document = Document.new :id => document_id
127
+ query = Query.new :querystring => query_string
128
+
129
+
130
+ # If the user exists, load it, otherwise create a new one
131
+ if @users.has_key?(user_id)
132
+ user = @users[user_id]
133
+ else
134
+ user = User.new :id => user_id unless user_id.nil?
135
+ end
136
+
137
+ # If there is no judgement for this combination, just add the document/query pair
138
+ if relevant.nil?
139
+ # TODO: improve efficiency by introducing hashes !
140
+ @documents[document_id] = document
141
+ @queries << query unless @queries.include?(query)
142
+ return
143
+ end
144
+
145
+ if user_id.nil?
146
+ judgement = Judgement.new :document => document, :query => query, :relevant => relevant
147
+ else
148
+ judgement = Judgement.new :document => document, :query => query, :relevant => relevant, :user => user
149
+
150
+ user.add_judgement(judgement)
151
+ @users[user_id] = user
152
+ end
153
+
154
+ @documents[document_id] = document
155
+ @queries << query unless @queries.include?(query)
156
+ @judgements << judgement
157
+ else
158
+ # TODO: an ArgumentError would be more appropriate here than a generic RuntimeError
159
+ raise "Need at least a Document, and a Query for creating the new entry."
160
+ end
161
+
162
+ end
163
+
164
+ # This is essentially the same as adding a Judgement, we can use this operator too.
165
+ def <<(args)
166
+ self.add_judgement args
167
+ end
168
+
169
+ # Returns true if a Document is relevant for a Query, according to this GoldStandard.
170
+ # Called by:
171
+ # relevant? :document => "document ID", :query => "query"
172
+ def relevant?(args)
173
+ query = Query.new :querystring => args[:query]
174
+ document = Document.new :id => args[:document]
175
+
176
+ relevant_count = 0
177
+ nonrelevant_count = 0
178
+
179
+ #TODO: looks quite inefficient. Would a hash with document-query-pairs as key help?
180
+ @judgements.each do |judgement|
181
+ if judgement.document == document and judgement.query == query
182
+ judgement.relevant ? relevant_count += 1 : nonrelevant_count += 1
183
+ end
184
+ end
185
+
186
+ # If we didn't find any judgements, just leave it as false
187
+ if relevant_count == 0 and nonrelevant_count == 0
188
+ false
189
+ else
190
+ relevant_count >= nonrelevant_count
191
+ end
192
+ end
193
+
194
+
195
+ # Returns true if this GoldStandard contains a Judgement for this Query / Document pair
196
+ # This is called by:
197
+ # contains_judgement? :document => "the document ID", :query => "the query"
198
+ def contains_judgement?(args)
199
+ query = Query.new :querystring => args[:query]
200
+ document = Document.new :id => args[:document]
201
+
202
+ #TODO: a hash could improve performance here as well
203
+
204
+ @judgements.each { |judgement| return true if judgement.document == document and judgement.query == query }
205
+
206
+ false
207
+ end
208
+
209
+
210
+ # Returns true if this GoldStandard contains this Document
211
+ # Called by:
212
+ # contains_document? :id => "document ID"
213
+ def contains_document?(args)
214
+ document_id = args[:id]
215
+ @documents.key? document_id
216
+ end
217
+
218
+
219
+ # Returns true if this GoldStandard contains this Query string
220
+ # Called by:
221
+ # contains_query? :querystring => "the query"
222
+ def contains_query?(args)
223
+ querystring = args[:querystring]
224
+ query = Query.new :querystring => querystring
225
+ @queries.include? query
226
+ end
227
+
228
+
229
+ # Returns true if this GoldStandard contains this User
230
+ # Called by:
231
+ # contains_user? :id => "John Doe"
232
+ def contains_user?(args)
233
+ user_id = args[:id]
234
+ @users.key? user_id
235
+ end
236
+
237
+
238
+ # Calculates and returns the Kappa measure for this GoldStandard. It shows
239
+ # to which degree the judges agree in their decisions
240
+ # See: http://nlp.stanford.edu/IR-book/html/htmledition/assessing-relevance-1.html
241
+ def kappa
242
+
243
+ # FIXME: This isn't very pretty, maybe there's a more ruby-esque way to do this?
244
+ sum = 0
245
+ count = 0
246
+
247
+ # A repeated_combination yields all the pairwise combinations of
248
+ # users to generate the pairwise kappa statistic. Elements are also
249
+ # paired with themselves, so we need to remove those.
250
+ @users.values.repeated_combination(2) do |combination|
251
+ user1, user2 = combination[0], combination[1]
252
+ unless user1 == user2
253
+ kappa = pairwise_kappa(user1, user2)
254
+ unless kappa.nil?
255
+ puts "Kappa for User #{user1.id} and #{user2.id}: #{kappa}" if $verbose
256
+ sum += kappa unless kappa.nil?
257
+ count += 1
258
+ end
259
+ end
260
+ end
261
+
262
+ @kappa = sum / count.to_f
263
+ puts "Average pairwise kappa: #{@kappa}" if $verbose
264
+ return @kappa
265
+ end
266
+
267
+ private
268
+
269
+ # Calculates the pairwise kappa statistic for two users.
270
+ # The two users objects need at least one Judgement in common.
271
+ # Note that the kappa statistic is not really meaningful when there are
272
+ # too little judgements in common!
273
+ def pairwise_kappa(user1, user2)
274
+
275
+ user1_judgements = user1.judgements.reject { |judgement| not user2.judgements.include?(judgement) }
276
+ user2_judgements = user2.judgements.reject { |judgement| not user1.judgements.include?(judgement) }
277
+
278
+ total_count = user1_judgements.count
279
+
280
+ unless user1_judgements.empty? or user2_judgements.empty?
281
+
282
+ positive_agreements = 0 # => when both judges agree positively (relevant)
283
+ negative_agreements = 0 # => when both judges agree negatively (nonrelevant)
284
+ negative_disagreements = 0 # => when the second judge disagrees by using "nonrelevant"
285
+ positive_disagreements = 0 # => when the second judge disagrees by using "relevant"
286
+
287
+ for i in 0..(user1_judgements.count-1)
288
+ if user1_judgements[i].relevant == true
289
+ if user2_judgements[i].relevant == true
290
+ positive_agreements += 1
291
+ else
292
+ negative_disagreements += 1
293
+ end
294
+ elsif user1_judgements[i].relevant == false
295
+ if user2_judgements[i].relevant == false
296
+ negative_agreements += 1
297
+ else
298
+ positive_disagreements += 1
299
+ end
300
+ end
301
+ end
302
+
303
+ # The proportion the judges agreed:
304
+ p_agreed = (positive_agreements + negative_agreements) / total_count.to_f
305
+
306
+ # The pooled marginals:
307
+ p_nonrelevant = (positive_disagreements + negative_agreements * 2 + negative_disagreements) / (total_count.to_f * 2)
308
+ # This one is the opposite of P(nonrelevant):
309
+ # p_relevant = (positive_agreements * 2 + negative_disagreements + positive_disagreements) / (total_count.to_f * 2)
310
+ p_relevant = 1 - p_nonrelevant
311
+
312
+ # The probability that the judges agreed by chance
313
+ p_agreement_by_chance = p_nonrelevant ** 2 + p_relevant ** 2
314
+
315
+
316
+ # Finally, the pairwise kappa value
317
+ # If the observed agreement equals the agreement by chance, kappa is 0 (this also covers the division-by-zero case)
318
+ if p_agreed - p_agreement_by_chance == 0
319
+ return 0
320
+ # In any other case, the kappa value is correct and we can return it
321
+ else
322
+ kappa = (p_agreed - p_agreement_by_chance) / (1 - p_agreement_by_chance)
323
+ return kappa
324
+ end
325
+ end
326
+
327
+ # If there are no common judgements, there is no kappa value to calculate
328
+ return nil
329
+ end
330
+
331
+ end
332
+
333
+
334
+ # A Query is effectively a string that is used as its ID.
335
+ class Query
336
+
337
+ attr_reader :querystring
338
+
339
+ # Compares two Query objects according to their query string
340
+ def ==(query)
341
+ query.querystring == self.querystring
342
+ end
343
+
344
+ # Creates a new Query object with a specified string
345
+ def initialize(args)
346
+ @querystring = args[:querystring].to_s
347
+ raise "Can not construct a Query with an empty query string" if @querystring.empty?
348
+ end
349
+
350
+ end
351
+
352
+ # A Document is a generic resource that is identified by its ID (which could be anything).
353
+ class Document
354
+
355
+ attr_reader :id
356
+
357
+ # Compares two Document objects according to their id
358
+ def ==(document)
359
+ document.id == self.id
360
+ end
361
+
362
+ # Creates a new Document object with a specified id
363
+ def initialize(args)
364
+ @id = args[:id].to_s
365
+ raise "Can not construct a Document with an empty identifier" if @id.empty?
366
+ end
367
+
368
+ end
369
+
370
+ # A Judgement references one query and one document as being relevant to each other or not.
371
+ # It also keeps track of the User who created the Judgement, if necessary.
372
+ class Judgement
373
+
374
+ attr_reader :relevant, :document, :query, :user
375
+
376
+ # Creates a new Judgement that belongs to a Query, a Document, and optionally to a User
377
+ # Called by (note the usage of IDs, not objects):
378
+ # Judgement.new :document => my_doc_id, :user => my_user_id, :query => query_string, :relevant => true
379
+ def initialize(args)
380
+ @relevant = args[:relevant]
381
+ @document = args[:document]
382
+ @query = args[:query]
383
+ @user = args[:user]
384
+ end
385
+
386
+
387
+ # A Judgement is considered equal to another when it is for the same Query and Document.
388
+ # This comparison happens regardless of the user, so it is easier to generate "unique" Judgements
389
+ # or calculate the kappa measure.
390
+ def ==(judgement)
391
+ self.document == judgement.document and self.query == judgement.query
392
+ end
393
+
394
+ end
395
+
396
+ # A User is optional for a Judgement, they are identified by their ID, which could be anything.
397
+ class User
398
+
399
+ attr_reader :id, :judgements
400
+
401
+ # Compares two User objects according to their id
402
+ def ==(user)
403
+ user.id == self.id
404
+ end
405
+
406
+
407
+ # Creates a new User object with a specified id
408
+ def initialize(args)
409
+ @id = args[:id]
410
+ @judgements = Array.new
411
+ raise "Can not construct a User with an empty identifier" if @id.nil?
412
+ end
413
+
414
+
415
+ # Adds a reference to a Judgement to this User object, since this makes it
416
+ # easier to calculate kappa later. Some users have multiple judgements for
417
+ # the same Document Query pair, which isn't really helpful. We therefore eliminate
418
+ # duplicates.
419
+ def add_judgement(judgement)
420
+ @judgements << judgement unless @judgements.include?(judgement)
421
+ end
422
+
423
+ end
424
+ end