koda-calais 0.0.9

data/CHANGELOG.markdown ADDED
@@ -0,0 +1,33 @@
+ # Changes
+
+ ## 0.0.7
+ * verified 4.0 API
+ * moved gem packaging to `jeweler` and documentation to `yard`
+
+ ## 0.0.6
+ * fully implemented 3.1 API
+
+ ## 0.0.5
+ * fixed error where classes weren't being required in the proper order on Ubuntu (reported by Jon Moses)
+ * New things coming back from the API. Fixing in tests.
+
+ ## 0.0.4
+ * changed dependency from `hpricot` to `libxml`
+ * unicode fun
+ * cleanup all around
+
+ ## 0.0.3
+ * pluginized the library for Rails (thanks [pius](http://gitorious.org/projects/calais-au-rails))
+ * added helper methods name entity types from a response
+
+ ## 0.0.2
+ * cleanup in the specs
+ * cleaner parsing
+ * location of named entities
+ * more data in relationships
+ * moved Names and Relationships
+
+ ## 0.0.1
+ * Access to OpenCalais's Enlighten action
+ * Single method to process a document
+ * Get relationships and names from a document
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2008 Abhay Kumar info@opensynapse.net
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown ADDED
@@ -0,0 +1,49 @@
+ # Calais #
+ A Ruby interface to the [Open Calais Web Service](http://opencalais.com)
+
+ ## Features ##
+ * Accepts documents in text/raw, text/xml, text/html and text/htmlraw format.
+ * Basic access to the Open Calais API's Enlighten action.
+ * Output is an RDF representation of the input document.
+ * A single method call extracts names, entities and geographies from the given text.
+
+ ## Synopsis ##
+
+ This is a very basic wrapper to the Open Calais API. It uses the POST endpoint and currently supports the Enlighten action. Here's a simple call:
+
+     Calais.enlighten(
+       :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+       :content_type => :raw,
+       :license_id => 'your license id'
+     )
+
+ This is the easiest way to get the RDF-formatted response from the OpenCalais service.
+
+ If you want to do something more fun, like getting all sorts of information about a document, you can try this:
+
+     Calais.process_document(
+       :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+       :content_type => :raw,
+       :license_id => 'your license id'
+     )
+
+ This will return an object containing information extracted from the RDF response.
+
+ ## Requirements ##
+
+ * [Ruby 1.8.5 or better](http://ruby-lang.org)
+ * [nokogiri](http://nokogiri.rubyforge.org/nokogiri/), [libxml2](http://xmlsoft.org/), [libxslt](http://xmlsoft.org/xslt/)
+ * [curb](http://curb.rubyforge.org/), [libcurl](http://curl.haxx.se/)
+ * [json](http://json.rubyforge.org/)
+
+ ## Install ##
+
+ You can install the Calais gem via Rubygems (`gem install calais`) or by building from source.
+
+ ## Authors ##
+
+ * [Abhay Kumar](http://opensynapse.net)
+
+ ## Acknowledgements ##
+
+ * [Paul Legato](http://www.economaton.com/): Help all around with the new response processor and implementation of the 3.1 API.
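The README's synopsis says the wrapper uses the POST endpoint. As a rough sketch of what such a request could look like, the snippet below builds (but never sends) a form-encoded POST with Ruby's standard `Net::HTTP`, not the `curb` client the gem itself depends on. The endpoint URL and the `licenseID`, `content` and `paramsXML` field names are taken from this package's `lib/` files; `ENLIGHTEN_URL` and `build_enlighten_request` are hypothetical names used only for illustration.

```ruby
require 'net/http'
require 'uri'

# URL copied from the REST_ENDPOINT constant in lib/calais.rb.
ENLIGHTEN_URL = 'http://api.opencalais.com/enlighten/rest/'

# Hypothetical helper: builds the POST request without sending it.
# The three field names match the post_args hash in lib/calais/client.rb.
def build_enlighten_request(license_id, content, params_xml)
  uri = URI.parse(ENLIGHTEN_URL)
  request = Net::HTTP::Post.new(uri.request_uri)
  # set_form_data URL-encodes the fields and sets the Content-Type header.
  request.set_form_data(
    'licenseID' => license_id,
    'content'   => content,
    'paramsXML' => params_xml
  )
  request
end
```

Inspecting `request.body` is a cheap way to check what would go over the wire before involving a real license id.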
data/Rakefile ADDED
@@ -0,0 +1,97 @@
1
+ # -*- ruby -*-
2
+
3
+ require 'rake'
4
+ require 'rake/clean'
5
+
6
+ require './lib/calais.rb'
7
+
8
+ begin
9
+ gem 'jeweler', '>= 1.0.1'
10
+ require 'jeweler'
11
+
12
+ Jeweler::Tasks.new do |s|
13
+ s.name = 'calais'
14
+ s.summary = 'A Ruby interface to the Calais Web Service'
15
+ s.email = 'info@opensynapse.net'
16
+ s.homepage = 'http://github.com/abhay/calais'
17
+ s.description = 'A Ruby interface to the Calais Web Service'
18
+ s.authors = ['Abhay Kumar']
19
+ s.files = FileList["[A-Z]*", "{bin,generators,lib,test}/**/*"]
20
+ s.rubyforge_project = 'calais'
21
+ s.add_dependency 'nokogiri', '>= 1.3.3'
22
+ s.add_dependency 'json', '>= 1.1.3'
23
+ s.add_dependency 'curb', '>= 0.1.4'
24
+ end
25
+ rescue LoadError
26
+ puts "Jeweler, or one of its dependencies, is not available. Please install it."
27
+ exit(1)
28
+ end
29
+
30
+ begin
31
+ require 'spec/rake/spectask'
32
+
33
+ desc "Run all specs"
34
+ Spec::Rake::SpecTask.new do |t|
35
+ t.spec_files = FileList["spec/**/*_spec.rb"].sort
36
+ t.spec_opts = ["--options", "spec/spec.opts"]
37
+ end
38
+
39
+ desc "Run all specs and get coverage statistics"
40
+ Spec::Rake::SpecTask.new('coverage') do |t|
41
+ t.spec_opts = ["--options", "spec/spec.opts"]
42
+ t.spec_files = FileList["spec/*_spec.rb"].sort
43
+ t.rcov_opts = ["--exclude", "spec", "--exclude", "gems"]
44
+ t.rcov = true
45
+ end
46
+
47
+ task :default => :spec
48
+ rescue LoadError
49
+ puts "RSpec, or one of its dependencies, is not available. Please install it."
50
+ exit(1)
51
+ end
52
+
53
+ begin
54
+ require 'yard'
55
+ require 'yard/rake/yardoc_task'
56
+
57
+ YARD::Rake::YardocTask.new do |t|
58
+ t.options = ["--verbose", "--markup=markdown", "--files=CHANGELOG.markdown,MIT-LICENSE"]
59
+ end
60
+
61
+ task :rdoc => :yardoc
62
+
63
+ CLOBBER.include 'doc'
64
+ CLOBBER.include '.yardoc'
65
+ rescue LoadError
66
+ puts "Yard, or one of its dependencies, is not available. Please install it."
67
+ exit(1)
68
+ end
69
+
70
+ begin
71
+ require 'rake/contrib/sshpublisher'
72
+ namespace :rubyforge do
73
+
74
+ desc "Release gem and RDoc documentation to RubyForge"
75
+ task :release => ["rubyforge:release:gem", "rubyforge:release:docs"]
76
+
77
+ namespace :release do
78
+ desc "Publish RDoc to RubyForge."
79
+ task :docs => [:yardoc] do
80
+ config = YAML.load(
81
+ File.read(File.expand_path('~/.rubyforge/user-config.yml'))
82
+ )
83
+
84
+ host = "#{config['username']}@rubyforge.org"
85
+ remote_dir = "/var/www/gforge-projects/calais/"
86
+ local_dir = 'doc'
87
+
88
+ Rake::SshDirPublisher.new(host, remote_dir, local_dir).upload
89
+ end
90
+ end
91
+ end
92
+ rescue LoadError
93
+ puts "Rake SshDirPublisher is unavailable or your rubyforge environment is not configured."
94
+ exit(1)
95
+ end
96
+
97
+ # vim: syntax=Ruby
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
+ ---
+ :minor: 0
+ :patch: 9
+ :major: 0
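The three symbol-keyed entries in `VERSION.yml` combine into the `0.0.9` shown in the package title. A minimal sketch of reading such a file by hand follows; it assumes a Ruby recent enough for the keyword form of `YAML.safe_load`, and `version_string` is a hypothetical helper, not something this package defines.

```ruby
require 'yaml'

# Same content as VERSION.yml in this package.
VERSION_TEXT = <<~EOS
  ---
  :minor: 0
  :patch: 9
  :major: 0
EOS

# Hypothetical helper: joins major, minor and patch into a dotted string.
# Symbol must be permitted explicitly because the keys are YAML symbols.
def version_string(yaml_text)
  parts = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  [parts[:major], parts[:minor], parts[:patch]].join('.')
end
```

The key order in the file does not matter, because the parts are looked up by name.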
data/lib/calais.rb ADDED
@@ -0,0 +1,56 @@
1
+ require 'digest/sha1'
2
+ require 'net/http'
3
+ require 'cgi'
4
+ require 'iconv'
5
+ require 'set'
6
+
7
+ require 'rubygems'
8
+ require 'nokogiri'
9
+ require 'json'
10
+ require 'curb'
11
+
12
+ $KCODE = "UTF8"
13
+ require 'jcode'
14
+
15
+ $:.unshift File.expand_path(File.dirname(__FILE__)) + '/calais'
16
+
17
+ require 'client'
18
+ require 'response'
19
+ require 'error'
20
+
21
+ module Calais
22
+ REST_ENDPOINT = "http://api.opencalais.com/enlighten/rest/"
23
+ BETA_REST_ENDPOINT = "http://beta.opencalais.com/enlighten/rest/"
24
+
25
+ AVAILABLE_CONTENT_TYPES = {
26
+ :xml => 'text/xml',
27
+ :html => 'text/html',
28
+ :htmlraw => 'text/htmlraw',
29
+ :raw => 'text/raw'
30
+ }
31
+
32
+ AVAILABLE_OUTPUT_FORMATS = {
33
+ :rdf => 'xml/rdf',
34
+ :simple => 'text/simple',
35
+ :microformats => 'text/microformats',
36
+ :json => 'application/json'
37
+ }
38
+
39
+ KNOWN_ENABLES = ['GenericRelations', 'SocialTags']
40
+ KNOWN_DISCARDS = ['er/Company', 'er/Geo', 'er/Product']
41
+
42
+ MAX_RETRIES = 5
43
+ HTTP_TIMEOUT = 60
44
+ MIN_CONTENT_SIZE = 1
45
+ MAX_CONTENT_SIZE = 100_000
46
+
47
+ class << self
48
+ def enlighten(*args, &block); Client.new(*args, &block).enlighten; end
49
+
50
+ def process_document(*args, &block)
51
+ client = Client.new(*args, &block)
52
+ client.output_format = :rdf
53
+ Response.new(client.enlighten)
54
+ end
55
+ end
56
+ end
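The `require 'iconv'`, `require 'jcode'` and `$KCODE` lines in `lib/calais.rb` target Ruby 1.8; `Iconv` left the standard library in Ruby 2.0. The package uses Iconv (in its client) to drop invalid UTF-8 bytes from the content. As a hedged sketch only, and not something this package ships, similar stripping could be approximated on newer Rubies with `String#scrub`. `strip_invalid_utf8` is a hypothetical helper name.

```ruby
# Hypothetical helper: returns a copy of the text, tagged as UTF-8,
# with any byte sequences that are not valid UTF-8 removed.
def strip_invalid_utf8(text)
  # dup so the caller's string (possibly frozen) is left untouched.
  text.to_s.dup.force_encoding(Encoding::UTF_8).scrub('')
end
```

Passing an empty replacement to `scrub` deletes the bad bytes, which roughly mirrors what the `//IGNORE` flag asks Iconv to do.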
data/lib/calais/client.rb ADDED
@@ -0,0 +1,113 @@
1
+ module Calais
2
+ class Client
3
+ # base attributes of the call
4
+ attr_accessor :content
5
+ attr_accessor :license_id
6
+
7
+ # processing directives
8
+ attr_accessor :content_type, :output_format, :reltag_base_url, :calculate_relevance, :omit_outputting_original_text
9
+ attr_accessor :store_rdf, :metadata_enables, :metadata_discards
10
+
11
+ # user directives
12
+ attr_accessor :allow_distribution, :allow_search, :external_id, :submitter
13
+
14
+ attr_accessor :external_metadata
15
+
16
+ attr_accessor :use_beta
17
+
18
+ def initialize(options={}, &block)
19
+ options.each {|k,v| send("#{k}=", v)}
20
+ yield(self) if block_given?
21
+ end
22
+
23
+ def enlighten
24
+ post_args = {
25
+ "licenseID" => @license_id,
26
+ "content" => Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "#{@content} ").first[0..-2],
27
+ "paramsXML" => params_xml
28
+ }
29
+
30
+ @client ||= Curl::Easy.new
31
+ @client.url = @use_beta ? BETA_REST_ENDPOINT : REST_ENDPOINT
32
+ @client.timeout = HTTP_TIMEOUT
33
+
34
+ post_fields = post_args.map {|k,v| Curl::PostField.content(k, v) }
35
+
36
+ do_request(post_fields)
37
+ end
38
+
39
+ def params_xml
40
+ check_params
41
+ document = Nokogiri::XML::Document.new
42
+
43
+ params_node = Nokogiri::XML::Node.new('c:params', document)
44
+ params_node['xmlns:c'] = 'http://s.opencalais.com/1/pred/'
45
+ params_node['xmlns:rdf'] = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
46
+
47
+ processing_node = Nokogiri::XML::Node.new('c:processingDirectives', document)
48
+ processing_node['c:contentType'] = AVAILABLE_CONTENT_TYPES[@content_type] if @content_type
49
+ processing_node['c:outputFormat'] = AVAILABLE_OUTPUT_FORMATS[@output_format] if @output_format
50
+ processing_node['c:calculateRelevanceScore'] = 'false' if @calculate_relevance == false
51
+ processing_node['c:reltagBaseURL'] = @reltag_base_url.to_s if @reltag_base_url
52
+
53
+ processing_node['c:enableMetadataType'] = @metadata_enables.join(',') unless @metadata_enables.empty?
54
+ processing_node['c:docRDFaccessible'] = @store_rdf.to_s if @store_rdf
55
+ processing_node['c:discardMetadata'] = @metadata_discards.join(';') unless @metadata_discards.empty?
56
+ processing_node['c:omitOutputtingOriginalText'] = 'true' if @omit_outputting_original_text
57
+
58
+ user_node = Nokogiri::XML::Node.new('c:userDirectives', document)
59
+ user_node['c:allowDistribution'] = @allow_distribution.to_s unless @allow_distribution.nil?
60
+ user_node['c:allowSearch'] = @allow_search.to_s unless @allow_search.nil?
61
+ user_node['c:externalID'] = @external_id.to_s if @external_id
62
+ user_node['c:submitter'] = @submitter.to_s if @submitter
63
+
64
+ params_node << processing_node
65
+ params_node << user_node
66
+
67
+ if @external_metadata
68
+ external_node = Nokogiri::XML::Node.new('c:externalMetadata', document)
69
+ external_node << @external_metadata
70
+ params_node << external_node
71
+ end
72
+
73
+ params_node.to_xml(:indent => 2)
74
+ end
75
+
76
+ private
77
+ def check_params
78
+ raise 'missing content' if @content.nil? || @content.empty?
79
+
80
+ content_length = @content.length
81
+ raise 'content is too small' if content_length < MIN_CONTENT_SIZE
82
+ raise 'content is too large' if content_length > MAX_CONTENT_SIZE
83
+
84
+ raise 'missing license id' if @license_id.nil? || @license_id.empty?
85
+
86
+ raise 'unknown content type' unless AVAILABLE_CONTENT_TYPES.keys.include?(@content_type) if @content_type
87
+ raise 'unknown output format' unless AVAILABLE_OUTPUT_FORMATS.keys.include?(@output_format) if @output_format
88
+
89
+ %w[calculate_relevance store_rdf allow_distribution allow_search].each do |variable|
90
+ value = self.send(variable)
91
+ unless NilClass === value || TrueClass === value || FalseClass === value
92
+ raise "expected a boolean value for #{variable} but got #{value}"
93
+ end
94
+ end
95
+
96
+ @metadata_enables ||= []
97
+ unknown_enables = Set.new(@metadata_enables) - KNOWN_ENABLES
98
+ raise "unknown metadata enables: #{unknown_enables.to_a.inspect}" unless unknown_enables.empty?
99
+
100
+ @metadata_discards ||= []
101
+ unknown_discards = Set.new(@metadata_discards) - KNOWN_DISCARDS
102
+ raise "unknown metadata discards: #{unknown_discards.to_a.inspect}" unless unknown_discards.empty?
103
+ end
104
+
105
+ def do_request(post_fields)
106
+ unless @client.http_post(post_fields)
107
+ raise 'unable to post to api endpoint'
108
+ end
109
+
110
+ @client.body_str
111
+ end
112
+ end
113
+ end
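The client's constructor accepts either an options hash or a block, and rejects unknown keys because every key is turned into a setter call. A stripped-down sketch of that pattern is shown below. `SketchClient` is a hypothetical stand-in with only two attributes, and it uses `public_send` where the original uses `send`.

```ruby
# Hypothetical stand-in for the client; only two attributes are modelled.
class SketchClient
  attr_accessor :content, :license_id

  def initialize(options = {})
    # Each key must match a writer method, so an unknown key
    # raises NoMethodError instead of being silently ignored.
    options.each { |key, value| public_send("#{key}=", value) }
    # A block, if given, can set or override attributes afterwards.
    yield(self) if block_given?
  end
end

from_hash  = SketchClient.new(:content => 'text', :license_id => 'id')
from_block = SketchClient.new { |c| c.content = 'text' }
```

Because the block runs after the hash is applied, values set in the block win when both forms are combined.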
data/lib/calais/error.rb ADDED
@@ -0,0 +1,3 @@
+ class Calais::Error < StandardError
+
+ end
data/lib/calais/response.rb ADDED
@@ -0,0 +1,218 @@
1
+ module Calais
2
+ class Response
3
+ MATCHERS = {
4
+ :docinfo => 'DocInfo',
5
+ :docinfometa => 'DocInfoMeta',
6
+ :defaultlangid => 'DefaultLangId',
7
+ :doccat => 'DocCat',
8
+ :entities => 'type/em/e',
9
+ :relations => 'type/em/r',
10
+ :geographies => 'type/er',
11
+ :instances => 'type/sys/InstanceInfo',
12
+ :relevances => 'type/sys/RelevanceInfo',
13
+ :socialtags => 'SocialTag'
14
+ }
15
+
16
+ attr_accessor :submitter_code, :signature, :language, :submission_date, :request_id, :doc_title, :doc_date
17
+ attr_accessor :hashes, :entities, :relations, :geographies, :categories, :social_tags
18
+
19
+ def initialize(rdf_string)
20
+ @raw_response = rdf_string
21
+
22
+ @hashes = []
23
+ @entities = []
24
+ @relations = []
25
+ @geographies = []
26
+ @relevances = {} # key = String hash, val = Float relevance
27
+ @categories = []
28
+
29
+ extract_data
30
+ end
31
+
32
+ class Entity
33
+ attr_accessor :calais_hash, :type, :attributes, :relevance, :instances
34
+ end
35
+
36
+ class Relation
37
+ attr_accessor :calais_hash, :type, :attributes, :instances
38
+ end
39
+
40
+ class Geography
41
+ attr_accessor :name, :calais_hash, :attributes
42
+ end
43
+
44
+ class SocialTag
45
+ attr_accessor :name, :importance, :attributes
46
+ end
47
+
48
+ class Category
49
+ attr_accessor :name, :score
50
+ end
51
+
52
+ class Instance
53
+ attr_accessor :prefix, :exact, :suffix, :offset, :length
54
+
55
+ # Makes a new Instance object from an appropriate Nokogiri::XML::Node.
56
+ def self.from_node(node)
57
+ instance = self.new
58
+ instance.prefix = node.xpath("c:prefix[1]").first.content
59
+ instance.exact = node.xpath("c:exact[1]").first.content
60
+ instance.suffix = node.xpath("c:suffix[1]").first.content
61
+ instance.offset = node.xpath("c:offset[1]").first.content.to_i
62
+ instance.length = node.xpath("c:length[1]").first.content.to_i
63
+
64
+ instance
65
+ end
66
+ end
67
+
68
+ class CalaisHash
69
+ attr_accessor :value
70
+
71
+ def self.find_or_create(hash, hashes)
72
+ if !selected = hashes.select {|h| h.value == hash }.first
73
+ selected = self.new
74
+ selected.value = hash
75
+ hashes << selected
76
+ end
77
+
78
+ selected
79
+ end
80
+ end
81
+
82
+ private
83
+ def extract_data
84
+ doc = Nokogiri::XML(@raw_response)
85
+
86
+ if doc.root.xpath("/Error[1]").first
87
+ raise Calais::Error, doc.root.xpath("/Error/Exception").first.content
88
+ end
89
+
90
+ doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfometa]}')]/..").each do |node|
91
+ @language = node['language']
92
+ @submission_date = DateTime.parse node['submissionDate']
93
+
94
+ attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
95
+
96
+ @signature = attributes.delete('signature')
97
+ @submitter_code = attributes.delete('submitterCode')
98
+
99
+ node.remove
100
+ end
101
+
102
+ doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfo]}')]/..").each do |node|
103
+ @request_id = node['calaisRequestID']
104
+
105
+ attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
106
+
107
+ @doc_title = attributes.delete('docTitle')
108
+ @doc_date = Date.parse(attributes.delete('docDate'))
109
+
110
+ node.remove
111
+ end
112
+
113
+ @categories = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:doccat]}')]/..").map do |node|
114
+ category = Category.new
115
+ category.name = node.xpath("c:categoryName[1]").first.content
116
+ score = node.xpath("c:score[1]").first
117
+ category.score = score.content.to_f unless score.nil?
118
+
119
+ node.remove
120
+ category
121
+ end
122
+
123
+ @relevances = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relevances]}')]/..").inject({}) do |acc, node|
124
+ subject_hash = node.xpath("c:subject[1]").first[:resource].split('/')[-1]
125
+ acc[subject_hash] = node.xpath("c:relevance[1]").first.content.to_f
126
+
127
+ node.remove
128
+ acc
129
+ end
130
+
131
+ @entities = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:entities]}')]/..").map do |node|
132
+ extracted_hash = node['about'].split('/')[-1] rescue nil
133
+
134
+ entity = Entity.new
135
+ entity.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
136
+ entity.type = extract_type(node)
137
+ entity.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
138
+
139
+ entity.relevance = @relevances[extracted_hash]
140
+ entity.instances = extract_instances(doc, extracted_hash)
141
+
142
+ node.remove
143
+ entity
144
+ end
145
+
146
+ @relations = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relations]}')]/..").map do |node|
147
+ extracted_hash = node['about'].split('/')[-1] rescue nil
148
+
149
+ relation = Relation.new
150
+ relation.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
151
+ relation.type = extract_type(node)
152
+ relation.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
153
+ relation.instances = extract_instances(doc, extracted_hash)
154
+
155
+ node.remove
156
+ relation
157
+ end
158
+
159
+ @geographies = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:geographies]}')]/..").map do |node|
160
+ attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
161
+
162
+ geography = Geography.new
163
+ geography.name = attributes.delete('name')
164
+ geography.calais_hash = attributes.delete('subject')
165
+ geography.attributes = attributes
166
+
167
+ node.remove
168
+ geography
169
+ end
170
+
171
+ @social_tags = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:socialtags]}')]/..").map do |node|
172
+ attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
173
+
174
+ social_tag = SocialTag.new
175
+ social_tag.name = attributes.delete('name')
176
+ social_tag.importance = attributes.delete('importance')
177
+ social_tag.attributes = attributes
178
+
179
+ node.remove
180
+ social_tag
181
+ end
182
+
183
+ doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:defaultlangid]}')]/..").each { |node| node.remove }
184
+ doc.root.xpath("./*").each { |node| node.remove }
185
+
186
+ return
187
+ end
188
+
189
+ def extract_instances(doc, hash)
190
+ doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:instances]}')]/..").select do |instance_node|
191
+ instance_node.xpath("c:subject[1]").first[:resource].split("/")[-1] == hash
192
+ end.map do |instance_node|
193
+ instance = Instance.from_node(instance_node)
194
+ instance_node.remove
195
+
196
+ instance
197
+ end
198
+ end
199
+
200
+ def extract_type(node)
201
+ node.xpath("*[name()='rdf:type']")[0]['resource'].split('/')[-1]
202
+ rescue
203
+ nil
204
+ end
205
+
206
+ def extract_attributes(nodes)
207
+ nodes.inject({}) do |hsh, node|
208
+ value = if node['resource']
209
+ extracted_hash = node['resource'].split('/')[-1] rescue nil
210
+ CalaisHash.find_or_create(extracted_hash, @hashes)
211
+ else
212
+ node.content
213
+ end
214
+ hsh.merge(node.name => value)
215
+ end
216
+ end
217
+ end
218
+ end
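Throughout the response parser, an identifier is taken from the last path segment of a resource URI and then interned through `CalaisHash.find_or_create`, so that entities, relations and attributes pointing at the same hash share one object. A dependency-free sketch of that idea is below. `HashRef`, `trailing_segment` and the example URI are hypothetical, and the sketch uses `Enumerable#find` where the original uses `select { ... }.first`.

```ruby
# Hypothetical value holder, playing the role of CalaisHash.
HashRef = Struct.new(:value)

# Returns the last path segment of a URI-like string, or nil if there is none.
def trailing_segment(uri)
  uri.to_s.split('/').last
end

# Returns the pooled object for this value, creating and storing it on first use.
def find_or_create(value, pool)
  existing = pool.find { |ref| ref.value == value }
  return existing if existing

  ref = HashRef.new(value)
  pool << ref
  ref
end

pool = []
uri  = 'http://example.invalid/hashes/abc-123' # made-up URI for illustration
first  = find_or_create(trailing_segment(uri), pool)
second = find_or_create(trailing_segment(uri), pool)
```

Here `first` and `second` are the same object, and the pool holds a single entry. The linear scan is fine for small responses; a `Hash` keyed by value would be the usual choice if the pool grew large.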
data/spec/calais/client_spec.rb ADDED
@@ -0,0 +1,79 @@
1
+ require File.join(File.dirname(__FILE__), %w[.. helper])
2
+
3
+ describe Calais::Client, :new do
4
+ it 'accepts arguments as a hash' do
5
+ client = nil
6
+
7
+ lambda { client = Calais::Client.new(:content => SAMPLE_DOCUMENT, :license_id => LICENSE_ID) }.should_not raise_error
8
+
9
+ client.license_id.should == LICENSE_ID
10
+ client.content.should == SAMPLE_DOCUMENT
11
+ end
12
+
13
+ it 'accepts arguments as a block' do
14
+ client = nil
15
+
16
+ lambda {
17
+ client = Calais::Client.new do |c|
18
+ c.content = SAMPLE_DOCUMENT
19
+ c.license_id = LICENSE_ID
20
+ end
21
+ }.should_not raise_error
22
+
23
+ client.license_id.should == LICENSE_ID
24
+ client.content.should == SAMPLE_DOCUMENT
25
+ end
26
+
27
+ it 'should not accept unknown attributes' do
28
+ lambda { Calais::Client.new(:monkey => 'monkey', :license_id => LICENSE_ID) }.should raise_error(NoMethodError)
29
+ end
30
+ end
31
+
32
+ describe Calais::Client, :params_xml do
33
+ it 'returns an xml encoded string' do
34
+ client = Calais::Client.new(:content => SAMPLE_DOCUMENT, :license_id => LICENSE_ID)
35
+ client.params_xml.should == %[<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n <c:processingDirectives/>\n <c:userDirectives/>\n</c:params>]
36
+
37
+ client.content_type = :xml
38
+ client.output_format = :json
39
+ client.reltag_base_url = 'http://opencalais.com'
40
+ client.calculate_relevance = true
41
+ client.metadata_enables = Calais::KNOWN_ENABLES
42
+ client.metadata_discards = Calais::KNOWN_DISCARDS
43
+ client.allow_distribution = true
44
+ client.allow_search = true
45
+ client.external_id = Digest::SHA1.hexdigest(client.content)
46
+ client.submitter = 'calais.rb'
47
+
48
+ client.params_xml.should == %[<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n <c:processingDirectives c:contentType="text/xml" c:outputFormat="application/json" c:reltagBaseURL="http://opencalais.com" c:enableMetadataType="GenericRelations,SocialTags" c:discardMetadata="er/Company;er/Geo;er/Product"/>\n <c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="1a008b91e7d21962e132bc1d6cb252532116a606" c:submitter="calais.rb"/>\n</c:params>]
49
+ end
50
+ end
51
+
52
+ describe Calais::Client, :enlighten do
53
+ before do
54
+ @client = Calais::Client.new do |c|
55
+ c.content = SAMPLE_DOCUMENT
56
+ c.license_id = LICENSE_ID
57
+ c.content_type = :xml
58
+ c.output_format = :json
59
+ c.calculate_relevance = true
60
+ c.metadata_enables = Calais::KNOWN_ENABLES
61
+ c.allow_distribution = true
62
+ c.allow_search = true
63
+ end
64
+ end
65
+
66
+ it 'provides access to the enlighten command on the generic rest endpoint' do
67
+ @client.should_receive(:do_request).with(anything).and_return(SAMPLE_RESPONSE)
68
+ @client.enlighten
69
+ @client.instance_variable_get(:@client).url.should == Calais::REST_ENDPOINT
70
+ end
71
+
72
+ it 'provides access to the enlighten command on the beta rest endpoint' do
73
+ @client.use_beta = true
74
+
75
+ @client.should_receive(:do_request).with(anything).and_return(SAMPLE_RESPONSE)
76
+ @client.enlighten
77
+ @client.instance_variable_get(:@client).url.should == Calais::BETA_REST_ENDPOINT
78
+ end
79
+ end
data/spec/calais/response_spec.rb ADDED
@@ -0,0 +1,139 @@
1
+ require File.join(File.dirname(__FILE__), %w[.. helper])
2
+
3
+ describe Calais::Response, :new do
4
+ it 'accepts an rdf string to generate the response object' do
5
+ lambda { Calais::Response.new(SAMPLE_RESPONSE) }.should_not raise_error
6
+ end
7
+ end
8
+
9
+ describe Calais::Response, :new do
10
+ it "should raise Calais::Error with the error message" do
11
+ lambda {
12
+ @response = Calais::Response.new(RESPONSE_WITH_EXCEPTION)
13
+ }.should raise_error(Calais::Error, "My Error Message")
14
+ end
15
+ end
16
+
17
+ describe Calais::Response, :new do
18
+ before :all do
19
+ @response = Calais::Response.new(RESPONSE_WITH_SOCIAL_TAGS)
20
+ end
21
+
22
+ it 'should extract social tags' do
23
+ social_tags = @response.social_tags
24
+ social_tags.map { |e| e.name }.sort.uniq.should == ["Agile software development", "Behavior Driven Development", "Code refactoring", "Computing", "Extreme Programming", "RSpec", "Ruby on Rails", "Selenium", "Software development", "Software engineering", "Web 2.0"]
25
+ end
26
+ end
27
+
28
+ describe Calais::Response, :new do
29
+ before :all do
30
+ @response = Calais::Response.new(SAMPLE_RESPONSE)
31
+ end
32
+
33
+ it 'should extract document information' do
34
+ @response.language.should == 'English'
35
+ @response.submission_date.should be_a_kind_of(DateTime)
36
+ @response.signature.should be_a_kind_of(String)
37
+ @response.submitter_code.should be_a_kind_of(String)
38
+ @response.request_id.should be_a_kind_of(String)
39
+ @response.doc_title.should == 'Record number of bicycles sold in Australia in 2006'
40
+ @response.doc_date.should be_a_kind_of(Date)
41
+ end
42
+
43
+ it 'should extract entities' do
44
+ entities = @response.entities
45
+ entities.map { |e| e.type }.sort.uniq.should == %w[City Continent Country IndustryTerm Organization Person Position ProvinceOrState]
46
+ end
47
+
48
+ it 'should extract relations' do
49
+ relations = @response.relations
50
+ relations.map { |e| e.type }.sort.uniq.should == %w[GenericRelations PersonAttributes PersonCareer Quotation]
51
+ end
52
+
53
+ it 'should extract geographies' do
54
+ geographies = @response.geographies
55
+ geographies.map { |e| e.name }.sort.uniq.should == %w[Australia Hobart,Tasmania,Australia Tasmania,Australia]
56
+ end
57
+
58
+ it 'should extract relevances' do
59
+ @response.instance_variable_get(:@relevances).should be_a_kind_of(Hash)
60
+ end
61
+
62
+ it 'should assign a floating-point relevance to each entity' do
63
+ @response.entities.each {|e| e.relevance.should be_a_kind_of(Float) }
64
+ end
65
+
66
+ it 'should find the correct document categories returned by OpenCalais' do
67
+ @response.categories.map {|c| c.name }.sort.should == %w[Business_Finance Technology_Internet]
68
+ end
69
+
70
+ it 'should find the correct document category scores returned by OpenCalais' do
71
+ @response.categories.map {|c| c.score.should be_a_kind_of(Float) }
72
+ end
73
+
74
+ it "should not raise an error if no score is given by OpenCalais" do
75
+ lambda {Calais::Response.new(SAMPLE_RESPONSE_WITH_NO_SCORE)}.should_not raise_error
76
+ end
77
+
78
+ it "should return a nil score if no score is given by OpenCalais" do
79
+ response = Calais::Response.new(SAMPLE_RESPONSE_WITH_NO_SCORE)
80
+ response.categories.map {|c| c.score }.should == [nil]
81
+ end
82
+
83
+ it 'should find instances for each entity' do
84
+ @response.entities.each {|e|
85
+ e.instances.size.should > 0
86
+ }
87
+ end
88
+
89
+
90
+ it 'should find instances for each relation' do
91
+ @response.relations.each {|r|
92
+ r.instances.size.should > 0
93
+ }
94
+ end
95
+
96
+ it 'should find the correct instances for each entity' do
97
+ ## This currently tests only for the "Australia" entity's
98
+ ## instances. A more thorough test that tests for the instances
99
+ ## of each of the many entities in the sample doc is desirable in
100
+ ## the future.
101
+
102
+ australia = @response.entities.select {|e| e.attributes["name"] == "Australia" }.first
103
+ australia.instances.size.should == 3
104
+ instances = australia.instances.sort{|a,b| a.offset <=> b.offset }
105
+
106
+ instances[0].prefix.should == "number of bicycles sold in "
107
+ instances[0].exact.should == "Australia"
108
+ instances[0].suffix.should == " in 2006<\/title>\n<date>January 4,"
109
+ instances[0].offset.should == 67
110
+ instances[0].length.should == 9
111
+
112
+ instances[1].prefix.should == "4, 2007<\/date>\n<body>\nBicycle sales in "
113
+ instances[1].exact.should == "Australia"
114
+ instances[1].suffix.should == " have recorded record sales of 1,273,781 units"
115
+ instances[1].offset.should == 146
116
+ instances[1].length.should == 9
117
+
118
+ instances[2].prefix.should == " the traditional company car,\" he said.\n\n\"Some of "
119
+ instances[2].exact.should == "Australia"
120
+ instances[2].suffix.should == "'s biggest corporations now have bicycle fleets,"
121
+ instances[2].offset.should == 952
122
+ instances[2].length.should == 9
123
+ end
124
+
125
+ it 'should find the correct instances for each relation' do
126
+ ## This currently tests only for one relation's instances. A more
127
+ ## thorough test that tests for the instances of each of the many other
128
+ ## relations in the sample doc is desirable in the future.
129
+
130
+ rel = @response.relations.select {|e| e.calais_hash.value == "8f3936d9-cf6b-37fc-ae0d-a145959ae3b5" }.first
131
+ rel.instances.size.should == 1
132
+
133
+ rel.instances.first.prefix.should == " manufacturers.\n\nThe Cycling Promotion Fund (CPF) "
134
+ rel.instances.first.exact.should == "spokesman Ian Christie said Australians were increasingly using bicycles as an alternative to cars."
135
+ rel.instances.first.suffix.should == " Sales rose nine percent in 2006 while the car"
136
+ rel.instances.first.offset.should == 425
137
+ rel.instances.first.length.should == 99
138
+ end
139
+ end
data/spec/helper.rb ADDED
@@ -0,0 +1,13 @@
+ require 'rubygems'
+ require 'spec'
+ require 'yaml'
+
+ require File.dirname(__FILE__) + '/../lib/calais'
+
+ FIXTURES_DIR = File.join File.dirname(__FILE__), %[fixtures]
+ SAMPLE_DOCUMENT = File.read(File.join(FIXTURES_DIR, %[bicycles_australia.xml]))
+ SAMPLE_RESPONSE = File.read(File.join(FIXTURES_DIR, %[bicycles_australia.response.rdf]))
+ SAMPLE_RESPONSE_WITH_NO_SCORE = File.read(File.join(FIXTURES_DIR, %[twitter_tweet_without_score.response.rdf]))
+ RESPONSE_WITH_EXCEPTION = File.read(File.join(FIXTURES_DIR, %[error.response.xml]))
+ RESPONSE_WITH_SOCIAL_TAGS = File.read(File.join(FIXTURES_DIR, %[rails_job.rdf]))
+ LICENSE_ID = YAML.load(File.read(File.join(FIXTURES_DIR, %[calais.yml])))['key']
metadata ADDED
@@ -0,0 +1,113 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: koda-calais
3
+ version: !ruby/object:Gem::Version
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 0
8
+ - 9
9
+ version: 0.0.9
10
+ platform: ruby
11
+ authors:
12
+ - Abhay Kumar
13
+ autorequire:
14
+ bindir: bin
15
+ cert_chain: []
16
+
17
+ date: 2009-09-18 00:00:00 -07:00
18
+ default_executable:
19
+ dependencies:
20
+ - !ruby/object:Gem::Dependency
21
+ name: nokogiri
22
+ prerelease: false
23
+ requirement: &id001 !ruby/object:Gem::Requirement
24
+ requirements:
25
+ - - ">="
26
+ - !ruby/object:Gem::Version
27
+ segments:
28
+ - 1
29
+ - 3
30
+ - 3
31
+ version: 1.3.3
32
+ type: :runtime
33
+ version_requirements: *id001
34
+ - !ruby/object:Gem::Dependency
35
+ name: json
36
+ prerelease: false
37
+ requirement: &id002 !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ segments:
42
+ - 1
43
+ - 1
44
+ - 3
45
+ version: 1.1.3
46
+ type: :runtime
47
+ version_requirements: *id002
48
+ - !ruby/object:Gem::Dependency
49
+ name: curb
50
+ prerelease: false
51
+ requirement: &id003 !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ segments:
56
+ - 0
57
+ - 1
58
+ - 4
59
+ version: 0.1.4
60
+ type: :runtime
61
+ version_requirements: *id003
62
+ description: A Ruby interface to the Calais Web Service
63
+ email: info@opensynapse.net
64
+ executables: []
65
+
66
+ extensions: []
67
+
68
+ extra_rdoc_files:
69
+ - README.markdown
70
+ files:
71
+ - CHANGELOG.markdown
72
+ - MIT-LICENSE
73
+ - README.markdown
74
+ - Rakefile
75
+ - VERSION.yml
76
+ - lib/calais.rb
77
+ - lib/calais/client.rb
78
+ - lib/calais/error.rb
79
+ - lib/calais/response.rb
80
+ has_rdoc: true
81
+ homepage: http://github.com/abhay/calais
82
+ licenses: []
83
+
84
+ post_install_message:
85
+ rdoc_options:
86
+ - --charset=UTF-8
87
+ require_paths:
88
+ - lib
89
+ required_ruby_version: !ruby/object:Gem::Requirement
90
+ requirements:
91
+ - - ">="
92
+ - !ruby/object:Gem::Version
93
+ segments:
94
+ - 0
95
+ version: "0"
96
+ required_rubygems_version: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - ">="
99
+ - !ruby/object:Gem::Version
100
+ segments:
101
+ - 0
102
+ version: "0"
103
+ requirements: []
104
+
105
+ rubyforge_project: calais
106
+ rubygems_version: 1.3.6
107
+ signing_key:
108
+ specification_version: 2
109
+ summary: A Ruby interface to the Calais Web Service
110
+ test_files:
111
+ - spec/calais/client_spec.rb
112
+ - spec/calais/response_spec.rb
113
+ - spec/helper.rb