koda-calais 0.0.9

@@ -0,0 +1,33 @@
+ # Changes
+
+ ## 0.0.7
+ * verified 4.0 API
+ * moved gem packaging to `jeweler` and documentation to `yard`
+
+ ## 0.0.6
+ * fully implemented 3.1 API
+
+ ## 0.0.5
+ * fixed error where classes weren't being required in the proper order on Ubuntu (reported by Jon Moses)
+ * fixed tests for new data coming back from the API
+
+ ## 0.0.4
+ * changed dependency from `hpricot` to `libxml`
+ * unicode fun
+ * cleanup all around
+
+ ## 0.0.3
+ * pluginized the library for Rails (thanks [pius](http://gitorious.org/projects/calais-au-rails))
+ * added helper methods for named entity types in a response
+
+ ## 0.0.2
+ * cleanup in the specs
+ * cleaner parsing
+ * location of named entities
+ * more data in relationships
+ * moved Names and Relationships
+
+ ## 0.0.1
+ * Access to OpenCalais's Enlighten action
+ * Single method to process a document
+ * Get relationships and names from a document
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
+ Copyright (c) 2008 Abhay Kumar info@opensynapse.net
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ 'Software'), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown ADDED
@@ -0,0 +1,49 @@
+ # Calais #
+ A Ruby interface to the [Open Calais Web Service](http://opencalais.com)
+
+ ## Features ##
+ * Accepts documents in text/xml, text/html, text/htmlraw and text/raw format.
+ * Basic access to the Open Calais API's Enlighten action.
+ * Output is an RDF representation of the input document.
+ * A single method call extracts names, entities and geographies from given text.
+
+ ## Synopsis ##
+
+ This is a very basic wrapper to the Open Calais API. It uses the POST endpoint and currently supports the Enlighten action. Here's a simple call:
+
+     Calais.enlighten(
+       :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+       :content_type => :raw,
+       :license_id => 'your license id'
+     )
+
+ This is the easiest way to get the RDF-formatted response from the OpenCalais service.
+
+ If you want structured information about a document instead (its entities, relations, geographies, categories and social tags), you can try this:
+
+     Calais.process_document(
+       :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+       :content_type => :raw,
+       :license_id => 'your license id'
+     )
+
+ This will return a `Calais::Response` object containing the information extracted from the RDF response.
+
+ ## Requirements ##
+
+ * [Ruby 1.8.5 or a later 1.8 release](http://ruby-lang.org) (`lib/calais.rb` requires `jcode`, which was removed in Ruby 1.9)
+ * [nokogiri](http://nokogiri.rubyforge.org/nokogiri/), [libxml2](http://xmlsoft.org/), [libxslt](http://xmlsoft.org/xslt/)
+ * [curb](http://curb.rubyforge.org/), [libcurl](http://curl.haxx.se/)
+ * [json](http://json.rubyforge.org/)
+
+ ## Install ##
+
+ You can install the Calais gem via Rubygems (`gem install calais`) or by building from source.
+
+ ## Authors ##
+
+ * [Abhay Kumar](http://opensynapse.net)
+
+ ## Acknowledgements ##
+
+ * [Paul Legato](http://www.economaton.com/): Help all around with the new response processor and the implementation of the 3.1 API.
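Both `Calais.enlighten` and `Calais.process_document` are thin wrappers. In lib/calais.rb, each builds a `Calais::Client` from the same options hash or block. `process_document` also forces `output_format = :rdf` before wrapping the raw reply in a `Calais::Response`. The sketch below reproduces that forwarding pattern with hypothetical `StubClient`, `StubResponse` and `StubCalais` stand-ins. The stub client returns a canned string instead of calling the web service, so the sketch runs without network access, curb or nokogiri.

```ruby
# Hypothetical stand-in for Calais::Client: accepts an options hash
# and/or a configuration block, like the real initializer.
class StubClient
  attr_accessor :content, :license_id, :content_type, :output_format

  def initialize(options = {})
    # Unknown keys raise NoMethodError because no writer exists.
    options.each { |k, v| send("#{k}=", v) }
    yield(self) if block_given?
  end

  # Instead of POSTing to the REST endpoint, report the settings.
  def enlighten
    "format=#{output_format.inspect} type=#{content_type.inspect}"
  end
end

# Hypothetical stand-in for Calais::Response: only keeps the raw reply.
StubResponse = Struct.new(:raw)

module StubCalais
  class << self
    # Forward arguments and block unchanged to a new client.
    def enlighten(*args, &block)
      StubClient.new(*args, &block).enlighten
    end

    # Same forwarding, but force RDF output before "parsing" the reply.
    def process_document(*args, &block)
      client = StubClient.new(*args, &block)
      client.output_format = :rdf
      StubResponse.new(client.enlighten)
    end
  end
end

plain = StubCalais.enlighten(:content => 'Some text', :content_type => :raw)
doc = StubCalais.process_document do |c|
  c.content = 'Some text'
  c.content_type = :raw
end
```

Because the block is passed through `new` to `initialize`, the hash form and the block form configure the client identically. This is what the `Calais::Client, :new` specs check against the real class.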
data/Rakefile ADDED
@@ -0,0 +1,97 @@
+ # -*- ruby -*-
+
+ require 'rake'
+ require 'rake/clean'
+
+ require './lib/calais.rb'
+
+ begin
+   gem 'jeweler', '>= 1.0.1'
+   require 'jeweler'
+
+   Jeweler::Tasks.new do |s|
+     s.name = 'calais'
+     s.summary = 'A Ruby interface to the Calais Web Service'
+     s.email = 'info@opensynapse.net'
+     s.homepage = 'http://github.com/abhay/calais'
+     s.description = 'A Ruby interface to the Calais Web Service'
+     s.authors = ['Abhay Kumar']
+     s.files = FileList["[A-Z]*", "{bin,generators,lib,test}/**/*"]
+     s.rubyforge_project = 'calais'
+     s.add_dependency 'nokogiri', '>= 1.3.3'
+     s.add_dependency 'json', '>= 1.1.3'
+     s.add_dependency 'curb', '>= 0.1.4'
+   end
+ rescue LoadError
+   puts "Jeweler, or one of its dependencies, is not available. Please install it."
+   exit(1)
+ end
+
+ begin
+   require 'spec/rake/spectask'
+
+   desc "Run all specs"
+   Spec::Rake::SpecTask.new do |t|
+     t.spec_files = FileList["spec/**/*_spec.rb"].sort
+     t.spec_opts = ["--options", "spec/spec.opts"]
+   end
+
+   desc "Run all specs and get coverage statistics"
+   Spec::Rake::SpecTask.new('coverage') do |t|
+     t.spec_opts = ["--options", "spec/spec.opts"]
+     t.spec_files = FileList["spec/**/*_spec.rb"].sort
+     t.rcov_opts = ["--exclude", "spec", "--exclude", "gems"]
+     t.rcov = true
+   end
+
+   task :default => :spec
+ rescue LoadError
+   puts "RSpec, or one of its dependencies, is not available. Please install it."
+   exit(1)
+ end
+
+ begin
+   require 'yard'
+   require 'yard/rake/yardoc_task'
+
+   YARD::Rake::YardocTask.new do |t|
+     t.options = ["--verbose", "--markup=markdown", "--files=CHANGELOG.markdown,MIT-LICENSE"]
+   end
+
+   task :rdoc => :yardoc
+
+   CLOBBER.include 'doc'
+   CLOBBER.include '.yardoc'
+ rescue LoadError
+   puts "YARD, or one of its dependencies, is not available. Please install it."
+   exit(1)
+ end
+
+ begin
+   require 'rake/contrib/sshpublisher'
+   namespace :rubyforge do
+
+     desc "Release gem and RDoc documentation to RubyForge"
+     task :release => ["rubyforge:release:gem", "rubyforge:release:docs"]
+
+     namespace :release do
+       desc "Publish RDoc to RubyForge."
+       task :docs => [:yardoc] do
+         config = YAML.load(
+           File.read(File.expand_path('~/.rubyforge/user-config.yml'))
+         )
+
+         host = "#{config['username']}@rubyforge.org"
+         remote_dir = "/var/www/gforge-projects/calais/"
+         local_dir = 'doc'
+
+         Rake::SshDirPublisher.new(host, remote_dir, local_dir).upload
+       end
+     end
+   end
+ rescue LoadError
+   puts "Rake SshDirPublisher is unavailable or your rubyforge environment is not configured."
+   exit(1)
+ end
+
+ # vim: syntax=Ruby
data/VERSION.yml ADDED
@@ -0,0 +1,4 @@
+ ---
+ :minor: 0
+ :patch: 9
+ :major: 0
data/lib/calais.rb ADDED
@@ -0,0 +1,57 @@
+ require 'digest/sha1'
+ require 'net/http'
+ require 'cgi'
+ require 'iconv'
+ require 'set'
+ require 'date'
+
+ require 'rubygems'
+ require 'nokogiri'
+ require 'json'
+ require 'curb'
+
+ $KCODE = "UTF8"
+ require 'jcode'
+
+ $:.unshift File.expand_path(File.dirname(__FILE__)) + '/calais'
+
+ require 'client'
+ require 'response'
+ require 'error'
+
+ module Calais
+   REST_ENDPOINT = "http://api.opencalais.com/enlighten/rest/"
+   BETA_REST_ENDPOINT = "http://beta.opencalais.com/enlighten/rest/"
+
+   AVAILABLE_CONTENT_TYPES = {
+     :xml => 'text/xml',
+     :html => 'text/html',
+     :htmlraw => 'text/htmlraw',
+     :raw => 'text/raw'
+   }
+
+   AVAILABLE_OUTPUT_FORMATS = {
+     :rdf => 'xml/rdf',
+     :simple => 'text/simple',
+     :microformats => 'text/microformats',
+     :json => 'application/json'
+   }
+
+   KNOWN_ENABLES = ['GenericRelations', 'SocialTags']
+   KNOWN_DISCARDS = ['er/Company', 'er/Geo', 'er/Product']
+
+   MAX_RETRIES = 5
+   HTTP_TIMEOUT = 60
+   MIN_CONTENT_SIZE = 1
+   MAX_CONTENT_SIZE = 100_000
+
+   class << self
+     def enlighten(*args, &block); Client.new(*args, &block).enlighten; end
+
+     def process_document(*args, &block)
+       client = Client.new(*args, &block)
+       client.output_format = :rdf
+       Response.new(client.enlighten)
+     end
+   end
+ end
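`Client#params_xml` turns the `:content_type` and `:output_format` symbols into the MIME strings in the two tables above. `Client#check_params` rejects any symbol that is not a key of those tables. The sketch below shows that lookup using copies of the two constants and a hypothetical `lookup_directive` helper, which is not part of the gem. Note that `:text` is not a key, so it fails with the same `'unknown content type'` message the client raises.

```ruby
# Copies of the lookup tables defined in lib/calais.rb.
CONTENT_TYPES = {
  :xml     => 'text/xml',
  :html    => 'text/html',
  :htmlraw => 'text/htmlraw',
  :raw     => 'text/raw'
}

OUTPUT_FORMATS = {
  :rdf          => 'xml/rdf',
  :simple       => 'text/simple',
  :microformats => 'text/microformats',
  :json         => 'application/json'
}

# Hypothetical helper: returns the MIME string for a directive symbol.
# nil means "not set" (params_xml then omits the attribute); any other
# unknown symbol raises a RuntimeError, as Client#check_params does.
def lookup_directive(table, key, label)
  return nil if key.nil?
  raise "unknown #{label}" unless table.key?(key)
  table[key]
end

lookup_directive(CONTENT_TYPES, :raw, 'content type')    # => "text/raw"
lookup_directive(OUTPUT_FORMATS, :json, 'output format') # => "application/json"
```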
@@ -0,0 +1,117 @@
+ module Calais
+   class Client
+     # base attributes of the call
+     attr_accessor :content
+     attr_accessor :license_id
+
+     # processing directives
+     attr_accessor :content_type, :output_format, :reltag_base_url, :calculate_relevance, :omit_outputting_original_text
+     attr_accessor :store_rdf, :metadata_enables, :metadata_discards
+
+     # user directives
+     attr_accessor :allow_distribution, :allow_search, :external_id, :submitter
+
+     attr_accessor :external_metadata
+
+     attr_accessor :use_beta
+
+     # Options can be given as a hash, a block, or both; every key must name
+     # one of the writers above, otherwise NoMethodError is raised.
+     def initialize(options={}, &block)
+       options.each {|k,v| send("#{k}=", v)}
+       yield(self) if block_given?
+     end
+
+     def enlighten
+       post_args = {
+         "licenseID" => @license_id,
+         # Iconv's //IGNORE drops invalid UTF-8 byte sequences; the extra space
+         # (sliced off again by [0..-2]) keeps a truncated final character from raising.
+         "content" => Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "#{@content} ").first[0..-2],
+         "paramsXML" => params_xml
+       }
+
+       @client ||= Curl::Easy.new
+       @client.url = @use_beta ? BETA_REST_ENDPOINT : REST_ENDPOINT
+       @client.timeout = HTTP_TIMEOUT
+
+       post_fields = post_args.map {|k,v| Curl::PostField.content(k, v) }
+
+       do_request(post_fields)
+     end
+
+     def params_xml
+       check_params
+       document = Nokogiri::XML::Document.new
+
+       params_node = Nokogiri::XML::Node.new('c:params', document)
+       params_node['xmlns:c'] = 'http://s.opencalais.com/1/pred/'
+       params_node['xmlns:rdf'] = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
+
+       processing_node = Nokogiri::XML::Node.new('c:processingDirectives', document)
+       processing_node['c:contentType'] = AVAILABLE_CONTENT_TYPES[@content_type] if @content_type
+       processing_node['c:outputFormat'] = AVAILABLE_OUTPUT_FORMATS[@output_format] if @output_format
+       processing_node['c:calculateRelevanceScore'] = 'false' if @calculate_relevance == false
+       processing_node['c:reltagBaseURL'] = @reltag_base_url.to_s if @reltag_base_url
+
+       processing_node['c:enableMetadataType'] = @metadata_enables.join(',') unless @metadata_enables.empty?
+       processing_node['c:docRDFaccessible'] = @store_rdf.to_s if @store_rdf
+       processing_node['c:discardMetadata'] = @metadata_discards.join(';') unless @metadata_discards.empty?
+       processing_node['c:omitOutputtingOriginalText'] = 'true' if @omit_outputting_original_text
+
+       user_node = Nokogiri::XML::Node.new('c:userDirectives', document)
+       user_node['c:allowDistribution'] = @allow_distribution.to_s unless @allow_distribution.nil?
+       user_node['c:allowSearch'] = @allow_search.to_s unless @allow_search.nil?
+       user_node['c:externalID'] = @external_id.to_s if @external_id
+       user_node['c:submitter'] = @submitter.to_s if @submitter
+
+       params_node << processing_node
+       params_node << user_node
+
+       if @external_metadata
+         external_node = Nokogiri::XML::Node.new('c:externalMetadata', document)
+         external_node << @external_metadata
+         params_node << external_node
+       end
+
+       params_node.to_xml(:indent => 2)
+     end
+
+     private
+     def check_params
+       raise 'missing content' if @content.nil? || @content.empty?
+
+       content_length = @content.length
+       raise 'content is too small' if content_length < MIN_CONTENT_SIZE
+       raise 'content is too large' if content_length > MAX_CONTENT_SIZE
+
+       raise 'missing license id' if @license_id.nil? || @license_id.empty?
+
+       raise 'unknown content type' if @content_type && !AVAILABLE_CONTENT_TYPES.key?(@content_type)
+       raise 'unknown output format' if @output_format && !AVAILABLE_OUTPUT_FORMATS.key?(@output_format)
+
+       %w[calculate_relevance store_rdf allow_distribution allow_search].each do |variable|
+         value = self.send(variable)
+         unless NilClass === value || TrueClass === value || FalseClass === value
+           raise "expected a boolean value for #{variable} but got #{value}"
+         end
+       end
+
+       @metadata_enables ||= []
+       unknown_enables = Set.new(@metadata_enables) - KNOWN_ENABLES
+       raise "unknown metadata enables: #{unknown_enables.to_a.inspect}" unless unknown_enables.empty?
+
+       @metadata_discards ||= []
+       unknown_discards = Set.new(@metadata_discards) - KNOWN_DISCARDS
+       raise "unknown metadata discards: #{unknown_discards.to_a.inspect}" unless unknown_discards.empty?
+     end
+
+     def do_request(post_fields)
+       unless @client.http_post(post_fields)
+         raise 'unable to post to api endpoint'
+       end
+
+       @client.body_str
+     end
+   end
+ end
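`Client#check_params` runs before any XML is built. It requires content and a license ID, caps the content size, accepts only `nil`, `true` or `false` for the four flag attributes, and uses a `Set` difference to reject metadata enables or discards outside the whitelists. The sketch below restates those rules as a hypothetical `validate_options` function over a plain options hash, so they can be exercised without Nokogiri or curb. The error strings match the ones in client.rb, but the function itself is not part of the gem.

```ruby
require 'set'

# Copies of the limits and whitelists defined in lib/calais.rb.
KNOWN_ENABLES    = ['GenericRelations', 'SocialTags']
KNOWN_DISCARDS   = ['er/Company', 'er/Geo', 'er/Product']
MAX_CONTENT_SIZE = 100_000
BOOLEAN_FLAGS    = [:calculate_relevance, :store_rdf, :allow_distribution, :allow_search]

# Hypothetical standalone version of Client#check_params that reads a
# plain options hash instead of instance variables. Returns true or
# raises a RuntimeError with the same messages the client uses.
def validate_options(opts)
  content = opts[:content].to_s
  raise 'missing content' if content.empty?
  raise 'content is too large' if content.length > MAX_CONTENT_SIZE
  raise 'missing license id' if opts[:license_id].to_s.empty?

  # Flags are tri-state: nil (attribute omitted from paramsXML), true or false.
  BOOLEAN_FLAGS.each do |flag|
    value = opts[flag]
    unless [nil, true, false].include?(value)
      raise "expected a boolean value for #{flag} but got #{value}"
    end
  end

  # Set difference leaves only the names that are not whitelisted.
  unknown = Set.new(opts[:metadata_enables] || []) - KNOWN_ENABLES
  raise "unknown metadata enables: #{unknown.to_a.inspect}" unless unknown.empty?

  unknown = Set.new(opts[:metadata_discards] || []) - KNOWN_DISCARDS
  raise "unknown metadata discards: #{unknown.to_a.inspect}" unless unknown.empty?

  true
end
```

Validating everything up front means a bad option fails locally with a descriptive message. Without that check, the request would go out to the API and fail there.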
@@ -0,0 +1,3 @@
+ class Calais::Error < StandardError
+
+ end
@@ -0,0 +1,218 @@
+ module Calais
+   class Response
+     MATCHERS = {
+       :docinfo => 'DocInfo',
+       :docinfometa => 'DocInfoMeta',
+       :defaultlangid => 'DefaultLangId',
+       :doccat => 'DocCat',
+       :entities => 'type/em/e',
+       :relations => 'type/em/r',
+       :geographies => 'type/er',
+       :instances => 'type/sys/InstanceInfo',
+       :relevances => 'type/sys/RelevanceInfo',
+       :socialtags => 'SocialTag'
+     }
+
+     attr_accessor :submitter_code, :signature, :language, :submission_date, :request_id, :doc_title, :doc_date
+     attr_accessor :hashes, :entities, :relations, :geographies, :categories, :social_tags
+
+     def initialize(rdf_string)
+       @raw_response = rdf_string
+
+       @hashes = []
+       @entities = []
+       @relations = []
+       @geographies = []
+       @relevances = {} # key = String hash, val = Float relevance
+       @categories = []
+
+       extract_data
+     end
+
+     class Entity
+       attr_accessor :calais_hash, :type, :attributes, :relevance, :instances
+     end
+
+     class Relation
+       attr_accessor :calais_hash, :type, :attributes, :instances
+     end
+
+     class Geography
+       attr_accessor :name, :calais_hash, :attributes
+     end
+
+     class SocialTag
+       attr_accessor :name, :importance, :attributes
+     end
+
+     class Category
+       attr_accessor :name, :score
+     end
+
+     class Instance
+       attr_accessor :prefix, :exact, :suffix, :offset, :length
+
+       # Makes a new Instance object from an appropriate Nokogiri::XML::Node.
+       def self.from_node(node)
+         instance = self.new
+         instance.prefix = node.xpath("c:prefix[1]").first.content
+         instance.exact = node.xpath("c:exact[1]").first.content
+         instance.suffix = node.xpath("c:suffix[1]").first.content
+         instance.offset = node.xpath("c:offset[1]").first.content.to_i
+         instance.length = node.xpath("c:length[1]").first.content.to_i
+
+         instance
+       end
+     end
+
+     class CalaisHash
+       attr_accessor :value
+
+       def self.find_or_create(hash, hashes)
+         if !selected = hashes.select {|h| h.value == hash }.first
+           selected = self.new
+           selected.value = hash
+           hashes << selected
+         end
+
+         selected
+       end
+     end
+
+     private
+     def extract_data
+       doc = Nokogiri::XML(@raw_response)
+
+       if doc.root.xpath("/Error[1]").first
+         raise Calais::Error, doc.root.xpath("/Error/Exception").first.content
+       end
+
+       doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfometa]}')]/..").each do |node|
+         @language = node['language']
+         @submission_date = DateTime.parse node['submissionDate']
+
+         attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+         @signature = attributes.delete('signature')
+         @submitter_code = attributes.delete('submitterCode')
+
+         node.remove
+       end
+
+       doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfo]}')]/..").each do |node|
+         @request_id = node['calaisRequestID']
+
+         attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+         @doc_title = attributes.delete('docTitle')
+         @doc_date = Date.parse(attributes.delete('docDate'))
+
+         node.remove
+       end
+
+       @categories = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:doccat]}')]/..").map do |node|
+         category = Category.new
+         category.name = node.xpath("c:categoryName[1]").first.content
+         score = node.xpath("c:score[1]").first
+         category.score = score.content.to_f unless score.nil?
+
+         node.remove
+         category
+       end
+
+       @relevances = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relevances]}')]/..").inject({}) do |acc, node|
+         subject_hash = node.xpath("c:subject[1]").first[:resource].split('/')[-1]
+         acc[subject_hash] = node.xpath("c:relevance[1]").first.content.to_f
+
+         node.remove
+         acc
+       end
+
+       @entities = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:entities]}')]/..").map do |node|
+         extracted_hash = node['about'].split('/')[-1] rescue nil
+
+         entity = Entity.new
+         entity.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
+         entity.type = extract_type(node)
+         entity.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+         entity.relevance = @relevances[extracted_hash]
+         entity.instances = extract_instances(doc, extracted_hash)
+
+         node.remove
+         entity
+       end
+
+       @relations = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relations]}')]/..").map do |node|
+         extracted_hash = node['about'].split('/')[-1] rescue nil
+
+         relation = Relation.new
+         relation.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
+         relation.type = extract_type(node)
+         relation.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+         relation.instances = extract_instances(doc, extracted_hash)
+
+         node.remove
+         relation
+       end
+
+       @geographies = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:geographies]}')]/..").map do |node|
+         attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+         geography = Geography.new
+         geography.name = attributes.delete('name')
+         geography.calais_hash = attributes.delete('subject')
+         geography.attributes = attributes
+
+         node.remove
+         geography
+       end
+
+       @social_tags = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:socialtags]}')]/..").map do |node|
+         attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+         social_tag = SocialTag.new
+         social_tag.name = attributes.delete('name')
+         social_tag.importance = attributes.delete('importance')
+         social_tag.attributes = attributes
+
+         node.remove
+         social_tag
+       end
+
+       doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:defaultlangid]}')]/..").each { |node| node.remove }
+       doc.root.xpath("./*").each { |node| node.remove }
+
+       return
+     end
+
+     def extract_instances(doc, hash)
+       doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:instances]}')]/..").select do |instance_node|
+         instance_node.xpath("c:subject[1]").first[:resource].split("/")[-1] == hash
+       end.map do |instance_node|
+         instance = Instance.from_node(instance_node)
+         instance_node.remove
+
+         instance
+       end
+     end
+
+     def extract_type(node)
+       node.xpath("*[name()='rdf:type']")[0]['resource'].split('/')[-1]
+     rescue
+       nil
+     end
+
+     def extract_attributes(nodes)
+       nodes.inject({}) do |hsh, node|
+         value = if node['resource']
+           extracted_hash = node['resource'].split('/')[-1] rescue nil
+           CalaisHash.find_or_create(extracted_hash, @hashes)
+         else
+           node.content
+         end
+         hsh.merge(node.name => value)
+       end
+     end
+   end
+ end
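`Response` never builds two `CalaisHash` objects for the same hash string. `find_or_create` searches the shared `@hashes` array first, so an entity and any attribute that points at it through `rdf:resource` end up holding the same object. The sketch below isolates that interning step. It uses a hypothetical standalone copy of the class, with `Enumerable#find` in place of `select { ... }.first` (same result). A hypothetical `hash_from_uri` helper stands in for the `split('/')[-1]` calls in `extract_data`. The URI is made up for illustration.

```ruby
# Hypothetical standalone copy of Calais::Response::CalaisHash.
class CalaisHash
  attr_accessor :value

  # Returns the object already registered for this hash string, or
  # creates one and appends it to the shared registry array.
  def self.find_or_create(hash, hashes)
    selected = hashes.find { |h| h.value == hash }
    unless selected
      selected = new
      selected.value = hash
      hashes << selected
    end
    selected
  end
end

# Hypothetical helper: the Calais hash is taken as the last path segment
# of an rdf:about / rdf:resource URI; nil or "" yields nil.
def hash_from_uri(uri)
  uri.to_s.split('/')[-1]
end

hashes = []
entity_hash    = CalaisHash.find_or_create(hash_from_uri('http://example.com/hash/abc123'), hashes)
attribute_hash = CalaisHash.find_or_create('abc123', hashes)
```

The lookup is a linear scan over every hash seen so far. A `Hash` keyed on the string would make each lookup constant-time, which could matter for documents with many entities and relations.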
@@ -0,0 +1,79 @@
+ require File.join(File.dirname(__FILE__), %w[.. helper])
+
+ describe Calais::Client, :new do
+   it 'accepts arguments as a hash' do
+     client = nil
+
+     lambda { client = Calais::Client.new(:content => SAMPLE_DOCUMENT, :license_id => LICENSE_ID) }.should_not raise_error
+
+     client.license_id.should == LICENSE_ID
+     client.content.should == SAMPLE_DOCUMENT
+   end
+
+   it 'accepts arguments as a block' do
+     client = nil
+
+     lambda {
+       client = Calais::Client.new do |c|
+         c.content = SAMPLE_DOCUMENT
+         c.license_id = LICENSE_ID
+       end
+     }.should_not raise_error
+
+     client.license_id.should == LICENSE_ID
+     client.content.should == SAMPLE_DOCUMENT
+   end
+
+   it 'should not accept unknown attributes' do
+     lambda { Calais::Client.new(:monkey => 'monkey', :license_id => LICENSE_ID) }.should raise_error(NoMethodError)
+   end
+ end
+
+ describe Calais::Client, :params_xml do
+   it 'returns an xml encoded string' do
+     client = Calais::Client.new(:content => SAMPLE_DOCUMENT, :license_id => LICENSE_ID)
+     client.params_xml.should == %[<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n  <c:processingDirectives/>\n  <c:userDirectives/>\n</c:params>]
+
+     client.content_type = :xml
+     client.output_format = :json
+     client.reltag_base_url = 'http://opencalais.com'
+     client.calculate_relevance = true
+     client.metadata_enables = Calais::KNOWN_ENABLES
+     client.metadata_discards = Calais::KNOWN_DISCARDS
+     client.allow_distribution = true
+     client.allow_search = true
+     client.external_id = Digest::SHA1.hexdigest(client.content)
+     client.submitter = 'calais.rb'
+
+     client.params_xml.should == %[<c:params xmlns:c="http://s.opencalais.com/1/pred/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n  <c:processingDirectives c:contentType="text/xml" c:outputFormat="application/json" c:reltagBaseURL="http://opencalais.com" c:enableMetadataType="GenericRelations,SocialTags" c:discardMetadata="er/Company;er/Geo;er/Product"/>\n  <c:userDirectives c:allowDistribution="true" c:allowSearch="true" c:externalID="1a008b91e7d21962e132bc1d6cb252532116a606" c:submitter="calais.rb"/>\n</c:params>]
+   end
+ end
+
+ describe Calais::Client, :enlighten do
+   before do
+     @client = Calais::Client.new do |c|
+       c.content = SAMPLE_DOCUMENT
+       c.license_id = LICENSE_ID
+       c.content_type = :xml
+       c.output_format = :json
+       c.calculate_relevance = true
+       c.metadata_enables = Calais::KNOWN_ENABLES
+       c.allow_distribution = true
+       c.allow_search = true
+     end
+   end
+
+   it 'provides access to the enlighten command on the generic rest endpoint' do
+     @client.should_receive(:do_request).with(anything).and_return(SAMPLE_RESPONSE)
+     @client.enlighten
+     @client.instance_variable_get(:@client).url.should == Calais::REST_ENDPOINT
+   end
+
+   it 'provides access to the enlighten command on the beta rest endpoint' do
+     @client.use_beta = true
+
+     @client.should_receive(:do_request).with(anything).and_return(SAMPLE_RESPONSE)
+     @client.enlighten
+     @client.instance_variable_get(:@client).url.should == Calais::BETA_REST_ENDPOINT
+   end
+ end
@@ -0,0 +1,139 @@
1
+ require File.join(File.dirname(__FILE__), %w[.. helper])
2
+
3
+ describe Calais::Response, :new do
4
+ it 'accepts an rdf string to generate the response object' do
5
+ lambda { Calais::Response.new(SAMPLE_RESPONSE) }.should_not raise_error
6
+ end
7
+ end
8
+
9
+ describe Calais::Response, :new do
10
+ it "should return error message in runtime error" do
11
+ lambda {
12
+ @response = Calais::Response.new(RESPONSE_WITH_EXCEPTION)
13
+ }.should raise_error(Calais::Error, "My Error Message")
14
+ end
15
+ end
16
+
17
+ describe Calais::Response, :new do
18
+ before :all do
19
+ @response = Calais::Response.new(RESPONSE_WITH_SOCIAL_TAGS)
20
+ end
21
+
22
+ it 'should extract social tags' do
23
+ social_tags = @response.social_tags
24
+ social_tags.map { |e| e.name }.sort.uniq.should == ["Agile software development", "Behavior Driven Development", "Code refactoring", "Computing", "Extreme Programming", "RSpec", "Ruby on Rails", "Selenium", "Software development", "Software engineering", "Web 2.0"]
25
+ end
26
+ end
27
+
28
+ describe Calais::Response, :new do
29
+ before :all do
30
+ @response = Calais::Response.new(SAMPLE_RESPONSE)
31
+ end
32
+
33
+ it 'should extract document information' do
34
+ @response.language.should == 'English'
35
+ @response.submission_date.should be_a_kind_of(DateTime)
36
+ @response.signature.should be_a_kind_of(String)
37
+ @response.submitter_code.should be_a_kind_of(String)
38
+ @response.request_id.should be_a_kind_of(String)
39
+ @response.doc_title.should == 'Record number of bicycles sold in Australia in 2006'
40
+ @response.doc_date.should be_a_kind_of(Date)
41
+ end
42
+
43
+ it 'should extract entities' do
44
+ entities = @response.entities
45
+ entities.map { |e| e.type }.sort.uniq.should == %w[City Continent Country IndustryTerm Organization Person Position ProvinceOrState]
46
+ end
47
+
48
+ it 'should extract relations' do
49
+ relations = @response.relations
50
+ relations.map { |e| e.type }.sort.uniq.should == %w[GenericRelations PersonAttributes PersonCareer Quotation]
51
+ end
52
+
53
+ it 'should extract geographies' do
54
+ geographies = @response.geographies
55
+ geographies.map { |e| e.name }.sort.uniq.should == %w[Australia Hobart,Tasmania,Australia Tasmania,Australia]
56
+ end
57
+
58
+ it 'should extract relevances' do
59
+ @response.instance_variable_get(:@relevances).should be_a_kind_of(Hash)
60
+ end
61
+
62
+ it 'should assign a floating-point relevance to each entity' do
63
+ @response.entities.each {|e| e.relevance.should be_a_kind_of(Float) }
64
+ end
65
+
66
+ it 'should find the correct document categories returned by OpenCalais' do
67
+ @response.categories.map {|c| c.name }.sort.should == %w[Business_Finance Technology_Internet]
68
+ end
69
+
70
+ it 'should find the correct document category scores returned by OpenCalais' do
71
+ @response.categories.map {|c| c.score.should be_a_kind_of(Float) }
72
+ end
73
+
74
+ it "should not raise an error if no score is given by OpenCalais" do
75
+ lambda {Calais::Response.new(SAMPLE_RESPONSE_WITH_NO_SCORE)}.should_not raise_error
76
+ end
77
+
78
+ it "should not raise an error if no score is given by OpenCalais" do
79
+ response = Calais::Response.new(SAMPLE_RESPONSE_WITH_NO_SCORE)
80
+ response.categories.map {|c| c.score }.should == [nil]
81
+ end
82
+
83
+ it 'should find instances for each entity' do
84
+ @response.entities.each {|e|
85
+ e.instances.size.should > 0
86
+ }
87
+ end
88
+
89
+
90
+ it 'should find instances for each relation' do
91
+ @response.relations.each {|r|
92
+ r.instances.size.should > 0
93
+ }
94
+ end
95
+
96
+ it 'should find the correct instances for each entity' do
97
+ ## This currently tests only for the "Australia" entity's
98
+ ## instances. A more thorough test that tests for the instances
99
+ ## of each of the many entities in the sample doc is desirable in
100
+ ## the future.
101
+
102
+ australia = @response.entities.select {|e| e.attributes["name"] == "Australia" }.first
103
+ australia.instances.size.should == 3
104
+ instances = australia.instances.sort{|a,b| a.offset <=> b.offset }
105
+
106
+ instances[0].prefix.should == "number of bicycles sold in "
107
+ instances[0].exact.should == "Australia"
108
+ instances[0].suffix.should == " in 2006<\/title>\n<date>January 4,"
109
+ instances[0].offset.should == 67
110
+ instances[0].length.should == 9
111
+
112
+ instances[1].prefix.should == "4, 2007<\/date>\n<body>\nBicycle sales in "
113
+ instances[1].exact.should == "Australia"
114
+ instances[1].suffix.should == " have recorded record sales of 1,273,781 units"
115
+ instances[1].offset.should == 146
116
+ instances[1].length.should == 9
117
+
118
+ instances[2].prefix.should == " the traditional company car,\" he said.\n\n\"Some of "
119
+ instances[2].exact.should == "Australia"
120
+ instances[2].suffix.should == "'s biggest corporations now have bicycle fleets,"
121
+ instances[2].offset.should == 952
122
+ instances[2].length.should == 9
123
+ end
124
+
125
+ it 'should find the correct instances for each relation' do
126
+ ## This currently tests only for one relation's instances. A more
127
+ ## thorough test that tests for the instances of each of the many other
128
+ ## relations in the sample doc is desirable in the future.
129
+
130
+ rel = @response.relations.select {|e| e.calais_hash.value == "8f3936d9-cf6b-37fc-ae0d-a145959ae3b5" }.first
131
+ rel.instances.size.should == 1
132
+
133
+ rel.instances.first.prefix.should == " manufacturers.\n\nThe Cycling Promotion Fund (CPF) "
134
+ rel.instances.first.exact.should == "spokesman Ian Christie said Australians were increasingly using bicycles as an alternative to cars."
135
+ rel.instances.first.suffix.should == " Sales rose nine percent in 2006 while the car"
136
+ rel.instances.first.offset.should == 425
137
+ rel.instances.first.length.should == 99
138
+ end
139
+ end
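The instance specs pin each mention by `offset` and `length` into the submitted content, with `prefix` and `suffix` carrying the surrounding context. The expected context strings include markup such as `</title>`, so offsets appear to count into the raw document rather than its visible text alone. The sketch below reads a mention back out of a string, assuming offsets count characters rather than bytes. `Mention`, `mention_at` and the fixed `context` width are hypothetical. The prefixes in these specs differ in length, so Calais does not appear to use a fixed window.

```ruby
# A located mention: context before, the matched text, context after.
Mention = Struct.new(:prefix, :exact, :suffix)

# Hypothetical helper: slice a mention out of the submitted text, assuming
# offset and length count characters into that text. The context window
# is clamped at the start of the string and truncated at the end.
def mention_at(text, offset, length, context = 10)
  start = [offset - context, 0].max
  Mention.new(text[start...offset],
              text[offset, length],
              text[offset + length, context] || '')
end

text = 'Bicycle sales in Australia rose again in 2006.'
m = mention_at(text, text.index('Australia'), 'Australia'.length)
m.exact  # => "Australia"
m.prefix # => " sales in "
m.suffix # => " rose agai"
```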
data/spec/helper.rb ADDED
@@ -0,0 +1,13 @@
+ require 'rubygems'
+ require 'spec'
+ require 'yaml'
+
+ require File.dirname(__FILE__) + '/../lib/calais'
+
+ FIXTURES_DIR = File.join File.dirname(__FILE__), %[fixtures]
+ SAMPLE_DOCUMENT = File.read(File.join(FIXTURES_DIR, %[bicycles_australia.xml]))
+ SAMPLE_RESPONSE = File.read(File.join(FIXTURES_DIR, %[bicycles_australia.response.rdf]))
+ SAMPLE_RESPONSE_WITH_NO_SCORE = File.read(File.join(FIXTURES_DIR, %[twitter_tweet_without_score.response.rdf]))
+ RESPONSE_WITH_EXCEPTION = File.read(File.join(FIXTURES_DIR, %[error.response.xml]))
+ RESPONSE_WITH_SOCIAL_TAGS = File.read(File.join(FIXTURES_DIR, %[rails_job.rdf]))
+ LICENSE_ID = YAML.load(File.read(File.join(FIXTURES_DIR, %[calais.yml])))['key']
metadata ADDED
@@ -0,0 +1,113 @@
+ --- !ruby/object:Gem::Specification
+ name: koda-calais
+ version: !ruby/object:Gem::Version
+   prerelease: false
+   segments:
+   - 0
+   - 0
+   - 9
+   version: 0.0.9
+ platform: ruby
+ authors:
+ - Abhay Kumar
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-09-18 00:00:00 -07:00
+ default_executable:
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: nokogiri
+   prerelease: false
+   requirement: &id001 !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         segments:
+         - 1
+         - 3
+         - 3
+         version: 1.3.3
+   type: :runtime
+   version_requirements: *id001
+ - !ruby/object:Gem::Dependency
+   name: json
+   prerelease: false
+   requirement: &id002 !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         segments:
+         - 1
+         - 1
+         - 3
+         version: 1.1.3
+   type: :runtime
+   version_requirements: *id002
+ - !ruby/object:Gem::Dependency
+   name: curb
+   prerelease: false
+   requirement: &id003 !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         segments:
+         - 0
+         - 1
+         - 4
+         version: 0.1.4
+   type: :runtime
+   version_requirements: *id003
+ description: A Ruby interface to the Calais Web Service
+ email: info@opensynapse.net
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files:
+ - README.markdown
+ files:
+ - CHANGELOG.markdown
+ - MIT-LICENSE
+ - README.markdown
+ - Rakefile
+ - VERSION.yml
+ - lib/calais.rb
+ - lib/calais/client.rb
+ - lib/calais/error.rb
+ - lib/calais/response.rb
+ has_rdoc: true
+ homepage: http://github.com/abhay/calais
+ licenses: []
+
+ post_install_message:
+ rdoc_options:
+ - --charset=UTF-8
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       segments:
+       - 0
+       version: "0"
+ requirements: []
+
+ rubyforge_project: calais
+ rubygems_version: 1.3.6
+ signing_key:
+ specification_version: 2
+ summary: A Ruby interface to the Calais Web Service
+ test_files:
+ - spec/calais/client_spec.rb
+ - spec/calais/response_spec.rb
+ - spec/helper.rb