jm-calais 0.0.13
- data/CHANGELOG.markdown +63 -0
- data/Gemfile +3 -0
- data/MIT-LICENSE +20 -0
- data/README.markdown +55 -0
- data/Rakefile +36 -0
- data/lib/calais/client.rb +115 -0
- data/lib/calais/error.rb +3 -0
- data/lib/calais/response.rb +220 -0
- data/lib/calais/version.rb +3 -0
- data/lib/calais.rb +59 -0
- data/spec/calais/client_spec.rb +79 -0
- data/spec/calais/response_spec.rb +149 -0
- data/spec/fixtures/bicycles_australia.response.json +538 -0
- data/spec/fixtures/bicycles_australia.response.rdf +836 -0
- data/spec/fixtures/bicycles_australia.xml +18 -0
- data/spec/fixtures/calais.yml.sample +1 -0
- data/spec/fixtures/error.response.xml +1 -0
- data/spec/fixtures/slovenia_euro.xml +14 -0
- data/spec/fixtures/twitter_tweet_without_score.response.rdf +96 -0
- data/spec/helper.rb +16 -0
- metadata +98 -0
data/CHANGELOG.markdown
ADDED
@@ -0,0 +1,63 @@
+# Changes
+
+## 0.0.13
+
+* load path fix
+
+## 0.0.12
+
+* added relevances to Geographies
+* improved doc
+* removed jeweler dependency and simplified Rakefile
+* bumped rspec requirement
+
+## 0.0.11
+
+* simple fix for some rubies not liking DateTime.parse without including date
+* tests for SocialTags
+* typo fix: SocailTag != SocialTag
+
+## 0.0.10
+
+* community patch to expose SocialTags
+
+## 0.0.9
+
+* updates related to API changes
+* community patches to support bundler, support ruby 1.9
+
+## 0.0.8
+
+* community patches to use nokogiri
+
+## 0.0.7
+* verified 4.0 API
+* moved gem packaging to `jeweler` and documentation to `yard`
+
+## 0.0.6
+* fully implemented 3.1 API
+
+## 0.0.5
+* fixed error where classes weren't being required in the proper order on Ubuntu (reported by Jon Moses)
+* New things coming back from the API. Fixing in tests.
+
+## 0.0.4
+* changed dependency from `hpricot` to `libxml`
+* unicode fun
+* cleanup all around
+
+## 0.0.3
+* pluginized the library for Rails (thanks [pius](http://gitorious.org/projects/calais-au-rails))
+* added helper methods name entity types from a response
+
+## 0.0.2
+* cleanup in the specs
+* cleaner parsing
+* location of named entities
+* more data in relationships
+* moved Names and Relationships
+
+## 0.0.1
+* Access to OpenCalais's Enlighten action
+* Single method to process a document
+* Get relationships and names from a document
data/Gemfile
ADDED
data/MIT-LICENSE
ADDED
@@ -0,0 +1,20 @@
+Copyright (c) 2008 Abhay Kumar info@opensynapse.net
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.markdown
ADDED
@@ -0,0 +1,55 @@
+# Calais #
+A Ruby interface to the [Open Calais Web Service](http://opencalais.com)
+
+## About this Fork ##
+Forked from https://github.com/abhay/calais version ~> 0.0.13
+to fix issues caused by the deprecation of iconv in ruby > 1.9.3
+
+## Features ##
+* Accepts documents in text/plain, text/xml and text/html format.
+* Basic access to the Open Calais API's Enlighten action.
+* Output is RDF representation of input document.
+* Single function ability to extract names, entities and geographies from given text.
+
+## Synopsis ##
+
+This is a very basic wrapper to the Open Calais API. It uses the POST endpoint and currently supports the Enlighten action. Here's a simple call:
+
+    Calais.enlighten(
+      :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+      :content_type => :raw,
+      :license_id => 'your license id'
+    )
+
+This is the easiest way to get the RDF-formatted response from the OpenCalais service.
+
+If you want to do something more fun like getting all sorts of fun information about a document, you can try this:
+
+    Calais.process_document(
+      :content => "The government of the United Kingdom has given corporations like fast food chain McDonald's the right to award high school qualifications to employees who complete a company training program.",
+      :content_type => :raw,
+      :license_id => 'your license id'
+    )
+
+This will return an object containing information extracted from the RDF response.
+
+## Requirements ##
+
+* [Ruby 1.8.5 or better](http://ruby-lang.org)
+* [nokogiri](http://nokogiri.rubyforge.org/nokogiri/), [libxml2](http://xmlsoft.org/), [libxslt](http://xmlsoft.org/xslt/)
+* [curb](http://curb.rubyforge.org/), [libcurl](http://curl.haxx.se/)
+* [json](http://json.rubyforge.org/)
+
+## Install ##
+
+You can install the Calais gem via Rubygems (`gem install calais`) or by building from source.
+
+## Authors ##
+
+* [Abhay Kumar](http://opensynapse.net)
+
+## Acknowledgements ##
+
+* [Paul Legato](http://www.economaton.com/): Help all around with the new response processor and implementation of the 3.1 API.
+* [Ryan Ong](http://www.ryanong.net/)
+* [Juan Antonio Chavez](https://github.com/TheNaoX): Geographies relevance
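Note on the README synopsis: it says the wrapper uses the POST endpoint. As a rough sketch of what that involves, the snippet below builds a form-encoded POST request with the standard library's `Net::HTTP::Post#set_form_data`, the same call that `do_request` in `client.rb` uses. The helper name `build_calais_request`, the path `/enlighten/rest/` and the field values are placeholders assumed for illustration; nothing is sent over the network.

```ruby
require 'net/http'

# Hypothetical helper, not part of the gem: builds (but does not send)
# a form-encoded POST request. Path and values are placeholders.
def build_calais_request(path, license_id, content, params_xml)
  request = Net::HTTP::Post.new(path)
  # set_form_data URL-encodes the pairs into the body and sets the
  # Content-Type header to application/x-www-form-urlencoded.
  request.set_form_data(
    "licenseID" => license_id,
    "content"   => content,
    "paramsXML" => params_xml
  )
  request
end

request = build_calais_request('/enlighten/rest/', 'demo-license', 'Hello', '<c:params/>')
puts request['Content-Type']
puts request.body
```

Keeping request construction separate from sending makes the encoded body easy to inspect in a test without a license id or network access.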
data/Rakefile
ADDED
@@ -0,0 +1,36 @@
+# -*- ruby -*-
+
+require 'rake'
+require 'rake/clean'
+
+require './lib/calais.rb'
+
+begin
+  require 'rspec/core/rake_task'
+
+  RSpec::Core::RakeTask.new(:spec)
+
+  task :default => :spec
+rescue LoadError
+  puts "RSpec, or one of its dependencies, is not available. Please install it."
+  exit(1)
+end
+
+begin
+  require 'yard'
+  require 'yard/rake/yardoc_task'
+
+  YARD::Rake::YardocTask.new do |t|
+    t.options = ["--verbose", "--markup=markdown", "--files=CHANGELOG.markdown,MIT-LICENSE"]
+  end
+
+  task :rdoc => :yardoc
+
+  CLOBBER.include 'doc'
+  CLOBBER.include '.yardoc'
+rescue LoadError
+  puts "Yard, or one of its dependencies is not available. Please install it."
+  exit(1)
+end
+
+# vim: syntax=Ruby
data/lib/calais/client.rb
ADDED
@@ -0,0 +1,115 @@
+module Calais
+  class Client
+    # base attributes of the call
+    attr_accessor :content
+    attr_accessor :license_id
+
+    # processing directives
+    attr_accessor :content_type, :output_format, :reltag_base_url, :calculate_relevance, :omit_outputting_original_text
+    attr_accessor :store_rdf, :metadata_enables, :metadata_discards
+
+    # user directives
+    attr_accessor :allow_distribution, :allow_search, :external_id, :submitter
+
+    attr_accessor :external_metadata
+
+    attr_accessor :use_beta
+
+    def initialize(options={}, &block)
+      options.each {|k,v| send("#{k}=", v)}
+      yield(self) if block_given?
+    end
+
+    def enlighten
+      post_args = {
+        "licenseID" => @license_id,
+        "content" => RUBY_VERSION.to_f < 1.9 ?
+          Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "#{@content} ").first[0..-2] :
+          "#{@content} ".encode(Encoding::UTF_8, :invalid => :replace, :undef => :replace, :replace => '')[0 .. -2],
+        "paramsXML" => params_xml
+      }
+
+      do_request(post_args)
+    end
+
+    def params_xml
+      check_params
+      document = Nokogiri::XML::Document.new
+
+      params_node = Nokogiri::XML::Node.new('c:params', document)
+      params_node['xmlns:c'] = 'http://s.opencalais.com/1/pred/'
+      params_node['xmlns:rdf'] = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
+
+      processing_node = Nokogiri::XML::Node.new('c:processingDirectives', document)
+      processing_node['c:contentType'] = AVAILABLE_CONTENT_TYPES[@content_type] if @content_type
+      processing_node['c:outputFormat'] = AVAILABLE_OUTPUT_FORMATS[@output_format] if @output_format
+      processing_node['c:calculateRelevanceScore'] = 'false' if @calculate_relevance == false
+      processing_node['c:reltagBaseURL'] = @reltag_base_url.to_s if @reltag_base_url
+
+      processing_node['c:enableMetadataType'] = @metadata_enables.join(',') unless @metadata_enables.empty?
+      processing_node['c:docRDFaccessible'] = @store_rdf if @store_rdf
+      processing_node['c:discardMetadata'] = @metadata_discards.join(';') unless @metadata_discards.empty?
+      processing_node['c:omitOutputtingOriginalText'] = 'true' if @omit_outputting_original_text
+
+      user_node = Nokogiri::XML::Node.new('c:userDirectives', document)
+      user_node['c:allowDistribution'] = @allow_distribution.to_s unless @allow_distribution.nil?
+      user_node['c:allowSearch'] = @allow_search.to_s unless @allow_search.nil?
+      user_node['c:externalID'] = @external_id.to_s if @external_id
+      user_node['c:submitter'] = @submitter.to_s if @submitter
+
+      params_node << processing_node
+      params_node << user_node
+
+      if @external_metadata
+        external_node = Nokogiri::XML::Node.new('c:externalMetadata', document)
+        external_node << @external_metadata
+        params_node << external_node
+      end
+
+      params_node.to_xml(:indent => 2)
+    end
+
+    def url
+      @url ||= URI.parse(calais_endpoint)
+    end
+
+    private
+    def check_params
+      raise 'missing content' if @content.nil? || @content.empty?
+
+      content_length = @content.length
+      raise 'content is too small' if content_length < MIN_CONTENT_SIZE
+      raise 'content is too large' if content_length > MAX_CONTENT_SIZE
+
+      raise 'missing license id' if @license_id.nil? || @license_id.empty?
+
+      raise 'unknown content type' unless AVAILABLE_CONTENT_TYPES.keys.include?(@content_type) if @content_type
+      raise 'unknown output format' unless AVAILABLE_OUTPUT_FORMATS.keys.include?(@output_format) if @output_format
+
+      %w[calculate_relevance store_rdf allow_distribution allow_search].each do |variable|
+        value = self.send(variable)
+        unless NilClass === value || TrueClass === value || FalseClass === value
+          raise "expected a boolean value for #{variable} but got #{value}"
+        end
+      end
+
+      @metadata_enables ||= []
+      unknown_enables = Set.new(@metadata_enables) - KNOWN_ENABLES
+      raise "unknown metadata enables: #{unknown_enables.to_a.inspect}" unless unknown_enables.empty?
+
+      @metadata_discards ||= []
+      unknown_discards = Set.new(@metadata_discards) - KNOWN_DISCARDS
+      raise "unknown metadata discards: #{unknown_discards.to_a.inspect}" unless unknown_discards.empty?
+    end
+
+    def do_request(post_fields)
+      @request ||= Net::HTTP::Post.new(url.path)
+      @request.set_form_data(post_fields)
+      Net::HTTP.new(url.host, url.port).start {|http| http.request(@request)}.body
+    end
+
+    def calais_endpoint
+      @use_beta ? BETA_REST_ENDPOINT : REST_ENDPOINT
+    end
+  end
+end
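Note on `Client#enlighten`: before posting, it strips byte sequences that are not valid UTF-8 from the content. Below is a minimal sketch of the same idea on newer Rubies. It swaps in `String#scrub` (available from Ruby 2.1) for the `Iconv` and `encode` calls in the diff, so it is a different technique with the same intent. `strip_invalid_utf8` is a hypothetical helper name and does not exist in the gem.

```ruby
# Hypothetical helper, not part of the gem: drop bytes that are not valid UTF-8.
# String#scrub replaces each invalid byte sequence with the given string,
# here the empty string, so invalid bytes are simply removed.
def strip_invalid_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# "\xC3\xA9" is a valid two-byte UTF-8 character; "\xFF" is never valid UTF-8.
dirty = "caf\xC3\xA9 \xFFau lait"
clean = strip_invalid_utf8(dirty)

puts dirty.valid_encoding?
puts clean.valid_encoding?
```

The helper works on a copy, so the caller's string is left untouched even though `force_encoding` mutates its receiver.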
data/lib/calais/error.rb
ADDED
data/lib/calais/response.rb
ADDED
@@ -0,0 +1,220 @@
+module Calais
+  class Response
+    MATCHERS = {
+      :docinfo => 'DocInfo',
+      :docinfometa => 'DocInfoMeta',
+      :defaultlangid => 'DefaultLangId',
+      :doccat => 'DocCat',
+      :entities => 'type/em/e',
+      :relations => 'type/em/r',
+      :geographies => 'type/er',
+      :instances => 'type/sys/InstanceInfo',
+      :relevances => 'type/sys/RelevanceInfo',
+    }
+
+    attr_accessor :submitter_code, :signature, :language, :submission_date, :request_id, :doc_title, :doc_date
+    attr_accessor :hashes, :entities, :relations, :geographies, :categories, :socialtags, :relevances
+
+    def initialize(rdf_string)
+      @raw_response = rdf_string
+
+      @hashes = []
+      @entities = []
+      @relations = []
+      @geographies = []
+      @relevances = {} # key = String hash, val = Float relevance
+      @categories = []
+      @socialtags = []
+
+      extract_data
+    end
+
+    class Entity
+      attr_accessor :calais_hash, :type, :attributes, :relevance, :instances
+    end
+
+    class Relation
+      attr_accessor :calais_hash, :type, :attributes, :instances
+    end
+
+    class Geography
+      attr_accessor :name, :calais_hash, :attributes, :relevance
+    end
+
+    class Category
+      attr_accessor :name, :score
+    end
+
+    class SocialTag
+      attr_accessor :name, :importance
+    end
+
+    class Instance
+      attr_accessor :prefix, :exact, :suffix, :offset, :length
+
+      # Makes a new Instance object from an appropriate Nokogiri::XML::Node.
+      def self.from_node(node)
+        instance = self.new
+        instance.prefix = node.xpath("c:prefix[1]").first.content
+        instance.exact = node.xpath("c:exact[1]").first.content
+        instance.suffix = node.xpath("c:suffix[1]").first.content
+        instance.offset = node.xpath("c:offset[1]").first.content.to_i
+        instance.length = node.xpath("c:length[1]").first.content.to_i
+
+        instance
+      end
+    end
+
+    class CalaisHash
+      attr_accessor :value
+
+      def self.find_or_create(hash, hashes)
+        if !selected = hashes.select {|h| h.value == hash }.first
+          selected = self.new
+          selected.value = hash
+          hashes << selected
+        end
+
+        selected
+      end
+    end
+
+    private
+    def extract_data
+      doc = Nokogiri::XML(@raw_response)
+
+      if doc.root.xpath("/Error[1]").first
+        raise Calais::Error, doc.root.xpath("/Error/Exception").first.content
+      end
+
+      doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfometa]}')]/..").each do |node|
+        @language = node['language']
+        @submission_date = DateTime.parse node['submissionDate']
+
+        attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+        @signature = attributes.delete('signature')
+        @submitter_code = attributes.delete('submitterCode')
+
+        node.remove
+      end
+
+      doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:docinfo]}')]/..").each do |node|
+        @request_id = node['calaisRequestID']
+
+        attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+        @doc_title = attributes.delete('docTitle')
+        @doc_date = Date.parse(attributes.delete('docDate'))
+
+        node.remove
+      end
+
+      @socialtags = doc.root.xpath("rdf:Description/c:socialtag/..").map do |node|
+        tag = SocialTag.new
+        tag.name = node.xpath("c:name[1]").first.content
+        tag.importance = node.xpath("c:importance[1]").first.content.to_i
+
+        node.remove if node.xpath("c:categoryName[1]").first.nil?
+
+        tag
+      end
+
+      @categories = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:doccat]}')]/..").map do |node|
+        category = Category.new
+        category.name = node.xpath("c:categoryName[1]").first.content
+        score = node.xpath("c:score[1]").first
+        category.score = score.content.to_f unless score.nil?
+
+        node.remove
+        category
+      end
+
+      @relevances = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relevances]}')]/..").inject({}) do |acc, node|
+        subject_hash = node.xpath("c:subject[1]").first[:resource].split('/')[-1]
+        acc[subject_hash] = node.xpath("c:relevance[1]").first.content.to_f
+
+        node.remove
+        acc
+      end
+
+      @entities = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:entities]}')]/..").map do |node|
+        extracted_hash = node['about'].split('/')[-1] rescue nil
+
+        entity = Entity.new
+        entity.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
+        entity.type = extract_type(node)
+        entity.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+        entity.relevance = @relevances[extracted_hash]
+        entity.instances = extract_instances(doc, extracted_hash)
+
+        node.remove
+        entity
+      end
+
+      @relations = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:relations]}')]/..").map do |node|
+        extracted_hash = node['about'].split('/')[-1] rescue nil
+
+        relation = Relation.new
+        relation.calais_hash = CalaisHash.find_or_create(extracted_hash, @hashes)
+        relation.type = extract_type(node)
+        relation.attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+        relation.instances = extract_instances(doc, extracted_hash)
+
+        node.remove
+        relation
+      end
+
+      @geographies = doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:geographies]}')]/..").map do |node|
+        attributes = extract_attributes(node.xpath("*[contains(name(), 'c:')]"))
+
+        geography = Geography.new
+        geography.name = attributes.delete('name')
+        geography.calais_hash = attributes.delete('subject')
+        geography.attributes = attributes
+        geography.relevance = extract_relevance(geography.calais_hash.value)
+
+        node.remove
+        geography
+      end
+
+      doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:defaultlangid]}')]/..").each { |node| node.remove }
+      doc.root.xpath("./*").each { |node| node.remove }
+
+      return
+    end
+
+    def extract_instances(doc, hash)
+      doc.root.xpath("rdf:Description/rdf:type[contains(@rdf:resource, '#{MATCHERS[:instances]}')]/..").select do |instance_node|
+        instance_node.xpath("c:subject[1]").first[:resource].split("/")[-1] == hash
+      end.map do |instance_node|
+        instance = Instance.from_node(instance_node)
+        instance_node.remove
+
+        instance
+      end
+    end
+
+    def extract_type(node)
+      node.xpath("*[name()='rdf:type']")[0]['resource'].split('/')[-1]
+    rescue
+      nil
+    end
+
+    def extract_attributes(nodes)
+      nodes.inject({}) do |hsh, node|
+        value = if node['resource']
+          extracted_hash = node['resource'].split('/')[-1] rescue nil
+          CalaisHash.find_or_create(extracted_hash, @hashes)
+        else
+          node.content
+        end
+        hsh.merge(node.name => value)
+      end
+    end
+    def extract_relevance(value)
+      return @relevances[value]
+    end
+  end
+end
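Note on `CalaisHash.find_or_create`: it keeps a single object per hash string, so an entity and any attribute that points at it end up sharing the same `CalaisHash` instance. The sketch below shows that interning pattern in isolation, using a `Hash` lookup in place of the linear `select` scan from the diff. `HashRegistry` and `Ref` are hypothetical names chosen for illustration and are not part of the gem.

```ruby
# Illustrative only: Ref and HashRegistry are assumed names, not part of the gem.
Ref = Struct.new(:value)

class HashRegistry
  def initialize
    @by_value = {}
  end

  # Return the existing Ref for this value, or create and remember a new one.
  def find_or_create(value)
    @by_value[value] ||= Ref.new(value)
  end

  def size
    @by_value.size
  end
end

registry = HashRegistry.new
first  = registry.find_or_create('abc123')
second = registry.find_or_create('abc123')
other  = registry.find_or_create('def456')

puts first.equal?(second)
puts registry.size
```

A keyed lookup avoids rescanning the whole list for every node, which may matter for large responses; the array-based version in the diff preserves insertion order in `@hashes`, which this sketch does not expose.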