dover_to_calais 0.1.0 → 0.2.0
- checksums.yaml +15 -0
- data/.gitignore +30 -14
- data/README.md +329 -9
- data/Rakefile +6 -0
- data/dover_to_calais.gemspec +10 -7
- data/features/data_sources.feature +14 -0
- data/features/filtering.feature +24 -0
- data/features/step_definitions/data_sources_steps.rb +40 -0
- data/features/step_definitions/filtering_steps.rb +60 -0
- data/lib/dover_to_calais.rb +35 -17
- data/lib/dover_to_calais/version.rb +1 -1
- data/test/test_file_1.doc +0 -0
- data/test/test_file_1.html +36 -0
- data/test/test_file_1.odt +0 -0
- data/test/test_file_1.pdf +0 -0
- data/test/test_file_1.rtf +54 -0
- data/test/test_file_1.txt +14 -0
- metadata +86 -28
checksums.yaml
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
---
|
2
|
+
!binary "U0hBMQ==":
|
3
|
+
metadata.gz: !binary |-
|
4
|
+
OWJmOWEyMGFjNDk2ZjZiODYyNjQ1NDM2YjM0YjMyNzQ1MmUzZjg3MA==
|
5
|
+
data.tar.gz: !binary |-
|
6
|
+
MTllMDRiOTNlNDg2Y2FiNmY1MmQyMjAyMzViNWJiZWFmN2ZjYTY3ZA==
|
7
|
+
SHA512:
|
8
|
+
metadata.gz: !binary |-
|
9
|
+
OGI3NDU4YWU1YzllMjBiNTVlMmU3NzNhMDUzYmNhNWYzODY0ZTE3MDQzZmMx
|
10
|
+
NmZiMDMxZjYzMTI3ZTdkYWU5MGNiNDc3ZTE2ZTRjYThjYjc5ZDQxNjFlZjU0
|
11
|
+
MGRmZWY5ZGM4NTAwYjAyZTEyZmY5M2I5MDdjNDA4NWQ1MDE4MDM=
|
12
|
+
data.tar.gz: !binary |-
|
13
|
+
ODUyZGFhN2JhYjdjZDAyNmMxMTNhZjY0MjJhNWQ5YjU2OTY0OTQyNmU4MDkz
|
14
|
+
NjljZTU1NDUzYTRhN2I2MTA3MmQ3MTM3MDYxODUyMjgzOGVkYTYzNzY4MjA1
|
15
|
+
YTgyZDNkNjE3YjI1NWJiYTJkMzNjN2RiYzEzN2M1MWNmYzFhMzU=
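The `!binary` scalars in this hunk are Base64 text. As a minimal sketch (the helper name `decode_binary_scalar` and the constant names are made up for illustration and are not part of the gem or of RubyGems), the key and the first digest can be decoded with core Ruby to see what they hold:

```ruby
# Values copied verbatim from the checksums.yaml hunk above.
ENCODED_ALGORITHM = 'U0hBMQ=='
ENCODED_SHA1_METADATA = 'OWJmOWEyMGFjNDk2ZjZiODYyNjQ1NDM2YjM0YjMyNzQ1MmUzZjg3MA=='

# Hypothetical helper: String#unpack1('m') decodes Base64 without any require.
def decode_binary_scalar(encoded)
  encoded.unpack1('m')
end

puts decode_binary_scalar(ENCODED_ALGORITHM)      # prints "SHA1"
puts decode_binary_scalar(ENCODED_SHA1_METADATA)  # prints a 40-character hex digest
```

The multi-line SHA512 values would presumably decode the same way once their lines are joined; that part is an assumption and is not exercised here.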
|
data/.gitignore
CHANGED
@@ -1,17 +1,33 @@
|
|
1
1
|
*.gem
|
2
2
|
*.rbc
|
3
|
-
|
4
|
-
|
5
|
-
|
3
|
+
/.config
|
4
|
+
/coverage/
|
5
|
+
/InstalledFiles
|
6
|
+
/pkg/
|
7
|
+
/spec/reports/
|
8
|
+
/test/tmp/
|
9
|
+
/test/version_tmp/
|
10
|
+
/tmp/
|
11
|
+
|
12
|
+
## Documentation cache and generated files:
|
13
|
+
/.yardoc/
|
14
|
+
/_yardoc/
|
15
|
+
/doc/
|
16
|
+
/rdoc/
|
17
|
+
|
18
|
+
## Environment normalisation:
|
19
|
+
/.bundle/
|
20
|
+
/lib/bundler/man/
|
21
|
+
|
22
|
+
# for a library or gem, you might want to ignore these files since the code is
|
23
|
+
# intended to run in multiple environments; otherwise, check them in:
|
6
24
|
Gemfile.lock
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
test/version_tmp
|
17
|
-
tmp
|
25
|
+
.ruby-version
|
26
|
+
.ruby-gemset
|
27
|
+
|
28
|
+
# unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
|
29
|
+
.rvmrc
|
30
|
+
|
31
|
+
## Sublime Text project files
|
32
|
+
*.sublime-project
|
33
|
+
*.sublime-workspace
|
data/README.md
CHANGED
@@ -1,14 +1,19 @@
|
|
1
|
+
|
2
|
+
|
1
3
|
# DoverToCalais
|
2
4
|
|
3
5
|
DoverToCalais allows the user to send a wide range of data sources (files & URLs)
|
4
6
|
to [OpenCalais](http://www.opencalais.com/about) and receive asynchronous responses when [OpenCalais](http://www.opencalais.com/about) has finished processing
|
5
|
-
the inputs. In addition, DoverToCalais enables
|
6
|
-
find relevant tags and/or tag values.
|
7
|
+
the inputs. In addition, DoverToCalais enables response filtering in order to find relevant tags and/or tag values.
|
7
8
|
|
8
9
|
## What is OpenCalais?
|
9
10
|
In short -and quoting the [OpenCalais](http://www.opencalais.com/about) creators:
|
11
|
+
> "*The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.*"
|
12
|
+
|
13
|
+
In general, the OpenCalais Simple XML Format (the one used by DoverToCalais) returns three kinds of tags: [Entities, Events](http://www.opencalais.com/documentation/calais-web-service-api/api-metadata/entity-index-and-definitions) and [Topics](http://www.opencalais.com/documentation/calais-web-service-api/api-metadata/document-categorization). ***Entities*** are static 'things', like Persons, Places, etc., that are involved in the textual context in some capacity. OpenCalais assigns a *relevance* score to each entity to indicate its relevance within the context of the data source's general topic. ***Events*** are facts or actions that pertain to one or more Entities. ***Topics*** are a characterisation or generic description of the data source's context.
|
14
|
+
|
15
|
+
We can use these tags and the information within them to extract relevant information from the data or to draw useful conclusions about it. For example, if the data source tags include an *<Event>* with the value of *'CompanyExpansion'*, I can then look for the <City> or <Company> tags to find out which company is expanding and if it's near my location (hint: they may be looking for more staff :)) Or, I could pick out all <Company>s involved in a <JointVenture>, or all <Person>s implicated in an <Arrest> in my <City>, etc.
|
10
16
|
|
11
|
-
*The OpenCalais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing (NLP), machine learning and other methods, Calais analyzes your document and finds the entities within it. But, Calais goes well beyond classic entity identification and returns the facts and events hidden within your text as well.*
|
12
17
|
|
13
18
|
## Why use OpenCalais?
|
14
19
|
There are many reasons, mainly to:
|
@@ -16,11 +21,11 @@ There are many reasons, mainly to:
|
|
16
21
|
* incorporate tags into other applications, such as search, news aggregation, blogs, catalogs, etc.
|
17
22
|
* enrich search by looking for deeper, contextual meaning instead of merely phrases or keywords.
|
18
23
|
* help to discern relationships between semantic entities.
|
19
|
-
* facilitate data processing and analysis by allowing easy
|
24
|
+
* facilitate data processing and analysis by allowing easy identification of relevant or important data sources and the discarding of irrelevant ones.
|
20
25
|
|
21
26
|
|
22
27
|
## DoverToCalais Features
|
23
|
-
1. **
|
28
|
+
1. **Multiple data source support**: Thanks to the power of [Yomu](https://github.com/Erol/yomu), DoverToCalais can process a vast range of files (and, of course, web pages), extract text from them and send
|
24
29
|
them to OpenCalais for analysis and tag generation.
|
25
30
|
|
26
31
|
2. **Asynchronous responses (callbacks)**:
|
@@ -28,7 +33,7 @@ Users can set callbacks to receive the processed meta-data, once the OpenCalais
|
|
28
33
|
Furthermore, a user can set multiple callbacks for the same request (data source), thus enabling cleaner,
|
29
34
|
more modular code.
|
30
35
|
|
31
|
-
3. **Result filtering**: DoverToCalais uses the OpenCalais [Simple XML Format](http://www.opencalais.com/documentation/calais-web-service-api/interpreting-api-response/simple-format) as
|
36
|
+
3. **Result filtering**: DoverToCalais uses the OpenCalais [Simple XML Format](http://www.opencalais.com/documentation/calais-web-service-api/interpreting-api-response/simple-format) as the preferred response format. The user can work directly with the XML-formatted response, or -if feeling a bit lazy- can take advantage of the DoverToCalais filtering functionality and receive specific entities, optionally based on specified conditions.
|
32
37
|
|
33
38
|
For more details of the features and code samples, see [Usage](#usage).
|
34
39
|
|
@@ -53,20 +58,325 @@ Or install it yourself as:
|
|
53
58
|
|
54
59
|
$ gem install dover_to_calais
|
55
60
|
|
61
|
+
|
62
|
+
|
56
63
|
## Dependencies
|
57
|
-
DoverToCalais has been developed in Ruby 1.9.3 and
|
64
|
+
DoverToCalais has been developed in Ruby 1.9.3 and relies on the following gems to work (installation with the gem command will automatically install all dependencies):
|
58
65
|
|
59
66
|
* 'nokogiri', 1.6.0
|
60
67
|
* 'eventmachine', 1.0.3
|
61
68
|
* 'em-http-request', 1.1.0
|
62
|
-
* 'open-uri',
|
63
69
|
* 'yomu', 0.1.9
|
64
70
|
|
65
71
|
As [Yomu](https://github.com/Erol/yomu) depends on a working JRE in order to function, so does DoverToCalais.
|
66
72
|
|
67
73
|
## Usage
|
74
|
+
Using DoverToCalais is extremely simple.
|
75
|
+
|
76
|
+
### The Basics
|
77
|
+
As DoverToCalais uses the awesome-ness of [EventMachine](http://rubyeventmachine.com/), code must be placed within an EM *run* block:
|
78
|
+
|
79
|
+
```ruby
|
80
|
+
EM.run do
|
81
|
+
|
82
|
+
# use Control + C to stop the EM
|
83
|
+
Signal.trap('INT') { EventMachine.stop }
|
84
|
+
Signal.trap('TERM') { EventMachine.stop }
|
85
|
+
|
86
|
+
# we need an API key to use OpenCalais
|
87
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
88
|
+
# create a new dover
|
89
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
|
90
|
+
# parse the text and send it to OpenCalais
|
91
|
+
dover.analyse_this
|
92
|
+
puts 'do some stuff....'
|
93
|
+
# set a callback for when we receive a response
|
94
|
+
dover.to_calais { |response| puts response.error ? response.error : response }
|
95
|
+
|
96
|
+
puts 'do some more stuff....'
|
97
|
+
|
98
|
+
end
|
99
|
+
```
|
100
|
+
This will produce the following result:
|
101
|
+
|
102
|
+
|
103
|
+
> do some stuff.... <br>
|
104
|
+
> do some more stuff.... <br>
|
105
|
+
> <?xml version="1.0"?> <br>
|
106
|
+
> <OpenCalaisSimple> <br>
|
107
|
+
> .......... <br>
|
108
|
+
> (the rest of the XML response from OpenCalais) <br>
|
109
|
+
|
110
|
+
|
111
|
+
As can be observed, the callback (#to_calais) is triggered after the rest of the code has been executed and only when the OpenCalais request has been completed.
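The deferred behaviour can be illustrated without EventMachine or a network call. The sketch below is only an approximation: `MiniRequest`, `on_response`, `complete` and `demo_callback_order` are hypothetical names standing in for a Dover object and its callback list, and the "response" is a plain string.

```ruby
# MiniRequest is a hypothetical stand-in for a Dover object; it is not part of the gem.
class MiniRequest
  def initialize
    @callbacks = []
  end

  # Store the block for later; nothing is executed at registration time.
  def on_response(&block)
    @callbacks << block
    self
  end

  # Simulates the moment the remote service answers: every stored
  # callback receives the same result, in registration order.
  def complete(result)
    @callbacks.each { |callback| callback.call(result) }
  end
end

def demo_callback_order
  log = []
  request = MiniRequest.new
  log << 'do some stuff'
  request.on_response { |r| log << "first callback: #{r}" }
  request.on_response { |r| log << "second callback: #{r}" }
  log << 'do some more stuff'
  request.complete('response') # the "reply" arrives last
  log
end

puts demo_callback_order
```

Both `puts`-style entries are logged before either callback runs, and several callbacks can be attached to the same request.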
|
112
|
+
|
113
|
+
Of course, we can analyse more than one source at a time:
|
114
|
+
|
115
|
+
```ruby
|
116
|
+
EM.run do
|
117
|
+
|
118
|
+
# use Control + C to stop the EM
|
119
|
+
Signal.trap('INT') { EventMachine.stop }
|
120
|
+
Signal.trap('TERM') { EventMachine.stop }
|
121
|
+
|
122
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
123
|
+
|
124
|
+
d1 = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
|
125
|
+
d2 = DoverToCalais::Dover.new('/home/fred/Documents/RailsRecipes.pdf')
|
126
|
+
d3 = DoverToCalais::Dover.new('//network-drive/annual_forecast.doc')
|
127
|
+
|
128
|
+
d1.analyse_this; d2.analyse_this; d3.analyse_this;
|
129
|
+
|
130
|
+
puts 'do some stuff....'
|
131
|
+
|
132
|
+
d1.to_calais { |response| puts response.error ? response.error : response }
|
133
|
+
d2.to_calais { |response| puts response.error ? response.error : response }
|
134
|
+
d3.to_calais { |response| puts response.error ? response.error : response }
|
135
|
+
|
136
|
+
puts 'do some more stuff....'
|
137
|
+
|
138
|
+
end
|
139
|
+
```
|
140
|
+
|
141
|
+
This will output the two *puts* statements followed by the three callbacks (d1, d2, d3) in the order in which they are triggered, i.e. the first callback to receive a response from OpenCalais will fire first.
|
142
|
+
|
143
|
+
|
144
|
+
### Filtering the response
|
145
|
+
Why parse the response XML ourselves when DoverToCalais can do it for us? We'll just use the *#filter* method on the response object, passing a filtering hash:
|
146
|
+
|
147
|
+
```ruby
|
148
|
+
my_filter = {:entity => 'Entity1', :value => 'Value1', :given => {:entity => 'Entity2', :value => 'Value2'}}
|
149
|
+
response.filter(my_filter)
|
150
|
+
```
|
151
|
+
|
152
|
+
The above tells DoverToCalais to look in the response for an entity called 'Entity1' with a value of 'Value1', **only** if the response contains an entity called 'Entity2' which has a value of 'Value2'.
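To make the semantics of the filtering hash concrete, here is a minimal sketch in plain Ruby. It is not the gem's implementation: `SketchItem`, `SAMPLE_ITEMS`, `criteria_match?` and `simple_filter` are hypothetical names, and exact string equality on values is an assumption (the gem may match values differently).

```ruby
# Hypothetical stand-in for a response item: just a name and a value.
SketchItem = Struct.new(:name, :value)

SAMPLE_ITEMS = [
  SketchItem.new('Person', 'Roger Kay'),
  SketchItem.new('Person', 'David Bailey'),
  SketchItem.new('Event', 'Meeting')
].freeze

# An item matches when the entity name is equal and, if a :value is given,
# the value is equal too (assumption: exact equality).
def criteria_match?(item, criteria)
  item.name == criteria[:entity] &&
    (criteria[:value].nil? || item.value == criteria[:value])
end

# Returns the items matching the filter, or nothing at all when the
# optional :given condition is not satisfied anywhere in the items.
def simple_filter(items, filter)
  given = filter[:given]
  return [] if given && items.none? { |item| criteria_match?(item, given) }

  items.select { |item| criteria_match?(item, filter) }
end

people = simple_filter(SAMPLE_ITEMS,
                       :entity => 'Person',
                       :given => { :entity => 'Event', :value => 'Meeting' })
puts people.map(&:value)
```

With the sample data, the two Person items are returned because a 'Meeting' Event exists; changing the condition to a value that is absent yields an empty array.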
|
153
|
+
|
154
|
+
The conditional clause (*:given*) is optional; the filtering hash can be used in pretty much any permutation. For instance:
|
155
|
+
|
156
|
+
```ruby
|
157
|
+
EM.run do
|
158
|
+
|
159
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
160
|
+
|
161
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
|
162
|
+
dover.analyse_this
|
163
|
+
|
164
|
+
dover.to_calais do |response|
|
165
|
+
if response.error
|
166
|
+
puts response.error
|
167
|
+
else
|
168
|
+
puts response.filter({:entity => 'Company'})
|
169
|
+
end
|
170
|
+
end
|
171
|
+
|
172
|
+
end
|
173
|
+
```
|
174
|
+
|
175
|
+
This will pick out all entities tagged 'Company' from the data source. The output will be an Array of ResponseItem objects.
|
176
|
+
|
177
|
+
|
178
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="BBC News", relevance=0.654, count=13, normalized=nil, importance=nil, originalValue=nil><br>
|
179
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="TV Radio", relevance=0.565, count=2, normalized="HERALD & WEEKLY-TV,RADIO OPS", importance=nil, originalValue=nil> <br>
|
180
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Reuters", relevance=0.255, count=2, normalized="THOMSON REUTERS GROUP LIMITED", importance=nil, originalValue=nil> <br>
|
181
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Twitter", relevance=0.395, count=1, normalized="TWITTER, INC.", importance=nil, originalValue=nil> <br>
|
182
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Huffington Post UK", relevance=0.136, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
183
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Ireland Kenya", relevance=0.144, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
184
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Yahoo! UK", relevance=0.144, count=1, normalized="YAHOO! UK LIMITED", importance=nil, originalValue=nil> <br>
|
185
|
+
|
186
|
+
|
187
|
+
If this output looks a bit cluttered, we can easily tidy it up:
|
188
|
+
|
189
|
+
```ruby
|
190
|
+
EM.run do
|
191
|
+
|
192
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
193
|
+
|
194
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/world-africa-24412315')
|
195
|
+
dover.analyse_this
|
196
|
+
|
197
|
+
dover.to_calais do |response|
|
198
|
+
if response.error
|
199
|
+
puts response.error
|
200
|
+
else
|
201
|
+
items = response.filter({:entity => 'Company'})
|
202
|
+
items.each do |item|
|
203
|
+
puts "#{item.name}: #{item.value}, relevance = #{item.relevance}"
|
204
|
+
end
|
205
|
+
end
|
206
|
+
end
|
207
|
+
|
208
|
+
end
|
209
|
+
```
|
210
|
+
|
211
|
+
Which will give us:
|
212
|
+
|
213
|
+
|
214
|
+
> Company: BBC News, relevance = 0.656 <br>
|
215
|
+
> Company: TV Radio, relevance = 0.566 <br>
|
216
|
+
> Company: Reuters, relevance = 0.26 <br>
|
217
|
+
> Company: Guardian.co.uk, relevance = 0.143 <br>
|
218
|
+
> Company: Twitter, relevance = 0.399 <br>
|
219
|
+
> Company: Huffington Post UK, relevance = 0.132 <br>
|
220
|
+
> Company: Ireland Kenya, relevance = 0.139 <br>
|
221
|
+
> Company: Yahoo! UK, relevance = 0.139 <br>
|
222
|
+
|
223
|
+
|
68
224
|
|
69
|
-
|
225
|
+
Let's see if the data source refers to any business partnerships:
|
226
|
+
|
227
|
+
```ruby
|
228
|
+
EM.run do
|
229
|
+
|
230
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
231
|
+
|
232
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
|
233
|
+
dover.analyse_this
|
234
|
+
|
235
|
+
dover.to_calais do |response|
|
236
|
+
if response.error
|
237
|
+
puts response.error
|
238
|
+
else
|
239
|
+
items = response.filter({:entity => 'Event', :value => 'Business Partnership'})
|
240
|
+
puts "There are #{items.length} events like that in the source"
|
241
|
+
end
|
242
|
+
end
|
243
|
+
|
244
|
+
end
|
245
|
+
```
|
246
|
+
|
247
|
+
which will produce:
|
248
|
+
|
249
|
+
> There are 1 events like that in the source
|
250
|
+
|
251
|
+
|
252
|
+
Now let's find all companies involved in any business partnerships:
|
253
|
+
|
254
|
+
```ruby
|
255
|
+
EM.run do
|
256
|
+
|
257
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
258
|
+
|
259
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
|
260
|
+
dover.analyse_this
|
261
|
+
|
262
|
+
dover.to_calais do |response|
|
263
|
+
if response.error
|
264
|
+
puts response.error
|
265
|
+
else
|
266
|
+
items = response.filter( {:entity => 'Company', :given => {:entity => 'Event', :value => 'Business Partnership'}} )
|
267
|
+
items.each do |item|
|
268
|
+
puts "#{item.name}: #{item.value} a.k.a #{item.normalized}, relevance = #{item.relevance}"
|
269
|
+
end
|
270
|
+
end
|
271
|
+
end
|
272
|
+
|
273
|
+
end
|
274
|
+
```
|
275
|
+
|
276
|
+
which gives us:
|
277
|
+
|
278
|
+
> Company: BBC News a.k.a , relevance = 0.678 <br>
|
279
|
+
> Company: Google a.k.a GOOGLE INC., relevance = 0.508 <br>
|
280
|
+
> Company: Flutter a.k.a FLUTTER COM INC, relevance = 0.531 <br>
|
281
|
+
> Company: TV Radio a.k.a HERALD & WEEKLY-TV,RADIO OPS, relevance = 0.558 <br>
|
282
|
+
> Company: Microsoft a.k.a MICROSOFT CORPORATION, relevance = 0.303 <br>
|
283
|
+
> Company: Adobe a.k.a ADOBE SYSTEMS INCORPORATED, relevance = 0.193 <br>
|
284
|
+
> Company: Netflix a.k.a NETFLIX, INC., relevance = 0.301 <br>
|
285
|
+
> Company: Y Combinator a.k.a Y Combinator, relevance = 0.258 <br>
|
286
|
+
> Company: Nintendo a.k.a Nintendo Co., Ltd., relevance = 0.286 <br>
|
287
|
+
> Company: Samsung a.k.a Samsung C&T Corporation, relevance = 0.285 <br>
|
288
|
+
> Company: Glyndwr University a.k.a , relevance = 0.269 <br>
|
289
|
+
|
290
|
+
|
291
|
+
|
292
|
+
At this point, someone may ask: "But what if we want to get more than one entity for a given condition? The filter hash doesn't allow that!"
|
293
|
+
|
294
|
+
No, it doesn't. However, given that filtering is done on the *whole* response *after* it's been received, we can apply many filters on the same response:
|
295
|
+
|
296
|
+
```ruby
|
297
|
+
EM.run do
|
298
|
+
|
299
|
+
DoverToCalais::API_KEY = 'my-opencalais-api-key'
|
300
|
+
|
301
|
+
dover = DoverToCalais::Dover.new('http://www.bbc.co.uk/news/technology-24380202')
|
302
|
+
dover.analyse_this
|
303
|
+
|
304
|
+
dover.to_calais do |response|
|
305
|
+
if response.error
|
306
|
+
puts response.error
|
307
|
+
else
|
308
|
+
result1 = response.filter( {:entity => 'Company', :value => 'Google', :given => {:entity => 'Technology', :value => 'gesture recognition'}} )
|
309
|
+
result2 = response.filter( {:entity => 'Product', :given => {:entity => 'Technology', :value => 'gesture recognition'}} )
|
310
|
+
puts result1 | result2
|
311
|
+
end
|
312
|
+
end
|
313
|
+
|
314
|
+
end
|
315
|
+
```
|
316
|
+
|
317
|
+
Which will give us all the gesture-recognition products that Google is associated with according to our data source:
|
318
|
+
|
319
|
+
> <struct DoverToCalais::ResponseItem name="Company", value="Google", relevance=0.506, count=7, normalized="GOOGLE INC.", importance=nil, originalValue=nil> <br>
|
320
|
+
> <struct DoverToCalais::ResponseItem name="Product", value="Xbox Kinect", relevance=0.286, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
321
|
+
> <struct DoverToCalais::ResponseItem name="Product", value="Galaxy S4 smartphone", relevance=0.282, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
322
|
+
> <struct DoverToCalais::ResponseItem name="Product", value="Wii", relevance=0.286, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
323
|
+
> <struct DoverToCalais::ResponseItem name="Product", value="Galaxy S4", relevance=0.282, count=1, normalized=nil, importance=nil, originalValue=nil> <br>
|
324
|
+
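The `result1 | result2` expression in the example above relies on `Array#|`, which returns the union of two arrays with duplicates removed. Because the printed items are Struct instances, two items with the same member values count as duplicates. A small sketch, with `TagItem` and `union_demo` as hypothetical names and a reduced set of members:

```ruby
# Hypothetical two-member stand-in for the ResponseItem struct shown above.
TagItem = Struct.new(:name, :value)

def union_demo
  companies = [TagItem.new('Company', 'Google')]
  products  = [TagItem.new('Product', 'Wii'), TagItem.new('Company', 'Google')]
  # Array#| keeps first-seen order and drops the repeated Google entry,
  # since Struct compares and hashes by member values.
  companies | products
end

puts union_demo.map { |item| "#{item.name}: #{item.value}" }
```

Using `+` instead of `|` would keep the repeated entry, which is usually not what is wanted when combining several filters over the same response.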
|
325
|
+
|
326
|
+
|
327
|
+
|
328
|
+
***PS***: If you're not sure about the names or values of the tags you want to filter, you can get a listing with the following Constants:
|
329
|
+
|
330
|
+
```ruby
|
331
|
+
CalaisOntology::CALAIS_ENTITIES
|
332
|
+
CalaisOntology::CALAIS_EVENTS
|
333
|
+
CalaisOntology::CALAIS_TOPICS
|
334
|
+
```
|
335
|
+
|
336
|
+
### Code samples
|
337
|
+
|
338
|
+
More examples of using DoverToCalais can be found as GitHub Gists:
|
339
|
+
|
340
|
+
[Using DoverToCalais to semantically tag all files in a directory](https://gist.github.com/RedFred7/6961349)
|
341
|
+
[Use DoverToCalais to find all Persons or Organizations with a relevance score greater than 0.1, if the data source contains an environmental event](https://gist.github.com/RedFred7/6961853)
|
342
|
+
|
343
|
+
|
344
|
+
### Using a Proxy
|
345
|
+
|
346
|
+
If you're behind a corporate firewall and the only way to reach outside is through a proxy, then you need to set the *DoverToCalais::PROXY* constant:
|
347
|
+
|
348
|
+
```ruby
|
349
|
+
DoverToCalais::PROXY = {
|
350
|
+
:proxy => {
|
351
|
+
:host => 'www.myproxy.com',
|
352
|
+
:port => 8080,
|
353
|
+
:authorization => ['username', 'password'] #optional
|
354
|
+
} }
|
355
|
+
```
|
356
|
+
|
357
|
+
|
358
|
+
If you're connecting through a SOCKS5 Proxy just set the *:type* key to :socks5.
|
359
|
+
|
360
|
+
```ruby
|
361
|
+
DoverToCalais::PROXY = {
|
362
|
+
:proxy => {
|
363
|
+
:host => 'www.myproxy.com',
|
364
|
+
:port => 8080,
|
365
|
+
:type => :socks5
|
366
|
+
} }
|
367
|
+
```
|
368
|
+
|
369
|
+
## Documentation
|
370
|
+
|
371
|
+
Comprehensive documentation can be found at http://rubydoc.info/gems/dover_to_calais.
|
372
|
+
|
373
|
+
## Testing
|
374
|
+
|
375
|
+
A list of Cucumber features and scenarios can be found in the *features* directory. The list is far from exhaustive, so feel free to add your own scenarios and steps.
|
376
|
+
|
377
|
+
To run the tests, there is already a rake task set up. Just type:
|
378
|
+
|
379
|
+
rake features API_KEY='my_api_key'
|
70
380
|
|
71
381
|
## Contributing
|
72
382
|
|
@@ -75,3 +385,13 @@ TODO: Write usage instructions here
|
|
75
385
|
3. Commit your changes (`git commit -am 'Add some feature'`)
|
76
386
|
4. Push to the branch (`git push origin my-new-feature`)
|
77
387
|
5. Create new Pull Request
|
388
|
+
|
389
|
+
|
390
|
+
## Changelog
|
391
|
+
|
392
|
+
* **07-Oct-2013** Version: 0.1.0
|
393
|
+
Initial release
|
394
|
+
* **10-Feb-2014** Version: 0.1.1
|
395
|
+
Improved Response error message
|
396
|
+
* **10-Feb-2014** Version: 0.2.0
|
397
|
+
Added #analyse_this to public interface
|
data/Rakefile
CHANGED
data/dover_to_calais.gemspec
CHANGED
@@ -7,20 +7,21 @@ Gem::Specification.new do |spec|
|
|
7
7
|
spec.name = "dover_to_calais"
|
8
8
|
spec.version = DoverToCalais::VERSION
|
9
9
|
spec.authors = ["Fred Heath"]
|
10
|
-
spec.email = ["
|
10
|
+
spec.email = ["fred_h@bootstrap.me.uk"]
|
11
11
|
spec.description = %q{DoverToCalais allows the user to send a wide range of data sources (files & URLs)
|
12
12
|
to OpenCalais and receive asynchronous responses when OpenCalais has finished processing
|
13
13
|
the inputs. In addition, DoverToCalais enables the filtering of the response in order to
|
14
14
|
find relevant tags and/or tag values. }
|
15
15
|
spec.summary = %q{An easy-to-use wrapper round the OpenCalais semantic analysis web service. }
|
16
|
-
spec.homepage = ""
|
16
|
+
spec.homepage = "https://github.com/RedFred7/dover_to_calais"
|
17
17
|
spec.license = "MIT"
|
18
18
|
|
19
19
|
|
20
|
-
spec.add_runtime_dependency "nokogiri", "~>1.6
|
21
|
-
spec.add_runtime_dependency "eventmachine", "~>1.0.3"
|
22
|
-
spec.add_runtime_dependency "em-http-request", "~>1.1
|
23
|
-
spec.add_runtime_dependency "yomu", "~>0.1.9"
|
20
|
+
spec.add_runtime_dependency "nokogiri", "~> 1.6"
|
21
|
+
spec.add_runtime_dependency "eventmachine", "~> 1.0", ">= 1.0.3"
|
22
|
+
spec.add_runtime_dependency "em-http-request", "~> 1.1"
|
23
|
+
spec.add_runtime_dependency "yomu", "~> 0.1", ">= 0.1.9"
|
24
|
+
|
24
25
|
|
25
26
|
spec.files = `git ls-files`.split($/)
|
26
27
|
spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
|
@@ -29,5 +30,7 @@ Gem::Specification.new do |spec|
|
|
29
30
|
|
30
31
|
|
31
32
|
spec.add_development_dependency "bundler", "~> 1.3"
|
32
|
-
spec.add_development_dependency "rake"
|
33
|
+
spec.add_development_dependency "rake", "~> 0"
|
34
|
+
spec.add_development_dependency "cucumber", "~> 1.3", ">= 1.3.8"
|
35
|
+
spec.add_development_dependency "rspec", "~> 2.14", ">= 2.14.1"
|
33
36
|
end
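The gemspec hunk above loosens several runtime constraints, for example eventmachine from `~>1.0.3` to `"~> 1.0", ">= 1.0.3"`. As a rough illustration of what that changes, the sketch below evaluates a few version numbers against both forms using RubyGems' own classes; the helper name `allowed?` is made up for this example.

```ruby
require 'rubygems'

# Hypothetical helper: true when the version satisfies every constraint given.
def allowed?(version, *constraints)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

old_constraint = ['~> 1.0.3']            # >= 1.0.3 and < 1.1
new_constraint = ['~> 1.0', '>= 1.0.3']  # >= 1.0.3 and < 2.0

%w[1.0.2 1.0.3 1.2.0 2.0.0].each do |version|
  puts "#{version}: old=#{allowed?(version, *old_constraint)} new=#{allowed?(version, *new_constraint)}"
end
```

The only version in this list treated differently is 1.2.0, which the old constraint rejects and the new one accepts.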
|
data/features/data_sources.feature
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
Feature: Able to handle wide range of data formats as input
|
2
|
+
Scenario Outline: Processing various data-source formats
|
3
|
+
Given the file <input>
|
4
|
+
When DoverToCalais processes this file
|
5
|
+
Then the output should have no errors
|
6
|
+
|
7
|
+
Examples:
|
8
|
+
| input |
|
9
|
+
|test_file_1.doc |
|
10
|
+
|test_file_1.html|
|
11
|
+
|test_file_1.odt|
|
12
|
+
|test_file_1.pdf|
|
13
|
+
|test_file_1.rtf|
|
14
|
+
|test_file_1.txt|
|
data/features/filtering.feature
ADDED
@@ -0,0 +1,24 @@
|
|
1
|
+
Feature: Ability to select certain OpenCalais entities based on certain conditions
|
2
|
+
|
3
|
+
Background:
|
4
|
+
Given the file 'test_file_1.txt' is successfully processed
|
5
|
+
|
6
|
+
|
7
|
+
Scenario: Select all entities with a specific name
|
8
|
+
When I filter on {:entity => 'EmailAddress'}
|
9
|
+
Then the output should have 2 entries
|
10
|
+
And All entries should be named 'EmailAddress'
|
11
|
+
|
12
|
+
Scenario: Select an entity with a specific value
|
13
|
+
When I filter on {:entity => 'Event', :value => 'Meeting'}
|
14
|
+
Then the output should have 1 entries
|
15
|
+
And All entries should be named 'Event'
|
16
|
+
And All entries should have the value 'Meeting'
|
17
|
+
|
18
|
+
|
19
|
+
Scenario: Select an entity only if another entity with a specific value exists in the data source
|
20
|
+
When I filter on {:entity => 'Person', :given => {:entity => 'Event', :value => 'Meeting'}}
|
21
|
+
Then the output should have 2 entries
|
22
|
+
And All entries should be named 'Person'
|
23
|
+
And One entry should have the value 'Roger Kay'
|
24
|
+
And One entry should have the value 'David Bailey'
|
data/features/step_definitions/data_sources_steps.rb
ADDED
@@ -0,0 +1,40 @@
|
|
1
|
+
|
2
|
+
require 'nokogiri'
|
3
|
+
require 'eventmachine'
|
4
|
+
require 'em-http-request'
|
5
|
+
require 'yomu'
|
6
|
+
require 'rspec'
|
7
|
+
require File.expand_path('../../../lib/dover_to_calais', __FILE__)
|
8
|
+
|
9
|
+
|
10
|
+
# N.B Cucumber must be run with the Environment variable 'API_KEY' set
|
11
|
+
# to the OpenCalais API Key value.
|
12
|
+
|
13
|
+
|
14
|
+
Given(/^the file (\w+\.\w{3,4})$/) do |arg1|
|
15
|
+
puts arg1
|
16
|
+
@input = Dir.pwd + '/test/' + arg1
|
17
|
+
@output = nil
|
18
|
+
end
|
19
|
+
|
20
|
+
|
21
|
+
|
22
|
+
When(/^DoverToCalais processes this file$/) do
|
23
|
+
EM.run {
|
24
|
+
|
25
|
+
DoverToCalais::API_KEY = ENV['API_KEY']
|
26
|
+
d1 = DoverToCalais::Dover.new(@input)
|
27
|
+
d1.analyse_this
|
28
|
+
d1.to_calais do |response|
|
29
|
+
@output = response
|
30
|
+
EM.stop
|
31
|
+
end
|
32
|
+
|
33
|
+
}
|
34
|
+
end
|
35
|
+
|
36
|
+
|
37
|
+
|
38
|
+
Then(/^the output should have no errors$/) do
|
39
|
+
@output.error.should be_nil
|
40
|
+
end
|
data/features/step_definitions/filtering_steps.rb
ADDED
@@ -0,0 +1,60 @@
|
|
1
|
+
require 'nokogiri'
|
2
|
+
require 'eventmachine'
|
3
|
+
require 'em-http-request'
|
4
|
+
require 'yomu'
|
5
|
+
require 'rspec'
|
6
|
+
require File.expand_path('../../../lib/dover_to_calais', __FILE__)
|
7
|
+
|
8
|
+
|
9
|
+
# N.B Cucumber must be run with the Environment variable 'API_KEY' set
|
10
|
+
# to the OpenCalais API Key value.
|
11
|
+
|
12
|
+
|
13
|
+
|
14
|
+
Given(/^the file '(\w+\.\w{3,4})' is successfully processed$/) do |file|
|
15
|
+
|
16
|
+
steps %{
|
17
|
+
Given the file #{file}
|
18
|
+
When DoverToCalais processes this file
|
19
|
+
Then the output should have no errors
|
20
|
+
}
|
21
|
+
|
22
|
+
end
|
23
|
+
|
24
|
+
|
25
|
+
When(/^I filter on ({.+})/) do |filter|
|
26
|
+
@filtered_output = @output.filter(eval(filter))
|
27
|
+
|
28
|
+
end
|
29
|
+
|
30
|
+
Then(/^the output should have (\d+) entries$/) do |item_num|
|
31
|
+
@filtered_output.size.should == item_num.to_i
|
32
|
+
end
|
33
|
+
|
34
|
+
Then(/^All entries should be named '(\w+)'$/) do |name|
|
35
|
+
@filtered_output.each do |item|
|
36
|
+
item.name.should == name
|
37
|
+
end
|
38
|
+
|
39
|
+
end
|
40
|
+
|
41
|
+
|
42
|
+
And(/^All entries should have the value '(\w+)'$/) do |value|
|
43
|
+
@filtered_output.each do |item|
|
44
|
+
item.value.match(value).should_not be_nil
|
45
|
+
end
|
46
|
+
end
|
47
|
+
|
48
|
+
|
49
|
+
And(/^One entry should have the value '(\w+\s*\w+)'$/) do |value|
|
50
|
+
found = false
|
51
|
+
@filtered_output.each do |item|
|
52
|
+
if item.value.match(value)
|
53
|
+
found = true
|
54
|
+
break
|
55
|
+
end
|
56
|
+
end
|
57
|
+
|
58
|
+
fail("couldn't match value '#{value}'") unless found
|
59
|
+
end
|
60
|
+
|
data/lib/dover_to_calais.rb
CHANGED
@@ -130,14 +130,15 @@ module DoverToCalais
|
|
130
130
|
node_count = node.attribute('count').text.to_i if node.has_attribute?('count')
|
131
131
|
node_normalized = node.attribute('normalized').text if node.has_attribute?('normalized')
|
132
132
|
node_importance = node.attribute('importance').text.to_i if node.has_attribute?('importance')
|
133
|
+
node_orig_value = node.xpath('originalValue').text if node.name.eql?('SocialTag')
|
133
134
|
|
134
135
|
ResponseItem.new(node.name,
|
135
|
-
node.
|
136
|
+
node.text,
|
136
137
|
node_relevance,
|
137
138
|
node_count,
|
138
139
|
node_normalized,
|
139
140
|
node_importance,
|
140
|
-
|
141
|
+
node_orig_value )
|
141
142
|
|
142
143
|
end
|
143
144
|
|
@@ -172,7 +173,6 @@ module DoverToCalais
|
|
172
173
|
def initialize(data_src)
|
173
174
|
@data_src = data_src
|
174
175
|
@callbacks = []
|
175
|
-
analyse_this
|
176
176
|
end
|
177
177
|
|
178
178
|
|
@@ -202,7 +202,7 @@ module DoverToCalais
|
|
202
202
|
# @return N/A
|
203
203
|
def to_calais(&block)
|
204
204
|
#fred rules ok
|
205
|
-
if
|
205
|
+
if !@error
|
206
206
|
@callbacks << block
|
207
207
|
else
|
208
208
|
result = ResponseData.new nil, @error
|
@@ -211,12 +211,19 @@ module DoverToCalais
|
|
211
211
|
|
212
212
|
end #method
|
213
213
|
|
214
|
+
# Gets the source text parsed. If the parsing is successful, the data source is POSTed to OpenCalais
|
215
|
+
# via an EventMachine request and a callback is set to manage the OpenCalais response.
|
216
|
+
# All Dover object callbacks are then called with the request result yielded to them.
|
214
217
|
#
|
218
|
+
# @param N/A
|
219
|
+
# @return a {Class ResponseData} object
|
215
220
|
def analyse_this
|
216
221
|
|
217
222
|
@document = get_src_data(@data_src)
|
218
223
|
begin
|
219
|
-
if @document
|
224
|
+
if @document[0..2].eql?('ERR')
|
225
|
+
raise 'Invalid data source'
|
226
|
+
else
|
220
227
|
response = nil
|
221
228
|
|
222
229
|
connection_options = {:inactivity_timeout => 0}
|
@@ -242,36 +249,47 @@ module DoverToCalais
|
|
242
249
|
|
243
250
|
|
244
251
|
http.callback do
|
245
|
-
|
246
|
-
|
247
|
-
|
248
|
-
config
|
252
|
+
|
253
|
+
if http.response_header.status == 200
|
254
|
+
http.response.match(/<OpenCalaisSimple>/) do |m|
|
255
|
+
response = Nokogiri::XML('<OpenCalaisSimple>' + m.post_match) do |config|
|
256
|
+
#strict xml parsing, disallow network connections
|
257
|
+
config.strict.nonet
|
258
|
+
end #block
|
249
259
|
end #block
|
250
|
-
end #block
|
251
260
|
|
252
|
-
|
261
|
+
result = response ?
|
262
|
+
ResponseData.new(response, nil) :
|
263
|
+
ResponseData.new(nil,'ERR: cannot find <OpenCalaisSimple> tag in response data - source invalid?')
|
264
|
+
else #non-200 response header
|
265
|
+
result = ResponseData.new nil,
|
266
|
+
"ERR: OpenCalais service responded with #{http.response_header.status} - response body: '#{http.response}'"
|
267
|
+
end
|
268
|
+
|
253
269
|
@callbacks.each { |c| c.call(result) }
|
270
|
+
|
254
271
|
end #callback
|
255
272
|
|
256
273
|
|
257
274
|
http.errback do
|
258
|
-
|
259
|
-
result = ResponseData.new nil, "#{http.error}"
|
275
|
+
result = ResponseData.new nil, "ERR: #{http.error}"
|
260
276
|
@callbacks.each { |c| c.call(result) }
|
261
277
|
end #errback
|
262
278
|
|
263
279
|
|
264
280
|
end #if
|
265
281
|
rescue Exception=>e
|
266
|
-
|
267
|
-
|
282
|
+
#result = ResponseData.new nil, "ERR: #{e}"
|
283
|
+
#@callbacks.each { |c| c.call(result) }
|
284
|
+
@error = "ERR: #{e}"
|
268
285
|
end
|
269
286
|
|
270
287
|
end #method
|
271
288
|
|
272
289
|
|
273
|
-
|
274
|
-
|
290
|
+
alias_method :analyze_this, :analyse_this
|
291
|
+
public :to_calais, :analyse_this
|
292
|
+
private :get_src_data
|
275
293
|
|
276
294
|
|
277
295
|
end #class
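The response handling added in this hunk amounts to a three-way branch. A 200 response that contains an `<OpenCalaisSimple>` tag yields a result carrying data. A 200 response without the tag yields an error. Any other status yields an error that quotes the status and body. The sketch below reproduces only that branching with the standard library. `Result` and `classify_response` are hypothetical names introduced for illustration; the gem itself parses the matched text with Nokogiri and wraps it in `ResponseData`, which this sketch does not attempt.

```ruby
# Hypothetical stand-in for the gem's ResponseData (assumption, not the real class).
Result = Struct.new(:data, :error)

# Hypothetical helper that mirrors the branching inside http.callback.
def classify_response(status, body)
  if status == 200
    m = body.match(/<OpenCalaisSimple>/)
    if m
      # Keep everything from the tag onwards, as the diff does before parsing.
      Result.new('<OpenCalaisSimple>' + m.post_match, nil)
    else
      Result.new(nil, 'ERR: cannot find <OpenCalaisSimple> tag in response data - source invalid?')
    end
  else
    Result.new(nil, "ERR: OpenCalais service responded with #{status} - response body: '#{body}'")
  end
end

ok  = classify_response(200, 'preamble<OpenCalaisSimple><Person>Roger Kay</Person></OpenCalaisSimple>')
bad = classify_response(500, 'oops')
puts ok.data
puts bad.error
```

Every branch returns a result object instead of raising, so each registered callback receives exactly one value whether the request succeeded or failed.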
|
data/test/test_file_1.doc
ADDED
Binary file
|
data/test/test_file_1.html
ADDED
@@ -0,0 +1,36 @@
|
|
1
|
+
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
2
|
+
<HTML>
|
3
|
+
<HEAD>
|
4
|
+
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
|
5
|
+
<TITLE></TITLE>
|
6
|
+
<META NAME="GENERATOR" CONTENT="LibreOffice 3.5 (Linux)">
|
7
|
+
<META NAME="CREATED" CONTENT="0;0">
|
8
|
+
<META NAME="CHANGED" CONTENT="0;0">
|
9
|
+
<STYLE TYPE="text/css">
|
10
|
+
<!--
|
11
|
+
@page { margin: 2cm }
|
12
|
+
P { margin-bottom: 0.21cm }
|
13
|
+
P.western { so-language: en-GB }
|
14
|
+
PRE.western { so-language: en-GB }
|
15
|
+
PRE.cjk { font-family: "WenQuanYi Micro Hei", monospace }
|
16
|
+
PRE.ctl { font-family: "Lohit Hindi", monospace }
|
17
|
+
-->
|
18
|
+
</STYLE>
|
19
|
+
</HEAD>
|
20
|
+
<BODY LANG="en-GB" DIR="LTR">
|
21
|
+
<PRE CLASS="western">Tensleep Corporation (Other OTC:TENS.PK - News) ("Tensleep") announced that with the acquisition of XSTV Media, Inc. ("XSTV"),
|
22
|
+
it will become an online independent sports company. The transaction is
|
23
|
+
to close on or before September 15, 2007. Tensleep will, by the end of
|
24
|
+
this week or early next week, call a special meeting of shareholders to
|
25
|
+
approve the change name to "XSTV Corporation."
|
26
|
+
|
27
|
+
David Bailey, an analyst at Gerard Klauer Mattison who can be contacted at david at bailey dot com, said such cuts "could include head count reductions."
|
28
|
+
Layoffs to some degree are inevitable, said IDC analyst Roger Kay. For years,
|
29
|
+
the company enjoyed a lower cost structure than other PC makers because
|
30
|
+
it sold computers directly.
|
31
|
+
International Star Inc. (info@ITSR.com - OTC BB: ILST) announced that the annual meeting of
|
32
|
+
shareholders of International Star Inc. will be held on May 19, 2008,
|
33
|
+
at 3:00 p.m. (local time) at The Hilton Hotel, 104 Market Street,
|
34
|
+
Shreveport, La., 71101. </PRE>
|
35
|
+
</BODY>
|
36
|
+
</HTML>
|
data/test/test_file_1.odt
ADDED
Binary file
|
data/test/test_file_1.pdf
ADDED
Binary file
|
data/test/test_file_1.rtf
ADDED
@@ -0,0 +1,54 @@
|
|
1
|
+
{\rtf1\ansi\deff3\adeflang1025
|
2
|
+
{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\froman\fprq2\fcharset2 Symbol;}{\f2\fswiss\fprq2\fcharset0 Arial;}{\f3\froman\fprq2\fcharset128 Times New Roman;}{\f4\fswiss\fprq2\fcharset128 Arial;}{\f5\fmodern\fprq1\fcharset128 Droid Sans Mono;}{\f6\fnil\fprq2\fcharset128 Droid Sans;}{\f7\fmodern\fprq1\fcharset128 WenQuanYi Micro Hei;}{\f8\fnil\fprq2\fcharset128 Lohit Hindi;}{\f9\fnil\fprq0\fcharset128 Lohit Hindi;}{\f10\fmodern\fprq1\fcharset128 Lohit Hindi;}}
|
3
|
+
{\colortbl;\red0\green0\blue0;\red0\green0\blue128;\red128\green0\blue0;\red128\green128\blue128;}
|
4
|
+
{\stylesheet{\s0\snext0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\cf0\hich\af6\langfe2052\dbch\af8\afs24\alang1081\loch\f3\fs24\lang2057 Normal;}
|
5
|
+
{\*\cs15\snext15 Footnote Characters;}
|
6
|
+
{\*\cs16\snext16 Endnote Characters;}
|
7
|
+
{\*\cs17\snext17\cf2\ul\ulc0\langfe255\alang255\lang255 Internet Link;}
|
8
|
+
{\*\cs18\snext18\cf3\ul\ulc0\langfe255\alang255\lang255 Visited Internet Link;}
|
9
|
+
{\s19\sbasedon0\snext20\sb240\sa120\keepn\hich\af6\dbch\af8\afs28\loch\f4\fs28 Heading;}
|
10
|
+
{\s20\sbasedon0\snext20\sb0\sa120 Text body;}
|
11
|
+
{\s21\sbasedon20\snext21\sb0\sa120\dbch\af9 List;}
|
12
|
+
{\s22\sbasedon0\snext22\sb120\sa120\noline\i\dbch\af9\afs24\ai\fs24 Caption;}
|
13
|
+
{\s23\sbasedon0\snext23\noline\dbch\af9 Index;}
|
14
|
+
{\s24\sbasedon0\snext24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20 Preformatted Text;}
|
15
|
+
{\s25\sbasedon0\snext25\li567\ri0\lin567\rin0\fi0 List Contents;}
|
16
|
+
}{\info{\creatim\yr0\mo0\dy0\hr0\min0}{\revtim\yr0\mo0\dy0\hr0\min0}{\printim\yr0\mo0\dy0\hr0\min0}{\comment LibreOffice}{\vern3500}}\deftab709
|
17
|
+
|
18
|
+
{\*\pgdsctbl
|
19
|
+
{\pgdsc0\pgdscuse195\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\pgdscnxt0 Default;}}
|
20
|
+
\formshade\paperh16838\paperw11906\margl1134\margr1134\margt1134\margb1134\sectd\sbknone\sectunlocked1\pgndec\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\ftnbj\ftnstart1\ftnrstcont\ftnnar\aenddoc\aftnrstcont\aftnstart1\aftnnrlc
|
21
|
+
\pgndec\pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
22
|
+
Tensleep Corporation (Other OTC:TENS.PK - News) ("Tensleep") announced that with the acquisition of XSTV Media, Inc. ("XSTV"),}
|
23
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch
|
24
|
+
}{\rtlch \ltrch\loch
|
25
|
+
it will become an online independent sports company. The transaction is}
|
26
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
27
|
+
to close on or before September 15, 2007. Tensleep will, by the end of}
|
28
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
29
|
+
this week or early next week, call a special meeting of shareholders to}
|
30
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
31
|
+
approve the change name to "XSTV Corporation."}
|
32
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20\rtlch \ltrch\loch
|
33
|
+
|
34
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
35
|
+
David Bailey, an analyst at Gerard Klauer Mattison who can be contacted at david at bailey dot com, said such cuts "could include head count reductions."}
|
36
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
37
|
+
Layoffs to some degree are inevitable, said IDC analyst Roger Kay. For years,}
|
38
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch
|
39
|
+
}{\rtlch \ltrch\loch
|
40
|
+
the company enjoyed a lower cost structure than other PC makers because}
|
41
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
42
|
+
it sold computers directly.}
|
43
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch
|
44
|
+
}{\rtlch \ltrch\loch
|
45
|
+
International Star Inc. (info@ITSR.com - OTC BB: ILST) announced that the annual meeting of}
|
46
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch\loch
|
47
|
+
shareholders of International Star Inc. will be held on May 19, 2008,}
|
48
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch
|
49
|
+
}{\rtlch \ltrch\loch
|
50
|
+
at 3:00 p.m. (local time) at The Hilton Hotel, 104 Market Street,}
|
51
|
+
\par \pard\plain \s24\sb0\sa0\hich\af7\dbch\af10\afs20\loch\f5\fs20{\rtlch \ltrch
|
52
|
+
}{\rtlch \ltrch\loch
|
53
|
+
Shreveport, La., 71101. }
|
54
|
+
\par }
|
data/test/test_file_1.txt
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
Tensleep Corporation (Other OTC:TENS.PK - News) ("Tensleep") announced that with the acquisition of XSTV Media, Inc. ("XSTV"),
|
2
|
+
it will become an online independent sports company. The transaction is
|
3
|
+
to close on or before September 15, 2007. Tensleep will, by the end of
|
4
|
+
this week or early next week, call a special meeting of shareholders to
|
5
|
+
approve the change name to "XSTV Corporation."
|
6
|
+
|
7
|
+
David Bailey, an analyst at Gerard Klauer Mattison who can be contacted at david at bailey dot com, said such cuts "could include head count reductions."
|
8
|
+
Layoffs to some degree are inevitable, said IDC analyst Roger Kay. For years,
|
9
|
+
the company enjoyed a lower cost structure than other PC makers because
|
10
|
+
it sold computers directly.
|
11
|
+
International Star Inc. (info@ITSR.com - OTC BB: ILST) announced that the annual meeting of
|
12
|
+
shareholders of International Star Inc. will be held on May 19, 2008,
|
13
|
+
at 3:00 p.m. (local time) at The Hilton Hotel, 104 Market Street,
|
14
|
+
Shreveport, La., 71101.
|
metadata
CHANGED
@@ -1,84 +1,86 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: dover_to_calais
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
5
|
-
prerelease:
|
4
|
+
version: 0.2.0
|
6
5
|
platform: ruby
|
7
6
|
authors:
|
8
7
|
- Fred Heath
|
9
8
|
autorequire:
|
10
9
|
bindir: bin
|
11
10
|
cert_chain: []
|
12
|
-
date:
|
11
|
+
date: 2014-02-10 00:00:00.000000000 Z
|
13
12
|
dependencies:
|
14
13
|
- !ruby/object:Gem::Dependency
|
15
14
|
name: nokogiri
|
16
15
|
requirement: !ruby/object:Gem::Requirement
|
17
|
-
none: false
|
18
16
|
requirements:
|
19
17
|
- - ~>
|
20
18
|
- !ruby/object:Gem::Version
|
21
|
-
version: 1.6
|
19
|
+
version: '1.6'
|
22
20
|
type: :runtime
|
23
21
|
prerelease: false
|
24
22
|
version_requirements: !ruby/object:Gem::Requirement
|
25
|
-
none: false
|
26
23
|
requirements:
|
27
24
|
- - ~>
|
28
25
|
- !ruby/object:Gem::Version
|
29
|
-
version: 1.6
|
26
|
+
version: '1.6'
|
30
27
|
- !ruby/object:Gem::Dependency
|
31
28
|
name: eventmachine
|
32
29
|
requirement: !ruby/object:Gem::Requirement
|
33
|
-
none: false
|
34
30
|
requirements:
|
35
31
|
- - ~>
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '1.0'
|
34
|
+
- - ! '>='
|
36
35
|
- !ruby/object:Gem::Version
|
37
36
|
version: 1.0.3
|
38
37
|
type: :runtime
|
39
38
|
prerelease: false
|
40
39
|
version_requirements: !ruby/object:Gem::Requirement
|
41
|
-
none: false
|
42
40
|
requirements:
|
43
41
|
- - ~>
|
42
|
+
- !ruby/object:Gem::Version
|
43
|
+
version: '1.0'
|
44
|
+
- - ! '>='
|
44
45
|
- !ruby/object:Gem::Version
|
45
46
|
version: 1.0.3
|
46
47
|
- !ruby/object:Gem::Dependency
|
47
48
|
name: em-http-request
|
48
49
|
requirement: !ruby/object:Gem::Requirement
|
49
|
-
none: false
|
50
50
|
requirements:
|
51
51
|
- - ~>
|
52
52
|
- !ruby/object:Gem::Version
|
53
|
-
version: 1.1
|
53
|
+
version: '1.1'
|
54
54
|
type: :runtime
|
55
55
|
prerelease: false
|
56
56
|
version_requirements: !ruby/object:Gem::Requirement
|
57
|
-
none: false
|
58
57
|
requirements:
|
59
58
|
- - ~>
|
60
59
|
- !ruby/object:Gem::Version
|
61
|
-
version: 1.1
|
60
|
+
version: '1.1'
|
62
61
|
- !ruby/object:Gem::Dependency
|
63
62
|
name: yomu
|
64
63
|
requirement: !ruby/object:Gem::Requirement
|
65
|
-
none: false
|
66
64
|
requirements:
|
67
65
|
- - ~>
|
66
|
+
- !ruby/object:Gem::Version
|
67
|
+
version: '0.1'
|
68
|
+
- - ! '>='
|
68
69
|
- !ruby/object:Gem::Version
|
69
70
|
version: 0.1.9
|
70
71
|
type: :runtime
|
71
72
|
prerelease: false
|
72
73
|
version_requirements: !ruby/object:Gem::Requirement
|
73
|
-
none: false
|
74
74
|
requirements:
|
75
75
|
- - ~>
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
version: '0.1'
|
78
|
+
- - ! '>='
|
76
79
|
- !ruby/object:Gem::Version
|
77
80
|
version: 0.1.9
|
78
81
|
- !ruby/object:Gem::Dependency
|
79
82
|
name: bundler
|
80
83
|
requirement: !ruby/object:Gem::Requirement
|
81
|
-
none: false
|
82
84
|
requirements:
|
83
85
|
- - ~>
|
84
86
|
- !ruby/object:Gem::Version
|
@@ -86,7 +88,6 @@ dependencies:
|
|
86
88
|
type: :development
|
87
89
|
prerelease: false
|
88
90
|
version_requirements: !ruby/object:Gem::Requirement
|
89
|
-
none: false
|
90
91
|
requirements:
|
91
92
|
- - ~>
|
92
93
|
- !ruby/object:Gem::Version
|
@@ -94,26 +95,64 @@ dependencies:
|
|
94
95
|
- !ruby/object:Gem::Dependency
|
95
96
|
name: rake
|
96
97
|
requirement: !ruby/object:Gem::Requirement
|
97
|
-
none: false
|
98
98
|
requirements:
|
99
|
-
- -
|
99
|
+
- - ~>
|
100
100
|
- !ruby/object:Gem::Version
|
101
101
|
version: '0'
|
102
102
|
type: :development
|
103
103
|
prerelease: false
|
104
104
|
version_requirements: !ruby/object:Gem::Requirement
|
105
|
-
none: false
|
106
105
|
requirements:
|
107
|
-
- -
|
106
|
+
- - ~>
|
108
107
|
- !ruby/object:Gem::Version
|
109
108
|
version: '0'
|
109
|
+
- !ruby/object:Gem::Dependency
|
110
|
+
name: cucumber
|
111
|
+
requirement: !ruby/object:Gem::Requirement
|
112
|
+
requirements:
|
113
|
+
- - ~>
|
114
|
+
- !ruby/object:Gem::Version
|
115
|
+
version: '1.3'
|
116
|
+
- - ! '>='
|
117
|
+
- !ruby/object:Gem::Version
|
118
|
+
version: 1.3.8
|
119
|
+
type: :development
|
120
|
+
prerelease: false
|
121
|
+
version_requirements: !ruby/object:Gem::Requirement
|
122
|
+
requirements:
|
123
|
+
- - ~>
|
124
|
+
- !ruby/object:Gem::Version
|
125
|
+
version: '1.3'
|
126
|
+
- - ! '>='
|
127
|
+
- !ruby/object:Gem::Version
|
128
|
+
version: 1.3.8
|
129
|
+
- !ruby/object:Gem::Dependency
|
130
|
+
name: rspec
|
131
|
+
requirement: !ruby/object:Gem::Requirement
|
132
|
+
requirements:
|
133
|
+
- - ~>
|
134
|
+
- !ruby/object:Gem::Version
|
135
|
+
version: '2.14'
|
136
|
+
- - ! '>='
|
137
|
+
- !ruby/object:Gem::Version
|
138
|
+
version: 2.14.1
|
139
|
+
type: :development
|
140
|
+
prerelease: false
|
141
|
+
version_requirements: !ruby/object:Gem::Requirement
|
142
|
+
requirements:
|
143
|
+
- - ~>
|
144
|
+
- !ruby/object:Gem::Version
|
145
|
+
version: '2.14'
|
146
|
+
- - ! '>='
|
147
|
+
- !ruby/object:Gem::Version
|
148
|
+
version: 2.14.1
|
110
149
|
description: ! "DoverToCalais allows the user to send a wide range of data sources
|
111
150
|
(files & URLs)\n to OpenCalais and receive asynchronous
|
112
151
|
responses when OpenCalais has finished processing\n the
|
113
152
|
inputs. In addition, DoverToCalais enables the filtering of the response in order
|
114
153
|
to\n find relevant tags and/or tag values. "
|
115
154
|
email:
|
116
|
-
-
|
155
|
+
- fred_h@bootstrap.me.uk
|
117
156
|
executables: []
|
118
157
|
extensions: []
|
119
158
|
extra_rdoc_files: []
|
@@ -125,32 +164,51 @@ files:
|
|
125
164
|
- README.md
|
126
165
|
- Rakefile
|
127
166
|
- dover_to_calais.gemspec
|
167
|
+
- features/data_sources.feature
|
168
|
+
- features/filtering.feature
|
169
|
+
- features/step_definitions/data_sources_steps.rb
|
170
|
+
- features/step_definitions/filtering_steps.rb
|
128
171
|
- lib/dover_to_calais.rb
|
129
172
|
- lib/dover_to_calais/ontology.rb
|
130
173
|
- lib/dover_to_calais/version.rb
|
131
|
-
|
174
|
+
- test/test_file_1.doc
|
175
|
+
- test/test_file_1.html
|
176
|
+
- test/test_file_1.odt
|
177
|
+
- test/test_file_1.pdf
|
178
|
+
- test/test_file_1.rtf
|
179
|
+
- test/test_file_1.txt
|
180
|
+
homepage: https://github.com/RedFred7/dover_to_calais
|
132
181
|
licenses:
|
133
182
|
- MIT
|
183
|
+
metadata: {}
|
134
184
|
post_install_message:
|
135
185
|
rdoc_options: []
|
136
186
|
require_paths:
|
137
187
|
- lib
|
138
188
|
required_ruby_version: !ruby/object:Gem::Requirement
|
139
|
-
none: false
|
140
189
|
requirements:
|
141
190
|
- - ! '>='
|
142
191
|
- !ruby/object:Gem::Version
|
143
192
|
version: '0'
|
144
193
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
145
|
-
none: false
|
146
194
|
requirements:
|
147
195
|
- - ! '>='
|
148
196
|
- !ruby/object:Gem::Version
|
149
197
|
version: '0'
|
150
198
|
requirements: []
|
151
199
|
rubyforge_project:
|
152
|
-
rubygems_version:
|
200
|
+
rubygems_version: 2.2.0
|
153
201
|
signing_key:
|
154
|
-
specification_version:
|
202
|
+
specification_version: 4
|
155
203
|
summary: An easy-to-use wrapper round the OpenCalais semantic analysis web service.
|
156
|
-
test_files:
|
204
|
+
test_files:
|
205
|
+
- features/data_sources.feature
|
206
|
+
- features/filtering.feature
|
207
|
+
- features/step_definitions/data_sources_steps.rb
|
208
|
+
- features/step_definitions/filtering_steps.rb
|
209
|
+
- test/test_file_1.doc
|
210
|
+
- test/test_file_1.html
|
211
|
+
- test/test_file_1.odt
|
212
|
+
- test/test_file_1.pdf
|
213
|
+
- test/test_file_1.rtf
|
214
|
+
- test/test_file_1.txt
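Several dependency entries in the metadata diff above now pair a pessimistic `~>` bound with a `>=` floor (for example, eventmachine with `~> 1.0` and `>= 1.0.3`). As a small sketch of how RubyGems evaluates such a compound requirement, the snippet below uses `Gem::Requirement` and `Gem::Version` from the standard RubyGems library. The version strings being checked are arbitrary samples chosen for illustration, not values from the gem.

```ruby
require 'rubygems'

# Compound constraint as recorded for eventmachine in the metadata.
req = Gem::Requirement.new('~> 1.0', '>= 1.0.3')

# Sample versions (illustrative only) checked against both constraints at once.
%w[1.0.2 1.0.3 1.4.0 2.0.0].each do |v|
  puts "#{v}: #{req.satisfied_by?(Gem::Version.new(v))}"
end
```

Both constraints must hold, so the `>=` floor excludes early 1.0.x releases while `~> 1.0` still caps the range below 2.0.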
|