bookshark 1.0.0.alpha.5 → 1.0.0.alpha.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 818b101a82314fcff676b111f3e407821a9f0dc5
4
- data.tar.gz: 12dd9368013a7911b4ce8c3f799db90e42c80671
3
+ metadata.gz: 86da0016db68373dc847457106fcddacbd8a62f2
4
+ data.tar.gz: f1127ec79a44ce2e6ba297bff7af9cccb686597c
5
5
  SHA512:
6
- metadata.gz: b937fcae31844c3742ff6ad00ef91e27a24fe2d8f67c2319fbafeab5db16e1fdc382e26fb37a9d1409ba446fefe8c4407b9cc3d7d95038da6533e09aac2a8900
7
- data.tar.gz: 70932725282459f1b6c161517630c5a8edd364efca16d4f2e41b1170ebba721368b9fef14eab7d9fa710c2ebd2e099374d76f53ee15c40b9df0dfa99799eab0e
6
+ metadata.gz: b645206269b91d344d392fcbad2d547210bbcd57d3628d8f60ce2e31638a030cb3975326acd057b29d9b4739e30396885243c701a45662b3c4d5450b774a2e4d
7
+ data.tar.gz: 1529775eb72d0fc3277491db186c4d1e6cdcf2284f8fabca8b18c4c75f5d7a6ce2b79df87a57a806ca744c2aaf15ba2aa44e4c0e1f5f8a473bcafcc12cd86c07
@@ -0,0 +1,8 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.2.0
4
+ - 2.1.0
5
+ - 2.0.0
6
+ - 1.9.3
7
+ - rbx-2
8
+ - rbx-2.5.2
data/README.md CHANGED
@@ -5,6 +5,8 @@ A ruby library for book metadata extraction from biblionet.gr which
5
5
  extracts books, authors, publishers and ddc metadata.
6
6
  The representation of bibliographic metadata in JSON is inspired by [BibJSON](http://okfnlabs.org/bibjson/) but some tags may be different.
7
7
 
8
+ [![Build Status](https://travis-ci.org/dklisiaris/bookshark.svg?branch=master)](https://travis-ci.org/dklisiaris/bookshark)
9
+
8
10
  ## Installation
9
11
 
10
12
  Add this line to your application's Gemfile:
@@ -22,6 +24,7 @@ Or install it yourself as:
22
24
  $ gem install bookshark --pre
23
25
 
24
26
  Require and include bookshark in your class/module.
27
+
25
28
  ```ruby
26
29
  require 'bookshark'
27
30
  class Foo
@@ -29,6 +32,7 @@ class Foo
29
32
  end
30
33
  ```
31
34
  Alternatively you can use this syntax
35
+
32
36
  ```ruby
33
37
  Bookshark::Extractor.new
34
38
 
@@ -41,6 +45,7 @@ Extractor.new
41
45
  An extractor object must be created in order to perform any metadata extractions.
42
46
 
43
47
  Create an extractor object
48
+
44
49
  ```ruby
45
50
  Extractor.new
46
51
  Extractor.new(format: 'json')
@@ -58,15 +63,24 @@ Extractor.new(format: 'hash', site: 'biblionet')
58
63
  ### Extract Book Data
59
64
 
60
65
  You need book's id on biblionet website or its uri.
61
- Currently more advanced search functions based on title and author are not available, but they will be until the stable version 1.0.0 release.
62
-
63
66
  First create an extractor object:
67
+
64
68
  ```ruby
65
69
  # Create a new extractor object with pretty json format.
66
70
  extractor = Extractor.new(format: 'pretty_json')
67
71
  ```
68
72
  Then you can extract books
73
+
69
74
  ```ruby
75
+ # Extract book with isbn 960-14-1157-7 from website
76
+ extractor.book(isbn: '960-14-1157-7')
77
+
78
+ # ISBN-13 also works
79
+ extractor.book(isbn: '978-960-14-1157-6')
80
+
81
+ # ISBN without any dashes is ok too
82
+ extractor.book(isbn: '9789601411576')
83
+
70
84
  # Extract book with id 103788 from website
71
85
  extractor.book(id: 103788)
72
86
 
@@ -76,6 +90,7 @@ extractor.book(uri: 'http://biblionet.gr/book/103788/')
76
90
  # Extract book with id 103788 from local storage
77
91
  extractor.book(id: 103788, local: true)
78
92
  ```
93
+ To look up a book by other fields, such as its title or author, use the search method described below.
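The README says that ISBN-10, ISBN-13 and undashed ISBNs all work, and the spec in this diff expects a nonsense ISBN to produce an empty book. To reject malformed input before it triggers a web search, you could run a local check-digit test first. The helpers below are a hypothetical sketch and are not part of bookshark. They rely only on the standard ISBN-10 (mod 11) and ISBN-13 (mod 10) check-digit rules, and they show that 960-14-1157-7 and 978-960-14-1157-6 are the same book number.

```ruby
# Hypothetical helpers (not part of bookshark): normalise an ISBN and map
# ISBN-10 to ISBN-13 so both spellings of the same book compare equal.

# Drop dashes and whitespace; upcase so a trailing 'x' check digit becomes 'X'.
def normalize_isbn(isbn)
  isbn.to_s.gsub(/[-\s]/, '').upcase
end

# ISBN-10: weights 10 down to 1; the weighted sum must be divisible by 11.
# An 'X' is only allowed as the final (check) character and counts as 10.
def valid_isbn10?(digits)
  return false unless digits =~ /\A\d{9}[\dX]\z/
  sum = 0
  digits.each_char.with_index do |c, i|
    value = c == 'X' ? 10 : c.to_i
    sum += value * (10 - i)
  end
  sum % 11 == 0
end

# ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
def isbn13_check_digit(first12)
  sum = 0
  first12.each_char.with_index { |c, i| sum += c.to_i * (i.even? ? 1 : 3) }
  (10 - sum % 10) % 10
end

def valid_isbn13?(digits)
  !!(digits =~ /\A\d{13}\z/) && isbn13_check_digit(digits[0, 12]) == digits[-1].to_i
end

# Returns the 13-digit form, or nil if the input is not a valid ISBN.
def to_isbn13(isbn)
  digits = normalize_isbn(isbn)
  if valid_isbn10?(digits)
    first12 = '978' + digits[0, 9]
    first12 + isbn13_check_digit(first12).to_s
  elsif valid_isbn13?(digits)
    digits
  end
end

puts to_isbn13('960-14-1157-7')      # => 9789601411576
puts to_isbn13('978-960-14-1157-6')  # => 9789601411576
p to_isbn13('wrong-isbn')            # => nil
```

A caller could then call `extractor.book(isbn: isbn)` only when `to_isbn13(isbn)` is non-nil, and pass the original string through unchanged.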
79
94
 
80
95
  **Book Options**
81
96
  (The recommended option is to use just the id and let bookshark generate the uri):
@@ -102,6 +117,7 @@ extractor.book(id: 103788, eager: true)
102
117
  ```
103
118
 
104
119
  The expected result of a book extraction is something like this:
120
+
105
121
  ```json
106
122
  {
107
123
  "book": [
@@ -131,7 +147,6 @@ The expected result of a book extraction is something like this:
131
147
  "name": "Εκδοτικός Οίκος Α. Α. Λιβάνη",
132
148
  "b_id": "271"
133
149
  },
134
-
135
150
  "publication_year": "2006",
136
151
  "pages": "326",
137
152
  "isbn": "960-14-1157-7",
@@ -139,7 +154,6 @@ The expected result of a book extraction is something like this:
139
154
  "status": "Κυκλοφορεί",
140
155
  "price": "16,31",
141
156
  "award": [
142
-
143
157
  ],
144
158
  "description": "Τι είναι πιο επικίνδυνο, ένα όπλο ή μια πισίνα; Τι κοινό έχουν οι δάσκαλοι με τους παλαιστές του σούμο;...",
145
159
  "category": [
@@ -156,14 +170,109 @@ The expected result of a book extraction is something like this:
156
170
  ```
157
171
  Here is a [Book Sample](https://gist.github.com/dklisiaris/a6f3d6f37806186f3c79) extracted with eager option enabled.
158
172
 
173
+ ### Book Search
174
+ Instead of providing an exact book id and extracting that book directly, you can use the search method to find one or more books that match some parameters.
175
+
176
+ ```ruby
177
+ # Create a new extractor object with pretty json format.
178
+ extractor = Extractor.new(format: 'pretty_json')
179
+
180
+ # Extract books with these words in title
181
+ extractor.search(title: 'σημεια και τερατα')
182
+
183
+ # Extract books with these words in title and this name in author
184
+ extractor.search(title: 'χομπιτ', author: 'τολκιν', results_type: 'metadata')
185
+
186
+ # Extract books by a specific author, published in 2010 or later
187
+ extractor.search(author: 'arthur doyle', after_year: '2010')
188
+
189
+ # Extract only the ids of books with these words in title and this name in author
190
+ extractor.search(title: 'αρχοντας', author: 'τολκιν', results_type: 'ids')
191
+ ```
192
+ Searching for and extracting many books can be slow, so you may prefer to retrieve only the ids of the books found instead of extracting every single one. In that case, pass the option `results_type: 'ids'`.
193
+
194
+ **Search Options**:
195
+ By combining options you can tailor the query to your needs. Using at least two of the search options is recommended.
196
+
197
+ * title (The title of the book to search for)
198
+ * author (The author's last name is enough to filter the search)
199
+ * publisher
200
+ * category
201
+ * title_split
202
+ * 0 (The exact title phrase must be matched)
203
+ * 1 (Default - All the words in title must be matched in whatever order)
204
+ * 2 (At least one word should match)
205
+ * book_id (Providing an id means only one book should be returned)
206
+ * isbn
207
+ * author_id (ID of the selected author)
208
+ * publisher_id
209
+ * category_id
210
+ * after_year (Published this year or later)
211
+ * before_year (Published this year or before)
212
+ * results_type
213
+ * metadata (Default - Every book is extracted and an array of metadata is returned)
214
+ * ids (Only ids are returned)
215
+ * format : The format in which the extracted data are returned
216
+ * hash (default)
217
+ * json
218
+ * pretty_json
219
+
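The option list above has a few closed sets of values (`title_split`, `results_type`, `format`), a year range, and a recommendation to combine at least two options. The following is a hypothetical pre-flight check that encodes only what this list documents and is not part of bookshark. It assumes that `results_type` and `format` do not count toward the "at least two options" advice, because they shape the output rather than narrow the query.

```ruby
# Hypothetical pre-flight check for the documented search options.
# It only encodes the values listed in the README; bookshark itself may
# accept, coerce or ignore other input.

ALLOWED_VALUES = {
  title_split:  %w[0 1 2],
  results_type: %w[metadata ids],
  format:       %w[hash json pretty_json]
}

# Options that narrow the query (assumption: results_type and format
# do not count toward the "at least two options" advice).
FILTER_KEYS = [:title, :author, :publisher, :category, :book_id, :isbn,
               :author_id, :publisher_id, :category_id,
               :after_year, :before_year]

def search_option_problems(options)
  problems = []
  ALLOWED_VALUES.each do |key, values|
    next unless options.key?(key)
    # Compare as strings so title_split: 1 and title_split: '1' both pass.
    unless values.include?(options[key].to_s)
      problems << "#{key} must be one of #{values.join(', ')}"
    end
  end
  after, before = options[:after_year], options[:before_year]
  if after && before && after.to_i > before.to_i
    problems << 'after_year is later than before_year'
  end
  if FILTER_KEYS.count { |k| options.key?(k) } < 2
    problems << 'consider combining at least two search options'
  end
  problems
end

p search_option_problems(author: 'arthur doyle', after_year: '2010')  # => []
p search_option_problems(title: 'hobbit', results_type: 'id')
```

Returning a list of messages instead of raising lets a caller decide whether a weak query is merely a warning or an error.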
220
+ Results with the ids option look like this:
221
+
222
+ ```json
223
+ {
224
+ "book": [
225
+ "119000",
226
+ "103788",
227
+ "87815",
228
+ "87812",
229
+ "15839",
230
+ "77381",
231
+ "46856",
232
+ "46763",
233
+ "33301"
234
+ ]
235
+ }
236
+ ```
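In the ids-only result above, the ids are JSON strings, while the book examples in this README pass integer ids to `extractor.book(id: ...)`. Here is a minimal sketch of turning an ids-only JSON result (requested with `format: 'json'`) back into integer ids with Ruby's standard `json` library. The extraction loop is left commented out because it needs network access, and fetching only the first few books is an illustrative choice.

```ruby
require 'json'

# A shortened copy of the ids-only result shown above, as a JSON string.
ids_json = '{"book": ["119000", "103788", "87815"]}'

# The ids arrive as strings; convert them to integers, matching the
# extractor.book(id: 103788) calls shown earlier in this README.
ids = JSON.parse(ids_json)['book'].map(&:to_i)
p ids  # => [119000, 103788, 87815]

# Fetching full records one at a time (needs network access):
# extractor = Bookshark::Extractor.new(format: 'hash')
# books = ids.first(3).map { |id| extractor.book(id: id) }
```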
237
+ By default (`results_type: 'metadata'`), the results are full book records, like those returned by the book extractor:
238
+
239
+ ```json
240
+ {
241
+ "book": [
242
+ {
243
+ "title": "Στης Χλόης τα απόκρυφα",
244
+ "subtitle": "…και άλλα σημεία και τέρατα",
245
+ "... Rest of Metadata ...": "... condensed ..."
246
+ },
247
+ {
248
+ "title": "Σημεία και τέρατα της οικονομίας",
249
+ "subtitle": "Η κρυφή πλευρά των πάντων",
250
+ "... Rest of Metadata ...": "... condensed ..."
251
+ },
252
+ {
253
+ "title": "Και άλλα σημεία και τέρατα από την ιστορία",
254
+ "subtitle": null,
255
+ "... Rest of Metadata ...": "... condensed ..."
256
+ },
257
+ {
258
+ "title": "Σημεία και τέρατα από την ιστορία",
259
+ "subtitle": null,
260
+ "... Rest of Metadata ...": "... condensed ..."
261
+ }
262
+ ]
263
+ }
264
+ ```
265
+
159
266
  ### Extract Author Data
160
267
 
161
268
  You need author's id on biblionet website or his uri
269
+
162
270
  ```ruby
163
271
  Extractor.new.author(id: 10207)
164
272
  Extractor.new(format: 'json').author(uri: 'http://www.biblionet.gr/author/10207/')
165
273
  ```
166
274
  Extraction from local saved html pages is also possible, but not recommended
275
+
167
276
  ```ruby
168
277
  extractor = Extractor.new(format: 'json')
169
278
  extractor.author(uri: 'storage/html_author_pages/2/author_2423.html', local: true)
@@ -174,6 +283,7 @@ extractor.author(uri: 'storage/html_author_pages/2/author_2423.html', local: tru
174
283
  * local : Boolean value. Has page been saved locally? (default is false)
175
284
 
176
285
  The expected result of an author extraction is something like this:
286
+
177
287
  ```json
178
288
  {
179
289
  "author": [
@@ -200,6 +310,7 @@ So, it is easy to include metadata for multiple authors or even for multiple typ
200
310
 
201
311
  ### Extract Publisher Data
202
312
  The methods are much the same as for authors:
313
+
203
314
  ```ruby
204
315
  # Create a new extractor object with pretty json format.
205
316
  extractor = Extractor.new(format: 'pretty_json')
@@ -224,6 +335,7 @@ extractor.publisher(id: 20, local: true)
224
335
  * pretty_json
225
336
 
226
337
  The expected result of a publisher extraction is something like this:
338
+
227
339
  ```json
228
340
  {
229
341
  "publisher": [
@@ -271,6 +383,7 @@ The expected result of an author extraction is something like this:
271
383
  ```
272
384
  ### Extract Categories
273
385
  Biblionet's categories are based on [Dewey Decimal Classification](http://en.wikipedia.org/wiki/Dewey_Decimal_Classification). It is possible to extract these categories also as seen below.
386
+
274
387
  ```ruby
275
388
  # Create a new extractor object with pretty json format.
276
389
  extractor = Extractor.new(format: 'pretty_json')
@@ -298,6 +411,7 @@ Notice that when you are extracting a category you also extract parent categorie
298
411
 
299
412
  The expected result of a category extraction is something like this:
300
413
  (Here the extracted category is 1041, but its parent and sub categories were also extracted.)
414
+
301
415
  ```json
302
416
  {
303
417
  "category": [
@@ -341,98 +455,8 @@ The expected result of a category extraction is something like this:
341
455
  }
342
456
  ]
343
457
  }
344
- Notice that the last item is the current category. The rest is the category tree.
345
-
346
- ```
347
- ### Book Search
348
- Instead of providing the exact book id and extract that book directly, a search function can be used to get one or more books based on some parameters.
349
- ```ruby
350
- # Create a new extractor object with pretty json format.
351
- extractor = Extractor.new(format: 'pretty_json')
352
-
353
- # Extract books with these words in title
354
- extractor.search(title: 'σημεια και τερατα')
355
-
356
- # Extract books with these words in title and this name in author
357
- extractor.search(title: 'χομπιτ', author: 'τολκιν', results_type: 'metadata')
358
-
359
- # Extract books from specific author, published after 1984
360
- extractor.search(author: 'arthur doyle', after_year: '2010')
361
-
362
- # Extract ids of books books with these words in title and this name in author
363
- extractor.search(title: 'αρχοντας', author: 'τολκιν', results_type: 'ids')
364
- ```
365
- Searching and extracting several books can be very slow at times, so instead of extracting every single book you may prefer only the ids of found books. In that case pass the option `results_type: 'ids'`.
366
-
367
- **Search Options**:
368
- With enought options you can customize your query to your needs. It is recommended to use at least two of the search options.
369
-
370
- * title (The title of book to search)
371
- * author (The author's last name is enough for filter the search)
372
- * publisher
373
- * category
374
- * title_split
375
- * 0 (The exact title phrase must by matched)
376
- * 1 (Default - All the words in title must be matched in whatever order)
377
- * 2 (At least one word should match)
378
- * book_id (Providing id means only one book should returned)
379
- * isbn
380
- * author_id (ID of the selected author)
381
- * publisher_id
382
- * category_id
383
- * after_year (Published this year or later)
384
- * before_year (Published this year or before)
385
- * results_type
386
- * metadata (Default - Every book is extracted and an array of metadata is returned)
387
- * ids (Only ids are returned)
388
- * format : The format in which the extracted data are returned
389
- * hash (default)
390
- * json
391
- * pretty_json
392
-
393
- Results with ids option look like that:
394
- ```json
395
- {
396
- "book": [
397
- "119000",
398
- "103788",
399
- "87815",
400
- "87812",
401
- "15839",
402
- "77381",
403
- "46856",
404
- "46763",
405
- "33301"
406
- ]
407
- }
408
- ```
409
- Normally results are multiple books like the ones in book extractors:
410
- ```json
411
- {
412
- "book": [
413
- {
414
- "title": "Στης Χλόης τα απόκρυφα",
415
- "subtitle": "…και άλλα σημεία και τέρατα",
416
- "... Rest of Metadata ...": "... condensed ..."
417
- },
418
- {
419
- "title": "Σημεία και τέρατα της οικονομίας",
420
- "subtitle": "Η κρυφή πλευρά των πάντων",
421
- "... Rest of Metadata ...": "... condensed ..."
422
- },
423
- {
424
- "title": "Και άλλα σημεία και τέρατα από την ιστορία",
425
- "subtitle": null,
426
- "... Rest of Metadata ...": "... condensed ..."
427
- },
428
- {
429
- "title": "Σημεία και τέρατα από την ιστορία",
430
- "subtitle": null,
431
- "... Rest of Metadata ...": "... condensed ..."
432
- }
433
- ]
434
- }
435
458
  ```
459
+ Notice that the last item is the current category. The rest is the category tree.
436
460
 
437
461
  ### Where do IDs point?
438
462
  The id of each data type points to the corresponding type webpage.
data/Rakefile CHANGED
@@ -2,3 +2,4 @@ require "bundler/gem_tasks"
2
2
 
3
3
  Dir.glob('tasks/**/*.rake').each(&method(:import))
4
4
 
5
+ task :default => :spec
@@ -13,6 +13,8 @@ Gem::Specification.new do |spec|
13
13
  spec.homepage = "https://github.com/dklisiaris/bookshark"
14
14
  spec.license = "MIT"
15
15
 
16
+ spec.required_ruby_version = '>= 1.9.3'
17
+
16
18
  spec.files = `git ls-files -z`.split("\x0")
17
19
  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
18
20
  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
@@ -23,7 +25,7 @@ Gem::Specification.new do |spec|
23
25
  spec.add_dependency "json", "~> 1.8"
24
26
  spec.add_dependency "htmlentities", "~> 4.3"
25
27
 
26
- spec.add_development_dependency "bundler", "~> 1.7"
28
+ spec.add_development_dependency "bundler", ">= 1.3"
27
29
  spec.add_development_dependency "rake", "~> 10.0"
28
30
  spec.add_development_dependency 'rspec', "~> 3.1"
29
31
  end
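The gemspec now declares `required_ruby_version = '>= 1.9.3'`, and the new .travis.yml tests against Ruby 2.2.0 down to 1.9.3 plus two Rubinius entries. As a small consistency check, the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` to confirm that every MRI version in that matrix satisfies the declared requirement. The version list was copied by hand from the .travis.yml above. Skipping the `rbx-*` entries is an assumption, since they name Rubinius releases rather than Ruby language versions.

```ruby
require 'rubygems'

# Ruby versions from the rvm matrix in the .travis.yml above.
TRAVIS_RUBIES = %w[2.2.0 2.1.0 2.0.0 1.9.3 rbx-2 rbx-2.5.2]

# Mirrors spec.required_ruby_version = '>= 1.9.3' from the gemspec.
REQUIREMENT = Gem::Requirement.new('>= 1.9.3')

# Keep only plain numeric version strings; rbx-* entries are skipped.
def mri_versions(list)
  list.select { |v| v =~ /\A\d+(\.\d+)*\z/ }
end

# Versions in the matrix that the gemspec requirement would reject.
def unsupported(list, requirement)
  mri_versions(list).reject { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

p unsupported(TRAVIS_RUBIES, REQUIREMENT)  # => []
```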
@@ -17,9 +17,10 @@ module Bookshark
17
17
  }
18
18
 
19
19
  def self.root
20
- File.dirname __dir__
20
+ # File.dirname __dir__ # Works only on ruby >= 2.0.0
21
+ File.expand_path(File.join(File.dirname(__FILE__), '../'))
21
22
  end
22
-
23
+
23
24
  def self.path_to_storage
24
25
  File.join root, 'lib/bookshark/storage'
25
26
  end
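The `self.root` change drops `File.dirname __dir__` because `Kernel#__dir__` only exists from Ruby 2.0, while the gemspec now allows 1.9.3. The sketch below uses a hypothetical absolute path to check that the new `File.expand_path` form and a `__dir__`-free equivalent of the old form return the same directory. It does not cover one difference: `__dir__` resolves symlinks, whereas `File.expand_path` does not.

```ruby
# Form used in the diff above: works on Ruby 1.9.3 and later.
def root_compat(file)
  File.expand_path(File.join(File.dirname(file), '../'))
end

# What File.dirname(__dir__) computes for a file, written without __dir__
# (and without __dir__'s symlink resolution).
def root_dir_style(file)
  File.dirname(File.dirname(File.expand_path(file)))
end

path = '/opt/gems/bookshark/lib/bookshark.rb'  # hypothetical install path
puts root_compat(path)     # => /opt/gems/bookshark
puts root_dir_style(path)  # => /opt/gems/bookshark
```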
@@ -66,10 +67,15 @@ module Bookshark
66
67
 
67
68
  def book(options = {})
68
69
  book_extractor = Biblionet::Extractors::BookExtractor.new
70
+
71
+ if book_extractor.present?(options[:isbn])
72
+ search_engine = Biblionet::Extractors::Search.new
73
+ options[:id] = search_engine.search_by_isbn(options[:isbn])
74
+ end
69
75
 
70
76
  uri = process_options(options, __method__)
71
77
  options[:format] ||= @format
72
- options[:eager] ||= false
78
+ options[:eager] ||= false
73
79
 
74
80
  if options[:eager]
75
81
  book = eager_extract_book(uri)
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require_relative 'base'
2
5
 
3
6
  module Biblionet
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require 'rubygems'
2
5
  require 'open-uri'
3
6
  require 'fileutils'
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require_relative 'base'
2
5
  require 'sanitize'
3
6
 
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require_relative 'base'
2
5
 
3
6
  module Biblionet
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require_relative 'base'
2
5
 
3
6
  module Biblionet
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require_relative 'book_extractor'
2
5
 
3
6
  module Biblionet
@@ -40,6 +43,11 @@ module Biblionet
40
43
  return books
41
44
  end
42
45
 
46
+ def search_by_isbn(isbn)
47
+ results = perform_search(isbn: isbn, results_type: 'ids')
48
+ book_id = results.empty? ? nil : results.first.to_i
49
+ end
50
+
43
51
  def build_search_url(options = {})
44
52
  title = present?(options[:title]) ? options[:title].gsub(' ','+') : ''
45
53
  author = present?(options[:author]) ? options[:author].gsub(' ','+') : ''
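`Extractor#book` now resolves an `isbn:` option to an id through `search_by_isbn`. That method runs an ids-only search and keeps the first hit as an integer, or returns nil when nothing matches. The sketch below isolates that logic with `perform_search` stubbed out. `FakeSearch` and its canned results are hypothetical, and the method body mirrors the diff with the unused `book_id` local dropped. The sketch shows two behaviours. First, a nonsense ISBN yields nil, and the spec in this diff expects that case to end in an empty book. Second, when a search returns several ids, only the first is used.

```ruby
# Hypothetical stand-in for Biblionet::Extractors::Search: perform_search
# returns canned ids instead of querying biblionet.gr.
class FakeSearch
  def initialize(canned_ids)
    @canned_ids = canned_ids
  end

  # Ids come back as strings, as in the README's ids example.
  def perform_search(options = {})
    @canned_ids
  end

  # Same logic as search_by_isbn in the diff: first hit as an Integer, or nil.
  def search_by_isbn(isbn)
    results = perform_search(isbn: isbn, results_type: 'ids')
    results.empty? ? nil : results.first.to_i
  end
end

p FakeSearch.new(['103788']).search_by_isbn('960-14-1157-7')  # => 103788
p FakeSearch.new([]).search_by_isbn('wrong-isbn')             # => nil
```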
@@ -1,3 +1,3 @@
1
1
  module Bookshark
2
- VERSION = "1.0.0.alpha.5"
2
+ VERSION = "1.0.0.alpha.7"
3
3
  end
@@ -1,3 +1,6 @@
1
+ #!/bin/env ruby
2
+ # encoding: utf-8
3
+
1
4
  require 'spec_helper'
2
5
 
3
6
  describe Bookshark::Extractor do
@@ -120,12 +123,25 @@ describe Bookshark::Extractor do
120
123
  it 'reads html from the web and eager extracts all book and reference data' do
121
124
  expect(subject.book(id: 184923, eager: true)).to eq eager_book_184923
122
125
  end
126
+
127
+ it 'reads html from the web based on given isbn and extracts book data' do
128
+ expect(subject.book(isbn: '960-14-1157-7')).to eq book_103788
129
+ end
130
+
131
+ it 'reads html from the web based on given isbn-13 and extracts book data' do
132
+ expect(subject.book(isbn: '978-960-14-1157-6')).to eq book_103788
133
+ end
123
134
  end
124
135
  context 'when the book doesnt exist' do
125
136
  it 'returns an empty array' do
126
137
  expect(subject.book(id: 0)).to eq empty_book
127
138
  end
128
139
  end
140
+ context 'when the books isbn is nonsense' do
141
+ it 'returns an empty array' do
142
+ expect(subject.book(isbn: 'wrong-isbn')).to eq empty_book
143
+ end
144
+ end
129
145
  context 'when no options are set' do
130
146
  it 'returns an empty array' do
131
147
  expect(subject.book).to eq empty_book
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bookshark
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.0.0.alpha.5
4
+ version: 1.0.0.alpha.7
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dimitris Klisiaris
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-03-24 00:00:00.000000000 Z
11
+ date: 2015-03-26 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -76,16 +76,16 @@ dependencies:
76
76
  name: bundler
77
77
  requirement: !ruby/object:Gem::Requirement
78
78
  requirements:
79
- - - "~>"
79
+ - - ">="
80
80
  - !ruby/object:Gem::Version
81
- version: '1.7'
81
+ version: '1.3'
82
82
  type: :development
83
83
  prerelease: false
84
84
  version_requirements: !ruby/object:Gem::Requirement
85
85
  requirements:
86
- - - "~>"
86
+ - - ">="
87
87
  - !ruby/object:Gem::Version
88
- version: '1.7'
88
+ version: '1.3'
89
89
  - !ruby/object:Gem::Dependency
90
90
  name: rake
91
91
  requirement: !ruby/object:Gem::Requirement
@@ -123,6 +123,7 @@ extra_rdoc_files: []
123
123
  files:
124
124
  - ".gitignore"
125
125
  - ".rspec"
126
+ - ".travis.yml"
126
127
  - Gemfile
127
128
  - LICENSE.txt
128
129
  - README.md
@@ -170,7 +171,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
170
171
  requirements:
171
172
  - - ">="
172
173
  - !ruby/object:Gem::Version
173
- version: '0'
174
+ version: 1.9.3
174
175
  required_rubygems_version: !ruby/object:Gem::Requirement
175
176
  requirements:
176
177
  - - ">"