bookshark 1.0.0.alpha.5 → 1.0.0.alpha.7

Files changed (15)

checksums.yaml +4 -4
data/.travis.yml +8 -0
data/README.md +119 -95
data/Rakefile +1 -0
data/bookshark.gemspec +3 -1
data/lib/bookshark.rb +9 -3
data/lib/bookshark/extractors/author_extractor.rb +3 -0
data/lib/bookshark/extractors/base.rb +3 -0
data/lib/bookshark/extractors/book_extractor.rb +3 -0
data/lib/bookshark/extractors/category_extractor.rb +3 -0
data/lib/bookshark/extractors/publisher_extractor.rb +3 -0
data/lib/bookshark/extractors/search.rb +8 -0
data/lib/bookshark/version.rb +1 -1
data/spec/bookshark_spec.rb +16 -0
metadata +8 -7

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 818b101a82314fcff676b111f3e407821a9f0dc5
-  data.tar.gz: 12dd9368013a7911b4ce8c3f799db90e42c80671
+  metadata.gz: 86da0016db68373dc847457106fcddacbd8a62f2
+  data.tar.gz: f1127ec79a44ce2e6ba297bff7af9cccb686597c
 SHA512:
-  metadata.gz: b937fcae31844c3742ff6ad00ef91e27a24fe2d8f67c2319fbafeab5db16e1fdc382e26fb37a9d1409ba446fefe8c4407b9cc3d7d95038da6533e09aac2a8900
-  data.tar.gz: 70932725282459f1b6c161517630c5a8edd364efca16d4f2e41b1170ebba721368b9fef14eab7d9fa710c2ebd2e099374d76f53ee15c40b9df0dfa99799eab0e
+  metadata.gz: b645206269b91d344d392fcbad2d547210bbcd57d3628d8f60ce2e31638a030cb3975326acd057b29d9b4739e30396885243c701a45662b3c4d5450b774a2e4d
+  data.tar.gz: 1529775eb72d0fc3277491db186c4d1e6cdcf2284f8fabca8b18c4c75f5d7a6ce2b79df87a57a806ca744c2aaf15ba2aa44e4c0e1f5f8a473bcafcc12cd86c07

data/.travis.yml ADDED

@@ -0,0 +1,8 @@
+language: ruby
+rvm:
+  - 2.2.0
+  - 2.1.0
+  - 2.0.0
+  - 1.9.3
+  - rbx-2
+  - rbx-2.5.2

data/README.md CHANGED

@@ -5,6 +5,8 @@ A ruby library for book metadata extraction from biblionet.gr which
 extracts books, authors, publishers and ddc metatdata.
 The representation of bibliographic metadata in JSON is inspired by [BibJSON](http://okfnlabs.org/bibjson/) but some tags may be different.
+[![Build Status](https://travis-ci.org/dklisiaris/bookshark.svg?branch=master)](https://travis-ci.org/dklisiaris/bookshark)
 ## Installation
 Add this line to your application's Gemfile:
@@ -22,6 +24,7 @@ Or install it yourself as:
     $ gem install bookshark --pre
 Require and include bookshark in your class/module.
 ```ruby
 require 'bookshark'
 class Foo
@@ -29,6 +32,7 @@ class Foo
 end
 ```
 Alternatively you can use this syntax
 ```ruby
 Bookshark::Extractor.new
@@ -41,6 +45,7 @@ Extractor.new
 An extractor object must be created in order to perform any metadata extractions.
 Create an extractor object
 ```ruby
 Extractor.new
 Extractor.new(format: 'json')
@@ -58,15 +63,24 @@ Extractor.new(format: 'hash', site: 'biblionet')
 ### Extract Book Data
 You need book's id on biblionet website or its uri.
-Currently more advanced search functions based on title and author are not available, but they will be until the stable version 1.0.0 release.
 First create an extractor object:
 ```ruby
 # Create a new extractor object with pretty json format.
 extractor = Extractor.new(format: 'pretty_json')
 ```
 Then you can extract books
 ```ruby
+# Extract book with isbn 960-14-1157-7 from website
+extractor.book(isbn: '960-14-1157-7')
+# ISBN-13 also works
+extractor.book(isbn: '978-960-14-1157-6')
+# ISBN without any dashes is ok too
+extractor.book(isbn: '9789601411576')
 # Extract book with id 103788 from website
 extractor.book(id: 103788)
@@ -76,6 +90,7 @@ extractor.book(uri: 'http://biblionet.gr/book/103788/')
 # Extract book with id 103788 from local storage
 extractor.book(id: 103788, local: true)
 ```
+For more options, like book's title or author, use the search method which is described below.
 **Book Options**
 (Recommended option is to use just the id and let bookshark to generate uri):
@@ -102,6 +117,7 @@ extractor.book(id: 103788, eager: true)
 ```
 The expected result of a book extraction is something like this:
 ```json
 {
   "book": [
@@ -131,7 +147,6 @@ The expected result of a book extraction is something like this:
         "name": "Εκδοτικός Οίκος Α. Α. Λιβάνη",
         "b_id": "271"
       },
       "publication_year": "2006",
       "pages": "326",
       "isbn": "960-14-1157-7",
@@ -139,7 +154,6 @@ The expected result of a book extraction is something like this:
       "status": "Κυκλοφορεί",
       "price": "16,31",
       "award": [
       ],
       "description": "Τι είναι πιο επικίνδυνο, ένα όπλο ή μια πισίνα; Τι κοινό έχουν οι δάσκαλοι με τους παλαιστές του σούμο;...",
       "category": [
@@ -156,14 +170,109 @@ The expected result of a book extraction is something like this:
 ```
 Here is a [Book Sample](https://gist.github.com/dklisiaris/a6f3d6f37806186f3c79) extracted with eager option enabled.
+### Book Search
+Instead of providing the exact book id and extract that book directly, a search function can be used to get one or more books based on some parameters.
+```ruby
+# Create a new extractor object with pretty json format.
+extractor = Extractor.new(format: 'pretty_json')
+# Extract books with these words in title
+extractor.search(title: 'σημεια και τερατα')
+# Extract books with these words in title and this name in author
+extractor.search(title: 'χομπιτ', author: 'τολκιν', results_type: 'metadata')
+# Extract books from specific author, published after 1984
+extractor.search(author: 'arthur doyle', after_year: '2010')
+# Extract ids of books books with these words in title and this name in author
+extractor.search(title: 'αρχοντας', author: 'τολκιν', results_type: 'ids')
+```
+Searching and extracting several books can be very slow at times, so instead of extracting every single book you may prefer only the ids of found books. In that case pass the option `results_type: 'ids'`.
+**Search Options**:
+With enought options you can customize your query to your needs. It is recommended to use at least two of the search options.
+* title (The title of book to search)
+* author (The author's last name is enough for filter the search)
+* publisher
+* category
+* title_split
+  * 0 (The exact title phrase must by matched)
+  * 1 (Default - All the words in title must be matched in whatever order)
+  * 2 (At least one word should match)
+* book_id (Providing id means only one book should returned)
+* isbn
+* author_id (ID of the selected author)
+* publisher_id
+* category_id
+* after_year (Published this year or later)
+* before_year (Published this year or before)
+* results_type
+  * metadata (Default - Every book is extracted and an array of metadata is returned)
+  * ids (Only ids are returned)
+* format : The format in which the extracted data are returned
+  * hash (default)
+  * json
+  * pretty_json
+Results with ids option look like that:
+```json
+{
+ "book": [
+    "119000",
+    "103788",
+    "87815",
+    "87812",
+    "15839",
+    "77381",
+    "46856",
+    "46763",
+    "33301"
+  ]
+}
+```
+Normally results are multiple books like the ones in book extractors:
+```json
+{
+  "book": [
+    {
+      "title": "Στης Χλόης τα απόκρυφα",
+      "subtitle": "…και άλλα σημεία και τέρατα",
+      "... Rest of Metadata ...": "... condensed ..."
+    },
+    {
+      "title": "Σημεία και τέρατα της οικονομίας",
+      "subtitle": "Η κρυφή πλευρά των πάντων",
+      "... Rest of Metadata ...": "... condensed ..."
+    },
+    {
+      "title": "Και άλλα σημεία και τέρατα από την ιστορία",
+      "subtitle": null,
+      "... Rest of Metadata ...": "... condensed ..."
+    },
+    {
+      "title": "Σημεία και τέρατα από την ιστορία",
+      "subtitle": null,
+      "... Rest of Metadata ...": "... condensed ..."
+    }
+  ]
+}
+```
 ### Extract Author Data
 You need author's id on biblionet website or his uri
 ```ruby
 Extractor.new.author(id: 10207)
 Extractor.new(format: 'json').author(uri: 'http://www.biblionet.gr/author/10207/')
 ```
 Extraction from local saved html pages is also possible, but not recommended
 ```ruby
 extractor = Extractor.new(format: 'json')
 extractor.author(uri: 'storage/html_author_pages/2/author_2423.html', local: true)
@@ -174,6 +283,7 @@ extractor.author(uri: 'storage/html_author_pages/2/author_2423.html', local: tru
 * local : Boolean value. Has page been saved locally? (default is false)
 The expected result of an author extraction is something like this:
 ```json
 {
   "author": [
@@ -200,6 +310,7 @@ So, it is easy to include metadata for multiple authors or even for multiple typ
 ### Extract Publisher Data
 Methods are pretty same as author:
 ```ruby
 # Create a new extractor object with pretty json format.
 extractor = Extractor.new(format: 'pretty_json')
@@ -224,6 +335,7 @@ extractor.publisher(id: 20, local: true)
   * pretty_json
 The expected result of an author extraction is something like this:
 ```json
 {
   "publisher": [
@@ -271,6 +383,7 @@ The expected result of an author extraction is something like this:
 ```
 ### Extract Categories
 Biblionet's categories are based on [Dewey Decimal Classification](http://en.wikipedia.org/wiki/Dewey_Decimal_Classification). It is possible to extract these categories also as seen below.
 ```ruby
 # Create a new extractor object with pretty json format.
 extractor = Extractor.new(format: 'pretty_json')
@@ -298,6 +411,7 @@ Notice that when you are extracting a category you also extract parent categorie
 The expected result of a category extraction is something like this:
 (Here the extracted category is the 1041, but parent and sub categories were also extracted.
 ```json
 {
   "category": [
@@ -341,98 +455,8 @@ The expected result of a category extraction is something like this:
     }
   ]
 }
-Notice that the last item is the current category. The rest is the category tree.
-```
-### Book Search
-Instead of providing the exact book id and extract that book directly, a search function can be used to get one or more books based on some parameters.
-```ruby
-# Create a new extractor object with pretty json format.
-extractor = Extractor.new(format: 'pretty_json')
-# Extract books with these words in title
-extractor.search(title: 'σημεια και τερατα')
-# Extract books with these words in title and this name in author
-extractor.search(title: 'χομπιτ', author: 'τολκιν', results_type: 'metadata')
-# Extract books from specific author, published after 1984
-extractor.search(author: 'arthur doyle', after_year: '2010')
-# Extract ids of books books with these words in title and this name in author
-extractor.search(title: 'αρχοντας', author: 'τολκιν', results_type: 'ids')
-```
-Searching and extracting several books can be very slow at times, so instead of extracting every single book you may prefer only the ids of found books. In that case pass the option `results_type: 'ids'`.
-**Search Options**:
-With enought options you can customize your query to your needs. It is recommended to use at least two of the search options.
-* title (The title of book to search)
-* author (The author's last name is enough for filter the search)
-* publisher
-* category
-* title_split
-  * 0 (The exact title phrase must by matched)
-  * 1 (Default - All the words in title must be matched in whatever order)
-  * 2 (At least one word should match)
-* book_id (Providing id means only one book should returned)
-* isbn
-* author_id (ID of the selected author)
-* publisher_id
-* category_id
-* after_year (Published this year or later)
-* before_year (Published this year or before)
-* results_type
-  * metadata (Default - Every book is extracted and an array of metadata is returned)
-  * ids (Only ids are returned)
-* format : The format in which the extracted data are returned
-  * hash (default)
-  * json
-  * pretty_json
-Results with ids option look like that:
-```json
-{
- "book": [
-    "119000",
-    "103788",
-    "87815",
-    "87812",
-    "15839",
-    "77381",
-    "46856",
-    "46763",
-    "33301"
-  ]
-}
-```
-Normally results are multiple books like the ones in book extractors:
-```json
-{
-  "book": [
-    {
-      "title": "Στης Χλόης τα απόκρυφα",
-      "subtitle": "…και άλλα σημεία και τέρατα",
-      "... Rest of Metadata ...": "... condensed ..."
-    },
-    {
-      "title": "Σημεία και τέρατα της οικονομίας",
-      "subtitle": "Η κρυφή πλευρά των πάντων",
-      "... Rest of Metadata ...": "... condensed ..."
-    },
-    {
-      "title": "Και άλλα σημεία και τέρατα από την ιστορία",
-      "subtitle": null,
-      "... Rest of Metadata ...": "... condensed ..."
-    },
-    {
-      "title": "Σημεία και τέρατα από την ιστορία",
-      "subtitle": null,
-      "... Rest of Metadata ...": "... condensed ..."
-    }
-  ]
-}
 ```
+Notice that the last item is the current category. The rest is the category tree.
 ### Where do IDs point?
 The id of each data type points to the corresponding type webpage.

data/Rakefile CHANGED

@@ -2,3 +2,4 @@ require "bundler/gem_tasks"
 Dir.glob('tasks/**/*.rake').each(&method(:import))
+task :default => :spec

data/bookshark.gemspec CHANGED

@@ -13,6 +13,8 @@ Gem::Specification.new do |spec|
   spec.homepage      = "https://github.com/dklisiaris/bookshark"
   spec.license       = "MIT"
+  spec.required_ruby_version = '>= 1.9.3'
   spec.files         = `git ls-files -z`.split("\x0")
   spec.executables   = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
   spec.test_files    = spec.files.grep(%r{^(test|spec|features)/})
@@ -23,7 +25,7 @@ Gem::Specification.new do |spec|
   spec.add_dependency "json", "~> 1.8"
   spec.add_dependency "htmlentities", "~> 4.3"
-  spec.add_development_dependency "bundler", "~> 1.7"
+  spec.add_development_dependency "bundler", ">= 1.3"
   spec.add_development_dependency "rake", "~> 10.0"
   spec.add_development_dependency 'rspec', "~> 3.1"
 end
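
Note on the relaxed Bundler constraint in this hunk: `~> 1.7` accepts only 1.x releases from 1.7 upward, while `>= 1.3` accepts any release from 1.3 upward, including later major versions. The sketch below checks this with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The version numbers probed are illustrative and are not taken from the package.

```ruby
require 'rubygems'

# The development-dependency constraints before and after this release.
old_req = Gem::Requirement.new('~> 1.7')
new_req = Gem::Requirement.new('>= 1.3')

# Illustrative Bundler versions (assumed, not from the diff).
%w[1.3.5 1.7.0 2.0.0].each do |v|
  version = Gem::Version.new(v)
  puts "#{v}: old=#{old_req.satisfied_by?(version)} new=#{new_req.satisfied_by?(version)}"
end
```

The same `>=` operator is what the new `required_ruby_version` line uses, so the gem declares Ruby 1.9.3 as its floor without setting an upper bound.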

data/lib/bookshark.rb CHANGED

@@ -17,9 +17,10 @@ module Bookshark
   }
   def self.root
-    File.dirname __dir__
+    # File.dirname __dir__ # Works only on ruby > 2.0.0
+    File.expand_path(File.join(File.dirname(__FILE__), '../'))
   end
   def self.path_to_storage
     File.join root, 'lib/bookshark/storage'
   end
@@ -66,10 +67,15 @@ module Bookshark
     def book(options = {})
       book_extractor = Biblionet::Extractors::BookExtractor.new
+      if book_extractor.present?(options[:isbn])
+        search_engine = Biblionet::Extractors::Search.new
+        options[:id]  = search_engine.search_by_isbn(options[:isbn])
+      end
       uri = process_options(options, __method__)
       options[:format]  ||= @format
-      options[:eager]   ||= false
+      options[:eager]   ||= false
       if options[:eager]
         book = eager_extract_book(uri)
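
Note on the `self.root` change in the first hunk of this file: `__dir__` is available from Ruby 2.0.0 onward, so it cannot be used on the 1.9.3 target added to the build matrix. The replacement computes the same parent directory from `__FILE__`. A minimal sketch of that expression follows; `root_from` and the sample path are hypothetical, introduced only so the expression can be tried outside the gem.

```ruby
# Hypothetical helper mirroring the new Bookshark.root body, but taking the
# file path as an argument instead of reading __FILE__.
def root_from(file)
  # Directory of the file, then one level up, normalized to an absolute path.
  File.expand_path(File.join(File.dirname(file), '../'))
end

# Assumed install location, for illustration only.
puts root_from('/gems/bookshark/lib/bookshark.rb')
```

On a Unix-like system the sample path resolves to the directory that contains `lib`, which is what `path_to_storage` then joins `lib/bookshark/storage` onto.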

data/lib/bookshark/extractors/author_extractor.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require_relative 'base'
 module Biblionet

data/lib/bookshark/extractors/base.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require 'rubygems'
 require 'open-uri'
 require 'fileutils'

data/lib/bookshark/extractors/book_extractor.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require_relative 'base'
 require 'sanitize'

data/lib/bookshark/extractors/category_extractor.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require_relative 'base'
 module Biblionet

data/lib/bookshark/extractors/publisher_extractor.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require_relative 'base'
 module Biblionet

data/lib/bookshark/extractors/search.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require_relative 'book_extractor'
 module Biblionet
@@ -40,6 +43,11 @@ module Biblionet
         return books
       end
+      def search_by_isbn(isbn)
+        results = perform_search(isbn: isbn, results_type: 'ids')
+        book_id = results.empty? ? nil : results.first.to_i
+      end
       def build_search_url(options = {})
         title         = present?(options[:title])     ? options[:title].gsub(' ','+')     : ''
         author        = present?(options[:author])    ? options[:author].gsub(' ','+')    : ''
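
Note on the added `search_by_isbn`: it runs an ids-only search and returns the first id as an integer, or `nil` when nothing matched. The sketch below isolates that last step. `first_book_id` is a hypothetical stand-in, and it assumes `perform_search` yields an array of id strings when `results_type` is `'ids'`, which this diff suggests but does not show.

```ruby
# Stand-in for the final line of Search#search_by_isbn. The real method gets
# `results` from perform_search; here the list is passed in (an assumption).
def first_book_id(results)
  results.empty? ? nil : results.first.to_i
end

p first_book_id(%w[103788 87815]) # first id, converted to an Integer
p first_book_id([])               # no match
```

In the added method the local variable `book_id` is never read afterwards; the method's return value is simply the value of that assignment. When the result is `nil`, `Extractor#book` sets `options[:id]` to `nil` before calling `process_options`.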

data/lib/bookshark/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Bookshark
-  VERSION = "1.0.0.alpha.5"
+  VERSION = "1.0.0.alpha.7"
 end

data/spec/bookshark_spec.rb CHANGED

@@ -1,3 +1,6 @@
+#!/bin/env ruby
+# encoding: utf-8
 require 'spec_helper'
 describe Bookshark::Extractor do
@@ -120,12 +123,25 @@ describe Bookshark::Extractor do
         it 'reads html from the web and eager extracts all book and reference data' do
           expect(subject.book(id: 184923, eager: true)).to eq eager_book_184923
         end
+        it 'reads html from the web based on given isbn and extracts book data' do
+          expect(subject.book(isbn: '960-14-1157-7')).to eq book_103788
+        end
+        it 'reads html from the web based on given isbn-13 and extracts book data' do
+          expect(subject.book(isbn: '978-960-14-1157-6')).to eq book_103788
+        end
       end
       context 'when the book doesnt exist' do
         it 'returns an empty array' do
           expect(subject.book(id: 0)).to eq empty_book
         end
       end
+      context 'when the books isbn is nonsense' do
+        it 'returns an empty array' do
+          expect(subject.book(isbn: 'wrong-isbn')).to eq empty_book
+        end
+      end
       context 'when no options are set' do
         it 'returns an empty array' do
           expect(subject.book).to eq empty_book
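
Note on the new ISBN specs: `960-14-1157-7` and `978-960-14-1157-6` are the ISBN-10 and ISBN-13 forms of the same number, which is why both are expected to return `book_103788`. The sketch below applies the standard ISBN-10 to ISBN-13 conversion to show the relationship. It is not part of bookshark, and how bookshark or biblionet matches the two forms is not shown in this diff; `isbn10_to_isbn13` is a hypothetical helper.

```ruby
# Convert an ISBN-10 (dashes optional) to its ISBN-13 digits.
# Standard ISBN arithmetic; hypothetical helper, not bookshark code.
def isbn10_to_isbn13(isbn10)
  digits = isbn10.delete('-')
  raise ArgumentError, 'expected 10 characters' unless digits.length == 10

  # Prefix with 978 and drop the old ISBN-10 check digit.
  core = '978' + digits[0, 9]

  # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits.
  sum = core.chars.each_with_index.map { |ch, i|
    ch.to_i * (i.even? ? 1 : 3)
  }.reduce(:+)
  check = (10 - sum % 10) % 10

  core + check.to_s
end

puts isbn10_to_isbn13('960-14-1157-7')
```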

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: bookshark
 version: !ruby/object:Gem::Version
-  version: 1.0.0.alpha.5
+  version: 1.0.0.alpha.7
 platform: ruby
 authors:
 - Dimitris Klisiaris
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-03-24 00:00:00.000000000 Z
+date: 2015-03-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
@@ -76,16 +76,16 @@ dependencies:
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.7'
+        version: '1.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.7'
+        version: '1.3'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
@@ -123,6 +123,7 @@ extra_rdoc_files: []
 files:
 - ".gitignore"
 - ".rspec"
+- ".travis.yml"
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -170,7 +171,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '0'
+      version: 1.9.3
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">"