RubyGems - epub-parser - Versions diffs - 0.2.0 → 0.2.1 - Mend

epub-parser 0.2.0 → 0.2.1

Files changed (26)

checksums.yaml +4 -4
data/.yardopts +2 -0
data/CHANGELOG.markdown +11 -1
data/README.markdown +53 -19
data/bin/epub-open +7 -0
data/bin/epubinfo +7 -0
data/docs/ExtractContentsFromWeb.markdown +70 -0
data/docs/Home.markdown +39 -0
data/docs/Item.markdown +1 -1
data/epub-parser.gemspec +3 -2
data/examples/extract-contents-from-web.rb +45 -0
data/lib/epub/book/features.rb +1 -0
data/lib/epub/constants.rb +37 -43
data/lib/epub/content_document/xhtml.rb +1 -1
data/lib/epub/ocf/physical_container.rb +6 -10
data/lib/epub/ocf/physical_container/archive_zip.rb +51 -0
data/lib/epub/ocf/physical_container/unpacked_uri.rb +26 -0
data/lib/epub/ocf/physical_container/zipruby.rb +20 -7
data/lib/epub/parser.rb +8 -3
data/lib/epub/parser/content_document.rb +3 -8
data/lib/epub/parser/ocf.rb +1 -1
data/lib/epub/parser/utils.rb +1 -1
data/lib/epub/parser/version.rb +1 -1
data/lib/epub/publication/package/metadata.rb +2 -2
data/test/test_ocf_physical_container.rb +31 -0
metadata +25 -7

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 0da9ce97158d8bd76d740d45a8df755f7016c32c
-  data.tar.gz: 79c264c87d61cf10c2cf3f3bd9fd6dd316756b98
+  metadata.gz: cad6963a6325a736ef8f5006e9b0a037e0718070
+  data.tar.gz: d1ef1c2fbb7dd77791524c39cab200eecee063ad
 SHA512:
-  metadata.gz: ba7491533f29d1cbf2350b23e24c5a786eda2db6d51f0f07b002b94103a52dc4322e0c265bc83cdc2b85054f582f1e977d94f7aa3c2b7fa43138781821f493a2
-  data.tar.gz: a3d50715ac0c54fbd0093507651fbd22da5448d4bb28c62bb9adc15c6c9c80a4a40c793c42adaba8f2d53e81561ca739cd5ebf2f2c6d7989e2d9a5c44d045536
+  metadata.gz: 05c2b6004493b0f41d6b3ba7e9f32f6aed5c171f34f9477d39d7a10493d2dce2e711c49816fc26784ff25deb7a966c9b297cc1e1a0d12398920bccf17aacc2cc
+  data.tar.gz: b4d737ae179399f3f159561d103a5b52bd2dc9c7c17e5fed8115cb1b1a0dca296ba5d60c8840f72b425d3d222503de6dd07fc3aceac1adde72ca744a7d3af3d4

data/.yardopts CHANGED

@@ -9,3 +9,5 @@ docs/Epubinfo.markdown
 docs/EpubOpen.markdown
 docs/Navigation.markdown
 docs/Searcher.markdown
+docs/UnpackedArchive.markdown
+docs/ExtractContentsFromWeb.markdown

data/CHANGELOG.markdown CHANGED

@@ -1,11 +1,21 @@
 CHANGELOG
 =========
+0.2.1
+-----
+* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
+* Make it possible to use [archive-zip][] gem to extract contents from EPUB package via `EPUB::OCF::PhysicalContainer::ArchiveZip`
+* Add warning about default physical container adapter change
+* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
+[archive-zip]: https://github.com/javanthropus/archive-zip
 0.2.0
 -----
 * Introduce abstraction layer for OCF physical container
-* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
+* Add `EPUB::OCF::PhysicalContainer::File` and make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
 * Remove `EPUB::Parser::OCF::CONTAINER_FILE` and other constants
 0.1.9

data/README.markdown CHANGED

@@ -6,7 +6,7 @@ EPUB Parser
 INSTALLATION
 -------
-    gem install epub-parser
+    gem install epub-parser
 USAGE
 -----
@@ -30,7 +30,7 @@ USAGE
 See document's {file:docs/Home.markdown} or [API Documentation][rubydoc] for more info.
-[rubydoc]: http://rubydoc.info/gems/epub-parser/frames
+[rubydoc]: http://rubydoc.info/gems/epub-parser
 ### `epubinfo` command-line tool
@@ -90,6 +90,46 @@ IRB starts. `self` becomes the EPUB book and can access to methods of `EPUB`.
 See {file:docs/EpubOpen} for more info.
+DOCUMENTATION
+-------------
+Documentation is available on the [homepage][].
+If you installed EPUB Parser with the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
+    $ gem install epub-parser
+    $ gem yardoc epub-parser
+    ...
+    Files:          33
+    Modules:        20 (   20 undocumented)
+    Classes:        45 (   44 undocumented)
+    Constants:      31 (   31 undocumented)
+    Methods:       292 (   88 undocumented)
+    52.84% documented
+    YARD documentation is generated to:
+    /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
+The path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) is shown at the end.
+Alternatively, you can generate it with the yardoc command:
+    $ git clone https://github.com/KitaitiMakoto/epub-parser.git
+    $ cd epub-parser
+    $ bundle install --path=deps
+    $ bundle exec rake doc:yard
+    ...
+    Files:          33
+    Modules:        20 (   20 undocumented)
+    Classes:        45 (   44 undocumented)
+    Constants:      31 (   31 undocumented)
+    Methods:       292 (   88 undocumented)
+    52.84% documented
+Then documentation will be available in `doc` directory.
+[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
+[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
 REQUIREMENTS
 ------------
 * Ruby 2.0.0 or later
@@ -110,9 +150,18 @@ If you find other gems, please tell me or request a pull request.
 RECENT CHANGES
 --------------
+### 0.2.1
+* Remove deprecated `EPUB::Constants::MediaType::UnsupportedError`. Use `UnsupportedMediaType` instead.
+* Make it possible to use [archive-zip][] gem to extract contents from EPUB package
+* Add warning about default physical container adapter change
+* Make it possible to extract contents from the web via `EPUB::OCF::PhysicalContainer::UnpackedURI`. See {file:ExtractContentsFromWeb.markdown} for details.
+[archive-zip]: https://github.com/javanthropus/archive-zip
 ### 0.2.0
-* Make it possible to parse file system directory an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
+* Make it possible to parse file system directory as an EPUB file. See {file:docs/UnpackedArchive.markdown} for details.
 ### 0.1.9
@@ -124,21 +173,6 @@ RECENT CHANGES
 [nokogumbo]: https://github.com/rubys/nokogumbo/
-### 0.1.8
-* Explicity #close each zip member file that has been opened via #fopen(Thanks [xunker][]!)
-[xunker]: https://github.com/xunker
-### 0.1.7.1
-* Don't set encoding when content is not text
-### 0.1.7
-* [Experimental]Add `EPUB::Searcher` module. See {file:Searcher.markdown} for details
-* Detect and set character encoding in `EPUB::Publication::Package::Item#read`
 See {file:CHANGELOG.markdown} for older changelogs and details.
 TODOS
@@ -152,7 +186,6 @@ TODOS
 * Content Document
 * Digital Signature
 * Using SAX on parsing
-* Extracting and organizing common behavior from some classes to modules
 * Abstraction of XML parser(making it possible to use REXML, standard bundled XML library of Ruby)
 * Handle with encodings other than UTF-8
@@ -165,6 +198,7 @@ DONE
 * Fixed Layout
 * Vocabulary Association Mechanisms(only for itemref)
 * Archive library abstraction
+* Extracting and organizing common behavior from some classes to modules
 LICENSE
 -------

data/bin/epub-open CHANGED

@@ -22,6 +22,13 @@ $0 = File.basename($PROGRAM_NAME)
 include EPUB::Book::Features
 file = ARGV.shift
 EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
+unless File.readable? file
+  uri = URI.parse(file) rescue nil
+  if uri
+    EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+    file = uri
+  end
+end
 EPUB::Parser.parse(file, :book => self)
 $stderr.puts "Enter \"exit\" to exit #{shell}"
 shell.start

data/bin/epubinfo CHANGED

@@ -31,6 +31,13 @@ unless file
 end
 EPUB::OCF::PhysicalContainer.adapter = :File if File.directory? file
+unless File.readable? file
+  uri = URI.parse(file) rescue nil
+  if uri
+    EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+    file = uri
+  end
+end
 book = EPUB::Parser.parse(file)
 data = {'Title' => [book.title]}
 data.merge!(book.metadata.to_h)
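
Both command-line tools above fall back to `URI.parse(file) rescue nil` when the argument is not a readable file. The following is a minimal sketch of what that expression yields for a few inputs, using only Ruby's standard `uri` library. The helper name `parse_or_nil` and the sample strings are assumptions made for illustration; unlike the inline `rescue nil` in the diff, which swallows any `StandardError`, this helper rescues only `URI::InvalidURIError`.

```ruby
require 'uri'

# Hypothetical helper mirroring the `URI.parse(file) rescue nil` idiom.
def parse_or_nil(string)
  URI.parse(string)
rescue URI::InvalidURIError
  nil
end

remote = parse_or_nil('https://example.net/path/to/book/')
local  = parse_or_nil('missing.epub') # a relative reference is still a valid URI
broken = parse_or_nil('not a uri')    # spaces make the string unparsable

p remote.class # URI::HTTPS
p local.class  # URI::Generic
p broken       # nil
```

Note that an unreadable local path such as `missing.epub` still parses as a generic URI, so under this logic the `:UnpackedURI` adapter would be selected for it as well.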

data/docs/ExtractContentsFromWeb.markdown ADDED

@@ -0,0 +1,70 @@
+{file:docs/Home.markdown} > **{file:docs/ExtractContentsFromWeb.markdown}**
+Extract Contents From the Web
+=============================
+As of version 0.2.1, EPUB Parser can parse unpacked (unzipped) EPUB files on the web and extract the contents of the books.
+Let's get the contents of the pretty comic Page Blanche from IDPF's GitHub repository: https://github.com/IDPF/epub3-samples/tree/master/30/page-blanche
+We can treat the URI `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/` as the root directory of the book because we can get the EPUB Open Container Format's `container.xml` file from `https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/META-INF/container.xml`.
+**Note: Don't forget the slash at the end of the URI.**
+EPUB Parser can treat the URI as an EPUB book file path and parse contents from it by using {EPUB::OCF::PhysicalContainer::UnpackedURI}:
+    require 'epub/parser'
+    uri = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
+    epub = EPUB::Parser.parse(uri, container_adapter: :UnpackedURI)
+The trick is to set the {EPUB::OCF::PhysicalContainer.adapter container adapter} to {EPUB::OCF::PhysicalContainer::UnpackedURI :UnpackedURI}. This makes it possible to parse an EPUB book from the web.
+Now we can play with EPUB books as always!
+As an example, here is a script that downloads all the files of a specified EPUB book to a local directory (the source code is available in the repository's examples/extract-contents-from-web.rb).
+{include:file:examples/extract-contents-from-web.rb}
+Execution:
+    $ ruby examples/extract-contents-from-web.rb https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
+    Started downloading EPUB contents...
+      from: https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
+      to: /tmp/epub-parser20150703-13148-ghdtfq
+    Making mimetype file...
+    Downloading META-INF/container.xml ...
+    Downloading EPUB/package.opf ...
+    Downloading EPUB/Style/style.css ...
+    Downloading EPUB/Navigation/nav.xhtml ...
+    Downloading EPUB/Navigation/toc.ncx ...
+    Downloading EPUB/Content/cover.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_000.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_001.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_002.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_003.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_004.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_005.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_006.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_007.xhtml ...
+    Downloading EPUB/Content/PageBlanche_Page_008.xhtml ...
+    Downloading EPUB/Image/cover.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_001.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_002.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_003.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_004.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_005.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_006.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_007.jpg ...
+    Downloading EPUB/Image/PageBlanche_Page_008.jpg ...
+    /tmp/epub-parser20150703-13148-ghdtfq
+The last line of the output is the path to the directory the contents were downloaded to. We can repackage it as an EPUB file. Let's use the [epzip][] utility to do that easily:
+    $ epzip /tmp/epub-parser20150703-13148-ghdtfq ./page-blanche.epub
+[epzip]: https://github.com/takahashim/epzip
+Command-line tools
+------------------
+The command-line tools `epubinfo` and `epub-open` may also handle URIs as EPUB books.

data/docs/Home.markdown CHANGED

@@ -90,6 +90,9 @@ You are also able to find YourBook object for the first:
     ret == book # => true; this API is not good I feel... Welcome suggestion!
     # do something with your book
+Documentation
+-------------
 More documentation is available in:
 * {file:docs/Publication.markdown}
@@ -98,6 +101,42 @@ More documentations are avaiable in:
 * {file:docs/Navigation.markdown}
 * {file:docs/Searcher.markdown}
 * {file:docs/UnpackedArchive.markdown}
+* {file:docs/ExtractContentsFromWeb.markdown}
+If you installed EPUB Parser via the gem command, you can also generate the documentation yourself (the [rubygems-yardoc][] gem is required):
+    $ gem install epub-parser
+    $ gem yardoc epub-parser
+    ...
+    Files:          33
+    Modules:        20 (   20 undocumented)
+    Classes:        45 (   44 undocumented)
+    Constants:      31 (   31 undocumented)
+    Methods:       292 (   88 undocumented)
+    52.84% documented
+    YARD documentation is generated to:
+    /path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
+The path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here) is shown at the end.
+Alternatively, you can generate it with the yardoc command:
+    $ git clone https://github.com/KitaitiMakoto/epub-parser.git
+    $ cd epub-parser
+    $ bundle install --path=deps
+    $ bundle exec rake doc:yard
+    ...
+    Files:          33
+    Modules:        20 (   20 undocumented)
+    Classes:        45 (   44 undocumented)
+    Constants:      31 (   31 undocumented)
+    Methods:       292 (   88 undocumented)
+    52.84% documented
+Then documentation will be available in `doc` directory.
+[homepage]: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
+[rubygems-yardoc]: https://rubygems.org/gems/rubygems-yardoc
 Requirements
 ------------

data/docs/Item.markdown CHANGED

@@ -66,7 +66,7 @@ Also you can use {EPUB::Publication::Package::Manifest::Item#use_fallback_chain
 If item's media type is, for instance, 'image/x-eps', the fallback is used.
 If the fallback item's media type is 'image/png', `png` variable means the item, if not, "fallback of fallback" will be checked.
-Finally you can use the item you want, or {EPUB::Constants::MediaType::UnsupportedMediaType EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
+Finally you can use the item you want, or {EPUB::MediaType::UnsupportedMediaType} exception will be raised(if no item you can accept found).
 Therefore, you should `rescue` clause:
     # :unsupported option can also be used

data/epub-parser.gemspec CHANGED

@@ -7,7 +7,7 @@ Gem::Specification.new do |s|
   s.version     = EPUB::Parser::VERSION
   s.authors     = ["KITAITI Makoto"]
   s.email       = ["KitaitiMakoto@gmail.com"]
-  s.homepage    = "https://github.com/KitaitiMakoto/epub-parser"
+  s.homepage    = "http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown"
   s.summary     = %q{EPUB 3 Parser}
   s.description = %q{Parse EPUB 3 book loosely}
   s.license     = 'MIT'
@@ -26,6 +26,7 @@ Gem::Specification.new do |s|
   s.has_rdoc = 'yard'
   s.add_development_dependency 'rake'
+  s.add_development_dependency 'archive-zip'
   s.add_development_dependency 'pry'
   s.add_development_dependency 'pry-doc'
   s.add_development_dependency 'test-unit'
@@ -42,5 +43,5 @@ Gem::Specification.new do |s|
   s.add_runtime_dependency 'nokogiri', '~> 1.6'
   s.add_runtime_dependency 'nokogumbo'
   s.add_runtime_dependency 'addressable', '>= 2.3.5'
-  s.add_runtime_dependency 'rchardet', '< 1.6'
+  s.add_runtime_dependency 'rchardet', '>= 1.6.1'
 end

data/examples/extract-contents-from-web.rb ADDED

@@ -0,0 +1,45 @@
+require 'pathname'
+require 'tmpdir'
+require 'epub/parser'
+EPUB_URI = URI.parse(ARGV.shift)
+DOWNLOAD_DIR = Pathname.new(ARGV.shift || Dir.mktmpdir('epub-parser'))
+$stderr.puts <<EOI
+Started downloading EPUB contents...
+  from: #{EPUB_URI}
+  to:   #{DOWNLOAD_DIR}
+EOI
+# Make it possible to use URI as EPUB file path
+EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+def main
+  make_mimetype
+  container_xml = 'META-INF/container.xml'
+  download container_xml
+  epub = EPUB::Parser.parse(EPUB_URI, container_adapter: :UnpackedURI)
+  download epub.rootfile_path
+  epub.resources.each do |resource|
+    download resource.entry_name
+  end
+  puts DOWNLOAD_DIR
+end
+def make_mimetype
+  $stderr.puts "Making mimetype file..."
+  DOWNLOAD_DIR.join('mimetype').write 'application/epub+zip'
+end
+def download(path)
+  path = path.to_s
+  src = EPUB_URI + path
+  dest = DOWNLOAD_DIR + path
+  $stderr.puts "Downloading #{path} ..."
+  dest.dirname.mkpath
+  dest.write src.read
+end
+main

data/lib/epub/book/features.rb CHANGED

@@ -35,6 +35,7 @@ module EPUB
       def container_adapter=(adapter)
         @adapter = adapter.instance_of?(Class) ? adapter : OCF::PhysicalContainer.const_get(adapter)
+        adapter
       end
       # @overload each_page_on_spine(&blk)

data/lib/epub/constants.rb CHANGED

@@ -1,48 +1,42 @@
 module EPUB
-  module Constants
-    NAMESPACES = {
-      'dc'    => 'http://purl.org/dc/elements/1.1/',
-      'ocf'   => 'urn:oasis:names:tc:opendocument:xmlns:container',
-      'opf'   => 'http://www.idpf.org/2007/opf',
-      'xhtml' => 'http://www.w3.org/1999/xhtml',
-      'epub'  => 'http://www.idpf.org/2007/ops',
-      'm'     => 'http://www.w3.org/1998/Math/MathML',
-      'svg'   => 'http://www.w3.org/2000/svg',
-      'smil'  => 'http://www.w3.org/ns/SMIL'
-    }
+  NAMESPACES = {
+    'dc'    => 'http://purl.org/dc/elements/1.1/',
+    'ocf'   => 'urn:oasis:names:tc:opendocument:xmlns:container',
+    'opf'   => 'http://www.idpf.org/2007/opf',
+    'xhtml' => 'http://www.w3.org/1999/xhtml',
+    'epub'  => 'http://www.idpf.org/2007/ops',
+    'm'     => 'http://www.w3.org/1998/Math/MathML',
+    'svg'   => 'http://www.w3.org/2000/svg',
+    'smil'  => 'http://www.w3.org/ns/SMIL'
+  }
-    module MediaType
-      # @deprecated Use {UnsupportedMediaType} instead
-      class UnsupportedError < StandardError; end
-      class UnsupportedMediaType < StandardError; end
+  module MediaType
+    class UnsupportedMediaType < StandardError; end
-      EPUB = 'application/epub+zip'
-      ROOTFILE = 'application/oebps-package+xml'
-      IMAGE = %w[
-        image/gif
-        image/jpeg
-        image/png
-        image/svg+xml
-      ]
-      APPLICATION = %w[
-        application/xhtml+xml
-        application/x-dtbncx+xml
-        application/vnd.ms-opentype
-        application/font-woff
-        application/smil+xml
-        application/pls+xml
-      ]
-      AUDIO = %w[
-        audio/mpeg
-        audio/mp4
-      ]
-      TEXT = %w[
-        text/css
-        text/javascript
-      ]
-      CORE = IMAGE + APPLICATION + AUDIO + TEXT
-    end
+    EPUB = 'application/epub+zip'
+    ROOTFILE = 'application/oebps-package+xml'
+    IMAGE = %w[
+      image/gif
+      image/jpeg
+      image/png
+      image/svg+xml
+    ]
+    APPLICATION = %w[
+      application/xhtml+xml
+      application/x-dtbncx+xml
+      application/vnd.ms-opentype
+      application/font-woff
+      application/smil+xml
+      application/pls+xml
+    ]
+    AUDIO = %w[
+      audio/mpeg
+      audio/mp4
+    ]
+    TEXT = %w[
+      text/css
+      text/javascript
+    ]
+    CORE = IMAGE + APPLICATION + AUDIO + TEXT
   end
-  include Constants
 end

data/lib/epub/content_document/xhtml.rb CHANGED

@@ -34,7 +34,7 @@ module EPUB
       # @return [Nokogiri::XML::Document] content as Nokogiri::XML::Document object
       def nokogiri
-        require 'nokogumbo'
+        require 'nokogumbo' unless Nokogiri.respond_to? :HTML5
         @nokogiri ||= Nokogiri.HTML5(raw_document)
       end
     end

data/lib/epub/ocf/physical_container.rb CHANGED

@@ -1,5 +1,6 @@
 require 'epub/ocf/physical_container/zipruby'
 require 'epub/ocf/physical_container/file'
+require 'epub/ocf/physical_container/unpacked_uri'
 module EPUB
   class OCF
@@ -8,19 +9,14 @@ module EPUB
       class << self
         def adapter
-          if self == PhysicalContainer
-            @adapter
-          else
-            raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
-          end
+          raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
+          @adapter
         end
         def adapter=(adapter)
-          if self == PhysicalContainer
-            @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
-          else
-            raise NoMethodError.new("undefined method `#{__method__}' for #{self}")
-          end
+          raise NoMethodError, "undefined method `#{__method__}' for #{self}" unless self == PhysicalContainer
+          @adapter = adapter.instance_of?(Class) ? adapter : const_get(adapter)
+          adapter
         end
         def open(container_path)

data/lib/epub/ocf/physical_container/archive_zip.rb ADDED

@@ -0,0 +1,51 @@
+require 'archive/zip'
+module EPUB
+  class OCF
+    class PhysicalContainer
+      class ArchiveZip < self
+        def initialize(container_path)
+          super
+          @entries = {}
+          @last_iterated_entry_index = 0
+        end
+        def open
+          Archive::Zip.open @container_path do |archive|
+            @archive = archive
+            begin
+              yield self
+            ensure
+              @archive = nil
+            end
+          end
+        end
+        def read(path_name)
+          target_index = @entries[path_name]
+          if @archive
+            @archive.each.with_index do |entry, index|
+              if target_index
+                if target_index == index
+                  return entry.file_data.read
+                else
+                  next
+                end
+              end
+              next if index < @last_iterated_entry_index
+              # We can force encoding UTF-8 because EPUB spec allows only UTF-8 filenames
+              entry_path = entry.zip_path.force_encoding('UTF-8')
+              @entries[entry_path] = index
+              @last_iterated_entry_index = index
+              if entry_path == path_name
+                return entry.file_data.read
+              end
+            end
+          else
+            open {|container| container.read(path_name)}
+          end
+        end
+      end
+    end
+  end
+end
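
The `read` method in `ArchiveZip` above remembers the position of every entry it has walked past, so a later read of a known path can skip straight to that position. A minimal sketch of the same idea follows, assuming a plain array of `[path, data]` pairs in place of a real zip archive. `LazyIndexReader` and its sample entries are hypothetical and not part of epub-parser; as a simplification, it returns `nil` when the path is not found.

```ruby
# Hypothetical stand-in: entries are [path, data] pairs instead of zip entries.
class LazyIndexReader
  attr_reader :index

  def initialize(entries)
    @entries = entries
    @index = {}     # path => position, filled in as entries are seen
    @last_seen = 0  # highest position already recorded in @index
  end

  def read(path)
    known = @index[path]
    @entries.each_with_index do |(entry_path, data), i|
      if known
        # Position already cached: ignore everything except that entry.
        return data if i == known
        next
      end
      next if i < @last_seen # already indexed on an earlier call
      @index[entry_path] = i
      @last_seen = i
      return data if entry_path == path
    end
    nil
  end
end

reader = LazyIndexReader.new([
  ['mimetype', 'application/epub+zip'],
  ['META-INF/container.xml', '<container/>'],
  ['EPUB/package.opf', '<package/>']
])

p reader.read('META-INF/container.xml') # "<container/>"
p reader.index.keys                     # ["mimetype", "META-INF/container.xml"]
p reader.read('mimetype')               # "application/epub+zip"
```

The first call indexes only the entries it had to pass; the third entry stays unindexed until a later read walks that far.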

data/lib/epub/ocf/physical_container/unpacked_uri.rb ADDED

@@ -0,0 +1,26 @@
+require 'open-uri'
+module EPUB
+  class OCF
+    class PhysicalContainer
+      class UnpackedURI < self
+        # EPUB URI: http://example.net/path/to/book/
+        # container.xml: http://example.net/path/to/book/META-INF/container.xml
+        # @param [URI, String] container_path URI of EPUB container's root directory.
+        #   For example, <code>"http://example.net/path/to/book/"</code>, which
+        #   should contain <code>"http://example.net/path/to/book/META-INF/container.xml"</code> as its container.xml file. Note that this should end with "/"(slash).
+        def initialize(container_path)
+          super(URI(container_path))
+        end
+        def open
+          yield self
+        end
+        def read(path_name)
+          (@container_path + path_name).read
+        end
+      end
+    end
+  end
+end

data/lib/epub/ocf/physical_container/zipruby.rb CHANGED

@@ -1,15 +1,30 @@
 require 'zipruby'
+if $VERBOSE
+  warn <<EOW
+[WARNING]Default OCF physical container adapter will become ArchiveZip, which uses archive-zip gem to extract contents from EPUB package, instead of current default Zipruby, which uses zipruby gem, in the near future.
+You can try ArchiveZip adapter by:
+1. gem install archive-zip
+2. require 'epub/ocf/physical_container/archive_zip'
+3. EPUB::OCF::PhysicalContainer.adapter = :ArchiveZip
+If you find problems, please inform me via GitHub issues: https://github.com/KitaitiMakoto/epub-parser/issues
+EOW
+end
 module EPUB
   class OCF
     class PhysicalContainer
       class Zipruby < self
         def open
           Zip::Archive.open @container_path do |archive|
-            @archive = archive
-            result = yield self
-            @archive = nil
-            result
+            begin
+              @archive = archive
+              yield self
+            ensure
+              @archive = nil
+            end
           end
         end
@@ -17,9 +32,7 @@ module EPUB
           if @archive
             @archive.fopen(path_name) {|entry| entry.read}
           else
-            Zip::Archive.open(@container_path) {|archive|
-              archive.fopen(path_name) {|entry| entry.read}
-            }
+            open {|container| container.read(path_name)}
           end
         end
       end
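
The `Zipruby#open` change above replaces a manual "save result, reset, return result" sequence with `begin`/`ensure`. A minimal sketch of why the two differ, using a hypothetical `FakeContainer` class that is not part of epub-parser: the `ensure` clause resets the state even when the block raises, and it does not change the value returned by the block.

```ruby
# Hypothetical stand-in for a container class; not part of epub-parser.
class FakeContainer
  attr_reader :archive

  def open
    @archive = :opened
    yield self
  ensure
    @archive = nil # runs on normal return and on exceptions alike
  end
end

container = FakeContainer.new

result = container.open { |c| c.archive }
p result            # :opened
p container.archive # nil

begin
  container.open { raise 'read failed' }
rescue RuntimeError => e
  p e.message       # "read failed"
end
p container.archive # nil
```

With the pre-0.2.1 form, an exception inside the block would have skipped the `@archive = nil` line and left a stale archive reference on the container.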

data/lib/epub/parser.rb CHANGED

@@ -42,10 +42,15 @@ module EPUB
     end
     def initialize(filepath, **options)
-      raise "File #{filepath} not readable" unless File.readable_real? filepath
+      path_is_uri = (options[:container_adapter] == EPUB::OCF::PhysicalContainer::UnpackedURI or
+                     options[:container_adapter] == :UnpackedURI or
+                     EPUB::OCF::PhysicalContainer.adapter == EPUB::OCF::PhysicalContainer::UnpackedURI)
-      @filepath = File.realpath filepath
-      @book = create_book options
+      raise "File #{filepath} not readable" if
+        !path_is_uri and !File.readable_real?(filepath)
+      @filepath = path_is_uri ? filepath : File.realpath(filepath)
+      @book = create_book(options)
       @book.epub_file = @filepath
       if options[:container_adapter]
         adapter = options[:container_adapter]

data/lib/epub/parser/content_document.rb CHANGED

@@ -69,18 +69,13 @@ module EPUB
             embedded_content = a_or_span.xpath('./xhtml:audio[1]|xhtml:canvas[1]|xhtml:embed[1]|xhtml:iframe[1]|xhtml:img[1]|xhtml:math[1]|xhtml:object[1]|xhtml:svg[1]|xhtml:video[1]', EPUB::NAMESPACES).first
             unless embedded_content.nil?
               case embedded_content.name
-              when 'audio'
-              when 'canvas'
-              when 'embed'
-              when 'iframe'
+              when 'audio', 'canvas', 'embed', 'iframe'
                 item.text = extract_attribute(embedded_content, 'name') || extract_attribute(embedded_content, 'srcdoc')
               when 'img'
                 item.text = extract_attribute(embedded_content, 'alt')
-              when 'math'
-              when 'object'
+              when 'math', 'object'
                 item.text = extract_attribute(embedded_content, 'name')
-              when 'svg'
-              when 'video'
+              when 'svg', 'video'
               else
               end
             end
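
The `case` change above is more than a tidy-up: Ruby's `case` has no fall-through, so in the old form `'audio'`, `'canvas'` and `'embed'` each matched an empty branch and never reached the assignment. A minimal sketch of the difference; the method names and the `:name_or_srcdoc` symbol are assumptions made for illustration:

```ruby
# Old shape: each `when` is its own branch, so 'audio' matches an empty body.
def text_source_before(name)
  case name
  when 'audio'
  when 'canvas'
  when 'embed'
  when 'iframe'
    :name_or_srcdoc
  end
end

# New shape: one branch shared by all four element names.
def text_source_after(name)
  case name
  when 'audio', 'canvas', 'embed', 'iframe'
    :name_or_srcdoc
  end
end

p text_source_before('audio')  # nil
p text_source_before('iframe') # :name_or_srcdoc
p text_source_after('audio')   # :name_or_srcdoc
```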

data/lib/epub/parser/ocf.rb CHANGED

@@ -27,7 +27,7 @@ module EPUB
           begin
             data = @container.read(File.join(DIRECTORY, "#{m}.xml"))
             @ocf.__send__ "#{m}=", __send__("parse_#{m}", data)
-          rescue ::Zip::Error, ::Errno::ENOENT
+          rescue ::Zip::Error, ::Errno::ENOENT, OpenURI::HTTPError
           end
         end

data/lib/epub/parser/utils.rb CHANGED

@@ -7,7 +7,7 @@ module EPUB
       #
       # @param [Nokogiri::XML::Element] element
       # @param [String] name name of attribute excluding namespace prefix
-      # @param [String, nil] prefix XML namespace prefix in {EPUB::Constants::NAMESPACES} keys
+      # @param [String, nil] prefix XML namespace prefix in {EPUB::NAMESPACES} keys
       # @return [String] value of attribute when the attribute exists
       # @return nil when the attribute doesn't exist
       def extract_attribute(element, name, prefix=nil)

data/lib/epub/parser/version.rb CHANGED

@@ -1,5 +1,5 @@
 module EPUB
   class Parser
-    VERSION = "0.2.0"
+    VERSION = "0.2.1"
   end
 end

data/lib/epub/publication/package/metadata.rb CHANGED

@@ -128,8 +128,8 @@ module EPUB
           attr_reader :refines
           def refines=(refinee)
-            @refines = refinee
             refinee.refiners << self
+            @refines = refinee
           end
           def refines?
@@ -160,8 +160,8 @@ module EPUB
           attr_reader :refines
           def refines=(refinee)
-            @refines = refinee
             refinee.refiners << self
+            @refines = refinee
           end
         end
       end

data/test/test_ocf_physical_container.rb CHANGED

@@ -70,4 +70,35 @@ class TestOCFPhysicalContainer < Test::Unit::TestCase
       EPUB::OCF::PhysicalContainer.adapter = adapter
     end
   end
+  require 'epub/ocf/physical_container/archive_zip'
+  class TestArchiveZip < self
+    include ConcreteContainer
+    def setup
+      super
+      @class = EPUB::OCF::PhysicalContainer::ArchiveZip
+      @container = @class.new(@container_path)
+    end
+  end
+  class TestUnpackedURI < self
+    def setup
+      super
+      @container_path = 'https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/'
+      @class = EPUB::OCF::PhysicalContainer::UnpackedURI
+      @container = @class.new(@container_path)
+    end
+    def test_read
+      path = 'META-INF/container.xml'
+      content = 'content'
+      root_uri = URI(@container_path)
+      container_xml_uri = root_uri + path
+      stub(root_uri).+ {container_xml_uri}
+      stub(container_xml_uri).read {content}
+      assert_equal content, @class.new(root_uri).read('META-INF/container.xml')
+    end
+  end
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: epub-parser
 version: !ruby/object:Gem::Version
-  version: 0.2.0
+  version: 0.2.1
 platform: ruby
 authors:
 - KITAITI Makoto
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-06-13 00:00:00.000000000 Z
+date: 2015-07-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rake
@@ -24,6 +24,20 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
+- !ruby/object:Gem::Dependency
+  name: archive-zip
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 - !ruby/object:Gem::Dependency
   name: pry
   requirement: !ruby/object:Gem::Requirement
@@ -238,16 +252,16 @@ dependencies:
   name: rchardet
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "<"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.6'
+        version: 1.6.1
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "<"
+    - - ">="
       - !ruby/object:Gem::Version
-        version: '1.6'
+        version: 1.6.1
 description: Parse EPUB 3 book loosely
 email:
 - KitaitiMakoto@gmail.com
@@ -271,6 +285,7 @@ files:
 - bin/epubinfo
 - docs/EpubOpen.markdown
 - docs/Epubinfo.markdown
+- docs/ExtractContentsFromWeb.markdown
 - docs/FixedLayout.markdown
 - docs/Home.markdown
 - docs/Item.markdown
@@ -279,6 +294,7 @@ files:
 - docs/Searcher.markdown
 - docs/UnpackedArchive.markdown
 - epub-parser.gemspec
+- examples/extract-contents-from-web.rb
 - features/epubinfo.feature
 - features/step_definitions/epubinfo_steps.rb
 - features/support/env.rb
@@ -296,7 +312,9 @@ files:
 - lib/epub/ocf/manifest.rb
 - lib/epub/ocf/metadata.rb
 - lib/epub/ocf/physical_container.rb
+- lib/epub/ocf/physical_container/archive_zip.rb
 - lib/epub/ocf/physical_container/file.rb
+- lib/epub/ocf/physical_container/unpacked_uri.rb
 - lib/epub/ocf/physical_container/zipruby.rb
 - lib/epub/ocf/rights.rb
 - lib/epub/ocf/signatures.rb
@@ -348,7 +366,7 @@ files:
 - test/test_parser_publication.rb
 - test/test_publication.rb
 - test/test_searcher.rb
-homepage: https://github.com/KitaitiMakoto/epub-parser
+homepage: http://www.rubydoc.info/gems/epub-parser/file/docs/Home.markdown
 licenses:
 - MIT
 metadata: {}