RubyGems - epub-parser - Versions diffs - 0.3.6 → 0.3.7 - Mend

epub-parser 0.3.6 → 0.3.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (23)

checksums.yaml +4 -4
data/.gitlab-ci.yml +51 -1
data/.yardopts +5 -3
data/{CHANGELOG.markdown → CHANGELOG.adoc} +49 -84
data/README.adoc +228 -0
data/Rakefile +3 -1
data/bin/epub-cover +51 -0
data/docs/EpubCover.adoc +46 -0
data/docs/Examples.adoc +9 -0
data/docs/Home.adoc +224 -0
data/docs/Searcher.adoc +132 -0
data/epub-parser.gemspec +2 -1
data/lib/epub/book/features.rb +7 -1
data/lib/epub/metadata.rb +9 -1
data/lib/epub/parser/metadata.rb +4 -2
data/lib/epub/parser/version.rb +1 -1
data/lib/epub/publication/package/manifest.rb +1 -1
data/lib/epub/searcher/xhtml.rb +1 -0
data/test/helper.rb +1 -1
metadata +26 -8
data/README.markdown +0 -219
data/docs/Home.markdown +0 -196
data/docs/Searcher.markdown +0 -109

data/Rakefile CHANGED

@@ -41,8 +41,10 @@ namespace :doc do
   YARD::Rake::YardocTask.new
   Rake::RDocTask.new do |rdoc|
     rdoc.rdoc_files = FileList['lib/**/*.rb']
-    rdoc.rdoc_files.include 'README.markdown'
+    rdoc.rdoc_files.include 'README.adoc'
+    rdoc.rdoc_files.include 'CHANGELOG.adoc'
     rdoc.rdoc_files.include 'MIT-LICENSE'
+    rdoc.rdoc_files.include 'docs/**/*.adoc'
     rdoc.rdoc_files.include 'docs/**/*.md'
   end
 end

data/bin/epub-cover ADDED

@@ -0,0 +1,51 @@
+require "optparse"
+require "uri"
+require "epub/parser"
+def main(argv)
+  option_parser = OptionParser.new {|opt|
+    opt.banner = <<EOB
+Extract cover image.
+Image is put to current directory with the same name in EPUB.
+It is put to specified directory when `--output' option is given.
+Usage: #{opt.program_name} [options] EPUBFILE
+EOB
+    opt.separator "Options:"
+    opt.on "-o", "--output=DIR", "Directory to put image file"
+  }
+  options = option_parser.getopts(argv)
+  path = argv.shift
+  error "EPUBFILE not given" unless path
+  unless File.file? path
+    if File.directory? path
+      EPUB::OCF::PhysicalContainer.adapter = :UnpackedDirectory
+    else
+      path = URI.parse(path) rescue nil
+      if path
+        EPUB::OCF::PhysicalContainer.adapter = :UnpackedURI
+      else
+        error "EPUBFILE not a file"
+      end
+    end
+  end
+  error "output not a directory" if options["output"] && !File.directory?(options["output"])
+  cover_image = EPUB::Parser.parse(path).cover_image
+  error "cover image not found" unless cover_image
+  path = File.basename(cover_image.href.to_s)
+  path = File.join(options["output"], path) if options["output"]
+  File.write path, cover_image.read
+  $stderr.print "Cover image output to "
+  print path
+  $stderr.puts ""
+end
+def error(message)
+  $stderr.puts "Error: #{message}"
+  $stderr.puts ""
+  $stderr.puts option_parser.help
+  abort
+end
+main(ARGV)
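As released in this diff, `error` refers to `option_parser`, which is a local variable of `main`. A `def` body opens a new local scope in Ruby, so reaching that line would raise `NameError` instead of printing the usage text. The sketch below shows one possible restructuring that uses only the standard `optparse` library. `build_option_parser`, `parse_args` and `error_message` are hypothetical names and are not part of the released tool.

```ruby
require "optparse"

# Hypothetical helper: builds the same parser as bin/epub-cover, so that
# both argument parsing and error reporting can reach it.
def build_option_parser
  OptionParser.new do |opt|
    opt.banner = "Usage: #{opt.program_name} [options] EPUBFILE"
    opt.separator "Options:"
    opt.on "-o", "--output=DIR", "Directory to put image file"
  end
end

# Returns [options, path]. OptionParser#getopts removes parsed options from
# argv, so the first remaining element is the EPUB path (or nil).
def parse_args(argv, parser = build_option_parser)
  options = parser.getopts(argv)
  [options, argv.shift]
end

# Builds the text that `error` prints. The parser is passed in explicitly
# instead of being looked up as another method's local variable.
def error_message(message, parser = build_option_parser)
  "Error: #{message}\n\n#{parser.help}"
end

# A corrected `error` could then be:
#   def error(message, parser)
#     $stderr.puts error_message(message, parser)
#     abort
#   end
```

With this shape, `main` would build the parser once and pass it to both `parse_args` and `error`.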

data/docs/EpubCover.adoc ADDED

@@ -0,0 +1,46 @@
+{file:docs/Home} > *{file:docs/EpubCover.adoc}*
+= `epub-cover` command-line tool
+The `epub-cover` tool extracts the cover image from an EPUB book.
+== Usage
+----
+% epub-cover --help
+Extract cover image.
+Image is put to current directory with the same name in EPUB.
+It is put to specified directory when `--output' option is given.
+Usage: epub-cover [options] EPUBFILE
+Options:
+    -o, --output=DIR                 Directory to put image file
+----
+Example:
+----
+% epub-cover childrens-literature.epub
+Cover image output to cover.png
+----
+As the output indicates, the cover image file is written to the current directory, with the same file name it has in the EPUB file.
+=== Output directory
+You can specify a directory for the cover file with the `--output` option.
+----
+% epub-cover --output=/tmp childrens-literature.epub
+Cover image output to /tmp/cover.png
+----
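The destination path follows two lines in `bin/epub-cover`. The tool takes the base name of the cover item's `href` and joins it with the `--output` directory when one is given. Here is a minimal standalone sketch of that step. `cover_output_path` is a hypothetical name, and `href` may be a `URI` or a string, since the tool calls `cover_image.href.to_s`.

```ruby
require "uri"

# Mirrors bin/epub-cover: keep only the file name from the EPUB, then
# prepend the --output directory when one was given.
def cover_output_path(href, output_dir = nil)
  name = File.basename(href.to_s)
  output_dir ? File.join(output_dir, name) : name
end
```

Because only the base name is kept, an image stored at `images/cover.png` inside the EPUB ends up as `cover.png`.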
+=== Extract from the web
+`epub-cover` accepts a URI instead of a file path.
+----
+% epub-cover https://raw.githubusercontent.com/IDPF/epub3-samples/master/30/page-blanche/
+Cover image output to cover.jpg
+----
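`bin/epub-cover` picks a container adapter from the shape of `EPUBFILE`:

* a regular file keeps the default adapter;
* a directory uses `:UnpackedDirectory`;
* anything else is tried as a URI with `:UnpackedURI`.

The sketch below isolates that decision as a pure function, so it can be checked without the gem. `container_adapter_for` is a hypothetical name. The sketch has one deliberate difference from the released script, and it is an assumption of this sketch: only `http`/`https` URIs are accepted. `URI.parse` also accepts plain relative strings such as `missing.epub`, so the released `rescue nil` check rarely reaches "EPUBFILE not a file".

```ruby
require "uri"

# Returns the adapter name bin/epub-cover would assign to
# EPUB::OCF::PhysicalContainer.adapter, or nil to keep the default ZIP adapter.
# Raises ArgumentError for input that is neither a file, a directory nor an
# http(s) URI (the http/https restriction is this sketch's assumption).
def container_adapter_for(path)
  return nil if File.file?(path)
  return :UnpackedDirectory if File.directory?(path)

  uri = URI.parse(path) rescue nil
  return :UnpackedURI if uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass

  raise ArgumentError, "EPUBFILE not a file: #{path}"
end
```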

data/docs/Examples.adoc ADDED

@@ -0,0 +1,9 @@
+= Examples
+= {doctitle}
+Here are some examples showing how to use the EPUB Parser gem.
+* {file:docs/AggregateContentsFromWeb.markdown Aggregate Contents From the Web}
+* {file:examples/exctract-content-using-cfi.rb Extract contents from EPUB files using EPUB CFI(identifier for EPUB)}
+* {file:examples/find-elements-and-cfis.rb Find elements and CFIs}

data/docs/Home.adoc ADDED

@@ -0,0 +1,224 @@
+= EPUB Parser
+= {doctitle}
+The EPUB Parser gem parses EPUB 3 books loosely.
+image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/build.svg[link="https://gitlab.com/KitaitiMakoto/epub-parser/commits/master", title="pipeline status"]
+image:https://gemnasium.com/KitaitiMakoto/epub-parser.png[link="https://gemnasium.com/KitaitiMakoto/epub-parser",title="Dependency Status"]
+image:https://badge.fury.io/rb/epub-parser.svg[link="https://badge.fury.io/rb/epub-parser",title="Gem Version"]
+image:https://gitlab.com/KitaitiMakoto/epub-parser/badges/master/coverage.svg[link="https://kitaitimakoto.gitlab.io/epub-parser/coverage/",title="coverage report"]
+* https://kitaitimakoto.gitlab.io/epub-parser/file.Home.html[Homepage]
+* https://kitaitimakoto.gitlab.io/epub-parser/[Documentation]
+* https://gitlab.com/KitaitiMakoto/epub-parser[Source Code]
+* https://kitaitimakoto.gitlab.io/epub-parser/coverage/[Test Coverage]
+== Installation
+    gem install epub-parser
+== Usage
+=== As command-line tools
+==== epubinfo
+`epubinfo` tool extracts and shows the metadata of specified EPUB book.
+See {file:docs/Epubinfo.markdown}.
+==== epub-open
+The `epub-open` tool provides an interactive shell (IRB) that helps you explore an EPUB book.
+See {file:docs/EpubOpen.markdown}.
+==== epub-cover
+The `epub-cover` tool extracts the cover image from an EPUB book.
+See {file:docs/EpubCover.adoc}.
+=== As a library
+Start with `EPUB::Parser.parse`:
+----
+require 'epub/parser'
+book = EPUB::Parser.parse('/path/to/book.epub')
+----
+This book object can yield pages in spine order (the spine defines the reading order the author intends):
+----
+book.each_page_on_spine do |page|
+  # do something...
+end
+----
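Per the EPUB Publications specification, the spine is a list of `itemref` elements. Each `idref` points at a manifest item `id`. Iterating "on spine" therefore means looking each `idref` up in the manifest, in spine order. The library-independent sketch below uses plain hashes. The data and the `each_page_on_spine` function here are hypothetical stand-ins, not the gem's implementation.

```ruby
# Manifest: item id => href (order irrelevant). Spine: idrefs in reading order.
MANIFEST = {
  "nav"   => "nav.xhtml",
  "ch1"   => "s04.xhtml",
  "cover" => "cover.xhtml"
}.freeze
SPINE = %w[cover nav ch1].freeze

# Yields manifest hrefs in the order the spine lists them. Without a block
# it returns an Enumerator. Hash#fetch raises KeyError for a dangling idref.
def each_page_on_spine(manifest, spine)
  return enum_for(__method__, manifest, spine) unless block_given?

  spine.each { |idref| yield manifest.fetch(idref) }
end
```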
+`page` above is an {EPUB::Publication::Package::Manifest::Item} object, and you can call {EPUB::Publication::Package::Manifest::Item#href #href} to see where the page file is:
+----
+book.each_page_on_spine do |page|
+  file = page.href # => path/to/page/in/zip/archive
+  html = Zip::Archive.open('/path/to/book.epub') {|zip|
+    zip.fopen(file.to_s) {|file| file.read}
+  }
+end
+----
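One caveat about the snippet above: manifest `href`s are relative to the package document. They match entry names in the ZIP archive only when the package document sits at the archive root. The `full_path` shown in the Searcher documentation, which prints `EPUB/nav.xhtml`, is such an archive path.

The sketch below shows only the resolution step, and it rests on two assumptions:

* the package document's path is known (for example the hypothetical `EPUB/package.opf`);
* the href contains no percent-escapes.

`archive_path` is a hypothetical helper.

```ruby
require "pathname"

# Resolves a manifest href against the directory of the package document
# (rootfile) to get the entry name inside the ZIP archive.
def archive_path(rootfile_path, href)
  (Pathname(rootfile_path).dirname + href).cleanpath.to_s
end
```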
+And {EPUB::Publication::Package::Manifest::Item Item} provides {EPUB::Publication::Package::Manifest::Item#read #read} as syntactic sugar for the above:
+----
+html = page.read
+doc = Nokogiri.HTML(html)
+# do something with Nokogiri as always
+----
+For more utilities of Item, see the {file:docs/Item.markdown} page.
+By the way, although `book` above is an {EPUB::Book} object, all of its features are provided by the {EPUB::Book::Features} module. Therefore your own class, such as `YourBook` below, can include those features:
+----
+require 'epub'
+class YourBook < ActiveRecord::Base
+    include EPUB::Book::Features
+end
+book = EPUB::Parser.parse(
+  'uploaded-book.epub',
+  :class => YourBook # *************** pass YourBook class
+)
+book.instance_of? YourBook # => true
+book.required = 'value for required field'
+book.save!
+book.each_page_on_spine do |epage|
+  page = YourBookPage.create(
+    :some_attr    => 'some attr',
+    :content      => epage.read,
+    :another_attr => 'another attr'
+  )
+  book.pages << page
+end
+----
+You can also look up a YourBook object first and pass it to the parser:
+----
+book = YourBook.find params[:id]
+ret = EPUB::Parser.parse(
+  'uploaded-book.epub',
+  :book => book # ******************* pass your book instance
+) # => book
+ret == book # => true; this API is not good I feel... Welcome suggestion!
+# do something with your book
+----
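The two calls above differ only in who creates the book object:

* with `:class`, the parser instantiates your class;
* with `:book`, it fills in the instance you pass and returns that same object.

Here is a plain-Ruby sketch of that choice. It assumes the book class can be built with `new` and no arguments. `Features`, `DefaultBook`, `parse_into` and the `title` attribute are hypothetical stand-ins, not the gem's code.

```ruby
# Stand-in for EPUB::Book::Features: behaviour shared by any book class.
module Features
  attr_accessor :title
end

# Stand-in for EPUB::Book, the default class.
class DefaultBook
  include Features
end

# Returns the object that receives parsed data: the given instance (:book),
# else a new instance of the given class (:class), else a DefaultBook.
def parse_into(title, options = {})
  book = options[:book] || (options[:class] || DefaultBook).new
  book.title = title # stands in for filling metadata, manifest, spine...
  book
end
```

This is why `ret == book` holds in the second example: the returned object is the one that was passed in.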
+==== Switching ZIP library
+EPUB Parser uses https://github.com/javanthropus/archive-zip[Archive::Zip], a pure Ruby ZIP library, by default. You can use https://bitbucket.org/winebarrel/zip-ruby/wiki/Home[Zip/Ruby], Ruby bindings for https://libzip.org/[libzip], instead if you have already installed the Zip/Ruby gem with RubyGems or Bundler.
+Globally:
+----
+EPUB::OCF::PhysicalContainer.adapter = :Zipruby
+book = EPUB::Parser.parse("path/to/book.epub")
+----
+For each EPUB book:
+----
+book = EPUB::Parser.parse("path/to/book.epub", container_adapter: :Zipruby)
+----
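If Zip/Ruby is optional in your deployment, you can pick the adapter at runtime. This sketch makes two assumptions:

* the gem is published as `zipruby` (check your Gemfile);
* RubyGems' `Gem::Specification.find_all_by_name` can tell whether it is installed.

`preferred_container_adapter` is a hypothetical helper. Pass its result as `container_adapter:`, or assign it to `EPUB::OCF::PhysicalContainer.adapter`, only when it is non-nil. That way the default Archive::Zip adapter stays in place otherwise.

```ruby
require "rubygems"

# Returns :Zipruby when a gem with the given name is installed, otherwise nil
# (meaning: keep EPUB Parser's default Archive::Zip adapter).
# The gem name "zipruby" is an assumption.
def preferred_container_adapter(gem_name = "zipruby")
  Gem::Specification.find_all_by_name(gem_name).empty? ? nil : :Zipruby
end
```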
+== Documentation
+=== APIs
+More documentation is available in:
+* {file:docs/Publication.markdown} covers the document's metadata, file list, and so on.
+* {file:docs/Item.markdown} describes Item, which represents a file in an EPUB package.
+* {file:docs/FixedLayout.markdown} provides APIs to declare how EPUB readers render a book, such as reflowable or fixed layout.
+* {file:docs/Navigation.markdown} describes how to use the Navigation Document.
+* {file:docs/Searcher.adoc} introduces APIs to search EPUB documents for words and elements, and to search them by EPUB CFIs (position pointers for EPUB).
+* {file:docs/UnpackedArchive.markdown} describes how to handle directories generated by unzipping EPUB files, instead of the EPUB files themselves.
+* {file:docs/MultipleRenditions.markdown} describes EPUB Multiple-Rendition Publications and the APIs for them.
+=== Examples
+Example usages are listed on the {file:Examples} page.
+* {file:docs/AggregateContentsFromWeb.markdown Aggregate Contents From the Web}
+* {file:examples/exctract-content-using-cfi.rb Extract contents from EPUB files using EPUB CFI(identifier for EPUB)}
+* {file:examples/find-elements-and-cfis.rb Find elements and CFIs}
+=== Building documentation
+If you installed EPUB Parser via the gem command, you can also generate the documentation yourself (the https://gitlab.com/KitaitiMakoto/rubygems-yardoc[rubygems-yardoc] gem is required):
+----
+$ gem install epub-parser
+$ gem yardoc epub-parser
+...
+Files:          33
+Modules:        20 (   20 undocumented)
+Classes:        45 (   44 undocumented)
+Constants:      31 (   31 undocumented)
+Methods:       292 (   88 undocumented)
+52.84% documented
+YARD documentation is generated to:
+/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc
+----
+At the end, it shows the path to the generated documentation (`/path/to/gempath/ruby/2.2.0/doc/epub-parser-0.2.0/yardoc` here).
+Alternatively, you can generate the documentation from a clone of the repository:
+----
+$ git clone https://gitlab.com/KitaitiMakoto/epub-parser.git
+$ cd epub-parser
+$ bundle install --path=deps
+$ bundle exec rake doc:yard
+...
+Files:          33
+Modules:        20 (   20 undocumented)
+Classes:        45 (   44 undocumented)
+Constants:      31 (   31 undocumented)
+Methods:       292 (   88 undocumented)
+52.84% documented
+----
+Then documentation will be available in `doc` directory.
+== Requirements
+* Ruby 2.2.0 or later
+* `patch` command to install Nokogiri
+* C compiler to compile Zip/Ruby and Nokogiri
+== History
+See {file:CHANGELOG.adoc}.
+== Note
+This library is still a work in progress.
+Only a few features are implemented, and APIs may change in the future.
+Currently implemented:
+* container.xml of http://idpf.org/epub/30/spec/epub30-ocf.html#sec-container-metainf-container.xml[EPUB Open Container Format (OCF) 3.0]
+* http://idpf.org/epub/30/spec/epub30-publications.html[EPUB Publications 3.0]
+* EPUB Navigation Documents of http://www.idpf.org/epub/30/spec/epub30-contentdocs.html[EPUB Content Documents 3.0]
+* http://www.idpf.org/epub/fxl/[EPUB 3 Fixed-Layout Documents]
+* metadata.xml of http://www.idpf.org/epub/renditions/multiple/[EPUB Multiple-Rendition Publications]
+== License
+This library is distributed under the terms of the MIT Licence.
+See {file:MIT-LICENSE} file for more info.

data/docs/Searcher.adoc ADDED

@@ -0,0 +1,132 @@
+{file:docs/Home} > *{file:docs/Searcher.adoc}*
+= Searcher
+*Searcher is experimental. None of its interfaces are stable yet.*
+== Example
+----
+epub = EPUB::Parser.parse('childrens-literature.epub')
+search_word = 'INTRODUCTORY'
+results = EPUB::Searcher.search_text(epub, search_word)
+# => [#<EPUB::Searcher::Result:0x007f80ccde9528
+#   @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9730 @index=12, @info={}, @type=:character>],
+#   @parent_steps=
+#    [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccf3d3e8 @index=1, @info={:id=>nil}, @type=:itemref>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9e88 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9e38 @index=0, @info={:name=>"nav", :id=>"toc"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9de8 @index=1, @info={:name=>"ol", :id=>"tocList"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9d98 @index=0, @info={:name=>"li", :id=>"np-313"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9d48 @index=1, @info={:name=>"ol", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9ca8 @index=1, @info={:name=>"li", :id=>"np-317"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9c08 @index=0, @info={:name=>"a", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde9bb8 @index=0, @info={}, @type=:text>],
+#   @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccde9af0 @index=0, @info={}, @type=:character>]>,
+#  #<EPUB::Searcher::Result:0x007f80ccebcb30
+#   @end_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebcdb0 @index=12, @info={}, @type=:character>],
+#   @parent_steps=
+#    [#<EPUB::Searcher::Result::Step:0x007f80ccf571d0 @index=2, @info={:name=>"spine", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccde94b0 @index=2, @info={:id=>nil}, @type=:itemref>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccebd328 @index=1, @info={:name=>"body", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccebd2d8 @index=0, @info={:name=>"section", :id=>"pgepubid00492"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccebd260 @index=3, @info={:name=>"section", :id=>"pgepubid00498"}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccebd210 @index=1, @info={:name=>"h3", :id=>nil}, @type=:element>,
+#     ##<EPUB::Searcher::Result::Step:0x007f80ccebd198 @index=0, @info={}, @type=:text>],
+#   @start_steps=[#<EPUB::Searcher::Result::Step:0x007f80ccebd0d0 @index=0, @info={}, @type=:character>]>]
+puts results.collect(&:to_cfi).collect(&:to_fragment)
+# epubcfi(/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]/2/1,:0,:12)
+# epubcfi(/6/6!/4/2[pgepubid00492]/8[pgepubid00498]/4/1,:0,:12)
+# => nil
+----
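The CFIs printed above can be read back from the step lists in each result. Two rules explain the step numbers:

* The even/odd numbering follows the EPUB CFI specification: element steps get even numbers and text steps get odd numbers.
* The mapping from `@index` values is inferred from this sample output: element-like steps become `(index + 1) * 2`, text steps become `index * 2 + 1`, and character steps become `:index`.

An `itemref` step is followed by `!` (indirection into the content document). `SketchStep`, `step_to_cfi` and `cfi_fragment` are hypothetical and are not the gem's implementation.

```ruby
# Minimal stand-in for EPUB::Searcher::Result::Step (type, index, info hash).
SketchStep = Struct.new(:type, :index, :info)

# Converts one step to its CFI form.
def step_to_cfi(step)
  case step.type
  when :element, :itemref
    id = step.info[:id]
    s = "/#{(step.index + 1) * 2}"
    s += "[#{id}]" if id
    s += "!" if step.type == :itemref
    s
  when :text
    "/#{step.index * 2 + 1}"
  when :character
    ":#{step.index}"
  else
    raise ArgumentError, "unknown step type: #{step.type}"
  end
end

# Builds a range CFI of the form epubcfi(parent,start,end).
def cfi_fragment(parent_steps, start_steps, end_steps)
  parts = [parent_steps, start_steps, end_steps].map { |steps| steps.map { |s| step_to_cfi(s) }.join }
  "epubcfi(#{parts.join(',')})"
end
```

Feeding it the steps of the first result reproduces the first CFI printed above.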
+== Search result
+The search result is an array of {EPUB::Searcher::Result} objects, each of which can be converted to an EPUB CFI string by {EPUB::Searcher::Result#to_cfi_s}.
+== Seamless XHTML Searcher
+The default searcher for XHTML is now the *seamless* searcher, which ignores tags when searching.
+For example, it can find the words 'search word' in the XHTML document below:
+----
+<html>
+  <head>
+    <title>Sample document</title>
+  </head>
+  <body>
+    <p><em>search</em> word</p>
+  </body>
+</html>
+----
+== Restricted XHTML Searcher
+You can also use the *restricted* searcher, which searches only within a single element. For instance, it can find 'search word' in the XHTML document below:
+----
+<html>
+  <head>
+    <title>Sample document</title>
+  </head>
+  <body>
+    <p>search word</p>
+  </body>
+</html>
+----
+But it cannot find them in this document:
+----
+<html>
+  <head>
+    <title>Sample document</title>
+  </head>
+  <body>
+    <p><em>search</em> word</p>
+  </body>
+</html>
+----
+because the words 'search' and 'word' are not in the same element.
+To use the restricted searcher, specify the `algorithm` option for the `search_text` method:
+    results = EPUB::Searcher.search_text(epub, search_word, algorithm: :restricted)
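The difference between the two algorithms can be illustrated with plain strings. This sketch simplifies each element's content to its text nodes, which is an assumption: the gem works on parsed XHTML. `restricted_match?` and `seamless_match?` are hypothetical names.

```ruby
# Text nodes of <p><em>search</em> word</p> in document order: "search"
# lives in <em>, " word" is a text node of <p>.
SPLIT_NODES = ["search", " word"].freeze
# Text node of <p>search word</p>.
SINGLE_NODE = ["search word"].freeze

# Restricted: the phrase must appear inside a single text node.
def restricted_match?(text_nodes, phrase)
  text_nodes.any? { |text| text.include?(phrase) }
end

# Seamless: tags are ignored, so the text nodes are searched as one string.
def seamless_match?(text_nodes, phrase)
  text_nodes.join.include?(phrase)
end
```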
+== Element Searcher
+You can search XHTML elements by CSS selector or XPath.
+----
+EPUB::Searcher::Publication.search_element(@package, css: 'ol > li').collect {|result| result[:location]}.map(&:to_fragment)
+# => ["epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313])",
+#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/2[np-315])",
+#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317])",
+#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6)",
+#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319])",
+#  "epubcfi(/4/4!/4/2[toc]/4[tocList]/2[np-313]/4/6/4/2[np-319]/4/2)",
+#    :
+#    :
+----
+== Search by EPUB CFI
+You can fetch an XML node from an EPUB document by its EPUB CFI.
+----
+require "epub/parser"
+require "epub/searcher"
+epub = EPUB::Parser.parse("childrens-literature.epub")
+cfi = EPUB::CFI("/6/4!/4/2[toc]/4[tocList]/2[np-313]/4/4[np-317]")
+itemref, node = EPUB::Searcher.search_by_cfi(epub, cfi)
+puts itemref.item.full_path
+puts node
+# EPUB/nav.xhtml
+# <li id="np-317" class="front">
+#                                                         <a href="s04.xhtml#pgepubid00498">INTRODUCTORY</a>
+#                                                 </li>
+----
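`EPUB::CFI(...)` parses the path for you. To see what the steps mean, here is a plain-Ruby sketch that splits the example path at the `!` indirection and decodes the step numbers. Per the EPUB CFI specification:

* even numbers address child elements: `/6` is the third child element of `package` (the `spine`), and `/4!` is its second `itemref`;
* bracketed values such as `[np-317]` are ID assertions.

`parse_cfi_path` and `child_element_position` are hypothetical helpers. They handle only this simple form: no escapes, offsets or ranges.

```ruby
# Splits a CFI path such as "/6/4!/4/2[toc]" into steps in the package
# document (up to "!") and steps inside the content document.
# Each step is [number, id_assertion_or_nil].
def parse_cfi_path(path)
  package_steps = []
  content_steps = []
  current = package_steps
  path.scan(%r{/(\d+)(?:\[([^\]]*)\])?(!)?}) do |number, id, indirection|
    current << [number.to_i, id]
    current = content_steps if indirection
  end
  [package_steps, content_steps]
end

# An even step number n selects the (n / 2)th child element (1-based).
def child_element_position(number)
  raise ArgumentError, "odd steps do not address elements" if number.odd?
  number / 2
end
```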