RubyGems - bookbinder - Versions diffs - 0.2.0 - Mend

bookbinder 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

data/README.md +97 -0
data/Rakefile +12 -0
data/bin/bookbinder +17 -0
data/lib/bookbinder/document_proxy.rb +171 -0
data/lib/bookbinder/file.rb +149 -0
data/lib/bookbinder/file_system/directory.rb +62 -0
data/lib/bookbinder/file_system/memory.rb +57 -0
data/lib/bookbinder/file_system/zip_file.rb +106 -0
data/lib/bookbinder/file_system.rb +35 -0
data/lib/bookbinder/media_type.rb +17 -0
data/lib/bookbinder/operations.rb +59 -0
data/lib/bookbinder/package/epub.rb +69 -0
data/lib/bookbinder/package/openbook.rb +33 -0
data/lib/bookbinder/package.rb +295 -0
data/lib/bookbinder/transform/epub/audio_overlay.rb +227 -0
data/lib/bookbinder/transform/epub/audio_soundtrack.rb +73 -0
data/lib/bookbinder/transform/epub/contributor.rb +11 -0
data/lib/bookbinder/transform/epub/cover_image.rb +80 -0
data/lib/bookbinder/transform/epub/cover_page.rb +148 -0
data/lib/bookbinder/transform/epub/creator.rb +67 -0
data/lib/bookbinder/transform/epub/description.rb +43 -0
data/lib/bookbinder/transform/epub/language.rb +29 -0
data/lib/bookbinder/transform/epub/metadata.rb +140 -0
data/lib/bookbinder/transform/epub/nav.rb +60 -0
data/lib/bookbinder/transform/epub/nav_toc.rb +177 -0
data/lib/bookbinder/transform/epub/ncx.rb +63 -0
data/lib/bookbinder/transform/epub/ocf.rb +33 -0
data/lib/bookbinder/transform/epub/opf.rb +22 -0
data/lib/bookbinder/transform/epub/package_identifier.rb +87 -0
data/lib/bookbinder/transform/epub/rendition.rb +265 -0
data/lib/bookbinder/transform/epub/resources.rb +38 -0
data/lib/bookbinder/transform/epub/spine.rb +79 -0
data/lib/bookbinder/transform/epub/title.rb +92 -0
data/lib/bookbinder/transform/epub/version.rb +39 -0
data/lib/bookbinder/transform/generator.rb +8 -0
data/lib/bookbinder/transform/openbook/json.rb +15 -0
data/lib/bookbinder/transform/organizer.rb +41 -0
data/lib/bookbinder/transform.rb +7 -0
data/lib/bookbinder/version.rb +5 -0
data/lib/bookbinder.rb +29 -0
metadata +131 -0

data/README.md ADDED

@@ -0,0 +1,97 @@
+# Bookbinder
+Ebook format conversion.
+## Basic use
+Display the contents of an EPUB as a JSON "map":
+    $ bookbinder map path/to/file.epub
+Convert an EPUB to an Openbook directory (the directory need not exist yet):
+    $ bookbinder convert path/to/file.epub path/to/dir
+Convert an EPUB to an Openbook archive:
+    $ bookbinder convert path/to/file.epub path/to/file.openbook
+Convert an Openbook to an EPUB (...is not yet fully implemented!)
+## Use as a Ruby library
+EPUB to Openbook:
+    require 'bookbinder'
+    epub = Bookbinder::Package::EPUB.read('book.epub')
+    openbook = epub.export(Bookbinder::Package::Openbook)
+    openbook.write('book.openbook')
+For other basic actions, take a look at `lib/bookbinder/operations.rb`. This
+class provides a basic layer of convenience, such as reducing the above to:
+    require 'bookbinder'
+    Bookbinder::Operations.convert('book.epub', 'book.openbook')
+## Improving Bookbinder
+Inside Bookbinder, a "book" is simply a nested hash of properties and values.
+This hash is called "the map". Properties of the map that are transferable
+between ebook package formats should follow the Openbook convention, which is
+currently maintained here:
+> https://gist.github.com/joseph/7303930
+The key to Bookbinder is this: for every feature of an ebook format, we create
+a "transform" class that does two things:
+- parses the raw config from the package into standard Openbook properties
+  on the map; and
+- generates raw config into the package from those same standard Openbook
+  properties on the map
+Basically, for every feature, the transform class describes how to read it,
+and how to write it.
+The nice thing about this set-up is that if multiple package formats
+support the same feature, their transform classes work on the same map. Say
+you are converting from EPUB3 to Openbook - the book's title is parsed out of
+the EPUB file into the map using the transform at
+`lib/bookbinder/transform/epub/title.rb`. Then the map is handed over to
+the Openbook package, and the transform at
+`lib/bookbinder/transform/openbook/title.rb` would write it out to the
+new package file.
+You can of course convert a package to its own format: in this case the same
+transform class does both the reading and the writing out -- the effect of
+this is to "tidy" the package.
+To add a package format to Bookbinder, you should create the package class in
+`lib/bookbinder/package`, then create a directory of transforms in
+`lib/bookbinder/transform`. You can borrow transforms from other packages. For
+instance, it might make sense for a DAISY package to share some of the
+transforms in EPUB, or for a hPub package to borrow some transforms
+from Openbook.
+If you are adding a new feature to Bookbinder, you create the
+appropriate transform class for each package that supports the feature, and
+then add the equivalent tests.
+## Planned format support
+* Openbook
+* EPUB3
+* EPUB2
+* hPub
+* PDF
+* ...
+## Attribution and licensing
+Bookbinder was originally developed at OverDrive, Inc. Released under the
+MIT License. See `MIT-LICENSE` in this directory.
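
The transform idea described in the README above can be illustrated with a minimal, self-contained sketch. `TitleTransform`, its `parse` and `generate` method names, the `'title'` map key, and the plain hash standing in for raw package config are all hypothetical names chosen for this example; the gem's real transform classes work on package files and may look quite different.

```ruby
# Hypothetical sketch: one transform class per feature, able to both read
# the feature into the map and write it back out from the map.
# A plain Hash stands in for the raw package config here.
class TitleTransform
  # Read the raw title out of the package config into the map.
  def parse(raw_config, map)
    title = raw_config['dc:title']
    map['title'] = title.strip  if title
    map
  end

  # Write the title from the map back into raw package config.
  def generate(map, raw_config)
    raw_config['dc:title'] = map['title']  if map['title']
    raw_config
  end
end

transform = TitleTransform.new
map = transform.parse({ 'dc:title' => '  Moby-Dick ' }, {})
tidied = transform.generate(map, {})
puts(map.inspect)
puts(tidied.inspect)
```

Running the same transform in both directions, as in the last lines, corresponds to what the README calls "tidying" a package.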

data/Rakefile ADDED

@@ -0,0 +1,12 @@
+#!/usr/bin/env rake
+require 'bundler/gem_tasks'
+require 'rake/testtask'
+Rake::TestTask.new { |t|
+  t.libs << 'test'
+  t.test_files = FileList['test/unit/**/test*.rb']
+  t.verbose = true
+}
+desc('Run tests')
+task(:default => :test)

data/bin/bookbinder ADDED

@@ -0,0 +1,17 @@
+#!/usr/bin/env ruby
+require 'bookbinder'
+if ARGV[0] == 'map'
+  puts Bookbinder::Operations.map(ARGV[1])
+elsif ARGV[0] == 'validate'
+  puts('Validate: not yet implemented.')
+elsif ARGV[0] == 'normalize'
+  puts('Normalize: still too dangerous.')
+elsif ARGV[0] == 'convert'
+  src_path = ARGV[1]
+  dest_path = ARGV[2]
+  src_pkg, dest_pkg = Bookbinder::Operations.convert(src_path, dest_path)
+  puts("Converted #{src_pkg.class} at #{src_path}")
+  puts("------ to #{dest_pkg.class} at #{dest_path}")
+end

data/lib/bookbinder/document_proxy.rb ADDED

@@ -0,0 +1,171 @@
+class Bookbinder::DocumentProxy
+  XMLNS = {
+    'xhtml' => 'http://www.w3.org/1999/xhtml',
+    'dc' => 'http://purl.org/dc/elements/1.1/',
+    'dcterms' => 'http://purl.org/dc/terms/',
+    'mathml' => 'http://www.w3.org/1998/Math/MathML',
+    'svg' => 'http://www.w3.org/2000/svg',
+    'ocf' => 'urn:oasis:names:tc:opendocument:xmlns:container',
+    'opf' => 'http://www.idpf.org/2007/opf',
+    'ncx' => 'http://www.daisy.org/z3986/2005/ncx/',
+    'epub' => 'http://www.idpf.org/2007/ops'
+  }
+  XML_PREFIX = {
+    'ibooks' => 'http://vocabulary.itunes.apple.com/rdf/ibooks/vocabulary-extensions-1.0/',
+    'rendition' => 'http://www.idpf.org/vocab/rendition/#'
+  }
+  attr_reader(:doc)
+  def load(string)
+    begin
+      @doc = Nokogiri::XML(string)
+    rescue
+      @doc = Nokogiri::HTML(string)
+    end
+    self
+  end
+  def build(&blk)
+    builder = Nokogiri::XML::Builder.new { |x|
+      @doc = x.doc
+      yield(self, x)
+    }
+    self
+  end
+  def new_node(tag, options = {})
+    Nokogiri::XML::Node.new(tag, doc).tap { |node|
+      yield(node)  if block_given?
+      if parent = options[:append]
+        parent = find(parent)  if parent.kind_of?(String)
+        parent.add_child(node)
+      end
+    }
+  end
+  # Given a CSS query, returns the first result, or nil.
+  #
+  def find(query)
+    @doc.at_css(query, node_namespaces(@doc.root))
+  end
+  # Given a CSS query, returns all results, or an empty array.
+  #
+  def search(query)
+    @doc.css(query, node_namespaces(@doc.root))
+  end
+  # Iterates over the results of the search for the given CSS query.
+  #
+  def each(query, &blk)
+    search(query).each(&blk)
+  end
+  def find_within(node, query)
+    node.at_css(query, node_namespaces(node))
+  end
+  def search_within(node, query)
+    node.css(query, node_namespaces(node))
+  end
+  def each_within(node, query, &blk)
+    search_within(node, query).each(&blk)
+  end
+  def add_namespace(namespace_label, default = false)
+    add_node_namespace(@doc.root, namespace_label, default)
+  end
+  def add_prefix(prefix_label, prefix_attribute = 'prefix')
+    add_node_prefix(@doc.root, prefix_label, prefix_attribute)
+  end
+  def add_node_namespace(node, namespace_label, default = false)
+    xmlns = default ? nil : namespace_label
+    node.add_namespace_definition(xmlns, XMLNS[namespace_label])
+  end
+  def add_node_prefix(node, prefix_label, prefix_attribute = 'prefix')
+    prefix = "#{prefix_label}: #{XML_PREFIX[prefix_label]}"
+    prefixes = [node[prefix_attribute], prefix].compact
+    node[prefix_attribute] = prefixes.join("\n")
+  end
+  def to_str
+    if @doc.kind_of?(Nokogiri::HTML::Document)
+      # Remove the old-style charset meta tag that Nokogiri auto-inserts.
+      # This is nasty business, but apparently once again Nokogiri is
+      # wrong and Markus Gylling knows best:
+      # http://code.google.com/p/epubcheck/issues/detail?id=135#c3
+      html = @doc.to_xhtml
+      html.sub!(
+        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />',
+        ''
+      )
+      html
+    elsif @doc.kind_of?(Nokogiri::XML::Document)
+      @doc.to_xml
+    end
+  end
+  protected
+    # When performing a css() or at_css() query that has a namespaced
+    # element (epub|switch) or a namespaced attribute (*[epub|type]),
+    # you need to list the namespaces you want to use.
+    #
+    # You could just use the namespaces in the document, but it's possible
+    # that the document author has bound the namespace to something
+    # unexpected. For example, they might have <dublincore:title> elements,
+    # rather than <dc:title> elements. They probably deserve everything
+    # that's coming to them, but it's valid XML, so we should support it.
+    #
+    # Therefore we should supply the namespace labels that WE expect,
+    # rather than what the document author supplied, and let LibXML
+    # do the translation automatically.
+    #
+    # This method constructs a hash of namespaces we can give to the
+    # Nokogiri query, using our XMLNS constant. It also includes any
+    # namespaces defined on the node or its ancestors -- typically
+    # so that we can get the 'default' namespace for the document.
+    #
+    def node_namespaces(node)
+      @node_namespaces ||= {}
+      @node_namespaces[node] ||= node.namespaces.merge(default_namespaces)
+    end
+    def default_namespaces
+      @default_namespaces ||= XMLNS.inject({}) { |acc, arr|
+        key, val = arr
+        acc.update("xmlns#{key ? ":#{key}" : ''}" => val)
+      }
+    end
+    def method_missing(mthd, *args, &blk)
+      @doc.send(mthd, *args, &blk)
+    end
+end
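
The comment above `node_namespaces` describes merging the prefixes the library expects with whatever the document declares. The sketch below shows that merge using plain hashes only, so it runs without Nokogiri. `expected` is a two-entry subset of the `XMLNS` constant, and `document_namespaces` is a hypothetical example of the hash shape that Nokogiri's `Node#namespaces` returns for a document that binds Dublin Core to the prefix `dublincore`.

```ruby
# Prefixes we want to use in our own queries (a subset of XMLNS above).
expected = {
  'dc' => 'http://purl.org/dc/elements/1.1/',
  'opf' => 'http://www.idpf.org/2007/opf'
}

# Convert { 'dc' => uri } into { 'xmlns:dc' => uri }, the key format used
# for namespace hashes passed to CSS queries.
def default_namespaces(xmlns)
  xmlns.each_with_object({}) { |(prefix, uri), acc|
    acc["xmlns:#{prefix}"] = uri
  }
end

# Hypothetical namespaces declared by the document author: a default
# namespace, plus Dublin Core bound to an unexpected prefix.
document_namespaces = {
  'xmlns' => 'http://www.idpf.org/2007/opf',
  'xmlns:dublincore' => 'http://purl.org/dc/elements/1.1/'
}

# Our expected labels are added alongside the author's labels, so a query
# written against dc|title resolves to the same namespace URI.
merged = document_namespaces.merge(default_namespaces(expected))
merged.each { |label, uri| puts("#{label} => #{uri}") }
```

Because the merge keeps the document's own entries, the default namespace (`xmlns`) remains available to queries as well.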

data/lib/bookbinder/file.rb ADDED

@@ -0,0 +1,149 @@
+class Bookbinder::File
+  attr_accessor(:path, :file_type)
+  def initialize(path, file_system)
+    @path = path
+    @file_system = file_system
+    @file_type = analyze_file_type(@path)
+  end
+  # Gets a representation of the contents -- if json, as a hash, if xml,
+  # as a Nokogiri document, etc. If mode includes 'w', then the document
+  # will be saved eventually.
+  #
+  def document(mode = 'rw')
+    dirty!  if mode.match(/w/)
+    @document ||= string_to_document
+  end
+  def new_xml_document(&blk)
+    @file_system.write(@path, '')  unless dirty?
+    dirty!
+    @document = Bookbinder::DocumentProxy.new.build(&blk)
+  end
+  # Indicates that we should write out the document on save.
+  #
+  def dirty!
+    @dirty = true
+  end
+  # Has the document changed, so that it should be written out on save?
+  #
+  def dirty?
+    @dirty ? true : false
+  end
+  # Modifies the file in this file_system.
+  #
+  def save
+    @file_system.write(@path, document_to_string)  if dirty?
+    reset
+  end
+  # Resets state so that the next call to `document` will load it
+  # fresh from the string. Returns self for easy chaining.
+  #
+  def reset
+    @document = nil
+    @dirty = false
+    self
+  end
+  # Writes the file to another file system.
+  #
+  def copy_to(dest_file_system, dest_path)
+    @file_system.get_file(@path) { |file|
+      dest_file_system.set_file(dest_path, file)
+    }
+    self
+  end
+  # Guesses the mime-type (aka content-type, aka media-type)
+  # of this file based on its extension.
+  #
+  def media_type
+    @media_type ||= Bookbinder::MediaType.of(@path)
+  end
+  # Proxy through to FileSystem#exists?.
+  #
+  def exists?
+    @file_system.exists?(path)
+  end
+  # Proxy through to FileSystem#read.
+  #
+  def read
+    @file_system.read(path)
+  end
+  # Proxy through to FileSystem#write.
+  #
+  def write(data)
+    @file_system.write(path, data)
+  end
+  # Proxy through to FileSystem#get_file.
+  #
+  def get_file(mode = 'r', &blk)
+    @file_system.get_file(path, mode, &blk)
+  end
+  # Proxy through to FileSystem#set_file.
+  #
+  def set_file(file_io)
+    @file_system.set_file(path, file_io)
+  end
+  protected
+    def analyze_file_type(path)
+      if media_type.match(/json/)
+        :json
+      elsif media_type.match(/xml$/) || media_type.match(/html$/)
+        :xml
+      end
+    end
+    def string_to_document(ftype = @file_type)
+      if ftype == :json
+        JSON.load(@file_system.read(@path))
+      elsif ftype == :xml
+        Bookbinder::DocumentProxy.new.load(@file_system.read(@path))
+      elsif ftype == :text
+        @file_system.read(@path)
+      end
+    end
+    def document_to_string
+      raise 'Document not loaded'  unless @document
+      if @document.kind_of?(Bookbinder::DocumentProxy)
+        @document.to_str
+      elsif @document.kind_of?(Hash)
+        JSON.dump(@document)
+      else
+        @document.to_s
+      end
+    end
+end
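
The lazy-load and dirty-flag behaviour of `document`, `dirty?` and `save` above can be seen in isolation in the following simplified sketch. `TrackedFile` and the `store` hash are hypothetical stand-ins for `Bookbinder::File` and a file system; only the pattern is taken from the code above.

```ruby
# Hypothetical, simplified stand-in for the lazy-load / dirty-flag pattern.
# `store` is a plain Hash of path => string playing the role of a file system.
class TrackedFile
  def initialize(path, store)
    @path = path
    @store = store
    @dirty = false
  end

  # Loads the content on first access; a mode containing 'w' marks it dirty.
  def document(mode = 'rw')
    @dirty = true  if mode.include?('w')
    @document ||= @store.fetch(@path).dup
  end

  def dirty?
    @dirty
  end

  # Writes back only when dirty, then forgets the cached document.
  def save
    @store[@path] = @document  if dirty?
    @document = nil
    @dirty = false
    self
  end
end

store = { 'notes.txt' => 'draft' }
file = TrackedFile.new('notes.txt', store)

file.document('r') << ' (read only)'   # mutates the cached copy only
file.save                              # not dirty: nothing is written
after_read = store['notes.txt']

file.document('w') << ' v2'            # marks the file dirty
file.save                              # dirty: content is written back
after_write = store['notes.txt']
puts(after_read, after_write)
```

As in the original, the default mode `'rw'` contains `'w'`, so calling `document` with no argument marks the file dirty.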

data/lib/bookbinder/file_system/directory.rb ADDED

@@ -0,0 +1,62 @@
+class Bookbinder::FileSystem::Directory
+  def initialize(path)
+    @dir = path
+  end
+  def exists?(path)
+    File.exist?(full_path(path))
+  end
+  def read(path)
+    fpath = full_path(path)
+    must_exist(fpath)
+    File.read(fpath)
+  end
+  def write(path, data)
+    fpath = full_path(path)
+    FileUtils.mkdir_p(File.dirname(fpath))
+    File.open(fpath, 'w') { |f| f.write(data) }
+    nil
+  end
+  def get_file(path, mode = 'r', &blk)
+    fpath = full_path(path)
+    must_exist(fpath)  if mode[0] != 'w'
+    File.open(fpath, mode, &blk)
+  end
+  def set_file(path, file)
+    fpath = full_path(path)
+    FileUtils.mkdir_p(File.dirname(fpath))
+    FileUtils.cp(file.path, fpath)
+  end
+  def each
+    Dir.glob(File.join(@dir, '**', '*')) { |path|
+      next  if File.directory?(path)
+      yield(path.sub(/\A#{Regexp.escape(@dir)}\//, ''))
+    }
+  end
+  protected
+    def must_exist(fpath)
+      return  if File.exist?(fpath)
+      raise(Bookbinder::FileSystem::UnknownPath, fpath)
+    end
+    def full_path(path)
+      File.join(@dir, path)
+    end
+end

data/lib/bookbinder/file_system/memory.rb ADDED

@@ -0,0 +1,57 @@
+class Bookbinder::FileSystem::Memory
+  def initialize
+    @index = {}
+  end
+  def exists?(path)
+    @index.key?(path)
+  end
+  def read(path)
+    raise(Bookbinder::FileSystem::UnknownPath, path)  unless exists?(path)
+    @index[path]
+  end
+  def write(path, data)
+    @index[path] = data
+    nil
+  end
+  # Creates a tempfile so you can do memory-efficient stuff.
+  #
+  def get_file(path, mode = 'r', &blk)
+    read_before = mode[0] != 'w'
+    write_after = mode[0] != 'r'
+    begin
+      tmpfile = Tempfile.new(File.basename(path))
+      if read_before
+        tmpfile.write(read(path))
+        tmpfile.rewind
+      end
+      blk.call(tmpfile)
+      if write_after
+        tmpfile.rewind
+        write(path, tmpfile.read)
+      end
+    ensure
+      tmpfile.close
+      tmpfile.unlink
+    end
+  end
+  def set_file(path, file)
+    write(path, file.read)
+  end
+  def each(&blk)
+    @index.each_key(&blk)
+  end
+end
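
The read-before / write-after handling in `get_file` above can be tried out with the standalone sketch below. `with_tempfile` and the `index` hash are hypothetical names for this example; the mode logic follows the method above and relies only on Ruby's standard `Tempfile`.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the read-before / write-after idea.
# `index` is a plain Hash of path => data.
def with_tempfile(index, path, mode = 'r')
  read_before = mode[0] != 'w'   # any mode not starting with 'w' loads existing data
  write_after = mode[0] != 'r'   # any mode not starting with 'r' is copied back
  # Note: only the first character is checked, so 'r+' is not copied back.
  tmpfile = Tempfile.new(File.basename(path))
  begin
    if read_before
      tmpfile.write(index.fetch(path))
      tmpfile.rewind
    end
    yield(tmpfile)
    if write_after
      tmpfile.rewind
      index[path] = tmpfile.read
    end
  ensure
    tmpfile.close
    tmpfile.unlink
  end
  nil
end

index = { 'a.txt' => 'hello' }
seen = nil
with_tempfile(index, 'a.txt', 'r') { |f| seen = f.read }
with_tempfile(index, 'b.txt', 'w') { |f| f.write('new file') }
puts(seen, index['b.txt'])
```

The block always receives a real file handle, which is what lets callers stream data instead of holding a second copy in memory.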

data/lib/bookbinder/file_system/zip_file.rb ADDED

@@ -0,0 +1,106 @@
+class Bookbinder::FileSystem::ZipFile < Bookbinder::FileSystem
+  def initialize(path)
+    @zipfile_path = path
+    @zipfile = nil
+    instantiate_zipfile  if File.exist?(@zipfile_path)
+  end
+  def exists?(path)
+    @zipfile && @zipfile.find_entry(path) ? true : false
+  end
+  def read(path)
+    must_exist(path)
+    @zipfile.read(path)
+  end
+  def write(path, data)
+    return set_mimetype(data)  if path == 'mimetype'
+    instantiate_zipfile
+    @zipfile.get_output_stream(path) { |f| f.write(data) }
+    nil
+  end
+  def get_file(path, mode = 'r', &blk)
+    read_before = mode[0] != 'w'
+    write_after = mode[0] != 'r'
+    if read_before
+      must_exist(path)
+      tmp_path = Dir::Tmpname.create(File.basename(path)) { |tmp_path|
+        raise Errno::EEXIST  if File.exist?(tmp_path)
+      }
+      @zipfile.commit
+      @zipfile.extract(path, tmp_path)
+      File.open(tmp_path, mode) { |tmp_file|
+        blk.call(tmp_file)
+        if write_after
+          set_file(path, tmp_file)
+          @zipfile.commit
+        end
+      }
+      File.unlink(tmp_path)
+    else
+      tmp_file = Tempfile.new(File.basename(path))
+      blk.call(tmp_file)
+      tmp_file.close
+      set_file(path, tmp_file)  if write_after
+      tmp_file.unlink
+    end
+    nil
+  end
+  def set_file(path, file)
+    instantiate_zipfile
+    @zipfile.add(path, file.path) { true }
+    @zipfile.commit
+    nil
+  end
+  def each(&blk)
+    @zipfile.each(&blk)
+  end
+  def close
+    @zipfile.close  if @zipfile
+  end
+  protected
+    def must_exist(path)
+      raise(Bookbinder::FileSystem::UnknownPath, path)  unless exists?(path)
+    end
+    def instantiate_zipfile(options = {})
+      @zipfile ||= Zip::File.open(@zipfile_path, Zip::File::CREATE)
+    end
+    # This is all EPUB's fault.
+    #
+    def set_mimetype(mimetype)
+      if @zipfile || File.exist?(@zipfile_path)
+        raise("Cannot set mimetype for existing archive: #{@zipfile_path}")
+      end
+      Zip::OutputStream.open(@zipfile_path) { |stream|
+        stream.put_next_entry(
+          'mimetype',
+          nil,
+          nil,
+          Zip::Entry::STORED,
+          Zlib::NO_COMPRESSION
+        )
+        stream.write(mimetype)
+      }
+    end
+end
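
`set_mimetype` above writes the `mimetype` entry first and stores it uncompressed, which is how the EPUB container format expects the archive to begin. The sketch below checks that property by decoding the first ZIP local file header with the standard library only. `first_zip_entry` and `epub_mimetype_ok?` are hypothetical helper names, the sample bytes are hand-built for illustration (with a placeholder CRC), and the parser assumes the entry sizes are recorded in the local header rather than in a trailing data descriptor.

```ruby
# Decodes the first ZIP local file header (30 fixed bytes, little-endian).
# Returns nil if the bytes do not start with a local file header.
def first_zip_entry(bytes)
  header = bytes.byteslice(0, 30)
  return nil  if header.nil? || header.bytesize < 30
  sig, _version, _flags, compression, _time, _date, _crc, _csize, usize,
    name_len, extra_len = header.unpack('VvvvvvVVVvv')
  return nil  unless sig == 0x04034b50
  {
    :name => bytes.byteslice(30, name_len),
    :compression => compression,   # 0 means "stored" (no compression)
    :extra_len => extra_len,
    :data => bytes.byteslice(30 + name_len + extra_len, usize)
  }
end

# True if the archive starts with a stored 'mimetype' entry for EPUB.
def epub_mimetype_ok?(bytes)
  entry = first_zip_entry(bytes)
  return false  unless entry
  entry[:name] == 'mimetype' &&
    entry[:compression] == 0 &&
    entry[:extra_len] == 0 &&
    entry[:data] == 'application/epub+zip'
end

# Hand-built sample header for illustration; the CRC field is a placeholder.
mimetype = 'application/epub+zip'
sample = [
  0x04034b50, 10, 0, 0, 0, 0, 0,
  mimetype.bytesize, mimetype.bytesize, 'mimetype'.bytesize, 0
].pack('VvvvvvVVVvv') + 'mimetype' + mimetype

puts(epub_mimetype_ok?(sample))
puts(epub_mimetype_ok?('not a zip'))
```

To check a real file under the same assumptions, the first hundred or so bytes from `File.binread(path, 128)` should be enough input for `epub_mimetype_ok?`.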

data/lib/bookbinder/file_system.rb ADDED

@@ -0,0 +1,35 @@
+class Bookbinder::FileSystem
+  def exists?(path)
+    # IMPLEMENT IN SUBCLASS
+  end
+  def read(path)
+    # IMPLEMENT IN SUBCLASS
+  end
+  def write(path, data)
+    # IMPLEMENT IN SUBCLASS
+  end
+  def get_file(path, mode = 'r', &blk)
+    # IMPLEMENT IN SUBCLASS
+  end
+  def set_file(path, file)
+    # IMPLEMENT IN SUBCLASS
+  end
+  def each(&blk)
+    # IMPLEMENT IN SUBCLASS
+  end
+  class UnknownPath < RuntimeError; end
+end