invisiblellama-repub 0.3.1 → 0.3.2

Files changed (12)

data/History.txt CHANGED

@@ -11,3 +11,8 @@
 == 0.3.1 / 2009-06-28
 * Fixed App.data_path bug
+== 0.3.2 / 2009-06-30
+* Improved Win32 support
+* Updated documentation

data/README.rdoc ADDED

@@ -0,0 +1,173 @@
+== Repub
+by Invisible Llama (dg at invisiblellama dot net)
+{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]
+== DESCRIPTION:
+Repub is a simple HTML to ePub converter.
+It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
+for title and table of contents. In return, it provides you with greater control over generated
+ePub documents.
+== FEATURES:
+Repub accepts the following parameters:
+* Source document URL
+* List of XPath expressions for locating source document title, table of contents, TOC items and TOC sub-sections
+* List of XPath expressions for describing elements that will be removed from the converted document
+* List of regular expressions for editing the source document
+* Publication information metadata tags
+All parameters except document URL are optional; the resulting ePub will (probably, if original HTML isn't
+broken too bad) be readable but will be lacking any metadata or TOC.
+Few examples:
+* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES (with proper table of contents)
+    repub -x 'title:div[@class='book']//h1' \
+      -x 'toc://table' \
+      -x 'toc_item://tr' \
+      http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+This tells Repub to look for title in the first found H1 in the DIV of class "book"; that table of contents is
+located in the first TABLE and TOC item can be found inside TR.
+The above will produce readable ePub which can be further enhanced by removing some "noise" content:
+    repub -x 'title:div[@class='book']//h1' \
+      -x 'toc://table' \
+      -x 'toc_item://tr' \
+      -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
+      http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+In addition to parsing, the above command also removes from the final version of document all PREs, HRs and
+first H1 and H2 elements from the body.
+A bit more complicated example:
+* Git User's Manual
+    repub -x 'title://h1' \
+      -x 'toc://div[@class="toc"]/dl' \
+      -x 'toc_item:dt' \
+      -x 'toc_section:following-sibling::*[1]/dl' \
+      -w git-manual \
+      http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+This tells Repub to look for title in the first found H1, for TOC in the DL element of the DIV with class "toc" and
+that TOC items can be found inside DT elements. Additionally, TOC item can have a child TOC section inside DL when
+DL element immediately follows DT.
+The above command also saves all XPath expressions as "git-manual" profile, which can be later reused to save keystrokes.
+For example, if you later decide to regenerate Git Manual ePub without TOC at the beginning of document, you can do
+    repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+A few more examples:
+* GNU Wget Manual
+    repub -m 'creator:gnu.org' \
+      -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
+      -X '//div[@class="contents"]' \
+      http://www.gnu.org/software/wget/manual/wget.html
+* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
+    repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
+      http://www.gutenberg.org/files/11/11-h/11-h.htm
+* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
+    repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
+== SYNOPSIS:
+  Usage: repub [options] url
+  General options:
+    -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
+                                     Default is wget.
+    -o, --output PATH                Output path for generated ePub file.
+                                     Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
+    -w, --write-profile NAME         Save given options for later reuse as profile NAME.
+    -l, --load-profile NAME          Load options from saved profile NAME.
+    -W, --write-default              Save given options for later reuse as default profile.
+    -L, --list-profiles              List saved profiles.
+    -C, --cleanup                    Clean up download cache.
+    -v, --verbose                    Turn on verbose output.
+    -q, --quiet                      Turn off any output except errors.
+    -V, --version                    Show version.
+    -h, --help                       Show this help message.
+  Parser options:
+    -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
+                                     Recognized selectors are: [title toc toc_item toc_section]
+    -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
+                                     Valid metadata names are: [creator date description
+                                     language publisher relation rights subject title]
+    -F, --no-fixup                   Do not attempt to make document meet XHTML 1.0 Strict.
+                                     Default is to try and fix things that are broken.
+    -e, --encoding NAME              Set source document encoding. Default is to auto detect.
+  Post-processing options:
+    -s, --stylesheet PATH            Use custom stylesheet at PATH to add or override existing
+                                     CSS references in the source document.
+    -X, --remove SELECTOR            Remove source element using XPath selector.
+                                     Use -X- to ignore stored profile.
+    -R, --rx /PATTERN/REPLACEMENT/   Edit source HTML using regular expressions.
+                                     Use -R- to ignore stored profile.
+    -B, --browse                     After processing, open resulting HTML in default browser.
+== DEPENDENCIES:
+* {Builder}[http://rubyforge.org/projects/builder/]
+* {Nokogiri}[http://nokogiri.rubyforge.org/nokogiri/]
+* {chardet}[http://rubyforge.org/projects/chardet/]
+* {launchy}[http://copiousfreetime.rubyforge.org/launchy/]
+Also, the following tools must be somewhere in $PATH:
+* {wget}[http://www.gnu.org/software/wget/] or {httrack}[http://www.httrack.com/]
+* {zip (Info-ZIP)}[http://www.info-zip.org/]
+== LIMITATIONS/BUGS:
+Currently, only "everything-on-one-page" HTML sources are supported. Repub will download and process all page requisites
+(stylesheets and images) but all actual content must be on one page.
+Bugs: probably. If you find any, please report them to dg at invisiblellama dot net.
+== INSTALL:
+    gem install repub
+== LICENSE:
+(The MIT License)
+Copyright (c) 2009 Dmitri Goutnik, Invisible Llama
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+==
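
The NAME:VALUE arguments documented in the README above take XPath expressions as values, and those can contain colons themselves (for example `following-sibling::*[1]/dl`), so such an argument has to be split on the first colon only. A minimal sketch of that split in plain Ruby follows; `parse_selector` and `RECOGNIZED_SELECTORS` are hypothetical names used for illustration and are not taken from Repub's option parser.

```ruby
# Hypothetical helper, not taken from the Repub sources.
# Splits "NAME:VALUE" on the first colon only, so colons inside
# the XPath expression (e.g. axis specifiers) are preserved.
RECOGNIZED_SELECTORS = %w[title toc toc_item toc_section].freeze

def parse_selector(arg)
  name, value = arg.split(':', 2)
  unless RECOGNIZED_SELECTORS.include?(name) && value && !value.empty?
    raise ArgumentError, "invalid selector: #{arg}"
  end
  [name.to_sym, value]
end

p parse_selector('toc://table')
p parse_selector('toc_section:following-sibling::*[1]/dl')
```

Splitting with a limit of 2 keeps everything after the first colon intact, which is what the `toc_section` example in the README needs.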

data/Rakefile CHANGED

@@ -17,9 +17,10 @@ task :default => 'test:run'
 PROJ.name = 'repub'
 PROJ.authors = 'Dmitri Goutnik'
 PROJ.email = 'dg@invisiblellama.net'
-PROJ.url = 'http://github.com/invisiblellama/repub/tree/master'
+PROJ.url = 'http://rubyforge.org/projects/repub/'
 PROJ.version = Repub::VERSION
 PROJ.rubyforge.name = 'repub'
+PROJ.readme_file = 'README.rdoc'
 PROJ.exclude = %w[tmp/ \.git \.DS_Store .*\.tmproj .*\.epub ^pkg/]
 PROJ.spec.opts << '--color'

data/TODO CHANGED

@@ -1,3 +1,3 @@
-√ add support for rx cleaning/modifying source doc
-√ make -q/-v actually do something
-  more parser tokens: author(s) etc
+* add support for rx cleaning/modifying source doc
+* make -q/-v actually do something
+  more parser tokens: author(s) etc ?

data/lib/repub.rb CHANGED

@@ -1,7 +1,7 @@
 module Repub
   # :stopdoc:
-  VERSION = '0.3.1'
+  VERSION = '0.3.2'
   LIBPATH = File.expand_path(File.dirname(__FILE__)) + File::SEPARATOR
   PATH = File.dirname(LIBPATH) + File::SEPARATOR
   # :startdoc:

data/lib/repub/app/builder.rb CHANGED

@@ -76,6 +76,40 @@ module Repub
         MetaInf = 'META-INF'
+        def copy_and_process_assets
+          # Copy html
+          @parser.cache.assets[:documents].each do |asset|
+            log.debug "-- Processing document #{asset}"
+            # Copy asset from cache
+            FileUtils.cp(File.join(@parser.cache.path, asset), '.')
+            # Do post-processing
+            postprocess_file(asset)
+            postprocess_doc(asset)
+            @content.add_document(asset)
+            @asset_path = File.expand_path(asset)
+          end
+          # Copy css
+          if @options[:css].nil? || @options[:css].empty?
+            # No custom css, copy one from assets
+            @parser.cache.assets[:stylesheets].each do |css|
+              log.debug "-- Copying stylesheet #{css}"
+              FileUtils.cp(File.join(@parser.cache.path, css), '.')
+              @content.add_stylesheet(css)
+            end
+          else
+            # Copy custom css
+            log.debug "-- Using custom stylesheet #{@options[:css]}"
+            FileUtils.cp(@options[:css], '.')
+            @content.add_stylesheet(File.basename(@options[:css]))
+          end
+          # Copy images
+          @parser.cache.assets[:images].each do |image|
+            log.debug "-- Copying image #{image}"
+            FileUtils.cp(File.join(@parser.cache.path, image), '.')
+            @content.add_image(image)
+          end
+        end
         def postprocess_file(asset)
           source = IO.read(asset)
           # Do rx substitutions
@@ -104,11 +138,11 @@ module Repub
         end
         def postprocess_doc(asset)
-          doc = Nokogiri::HTML.parse(open(asset), nil, 'UTF-8')
+          doc = Nokogiri::HTML.parse(IO.read(asset), nil, 'UTF-8')
           # Substitute custom CSS
           if (@options[:css] && !@options[:css].empty?)
-            doc.xpath('//link[@rel="stylesheet"]') do |link|
-              link[:href] = File.basename(@options[:css])
+            doc.xpath('//link[@rel="stylesheet"]').each do |link|
+              link['href'] = File.basename(@options[:css])
               log.debug "-- Replacing CSS refs with #{link[:href]}"
             end
           end
@@ -116,8 +150,6 @@ module Repub
           if @options[:remove] && !@options[:remove].empty?
             @options[:remove].each do |selector|
               log.info "Removing elements matching selector \"#{selector}\""
-              #p doc.search(selector).size
-              #p doc.search(selector)
               doc.search(selector).remove
             end
           end
@@ -134,40 +166,6 @@ module Repub
           end
         end
-        def copy_and_process_assets
-          # Copy html
-          @parser.cache.assets[:documents].each do |asset|
-            log.debug "-- Processing document #{asset}"
-            # Copy asset from cache
-            FileUtils.cp(File.join(@parser.cache.path, asset), '.')
-            # Do post-processing
-            postprocess_file(asset)
-            postprocess_doc(asset)
-            @content.add_document(asset)
-            @asset_path = File.expand_path(asset)
-          end
-          # Copy css
-          if @options[:css].nil? || @options[:css].empty?
-            # No custom css, copy one from assets
-            @parser.cache.assets[:stylesheets].each do |css|
-              log.debug "-- Copying stylesheet #{css}"
-              FileUtils.cp(File.join(@parser.cache.path, css), '.')
-              @content.add_stylesheet(css)
-            end
-          else
-            # Copy custom css
-            log.debug "-- Using custom stylesheet #{@options[:css]}"
-            FileUtils.cp(@options[:css], '.')
-            @content.add_stylesheet(File.basename(@options[:css]))
-          end
-          # Copy images
-          @parser.cache.assets[:images].each do |image|
-            log.debug "-- Copying image #{image}"
-            FileUtils.cp(File.join(@parser.cache.path, image), '.')
-            @content.add_image(image)
-          end
-        end
         def write_meta_inf
           FileUtils.mkdir_p(MetaInf)
           FileUtils.chdir(MetaInf) do
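
In `postprocess_doc` above, `doc.xpath(...) do |link|` becomes `doc.xpath(...).each do |link|`. In Ruby, a block handed to a method that never yields is silently discarded, which is presumably why the old code never rewrote the stylesheet links. A minimal illustration in plain Ruby follows; `find_links` is a hypothetical stand-in for a query method that returns a collection without yielding, and the hashes stand in for link elements (no Nokogiri is involved).

```ruby
# Hypothetical stand-in for a query method that returns a collection
# and never yields (assumption: the old xpath call behaved this way).
def find_links
  [{ 'href' => 'a.css' }, { 'href' => 'b.css' }]
end

links = find_links

# Block passed to the method itself: never called, and no error is raised.
calls_without_each = 0
find_links { |_link| calls_without_each += 1 }

# Block passed to #each on the returned collection: called once per element.
links.each { |link| link['href'] = 'custom.css' }

puts calls_without_each
puts links.map { |l| l['href'] }.inspect
```

The counter stays at zero in the first case, while the `each` form updates every element.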

data/lib/repub/app/fetcher.rb CHANGED

@@ -54,17 +54,41 @@ module Repub
           rescue
             raise FetcherException, "invalid URL: #{url}"
           end
-          cmd = "#{@downloader_path} #{@downloader_options} #{url}"
           Cache.for_url(url) do |cache|
             log.debug "-- Downloading into #{cache.path}"
+            cmd = "#{@downloader_path} #{@downloader_options} #{url}"
             unless system(cmd) && !cache.empty?
               raise FetcherException, "Fetch failed."
             end
+            unless cache.cached?
+              fix_filenames(cache)
+              fix_encoding(cache, @options[:encoding])
+            end
           end
         end
         private
+        def fix_filenames(cache)
+          # TODO: fix non-alphanum characters in doc filenames
+        end
+        def fix_encoding(cache, encoding = nil)
+          cache.assets[:documents].each do |doc|
+            unless encoding
+              log.info "Detecting encoding for #{doc}"
+              s = IO.read(doc)
+              raise FetcherException, "empty document" unless s
+              encoding = UniversalDetector.chardet(s)['encoding']
+            end
+            if encoding.downcase != 'utf-8'
+              log.info "Source encoding is #{encoding}, converting to UTF-8"
+              s = Iconv.conv('utf-8', encoding, IO.read(doc))
+              File.open(doc, 'w') { |f| f.write(s) }
+            end
+          end
+        end
         def which(cmd)
           if !RUBY_PLATFORM.match('mswin')
             cmd = `/usr/bin/which #{cmd}`.strip
@@ -81,10 +105,6 @@ module Repub
           return File.join(App.data_path, 'cache')
         end
-        def self.inventorize
-          # TODO
-        end
         def self.cleanup
           Dir.chdir(self.root) { FileUtils.rm_r(Dir.glob('*')) }
         rescue
@@ -94,16 +114,15 @@ module Repub
         attr_reader :url
         attr_reader :name
         attr_reader :path
-        attr_reader :assets
         def self.for_url(url, &block)
           self.new(url).for_url(&block)
         end
         def for_url(&block)
           # Download stuff if not yet cached
-          cached = File.exist?(@path)
-          unless cached
+          @cached = File.exist?(@path)
+          unless @cached
             FileUtils.mkdir_p(@path)
             begin
               Dir.chdir(@path) { yield self }
@@ -115,40 +134,33 @@ module Repub
             log.info "Using cached assets"
             log.debug "-- Cache is #{@path}"
           end
-          # Do post-download tasks
-          Dir.chdir(@path) do
+          self
+        end
+        def assets
+          unless @assets
             # Enumerate assets
-            @assets = {}
-            AssetTypes.each_pair do |asset_type, file_types|
-              @assets[asset_type] ||= []
-              file_types.each do |file_type|
-                @assets[asset_type] << Dir.glob("*.#{file_type}")
-              end
-              @assets[asset_type].flatten!
-            end
-            # For freshly downloaded docs, detect encoding and convert to utf-8
-            unless cached
-              @assets[:documents].each do |doc|
-                log.info "Detecting encoding for #{doc}"
-                s = IO.read(doc)
-                raise FetcherException, "empty document" unless s
-                encoding = UniversalDetector.chardet(s)['encoding']
-                if encoding.downcase != 'utf-8'
-                  log.info "Looks like #{encoding}, converting to UTF-8"
-                  s = Iconv.conv('utf-8', encoding, IO.read(doc))
-                  File.open(doc, 'w') { |f| f.write(s) }
-                else
-                  log.info "Looks like UTF-8, no conversion needed"
+            Dir.chdir(@path) do
+              @assets = {}
+              AssetTypes.each_pair do |asset_type, file_types|
+                @assets[asset_type] ||= []
+                file_types.each do |file_type|
+                  @assets[asset_type] << Dir.glob("*.#{file_type}")
                 end
+                @assets[asset_type].flatten!
               end
             end
           end
-          self
+          @assets
         end
         def empty?
           Dir.glob(File.join(@path, '*')).empty?
         end
+        def cached?
+          @cached == true
+        end
         private
@@ -156,6 +168,7 @@ module Repub
           @url = url
           @name = Digest::SHA1.hexdigest(@url)
           @path = File.join(Cache.root, @name)
+          @assets = nil
         end
       end
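
The new `fix_encoding` method above converts documents with Iconv, which was removed from Ruby's standard library in Ruby 2.0. On a current Ruby, the same conversion step could be sketched with `String#force_encoding` and `String#encode`. This is a swapped-in technique, not what the gem does; `to_utf8` is a hypothetical helper, and encoding detection (done via chardet in the diff) is left out, so the source encoding is passed explicitly.

```ruby
# Hypothetical helper; uses String#encode in place of the Iconv call
# shown in the diff. The source encoding must be supplied by the caller.
def to_utf8(bytes, source_encoding)
  # Already UTF-8: just tag a copy of the bytes, no transcoding needed.
  return bytes.dup.force_encoding('UTF-8') if source_encoding.downcase == 'utf-8'
  # Otherwise tag the raw bytes with their real encoding, then transcode.
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end

latin1 = "caf\xE9".b # "cafe" with an accented e, as ISO-8859-1 bytes
utf8 = to_utf8(latin1, 'ISO-8859-1')
puts utf8
puts utf8.encoding
```

Working on a copy avoids mutating the caller's string, which `force_encoding` would otherwise do in place.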

data/lib/repub/app/parser.rb CHANGED

@@ -43,7 +43,7 @@ module Repub
           @cache = cache
           @asset = @cache.assets[:documents][0]
           log.debug "-- Parsing #{@asset}"
-          @doc = Nokogiri::HTML.parse(open(File.join(@cache.path, @asset)), nil, 'UTF-8')
+          @doc = Nokogiri::HTML.parse(IO.read(File.join(@cache.path, @asset)), nil, 'UTF-8')
           @uid = @cache.name
           parse_title
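
The parser takes the first entry of `@cache.assets[:documents]`, and per the fetcher.rb changes in this release that list is now built lazily on first access instead of inside `for_url`. A self-contained sketch of that memoized enumeration pattern follows. `ASSET_TYPES` and `AssetDir` are hypothetical names (the gem's own `AssetTypes` constant does not appear in this diff), and the sketch globs by absolute path rather than using `Dir.chdir`.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical mapping; the gem's real AssetTypes is not shown in this diff.
ASSET_TYPES = {
  documents: %w[html htm],
  stylesheets: %w[css],
  images: %w[jpg png gif]
}.freeze

class AssetDir
  def initialize(path)
    @path = path
    @assets = nil
  end

  # Enumerate files once, on first call; later calls reuse the result.
  def assets
    @assets ||= ASSET_TYPES.each_with_object({}) do |(type, exts), found|
      found[type] = exts.flat_map do |ext|
        Dir.glob(File.join(@path, "*.#{ext}")).map { |f| File.basename(f) }
      end.sort
    end
  end
end

Dir.mktmpdir do |dir|
  %w[index.html style.css cover.png].each { |f| FileUtils.touch(File.join(dir, f)) }
  cache = AssetDir.new(dir)
  p cache.assets[:documents]
  p cache.assets.equal?(cache.assets)
end
```

Because the result is memoized, files added to the directory after the first call are not picked up, which matches the trade-off of caching the listing on the object.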

data/repub.gemspec CHANGED

@@ -2,23 +2,27 @@
 Gem::Specification.new do |s|
   s.name = %q{repub}
-  s.version = "0.3.1"
+  s.version = "0.3.2"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Dmitri Goutnik"]
-  s.date = %q{2009-06-28}
+  s.date = %q{2009-06-30}
   s.default_executable = %q{repub}
-  s.description = %q{Simple HTML to ePub converter.}
+  s.description = %q{Repub is a simple HTML to ePub converter.
+It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
+for title and table of contents. In return, it provides you with greater control over generated
+ePub documents.}
   s.email = %q{dg@invisiblellama.net}
   s.executables = ["repub"]
-  s.extra_rdoc_files = ["History.txt", "README.txt", "SAMPLES.txt", "bin/repub"]
-  s.files = ["History.txt", "README.txt", "Rakefile", "SAMPLES.txt", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
-  s.homepage = %q{http://github.com/invisiblellama/repub/tree/master}
-  s.rdoc_options = ["--main", "README.txt"]
+  s.extra_rdoc_files = ["History.txt", "README.rdoc", "bin/repub"]
+  s.files = ["History.txt", "README.rdoc", "Rakefile", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
+  s.homepage = %q{http://rubyforge.org/projects/repub/}
+  s.rdoc_options = ["--main", "README.rdoc"]
   s.require_paths = ["lib"]
   s.rubyforge_project = %q{repub}
   s.rubygems_version = %q{1.3.4}
-  s.summary = %q{Simple HTML to ePub converter}
+  s.summary = %q{Repub is a simple HTML to ePub converter}
   s.test_files = ["test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
   if s.respond_to? :specification_version then

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: invisiblellama-repub
 version: !ruby/object:Gem::Version
-  version: 0.3.1
+  version: 0.3.2
 platform: ruby
 authors:
 - Dmitri Goutnik
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2009-06-28 00:00:00 -07:00
+date: 2009-06-30 00:00:00 -07:00
 default_executable: repub
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -62,7 +62,7 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 2.5.1
     version:
-description: Simple HTML to ePub converter.
+description: Repub is a simple HTML to ePub converter.  It lacks imagination and won't try to guess the source document structure, you will have to describe where to look for title and table of contents. In return, it provides you with greater control over generated ePub documents.
 email: dg@invisiblellama.net
 executables:
 - repub
@@ -70,14 +70,12 @@ extensions: []
 extra_rdoc_files:
 - History.txt
-- README.txt
-- SAMPLES.txt
+- README.rdoc
 - bin/repub
 files:
 - History.txt
-- README.txt
+- README.rdoc
 - Rakefile
-- SAMPLES.txt
 - TODO
 - bin/repub
 - lib/repub.rb
@@ -102,11 +100,11 @@ files:
 - test/test_logger.rb
 - test/test_parser.rb
 has_rdoc: false
-homepage: http://github.com/invisiblellama/repub/tree/master
+homepage: http://rubyforge.org/projects/repub/
 post_install_message:
 rdoc_options:
 - --main
-- README.txt
+- README.rdoc
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
@@ -127,7 +125,7 @@ rubyforge_project: repub
 rubygems_version: 1.2.0
 signing_key:
 specification_version: 3
-summary: Simple HTML to ePub converter
+summary: Repub is a simple HTML to ePub converter
 test_files:
 - test/epub/test_container.rb
 - test/epub/test_content.rb

data/README.txt DELETED

@@ -1,106 +0,0 @@
-== DESCRIPTION:
-Simple HTML to ePub converter.
-== FEATURES/PROBLEMS:
-Few samples to get started:
-* Git User's Manual
-    repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' \
-        http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
-* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES
-    repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' \
-        -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
-	    http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
-    repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' \
-	    -X '//pre' -X '//hr' -X '//body/h4' \
-	    http://www.gutenberg.org/files/11/11-h/11-h.htm
-* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
-    repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-== SYNOPSIS:
-Usage: repub [options] url
-General options:
-  -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
-                                   Default is wget.
-  -o, --output PATH                Output path for generated ePub file.
-                                   Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
-  -w, --write-profile NAME         Save given options for later reuse as profile NAME.
-  -l, --load-profile NAME          Load options from saved profile NAME.
-  -W, --write-default              Save given options for later reuse as default profile.
-  -L, --list-profiles              List saved profiles.
-  -C, --cleanup                    Clean up download cache.
-  -v, --verbose                    Turn on verbose output.
-  -q, --quiet                      Turn off any output except errors.
-  -V, --version                    Show version.
-  -h, --help                       Show this help message.
-Parser options:
-  -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
-                                   Recognized selectors are: [title toc toc_item toc_section]
-  -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
-                                   Valid metadata names are: [creator date description
-                                   language publisher relation rights subject title]
-  -F, --no-fixup                   Do not attempt to make document meet XHTML 1.0 Strict.
-                                   Default is to try and fix things that are broken.
-  -e, --encoding NAME              Set source document encoding. Default is to autodetect.
-Post-processing options:
-  -s, --stylesheet PATH            Use custom stylesheet at PATH to add or override existing
-                                   CSS references in the source document.
-  -X, --remove SELECTOR            Remove source element using XPath selector.
-                                   Use -X- to ignore stored profile.
-  -R, --rx /PATTERN/REPLACEMENT/   Edit source HTML using regular expressions.
-                                   Use -R- to ignore stored profile.
-  -B, --browse                     After processing, open resulting HTML in default browser.
-== DEPENDENCIES:
-* Builder (https://rubyforge.org/projects/builder/)
-* Nokogiri (http://nokogiri.rubyforge.org/nokogiri/)
-* rchardet (https://rubyforge.org/projects/rchardet/)
-* launchy (http://copiousfreetime.rubyforge.org/launchy/)
-* wget or httrack
-* zip (Info-ZIP)
-== INSTALL:
-    gem install repub
-== LICENSE:
-(The MIT License)
-Copyright (c) 2009 Invisible Llama <dg@invisiblellama.net>
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
-==

data/SAMPLES.txt DELETED

@@ -1,23 +0,0 @@
-* THE ADVENTURES OF SHERLOCK HOLMES
-repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-* ALICE'S ADVENTURES IN WONDERLAND
-repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' http://www.gutenberg.org/files/11/11-h/11-h.htm
-* The Gelug-Kagyu Tradition of Mahamudra
-repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-* Брюс Стерлинг. Схизматрица
-repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/STERLINGB/shizmatrica.txt_with-big-pictures.html
-* Айзек Азимов. Космические течения
-repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/FOUNDATION/currspac.txt_with-big-pictures.html
-* Git User's Manual
-repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html