repub 0.3.1 → 0.3.2

@@ -11,3 +11,8 @@
 == 0.3.1 / 2009-06-28
 
 * Fixed App.data_path bug
+
+== 0.3.2 / 2009-06-30
+
+* Improved Win32 support
+* Updated documentation
@@ -0,0 +1,173 @@
+== Repub
+
+by Invisible Llama (dg at invisiblellama dot net)
+
+{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]
+
+== DESCRIPTION:
+
+Repub is a simple HTML to ePub converter.
+
+It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
+for title and table of contents. In return, it provides you with greater control over generated
+ePub documents.
+
+== FEATURES:
+
+Repub accepts the following parameters:
+
+* Source document URL
+* List of XPath expressions for locating source document title, table of contents, TOC items and TOC sub-sections
+* List of XPath expressions for describing elements that will be removed from the converted document
+* List of regular expressions for editing the source document
+* Publication information metadata tags
+
+All parameters except document URL are optional; the resulting ePub will (probably, if original HTML isn't
+broken too bad) be readable but will be lacking any metadata or TOC.
+
+Few examples:
+
+* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES (with proper table of contents)
+
+  repub -x 'title:div[@class='book']//h1' \
+        -x 'toc://table' \
+        -x 'toc_item://tr' \
+        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+
+This tells Repub to look for title in the first found H1 in the DIV of class "book"; that table of contents is
+located in the first TABLE and TOC item can be found inside TR.
+The above will produce readable ePub which can be further enhanced by removing some "noise" content:
+
+  repub -x 'title:div[@class='book']//h1' \
+        -x 'toc://table' \
+        -x 'toc_item://tr' \
+        -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
+        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+
+In addition to parsing, the above command also removes from the final version of document all PREs, HRs and
+first H1 and H2 elements from the body.
+
+A bit more complicated example:
+
+* Git User's Manual
+
+  repub -x 'title://h1' \
+        -x 'toc://div[@class="toc"]/dl' \
+        -x 'toc_item:dt' \
+        -x 'toc_section:following-sibling::*[1]/dl' \
+        -w git-manual \
+        http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+
+This tells Repub to look for title in the first found H1, for TOC in the DL element of the DIV with class "toc" and
+that TOC items can be found inside DT elements. Additionally, TOC item can have a child TOC section inside DL when
+DL element immediately follows DT.
+
+The above command also saves all XPath expressions as "git-manual" profile, which can be later reused to save keystrokes.
+For example, if you later decide to regenerate Git Manual ePub without TOC at the beginning of document, you can do
+
+  repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+
+A few more examples:
+
+* GNU Wget Manual
+
+  repub -m 'creator:gnu.org' \
+        -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
+        -X '//div[@class="contents"]' \
+        http://www.gnu.org/software/wget/manual/wget.html
+
+* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
+
+  repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
+        http://www.gutenberg.org/files/11/11-h/11-h.htm
+
+* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
+
+  repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
+
+== SYNOPSIS:
+
+  Usage: repub [options] url
+
+  General options:
+      -D, --downloader NAME           Which downloader to use to get files (wget or httrack).
+                                      Default is wget.
+      -o, --output PATH               Output path for generated ePub file.
+                                      Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
+      -w, --write-profile NAME        Save given options for later reuse as profile NAME.
+      -l, --load-profile NAME         Load options from saved profile NAME.
+      -W, --write-default             Save given options for later reuse as default profile.
+      -L, --list-profiles             List saved profiles.
+      -C, --cleanup                   Clean up download cache.
+      -v, --verbose                   Turn on verbose output.
+      -q, --quiet                     Turn off any output except errors.
+      -V, --version                   Show version.
+      -h, --help                      Show this help message.
+
+  Parser options:
+      -x, --selector NAME:VALUE       Set parser XPath selector NAME to VALUE.
+                                      Recognized selectors are: [title toc toc_item toc_section]
+      -m, --meta NAME:VALUE           Set publication information metadata NAME to VALUE.
+                                      Valid metadata names are: [creator date description
+                                      language publisher relation rights subject title]
+      -F, --no-fixup                  Do not attempt to make document meet XHTML 1.0 Strict.
+                                      Default is to try and fix things that are broken.
+      -e, --encoding NAME             Set source document encoding. Default is to auto detect.
+
+  Post-processing options:
+      -s, --stylesheet PATH           Use custom stylesheet at PATH to add or override existing
+                                      CSS references in the source document.
+      -X, --remove SELECTOR           Remove source element using XPath selector.
+                                      Use -X- to ignore stored profile.
+      -R, --rx /PATTERN/REPLACEMENT/  Edit source HTML using regular expressions.
+                                      Use -R- to ignore stored profile.
+      -B, --browse                    After processing, open resulting HTML in default browser.
+
+== DEPENDENCIES:
+
+* {Builder}[http://rubyforge.org/projects/builder/]
+* {Nokogiri}[http://nokogiri.rubyforge.org/nokogiri/]
+* {chardet}[http://rubyforge.org/projects/chardet/]
+* {launchy}[http://copiousfreetime.rubyforge.org/launchy/]
+
+Also, the following tools must be somewhere in $PATH:
+
+* {wget}[http://www.gnu.org/software/wget/] or {httrack}[http://www.httrack.com/]
+* {zip (Info-ZIP)}[http://www.info-zip.org/]
+
+== LIMITATIONS/BUGS:
+
+Currently, only "everything-on-one-page" HTML sources are supported. Repub will download and process all page requisites
+(stylesheets and images) but all actual content must be on one page.
+
+Bugs: probably. If you find any, please report them to dg at invisiblellama dot net.
+
+== INSTALL:
+
+  gem install repub
+
+== LICENSE:
+
+(The MIT License)
+
+Copyright (c) 2009 Dmitri Goutnik, Invisible Llama
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+==
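Note on the `-x NAME:VALUE` selectors documented in the README above: values such as `following-sibling::*[1]/dl` contain colons themselves, so the argument presumably has to be split on the first colon only. A minimal sketch of that idea follows; `parse_selector` and `RECOGNIZED_SELECTORS` are hypothetical names for illustration and are not taken from Repub's source.

```ruby
# Hypothetical helper, not Repub's actual option parser.
# The selector names come from the README's "Recognized selectors" list.
RECOGNIZED_SELECTORS = %w[title toc toc_item toc_section].freeze

# Split "NAME:VALUE" on the first colon only, so XPath axes such as
# "following-sibling::" survive intact in the value.
def parse_selector(arg)
  name, value = arg.split(':', 2)
  unless RECOGNIZED_SELECTORS.include?(name) && value && !value.empty?
    raise ArgumentError, "bad selector: #{arg.inspect}"
  end
  [name.to_sym, value]
end

p parse_selector('toc://table')
p parse_selector('toc_section:following-sibling::*[1]/dl')
```

Splitting with a limit of 2 keeps everything after the first colon as the XPath expression, which is why the Git User's Manual example can pass an axis expression unchanged.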
data/Rakefile CHANGED
@@ -17,9 +17,10 @@ task :default => 'test:run'
 PROJ.name = 'repub'
 PROJ.authors = 'Dmitri Goutnik'
 PROJ.email = 'dg@invisiblellama.net'
-PROJ.url = 'http://github.com/invisiblellama/repub/tree/master'
+PROJ.url = 'http://rubyforge.org/projects/repub/'
 PROJ.version = Repub::VERSION
 PROJ.rubyforge.name = 'repub'
+PROJ.readme_file = 'README.rdoc'
 PROJ.exclude = %w[tmp/ \.git \.DS_Store .*\.tmproj .*\.epub ^pkg/]
 
 PROJ.spec.opts << '--color'
data/TODO CHANGED
@@ -1,3 +1,3 @@
-add support for rx cleaning/modifying source doc
-make -q/-v actually do something
-more parser tokens: author(s) etc
+* add support for rx cleaning/modifying source doc
+* make -q/-v actually do something
+more parser tokens: author(s) etc ?
@@ -1,7 +1,7 @@
 module Repub
 
   # :stopdoc:
-  VERSION = '0.3.1'
+  VERSION = '0.3.2'
   LIBPATH = File.expand_path(File.dirname(__FILE__)) + File::SEPARATOR
   PATH = File.dirname(LIBPATH) + File::SEPARATOR
   # :startdoc:
@@ -76,6 +76,40 @@ module Repub
 
     MetaInf = 'META-INF'
 
+    def copy_and_process_assets
+      # Copy html
+      @parser.cache.assets[:documents].each do |asset|
+        log.debug "-- Processing document #{asset}"
+        # Copy asset from cache
+        FileUtils.cp(File.join(@parser.cache.path, asset), '.')
+        # Do post-processing
+        postprocess_file(asset)
+        postprocess_doc(asset)
+        @content.add_document(asset)
+        @asset_path = File.expand_path(asset)
+      end
+      # Copy css
+      if @options[:css].nil? || @options[:css].empty?
+        # No custom css, copy one from assets
+        @parser.cache.assets[:stylesheets].each do |css|
+          log.debug "-- Copying stylesheet #{css}"
+          FileUtils.cp(File.join(@parser.cache.path, css), '.')
+          @content.add_stylesheet(css)
+        end
+      else
+        # Copy custom css
+        log.debug "-- Using custom stylesheet #{@options[:css]}"
+        FileUtils.cp(@options[:css], '.')
+        @content.add_stylesheet(File.basename(@options[:css]))
+      end
+      # Copy images
+      @parser.cache.assets[:images].each do |image|
+        log.debug "-- Copying image #{image}"
+        FileUtils.cp(File.join(@parser.cache.path, image), '.')
+        @content.add_image(image)
+      end
+    end
+
     def postprocess_file(asset)
       source = IO.read(asset)
       # Do rx substitutions
@@ -104,11 +138,11 @@ module Repub
     end
 
     def postprocess_doc(asset)
-      doc = Nokogiri::HTML.parse(open(asset), nil, 'UTF-8')
+      doc = Nokogiri::HTML.parse(IO.read(asset), nil, 'UTF-8')
       # Substitute custom CSS
       if (@options[:css] && !@options[:css].empty?)
-        doc.xpath('//link[@rel="stylesheet"]') do |link|
-          link[:href] = File.basename(@options[:css])
+        doc.xpath('//link[@rel="stylesheet"]').each do |link|
+          link['href'] = File.basename(@options[:css])
           log.debug "-- Replacing CSS refs with #{link[:href]}"
         end
       end
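Note on the `.each` added in the hunk above: in Ruby, a block attached to a method that never yields is silently discarded, which is presumably why the old `doc.xpath(...) do |link|` form left the stylesheet links untouched. The sketch below shows that language behavior with a stand-in class; `FakeDoc` is hypothetical and is not Nokogiri.

```ruby
# FakeDoc is a hypothetical stand-in for a parsed document; it is NOT Nokogiri.
class FakeDoc
  def initialize(links)
    @links = links
  end

  # Returns the matching nodes and never yields, so any block
  # attached to the call is ignored without an error or warning.
  def xpath(_query)
    @links
  end
end

doc = FakeDoc.new([{ 'href' => 'old.css' }, { 'href' => 'print.css' }])

# Old form: the block is attached to xpath itself and never runs.
doc.xpath('//link') { |link| link['href'] = 'custom.css' }
before = doc.xpath('//link').map { |link| link['href'] }

# New form: the block is passed to each on the returned collection.
doc.xpath('//link').each { |link| link['href'] = 'custom.css' }
after = doc.xpath('//link').map { |link| link['href'] }

p before
p after
```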
@@ -116,8 +150,6 @@ module Repub
       if @options[:remove] && !@options[:remove].empty?
         @options[:remove].each do |selector|
           log.info "Removing elements matching selector \"#{selector}\""
-          #p doc.search(selector).size
-          #p doc.search(selector)
           doc.search(selector).remove
         end
       end
@@ -134,40 +166,6 @@ module Repub
       end
     end
 
-    def copy_and_process_assets
-      # Copy html
-      @parser.cache.assets[:documents].each do |asset|
-        log.debug "-- Processing document #{asset}"
-        # Copy asset from cache
-        FileUtils.cp(File.join(@parser.cache.path, asset), '.')
-        # Do post-processing
-        postprocess_file(asset)
-        postprocess_doc(asset)
-        @content.add_document(asset)
-        @asset_path = File.expand_path(asset)
-      end
-      # Copy css
-      if @options[:css].nil? || @options[:css].empty?
-        # No custom css, copy one from assets
-        @parser.cache.assets[:stylesheets].each do |css|
-          log.debug "-- Copying stylesheet #{css}"
-          FileUtils.cp(File.join(@parser.cache.path, css), '.')
-          @content.add_stylesheet(css)
-        end
-      else
-        # Copy custom css
-        log.debug "-- Using custom stylesheet #{@options[:css]}"
-        FileUtils.cp(@options[:css], '.')
-        @content.add_stylesheet(File.basename(@options[:css]))
-      end
-      # Copy images
-      @parser.cache.assets[:images].each do |image|
-        log.debug "-- Copying image #{image}"
-        FileUtils.cp(File.join(@parser.cache.path, image), '.')
-        @content.add_image(image)
-      end
-    end
-
     def write_meta_inf
       FileUtils.mkdir_p(MetaInf)
       FileUtils.chdir(MetaInf) do
@@ -54,17 +54,41 @@ module Repub
       rescue
         raise FetcherException, "invalid URL: #{url}"
       end
-      cmd = "#{@downloader_path} #{@downloader_options} #{url}"
       Cache.for_url(url) do |cache|
         log.debug "-- Downloading into #{cache.path}"
+        cmd = "#{@downloader_path} #{@downloader_options} #{url}"
         unless system(cmd) && !cache.empty?
           raise FetcherException, "Fetch failed."
         end
+        unless cache.cached?
+          fix_filenames(cache)
+          fix_encoding(cache, @options[:encoding])
+        end
       end
     end
 
     private
 
+    def fix_filenames(cache)
+      # TODO: fix non-alphanum characters in doc filenames
+    end
+
+    def fix_encoding(cache, encoding = nil)
+      cache.assets[:documents].each do |doc|
+        unless encoding
+          log.info "Detecting encoding for #{doc}"
+          s = IO.read(doc)
+          raise FetcherException, "empty document" unless s
+          encoding = UniversalDetector.chardet(s)['encoding']
+        end
+        if encoding.downcase != 'utf-8'
+          log.info "Source encoding is #{encoding}, converting to UTF-8"
+          s = Iconv.conv('utf-8', encoding, IO.read(doc))
+          File.open(doc, 'w') { |f| f.write(s) }
+        end
+      end
+    end
+
     def which(cmd)
       if !RUBY_PLATFORM.match('mswin')
         cmd = `/usr/bin/which #{cmd}`.strip
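Note on `fix_encoding` in the hunk above: it relies on `Iconv`, which was dropped from Ruby's standard library in 2.0, and once `encoding` has been detected for the first document the same value is reused for every later one. The sketch below shows the same convert-to-UTF-8 step with the built-in `String#encode` and per-document detection. `to_utf8`, `convert_all` and the `detector` lambda are hypothetical names; the lambda is only a crude stand-in for `UniversalDetector.chardet`, not its real behavior.

```ruby
# Hypothetical helper: returns a UTF-8 copy of raw bytes read from a document.
def to_utf8(bytes, source_encoding)
  if source_encoding.casecmp('utf-8').zero?
    # Already UTF-8: only tag the bytes, no transcoding needed.
    bytes.dup.force_encoding('UTF-8')
  else
    # Tag the bytes with their real encoding, then transcode.
    bytes.dup.force_encoding(source_encoding).encode('UTF-8')
  end
end

# Detect per document unless the caller forces an encoding, so a value
# detected for one file is never carried over to the next.
def convert_all(docs, forced = nil, detector:)
  docs.map do |bytes|
    to_utf8(bytes, forced || detector.call(bytes))
  end
end

# Stand-in detector (an assumption for this sketch): valid UTF-8 is
# reported as utf-8, anything else is treated as ISO-8859-1.
detector = lambda do |bytes|
  bytes.dup.force_encoding('UTF-8').valid_encoding? ? 'utf-8' : 'ISO-8859-1'
end

latin1_doc = "caf\xE9".b      # ISO-8859-1 bytes
utf8_doc   = "caf\u00E9".b    # UTF-8 bytes
converted  = convert_all([latin1_doc, utf8_doc], detector: detector)
p converted
```

Passing an explicit encoding as the second argument mirrors the `-e, --encoding` option: detection is skipped and every document is converted from that encoding.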
@@ -81,10 +105,6 @@ module Repub
       return File.join(App.data_path, 'cache')
     end
 
-    def self.inventorize
-      # TODO
-    end
-
     def self.cleanup
       Dir.chdir(self.root) { FileUtils.rm_r(Dir.glob('*')) }
     rescue
@@ -94,16 +114,15 @@ module Repub
     attr_reader :url
     attr_reader :name
     attr_reader :path
-    attr_reader :assets
-
+
     def self.for_url(url, &block)
       self.new(url).for_url(&block)
     end
 
     def for_url(&block)
       # Download stuff if not yet cached
-      cached = File.exist?(@path)
-      unless cached
+      @cached = File.exist?(@path)
+      unless @cached
         FileUtils.mkdir_p(@path)
         begin
           Dir.chdir(@path) { yield self }
@@ -115,40 +134,33 @@ module Repub
         log.info "Using cached assets"
         log.debug "-- Cache is #{@path}"
       end
-      # Do post-download tasks
-      Dir.chdir(@path) do
+      self
+    end
+
+    def assets
+      unless @assets
         # Enumerate assets
-        @assets = {}
-        AssetTypes.each_pair do |asset_type, file_types|
-          @assets[asset_type] ||= []
-          file_types.each do |file_type|
-            @assets[asset_type] << Dir.glob("*.#{file_type}")
-          end
-          @assets[asset_type].flatten!
-        end
-        # For freshly downloaded docs, detect encoding and convert to utf-8
-        unless cached
-          @assets[:documents].each do |doc|
-            log.info "Detecting encoding for #{doc}"
-            s = IO.read(doc)
-            raise FetcherException, "empty document" unless s
-            encoding = UniversalDetector.chardet(s)['encoding']
-            if encoding.downcase != 'utf-8'
-              log.info "Looks like #{encoding}, converting to UTF-8"
-              s = Iconv.conv('utf-8', encoding, IO.read(doc))
-              File.open(doc, 'w') { |f| f.write(s) }
-            else
-              log.info "Looks like UTF-8, no conversion needed"
+        Dir.chdir(@path) do
+          @assets = {}
+          AssetTypes.each_pair do |asset_type, file_types|
+            @assets[asset_type] ||= []
+            file_types.each do |file_type|
+              @assets[asset_type] << Dir.glob("*.#{file_type}")
             end
+            @assets[asset_type].flatten!
           end
         end
       end
-      self
+      @assets
     end
-
+
     def empty?
       Dir.glob(File.join(@path, '*')).empty?
     end
+
+    def cached?
+      @cached == true
+    end
 
     private
 
@@ -156,6 +168,7 @@ module Repub
       @url = url
       @name = Digest::SHA1.hexdigest(@url)
       @path = File.join(Cache.root, @name)
+      @assets = nil
     end
   end
 
@@ -43,7 +43,7 @@ module Repub
       @cache = cache
       @asset = @cache.assets[:documents][0]
       log.debug "-- Parsing #{@asset}"
-      @doc = Nokogiri::HTML.parse(open(File.join(@cache.path, @asset)), nil, 'UTF-8')
+      @doc = Nokogiri::HTML.parse(IO.read(File.join(@cache.path, @asset)), nil, 'UTF-8')
 
       @uid = @cache.name
       parse_title
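Note on `@cache.assets` as used above: in 0.3.2 the asset listing is no longer built inside `for_url` but computed on first access and then reused. A minimal, self-contained sketch of that lazy pattern follows. `TinyCache` and `ASSET_TYPES` are hypothetical; the real `AssetTypes` constant is not shown in this diff, so the extensions below are only a guess at its shape.

```ruby
require 'tmpdir'
require 'fileutils'

# Assumed shape of the asset type table (hypothetical values).
ASSET_TYPES = {
  documents: %w[html htm],
  stylesheets: %w[css],
  images: %w[jpg png gif]
}.freeze

# Hypothetical miniature of the cache object: only the lazy listing.
class TinyCache
  attr_reader :path, :scans

  def initialize(path)
    @path = path
    @assets = nil
    @scans = 0   # counts directory scans, for demonstration only
  end

  # Scan the cache directory on first call; later calls reuse the result.
  def assets
    @assets ||= begin
      @scans += 1
      Dir.chdir(@path) do
        ASSET_TYPES.each_with_object({}) do |(type, exts), found|
          found[type] = exts.flat_map { |ext| Dir.glob("*.#{ext}") }.sort
        end
      end
    end
  end
end

listing = nil
scans = nil
Dir.mktmpdir do |dir|
  %w[index.html style.css cover.png].each { |f| FileUtils.touch(File.join(dir, f)) }
  cache = TinyCache.new(dir)
  listing = cache.assets
  cache.assets          # second call does not rescan
  scans = cache.scans
end
p listing
p scans
```

One consequence of the lazy form is that files added to the cache directory after the first call are not picked up, so any step that creates or renames files has to run before `assets` is first read.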
@@ -2,23 +2,27 @@
 
 Gem::Specification.new do |s|
   s.name = %q{repub}
-  s.version = "0.3.1"
+  s.version = "0.3.2"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Dmitri Goutnik"]
-  s.date = %q{2009-06-28}
+  s.date = %q{2009-06-30}
   s.default_executable = %q{repub}
-  s.description = %q{Simple HTML to ePub converter.}
+  s.description = %q{Repub is a simple HTML to ePub converter.
+
+It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
+for title and table of contents. In return, it provides you with greater control over generated
+ePub documents.}
   s.email = %q{dg@invisiblellama.net}
   s.executables = ["repub"]
-  s.extra_rdoc_files = ["History.txt", "README.txt", "SAMPLES.txt", "bin/repub"]
-  s.files = ["History.txt", "README.txt", "Rakefile", "SAMPLES.txt", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
-  s.homepage = %q{http://github.com/invisiblellama/repub/tree/master}
-  s.rdoc_options = ["--main", "README.txt"]
+  s.extra_rdoc_files = ["History.txt", "README.rdoc", "bin/repub"]
+  s.files = ["History.txt", "README.rdoc", "Rakefile", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
+  s.homepage = %q{http://rubyforge.org/projects/repub/}
+  s.rdoc_options = ["--main", "README.rdoc"]
   s.require_paths = ["lib"]
   s.rubyforge_project = %q{repub}
   s.rubygems_version = %q{1.3.4}
-  s.summary = %q{Simple HTML to ePub converter}
+  s.summary = %q{Repub is a simple HTML to ePub converter}
   s.test_files = ["test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
 
   if s.respond_to? :specification_version then
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: repub
 version: !ruby/object:Gem::Version
-  version: 0.3.1
+  version: 0.3.2
 platform: ruby
 authors:
 - Dmitri Goutnik
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-06-28 00:00:00 +04:00
+date: 2009-06-30 00:00:00 +04:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -62,7 +62,12 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 2.5.1
     version:
-description: Simple HTML to ePub converter.
+description: |-
+  Repub is a simple HTML to ePub converter.
+
+  It lacks imagination and won't try to guess the source document structure, you will have to describe where to look
+  for title and table of contents. In return, it provides you with greater control over generated
+  ePub documents.
 email: dg@invisiblellama.net
 executables:
 - repub
@@ -70,14 +75,12 @@ extensions: []
 
 extra_rdoc_files:
 - History.txt
-- README.txt
-- SAMPLES.txt
+- README.rdoc
 - bin/repub
 files:
 - History.txt
-- README.txt
+- README.rdoc
 - Rakefile
-- SAMPLES.txt
 - TODO
 - bin/repub
 - lib/repub.rb
@@ -115,13 +118,13 @@ files:
 - test/test_logger.rb
 - test/test_parser.rb
 has_rdoc: true
-homepage: http://github.com/invisiblellama/repub/tree/master
+homepage: http://rubyforge.org/projects/repub/
 licenses: []
 
 post_install_message:
 rdoc_options:
 - --main
-- README.txt
+- README.rdoc
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
@@ -142,7 +145,7 @@ rubyforge_project: repub
 rubygems_version: 1.3.4
 signing_key:
 specification_version: 3
-summary: Simple HTML to ePub converter
+summary: Repub is a simple HTML to ePub converter
 test_files:
 - test/epub/test_container.rb
 - test/epub/test_content.rb
data/README.txt DELETED
@@ -1,106 +0,0 @@
-== DESCRIPTION:
-
-Simple HTML to ePub converter.
-
-== FEATURES/PROBLEMS:
-
-Few samples to get started:
-
-* Git User's Manual
-
-  repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' \
-        http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
-
-* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES
-
-  repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' \
-        -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
-        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-
-* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
-
-  repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' \
-        -X '//pre' -X '//hr' -X '//body/h4' \
-        http://www.gutenberg.org/files/11/11-h/11-h.htm
-
-* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
-
-  repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-
-== SYNOPSIS:
-
-  Usage: repub [options] url
-
-  General options:
-      -D, --downloader NAME           Which downloader to use to get files (wget or httrack).
-                                      Default is wget.
-      -o, --output PATH               Output path for generated ePub file.
-                                      Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
-      -w, --write-profile NAME        Save given options for later reuse as profile NAME.
-      -l, --load-profile NAME         Load options from saved profile NAME.
-      -W, --write-default             Save given options for later reuse as default profile.
-      -L, --list-profiles             List saved profiles.
-      -C, --cleanup                   Clean up download cache.
-      -v, --verbose                   Turn on verbose output.
-      -q, --quiet                     Turn off any output except errors.
-      -V, --version                   Show version.
-      -h, --help                      Show this help message.
-
-  Parser options:
-      -x, --selector NAME:VALUE       Set parser XPath selector NAME to VALUE.
-                                      Recognized selectors are: [title toc toc_item toc_section]
-      -m, --meta NAME:VALUE           Set publication information metadata NAME to VALUE.
-                                      Valid metadata names are: [creator date description
-                                      language publisher relation rights subject title]
-      -F, --no-fixup                  Do not attempt to make document meet XHTML 1.0 Strict.
-                                      Default is to try and fix things that are broken.
-      -e, --encoding NAME             Set source document encoding. Default is to autodetect.
-
-  Post-processing options:
-      -s, --stylesheet PATH           Use custom stylesheet at PATH to add or override existing
-                                      CSS references in the source document.
-      -X, --remove SELECTOR           Remove source element using XPath selector.
-                                      Use -X- to ignore stored profile.
-      -R, --rx /PATTERN/REPLACEMENT/  Edit source HTML using regular expressions.
-                                      Use -R- to ignore stored profile.
-      -B, --browse                    After processing, open resulting HTML in default browser.
-
-== DEPENDENCIES:
-
-* Builder (https://rubyforge.org/projects/builder/)
-* Nokogiri (http://nokogiri.rubyforge.org/nokogiri/)
-* rchardet (https://rubyforge.org/projects/rchardet/)
-* launchy (http://copiousfreetime.rubyforge.org/launchy/)
-
-* wget or httrack
-* zip (Info-ZIP)
-
-== INSTALL:
-
-  gem install repub
-
-== LICENSE:
-
-(The MIT License)
-
-Copyright (c) 2009 Invisible Llama <dg@invisiblellama.net>
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
-
-==
@@ -1,23 +0,0 @@
-* THE ADVENTURES OF SHERLOCK HOLMES
-
-  repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-
-* ALICE'S ADVENTURES IN WONDERLAND
-
-  repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' http://www.gutenberg.org/files/11/11-h/11-h.htm
-
-* The Gelug-Kagyu Tradition of Mahamudra
-
-  repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-
-* Брюс Стерлинг. Схизматрица
-
-  repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/STERLINGB/shizmatrica.txt_with-big-pictures.html
-
-* Айзек Азимов. Космические течения
-
-  repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/FOUNDATION/currspac.txt_with-big-pictures.html
-
-* Git User's Manual
-
-  repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html