repub 0.3.1 → 0.3.2
- data/History.txt +5 -0
- data/README.rdoc +173 -0
- data/Rakefile +2 -1
- data/TODO +3 -3
- data/lib/repub.rb +1 -1
- data/lib/repub/app/builder.rb +37 -39
- data/lib/repub/app/fetcher.rb +47 -34
- data/lib/repub/app/parser.rb +1 -1
- data/repub.gemspec +12 -8
- metadata +13 -10
- data/README.txt +0 -106
- data/SAMPLES.txt +0 -23
data/History.txt
CHANGED
data/README.rdoc
ADDED
@@ -0,0 +1,173 @@
+== Repub
+
+by Invisible Llama (dg at invisiblellama dot net)
+
+{RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]
+
+== DESCRIPTION:
+
+Repub is a simple HTML to ePub converter.
+
+It lacks imagination and won't try to guess the source document structure; you will have to describe where to look
+for the title and table of contents. In return, it provides you with greater control over generated
+ePub documents.
+
+== FEATURES:
+
+Repub accepts the following parameters:
+
+* Source document URL
+* List of XPath expressions for locating the source document title, table of contents, TOC items and TOC sub-sections
+* List of XPath expressions describing elements that will be removed from the converted document
+* List of regular expressions for editing the source document
+* Publication information metadata tags
+
+All parameters except the document URL are optional; without them the resulting ePub will (probably, if the original HTML isn't
+too badly broken) be readable but will lack any metadata or TOC.
+
+A few examples:
+
+* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES (with proper table of contents)
+
+  repub -x 'title:div[@class="book"]//h1' \
+        -x 'toc://table' \
+        -x 'toc_item://tr' \
+        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+
+This tells Repub to look for the title in the first H1 found in the DIV of class "book", for the table of contents in the
+first TABLE, and for TOC items inside TR elements.
+The above produces a readable ePub, which can be further enhanced by removing some "noise" content:
+
+  repub -x 'title:div[@class="book"]//h1' \
+        -x 'toc://table' \
+        -x 'toc_item://tr' \
+        -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
+        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
+
+In addition to parsing, the above command also removes from the final document all PRE and HR elements, as well as the
+H1 and H2 elements that are direct children of the body.
+
+A slightly more complicated example:
+
+* Git User's Manual
+
+  repub -x 'title://h1' \
+        -x 'toc://div[@class="toc"]/dl' \
+        -x 'toc_item:dt' \
+        -x 'toc_section:following-sibling::*[1]/dl' \
+        -w git-manual \
+        http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+
+This tells Repub to look for the title in the first H1 found, for the TOC in the DL element of the DIV with class "toc", and
+for TOC items inside DT elements. Additionally, a TOC item can have a child TOC section: a DL inside the
+element that immediately follows the DT.
+
+The above command also saves all XPath expressions as the "git-manual" profile, which can be reused later to save keystrokes.
+For example, if you later decide to regenerate the Git Manual ePub without the TOC at the beginning of the document, you can run
+
+  repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
+
+A few more examples:
+
+* GNU Wget Manual
+
+  repub -m 'creator:gnu.org' \
+        -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
+        -X '//div[@class="contents"]' \
+        http://www.gnu.org/software/wget/manual/wget.html
+
+* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
+
+  repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
+        http://www.gutenberg.org/files/11/11-h/11-h.htm
+
+* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
+
+  repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
+
+== SYNOPSIS:
+
+  Usage: repub [options] url
+
+  General options:
+      -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
+                                       Default is wget.
+      -o, --output PATH                Output path for generated ePub file.
+                                       Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
+      -w, --write-profile NAME         Save given options for later reuse as profile NAME.
+      -l, --load-profile NAME          Load options from saved profile NAME.
+      -W, --write-default              Save given options for later reuse as default profile.
+      -L, --list-profiles              List saved profiles.
+      -C, --cleanup                    Clean up download cache.
+      -v, --verbose                    Turn on verbose output.
+      -q, --quiet                      Turn off any output except errors.
+      -V, --version                    Show version.
+      -h, --help                       Show this help message.
+
+  Parser options:
+      -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
+                                       Recognized selectors are: [title toc toc_item toc_section]
+      -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
+                                       Valid metadata names are: [creator date description
+                                       language publisher relation rights subject title]
+      -F, --no-fixup                   Do not attempt to make document meet XHTML 1.0 Strict.
+                                       Default is to try and fix things that are broken.
+      -e, --encoding NAME              Set source document encoding. Default is to auto detect.
+
+  Post-processing options:
+      -s, --stylesheet PATH            Use custom stylesheet at PATH to add or override existing
+                                       CSS references in the source document.
+      -X, --remove SELECTOR            Remove source element using XPath selector.
+                                       Use -X- to ignore stored profile.
+      -R, --rx /PATTERN/REPLACEMENT/   Edit source HTML using regular expressions.
+                                       Use -R- to ignore stored profile.
+      -B, --browse                     After processing, open resulting HTML in default browser.
+
+== DEPENDENCIES:
+
+* {Builder}[http://rubyforge.org/projects/builder/]
+* {Nokogiri}[http://nokogiri.rubyforge.org/nokogiri/]
+* {chardet}[http://rubyforge.org/projects/chardet/]
+* {launchy}[http://copiousfreetime.rubyforge.org/launchy/]
+
+Also, the following tools must be somewhere in $PATH:
+
+* {wget}[http://www.gnu.org/software/wget/] or {httrack}[http://www.httrack.com/]
+* {zip (Info-ZIP)}[http://www.info-zip.org/]
+
+== LIMITATIONS/BUGS:
+
+Currently, only "everything-on-one-page" HTML sources are supported. Repub will download and process all page requisites
+(stylesheets and images) but all actual content must be on one page.
+
+Bugs: probably. If you find any, please report them to dg at invisiblellama dot net.
+
+== INSTALL:
+
+  gem install repub
+
+== LICENSE:
+
+(The MIT License)
+
+Copyright (c) 2009 Dmitri Goutnik, Invisible Llama
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+==
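The `-x NAME:VALUE` and `-m NAME:VALUE` options documented in the README pair a name with a value that may itself contain colons (XPath axes such as `following-sibling::*[1]/dl`, for instance). A minimal sketch of how such an argument could be split, assuming the split happens on the first colon only; `parse_pair` and `SELECTORS` are hypothetical names for illustration, not Repub's actual code:

```ruby
# Hypothetical list of accepted names, taken from the README synopsis.
SELECTORS = %w[title toc toc_item toc_section].freeze

# Hypothetical helper: split "NAME:VALUE" on the first colon only, so
# values such as "following-sibling::*[1]/dl" keep their own colons.
def parse_pair(arg, valid_names)
  name, value = arg.split(':', 2)
  unless valid_names.include?(name) && value && !value.empty?
    raise ArgumentError, "invalid option value: #{arg.inspect}"
  end
  [name.to_sym, value]
end

# Collect the selectors from the Git User's Manual example.
selectors = {}
[
  'title://h1',
  'toc://div[@class="toc"]/dl',
  'toc_item:dt',
  'toc_section:following-sibling::*[1]/dl'
].each do |arg|
  name, value = parse_pair(arg, SELECTORS)
  selectors[name] = value
end
```

Splitting with a limit of 2 is the design choice that matters here: a plain `split(':')` would cut the XPath axis separator `::` apart.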
data/Rakefile
CHANGED
@@ -17,9 +17,10 @@ task :default => 'test:run'
 PROJ.name = 'repub'
 PROJ.authors = 'Dmitri Goutnik'
 PROJ.email = 'dg@invisiblellama.net'
-PROJ.url = 'http://
+PROJ.url = 'http://rubyforge.org/projects/repub/'
 PROJ.version = Repub::VERSION
 PROJ.rubyforge.name = 'repub'
+PROJ.readme_file = 'README.rdoc'
 PROJ.exclude = %w[tmp/ \.git \.DS_Store .*\.tmproj .*\.epub ^pkg/]
 
 PROJ.spec.opts << '--color'
data/TODO
CHANGED
@@ -1,3 +1,3 @@
-
-
-more parser tokens: author(s) etc
+* add support for rx cleaning/modifying source doc
+* make -q/-v actually do something
+more parser tokens: author(s) etc ?
data/lib/repub.rb
CHANGED
data/lib/repub/app/builder.rb
CHANGED
@@ -76,6 +76,40 @@ module Repub
 
 MetaInf = 'META-INF'
 
+def copy_and_process_assets
+  # Copy html
+  @parser.cache.assets[:documents].each do |asset|
+    log.debug "-- Processing document #{asset}"
+    # Copy asset from cache
+    FileUtils.cp(File.join(@parser.cache.path, asset), '.')
+    # Do post-processing
+    postprocess_file(asset)
+    postprocess_doc(asset)
+    @content.add_document(asset)
+    @asset_path = File.expand_path(asset)
+  end
+  # Copy css
+  if @options[:css].nil? || @options[:css].empty?
+    # No custom css, copy one from assets
+    @parser.cache.assets[:stylesheets].each do |css|
+      log.debug "-- Copying stylesheet #{css}"
+      FileUtils.cp(File.join(@parser.cache.path, css), '.')
+      @content.add_stylesheet(css)
+    end
+  else
+    # Copy custom css
+    log.debug "-- Using custom stylesheet #{@options[:css]}"
+    FileUtils.cp(@options[:css], '.')
+    @content.add_stylesheet(File.basename(@options[:css]))
+  end
+  # Copy images
+  @parser.cache.assets[:images].each do |image|
+    log.debug "-- Copying image #{image}"
+    FileUtils.cp(File.join(@parser.cache.path, image), '.')
+    @content.add_image(image)
+  end
+end
+
 def postprocess_file(asset)
   source = IO.read(asset)
   # Do rx substitutions
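The body of the `# Do rx substitutions` step in `postprocess_file` is not visible in this hunk. The README only documents the `-R /PATTERN/REPLACEMENT/` argument shape, so the following is a sketch under assumptions: `parse_rx` and `apply_rx` are hypothetical helpers, and the convention that `\/` escapes a literal slash is a guess rather than documented behaviour.

```ruby
# Hypothetical parser for "/PATTERN/REPLACEMENT/" arguments.
# Assumption: a slash preceded by a backslash does not act as a delimiter.
def parse_rx(arg)
  parts = arg.split(%r{(?<!\\)/}, -1)
  unless parts.length == 4 && parts.first.empty? && parts.last.empty? && !parts[1].empty?
    raise ArgumentError, "expected /PATTERN/REPLACEMENT/, got #{arg.inspect}"
  end
  # The pattern keeps "\/" (valid inside a regexp); the replacement is unescaped.
  [Regexp.new(parts[1]), parts[2].gsub('\/', '/')]
end

# Apply every (pattern, replacement) rule in order to the source text.
def apply_rx(source, rules)
  rules.inject(source) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

# Strip <font> wrappers from a fragment of HTML.
rules = ['/<font[^>]*>//', '/<\/font>//'].map { |arg| parse_rx(arg) }
cleaned = apply_rx('<p><font size="2">Hello</font></p>', rules)
```

Applying the rules to the raw source text before the document is parsed matches the order suggested by `copy_and_process_assets`, which calls `postprocess_file` before `postprocess_doc`.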
@@ -104,11 +138,11 @@ module Repub
 end
 
 def postprocess_doc(asset)
-  doc = Nokogiri::HTML.parse(
+  doc = Nokogiri::HTML.parse(IO.read(asset), nil, 'UTF-8')
   # Substitute custom CSS
   if (@options[:css] && !@options[:css].empty?)
-    doc.xpath('//link[@rel="stylesheet"]') do |link|
-      link[
+    doc.xpath('//link[@rel="stylesheet"]').each do |link|
+      link['href'] = File.basename(@options[:css])
       log.debug "-- Replacing CSS refs with #{link[:href]}"
     end
   end
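The `postprocess_doc` change above replaces `doc.xpath(...) do |link|` with `doc.xpath(...).each do |link|`. In Ruby, a block handed to a method that never yields is silently ignored, which is presumably why the stylesheet substitution had no effect before the fix. A minimal sketch of the pitfall in plain Ruby; `find_links` is a hypothetical stand-in for a query method that returns a collection, not Nokogiri itself:

```ruby
# Hypothetical stand-in: returns a collection and never yields to a block.
def find_links
  [{ 'href' => 'old.css' }, { 'href' => 'print.css' }]
end

# Before the fix: the block is accepted but never executed.
ignored_calls = 0
find_links do |_link|
  ignored_calls += 1
end

# After the fix: iterate over the returned collection explicitly.
links = find_links
links.each do |link|
  link['href'] = 'custom.css'
end
hrefs = links.map { |link| link['href'] }
```

No error or warning is raised in the first form, so this kind of bug tends to show up only as missing output.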
@@ -116,8 +150,6 @@ module Repub
   if @options[:remove] && !@options[:remove].empty?
     @options[:remove].each do |selector|
       log.info "Removing elements matching selector \"#{selector}\""
-      #p doc.search(selector).size
-      #p doc.search(selector)
       doc.search(selector).remove
     end
   end
@@ -134,40 +166,6 @@ module Repub
   end
 end
 
-def copy_and_process_assets
-  # Copy html
-  @parser.cache.assets[:documents].each do |asset|
-    log.debug "-- Processing document #{asset}"
-    # Copy asset from cache
-    FileUtils.cp(File.join(@parser.cache.path, asset), '.')
-    # Do post-processing
-    postprocess_file(asset)
-    postprocess_doc(asset)
-    @content.add_document(asset)
-    @asset_path = File.expand_path(asset)
-  end
-  # Copy css
-  if @options[:css].nil? || @options[:css].empty?
-    # No custom css, copy one from assets
-    @parser.cache.assets[:stylesheets].each do |css|
-      log.debug "-- Copying stylesheet #{css}"
-      FileUtils.cp(File.join(@parser.cache.path, css), '.')
-      @content.add_stylesheet(css)
-    end
-  else
-    # Copy custom css
-    log.debug "-- Using custom stylesheet #{@options[:css]}"
-    FileUtils.cp(@options[:css], '.')
-    @content.add_stylesheet(File.basename(@options[:css]))
-  end
-  # Copy images
-  @parser.cache.assets[:images].each do |image|
-    log.debug "-- Copying image #{image}"
-    FileUtils.cp(File.join(@parser.cache.path, image), '.')
-    @content.add_image(image)
-  end
-end
-
 def write_meta_inf
   FileUtils.mkdir_p(MetaInf)
   FileUtils.chdir(MetaInf) do
data/lib/repub/app/fetcher.rb
CHANGED
@@ -54,17 +54,41 @@ module Repub
   rescue
     raise FetcherException, "invalid URL: #{url}"
   end
-  cmd = "#{@downloader_path} #{@downloader_options} #{url}"
   Cache.for_url(url) do |cache|
     log.debug "-- Downloading into #{cache.path}"
+    cmd = "#{@downloader_path} #{@downloader_options} #{url}"
     unless system(cmd) && !cache.empty?
       raise FetcherException, "Fetch failed."
     end
+    unless cache.cached?
+      fix_filenames(cache)
+      fix_encoding(cache, @options[:encoding])
+    end
   end
 end
 
 private
 
+def fix_filenames(cache)
+  # TODO: fix non-alphanum characters in doc filenames
+end
+
+def fix_encoding(cache, encoding = nil)
+  cache.assets[:documents].each do |doc|
+    unless encoding
+      log.info "Detecting encoding for #{doc}"
+      s = IO.read(doc)
+      raise FetcherException, "empty document" unless s
+      encoding = UniversalDetector.chardet(s)['encoding']
+    end
+    if encoding.downcase != 'utf-8'
+      log.info "Source encoding is #{encoding}, converting to UTF-8"
+      s = Iconv.conv('utf-8', encoding, IO.read(doc))
+      File.open(doc, 'w') { |f| f.write(s) }
+    end
+  end
+end
+
 def which(cmd)
   if !RUBY_PLATFORM.match('mswin')
     cmd = `/usr/bin/which #{cmd}`.strip
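The `fix_encoding` method added above converts documents with `Iconv.conv`. Iconv was removed from Ruby's standard library in Ruby 2.0, so on a current interpreter the same step would more likely be written with `String#encode`. A minimal sketch, assuming the source encoding is already known (detection via chardet is left out); `convert_to_utf8` is a hypothetical helper name, not part of Repub:

```ruby
require 'tmpdir'

# Hypothetical helper: rewrite the file at `path` as UTF-8 in place.
# Returns false when the source is already UTF-8 and nothing was done.
def convert_to_utf8(path, source_encoding)
  return false if source_encoding.downcase == 'utf-8'

  raw = File.binread(path)                                # bytes, no transcoding
  text = raw.force_encoding(source_encoding).encode('UTF-8')
  File.binwrite(path, text)                               # write the UTF-8 bytes as-is
  true
end

# Demonstration on a throwaway ISO-8859-1 file containing "café".
converted = nil
changed = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'latin1.html')
  File.binwrite(path, "caf\xE9".b)
  changed = convert_to_utf8(path, 'ISO-8859-1')
  converted = File.binread(path)
end
```

Reading and writing in binary mode keeps Ruby's default external encoding out of the picture, so the only transcoding that happens is the explicit `encode` call.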
@@ -81,10 +105,6 @@ module Repub
   return File.join(App.data_path, 'cache')
 end
 
-def self.inventorize
-  # TODO
-end
-
 def self.cleanup
   Dir.chdir(self.root) { FileUtils.rm_r(Dir.glob('*')) }
 rescue
@@ -94,16 +114,15 @@ module Repub
 attr_reader :url
 attr_reader :name
 attr_reader :path
-
-
+
 def self.for_url(url, &block)
   self.new(url).for_url(&block)
 end
 
 def for_url(&block)
   # Download stuff if not yet cached
-  cached = File.exist?(@path)
-  unless cached
+  @cached = File.exist?(@path)
+  unless @cached
     FileUtils.mkdir_p(@path)
     begin
       Dir.chdir(@path) { yield self }
@@ -115,40 +134,33 @@ module Repub
     log.info "Using cached assets"
     log.debug "-- Cache is #{@path}"
   end
-
-
+  self
+end
+
+def assets
+  unless @assets
     # Enumerate assets
-@
-
-
-
-
-
-@assets[asset_type].flatten!
-end
-# For freshly downloaded docs, detect encoding and convert to utf-8
-unless cached
-  @assets[:documents].each do |doc|
-    log.info "Detecting encoding for #{doc}"
-    s = IO.read(doc)
-    raise FetcherException, "empty document" unless s
-    encoding = UniversalDetector.chardet(s)['encoding']
-    if encoding.downcase != 'utf-8'
-      log.info "Looks like #{encoding}, converting to UTF-8"
-      s = Iconv.conv('utf-8', encoding, IO.read(doc))
-      File.open(doc, 'w') { |f| f.write(s) }
-    else
-      log.info "Looks like UTF-8, no conversion needed"
+    Dir.chdir(@path) do
+      @assets = {}
+      AssetTypes.each_pair do |asset_type, file_types|
+        @assets[asset_type] ||= []
+        file_types.each do |file_type|
+          @assets[asset_type] << Dir.glob("*.#{file_type}")
         end
+        @assets[asset_type].flatten!
       end
     end
   end
-
+  @assets
 end
-
+
 def empty?
   Dir.glob(File.join(@path, '*')).empty?
 end
+
+def cached?
+  @cached == true
+end
 
 private
 
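The new `assets` method above enumerates cached files lazily and memoizes the result instead of building the list inside `for_url`. A self-contained sketch of the same idea follows. The `AssetTypes` constant is referenced but not defined anywhere in this diff, so the `ASSET_TYPES` mapping and the `CacheSketch` class are assumptions made for illustration only:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical mapping of asset kinds to file extensions.
ASSET_TYPES = {
  documents:   %w[html htm],
  stylesheets: %w[css],
  images:      %w[jpg jpeg png gif]
}.freeze

class CacheSketch
  attr_reader :path

  def initialize(path)
    @path = path
    @assets = nil
  end

  # Enumerate files by extension on the first call, then reuse the result.
  def assets
    @assets ||= Dir.chdir(@path) do
      ASSET_TYPES.each_with_object({}) do |(type, extensions), found|
        found[type] = extensions.flat_map { |ext| Dir.glob("*.#{ext}") }.sort
      end
    end
  end
end

# Build a throwaway cache directory and list what is in it.
listing = Dir.mktmpdir do |dir|
  FileUtils.touch(%w[index.html style.css cover.png].map { |f| File.join(dir, f) })
  CacheSketch.new(dir).assets
end
```

Deferring the enumeration means a cache object can be created, and handed to code such as `fix_encoding`, before the download has populated the directory.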
@@ -156,6 +168,7 @@ module Repub
   @url = url
   @name = Digest::SHA1.hexdigest(@url)
   @path = File.join(Cache.root, @name)
+  @assets = nil
 end
 end
 
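The constructor above names each cache directory after the SHA1 digest of the source URL, so the same URL always maps to the same directory and repeated runs can reuse earlier downloads. A minimal sketch of that naming scheme; `CACHE_ROOT` and `cache_path_for` are hypothetical, since the diff only shows that the real root comes from `Cache.root`:

```ruby
require 'digest/sha1'

# Hypothetical cache root, used here only for illustration.
CACHE_ROOT = '/tmp/repub-cache'

# Derive a stable, filesystem-safe directory name from the URL.
def cache_path_for(url, root = CACHE_ROOT)
  File.join(root, Digest::SHA1.hexdigest(url))
end

first  = cache_path_for('http://www.gnu.org/software/wget/manual/wget.html')
second = cache_path_for('http://www.gnu.org/software/wget/manual/wget.html')
other  = cache_path_for('http://www.gutenberg.org/files/11/11-h/11-h.htm')
```

Hashing sidesteps the problem of URL characters that are awkward in file names, at the cost of directory names that are not human-readable.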
data/lib/repub/app/parser.rb
CHANGED
@@ -43,7 +43,7 @@ module Repub
 @cache = cache
 @asset = @cache.assets[:documents][0]
 log.debug "-- Parsing #{@asset}"
-@doc = Nokogiri::HTML.parse(
+@doc = Nokogiri::HTML.parse(IO.read(File.join(@cache.path, @asset)), nil, 'UTF-8')
 
 @uid = @cache.name
 parse_title
data/repub.gemspec
CHANGED
@@ -2,23 +2,27 @@
 
 Gem::Specification.new do |s|
   s.name = %q{repub}
-  s.version = "0.3.
+  s.version = "0.3.2"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Dmitri Goutnik"]
-  s.date = %q{2009-06-
+  s.date = %q{2009-06-30}
   s.default_executable = %q{repub}
-  s.description = %q{
+  s.description = %q{Repub is a simple HTML to ePub converter.
+
+It lacks imagination and won't try to guess the source document structure; you will have to describe where to look
+for the title and table of contents. In return, it provides you with greater control over generated
+ePub documents.}
   s.email = %q{dg@invisiblellama.net}
   s.executables = ["repub"]
-  s.extra_rdoc_files = ["History.txt", "README.
-  s.files = ["History.txt", "README.
-  s.homepage = %q{http://
-  s.rdoc_options = ["--main", "README.
+  s.extra_rdoc_files = ["History.txt", "README.rdoc", "bin/repub"]
+  s.files = ["History.txt", "README.rdoc", "Rakefile", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
+  s.homepage = %q{http://rubyforge.org/projects/repub/}
+  s.rdoc_options = ["--main", "README.rdoc"]
   s.require_paths = ["lib"]
   s.rubyforge_project = %q{repub}
   s.rubygems_version = %q{1.3.4}
-  s.summary = %q{
+  s.summary = %q{Repub is a simple HTML to ePub converter}
   s.test_files = ["test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
 
   if s.respond_to? :specification_version then
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: repub
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.2
 platform: ruby
 authors:
 - Dmitri Goutnik
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-06-
+date: 2009-06-30 00:00:00 +04:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -62,7 +62,12 @@ dependencies:
       - !ruby/object:Gem::Version
         version: 2.5.1
     version:
-description:
+description: |-
+  Repub is a simple HTML to ePub converter.
+
+  It lacks imagination and won't try to guess the source document structure; you will have to describe where to look
+  for the title and table of contents. In return, it provides you with greater control over generated
+  ePub documents.
 email: dg@invisiblellama.net
 executables:
 - repub
@@ -70,14 +75,12 @@ extensions: []
 
 extra_rdoc_files:
 - History.txt
-- README.
-- SAMPLES.txt
+- README.rdoc
 - bin/repub
 files:
 - History.txt
-- README.
+- README.rdoc
 - Rakefile
-- SAMPLES.txt
 - TODO
 - bin/repub
 - lib/repub.rb
@@ -115,13 +118,13 @@ files:
 - test/test_logger.rb
 - test/test_parser.rb
 has_rdoc: true
-homepage: http://
+homepage: http://rubyforge.org/projects/repub/
 licenses: []
 
 post_install_message:
 rdoc_options:
 - --main
-- README.
+- README.rdoc
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
@@ -142,7 +145,7 @@ rubyforge_project: repub
 rubygems_version: 1.3.4
 signing_key:
 specification_version: 3
-summary:
+summary: Repub is a simple HTML to ePub converter
 test_files:
 - test/epub/test_container.rb
 - test/epub/test_content.rb
data/README.txt
DELETED
@@ -1,106 +0,0 @@
-== DESCRIPTION:
-
-Simple HTML to ePub converter.
-
-== FEATURES/PROBLEMS:
-
-Few samples to get started:
-
-* Git User's Manual
-
-  repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' \
-        http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
-
-* Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES
-
-  repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' \
-        -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
-        http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-
-* Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
-
-  repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' \
-        -X '//pre' -X '//hr' -X '//body/h4' \
-        http://www.gutenberg.org/files/11/11-h/11-h.htm
-
-* The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
-
-  repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-
-== SYNOPSIS:
-
-  Usage: repub [options] url
-
-  General options:
-      -D, --downloader NAME            Which downloader to use to get files (wget or httrack).
-                                       Default is wget.
-      -o, --output PATH                Output path for generated ePub file.
-                                       Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
-      -w, --write-profile NAME         Save given options for later reuse as profile NAME.
-      -l, --load-profile NAME          Load options from saved profile NAME.
-      -W, --write-default              Save given options for later reuse as default profile.
-      -L, --list-profiles              List saved profiles.
-      -C, --cleanup                    Clean up download cache.
-      -v, --verbose                    Turn on verbose output.
-      -q, --quiet                      Turn off any output except errors.
-      -V, --version                    Show version.
-      -h, --help                       Show this help message.
-
-  Parser options:
-      -x, --selector NAME:VALUE        Set parser XPath selector NAME to VALUE.
-                                       Recognized selectors are: [title toc toc_item toc_section]
-      -m, --meta NAME:VALUE            Set publication information metadata NAME to VALUE.
-                                       Valid metadata names are: [creator date description
-                                       language publisher relation rights subject title]
-      -F, --no-fixup                   Do not attempt to make document meet XHTML 1.0 Strict.
-                                       Default is to try and fix things that are broken.
-      -e, --encoding NAME              Set source document encoding. Default is to autodetect.
-
-  Post-processing options:
-      -s, --stylesheet PATH            Use custom stylesheet at PATH to add or override existing
-                                       CSS references in the source document.
-      -X, --remove SELECTOR            Remove source element using XPath selector.
-                                       Use -X- to ignore stored profile.
-      -R, --rx /PATTERN/REPLACEMENT/   Edit source HTML using regular expressions.
-                                       Use -R- to ignore stored profile.
-      -B, --browse                     After processing, open resulting HTML in default browser.
-
-== DEPENDENCIES:
-
-* Builder (https://rubyforge.org/projects/builder/)
-* Nokogiri (http://nokogiri.rubyforge.org/nokogiri/)
-* rchardet (https://rubyforge.org/projects/rchardet/)
-* launchy (http://copiousfreetime.rubyforge.org/launchy/)
-
-* wget or httrack
-* zip (Info-ZIP)
-
-== INSTALL:
-
-  gem install repub
-
-== LICENSE:
-
-(The MIT License)
-
-Copyright (c) 2009 Invisible Llama <dg@invisiblellama.net>
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
-
-==
data/SAMPLES.txt
DELETED
@@ -1,23 +0,0 @@
-* THE ADVENTURES OF SHERLOCK HOLMES
-
-repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' http://www.gutenberg.org/dirs/etext99/advsh12h.htm
-
-* ALICE'S ADVENTURES IN WONDERLAND
-
-repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' http://www.gutenberg.org/files/11/11-h/11-h.htm
-
-* The Gelug-Kagyu Tradition of Mahamudra
-
-repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
-
-* Брюс Стерлинг. Схизматрица
-
-repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/STERLINGB/shizmatrica.txt_with-big-pictures.html
-
-* Айзек Азимов. Космические течения
-
-repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/FOUNDATION/currspac.txt_with-big-pictures.html
-
-* Git User's Manual
-
-repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html