invisiblellama-repub 0.3.1 → 0.3.2

data/History.txt CHANGED
@@ -11,3 +11,8 @@
  == 0.3.1 / 2009-06-28
 
  * Fixed App.data_path bug
+
+ == 0.3.2 / 2009-06-30
+
+ * Improved Win32 support
+ * Updated documentation
data/README.rdoc ADDED
@@ -0,0 +1,173 @@
1
+ == Repub
2
+
3
+ by Invisible Llama (dg at invisiblellama dot net)
4
+
5
+ {RubyForge Project}[http://rubyforge.org/projects/repub/] | {Github}[http://github.com/invisiblellama/repub/tree/master]
6
+
7
+ == DESCRIPTION:
8
+
9
+ Repub is a simple HTML to ePub converter.
10
+
11
+ It lacks imagination and won't try to guess the source document structure; you will have to describe where to look
12
+ for title and table of contents. In return, it provides you with greater control over generated
13
+ ePub documents.
14
+
15
+ == FEATURES:
16
+
17
+ Repub accepts the following parameters:
18
+
19
+ * Source document URL
20
+ * List of XPath expressions for locating source document title, table of contents, TOC items and TOC sub-sections
21
+ * List of XPath expressions for describing elements that will be removed from the converted document
22
+ * List of regular expressions for editing the source document
23
+ * Publication information metadata tags
24
+
25
+ All parameters except the document URL are optional; the resulting ePub will (probably, if the original HTML isn't
26
+ too badly broken) be readable, but without the optional parameters it will lack metadata and a TOC.
27
+
28
+ A few examples:
29
+
30
+ * Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES (with proper table of contents)
31
+
32
+ repub -x 'title:div[@class="book"]//h1' \
33
+ -x 'toc://table' \
34
+ -x 'toc_item://tr' \
35
+ http://www.gutenberg.org/dirs/etext99/advsh12h.htm
36
+
37
+ This tells Repub to look for the title in the first H1 inside the DIV of class "book", and that the table of contents is
38
+ located in the first TABLE, with each TOC item inside a TR.
39
+ The above produces a readable ePub, which can be further improved by removing some "noise" content:
40
+
41
+ repub -x 'title:div[@class="book"]//h1' \
42
+ -x 'toc://table' \
43
+ -x 'toc_item://tr' \
44
+ -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
45
+ http://www.gutenberg.org/dirs/etext99/advsh12h.htm
46
+
47
+ In addition to parsing, the above command also removes from the final version of the document all PREs, HRs and
48
+ the H1 and H2 elements that are direct children of the body.
49
+
50
+ A slightly more complicated example:
51
+
52
+ * Git User's Manual
53
+
54
+ repub -x 'title://h1' \
55
+ -x 'toc://div[@class="toc"]/dl' \
56
+ -x 'toc_item:dt' \
57
+ -x 'toc_section:following-sibling::*[1]/dl' \
58
+ -w git-manual \
59
+ http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
60
+
61
+ This tells Repub to look for the title in the first H1, for the TOC in the DL element of the DIV with class "toc", and
62
+ that TOC items can be found inside DT elements. Additionally, a TOC item can have a child TOC section inside a DL when
63
+ the DL sits inside the element that immediately follows the DT.
64
+
65
+ The above command also saves all XPath expressions as the "git-manual" profile, which can be reused later to save keystrokes.
66
+ For example, if you later decide to regenerate the Git Manual ePub without the TOC at the beginning of the document, you can run:
67
+
68
+ repub -l git-manual -X '//div[@class="toc"]' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
69
+
70
+ A few more examples:
71
+
72
+ * GNU Wget Manual
73
+
74
+ repub -m 'creator:gnu.org' \
75
+ -x 'title://h1' -x 'toc://div[@class="contents"]/ul' -x 'toc_item:li' -x 'toc_section:ul' \
76
+ -X '//div[@class="contents"]' \
77
+ http://www.gnu.org/software/wget/manual/wget.html
78
+
79
+ * Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
80
+
81
+ repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' \
82
+ http://www.gutenberg.org/files/11/11-h/11-h.htm
83
+
84
+ * The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
85
+
86
+ repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
87
+
88
+ == SYNOPSIS:
89
+
90
+ Usage: repub [options] url
91
+
92
+ General options:
93
+ -D, --downloader NAME Which downloader to use to get files (wget or httrack).
94
+ Default is wget.
95
+ -o, --output PATH Output path for generated ePub file.
96
+ Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
97
+ -w, --write-profile NAME Save given options for later reuse as profile NAME.
98
+ -l, --load-profile NAME Load options from saved profile NAME.
99
+ -W, --write-default Save given options for later reuse as default profile.
100
+ -L, --list-profiles List saved profiles.
101
+ -C, --cleanup Clean up download cache.
102
+ -v, --verbose Turn on verbose output.
103
+ -q, --quiet Turn off any output except errors.
104
+ -V, --version Show version.
105
+ -h, --help Show this help message.
106
+
107
+ Parser options:
108
+ -x, --selector NAME:VALUE Set parser XPath selector NAME to VALUE.
109
+ Recognized selectors are: [title toc toc_item toc_section]
110
+ -m, --meta NAME:VALUE Set publication information metadata NAME to VALUE.
111
+ Valid metadata names are: [creator date description
112
+ language publisher relation rights subject title]
113
+ -F, --no-fixup Do not attempt to make document meet XHTML 1.0 Strict.
114
+ Default is to try and fix things that are broken.
115
+ -e, --encoding NAME Set source document encoding. Default is to auto detect.
116
+
117
+ Post-processing options:
118
+ -s, --stylesheet PATH Use custom stylesheet at PATH to add or override existing
119
+ CSS references in the source document.
120
+ -X, --remove SELECTOR Remove source element using XPath selector.
121
+ Use -X- to ignore stored profile.
122
+ -R, --rx /PATTERN/REPLACEMENT/ Edit source HTML using regular expressions.
123
+ Use -R- to ignore stored profile.
124
+ -B, --browse After processing, open resulting HTML in default browser.
125
+
126
+ == DEPENDENCIES:
127
+
128
+ * {Builder}[http://rubyforge.org/projects/builder/]
129
+ * {Nokogiri}[http://nokogiri.rubyforge.org/nokogiri/]
130
+ * {chardet}[http://rubyforge.org/projects/chardet/]
131
+ * {launchy}[http://copiousfreetime.rubyforge.org/launchy/]
132
+
133
+ Also, the following tools must be somewhere in $PATH:
134
+
135
+ * {wget}[http://www.gnu.org/software/wget/] or {httrack}[http://www.httrack.com/]
136
+ * {zip (Info-ZIP)}[http://www.info-zip.org/]
137
+
138
+ == LIMITATIONS/BUGS:
139
+
140
+ Currently, only "everything-on-one-page" HTML sources are supported. Repub will download and process all page requisites
141
+ (stylesheets and images) but all actual content must be on one page.
142
+
143
+ Bugs: probably. If you find any, please report them to dg at invisiblellama dot net.
144
+
145
+ == INSTALL:
146
+
147
+ gem install repub
148
+
149
+ == LICENSE:
150
+
151
+ (The MIT License)
152
+
153
+ Copyright (c) 2009 Dmitri Goutnik, Invisible Llama
154
+
155
+ Permission is hereby granted, free of charge, to any person obtaining a copy
156
+ of this software and associated documentation files (the "Software"), to deal
157
+ in the Software without restriction, including without limitation the rights
158
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
159
+ copies of the Software, and to permit persons to whom the Software is
160
+ furnished to do so, subject to the following conditions:
161
+
162
+ The above copyright notice and this permission notice shall be included in
163
+ all copies or substantial portions of the Software.
164
+
165
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
166
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
167
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
168
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
169
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
170
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
171
+ THE SOFTWARE.
172
+
173
+ ==
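Note on the `-R /PATTERN/REPLACEMENT/` option listed in the README synopsis above: the diff does not show how repub parses this argument, so the following is only a sketch under stated assumptions. The helper names `parse_rx` and `apply_rx` are hypothetical, and treating the first character as the delimiter (as sed does) is a guess, not confirmed behavior.

```ruby
# Hypothetical helpers, not taken from repub's source.

# Split a sed-style "/PATTERN/REPLACEMENT/" string into a Regexp and a
# replacement string. The first character is treated as the delimiter.
# Escaped delimiters inside the pattern are NOT handled by this sketch.
def parse_rx(expr)
  raise ArgumentError, 'expression is too short' if expr.nil? || expr.length < 3
  delimiter = expr[0]
  parts = expr[1..-1].split(delimiter, -1)
  if parts.size < 2 || parts[0].empty?
    raise ArgumentError, "expected #{delimiter}PATTERN#{delimiter}REPLACEMENT#{delimiter}"
  end
  [Regexp.new(parts[0]), parts[1]]
end

# Apply each expression, in order, to the source text.
def apply_rx(source, expressions)
  expressions.inject(source) do |text, expr|
    pattern, replacement = parse_rx(expr)
    text.gsub(pattern, replacement)
  end
end

puts apply_rx('<p>Hello&nbsp;world</p>', ['/&nbsp;/ /'])
# prints "<p>Hello world</p>"
```

Choosing a different delimiter (for example `|</?b>||`) sidesteps the escaping problem when the pattern itself contains a slash.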
data/Rakefile CHANGED
@@ -17,9 +17,10 @@ task :default => 'test:run'
  PROJ.name = 'repub'
  PROJ.authors = 'Dmitri Goutnik'
  PROJ.email = 'dg@invisiblellama.net'
- PROJ.url = 'http://github.com/invisiblellama/repub/tree/master'
+ PROJ.url = 'http://rubyforge.org/projects/repub/'
  PROJ.version = Repub::VERSION
  PROJ.rubyforge.name = 'repub'
+ PROJ.readme_file = 'README.rdoc'
  PROJ.exclude = %w[tmp/ \.git \.DS_Store .*\.tmproj .*\.epub ^pkg/]
 
  PROJ.spec.opts << '--color'
data/TODO CHANGED
@@ -1,3 +1,3 @@
- add support for rx cleaning/modifying source doc
- make -q/-v actually do something
- more parser tokens: author(s) etc
+ * add support for rx cleaning/modifying source doc
+ * make -q/-v actually do something
+ more parser tokens: author(s) etc ?
data/lib/repub.rb CHANGED
@@ -1,7 +1,7 @@
  module Repub
 
  # :stopdoc:
- VERSION = '0.3.1'
+ VERSION = '0.3.2'
  LIBPATH = File.expand_path(File.dirname(__FILE__)) + File::SEPARATOR
  PATH = File.dirname(LIBPATH) + File::SEPARATOR
  # :startdoc:
@@ -76,6 +76,40 @@ module Repub
76
76
 
77
77
  MetaInf = 'META-INF'
78
78
 
79
+ def copy_and_process_assets
80
+ # Copy html
81
+ @parser.cache.assets[:documents].each do |asset|
82
+ log.debug "-- Processing document #{asset}"
83
+ # Copy asset from cache
84
+ FileUtils.cp(File.join(@parser.cache.path, asset), '.')
85
+ # Do post-processing
86
+ postprocess_file(asset)
87
+ postprocess_doc(asset)
88
+ @content.add_document(asset)
89
+ @asset_path = File.expand_path(asset)
90
+ end
91
+ # Copy css
92
+ if @options[:css].nil? || @options[:css].empty?
93
+ # No custom css, copy one from assets
94
+ @parser.cache.assets[:stylesheets].each do |css|
95
+ log.debug "-- Copying stylesheet #{css}"
96
+ FileUtils.cp(File.join(@parser.cache.path, css), '.')
97
+ @content.add_stylesheet(css)
98
+ end
99
+ else
100
+ # Copy custom css
101
+ log.debug "-- Using custom stylesheet #{@options[:css]}"
102
+ FileUtils.cp(@options[:css], '.')
103
+ @content.add_stylesheet(File.basename(@options[:css]))
104
+ end
105
+ # Copy images
106
+ @parser.cache.assets[:images].each do |image|
107
+ log.debug "-- Copying image #{image}"
108
+ FileUtils.cp(File.join(@parser.cache.path, image), '.')
109
+ @content.add_image(image)
110
+ end
111
+ end
112
+
79
113
  def postprocess_file(asset)
80
114
  source = IO.read(asset)
81
115
  # Do rx substitutions
@@ -104,11 +138,11 @@ module Repub
104
138
  end
105
139
 
106
140
  def postprocess_doc(asset)
107
- doc = Nokogiri::HTML.parse(open(asset), nil, 'UTF-8')
141
+ doc = Nokogiri::HTML.parse(IO.read(asset), nil, 'UTF-8')
108
142
  # Substitute custom CSS
109
143
  if (@options[:css] && !@options[:css].empty?)
110
- doc.xpath('//link[@rel="stylesheet"]') do |link|
111
- link[:href] = File.basename(@options[:css])
144
+ doc.xpath('//link[@rel="stylesheet"]').each do |link|
145
+ link['href'] = File.basename(@options[:css])
112
146
  log.debug "-- Replacing CSS refs with #{link[:href]}"
113
147
  end
114
148
  end
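Note on the `.each` change in the hunk above: Ruby silently ignores a block passed to a method that never yields, which would explain why the old `doc.xpath(...) do |link|` form never rewrote any stylesheet links. A minimal plain-Ruby sketch of that behavior follows. `find_links` is a hypothetical stand-in for the Nokogiri call, assumed here to return a collection without yielding.

```ruby
# Hypothetical stand-in for a query method that returns a collection
# and never yields, so any block given to it is ignored.
def find_links
  [{ 'href' => 'a.css' }, { 'href' => 'b.css' }]
end

# Old shape: the block is attached to find_links itself and never runs.
untouched = find_links do |link|
  link['href'] = 'custom.css'
end

# New shape: the block is attached to each, so every link is rewritten.
rewritten = find_links.each do |link|
  link['href'] = 'custom.css'
end

puts untouched.map { |l| l['href'] }.inspect  # prints ["a.css", "b.css"]
puts rewritten.map { |l| l['href'] }.inspect  # prints ["custom.css", "custom.css"]
```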
@@ -116,8 +150,6 @@ module Repub
116
150
  if @options[:remove] && !@options[:remove].empty?
117
151
  @options[:remove].each do |selector|
118
152
  log.info "Removing elements matching selector \"#{selector}\""
119
- #p doc.search(selector).size
120
- #p doc.search(selector)
121
153
  doc.search(selector).remove
122
154
  end
123
155
  end
@@ -134,40 +166,6 @@ module Repub
134
166
  end
135
167
  end
136
168
 
137
- def copy_and_process_assets
138
- # Copy html
139
- @parser.cache.assets[:documents].each do |asset|
140
- log.debug "-- Processing document #{asset}"
141
- # Copy asset from cache
142
- FileUtils.cp(File.join(@parser.cache.path, asset), '.')
143
- # Do post-processing
144
- postprocess_file(asset)
145
- postprocess_doc(asset)
146
- @content.add_document(asset)
147
- @asset_path = File.expand_path(asset)
148
- end
149
- # Copy css
150
- if @options[:css].nil? || @options[:css].empty?
151
- # No custom css, copy one from assets
152
- @parser.cache.assets[:stylesheets].each do |css|
153
- log.debug "-- Copying stylesheet #{css}"
154
- FileUtils.cp(File.join(@parser.cache.path, css), '.')
155
- @content.add_stylesheet(css)
156
- end
157
- else
158
- # Copy custom css
159
- log.debug "-- Using custom stylesheet #{@options[:css]}"
160
- FileUtils.cp(@options[:css], '.')
161
- @content.add_stylesheet(File.basename(@options[:css]))
162
- end
163
- # Copy images
164
- @parser.cache.assets[:images].each do |image|
165
- log.debug "-- Copying image #{image}"
166
- FileUtils.cp(File.join(@parser.cache.path, image), '.')
167
- @content.add_image(image)
168
- end
169
- end
170
-
171
169
  def write_meta_inf
172
170
  FileUtils.mkdir_p(MetaInf)
173
171
  FileUtils.chdir(MetaInf) do
@@ -54,17 +54,41 @@ module Repub
54
54
  rescue
55
55
  raise FetcherException, "invalid URL: #{url}"
56
56
  end
57
- cmd = "#{@downloader_path} #{@downloader_options} #{url}"
58
57
  Cache.for_url(url) do |cache|
59
58
  log.debug "-- Downloading into #{cache.path}"
59
+ cmd = "#{@downloader_path} #{@downloader_options} #{url}"
60
60
  unless system(cmd) && !cache.empty?
61
61
  raise FetcherException, "Fetch failed."
62
62
  end
63
+ unless cache.cached?
64
+ fix_filenames(cache)
65
+ fix_encoding(cache, @options[:encoding])
66
+ end
63
67
  end
64
68
  end
65
69
 
66
70
  private
67
71
 
72
+ def fix_filenames(cache)
73
+ # TODO: fix non-alphanum characters in doc filenames
74
+ end
75
+
76
+ def fix_encoding(cache, encoding = nil)
77
+ cache.assets[:documents].each do |doc|
78
+ unless encoding
79
+ log.info "Detecting encoding for #{doc}"
80
+ s = IO.read(doc)
81
+ raise FetcherException, "empty document" unless s
82
+ encoding = UniversalDetector.chardet(s)['encoding']
83
+ end
84
+ if encoding.downcase != 'utf-8'
85
+ log.info "Source encoding is #{encoding}, converting to UTF-8"
86
+ s = Iconv.conv('utf-8', encoding, IO.read(doc))
87
+ File.open(doc, 'w') { |f| f.write(s) }
88
+ end
89
+ end
90
+ end
91
+
68
92
  def which(cmd)
69
93
  if !RUBY_PLATFORM.match('mswin')
70
94
  cmd = `/usr/bin/which #{cmd}`.strip
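Note on `fix_encoding` in the hunk above: it relies on `Iconv`, which was removed from Ruby's standard library in Ruby 2.0. On a current Ruby, the conversion step could be sketched with `String#encode` instead. The helper name `to_utf8` is hypothetical, and the source encoding is assumed to be known already (the diff obtains it from chardet or from the `:encoding` option).

```ruby
# Hypothetical helper: convert raw bytes in a known source encoding to UTF-8.
def to_utf8(bytes, source_encoding)
  # Already UTF-8: just tag the bytes with the right encoding.
  return bytes.dup.force_encoding('UTF-8') if source_encoding.casecmp('utf-8').zero?
  # Otherwise tag the bytes with their real encoding, then transcode.
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end

latin1 = "caf\xE9".b                   # "café" as ISO-8859-1 bytes
utf8 = to_utf8(latin1, 'ISO-8859-1')
puts utf8                              # prints "café"
puts utf8.encoding                     # prints "UTF-8"
```

`String#encode` raises on bytes that are invalid in the declared source encoding, so a wrong detection result surfaces as an error rather than as silently corrupted text.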
@@ -81,10 +105,6 @@ module Repub
81
105
  return File.join(App.data_path, 'cache')
82
106
  end
83
107
 
84
- def self.inventorize
85
- # TODO
86
- end
87
-
88
108
  def self.cleanup
89
109
  Dir.chdir(self.root) { FileUtils.rm_r(Dir.glob('*')) }
90
110
  rescue
@@ -94,16 +114,15 @@ module Repub
94
114
  attr_reader :url
95
115
  attr_reader :name
96
116
  attr_reader :path
97
- attr_reader :assets
98
-
117
+
99
118
  def self.for_url(url, &block)
100
119
  self.new(url).for_url(&block)
101
120
  end
102
121
 
103
122
  def for_url(&block)
104
123
  # Download stuff if not yet cached
105
- cached = File.exist?(@path)
106
- unless cached
124
+ @cached = File.exist?(@path)
125
+ unless @cached
107
126
  FileUtils.mkdir_p(@path)
108
127
  begin
109
128
  Dir.chdir(@path) { yield self }
@@ -115,40 +134,33 @@ module Repub
115
134
  log.info "Using cached assets"
116
135
  log.debug "-- Cache is #{@path}"
117
136
  end
118
- # Do post-download tasks
119
- Dir.chdir(@path) do
137
+ self
138
+ end
139
+
140
+ def assets
141
+ unless @assets
120
142
  # Enumerate assets
121
- @assets = {}
122
- AssetTypes.each_pair do |asset_type, file_types|
123
- @assets[asset_type] ||= []
124
- file_types.each do |file_type|
125
- @assets[asset_type] << Dir.glob("*.#{file_type}")
126
- end
127
- @assets[asset_type].flatten!
128
- end
129
- # For freshly downloaded docs, detect encoding and convert to utf-8
130
- unless cached
131
- @assets[:documents].each do |doc|
132
- log.info "Detecting encoding for #{doc}"
133
- s = IO.read(doc)
134
- raise FetcherException, "empty document" unless s
135
- encoding = UniversalDetector.chardet(s)['encoding']
136
- if encoding.downcase != 'utf-8'
137
- log.info "Looks like #{encoding}, converting to UTF-8"
138
- s = Iconv.conv('utf-8', encoding, IO.read(doc))
139
- File.open(doc, 'w') { |f| f.write(s) }
140
- else
141
- log.info "Looks like UTF-8, no conversion needed"
143
+ Dir.chdir(@path) do
144
+ @assets = {}
145
+ AssetTypes.each_pair do |asset_type, file_types|
146
+ @assets[asset_type] ||= []
147
+ file_types.each do |file_type|
148
+ @assets[asset_type] << Dir.glob("*.#{file_type}")
142
149
  end
150
+ @assets[asset_type].flatten!
143
151
  end
144
152
  end
145
153
  end
146
- self
154
+ @assets
147
155
  end
148
-
156
+
149
157
  def empty?
150
158
  Dir.glob(File.join(@path, '*')).empty?
151
159
  end
160
+
161
+ def cached?
162
+ @cached == true
163
+ end
152
164
 
153
165
  private
154
166
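Note on the new `assets` reader in the hunk above: it enumerates files on first use and caches the result, instead of doing so eagerly inside `for_url`. A self-contained sketch of the same pattern follows. The `ASSET_TYPES` mapping is an assumption (the diff references `AssetTypes` but does not show its contents), and `AssetIndex` is a hypothetical class name.

```ruby
require 'tmpdir'
require 'fileutils'

# Assumed mapping of asset type to file extensions; the real AssetTypes
# constant is not shown in the diff.
ASSET_TYPES = {
  documents: %w[html htm],
  stylesheets: %w[css],
  images: %w[jpg png gif]
}.freeze

# Hypothetical class illustrating lazy, memoized asset enumeration.
class AssetIndex
  def initialize(path)
    @path = path
    @assets = nil
  end

  # Globs the directory on the first call only; later calls return the
  # cached hash without touching the file system.
  def assets
    @assets ||= Dir.chdir(@path) do
      ASSET_TYPES.each_with_object({}) do |(type, extensions), found|
        found[type] = extensions.flat_map { |ext| Dir.glob("*.#{ext}") }.sort
      end
    end
  end
end

Dir.mktmpdir do |dir|
  FileUtils.touch(%w[index.html style.css cover.png].map { |f| File.join(dir, f) })
  index = AssetIndex.new(dir)
  p index.assets
end
```

One consequence of memoizing is that files added to the directory after the first call are not picked up, which is why the order of download, fix-up, and first `assets` access matters.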
 
@@ -156,6 +168,7 @@ module Repub
156
168
  @url = url
157
169
  @name = Digest::SHA1.hexdigest(@url)
158
170
  @path = File.join(Cache.root, @name)
171
+ @assets = nil
159
172
  end
160
173
  end
161
174
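Note on cache naming in the hunk above: each cache directory is named after the SHA-1 digest of the source URL, so the same URL maps to the same directory across runs. A small sketch follows. `CACHE_ROOT` is an assumed location (the diff derives the real root from `App.data_path`), and `cache_path_for` is a hypothetical helper.

```ruby
require 'digest/sha1'

# Assumed cache root for illustration only.
CACHE_ROOT = File.join('tmp', 'repub-cache')

# Hypothetical helper mirroring how the cache name and path are built.
def cache_path_for(url, root = CACHE_ROOT)
  File.join(root, Digest::SHA1.hexdigest(url))
end

url = 'http://www.gnu.org/software/wget/manual/wget.html'
puts cache_path_for(url)
puts cache_path_for(url) == cache_path_for(url)  # prints "true"
```

Because the name depends only on the URL string, two spellings of the same address (for example with and without a trailing slash) end up in separate cache directories.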
 
@@ -43,7 +43,7 @@ module Repub
43
43
  @cache = cache
44
44
  @asset = @cache.assets[:documents][0]
45
45
  log.debug "-- Parsing #{@asset}"
46
- @doc = Nokogiri::HTML.parse(open(File.join(@cache.path, @asset)), nil, 'UTF-8')
46
+ @doc = Nokogiri::HTML.parse(IO.read(File.join(@cache.path, @asset)), nil, 'UTF-8')
47
47
 
48
48
  @uid = @cache.name
49
49
  parse_title
data/repub.gemspec CHANGED
@@ -2,23 +2,27 @@
2
2
 
3
3
  Gem::Specification.new do |s|
4
4
  s.name = %q{repub}
5
- s.version = "0.3.1"
5
+ s.version = "0.3.2"
6
6
 
7
7
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
8
8
  s.authors = ["Dmitri Goutnik"]
9
- s.date = %q{2009-06-28}
9
+ s.date = %q{2009-06-30}
10
10
  s.default_executable = %q{repub}
11
- s.description = %q{Simple HTML to ePub converter.}
11
+ s.description = %q{Repub is a simple HTML to ePub converter.
12
+
13
+ It lacks imagination and won't try to guess the source document structure; you will have to describe where to look
14
+ for title and table of contents. In return, it provides you with greater control over generated
15
+ ePub documents.}
12
16
  s.email = %q{dg@invisiblellama.net}
13
17
  s.executables = ["repub"]
14
- s.extra_rdoc_files = ["History.txt", "README.txt", "SAMPLES.txt", "bin/repub"]
15
- s.files = ["History.txt", "README.txt", "Rakefile", "SAMPLES.txt", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
16
- s.homepage = %q{http://github.com/invisiblellama/repub/tree/master}
17
- s.rdoc_options = ["--main", "README.txt"]
18
+ s.extra_rdoc_files = ["History.txt", "README.rdoc", "bin/repub"]
19
+ s.files = ["History.txt", "README.rdoc", "Rakefile", "TODO", "bin/repub", "lib/repub.rb", "lib/repub/app.rb", "lib/repub/app/builder.rb", "lib/repub/app/fetcher.rb", "lib/repub/app/logger.rb", "lib/repub/app/options.rb", "lib/repub/app/parser.rb", "lib/repub/app/profile.rb", "lib/repub/app/utility.rb", "lib/repub/epub.rb", "lib/repub/epub/container.rb", "lib/repub/epub/content.rb", "lib/repub/epub/toc.rb", "repub.gemspec", "test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
20
+ s.homepage = %q{http://rubyforge.org/projects/repub/}
21
+ s.rdoc_options = ["--main", "README.rdoc"]
18
22
  s.require_paths = ["lib"]
19
23
  s.rubyforge_project = %q{repub}
20
24
  s.rubygems_version = %q{1.3.4}
21
- s.summary = %q{Simple HTML to ePub converter}
25
+ s.summary = %q{Repub is a simple HTML to ePub converter}
22
26
  s.test_files = ["test/epub/test_container.rb", "test/epub/test_content.rb", "test/epub/test_toc.rb", "test/test_builder.rb", "test/test_fetcher.rb", "test/test_logger.rb", "test/test_parser.rb"]
23
27
 
24
28
  if s.respond_to? :specification_version then
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: invisiblellama-repub
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.1
4
+ version: 0.3.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Dmitri Goutnik
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-06-28 00:00:00 -07:00
12
+ date: 2009-06-30 00:00:00 -07:00
13
13
  default_executable: repub
14
14
  dependencies:
15
15
  - !ruby/object:Gem::Dependency
@@ -62,7 +62,7 @@ dependencies:
62
62
  - !ruby/object:Gem::Version
63
63
  version: 2.5.1
64
64
  version:
65
- description: Simple HTML to ePub converter.
65
+ description: Repub is a simple HTML to ePub converter. It lacks imagination and won't try to guess the source document structure; you will have to describe where to look for title and table of contents. In return, it provides you with greater control over generated ePub documents.
66
66
  email: dg@invisiblellama.net
67
67
  executables:
68
68
  - repub
@@ -70,14 +70,12 @@ extensions: []
70
70
 
71
71
  extra_rdoc_files:
72
72
  - History.txt
73
- - README.txt
74
- - SAMPLES.txt
73
+ - README.rdoc
75
74
  - bin/repub
76
75
  files:
77
76
  - History.txt
78
- - README.txt
77
+ - README.rdoc
79
78
  - Rakefile
80
- - SAMPLES.txt
81
79
  - TODO
82
80
  - bin/repub
83
81
  - lib/repub.rb
@@ -102,11 +100,11 @@ files:
102
100
  - test/test_logger.rb
103
101
  - test/test_parser.rb
104
102
  has_rdoc: false
105
- homepage: http://github.com/invisiblellama/repub/tree/master
103
+ homepage: http://rubyforge.org/projects/repub/
106
104
  post_install_message:
107
105
  rdoc_options:
108
106
  - --main
109
- - README.txt
107
+ - README.rdoc
110
108
  require_paths:
111
109
  - lib
112
110
  required_ruby_version: !ruby/object:Gem::Requirement
@@ -127,7 +125,7 @@ rubyforge_project: repub
127
125
  rubygems_version: 1.2.0
128
126
  signing_key:
129
127
  specification_version: 3
130
- summary: Simple HTML to ePub converter
128
+ summary: Repub is a simple HTML to ePub converter
131
129
  test_files:
132
130
  - test/epub/test_container.rb
133
131
  - test/epub/test_content.rb
data/README.txt DELETED
@@ -1,106 +0,0 @@
1
- == DESCRIPTION:
2
-
3
- Simple HTML to ePub converter.
4
-
5
- == FEATURES/PROBLEMS:
6
-
7
- Few samples to get started:
8
-
9
- * Git User's Manual
10
-
11
- repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' \
12
- http://www.kernel.org/pub/software/scm/git/docs/user-manual.html
13
-
14
- * Project Gutenberg's THE ADVENTURES OF SHERLOCK HOLMES
15
-
16
- repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' \
17
- -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' \
18
- http://www.gutenberg.org/dirs/etext99/advsh12h.htm
19
-
20
- * Project Gutenberg's ALICE'S ADVENTURES IN WONDERLAND
21
-
22
- repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' \
23
- -X '//pre' -X '//hr' -X '//body/h4' \
24
- http://www.gutenberg.org/files/11/11-h/11-h.htm
25
-
26
- * The Gelug-Kagyu Tradition of Mahamudra from Berzin Archives
27
-
28
- repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
29
-
30
- == SYNOPSIS:
31
-
32
- Usage: repub [options] url
33
-
34
- General options:
35
- -D, --downloader NAME Which downloader to use to get files (wget or httrack).
36
- Default is wget.
37
- -o, --output PATH Output path for generated ePub file.
38
- Default is /Users/dg/Projects/repub/<Parsed_Title>.epub
39
- -w, --write-profile NAME Save given options for later reuse as profile NAME.
40
- -l, --load-profile NAME Load options from saved profile NAME.
41
- -W, --write-default Save given options for later reuse as default profile.
42
- -L, --list-profiles List saved profiles.
43
- -C, --cleanup Clean up download cache.
44
- -v, --verbose Turn on verbose output.
45
- -q, --quiet Turn off any output except errors.
46
- -V, --version Show version.
47
- -h, --help Show this help message.
48
-
49
- Parser options:
50
- -x, --selector NAME:VALUE Set parser XPath selector NAME to VALUE.
51
- Recognized selectors are: [title toc toc_item toc_section]
52
- -m, --meta NAME:VALUE Set publication information metadata NAME to VALUE.
53
- Valid metadata names are: [creator date description
54
- language publisher relation rights subject title]
55
- -F, --no-fixup Do not attempt to make document meet XHTML 1.0 Strict.
56
- Default is to try and fix things that are broken.
57
- -e, --encoding NAME Set source document encoding. Default is to autodetect.
58
-
59
- Post-processing options:
60
- -s, --stylesheet PATH Use custom stylesheet at PATH to add or override existing
61
- CSS references in the source document.
62
- -X, --remove SELECTOR Remove source element using XPath selector.
63
- Use -X- to ignore stored profile.
64
- -R, --rx /PATTERN/REPLACEMENT/ Edit source HTML using regular expressions.
65
- Use -R- to ignore stored profile.
66
- -B, --browse After processing, open resulting HTML in default browser.
67
-
68
- == DEPENDENCIES:
69
-
70
- * Builder (https://rubyforge.org/projects/builder/)
71
- * Nokogiri (http://nokogiri.rubyforge.org/nokogiri/)
72
- * rchardet (https://rubyforge.org/projects/rchardet/)
73
- * launchy (http://copiousfreetime.rubyforge.org/launchy/)
74
-
75
- * wget or httrack
76
- * zip (Info-ZIP)
77
-
78
- == INSTALL:
79
-
80
- gem install repub
81
-
82
- == LICENSE:
83
-
84
- (The MIT License)
85
-
86
- Copyright (c) 2009 Invisible Llama <dg@invisiblellama.net>
87
-
88
- Permission is hereby granted, free of charge, to any person obtaining a copy
89
- of this software and associated documentation files (the "Software"), to deal
90
- in the Software without restriction, including without limitation the rights
91
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
92
- copies of the Software, and to permit persons to whom the Software is
93
- furnished to do so, subject to the following conditions:
94
-
95
- The above copyright notice and this permission notice shall be included in
96
- all copies or substantial portions of the Software.
97
-
98
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
99
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
100
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
101
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
102
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
103
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
104
- THE SOFTWARE.
105
-
106
- ==
data/SAMPLES.txt DELETED
@@ -1,23 +0,0 @@
1
- * THE ADVENTURES OF SHERLOCK HOLMES
2
-
3
- repub -x 'title:div[@class='book']//h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h1' -X '//body/h2' http://www.gutenberg.org/dirs/etext99/advsh12h.htm
4
-
5
- * ALICE'S ADVENTURES IN WONDERLAND
6
-
7
- repub -x 'title:body/h1' -x 'toc://table' -x 'toc_item://tr' -X '//pre' -X '//hr' -X '//body/h4' http://www.gutenberg.org/files/11/11-h/11-h.htm
8
-
9
- * The Gelug-Kagyu Tradition of Mahamudra
10
-
11
- repub http://www.berzinarchives.com/web/x/prn/p.html_680632258.html
12
-
13
- * Брюс Стерлинг. Схизматрица
14
-
15
- repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/STERLINGB/shizmatrica.txt_with-big-pictures.html
16
-
17
- * Айзек Азимов. Космические течения
18
-
19
- repub -x 'title://h2' -x 'toc://table' -x 'toc_item://a' -X 'div' -X 'table' -X '//hr' http://lib.ru/FOUNDATION/currspac.txt_with-big-pictures.html
20
-
21
- * Git User's Manual
22
-
23
- repub -x 'title://h1' -x 'toc://div[@class="toc"]/dl' -x 'toc_item:dt' -x 'toc_section:following-sibling::*[1]/dl' http://www.kernel.org/pub/software/scm/git/docs/user-manual.html