html2odt 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/Gemfile.lock +1 -1
- data/README.md +63 -4
- data/bin/html2odt.rb +100 -0
- data/html2odt.gemspec +4 -3
- data/lib/html2odt/document.rb +45 -10
- data/lib/html2odt/version.rb +1 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d2e147b6eecf0af3652facf38559813053546c6a
+  data.tar.gz: 303a83ca4cd9e86ecfe5091d911847358a79f77c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9d3086df185e177ea423c7a886503463918933a032741d25c96a0fe847ac089043e32275d0b533f3295124021c696d9f0af9fd83d0b6ab4f10065e417b9c24b8
+  data.tar.gz: 5114655368d6cbcd5df9b0f404f9bdc5b7d8203e3308d33eb9c86f60641cbaca2c153ccc034f890d3394559aa55b698125ce7c42ded667d9e63bc45268a36c6b
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
+# v0.3.0 - 2016-05-25
+
+Adding support for `base_uri` configuration to expand links and download images
+without fully qualified URI.
+
+Adding html2odt.rb binary.
+
 # v0.2.1 - 2016-05-25
 
 Adding workarounds for HTML structures not supported by xhtml2odt.
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -3,6 +3,20 @@
 This gem provides a Ruby wrapper around the set of XLST stylesheets published as
 [xhtml2odt](https://github.com/abompard/xhtml2odt).
 
+
+## html2odt vs. xhtml2odt
+
+So, why is this project called `html2odt` while the original library and command
+line tools by Aurélien Bompard are called **`x`**`html2odt`?
+
+This project uses [nokogiri](http://www.nokogiri.org) to parse the HTML and
+apply the XSLT transformations. Nokogiri implements a forgiving HTML parser and
+tries be as forgiving as possible. Furthermore, the basic API expects HTML
+fragments, not full documents. We are not expecting the users of this library to
+pass in a complete, valid XHTML document. A reasonably good piece of HTML should
+be good enough. Therefore we skipped the `X` in the name as well.
+
+
 ## Installation
 
 Add this line to your application's Gemfile:
@@ -19,9 +33,27 @@ Or install it yourself as:
 
     $ gem install html2odt
 
+
 ## Usage
 
-###
+### Command line Usage
+
+
+```
+Usage: html2odt.rb [options] -i input.html -o output.odt
+    -i, --input input.html
+    -o, --output output.odt
+    -t, --template <template.odt>    The file that should be filled with the input's content.
+                                     Defaults to basic template file which is part of this gem.
+    -r, --replace <KEYWORD>          A keyword in the template document to replace with the converted text.
+                                     Defaults to `{{content}}`.
+    -u, --url <URL>                  The remote URL you downloaded the page from.
+                                     This is required to include remote images and to resolve links properly.
+    -h, --help                       Show this message
+```
+
+
+### Ruby API usage
 
 ```ruby
 # Create an Html2Odt::Document instance
@@ -77,20 +109,47 @@ doc = Html2Odt::Document.new(html: <<HTML)
 HTML
 ```
 
+
+
+
+Furthermore, you may specify a `base_uri`, which will most likely be the place,
+the original HTML fragment belongs to. The `base_uri` will be used to convert
+links to fully qualified URLs, so that they still work when placed in the ODT
+document. Furthermore the setting will be used to identify the sources of
+image's found within the HTML fragments (see below for some detail).
+
+```ruby
+# Provide base_uri
+doc = Html2Odt::Document.new
+doc.base_uri = "https://www.example.com"
+```
+
+You may also pass a `URI` instance directly.
+
+```ruby
+# Provide base_uri
+doc = Html2Odt::Document.new
+doc.base_uri = URI::parse("https://www.example.com")
+```
+
+It is expected, that the URI refers to a `http(s)` location.
+
+
 ### Image handling
 
 `html2odt` provides basic image inlining, i.e. images referenced in the HTML
 code will be embeded into the ODT file by default. This is true for images
 referenced with a full `file://`, `http://`, or `https://` URL. Absolute URLs
-(i.e. starting `/`) and relative URLs are
-idea, which server or document they
+(i.e. starting `/`) and relative URLs are only supported if the `base_uri`
+option is set. Otherwise `html2odt` has no idea, which server or document they
+are relating to.
 
 Images referencing an unsupported resource will be replaced with a link
 containing the alt text of the image.
 
 If you are using `html2odt` in a web application context, you will probably want
 to provide some special handling for resources residing on your own server. This
-should be done for security reasons
+should be done for security reasons and to save roundtrips.
 
 `html2odt` provides the following API to map image `src` attributes to local
 file locations.
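Review note: the README change above says `base_uri` is what lets relative and absolute-path references resolve. The sketch below shows how Ruby's standard `URI#+` merges such references, which is the call the `document.rb` changes in this diff rely on. The `resolve` helper and the example URLs are assumptions for illustration, not part of the gem.

```ruby
require "uri"

# Hypothetical helper for illustration only; html2odt calls `base_uri + src`
# and `base_uri + a["href"]` directly rather than going through a helper.
def resolve(base_uri, reference)
  (URI.parse(base_uri) + reference).to_s
end

base = "https://www.example.com/blog/post.html"

# Relative reference: resolved against the directory of the base document.
puts resolve(base, "images/cat.png")                # https://www.example.com/blog/images/cat.png
# Absolute path: resolved against the host of the base document.
puts resolve(base, "/logo.png")                     # https://www.example.com/logo.png
# Fully qualified URL: returned unchanged.
puts resolve(base, "https://cdn.example.org/a.png") # https://cdn.example.org/a.png
```

Because fully qualified references pass through unchanged, setting `base_uri` should not alter links that already carry a scheme and host.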
data/bin/html2odt.rb
ADDED
@@ -0,0 +1,100 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'html2odt'
+
+options = {}
+parser = OptionParser.new do |opts|
+  opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
+
+  opts.on("-i", "--input input.html") do |input|
+    options[:input] = input
+  end
+
+  opts.on("-o", "--output output.odt") do |output|
+    options[:output] = output
+  end
+
+  opts.on("-t", "--template <template.odt>",
+          "The file that should be filled with the input's content.", "Defaults to basic template file which is part of this gem.") do |template|
+    options[:template] = template
+  end
+
+  opts.on("-r", "--replace <KEYWORD>",
+          "A keyword in the template document to replace with the converted text.", "Defaults to `{{content}}`.") do |replace|
+    options[:replace] = replace
+  end
+
+  opts.on("-u", "--url <URL>",
+          "The remote URL you downloaded the page from.", "This is required to include remote images and to resolve links properly.") do |url|
+    options[:url] = url
+  end
+
+  opts.on("-h", "--help",
+          "Show this message") do
+    puts opts
+    exit
+  end
+end
+
+parser.parse!
+
+if options.empty?
+  puts parser
+  exit
+end
+
+if options[:replace]
+  warn "-r option is not yet implemented, please use the default `{{content}}` place holder for now."
+  exit 1
+end
+
+
+if options[:input].nil?
+  warn "Missing -i option"
+  puts parser
+  exit 1
+end
+
+
+if options[:output].nil?
+  warn "Missing -o option"
+  puts parser
+  exit 1
+end
+
+
+
+doc = if options[:template].nil?
+  Html2Odt::Document.new
+else
+  begin
+    Html2Odt::Document.new(template: options[:template])
+  rescue ArgumentError
+    warn "Template does not match expectations - #{$!.message}"
+    exit 2
+  end
+end
+
+
+if File.readable? options[:input]
+  doc.html = File.read(options[:input])
+else
+  warn "Input does not match expectations - Cannot read input file #{options[:input].inspect}"
+  exit 3
+end
+
+
+if options[:url]
+  begin
+    doc.base_uri = options[:url]
+  rescue ArgumentError
+    warn "URL does not match expectations - #{$!.message}"
+    exit 4
+  end
+end
+
+
+doc.write_to options[:output]
+
+puts "Wrote document to: #{options[:output]}"
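Review note: the script above calls `parser.parse!` without a `rescue`, so an unknown switch or a switch missing its argument would presumably surface as an uncaught `OptionParser::ParseError` subclass. The sketch below shows one way the same switches could be parsed with that case handled. `parse_cli` is a hypothetical helper, not part of the gem, and it omits the `-r` and `-h` switches for brevity.

```ruby
require "optparse"

# Hypothetical helper (an assumption, not the gem's code): parses an
# argv-style array with the switches from bin/html2odt.rb and returns
# [options, nil] on success or [nil, error_message] on a parse error.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
    opts.on("-i", "--input input.html")        { |v| options[:input] = v }
    opts.on("-o", "--output output.odt")       { |v| options[:output] = v }
    opts.on("-t", "--template <template.odt>") { |v| options[:template] = v }
    opts.on("-u", "--url <URL>")               { |v| options[:url] = v }
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  [options, nil]
rescue OptionParser::ParseError => e
  # Covers OptionParser::InvalidOption and OptionParser::MissingArgument.
  [nil, e.message]
end

options, error = parse_cli(%w[-i page.html -o page.odt -u https://www.example.com])
warn error if error
p options
```

Returning the error message instead of exiting keeps the helper easy to test; a real script would likely print the message plus the usage banner and exit with a non-zero status.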
data/html2odt.gemspec
CHANGED
@@ -1,7 +1,7 @@
 # coding: utf-8
-lib = File.expand_path(
+lib = File.expand_path("../lib", __FILE__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
-require
+require "html2odt/version"
 
 Gem::Specification.new do |spec|
   spec.name = "html2odt"
@@ -13,10 +13,11 @@ Gem::Specification.new do |spec|
   spec.description = %q{html2odt generates ODT documents based on HTML fragments using xhtml2odt}
   spec.homepage = "https://github.com/planio-gmbh/html2odt"
 
-  spec.license =
+  spec.license = "MIT"
 
   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
   spec.require_paths = ["lib"]
+  spec.executables << "html2odt.rb"
 
   spec.add_dependency "dimensions", "~> 1.3.0"
   spec.add_dependency "nokogiri", "~> 1.6.7.2"
data/lib/html2odt/document.rb
CHANGED
@@ -30,6 +30,25 @@ class Html2Odt::Document
     @html
   end
 
+  def base_uri=(uri)
+    if uri.is_a? URI
+      @base_uri = uri
+    else
+      @base_uri = URI::parse(uri)
+    end
+
+    unless @base_uri.is_a? URI::HTTP
+      raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
+    end
+
+  rescue URI::InvalidURIError
+    raise ArgumentError, "Invalid URI - #{$!.message}"
+  end
+
+  def base_uri
+    @base_uri
+  end
+
   def content_xml
     @content_xml ||= begin
 
@@ -177,7 +196,7 @@ class Html2Odt::Document
     end
 
   rescue Zip::Error
-    raise ArgumentError, "Template file does not look like
+    raise ArgumentError, "Template file does not look like an ODT file - #{$!.message}"
   rescue Errno::ENOENT
     raise ArgumentError, "Template file does not contain expected file - #{$!.message}"
   end
@@ -187,6 +206,7 @@ class Html2Odt::Document
     html = self.html
     html = fix_images_in_html(html)
     html = fix_document_structure(html)
+    html = fix_links(html) if base_uri
     html = create_document(html)
     html
   end
@@ -243,6 +263,16 @@ class Html2Odt::Document
     doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
   end
 
+  def fix_links(html)
+    doc = Nokogiri::HTML::DocumentFragment.parse(html)
+
+    doc.css("a").each do |a|
+      a["href"] = (base_uri + a["href"]).to_s
+    end
+
+    doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
+  end
+
   def create_document(html)
     %Q{<html xmlns="http://www.w3.org/1999/xhtml">#{html}</html>}
   end
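Review note: `fix_links` in the hunk above passes `a["href"]` straight to `base_uri + ...` for every `<a>` element. As far as I can tell, an anchor without an `href` (for example `<a name="top">`) would hand `nil` to `URI#+`, which raises `ArgumentError`, and an `href` that is not a parsable URI would raise `URI::InvalidURIError`. The sketch below shows one possible guard using only the standard library. `absolutize` is a hypothetical helper and not what the gem does.

```ruby
require "uri"

# Hypothetical guard (an assumption, not part of html2odt): resolves `href`
# against `base`, but leaves missing, blank, or unparsable values untouched
# instead of raising.
def absolutize(base, href)
  return href if href.nil? || href.strip.empty?
  (base + href).to_s
rescue URI::InvalidURIError
  href
end

base = URI.parse("https://www.example.com/blog/post.html")

p absolutize(base, "about.html") # resolved against the base document
p absolutize(base, nil)          # anchor without href: left as nil
p absolutize(base, "bad href")   # unparsable value: returned unchanged
```

Inside `fix_links`, a guard of this kind would wrap the assignment in the `doc.css("a").each` loop; whether skipping such anchors is the desired behavior is a design decision for the maintainers.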
@@ -252,22 +282,27 @@ class Html2Odt::Document
       return image_location_mapping.call(src)
     end
 
-
-    when /\Afile:\/\//
+    if src =~ /\Afile:\/\//
       # local file URL
       #
       # TODO: Verify, that this does not pose a security threat, maybe make
       # this optional. In any case, it's useful for testing.
 
-      src[7..-1]
+      return src[7..-1]
+    end
 
-
+    if src =~ /\Ahttps?:\/\// or !base_uri.nil?
       # remote image URL
       #
       # TODO: Verify, that this does not pose a security threat, maybe make
       # this optional.
 
-
+      if base_uri
+        uri = base_uri + src
+      else
+        uri = URI.parse(src)
+      end
+
       file = Tempfile.new("html2odt")
       file.binmode
 
@@ -279,11 +314,11 @@ class Html2Odt::Document
         file
       end
 
-      file.path
-    else
-      # cannot handle image properly, return nil
-      nil
+      return file.path
     end
+
+    # cannot handle image properly, return nil
+    nil
   end
 
   def update_img_tag(img, image)
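Review note: the `base_uri=` setter added in this file accepts either a string or a `URI` and rejects anything that is not an http(s) URI. Since `URI::HTTPS` is a subclass of `URI::HTTP`, the single `is_a? URI::HTTP` check covers both schemes. The standalone sketch below mirrors that validation so it can be tried without the gem. `normalize_base_uri` is a hypothetical name, and the error messages are copied from the diff.

```ruby
require "uri"

# Hypothetical standalone version of the validation in `base_uri=`
# (an illustration, not the gem's API).
def normalize_base_uri(uri)
  parsed = uri.is_a?(URI) ? uri : URI.parse(uri)

  # URI::HTTPS inherits from URI::HTTP, so this accepts http and https only.
  unless parsed.is_a?(URI::HTTP)
    raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
  end

  parsed
rescue URI::InvalidURIError => e
  raise ArgumentError, "Invalid URI - #{e.message}"
end

p normalize_base_uri("https://www.example.com").class # URI::HTTPS

["ftp://example.com", "/only/a/path", "http://exa mple.com"].each do |candidate|
  begin
    normalize_base_uri(candidate)
  rescue ArgumentError => e
    puts "#{candidate.inspect} rejected: #{e.message}"
  end
end
```

Translating `URI::InvalidURIError` into `ArgumentError` means callers such as `bin/html2odt.rb` only need to rescue one exception class.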
data/lib/html2odt/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: html2odt
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Gregor Schmidt (Planio)
@@ -112,7 +112,8 @@ description: html2odt generates ODT documents based on HTML fragments using xhtm
 email:
 - gregor@plan.io
 - support@plan.io
-executables:
+executables:
+- html2odt.rb
 extensions: []
 extra_rdoc_files: []
 files:
@@ -125,6 +126,7 @@ files:
 - README.md
 - Rakefile
 - bin/console
+- bin/html2odt.rb
 - html2odt.gemspec
 - lib/html2odt.rb
 - lib/html2odt/document.rb