html2odt 0.2.1 → 0.3.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +7 -0
- data/Gemfile.lock +1 -1
- data/README.md +63 -4
- data/bin/html2odt.rb +100 -0
- data/html2odt.gemspec +4 -3
- data/lib/html2odt/document.rb +45 -10
- data/lib/html2odt/version.rb +1 -1
- metadata +4 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: d2e147b6eecf0af3652facf38559813053546c6a
+  data.tar.gz: 303a83ca4cd9e86ecfe5091d911847358a79f77c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9d3086df185e177ea423c7a886503463918933a032741d25c96a0fe847ac089043e32275d0b533f3295124021c696d9f0af9fd83d0b6ab4f10065e417b9c24b8
+  data.tar.gz: 5114655368d6cbcd5df9b0f404f9bdc5b7d8203e3308d33eb9c86f60641cbaca2c153ccc034f890d3394559aa55b698125ce7c42ded667d9e63bc45268a36c6b
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,10 @@
+# v0.3.0 - 2016-05-25
+
+Adding support for `base_uri` configuration to expand links and download images
+without fully qualified URI.
+
+Adding html2odt.rb binary.
+
 # v0.2.1 - 2016-05-25

 Adding workarounds for HTML structures not supported by xhtml2odt.
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -3,6 +3,20 @@
 This gem provides a Ruby wrapper around the set of XLST stylesheets published as
 [xhtml2odt](https://github.com/abompard/xhtml2odt).

+
+## html2odt vs. xhtml2odt
+
+So, why is this project called `html2odt` while the original library and command
+line tools by Aurélien Bompard are called **`x`**`html2odt`?
+
+This project uses [nokogiri](http://www.nokogiri.org) to parse the HTML and
+apply the XSLT transformations. Nokogiri implements a forgiving HTML parser and
+tries be as forgiving as possible. Furthermore, the basic API expects HTML
+fragments, not full documents. We are not expecting the users of this library to
+pass in a complete, valid XHTML document. A reasonably good piece of HTML should
+be good enough. Therefore we skipped the `X` in the name as well.
+
+
 ## Installation

 Add this line to your application's Gemfile:
@@ -19,9 +33,27 @@ Or install it yourself as:

     $ gem install html2odt

+
 ## Usage

-###
+### Command line Usage
+
+
+```
+Usage: html2odt.rb [options] -i input.html -o output.odt
+    -i, --input input.html
+    -o, --output output.odt
+    -t, --template <template.odt>    The file that should be filled with the input's content.
+                                     Defaults to basic template file which is part of this gem.
+    -r, --replace <KEYWORD>          A keyword in the template document to replace with the converted text.
+                                     Defaults to `{{content}}`.
+    -u, --url <URL>                  The remote URL you downloaded the page from.
+                                     This is required to include remote images and to resolve links properly.
+    -h, --help                       Show this message
+```
+
+
+### Ruby API usage

 ```ruby
 # Create an Html2Odt::Document instance
@@ -77,20 +109,47 @@ doc = Html2Odt::Document.new(html: <<HTML)
 HTML
 ```

+
+
+
+Furthermore, you may specify a `base_uri`, which will most likely be the place,
+the original HTML fragment belongs to. The `base_uri` will be used to convert
+links to fully qualified URLs, so that they still work when placed in the ODT
+document. Furthermore the setting will be used to identify the sources of
+image's found within the HTML fragments (see below for some detail).
+
+```ruby
+# Provide base_uri
+doc = Html2Odt::Document.new
+doc.base_uri = "https://www.example.com"
+```
+
+You may also pass a `URI` instance directly.
+
+```ruby
+# Provide base_uri
+doc = Html2Odt::Document.new
+doc.base_uri = URI::parse("https://www.example.com")
+```
+
+It is expected, that the URI refers to a `http(s)` location.
+
+
 ### Image handling

 `html2odt` provides basic image inlining, i.e. images referenced in the HTML
 code will be embeded into the ODT file by default. This is true for images
 referenced with a full `file://`, `http://`, or `https://` URL. Absolute URLs
-(i.e. starting `/`) and relative URLs are
-idea, which server or document they
+(i.e. starting `/`) and relative URLs are only supported if the `base_uri`
+option is set. Otherwise `html2odt` has no idea, which server or document they
+are relating to.

 Images referencing an unsupported resource will be replaced with a link
 containing the alt text of the image.

 If you are using `html2odt` in a web application context, you will probably want
 to provide some special handling for resources residing on your own server. This
-should be done for security reasons
+should be done for security reasons and to save roundtrips.

 `html2odt` provides the following API to map image `src` attributes to local
 file locations.
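The README changes above say that absolute (`/`-prefixed) and relative image and link URLs are only resolved when `base_uri` is set. In the diff, that resolution is done with `base_uri + src` (and `base_uri + a["href"]` for links), i.e. Ruby's `URI#+`, an alias of `URI#merge`, which performs RFC 3986 reference resolution. A minimal stdlib-only sketch of what that produces; the page URL and paths are made-up examples, not values from the gem:

```ruby
require "uri"

# Hypothetical page the HTML fragment was copied from (placeholder URL).
BASE = URI.parse("https://www.example.com/docs/guide/index.html")

# Resolve a reference the way the diff does (`base_uri + src`).
# URI#+ is an alias of URI#merge (RFC 3986 reference resolution).
def resolve(base, ref)
  (base + ref).to_s
end

[
  "/images/logo.png",             # absolute path: replaces the base path
  "img/figure.png",               # relative path: resolved against /docs/guide/
  "../shared/banner.png",         # parent-directory reference
  "https://cdn.example.org/a.png" # fully qualified: returned unchanged
].each { |ref| puts "#{ref} -> #{resolve(BASE, ref)}" }
```

Because relative references are resolved against the directory of the base, passing the URL of the page the fragment came from (as the README suggests) can give different results than passing only the host name.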
data/bin/html2odt.rb
ADDED
@@ -0,0 +1,100 @@
+#!/usr/bin/env ruby
+
+require 'optparse'
+require 'html2odt'
+
+options = {}
+parser = OptionParser.new do |opts|
+  opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
+
+  opts.on("-i", "--input input.html") do |input|
+    options[:input] = input
+  end
+
+  opts.on("-o", "--output output.odt") do |output|
+    options[:output] = output
+  end
+
+  opts.on("-t", "--template <template.odt>",
+          "The file that should be filled with the input's content.", "Defaults to basic template file which is part of this gem.") do |template|
+    options[:template] = template
+  end
+
+  opts.on("-r", "--replace <KEYWORD>",
+          "A keyword in the template document to replace with the converted text.", "Defaults to `{{content}}`.") do |replace|
+    options[:replace] = replace
+  end
+
+  opts.on("-u", "--url <URL>",
+          "The remote URL you downloaded the page from.", "This is required to include remote images and to resolve links properly.") do |url|
+    options[:url] = url
+  end
+
+  opts.on("-h", "--help",
+          "Show this message") do
+    puts opts
+    exit
+  end
+end
+
+parser.parse!
+
+if options.empty?
+  puts parser
+  exit
+end
+
+if options[:replace]
+  warn "-r option is not yet implemented, please use the default `{{content}}` place holder for now."
+  exit 1
+end
+
+
+if options[:input].nil?
+  warn "Missing -i option"
+  puts parser
+  exit 1
+end
+
+
+if options[:output].nil?
+  warn "Missing -o option"
+  puts parser
+  exit 1
+end
+
+
+
+doc = if options[:template].nil?
+  Html2Odt::Document.new
+else
+  begin
+    Html2Odt::Document.new(template: options[:template])
+  rescue ArgumentError
+    warn "Template does not match expectations - #{$!.message}"
+    exit 2
+  end
+end
+
+
+if File.readable? options[:input]
+  doc.html = File.read(options[:input])
+else
+  warn "Input does not match expectations - Cannot read input file #{options[:input].inspect}"
+  exit 3
+end
+
+
+if options[:url]
+  begin
+    doc.base_uri = options[:url]
+  rescue ArgumentError
+    warn "URL does not match expectations - #{$!.message}"
+    exit 4
+  end
+end
+
+
+doc.write_to options[:output]
+
+puts "Wrote document to: #{options[:output]}"
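To see how the script's flags end up in the `options` hash, the following sketch repeats its option definitions (without the help texts, `-h`, and the `options.empty?` help shortcut) inside a hypothetical `parse_cli` helper. The helper parses an explicit array instead of `ARGV` and returns an error message where the script would call `warn` and `exit 1`; it is an illustration, not part of the gem, and the file names and URL are placeholders.

```ruby
require "optparse"

# Hypothetical helper: same option definitions as bin/html2odt.rb, but it
# parses an explicit array and returns [options, error] instead of exiting.
def parse_cli(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
    opts.on("-i", "--input input.html")        { |v| options[:input] = v }
    opts.on("-o", "--output output.odt")       { |v| options[:output] = v }
    opts.on("-t", "--template <template.odt>") { |v| options[:template] = v }
    opts.on("-r", "--replace <KEYWORD>")       { |v| options[:replace] = v }
    opts.on("-u", "--url <URL>")               { |v| options[:url] = v }
  end
  parser.parse!(argv.dup)

  # Same order of checks as the script, which exits with status 1 for each.
  return [nil, "-r option is not yet implemented"] if options[:replace]
  return [nil, "Missing -i option"] if options[:input].nil?
  return [nil, "Missing -o option"] if options[:output].nil?

  [options, nil]
end

opts, error = parse_cli(%w[-i page.html -o page.odt -u https://www.example.com/page])
puts error || opts.inspect
```

In the real script, later failures use distinct exit statuses: 2 for a template rejected by `Html2Odt::Document.new`, 3 for an unreadable input file, and 4 for a URL rejected by `base_uri=`.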
data/html2odt.gemspec
CHANGED
@@ -1,7 +1,7 @@
 # coding: utf-8
-lib = File.expand_path(
+lib = File.expand_path("../lib", __FILE__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
-require
+require "html2odt/version"

 Gem::Specification.new do |spec|
   spec.name = "html2odt"
@@ -13,10 +13,11 @@ Gem::Specification.new do |spec|
   spec.description = %q{html2odt generates ODT documents based on HTML fragments using xhtml2odt}
   spec.homepage = "https://github.com/planio-gmbh/html2odt"

-  spec.license =
+  spec.license = "MIT"

   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
   spec.require_paths = ["lib"]
+  spec.executables << "html2odt.rb"

   spec.add_dependency "dimensions", "~> 1.3.0"
   spec.add_dependency "nokogiri", "~> 1.6.7.2"
data/lib/html2odt/document.rb
CHANGED
@@ -30,6 +30,25 @@ class Html2Odt::Document
     @html
   end

+  def base_uri=(uri)
+    if uri.is_a? URI
+      @base_uri = uri
+    else
+      @base_uri = URI::parse(uri)
+    end
+
+    unless @base_uri.is_a? URI::HTTP
+      raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
+    end
+
+  rescue URI::InvalidURIError
+    raise ArgumentError, "Invalid URI - #{$!.message}"
+  end
+
+  def base_uri
+    @base_uri
+  end
+
   def content_xml
     @content_xml ||= begin

@@ -177,7 +196,7 @@ class Html2Odt::Document
     end

   rescue Zip::Error
-    raise ArgumentError, "Template file does not look like
+    raise ArgumentError, "Template file does not look like an ODT file - #{$!.message}"
   rescue Errno::ENOENT
     raise ArgumentError, "Template file does not contain expected file - #{$!.message}"
   end
@@ -187,6 +206,7 @@ class Html2Odt::Document
     html = self.html
     html = fix_images_in_html(html)
     html = fix_document_structure(html)
+    html = fix_links(html) if base_uri
     html = create_document(html)
     html
   end
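The `base_uri=` setter added to `document.rb` accepts a `String` or a `URI` and raises `ArgumentError` for anything that is not an `http`/`https` URI; a single `is_a? URI::HTTP` check covers both schemes because `URI::HTTPS` subclasses `URI::HTTP`. The class below is a hypothetical stand-in (`BaseUriHolder`, not part of html2odt) that reproduces that contract with the standard library only. It deliberately differs in one respect: the diff assigns `@base_uri` before checking the scheme, so a rejected value stays stored on the document, while the sketch validates first and only then assigns.

```ruby
require "uri"

# Hypothetical stand-in for Html2Odt::Document#base_uri= (not the gem's class).
class BaseUriHolder
  attr_reader :base_uri

  def base_uri=(value)
    # Accept URI instances as-is, parse anything else.
    uri = value.is_a?(URI) ? value : URI.parse(value)

    # URI::HTTPS inherits from URI::HTTP, so this accepts both schemes.
    unless uri.is_a?(URI::HTTP)
      raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
    end

    # Only store the value once it has passed validation.
    @base_uri = uri
  rescue URI::InvalidURIError => e
    raise ArgumentError, "Invalid URI - #{e.message}"
  end
end

holder = BaseUriHolder.new
holder.base_uri = "https://www.example.com"
puts holder.base_uri.class
```

With the command line tool this difference does not matter, since `html2odt.rb` exits with status 4 as soon as `base_uri=` raises.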
@@ -243,6 +263,16 @@ class Html2Odt::Document
     doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
   end

+  def fix_links(html)
+    doc = Nokogiri::HTML::DocumentFragment.parse(html)
+
+    doc.css("a").each do |a|
+      a["href"] = (base_uri + a["href"]).to_s
+    end
+
+    doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
+  end
+
   def create_document(html)
     %Q{<html xmlns="http://www.w3.org/1999/xhtml">#{html}</html>}
   end
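`fix_links` rewrites every `<a>` element with `base_uri + a["href"]`. Two kinds of input make that expression raise: an anchor without an `href` (for example `<a name="top">`), where `a["href"]` is `nil`, and an `href` that `URI` cannot parse, such as one containing a space, which raises `URI::InvalidURIError`. A hedged sketch of a per-link helper that leaves such links untouched; `absolutize_href` is a hypothetical name, and the code uses only the standard library rather than Nokogiri:

```ruby
require "uri"

# Hypothetical helper: resolve one href against a base URI like fix_links
# does, but return missing or unparsable hrefs unchanged instead of raising.
def absolutize_href(base_uri, href)
  return href if href.nil?

  # Fully qualified hrefs (https:, mailto:, ...) are returned as-is by URI#+.
  (base_uri + href).to_s
rescue URI::InvalidURIError
  href
end

base = URI.parse("https://www.example.com/docs/index.html")
puts absolutize_href(base, "other.html")
```

Inside the gem, the loop could select only anchors that carry an `href` (Nokogiri accepts the CSS selector `a[href]`) and assign `absolutize_href(base_uri, a["href"])`.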
@@ -252,22 +282,27 @@ class Html2Odt::Document
       return image_location_mapping.call(src)
     end

-
-    when /\Afile:\/\//
+    if src =~ /\Afile:\/\//
       # local file URL
       #
       # TODO: Verify, that this does not pose a security threat, maybe make
       # this optional. In any case, it's useful for testing.

-      src[7..-1]
+      return src[7..-1]
+    end

-
+    if src =~ /\Ahttps?:\/\// or !base_uri.nil?
       # remote image URL
       #
       # TODO: Verify, that this does not pose a security threat, maybe make
       # this optional.

-
+      if base_uri
+        uri = base_uri + src
+      else
+        uri = URI.parse(src)
+      end
+
       file = Tempfile.new("html2odt")
       file.binmode

@@ -279,11 +314,11 @@ class Html2Odt::Document
         file
       end

-      file.path
-    else
-      # cannot handle image properly, return nil
-      nil
+      return file.path
     end
+
+    # cannot handle image properly, return nil
+    nil
   end

   def update_img_tag(img, image)
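After this change, the image-resolution code in `document.rb` checks a `src` in a fixed order: the `image_location_mapping` callback (whose `return` is visible at the top of the hunk), then `file://` URLs, then fully qualified `http(s)` URLs or, once `base_uri` is set, any other source, and finally `nil`. The sketch below mirrors only the last three steps of that branch order; it leaves out the mapping callback and the `Tempfile` download, and `classify_image_src` is a hypothetical name, not part of html2odt.

```ruby
require "uri"

# Hypothetical helper mirroring the branch order of the image code in the diff.
# Returns [:local, path], [:remote, uri] or nil (unsupported source).
def classify_image_src(src, base_uri = nil)
  # 1. file:// URLs are read from the local file system (prefix stripped).
  return [:local, src[7..-1]] if src =~ /\Afile:\/\//

  # 2. Fully qualified http(s) URLs, or any other src once base_uri is set.
  if src =~ /\Ahttps?:\/\// || !base_uri.nil?
    uri = base_uri ? base_uri + src : URI.parse(src)
    return [:remote, uri]
  end

  # 3. Nothing to resolve against; per the README, the image is then
  #    replaced with a link containing its alt text.
  nil
end

base = URI.parse("https://www.example.com/blog/post.html")
p classify_image_src("/images/a.png", base)
```

Since the `file://` branch is reached whether or not `base_uri` is set, HTML from untrusted sources can point at local files; that is the concern the TODO comments in the diff leave open.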
data/lib/html2odt/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: html2odt
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.3.0
 platform: ruby
 authors:
 - Gregor Schmidt (Planio)
@@ -112,7 +112,8 @@ description: html2odt generates ODT documents based on HTML fragments using xhtm
 email:
 - gregor@plan.io
 - support@plan.io
-executables:
+executables:
+- html2odt.rb
 extensions: []
 extra_rdoc_files: []
 files:
@@ -125,6 +126,7 @@ files:
 - README.md
 - Rakefile
 - bin/console
+- bin/html2odt.rb
 - html2odt.gemspec
 - lib/html2odt.rb
 - lib/html2odt/document.rb