html2odt 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f17ae59439de60f02d2bb78494d27c6e4107638b
- data.tar.gz: 598800a2f2928bc4bfcfda35383dbb825ffd4f7f
+ metadata.gz: d2e147b6eecf0af3652facf38559813053546c6a
+ data.tar.gz: 303a83ca4cd9e86ecfe5091d911847358a79f77c
  SHA512:
- metadata.gz: eb6e4267022ff572028c42632c6edd03f3660bd91fa013933880052ecc71de6c4009cf878c440d16bb264243bf145b9478ab73e837be30d46bc3d9c8008d7293
- data.tar.gz: dbe23ef34a73cdbfa7661863178fc02161be4fd405ba4817a7ec4cd432f5c4c52e52b8fa41afda2b2a72c6f0fc173885a7da56423654713c81b20fcb9d3024e7
+ metadata.gz: 9d3086df185e177ea423c7a886503463918933a032741d25c96a0fe847ac089043e32275d0b533f3295124021c696d9f0af9fd83d0b6ab4f10065e417b9c24b8
+ data.tar.gz: 5114655368d6cbcd5df9b0f404f9bdc5b7d8203e3308d33eb9c86f60641cbaca2c153ccc034f890d3394559aa55b698125ce7c42ded667d9e63bc45268a36c6b
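`checksums.yaml` stores SHA1 and SHA512 digests of the two archive members packed into the `.gem` file, `metadata.gz` and `data.tar.gz`. A minimal sketch of checking the SHA512 entries with Ruby's standard `digest` library follows; the helper `checksums_match?` and its hash-in, boolean-out interface are assumptions for illustration, not part of RubyGems.

```ruby
require "digest"

# Hypothetical helper, not part of RubyGems: returns true only if every
# entry in `expected` (name => hex SHA512, as in the SHA512 section of
# checksums.yaml) matches the digest of the corresponding member bytes.
def checksums_match?(members, expected)
  expected.all? do |name, hex|
    data = members[name]
    !data.nil? && Digest::SHA512.hexdigest(data) == hex
  end
end

# Usage with made-up member contents (not the real html2odt archive):
members = { "metadata.gz" => "meta", "data.tar.gz" => "data" }
expected = members.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
puts checksums_match?(members, expected) # prints "true"
```

Extracting the member bytes from a real `.gem` archive (a plain tar file) is left out of this sketch.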
data/CHANGELOG.md CHANGED
@@ -1,3 +1,10 @@
+ # v0.3.0 - 2016-05-25
+
+ Adding support for `base_uri` configuration to expand links and download images
+ without a fully qualified URI.
+
+ Adding html2odt.rb binary.
+
  # v0.2.1 - 2016-05-25
 
  Adding workarounds for HTML structures not supported by xhtml2odt.
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- html2odt (0.2.1)
+ html2odt (0.3.0)
  dimensions (~> 1.3.0)
  nokogiri (~> 1.6.7.2)
  rubyzip (~> 1.0)
data/README.md CHANGED
@@ -3,6 +3,20 @@
  This gem provides a Ruby wrapper around the set of XSLT stylesheets published as
  [xhtml2odt](https://github.com/abompard/xhtml2odt).
 
+
+ ## html2odt vs. xhtml2odt
+
+ So, why is this project called `html2odt` while the original library and command
+ line tools by Aurélien Bompard are called **`x`**`html2odt`?
+
+ This project uses [nokogiri](http://www.nokogiri.org) to parse the HTML and
+ apply the XSLT transformations. Nokogiri's HTML parser is forgiving and accepts
+ markup that is not strictly valid. Furthermore, the basic API expects HTML
+ fragments, not full documents. We do not expect users of this library to pass
+ in a complete, valid XHTML document; a reasonably good piece of HTML should be
+ good enough. Therefore we dropped the `x` from the name.
+
+
  ## Installation
 
  Add this line to your application's Gemfile:
@@ -19,9 +33,27 @@ Or install it yourself as:
 
  $ gem install html2odt
 
+
  ## Usage
 
- ### Basic usage
+ ### Command line usage
+
+
+ ```
+ Usage: html2odt.rb [options] -i input.html -o output.odt
+     -i, --input input.html
+     -o, --output output.odt
+     -t, --template <template.odt>    The file that should be filled with the input's content.
+                                      Defaults to basic template file which is part of this gem.
+     -r, --replace <KEYWORD>          A keyword in the template document to replace with the converted text.
+                                      Defaults to `{{content}}`.
+     -u, --url <URL>                  The remote URL you downloaded the page from.
+                                      This is required to include remote images and to resolve links properly.
+     -h, --help                       Show this message
+ ```
+
+
+ ### Ruby API usage
 
  ```ruby
  # Create an Html2Odt::Document instance
@@ -77,20 +109,47 @@ doc = Html2Odt::Document.new(html: <<HTML)
  HTML
  ```
 
+
+
+
+ Furthermore, you may specify a `base_uri`, most likely the URL the original HTML
+ fragment was taken from. The `base_uri` is used to convert links to fully
+ qualified URLs, so that they still work when placed in the ODT document. It is
+ also used to resolve the sources of images found within the HTML fragment (see
+ below for details).
+
+ ```ruby
+ # Provide base_uri
+ doc = Html2Odt::Document.new
+ doc.base_uri = "https://www.example.com"
+ ```
+
+ You may also pass a `URI` instance directly.
+
+ ```ruby
+ # Provide base_uri as a URI instance
+ doc = Html2Odt::Document.new
+ doc.base_uri = URI.parse("https://www.example.com")
+ ```
+
+ Only `http` and `https` URIs are accepted; others raise an `ArgumentError`.
+
+
  ### Image handling
 
  `html2odt` provides basic image inlining, i.e. images referenced in the HTML
  code will be embedded into the ODT file by default. This is true for images
  referenced with a full `file://`, `http://`, or `https://` URL. Absolute URLs
- (i.e. starting `/`) and relative URLs are not supported, since `html2odt` has no
- idea, which server or document they are relating to.
+ (i.e. starting with `/`) and relative URLs are only supported if the `base_uri`
+ option is set. Otherwise `html2odt` has no way of knowing which server or
+ document they refer to.
 
  Images referencing an unsupported resource will be replaced with a link
  containing the alt text of the image.
 
  If you are using `html2odt` in a web application context, you will probably want
  to provide some special handling for resources residing on your own server. This
- should be done for security reasons or to save roundtrips.
+ should be done for security reasons and to save roundtrips.
 
  `html2odt` provides the following API to map image `src` attributes to local
  file locations.
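The new `base_uri` option relies on Ruby's `URI#+` (an alias of `URI#merge`), which `fix_links` applies to each `href` and the image code applies to each `src`. A small illustration of how references resolve against a base; the base URL and the references are made-up examples.

```ruby
require "uri"

# Made-up base: the page an HTML fragment was taken from.
base = URI.parse("https://www.example.com/blog/post.html")

# Made-up references mapped to the absolute URL that URI#+ produces.
examples = {
  "/images/logo.png"        => "https://www.example.com/images/logo.png",
  "../images/photo.jpg"     => "https://www.example.com/images/photo.jpg",
  "comments.html"           => "https://www.example.com/blog/comments.html",
  "#top"                    => "https://www.example.com/blog/post.html#top",
  "https://other.org/a.png" => "https://other.org/a.png", # already absolute: unchanged
}

examples.each_key do |ref|
  puts "#{ref} -> #{(base + ref)}"
end
```

Already absolute URLs pass through unchanged, which is why the image code can route both `http(s)` sources and relative sources through the same `base_uri + src` expression.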
data/bin/html2odt.rb ADDED
@@ -0,0 +1,100 @@
+ #!/usr/bin/env ruby
+
+ require 'optparse'
+ require 'html2odt'
+
+ options = {}
+ parser = OptionParser.new do |opts|
+   opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
+
+   opts.on("-i", "--input input.html") do |input|
+     options[:input] = input
+   end
+
+   opts.on("-o", "--output output.odt") do |output|
+     options[:output] = output
+   end
+
+   opts.on("-t", "--template <template.odt>",
+           "The file that should be filled with the input's content.", "Defaults to basic template file which is part of this gem.") do |template|
+     options[:template] = template
+   end
+
+   opts.on("-r", "--replace <KEYWORD>",
+           "A keyword in the template document to replace with the converted text.", "Defaults to `{{content}}`.") do |replace|
+     options[:replace] = replace
+   end
+
+   opts.on("-u", "--url <URL>",
+           "The remote URL you downloaded the page from.", "This is required to include remote images and to resolve links properly.") do |url|
+     options[:url] = url
+   end
+
+   opts.on("-h", "--help",
+           "Show this message") do
+     puts opts
+     exit
+   end
+ end
+
+ parser.parse!
+
+ if options.empty?
+   puts parser
+   exit
+ end
+
+ if options[:replace]
+   warn "-r option is not yet implemented, please use the default `{{content}}` place holder for now."
+   exit 1
+ end
+
+
+ if options[:input].nil?
+   warn "Missing -i option"
+   puts parser
+   exit 1
+ end
+
+
+ if options[:output].nil?
+   warn "Missing -o option"
+   puts parser
+   exit 1
+ end
+
+
+
+ doc = if options[:template].nil?
+         Html2Odt::Document.new
+       else
+         begin
+           Html2Odt::Document.new(template: options[:template])
+         rescue ArgumentError
+           warn "Template does not match expectations - #{$!.message}"
+           exit 2
+         end
+       end
+
+
+ if File.readable? options[:input]
+   doc.html = File.read(options[:input])
+ else
+   warn "Input does not match expectations - Cannot read input file #{options[:input].inspect}"
+   exit 3
+ end
+
+
+ if options[:url]
+   begin
+     doc.base_uri = options[:url]
+   rescue ArgumentError
+     warn "URL does not match expectations - #{$!.message}"
+     exit 4
+   end
+ end
+
+
+ doc.write_to options[:output]
+
+ puts "Wrote document to: #{options[:output]}"
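`bin/html2odt.rb` validates its options inline and calls `exit`, which makes the checks awkward to test. A sketch of the same required-option validation pulled into a method follows; the method name `parse_html2odt_options` and the `MissingOption` error class are hypothetical, and the `-r` and `-h` switches are omitted for brevity.

```ruby
require "optparse"

# Hypothetical error class signalling a missing required switch.
class MissingOption < StandardError; end

# Hypothetical refactoring of the script's option handling. The switch
# specs mirror html2odt.rb; missing -i/-o raise instead of exiting.
def parse_html2odt_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: html2odt.rb [options] -i input.html -o output.odt"
    opts.on("-i", "--input input.html") { |v| options[:input] = v }
    opts.on("-o", "--output output.odt") { |v| options[:output] = v }
    opts.on("-t", "--template <template.odt>") { |v| options[:template] = v }
    opts.on("-u", "--url <URL>") { |v| options[:url] = v }
  end
  parser.parse(argv) # non-destructive, unlike parse!

  %i[input output].each do |key|
    raise MissingOption, "Missing -#{key.to_s[0]} option" if options[key].nil?
  end
  options
end

p parse_html2odt_options(%w[-i page.html -o page.odt -u https://www.example.com/page.html])
```

Raising instead of exiting leaves the exit-code policy (status 1 for a missing option, as in the script) to the caller, and a switch given without its value still raises `OptionParser::MissingArgument`.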
data/html2odt.gemspec CHANGED
@@ -1,7 +1,7 @@
  # coding: utf-8
- lib = File.expand_path('../lib', __FILE__)
+ lib = File.expand_path("../lib", __FILE__)
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
- require 'html2odt/version'
+ require "html2odt/version"
 
  Gem::Specification.new do |spec|
  spec.name = "html2odt"
@@ -13,10 +13,11 @@ Gem::Specification.new do |spec|
  spec.description = %q{html2odt generates ODT documents based on HTML fragments using xhtml2odt}
  spec.homepage = "https://github.com/planio-gmbh/html2odt"
 
- spec.license = 'MIT'
+ spec.license = "MIT"
 
  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
  spec.require_paths = ["lib"]
+ spec.executables << "html2odt.rb"
 
  spec.add_dependency "dimensions", "~> 1.3.0"
  spec.add_dependency "nokogiri", "~> 1.6.7.2"
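The gemspec now lists `html2odt.rb` in `spec.executables`. RubyGems looks up each listed executable inside the specification's `bindir`, which defaults to `bin`, so the script has to live at `bin/html2odt.rb`, as the `metadata` file list confirms. A made-up minimal specification (name and version are placeholders) shows the default:

```ruby
require "rubygems"

# Placeholder specification; only the executables handling matters here.
spec = Gem::Specification.new do |s|
  s.name    = "example"
  s.version = "0.0.1"
  s.executables << "html2odt.rb"
end

puts spec.bindir                                    # prints "bin"
puts File.join(spec.bindir, spec.executables.first) # prints "bin/html2odt.rb"
```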
data/lib/html2odt/document.rb CHANGED
@@ -30,6 +30,25 @@ class Html2Odt::Document
  @html
  end
 
+ def base_uri=(uri)
+ if uri.is_a? URI
+ @base_uri = uri
+ else
+ @base_uri = URI::parse(uri)
+ end
+
+ unless @base_uri.is_a? URI::HTTP
+ raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
+ end
+
+ rescue URI::InvalidURIError
+ raise ArgumentError, "Invalid URI - #{$!.message}"
+ end
+
+ def base_uri
+ @base_uri
+ end
+
  def content_xml
  @content_xml ||= begin
 
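The new `base_uri=` setter accepts either a `URI` object or a string and rejects anything that is not `http` or `https`. A stdlib-only sketch that mirrors those checks outside the `Document` class follows; the name `normalize_base_uri` is hypothetical. The `is_a? URI::HTTP` test also admits `https` because `URI::HTTPS` is a subclass of `URI::HTTP`.

```ruby
require "uri"

# Hypothetical stand-alone mirror of Document#base_uri= validation.
def normalize_base_uri(uri)
  parsed = uri.is_a?(URI::Generic) ? uri : URI.parse(uri)

  # URI::HTTPS inherits from URI::HTTP, so both schemes pass this check.
  unless parsed.is_a?(URI::HTTP)
    raise ArgumentError, "Invalid URI - Expecting http(s) scheme."
  end

  parsed
rescue URI::InvalidURIError
  # Unparsable strings surface as ArgumentError, like in the setter.
  raise ArgumentError, "Invalid URI - #{$!.message}"
end

puts normalize_base_uri("https://www.example.com").class # prints "URI::HTTPS"
```

A scheme-less value such as `www.example.com` parses as a generic URI and is rejected, so the command line tool's `-u` option needs the full `https://...` form.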
@@ -177,7 +196,7 @@ class Html2Odt::Document
  end
 
  rescue Zip::Error
- raise ArgumentError, "Template file does not look like a ODT file - #{$!.message}"
+ raise ArgumentError, "Template file does not look like an ODT file - #{$!.message}"
  rescue Errno::ENOENT
  raise ArgumentError, "Template file does not contain expected file - #{$!.message}"
  end
@@ -187,6 +206,7 @@ class Html2Odt::Document
  html = self.html
  html = fix_images_in_html(html)
  html = fix_document_structure(html)
+ html = fix_links(html) if base_uri
  html = create_document(html)
  html
  end
@@ -243,6 +263,16 @@ class Html2Odt::Document
  doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
  end
 
+ def fix_links(html)
+ doc = Nokogiri::HTML::DocumentFragment.parse(html)
+
+ doc.css("a").each do |a|
+ a["href"] = (base_uri + a["href"]).to_s
+ end
+
+ doc.to_xml(:save_with => Nokogiri::XML::Node::SaveOptions::AS_XML)
+ end
+
  def create_document(html)
  %Q{<html xmlns="http://www.w3.org/1999/xhtml">#{html}</html>}
  end
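`fix_links` resolves the `href` of every `<a>` element unconditionally. Ruby's `URI#+` does not accept `nil`, so an anchor without `href` (for example `<a name="top">`), or an `href` that `URI` cannot parse, would likely abort the conversion with an exception. A nil-safe sketch of the per-link step using only the standard library follows; `absolutize_href` is a hypothetical name, not part of html2odt.

```ruby
require "uri"

# Hypothetical nil-safe variant of the per-link step in fix_links:
# anchors without an href are skipped, and hrefs that URI cannot parse
# are returned untouched instead of raising.
def absolutize_href(base_uri, href)
  return nil if href.nil?
  (base_uri + href).to_s
rescue URI::InvalidURIError
  href
end

base = URI.parse("https://www.example.com/docs/index.html")
puts absolutize_href(base, "intro.html") # prints https://www.example.com/docs/intro.html
p absolutize_href(base, nil)             # prints nil
```

Inside `fix_links`, selecting `doc.css("a[href]")` instead of `doc.css("a")` would be another way to skip anchors without `href`.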
@@ -252,22 +282,27 @@ class Html2Odt::Document
  return image_location_mapping.call(src)
  end
 
- case src
- when /\Afile:\/\//
+ if src =~ /\Afile:\/\//
  # local file URL
  #
  # TODO: Verify, that this does not pose a security threat, maybe make
  # this optional. In any case, it's useful for testing.
 
- src[7..-1]
+ return src[7..-1]
+ end
 
- when /\Ahttps?:\/\//
+ if src =~ /\Ahttps?:\/\// or !base_uri.nil?
  # remote image URL
  #
  # TODO: Verify, that this does not pose a security threat, maybe make
  # this optional.
 
- uri = URI.parse(src)
+ if base_uri
+ uri = base_uri + src
+ else
+ uri = URI.parse(src)
+ end
+
  file = Tempfile.new("html2odt")
  file.binmode
 
@@ -279,11 +314,11 @@ class Html2Odt::Document
  file
  end
 
- file.path
- else
- # cannot handle image properly, return nil
- nil
+ return file.path
  end
+
+ # cannot handle image properly, return nil
+ nil
  end
 
  def update_img_tag(img, image)
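The reworked image code replaces the `case` statement with two guarded `if` blocks: `file://` sources are handled first, then `http(s)` URLs and, once `base_uri` is set, any other `src`, resolved against the base. The hunks do not show the method's name, so the sketch below uses the hypothetical `classify_image_src` and returns a tag plus a string instead of downloading into a `Tempfile`.

```ruby
require "uri"

# Hypothetical stand-alone version of the per-<img> decision:
# [:file, path] for file:// sources, [:remote, url] for http(s) sources
# or any source once a base URI is known, nil for unsupported sources.
def classify_image_src(src, base_uri = nil)
  if src =~ /\Afile:\/\//
    return [:file, src[7..-1]] # strip the 7-character "file://" prefix
  end

  if src =~ /\Ahttps?:\/\// || !base_uri.nil?
    uri = base_uri ? base_uri + src : URI.parse(src)
    return [:remote, uri.to_s]
  end

  nil # cannot handle image properly
end

p classify_image_src("/logo.png")                                      # prints nil
p classify_image_src("/logo.png", URI.parse("https://www.example.com/")) # prints [:remote, "https://www.example.com/logo.png"]
```

With a base URI set, every non-`file://` source takes the remote branch, including absolute paths such as `/logo.png` that were previously reported as unsupported.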
data/lib/html2odt/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Html2Odt
- VERSION = "0.2.1"
+ VERSION = "0.3.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: html2odt
  version: !ruby/object:Gem::Version
- version: 0.2.1
+ version: 0.3.0
  platform: ruby
  authors:
  - Gregor Schmidt (Planio)
@@ -112,7 +112,8 @@ description: html2odt generates ODT documents based on HTML fragments using xhtm
  email:
  - gregor@plan.io
  - support@plan.io
- executables: []
+ executables:
+ - html2odt.rb
  extensions: []
  extra_rdoc_files: []
  files:
@@ -125,6 +126,7 @@ files:
  - README.md
  - Rakefile
  - bin/console
+ - bin/html2odt.rb
  - html2odt.gemspec
  - lib/html2odt.rb
  - lib/html2odt/document.rb