html2doc 0.6.2 → 0.6.5
- checksums.yaml +4 -4
- data/README.adoc +22 -11
- data/lib/html2doc.rb +2 -0
- data/lib/html2doc/base.rb +20 -79
- data/lib/html2doc/lists.rb +39 -0
- data/lib/html2doc/math.rb +45 -0
- data/lib/html2doc/mime.rb +41 -1
- data/lib/html2doc/version.rb +1 -1
- data/spec/19160-8.jpg +0 -0
- data/spec/html2doc_spec.rb +213 -65
- metadata +5 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: a81b1c785ccf5bf053bcfc9bc447c56c950aa5d0
|
4
|
+
data.tar.gz: f6eecc7cb0d39c318d75d8b8ce15bd472fcbe730
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 7b851e4484c4c76cd2401b1421599f4d9ceb6feece0298e9c2ddf75975fcc373174c4b1073e7efdeb94f60990dc8f73fe1a641c188f98550c0f0c789a0dbb879
|
7
|
+
data.tar.gz: 03bfaccc89670d711ee63dfafb8a7acae8815a18efb827ae512f0028d2ab9d58f878744ff4fe8135c385e3626e373cb0ada8fc34ac5ec3b8c73d26af94cfc19a
|
data/README.adoc
CHANGED
@@ -5,23 +5,24 @@ image:https://img.shields.io/gem/v/html2doc.svg["Gem Version", link="https://rub
|
|
5
5
|
image:https://img.shields.io/travis/riboseinc/html2doc/master.svg["Build Status", link="https://travis-ci.org/riboseinc/html2doc"]
|
6
6
|
image:https://codeclimate.com/github/riboseinc/html2doc/badges/gpa.svg["Code Climate", link="https://codeclimate.com/github/riboseinc/html2doc"]
|
7
7
|
|
8
|
-
Gem to convert an HTML document into a Word document (.doc) format. This is intended for automated generation of Microsoft Word documents, given HTML documents, which are
|
8
|
+
Gem to convert an HTML document into a Word document (.doc) format. This is intended for automated generation of Microsoft Word documents, given HTML documents, which are much more readily crafted.
|
9
9
|
|
10
|
-
This gem originated out of https://github.com/riboseinc/asciidoctor-iso, which creates a Word document from a
|
10
|
+
This gem originated out of https://github.com/riboseinc/asciidoctor-iso, which creates a Word document from an automatically generated HTML document (created in turn by processing Asciidoc).
|
11
11
|
|
12
12
|
This work is driven by the Word document generation procedure documented in http://sebsauvage.net/wiki/doku.php?id=word_document_generation
|
13
13
|
|
14
14
|
The gem currently does the following:
|
15
15
|
|
16
|
-
* Convert any AsciiMath and MathML to Word's native mathematical formatting language.
|
17
|
-
|
16
|
+
* Convert any AsciiMath and MathML to Word's native mathematical formatting language, OOXML. Word supports pasting MathML into a document and converting it into OOXML; however, the conversion is not infallible (we have found problems with `\sum`: Word claims that parameters are missing, and inserts dotted squares to indicate as much), and you may need to post-edit the OOXML.
|
17
|
+
** The gem does attempt to repair the MathML input, to bring it in line with Word's OOXML's expectations. If you find any issues with AsciiMath or MathML input, please raise an issue.
|
18
|
+
* Identify any footnotes in the document (defined as hyperlinks with attributes `class = "Footnote"` or `epub:type = "footnote"`), and render them as Microsoft Word footnotes.
|
18
19
|
* Resize any images in the HTML file to fit within the maximum page size. (Word will otherwise crash on reading the document.)
|
19
20
|
* Generate a filelist.xml listing of all files to be bundled into the Word document.
|
20
21
|
* Assign the class `MsoNormal` to any paragraphs that do not have a class, so that they can be treated as Normal Style when editing the Word document.
|
21
|
-
* Inject Microsoft Word-specific CSS into the HTML document.
|
22
|
+
* Inject Microsoft Word-specific CSS into the HTML document. If a CSS file is not supplied, the CSS file at `lib/html2doc/wordstyle.css` is used by default. Microsoft Word HTML has particular requirements of its CSS, and you should review the sample CSS before replacing it with your own. (This generic CSS can be overridden by CSS already in the HTML document, since the generic CSS is injected at the top of the document.)
|
22
23
|
* Bundle up the images, the HTML file of the document proper, and the `header.html` file representing header/footer information, into a MIME file, and save that file to disk (so that Microsoft Word can deal with it as a Word file.)
|
23
24
|
|
24
|
-
|
25
|
+
For a representative generator of HTML that uses this gem in postprocessing, see https://github.com/riboseinc/asciidoctor-iso
|
25
26
|
|
26
27
|
Work to be done:
|
27
28
|
|
@@ -31,7 +32,10 @@ Work to be done:
|
|
31
32
|
|
32
33
|
This generates .doc documents. Future versions will upgrade the output to docx.
|
33
34
|
|
34
|
-
There are two other Microsoft Word vendors in the Ruby ecosystem.
|
35
|
+
There are two other Microsoft Word vendors in the Ruby ecosystem.
|
36
|
+
|
37
|
+
* https://github.com/jetruby/puredocx generates Word documents from a Ruby struct as a DSL, rather than converting a preexisting HTML document. That constrains its coverage to what is explicitly catered for in the DSL.
|
38
|
+
* https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around HTML: it does not provide any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML), though it does already generate docx.
|
35
39
|
|
36
40
|
== Usage
|
37
41
|
|
@@ -39,18 +43,25 @@ There there are two other Microsoft Word vendors in the Ruby ecosystem. https://
|
|
39
43
|
--
|
40
44
|
require "html2doc"
|
41
45
|
|
42
|
-
Html2Doc.process(result, filename, stylesheet, header_filename, dir, asciimathdelims
|
46
|
+
Html2Doc.process(result, filename: filename, stylesheet: stylesheet, header_filename: header_filename, dir: dir, asciimathdelims: asciimathdelims, liststyles: liststyles)
|
43
47
|
--
|
44
48
|
|
45
49
|
result:: is the Html document to be converted into Word, as a string.
|
46
50
|
filename:: is the name the document is to be saved as, without a file suffix
|
47
|
-
stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-specific styles. If this is not provided
|
51
|
+
stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-specific styles. If this is not provided, the program will use the default stylesheet included in the gem, `lib/html2doc/wordstyle.css`. The stylesheet provided must match this stylesheet; you can obtain one by saving a Word document with your desired styles to HTML, and extracting the style definitions from the HTML document header.
|
48
52
|
header_filename:: is the filename of the HTML document containing header and footer for the document, as well as footnote/endnote separators; if there is none, use nil. To generate your own such document, save a Word document with headers/footers and/or footnote/endnote separators as an HTML document; the `header.html` will be in the `{filename}.fld` folder generated along with the HTML. A sample file is available at https://github.com/riboseinc/asciidoctor-iso/blob/master/lib/asciidoctor/iso/word/header.html
|
49
|
-
dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided
|
50
|
-
asciimathdelims:: are the AsciiMath delimiters used in the text. If none are provided, no AsciiMath conversion is attempted.
|
53
|
+
dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided, it will be created as `{filename}_files`. Anything in the directory will be attached to the Word document; so this folder should only contain the images that accompany the document. (If the images are elsewhere on the local drive, the gem will move them into the folder.)
|
54
|
+
asciimathdelims:: are the AsciiMath delimiters used in the text (an array of an opening and a closing delimiter). If none are provided, no AsciiMath conversion is attempted.
|
55
|
+
liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul`, `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`.
|
51
56
|
|
52
57
|
Note that the local CSS stylesheet file contains a variable `FILENAME` for the location of footnote/endnote separators and headers/footers, which are provided in the header HTML file. The gem replaces `FILENAME` with the file name that the document will be saved as. If you supply your own stylesheet and also wish to use separators or headers/footers, you will likewise need to replace the document name mentioned in your stylesheet with a `FILENAME` string.
|
53
58
|
|
59
|
+
== Caveat
|
60
|
+
|
61
|
+
The good news with generating a Word document via HTML is that Word understands CSS, and you can determine much of what the Word document looks like by manipulating that CSS. That extends to features that are not part of HTML CSS: if you want to work out how to get Word to do something in CSS, save a Word document that already does what you want as HTML, and inspect the HTML and CSS you get.
|
62
|
+
|
63
|
+
The bad news is that Word's implementation of CSS is poorly documented (even if Office HTML is documented in a 1300-page document, online at https://stigmortenmyre.no/mso/ and https://www.rodriguezcommaj.com/assets/resources/microsoft-office-html-and-xml-reference.pdf), and the CSS selectors are only partially and selectively implemented. For list styles, for example, `mso-level-text` governs how the list label is displayed; but it is only recognised in a `@list` style: it is ignored in a CSS rule like `ol li`, or in a `style` attribute on a node. Working out the right CSS for what you want will take some trial and error, and you are better off trying to do things Word's way than the right way.
|
64
|
+
|
54
65
|
== Example
|
55
66
|
|
56
67
|
The `spec/examples` directory includes `rice.doc` and its source files: this Word document has been generated from `rice.html` through a call to html2doc from https://github.com/riboseinc/asciidoctor-iso. (The source document `rice.html` was itself generated from Asciidoc, rather than being hand-crafted.)
|
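The `FILENAME` substitution described in the README usage notes can be pictured with a minimal sketch. The helper name `fill_filename` and the sample CSS rule are assumptions made for illustration; the gem's own `stylesheet` method is not shown in this diff and may work differently.

```ruby
# Hypothetical helper (not part of the gem): replace the FILENAME
# placeholder in Word CSS text with the name the document is saved under.
def fill_filename(css, filename)
  css.gsub("FILENAME", filename)
end

# Illustrative stylesheet fragment; the exact rule is an assumption.
css = <<~CSS
  @page WordSection1 {
    mso-header: url("FILENAME_files/header.html") h1;
  }
CSS

puts fill_filename(css, "report")
```

A stylesheet extracted from a saved Word document would need the reverse step first: the saved document's own name replaced by the literal string `FILENAME`.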
data/lib/html2doc.rb
CHANGED
data/lib/html2doc/base.rb
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
require "uuidtools"
|
2
2
|
require "asciimath"
|
3
|
-
require "
|
3
|
+
require "htmlentities"
|
4
4
|
require "nokogiri"
|
5
5
|
require "xml/xslt"
|
6
6
|
require "pp"
|
@@ -9,16 +9,15 @@ module Html2Doc
|
|
9
9
|
@xslt = XML::XSLT.new
|
10
10
|
@xslt.xsl = File.read(File.join(File.dirname(__FILE__), "mathml2omml.xsl"))
|
11
11
|
|
12
|
-
def self.process(result,
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
rm_temp_files(filename, dir, dir1)
|
12
|
+
def self.process(result, hash)
|
13
|
+
hash[:dir1] = create_dir(hash[:filename], hash[:dir])
|
14
|
+
result = process_html(result, hash)
|
15
|
+
hash[:header_file].nil? ||
|
16
|
+
system("cp #{hash[:header_file]} #{hash[:dir1]}/header.html")
|
17
|
+
generate_filelist(hash[:filename], hash[:dir1])
|
18
|
+
File.open("#{hash[:filename]}.htm", "w") { |f| f.write(result) }
|
19
|
+
mime_package result, hash[:filename], hash[:dir1]
|
20
|
+
rm_temp_files(hash[:filename], hash[:dir], hash[:dir1])
|
22
21
|
end
|
23
22
|
|
24
23
|
def self.create_dir(filename, dir)
|
@@ -28,11 +27,9 @@ module Html2Doc
|
|
28
27
|
dir
|
29
28
|
end
|
30
29
|
|
31
|
-
def self.process_html(result,
|
32
|
-
|
33
|
-
|
34
|
-
docxml = to_xhtml(asciimath_to_mathml(result, asciimathdelims))
|
35
|
-
define_head(cleanup(docxml, dir), dir, filename, stylesheet, header_file)
|
30
|
+
def self.process_html(result, hash)
|
31
|
+
docxml = to_xhtml(asciimath_to_mathml(result, hash[:asciimathdelims]))
|
32
|
+
define_head(cleanup(docxml, hash), hash)
|
36
33
|
msword_fix(from_xhtml(docxml))
|
37
34
|
end
|
38
35
|
|
@@ -41,33 +38,15 @@ module Html2Doc
|
|
41
38
|
system "rm -r #{dir1}" unless dir
|
42
39
|
end
|
43
40
|
|
44
|
-
def self.cleanup(docxml,
|
45
|
-
image_cleanup(docxml,
|
41
|
+
def self.cleanup(docxml, hash)
|
42
|
+
image_cleanup(docxml, hash[:dir1])
|
46
43
|
mathml_to_ooml(docxml)
|
44
|
+
lists(docxml, hash[:liststyles])
|
47
45
|
footnotes(docxml)
|
48
46
|
msonormal(docxml)
|
49
47
|
docxml
|
50
48
|
end
|
51
49
|
|
52
|
-
def self.asciimath_to_mathml(doc, delims)
|
53
|
-
return doc if delims.nil? || delims.size < 2
|
54
|
-
doc.split(/(#{delims[0]}|#{delims[1]})/).each_slice(4).map do |a|
|
55
|
-
a[2].nil? || a[2] = AsciiMath.parse(a[2]).to_mathml.
|
56
|
-
gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
|
57
|
-
a.size > 1 ? a[0] + a[2] : a[0]
|
58
|
-
end.join
|
59
|
-
end
|
60
|
-
|
61
|
-
def self.mathml_to_ooml(docxml)
|
62
|
-
docxml.xpath("//*[local-name() = 'math']").each do |m|
|
63
|
-
@xslt.xml = m.to_s.
|
64
|
-
gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
|
65
|
-
ooml = @xslt.serve.gsub(/<\?[^>]+>\s*/, "").
|
66
|
-
gsub(/ xmlns:[^=]+="[^"]+"/, "")
|
67
|
-
m.swap(ooml)
|
68
|
-
end
|
69
|
-
end
|
70
|
-
|
71
50
|
NOKOHEAD = <<~HERE.freeze
|
72
51
|
<!DOCTYPE html SYSTEM
|
73
52
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
@@ -109,32 +88,6 @@ module Html2Doc
|
|
109
88
|
r
|
110
89
|
end
|
111
90
|
|
112
|
-
def self.image_resize(i, maxheight, maxwidth)
|
113
|
-
size = [i["width"].to_i, i["height"].to_i]
|
114
|
-
size = ImageSize.path(i["src"]).size if size[0].zero? && size[1].zero?
|
115
|
-
# max height for Word document is 400, max width is 680
|
116
|
-
if size[0] > maxheight
|
117
|
-
size = [maxheight, (size[1] * maxheight / size[0]).ceil]
|
118
|
-
end
|
119
|
-
if size[1] > maxwidth
|
120
|
-
size = [(size[0] * maxwidth / size[1]).ceil, maxwidth]
|
121
|
-
end
|
122
|
-
size
|
123
|
-
end
|
124
|
-
|
125
|
-
def self.image_cleanup(docxml, dir)
|
126
|
-
docxml.xpath("//*[local-name() = 'img']").each do |i|
|
127
|
-
matched = /\.(?<suffix>\S+)$/.match i["src"]
|
128
|
-
uuid = UUIDTools::UUID.random_create.to_s
|
129
|
-
new_full_filename = File.join(dir, "#{uuid}.#{matched[:suffix]}")
|
130
|
-
# presupposes that the image source is local
|
131
|
-
system "cp #{i['src']} #{new_full_filename}"
|
132
|
-
i["width"], i["height"] = image_resize(i, 400, 680)
|
133
|
-
i["src"] = new_full_filename
|
134
|
-
end
|
135
|
-
docxml
|
136
|
-
end
|
137
|
-
|
138
91
|
PRINT_VIEW = <<~XML.freeze
|
139
92
|
<!--[if gte mso 9]>
|
140
93
|
<xml>
|
@@ -151,7 +104,7 @@ module Html2Doc
|
|
151
104
|
def self.define_head1(docxml, dir)
|
152
105
|
docxml.xpath("//*[local-name() = 'head']").each do |h|
|
153
106
|
h.children.first.add_previous_sibling <<~XML
|
154
|
-
|
107
|
+
#{PRINT_VIEW}
|
155
108
|
<link rel="File-List" href="#{dir}/filelist.xml"/>
|
156
109
|
XML
|
157
110
|
end
|
@@ -176,12 +129,12 @@ module Html2Doc
|
|
176
129
|
xml.root.to_s
|
177
130
|
end
|
178
131
|
|
179
|
-
def self.define_head(docxml,
|
132
|
+
def self.define_head(docxml, hash)
|
180
133
|
title = docxml.at("//*[local-name() = 'head']/*[local-name() = 'title']")
|
181
134
|
head = docxml.at("//*[local-name() = 'head']")
|
182
|
-
css = stylesheet(filename, header_file,
|
135
|
+
css = stylesheet(hash[:filename], hash[:header_file], hash[:stylesheet])
|
183
136
|
add_stylesheet(head, title, css)
|
184
|
-
define_head1(docxml,
|
137
|
+
define_head1(docxml, hash[:dir1])
|
185
138
|
namespace(docxml.root)
|
186
139
|
end
|
187
140
|
|
@@ -204,18 +157,6 @@ module Html2Doc
|
|
204
157
|
root.add_namespace(nil, "http://www.w3.org/TR/REC-html40")
|
205
158
|
end
|
206
159
|
|
207
|
-
def self.generate_filelist(filename, dir)
|
208
|
-
File.open(File.join(dir, "filelist.xml"), "w") do |f|
|
209
|
-
f.write %{<xml xmlns:o="urn:schemas-microsoft-com:office:office">
|
210
|
-
<o:MainFile HRef="../#{filename}.htm"/>}
|
211
|
-
Dir.foreach(dir) do |item|
|
212
|
-
next if item == "." || item == ".." || /^\./.match(item)
|
213
|
-
f.write %{ <o:File HRef="#{item}"/>\n}
|
214
|
-
end
|
215
|
-
f.write("</xml>\n")
|
216
|
-
end
|
217
|
-
end
|
218
|
-
|
219
160
|
def self.msonormal(docxml)
|
220
161
|
docxml.xpath("//*[local-name() = 'p'][not(self::*[@class])]").each do |p|
|
221
162
|
p["class"] = "MsoNormal"
|
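The refactoring above replaces a long positional parameter list with a single options hash. The following sketch shows how trailing `key: value` pairs at the call site arrive as one Hash in a method of the shape `process(result, hash)`. The method `process_sketch` is hypothetical, and the `{filename}_files` default is taken from the README text rather than from the body of `create_dir`, which this diff does not show.

```ruby
# Hypothetical stand-in for a method with the signature `process(result, hash)`.
# Trailing `key: value` pairs at the call site are collected into one Hash.
def process_sketch(result, hash)
  # Keys that were not passed read as nil, so optional settings need no placeholders.
  dir = hash[:dir] || "#{hash[:filename]}_files"
  { bytes: result.bytesize, dir: dir, liststyles: hash[:liststyles] }
end

summary = process_sketch("<html></html>", filename: "test", liststyles: { ol: "l1" })
p summary
```

One consequence of this design is that a misspelt key is silently ignored rather than raising an error, so callers should check their option names against the keys the code actually reads.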
@@ -0,0 +1,39 @@
|
|
1
|
+
require "uuidtools"
|
2
|
+
require "asciimath"
|
3
|
+
require "htmlentities"
|
4
|
+
require "nokogiri"
|
5
|
+
require "xml/xslt"
|
6
|
+
require "pp"
|
7
|
+
|
8
|
+
module Html2Doc
|
9
|
+
def self.style_list(li, level, listno)
|
10
|
+
return unless listno
|
11
|
+
if li["style"]
|
12
|
+
li["style"] += ";"
|
13
|
+
else
|
14
|
+
li["style"] = ""
|
15
|
+
end
|
16
|
+
# I don't know what the lfo-n attribute is. I doubt Micro$oft now does either.
|
17
|
+
li["style"] += "mso-list:#{listno} level#{level} lfo1;"
|
18
|
+
end
|
19
|
+
|
20
|
+
def self.list_add(xpath, liststyles, listtype, level)
|
21
|
+
xpath.each do |list|
|
22
|
+
(list.xpath(".//li") - list.xpath(".//ol//li | .//ul//li")).each do |li|
|
23
|
+
style_list(li, level, liststyles[listtype])
|
24
|
+
list_add(li.xpath(".//ul") - li.xpath(".//ul//ul | .//ol//ul"), liststyles, :ul, level + 1)
|
25
|
+
list_add(li.xpath(".//ol") - li.xpath(".//ul//ol | .//ol//ol"), liststyles, :ol, level + 1)
|
26
|
+
end
|
27
|
+
end
|
28
|
+
end
|
29
|
+
|
30
|
+
def self.lists(docxml, liststyles)
|
31
|
+
return if liststyles.nil?
|
32
|
+
if liststyles.has_key?(:ul)
|
33
|
+
list_add(docxml.xpath("//ul[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ul, 1)
|
34
|
+
end
|
35
|
+
if liststyles.has_key?(:ol)
|
36
|
+
list_add(docxml.xpath("//ol[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ol, 1)
|
37
|
+
end
|
38
|
+
end
|
39
|
+
end
|
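The effect of `style_list` on a list item's `style` attribute can be tried without Nokogiri. In this sketch a plain Hash stands in for the `<li>` node (an assumption that works only because both respond to `[]` and `[]=`), and the method name `style_list_sketch` is hypothetical.

```ruby
# Stand-in for Html2Doc.style_list: a Hash plays the role of the <li> node.
def style_list_sketch(li, level, listno)
  return unless listno # no list style configured for this list type
  # Keep any existing inline style, terminated with a semicolon.
  li["style"] = li["style"] ? "#{li['style']};" : ""
  li["style"] += "mso-list:#{listno} level#{level} lfo1;"
end

plain  = {}
styled = { "style" => "color:red" }
style_list_sketch(plain, 1, "l1")
style_list_sketch(styled, 2, "l1")
puts plain["style"]
puts styled["style"]
```

The `level` argument grows by one for each nested list, which is how `list_add` maps nesting depth onto the `@list l1:levelN` rules in the stylesheet.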
@@ -0,0 +1,45 @@
|
|
1
|
+
require "uuidtools"
|
2
|
+
require "asciimath"
|
3
|
+
require "htmlentities"
|
4
|
+
require "nokogiri"
|
5
|
+
require "xml/xslt"
|
6
|
+
require "pp"
|
7
|
+
|
8
|
+
module Html2Doc
|
9
|
+
@xslt = XML::XSLT.new
|
10
|
+
@xslt.xsl = File.read(File.join(File.dirname(__FILE__), "mathml2omml.xsl"))
|
11
|
+
|
12
|
+
def self.asciimath_to_mathml1(x)
|
13
|
+
AsciiMath.parse(HTMLEntities.new.decode(x)).to_mathml.
|
14
|
+
gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
|
15
|
+
end
|
16
|
+
|
17
|
+
def self.asciimath_to_mathml(doc, delims)
|
18
|
+
return doc if delims.nil? || delims.size < 2
|
19
|
+
doc.split(/(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/).
|
20
|
+
each_slice(4).map do |a|
|
21
|
+
a[2].nil? || a[2] = asciimath_to_mathml1(a[2])
|
22
|
+
a.size > 1 ? a[0] + a[2] : a[0]
|
23
|
+
end.join
|
24
|
+
end
|
25
|
+
|
26
|
+
# random fixes that OOXML needs to render properly
|
27
|
+
def self.ooxml_cleanup(m)
|
28
|
+
m.xpath(".//xmlns:msup[name(preceding-sibling::*[1])='munderover']",
|
29
|
+
m.document.collect_namespaces).each do |x|
|
30
|
+
x1 = x.replace("<mrow></mrow>").first
|
31
|
+
x1.children = x
|
32
|
+
end
|
33
|
+
m.add_namespace(nil, "http://www.w3.org/1998/Math/MathML")
|
34
|
+
m.to_s
|
35
|
+
end
|
36
|
+
|
37
|
+
def self.mathml_to_ooml(docxml)
|
38
|
+
docxml.xpath("//*[local-name() = 'math']").each do |m|
|
39
|
+
@xslt.xml = ooxml_cleanup(m)
|
40
|
+
ooxml = @xslt.serve.gsub(/<\?[^>]+>\s*/, "").
|
41
|
+
gsub(/ xmlns:[^=]+="[^"]+"/, "")
|
42
|
+
m.swap(ooxml)
|
43
|
+
end
|
44
|
+
end
|
45
|
+
end
|
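The delimiter handling in `asciimath_to_mathml` can be exercised on its own. In this sketch a block stands in for the `AsciiMath.parse(...).to_mathml` call, and the name `convert_delimited` and the sample delimiters are assumptions for illustration.

```ruby
# Sketch of the splitting logic; the block replaces the real AsciiMath conversion.
def convert_delimited(doc, delims)
  return doc if delims.nil? || delims.size < 2
  pattern = /(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/
  # Splitting on a capturing group keeps the delimiters in the result, so each
  # slice of four is [text before, opening delimiter, math source, closing delimiter].
  doc.split(pattern).each_slice(4).map do |before, _open, math, _close|
    math.nil? ? before : before + yield(math)
  end.join
end

html = "<p>Area: (#(pi r^2)#) units</p>"
puts convert_delimited(html, ["(#(", ")#)"]) { |m| "<math>#{m}</math>" }
```

`Regexp.escape` matters here because delimiters such as `(#(` consist of characters that would otherwise be read as regular expression syntax; the 0.6.2 code interpolated them unescaped.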
data/lib/html2doc/mime.rb
CHANGED
@@ -1,6 +1,7 @@
|
|
1
1
|
require "uuidtools"
|
2
2
|
require "base64"
|
3
3
|
require "mime/types"
|
4
|
+
require "image_size"
|
4
5
|
|
5
6
|
module Html2Doc
|
6
7
|
def self.mime_preamble(boundary, filename, result)
|
@@ -49,10 +50,49 @@ module Html2Doc
|
|
49
50
|
mhtml = mime_preamble(boundary, filename, result)
|
50
51
|
mhtml += mime_attachment(boundary, filename, "filelist.xml", dir)
|
51
52
|
Dir.foreach(dir) do |item|
|
52
|
-
next if item == "." || item == ".." || /^\./.match(item) ||
|
53
|
+
next if item == "." || item == ".." || /^\./.match(item) ||
|
54
|
+
item == "filelist.xml"
|
53
55
|
mhtml += mime_attachment(boundary, filename, item, dir)
|
54
56
|
end
|
55
57
|
mhtml += "--#{boundary}--"
|
56
58
|
File.open("#{filename}.doc", "w") { |f| f.write mhtml }
|
57
59
|
end
|
60
|
+
|
61
|
+
def self.image_resize(i, maxheight, maxwidth)
|
62
|
+
size = [i["width"].to_i, i["height"].to_i]
|
63
|
+
size = ImageSize.path(i["src"]).size if size[0].zero? && size[1].zero?
|
64
|
+
# max height for Word document is 400, max width is 680
|
65
|
+
if size[0] > maxheight
|
66
|
+
size = [maxheight, (size[1] * maxheight / size[0]).ceil]
|
67
|
+
end
|
68
|
+
if size[1] > maxwidth
|
69
|
+
size = [(size[0] * maxwidth / size[1]).ceil, maxwidth]
|
70
|
+
end
|
71
|
+
size
|
72
|
+
end
|
73
|
+
|
74
|
+
def self.image_cleanup(docxml, dir)
|
75
|
+
docxml.xpath("//*[local-name() = 'img']").each do |i|
|
76
|
+
matched = /\.(?<suffix>\S+)$/.match i["src"]
|
77
|
+
uuid = UUIDTools::UUID.random_create.to_s
|
78
|
+
new_full_filename = File.join(dir, "#{uuid}.#{matched[:suffix]}")
|
79
|
+
# presupposes that the image source is local
|
80
|
+
system "cp #{i['src']} #{new_full_filename}"
|
81
|
+
i["width"], i["height"] = image_resize(i, 400, 680)
|
82
|
+
i["src"] = new_full_filename
|
83
|
+
end
|
84
|
+
docxml
|
85
|
+
end
|
86
|
+
|
87
|
+
def self.generate_filelist(filename, dir)
|
88
|
+
File.open(File.join(dir, "filelist.xml"), "w") do |f|
|
89
|
+
f.write %{<xml xmlns:o="urn:schemas-microsoft-com:office:office">
|
90
|
+
<o:MainFile HRef="../#{filename}.htm"/>}
|
91
|
+
Dir.entries(dir).sort.each do |item|
|
92
|
+
next if item == "." || item == ".." || /^\./.match(item)
|
93
|
+
f.write %{ <o:File HRef="#{item}"/>\n}
|
94
|
+
end
|
95
|
+
f.write("</xml>\n")
|
96
|
+
end
|
97
|
+
end
|
58
98
|
end
|
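The move from `Dir.foreach` to `Dir.entries(dir).sort` in `generate_filelist` gives the listing a stable order, since directory iteration order otherwise depends on the file system. The sketch below builds the listing as a string instead of writing `filelist.xml` to disk; the method name `filelist_xml` and the sample file names are assumptions, and the whitespace differs slightly from the gem's output.

```ruby
require "tmpdir"
require "fileutils"

# Sketch: build filelist.xml content from a folder, visiting entries in
# sorted order and skipping dotfiles (which also covers "." and "..").
def filelist_xml(filename, dir)
  items = Dir.entries(dir).sort.reject { |item| item.start_with?(".") }
  lines = items.map { |item| %(  <o:File HRef="#{item}"/>\n) }
  %(<xml xmlns:o="urn:schemas-microsoft-com:office:office">\n) +
    %( <o:MainFile HRef="../#{filename}.htm"/>\n) + lines.join + "</xml>\n"
end

Dir.mktmpdir do |dir|
  %w[b.png a.png .hidden].each { |f| FileUtils.touch(File.join(dir, f)) }
  puts filelist_xml("test", dir)
end
```

A deterministic order is presumably what allows the spec to compare the generated output against a fixed expected string.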
data/lib/html2doc/version.rb
CHANGED
data/spec/19160-8.jpg
ADDED
Binary file
|
data/spec/html2doc_spec.rb
CHANGED
@@ -97,59 +97,178 @@ Content-Type: text/html charset="utf-8"
|
|
97
97
|
PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiDQp4bWxuczpvPSJ1
|
98
98
|
cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTpvZmZpY2UiDQp4bWxuczp3PSJ1cm46c2No
|
99
99
|
ZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTp3b3JkIg0KeG1sbnM6bT0iaHR0cDovL3NjaGVtYXMu
|
100
|
-
|
101
|
-
|
102
|
-
|
103
|
-
|
104
|
-
|
105
|
-
|
106
|
-
|
107
|
-
|
108
|
-
|
109
|
-
|
110
|
-
|
111
|
-
|
112
|
-
|
113
|
-
|
114
|
-
|
115
|
-
|
116
|
-
|
117
|
-
|
118
|
-
|
119
|
-
|
120
|
-
bC1jaGFyYWN0ZXI6Zm9vdG5vdGUtc2VwYXJhdG9yJz48IVtpZiAhc3VwcG9ydEZvb3Rub3Rlc10+
|
121
|
-
DQoNCjxociBhbGlnbj1sZWZ0IHNpemU9MSB3aWR0aD0iMzMlIj4NCg0KPCFbZW5kaWZdPjwvc3Bh
|
122
|
-
bj48L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6ZW5kbm90
|
123
|
-
ZS1jb250aW51YXRpb24tc2VwYXJhdG9yJyBpZD1lY3M+DQoNCjxwIGNsYXNzPU1zb05vcm1hbD48
|
124
|
-
c3BhbiBsYW5nPUVOLVVTPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lhbC1jaGFyYWN0ZXI6Zm9vdG5v
|
100
|
+
bWljcm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIg0KeG1sbnM6bXY9Imh0dHA6Ly9tYWNW
|
101
|
+
bWxTY2hlbWFVcmkiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy9UUi9SRUMtaHRtbDQwIj4NCg0K
|
102
|
+
PGhlYWQ+DQo8bWV0YSBuYW1lPVRpdGxlIGNvbnRlbnQ9IiI+DQo8bWV0YSBuYW1lPUtleXdvcmRz
|
103
|
+
IGNvbnRlbnQ9IiI+DQo8bWV0YSBodHRwLWVxdWl2PUNvbnRlbnQtVHlwZSBjb250ZW50PSJ0ZXh0
|
104
|
+
L2h0bWw7IGNoYXJzZXQ9dXRmLTgiPg0KPG1ldGEgbmFtZT1Qcm9nSWQgY29udGVudD1Xb3JkLkRv
|
105
|
+
Y3VtZW50Pg0KPG1ldGEgbmFtZT1HZW5lcmF0b3IgY29udGVudD0iTWljcm9zb2Z0IFdvcmQgMTUi
|
106
|
+
Pg0KPG1ldGEgbmFtZT1PcmlnaW5hdG9yIGNvbnRlbnQ9Ik1pY3Jvc29mdCBXb3JkIDE1Ij4NCjxs
|
107
|
+
aW5rIGlkPU1haW4tRmlsZSByZWw9TWFpbi1GaWxlIGhyZWY9IkZJTEVOQU1FLmh0bWwiPg0KPCEt
|
108
|
+
LVtpZiBndGUgbXNvIDldPjx4bWw+DQogPG86c2hhcGVkZWZhdWx0cyB2OmV4dD0iZWRpdCIgc3Bp
|
109
|
+
ZG1heD0iMjA0OSIvPg0KPC94bWw+PCFbZW5kaWZdLS0+DQo8L2hlYWQ+DQoNCjxib2R5IGxhbmc9
|
110
|
+
RU4gbGluaz1ibHVlIHZsaW5rPSIjOTU0RjcyIj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6
|
111
|
+
Zm9vdG5vdGUtc2VwYXJhdG9yJyBpZD1mcz4NCg0KPHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdt
|
112
|
+
YXJnaW4tYm90dG9tOjBjbTttYXJnaW4tYm90dG9tOi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3Jt
|
113
|
+
YWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNwYW4gc3R5bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpm
|
114
|
+
b290bm90ZS1zZXBhcmF0b3InPjwhW2lmICFzdXBwb3J0Rm9vdG5vdGVzXT4NCg0KPGhyIGFsaWdu
|
115
|
+
PWxlZnQgc2l6ZT0xIHdpZHRoPSIzMyUiPg0KDQo8IVtlbmRpZl0+PC9zcGFuPjwvc3Bhbj48L3A+
|
116
|
+
DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxlPSdtc28tZWxlbWVudDpmb290bm90ZS1jb250aW51YXRp
|
117
|
+
b24tc2VwYXJhdG9yJyBpZD1mY3M+DQoNCjxwIGNsYXNzPU1zb05vcm1hbCBzdHlsZT0nbWFyZ2lu
|
118
|
+
LWJvdHRvbTowY207bWFyZ2luLWJvdHRvbTouMDAwMXB0O2xpbmUtaGVpZ2h0Og0Kbm9ybWFsJz48
|
119
|
+
c3BhbiBsYW5nPUVOLUdCPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lhbC1jaGFyYWN0ZXI6Zm9vdG5v
|
125
120
|
dGUtY29udGludWF0aW9uLXNlcGFyYXRvcic+PCFbaWYgIXN1cHBvcnRGb290bm90ZXNdPg0KDQo8
|
126
121
|
aHIgYWxpZ249bGVmdCBzaXplPTE+DQoNCjwhW2VuZGlmXT48L3NwYW4+PC9zcGFuPjwvcD4NCg0K
|
127
|
-
PC9kaXY+
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
139
|
-
|
140
|
-
|
141
|
-
|
122
|
+
PC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmVuZG5vdGUtc2VwYXJhdG9yJyBpZD1l
|
123
|
+
cz4NCg0KPHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdtYXJnaW4tYm90dG9tOjBjbTttYXJnaW4t
|
124
|
+
Ym90dG9tOi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNw
|
125
|
+
YW4gc3R5bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpmb290bm90ZS1zZXBhcmF0b3InPjwhW2lm
|
126
|
+
ICFzdXBwb3J0Rm9vdG5vdGVzXT4NCg0KPGhyIGFsaWduPWxlZnQgc2l6ZT0xIHdpZHRoPSIzMyUi
|
127
|
+
Pg0KDQo8IVtlbmRpZl0+PC9zcGFuPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
|
128
|
+
PSdtc28tZWxlbWVudDplbmRub3RlLWNvbnRpbnVhdGlvbi1zZXBhcmF0b3InIGlkPWVjcz4NCg0K
|
129
|
+
PHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdtYXJnaW4tYm90dG9tOjBjbTttYXJnaW4tYm90dG9t
|
130
|
+
Oi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNwYW4gc3R5
|
131
|
+
bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpmb290bm90ZS1jb250aW51YXRpb24tc2VwYXJhdG9y
|
132
|
+
Jz48IVtpZiAhc3VwcG9ydEZvb3Rub3Rlc10+DQoNCjxociBhbGlnbj1sZWZ0IHNpemU9MT4NCg0K
|
133
|
+
PCFbZW5kaWZdPjwvc3Bhbj48L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNv
|
134
|
+
LWVsZW1lbnQ6aGVhZGVyJyBpZD1laDE+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1sZWZ0
|
135
|
+
IHN0eWxlPSd0ZXh0LWFsaWduOmxlZnQ7bGluZS1oZWlnaHQ6MTIuMHB0Ow0KbXNvLWxpbmUtaGVp
|
136
|
+
Z2h0LXJ1bGU6ZXhhY3RseSc+PHNwYW4gbGFuZz1FTi1HQj5JU08vSUVDJm5ic3A7Q0QgMTczMDEt
|
137
|
+
MToyMDE2KEUpPC9zcGFuPjwvcD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50
|
138
|
+
OmhlYWRlcicgaWQ9aDE+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBzdHlsZT0nbWFyZ2luLWJvdHRv
|
139
|
+
bToxOC4wcHQnPjxzcGFuIGxhbmc9RU4tR0INCnN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1i
|
140
|
+
aWRpLWZvbnQtc2l6ZToxMS4wcHQ7Zm9udC13ZWlnaHQ6bm9ybWFsJz7CqQ0KSVNPL0lFQyZuYnNw
|
141
|
+
OzIwMTYmbmJzcDvigJMgQWxsIHJpZ2h0cyByZXNlcnZlZDwvc3Bhbj48c3BhbiBsYW5nPUVOLUdC
|
142
|
+
DQpzdHlsZT0nZm9udC13ZWlnaHQ6bm9ybWFsJz48bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwv
|
143
|
+
ZGl2Pg0KDQo8ZGl2IHN0eWxlPSdtc28tZWxlbWVudDpmb290ZXInIGlkPWVmMT4NCg0KPHAgY2xh
|
144
|
+
c3M9TXNvRm9vdGVyIHN0eWxlPSdtYXJnaW4tdG9wOjEyLjBwdDtsaW5lLWhlaWdodDoxMi4wcHQ7
|
145
|
+
bXNvLWxpbmUtaGVpZ2h0LXJ1bGU6DQpleGFjdGx5Jz48IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxi
|
146
|
+
IHN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuDQpsYW5nPUVOLUdCIHN0
|
147
|
+
eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpz
|
148
|
+
dHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtYmVnaW4nPjwvc3Bhbj48c3Bhbg0Kc3R5bGU9J21zby1z
|
149
|
+
cGFjZXJ1bjp5ZXMnPsKgPC9zcGFuPlBBR0U8c3BhbiBzdHlsZT0nbXNvLXNwYWNlcnVuOnllcyc+
|
150
|
+
wqDCoA0KPC9zcGFuPlwqIE1FUkdFRk9STUFUIDxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpmaWVs
|
151
|
+
ZC1zZXBhcmF0b3InPjwvc3Bhbj48L3NwYW4+PC9iPjwhW2VuZGlmXS0tPjxiDQpzdHlsZT0nbXNv
|
152
|
+
LWJpZGktZm9udC13ZWlnaHQ6bm9ybWFsJz48c3BhbiBsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNp
|
153
|
+
emU6MTAuMHB0Ow0KbXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4gc3R5bGU9J21zby1u
|
154
|
+
by1wcm9vZjp5ZXMnPjI8L3NwYW4+PC9zcGFuPjwvYj48IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxi
|
155
|
+
DQpzdHlsZT0nbXNvLWJpZGktZm9udC13ZWlnaHQ6bm9ybWFsJz48c3BhbiBsYW5nPUVOLUdCIHN0
|
156
|
+
eWxlPSdmb250LXNpemU6MTAuMHB0Ow0KbXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4g
|
157
|
+
c3R5bGU9J21zby1lbGVtZW50OmZpZWxkLWVuZCc+PC9zcGFuPjwvc3Bhbj48L2I+PCFbZW5kaWZd
|
158
|
+
LS0+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9u
|
159
|
+
dC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxlPSdtc28tdGFiLWNvdW50OjEnPsKgwqDCoMKgwqDC
|
160
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
161
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
162
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
163
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
164
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
165
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqAgPC9zcGFuPsKpDQpJ
|
166
|
+
U08vSUVDJm5ic3A7MjAxNiZuYnNwO+KAkyBBbGwgcmlnaHRzIHJlc2VydmVkPG86cD48L286cD48
|
167
|
+
L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6aGVhZGVyJyBp
|
168
|
+
ZD1laDI+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1sZWZ0IHN0eWxlPSd0ZXh0LWFsaWdu
|
169
|
+
OmxlZnQ7bGluZS1oZWlnaHQ6MTIuMHB0Ow0KbXNvLWxpbmUtaGVpZ2h0LXJ1bGU6ZXhhY3RseSc+
|
170
|
+
PHNwYW4gbGFuZz1FTi1HQj5JU08vSUVDJm5ic3A7Q0QgMTczMDEtMToyMDE2KEUpPC9zcGFuPjwv
|
171
|
+
cD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmhlYWRlcicgaWQ9aDI+DQoN
|
172
|
+
CjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1yaWdodCBzdHlsZT0ndGV4dC1hbGlnbjpyaWdodDts
|
173
|
+
aW5lLWhlaWdodDoxMi4wcHQ7DQptc28tbGluZS1oZWlnaHQtcnVsZTpleGFjdGx5Jz48c3BhbiBs
|
174
|
+
YW5nPUVOLUdCPklTTy9JRUMmbmJzcDtDRCAxNzMwMS0xOjIwMTYoRSk8L3NwYW4+PC9wPg0KDQo8
|
175
|
+
L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6Zm9vdGVyJyBpZD1lZjI+DQoNCjxwIGNs
|
176
|
+
YXNzPU1zb0Zvb3RlciBzdHlsZT0nbGluZS1oZWlnaHQ6MTIuMHB0O21zby1saW5lLWhlaWdodC1y
|
177
|
+
dWxlOmV4YWN0bHknPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4NCmxhbmc9RU4tR0Igc3R5
|
178
|
+
bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4NCnN0
|
179
|
+
eWxlPSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9zcGFuPjxzcGFuDQpzdHlsZT0nbXNvLXNw
|
180
|
+
YWNlcnVuOnllcyc+wqA8L3NwYW4+UEFHRTxzcGFuIHN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7C
|
181
|
+
oMKgDQo8L3NwYW4+XCogTUVSR0VGT1JNQVQgPHNwYW4gc3R5bGU9J21zby1lbGVtZW50OmZpZWxk
|
182
|
+
LXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48IVtlbmRpZl0tLT48c3Bhbg0KbGFuZz1FTi1HQiBz
|
183
|
+
dHlsZT0nZm9udC1zaXplOjEwLjBwdDttc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3Bhbg0K
|
184
|
+
c3R5bGU9J21zby1uby1wcm9vZjp5ZXMnPmlpPC9zcGFuPjwvc3Bhbj48IS0tW2lmIHN1cHBvcnRG
|
185
|
+
aWVsZHNdPjxzcGFuDQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRp
|
186
|
+
LWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48
|
187
|
+
L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFuIGxhbmc9RU4tR0INCnN0eWxlPSdmb250LXNp
|
188
|
+
emU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuIHN0eWxlPSdtc28tdGFi
|
189
|
+
LWNvdW50Og0KMSc+wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
190
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
191
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
192
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
193
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
194
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
195
|
+
wqDCoMKgwqDCoCA8L3NwYW4+wqkNCklTTy9JRUMmbmJzcDsyMDE2Jm5ic3A74oCTIEFsbCByaWdo
|
196
|
+
dHMgcmVzZXJ2ZWQ8bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
|
197
|
+
PSdtc28tZWxlbWVudDpmb290ZXInIGlkPWYyPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9
|
198
|
+
J2xpbmUtaGVpZ2h0OjEyLjBwdCc+PHNwYW4gbGFuZz1FTi1HQg0Kc3R5bGU9J2ZvbnQtc2l6ZTox
|
199
|
+
MC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+wqkgSVNPL0lFQyZuYnNwOzIwMTYmbmJz
|
200
|
+
cDvigJMgQWxsDQpyaWdodHMgcmVzZXJ2ZWQ8c3BhbiBzdHlsZT0nbXNvLXRhYi1jb3VudDoxJz7C
|
201
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
202
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
203
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
204
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
205
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
206
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoCA8L3Nw
|
207
|
+
YW4+PC9zcGFuPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9
|
208
|
+
J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxl
|
209
|
+
PSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9zcGFuPiBQQUdFPHNwYW4gc3R5bGU9J21zby1z
|
210
|
+
cGFjZXJ1bjp5ZXMnPsKgwqANCjwvc3Bhbj5cKiBNRVJHRUZPUk1BVCA8c3BhbiBzdHlsZT0nbXNv
|
211
|
+
LWVsZW1lbnQ6ZmllbGQtc2VwYXJhdG9yJz48L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFu
|
212
|
+
DQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZTox
|
213
|
+
MS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLW5vLXByb29mOnllcyc+aWlpPC9zcGFuPjwvc3Bhbj48
|
214
|
+
IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxzcGFuDQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6
|
215
|
+
MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLWVsZW1l
|
216
|
+
bnQ6ZmllbGQtZW5kJz48L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFuIGxhbmc9RU4tR0IN
|
217
|
+
CnN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxvOnA+
|
218
|
+
PC9vOnA+PC9zcGFuPjwvcD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmZv
|
219
|
+
b3RlcicgaWQ9ZWYzPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9J21hcmdpbi10b3A6MTIu
|
220
|
+
MHB0O2xpbmUtaGVpZ2h0OjEyLjBwdDttc28tbGluZS1oZWlnaHQtcnVsZToNCmV4YWN0bHknPjwh
|
221
|
+
LS1baWYgc3VwcG9ydEZpZWxkc10+PGIgc3R5bGU9J21zby1iaWRpLWZvbnQtd2VpZ2h0Om5vcm1h
|
222
|
+
bCc+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9u
|
223
|
+
dC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxlPSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9z
|
224
|
+
cGFuPjxzcGFuDQpzdHlsZT0nbXNvLXNwYWNlcnVuOnllcyc+wqA8L3NwYW4+UEFHRTxzcGFuIHN0
|
225
|
+
eWxlPSdtc28tc3BhY2VydW46eWVzJz7CoMKgDQo8L3NwYW4+XCogTUVSR0VGT1JNQVQgPHNwYW4g
|
226
|
+
c3R5bGU9J21zby1lbGVtZW50OmZpZWxkLXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48L2I+PCFb
|
227
|
+
ZW5kaWZdLS0+PGINCnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxh
|
228
|
+
bmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEu
|
229
|
+
MHB0Jz48c3BhbiBzdHlsZT0nbXNvLW5vLXByb29mOnllcyc+Mjwvc3Bhbj48L3NwYW4+PC9iPjwh
|
230
|
+
LS1baWYgc3VwcG9ydEZpZWxkc10+PGINCnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3Jt
|
231
|
+
YWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1m
|
232
|
+
b250LXNpemU6MTEuMHB0Jz48c3BhbiBzdHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48L3Nw
|
233
|
+
YW4+PC9zcGFuPjwvYj48IVtlbmRpZl0tLT48c3Bhbg0KbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1z
|
234
|
+
aXplOjEwLjBwdDttc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3Bhbg0Kc3R5bGU9J21zby10
|
235
|
+
YWItY291bnQ6MSc+wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
236
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
237
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
238
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
239
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
240
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
241
|
+
wqDCoMKgwqDCoCA8L3NwYW4+wqkNCklTTy9JRUMmbmJzcDsyMDE2Jm5ic3A74oCTIEFsbCByaWdo
|
242
|
+
dHMgcmVzZXJ2ZWQ8bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
|
243
|
+
PSdtc28tZWxlbWVudDpmb290ZXInIGlkPWYzPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9
|
244
|
+
J2xpbmUtaGVpZ2h0OjEyLjBwdCc+PHNwYW4gbGFuZz1FTi1HQg0Kc3R5bGU9J2ZvbnQtc2l6ZTox
|
245
|
+
MC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+wqkgSVNPL0lFQyZuYnNwOzIwMTYmbmJz
|
246
|
+
cDvigJMgQWxsDQpyaWdodHMgcmVzZXJ2ZWQ8c3BhbiBzdHlsZT0nbXNvLXRhYi1jb3VudDoxJz7C
|
247
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
248
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
249
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
250
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
|
251
|
+
oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
|
252
|
+
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgIDwv
|
253
|
+
c3Bhbj48L3NwYW4+PCEtLVtpZiBzdXBwb3J0RmllbGRzXT48Yg0Kc3R5bGU9J21zby1iaWRpLWZv
|
254
|
+
bnQtd2VpZ2h0Om5vcm1hbCc+PHNwYW4gbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1zaXplOjEwLjBw
|
255
|
+
dDsNCm1zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpm
|
256
|
+
aWVsZC1iZWdpbic+PC9zcGFuPg0KUEFHRTxzcGFuIHN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7C
|
257
|
+
oMKgIDwvc3Bhbj5cKiBNRVJHRUZPUk1BVCA8c3Bhbg0Kc3R5bGU9J21zby1lbGVtZW50OmZpZWxk
|
258
|
+
LXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48L2I+PCFbZW5kaWZdLS0+PGINCnN0eWxlPSdtc28t
|
259
|
+
YmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6
|
260
|
+
ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3BhbiBzdHlsZT0nbXNvLW5v
|
261
|
+
LXByb29mOnllcyc+Mzwvc3Bhbj48L3NwYW4+PC9iPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PGIN
|
262
|
+
CnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5
|
263
|
+
bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3BhbiBz
|
264
|
+
dHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48L3NwYW4+PC9zcGFuPjwvYj48IVtlbmRpZl0t
|
265
|
+
LT48c3Bhbg0KbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1zaXplOjEwLjBwdDttc28tYmlkaS1mb250
|
266
|
+
LXNpemU6MTEuMHB0Jz48bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8L2JvZHk+
|
267
|
+
DQoNCjwvaHRtbD4NCg==
|
142
268
|
|
143
269
|
------=_NextPart_--
|
144
270
|
FTR
|
145
271
|
|
146
|
-
WORD_FTR3 = <<~FTR
|
147
|
-
------=_NextPart_
|
148
|
-
Content-Location: file:///C:/Doc/test_files/609e8807-c2d0-450c-b60b-d995a0f8dcaf.png
|
149
|
-
Content-Transfer-Encoding: base64
|
150
|
-
Content-Type: image/png
|
151
|
-
FTR
|
152
|
-
|
153
272
|
WORD_FTR3 = <<~FTR
|
154
273
|
------=_NextPart_
|
155
274
|
Content-Location: file:///C:/Doc/test_files/filelist.xml
|
@@ -190,7 +309,7 @@ RSpec.describe Html2Doc do
|
|
190
309
|
end
|
191
310
|
|
192
311
|
it "processes a blank document" do
|
193
|
-
Html2Doc.process(html_input(""), "test"
|
312
|
+
Html2Doc.process(html_input(""), filename: "test")
|
194
313
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
195
314
|
to match_fuzzy(<<~OUTPUT)
|
196
315
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -200,14 +319,14 @@ RSpec.describe Html2Doc do
|
|
200
319
|
|
201
320
|
it "removes any temp files" do
|
202
321
|
File.delete("test.doc")
|
203
|
-
Html2Doc.process(html_input(""), "test"
|
322
|
+
Html2Doc.process(html_input(""), filename: "test")
|
204
323
|
expect(File.exist?("test.doc")).to be true
|
205
324
|
expect(File.exist?("test.htm")).to be false
|
206
325
|
expect(File.exist?("test_files")).to be false
|
207
326
|
end
|
208
327
|
|
209
328
|
it "processes a stylesheet in an HTML document with a title" do
|
210
|
-
Html2Doc.process(html_input(""), "test", "lib/html2doc/wordstyle.css"
|
329
|
+
Html2Doc.process(html_input(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
|
211
330
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
212
331
|
to match_fuzzy(<<~OUTPUT)
|
213
332
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -216,7 +335,7 @@ RSpec.describe Html2Doc do
|
|
216
335
|
end
|
217
336
|
|
218
337
|
it "processes a stylesheet in an HTML document without a title" do
|
219
|
-
Html2Doc.process(html_input_no_title(""), "test", "lib/html2doc/wordstyle.css"
|
338
|
+
Html2Doc.process(html_input_no_title(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
|
220
339
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
221
340
|
to match_fuzzy(<<~OUTPUT)
|
222
341
|
#{WORD_HDR.sub("<title>blank</title>", "")}
|
@@ -226,7 +345,7 @@ RSpec.describe Html2Doc do
|
|
226
345
|
end
|
227
346
|
|
228
347
|
it "processes a stylesheet in an HTML document with an empty head" do
|
229
|
-
Html2Doc.process(html_input_empty_head(""), "test", "lib/html2doc/wordstyle.css"
|
348
|
+
Html2Doc.process(html_input_empty_head(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
|
230
349
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
231
350
|
to match_fuzzy(<<~OUTPUT)
|
232
351
|
#{WORD_HDR.sub("<title>blank</title>", "")}
|
@@ -237,7 +356,7 @@ RSpec.describe Html2Doc do
|
|
237
356
|
end
|
238
357
|
|
239
358
|
it "processes a header" do
|
240
|
-
Html2Doc.process(html_input(""), "test",
|
359
|
+
Html2Doc.process(html_input(""), filename: "test", header_file: "spec/header.html")
|
241
360
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
242
361
|
to match_fuzzy(<<~OUTPUT)
|
243
362
|
#{WORD_HDR} #{DEFAULT_STYLESHEET.gsub(/FILENAME/, "test")}
|
@@ -248,7 +367,7 @@ RSpec.describe Html2Doc do
|
|
248
367
|
it "processes a populated document" do
|
249
368
|
simple_body = "<h1>Hello word!</h1>
|
250
369
|
<div>This is a very simple document</div>"
|
251
|
-
Html2Doc.process(html_input(simple_body), "test"
|
370
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
252
371
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
253
372
|
to match_fuzzy(<<~OUTPUT)
|
254
373
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -258,11 +377,23 @@ RSpec.describe Html2Doc do
|
|
258
377
|
end
|
259
378
|
|
260
379
|
it "processes AsciiMath" do
|
261
|
-
Html2Doc.process(html_input("<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), "test",
|
380
|
+
Html2Doc.process(html_input("<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
381
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
382
|
+
to match_fuzzy(<<~OUTPUT)
|
383
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
384
|
+
#{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=1</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup></m:e></m:nary><m:r><m:t>=</m:t></m:r><m:sSup><m:e><m:r><m:t>(</m:t></m:r><m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num><m:r><m:t>n</m:t></m:r><m:r><m:t>(n+1)</m:t></m:r></m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f><m:r><m:t>)</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup></m:oMath>
|
385
|
+
</div>', '<div style="mso-element:footnote-list"/>')}
|
386
|
+
#{WORD_FTR1}
|
387
|
+
OUTPUT
|
388
|
+
end
|
389
|
+
|
390
|
+
it "wraps msup after munderover in MathML" do
|
391
|
+
Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
|
392
|
+
<munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mn>2</mn><mrow><mi>i</mi></mrow></msup></math></div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
262
393
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
263
394
|
to match_fuzzy(<<~OUTPUT)
|
264
395
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
265
|
-
#{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=
|
396
|
+
#{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=0</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>2</m:t></m:r></m:e><m:sup><m:r><m:t>i</m:t></m:r></m:sup></m:sSup></m:e></m:nary></m:oMath>
|
266
397
|
</div>', '<div style="mso-element:footnote-list"/>')}
|
267
398
|
#{WORD_FTR1}
|
268
399
|
OUTPUT
|
@@ -271,7 +402,7 @@ RSpec.describe Html2Doc do
|
|
271
402
|
it "processes tabs" do
|
272
403
|
simple_body = "<h1>Hello word!</h1>
|
273
404
|
<div>This is a very &tab; simple document</div>"
|
274
|
-
Html2Doc.process(html_input(simple_body), "test"
|
405
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
275
406
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
276
407
|
to match_fuzzy(<<~OUTPUT)
|
277
408
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -284,7 +415,7 @@ RSpec.describe Html2Doc do
|
|
284
415
|
simple_body = '<h1>Hello word!</h1>
|
285
416
|
<p>This is a very simple document</p>
|
286
417
|
<p class="x">This style stays</p>'
|
287
|
-
Html2Doc.process(html_input(simple_body), "test"
|
418
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
288
419
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
289
420
|
to match_fuzzy(<<~OUTPUT)
|
290
421
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -299,7 +430,7 @@ RSpec.describe Html2Doc do
|
|
299
430
|
<li>This is a very simple document</li>
|
300
431
|
<li class="x">This style stays</li>
|
301
432
|
</ul>'
|
302
|
-
Html2Doc.process(html_input(simple_body), "test"
|
433
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
303
434
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
304
435
|
to match_fuzzy(<<~OUTPUT)
|
305
436
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -310,7 +441,7 @@ RSpec.describe Html2Doc do
|
|
310
441
|
|
311
442
|
it "resizes images for height" do
|
312
443
|
simple_body = '<img src="spec/19160-6.png">'
|
313
|
-
Html2Doc.process(html_input(simple_body), "test"
|
444
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
314
445
|
testdoc = File.read("test.doc", encoding: "utf-8")
|
315
446
|
expect(testdoc).to match(%r{Content-Type: image/png})
|
316
447
|
expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
|
@@ -322,7 +453,7 @@ RSpec.describe Html2Doc do
|
|
322
453
|
|
323
454
|
it "resizes images for width" do
|
324
455
|
simple_body = '<img src="spec/19160-7.gif">'
|
325
|
-
Html2Doc.process(html_input(simple_body), "test"
|
456
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
326
457
|
testdoc = File.read("test.doc", encoding: "utf-8")
|
327
458
|
expect(testdoc).to match(%r{Content-Type: image/gif})
|
328
459
|
expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
|
@@ -334,7 +465,7 @@ RSpec.describe Html2Doc do
|
|
334
465
|
|
335
466
|
it "resizes images for height" do
|
336
467
|
simple_body = '<img src="spec/19160-8.jpg">'
|
337
|
-
Html2Doc.process(html_input(simple_body), "test"
|
468
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
338
469
|
testdoc = File.read("test.doc", encoding: "utf-8")
|
339
470
|
expect(testdoc).to match(%r{Content-Type: image/jpeg})
|
340
471
|
expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
|
@@ -349,7 +480,7 @@ RSpec.describe Html2Doc do
|
|
349
480
|
document<a epub:type="footnote" href="#a1">1</a> allegedly<a epub:type="footnote" href="#a2">2</a></div>
|
350
481
|
<aside id="a1">Footnote</aside>
|
351
482
|
<aside id="a2">Other Footnote</aside>'
|
352
|
-
Html2Doc.process(html_input(simple_body), "test"
|
483
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
353
484
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
354
485
|
to match_fuzzy(<<~OUTPUT)
|
355
486
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -369,7 +500,7 @@ RSpec.describe Html2Doc do
|
|
369
500
|
document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
|
370
501
|
<aside id="a1">Footnote</aside>
|
371
502
|
<aside id="a2">Other Footnote</aside>'
|
372
|
-
Html2Doc.process(html_input(simple_body), "test"
|
503
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
373
504
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
374
505
|
to match_fuzzy(<<~OUTPUT)
|
375
506
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -389,7 +520,7 @@ RSpec.describe Html2Doc do
|
|
389
520
|
document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
|
390
521
|
<aside id="a1"><p>Footnote</p></aside>
|
391
522
|
<div id="a2"><p>Other Footnote</p></div>'
|
392
|
-
Html2Doc.process(html_input(simple_body), "test"
|
523
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
393
524
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
394
525
|
to match_fuzzy(<<~OUTPUT)
|
395
526
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
@@ -403,4 +534,21 @@ RSpec.describe Html2Doc do
|
|
403
534
|
#{WORD_FTR1}
|
404
535
|
OUTPUT
|
405
536
|
end
|
537
|
+
|
538
|
+
it "labels lists with list styles" do
|
539
|
+
simple_body = <<~BODY
|
540
|
+
<div><ul>
|
541
|
+
<li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
|
542
|
+
BODY
|
543
|
+
Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2"})
|
544
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
545
|
+
to match_fuzzy(<<~OUTPUT)
|
546
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
547
|
+
#{word_body('<div><ul>
|
548
|
+
<li style="mso-list:l1 level1 lfo1;" class="MsoNormal"><div><p class="MsoNormal"><ol><li style="mso-list:l2 level2 lfo1;" class="MsoNormal"><ul><li style="mso-list:l1 level3 lfo1;" class="MsoNormal"><p class="MsoNormal"><ol><li style="mso-list:l2 level4 lfo1;" class="MsoNormal"><ol><li style="mso-list:l2 level5 lfo1;" class="MsoNormal">A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>',
|
549
|
+
'<div style="mso-element:footnote-list"/>')}
|
550
|
+
#{WORD_FTR1}
|
551
|
+
OUTPUT
|
552
|
+
end
|
553
|
+
|
406
554
|
end
|
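Note on the spec changes above: every call to `Html2Doc.process` moves from positional arguments to keyword arguments (`filename:`, `stylesheet:`, `header_file:`, `asciimathdelims:`, `liststyles:`). The sketch below only illustrates that call shape. `ProcessSketch` is a hypothetical stand-in, not the gem's implementation. The `nil` and `{}` defaults, and making `filename:` mandatory, are assumptions; the diff does not show the real method signature.

```ruby
# Hypothetical stand-in for the keyword-argument call shape used in the
# updated specs. It performs no HTML-to-Word conversion; it only records
# the options it was given so the call shape can be inspected.
module ProcessSketch
  # filename: is required here (an assumption); the other keywords are
  # optional and their defaults are guesses, not taken from the gem.
  def self.process(html, filename:, stylesheet: nil, header_file: nil,
                   asciimathdelims: nil, liststyles: {})
    {
      html: html,
      output: "#{filename}.doc", # the specs read back "test.doc"
      stylesheet: stylesheet,
      header_file: header_file,
      asciimathdelims: asciimathdelims,
      liststyles: liststyles,
    }
  end
end

# Same option names as the "labels lists with list styles" spec.
opts = ProcessSketch.process("<html></html>", filename: "test",
                             liststyles: { ul: "l1", ol: "l2" })
puts opts[:output]
```

With keywords, a caller that only needs a header file no longer has to pass placeholder values for the arguments that precede it, which is likely why the specs were rewritten in this form.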
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: html2doc
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.6.
|
4
|
+
version: 0.6.5
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ribose Inc.
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-
|
11
|
+
date: 2018-03-08 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: htmlentities
|
@@ -305,6 +305,8 @@ files:
|
|
305
305
|
- html2doc.gemspec
|
306
306
|
- lib/html2doc.rb
|
307
307
|
- lib/html2doc/base.rb
|
308
|
+
- lib/html2doc/lists.rb
|
309
|
+
- lib/html2doc/math.rb
|
308
310
|
- lib/html2doc/mathml2omml.xsl
|
309
311
|
- lib/html2doc/mime.rb
|
310
312
|
- lib/html2doc/notes.rb
|
@@ -312,6 +314,7 @@ files:
|
|
312
314
|
- lib/html2doc/wordstyle.css
|
313
315
|
- spec/19160-6.png
|
314
316
|
- spec/19160-7.gif
|
317
|
+
- spec/19160-8.jpg
|
315
318
|
- spec/examples/header.html
|
316
319
|
- spec/examples/rice.doc
|
317
320
|
- spec/examples/rice.html
|