html2doc 0.6.2 → 0.6.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: 74ea5fa5a0e4221f38ded5491536c3b7fbeeb51b
-   data.tar.gz: 6cfa24874e5afe45854c7c86061b1d3a9c1cb80a
+   metadata.gz: a81b1c785ccf5bf053bcfc9bc447c56c950aa5d0
+   data.tar.gz: f6eecc7cb0d39c318d75d8b8ce15bd472fcbe730
  SHA512:
-   metadata.gz: d2aa3ea91ba1fe76ee540c85b0cb944ef3cc9eca4dd12f7884028fad0e41cdbc17b8e1ca17c6d110a73cb65e0dc214b8815f6296464c8a20399b00e28016717f
-   data.tar.gz: efa78fc0e9d6533ed21bd9e564cdad6175c128510417ccf43d02a429a3c397bdec8f50987548ce9c2db78204ddb641fdc2828b9f1c29e02cc90868c8f5a1e7fa
+   metadata.gz: 7b851e4484c4c76cd2401b1421599f4d9ceb6feece0298e9c2ddf75975fcc373174c4b1073e7efdeb94f60990dc8f73fe1a641c188f98550c0f0c789a0dbb879
+   data.tar.gz: 03bfaccc89670d711ee63dfafb8a7acae8815a18efb827ae512f0028d2ab9d58f878744ff4fe8135c385e3626e373cb0ada8fc34ac5ec3b8c73d26af94cfc19a
data/README.adoc CHANGED
@@ -5,23 +5,24 @@ image:https://img.shields.io/gem/v/html2doc.svg["Gem Version", link="https://rub
  image:https://img.shields.io/travis/riboseinc/html2doc/master.svg["Build Status", link="https://travis-ci.org/riboseinc/html2doc"]
  image:https://codeclimate.com/github/riboseinc/html2doc/badges/gpa.svg["Code Climate", link="https://codeclimate.com/github/riboseinc/html2doc"]

- Gem to convert an HTML document into a Word document (.doc) format. This is intended for automated generation of Microsoft Word documents, given HTML documents, which are mmuch more readily crafted.
+ Gem to convert an HTML document into a Word document (.doc) format. This is intended for automated generation of Microsoft Word documents, given HTML documents, which are much more readily crafted.

- This gem originated out of https://github.com/riboseinc/asciidoctor-iso, which creates a Word document from a Microsoft HTML document (created in turn by processing Asciidoc). The Microsoft HTML document is already quite close to Microsoft Word requirements, but future iterations of this gem will become more generic.
+ This gem originated out of https://github.com/riboseinc/asciidoctor-iso, which creates a Word document from a automatically generated HTML document (created in turn by processing Asciidoc).

  This work is driven by the Word document generation procedure documented in http://sebsauvage.net/wiki/doku.php?id=word_document_generation

  The gem currently does the following:

- * Convert any AsciiMath and MathML to Word's native mathematical formatting language.
- * Identify any footnotes in the document (through hyperlinks with `class = "Footnote"` or `epub:type = "footnote"`), and render them as Microsoft Word footnotes.
+ * Convert any AsciiMath and MathML to Word's native mathematical formatting language, OOXML. Word supports copy-pasting MathML into Word and converting it into OOXML; however the conversion is not infallible (we have found problems with `\sum`: Word claims parameters were missing, and inserting dotted squares to indicate as much), and you may need to post-edit the OOXML.
+ ** The gem does attempt to repair the MathML input, to bring it in line with Word's OOXML's expectations. If you find any issues with AsciiMath or MathML input, please raise an issue.
+ * Identify any footnotes in the document (defined as hyperlinks with attributes `class = "Footnote"` or `epub:type = "footnote"`), and render them as Microsoft Word footnotes.
  * Resize any images in the HTML file to fit within the maximum page size. (Word will otherwise crash on reading the document.)
  * Generate a filelist.xml listing of all files to be bundled into the Word document.
  * Assign the class `MsoNormal` to any paragraphs that do not have a class, so that they can be treated as Normal Style when editing the Word document.
- * Inject Microsoft Word-specific CSS into the HTML document. The CSS file used is at `lib/html2doc/wordstyle.css`, and can be customised. (This generic CSS can be overridden by CSS already in the HTML document, since the generic CSS is injected at the top of the document.)
+ * Inject Microsoft Word-specific CSS into the HTML document. If a CSS file is not supplied, the CSS file used is at `lib/html2doc/wordstyle.css` is used by default. Microsoft Word HTML has particular requirements from its CSS, and you should review the sample CSS before replacing it with your own. (This generic CSS can be overridden by CSS already in the HTML document, since the generic CSS is injected at the top of the document.)
  * Bundle up the images, the HTML file of the document proper, and the `header.html` file representing header/footer information, into a MIME file, and save that file to disk (so that Microsoft Word can deal with it as a Word file.)

- Future iterations will convert generic HTML to Microsoft-specific HTML. For a representative generator of Microsoft HTML, see https://github.com/riboseinc/asciidoctor-iso
+ For a representative generator of HTML that uses this gem in postprocessing, see https://github.com/riboseinc/asciidoctor-iso

  Work to be done:

@@ -31,7 +32,10 @@ Work to be done:

  This generates .doc documents. Future versions will upgrade the output to docx.

- There there are two other Microsoft Word vendors in the Ruby ecosystem. https://github.com/jetruby/puredocx generate Word documents from a ruby struct as a DSL, rather than converting a preexisting html document. That constrains it's coverage to what is explicitly catered for in the DSL. https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML), though it does already generate docx.
+ There there are two other Microsoft Word vendors in the Ruby ecosystem.
+
+ * https://github.com/jetruby/puredocx generate Word documents from a ruby struct as a DSL, rather than converting a preexisting html document. That constrains it's coverage to what is explicitly catered for in the DSL.
+ * https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML), though it does already generate docx.

  == Usage

@@ -39,18 +43,25 @@ There there are two other Microsoft Word vendors in the Ruby ecosystem. https://
  --
  require "html2doc"

- Html2Doc.process(result, filename, stylesheet, header_filename, dir, asciimathdelims = nil)
+ Html2Doc.process(result, filename: filename, stylesheet: stylesheet, header_filename: header_filename, dir: dir, asciimathdelims: asciimathdelims, liststyles: liststyles)
  --

  result:: is the Html document to be converted into Word, as a string.
  filename:: is the name the document is to be saved as, without a file suffix
- stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-specific styles. If this is not provided (`nil`), the program will used the default stylesheet included in the gem, `lib/html2doc/wordstyle.css`. The stylsheet provided must match this stylesheet; you can obtain one by saving a Word document with your desired styles to HTML, and extracting the style definitions from the HTML document header.
+ stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-specific styles. If this is not provided, the program will used the default stylesheet included in the gem, `lib/html2doc/wordstyle.css`. The stylsheet provided must match this stylesheet; you can obtain one by saving a Word document with your desired styles to HTML, and extracting the style definitions from the HTML document header.
  header_filename:: is the filename of the HTML document containing header and footer for the document, as well as footnote/endnote separators; if there is none, use nil. To generate your own such document, save a Word document with headers/footers and/or footnote/endnote separators as an HTML document; the `header.html` will be in the `{filename}.fld` folder generated along with the HTML. A sample file is available at https://github.com/riboseinc/asciidoctor-iso/blob/master/lib/asciidoctor/iso/word/header.html
- dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided (`nil`), it will be created as `{filename}_files`. Anything in the directory will be attached to the Word document; so this folder should only contain the images that accompany the document. (If the images are elsewhere on the local drive, the gem will move them into the folder.)
- asciimathdelims:: are the AsciiMath delimiters used in the text. If none are provided, no AsciiMath conversion is attempted.
+ dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided, it will be created as `{filename}_files`. Anything in the directory will be attached to the Word document; so this folder should only contain the images that accompany the document. (If the images are elsewhere on the local drive, the gem will move them into the folder.)
+ asciimathdelims:: are the AsciiMath delimiters used in the text (an array of an opening and a closing delimiter). If none are provided, no AsciiMath conversion is attempted.
+ liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul`, `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`.

  Note that the local CSS stylesheet file contains a variable `FILENAME` for the location of footnote/endnote separators and headers/footers, which are provided in the header HTML file. The gem replaces `FILENAME` with the file nane that the document will be saved as. If you supply your own stylesheet and also wish to use separators or headers/footers, you will likewise need to replace the document name mentioned in your stylesheet with a `FILENAME` string.

+ == Caveat
+
+ The good news with generating a Word document via HTML is that Word understands CSS, and you can determine much of what the Word document looks like by manipulating that CSS. That extends to features that are not part of HTML CSS: if you want to work out how to get Word to do something in CSS, save a Word document that already does what you want as HTML, and inspect the HTML and CSS you get.
+
+ The bad news is that Word's implementation of CSS is poorly documented (even if Office HTML is documented in a 1300 page document (online at https://stigmortenmyre.no/mso/, https://www.rodriguezcommaj.com/assets/resources/microsoft-office-html-and-xml-reference.pdf), and the CSS selectors are only partially and selectively implemented. For list styles, for example, `mso-level-text` governs how the list label is displayed; but it is only recognised in a `@list` style: it is ignored in a CSS rule like `ol li`, or in a `style` attribute on a node. Working out the right CSS for what you want will take some trial and error, and you are better placed to try to do things Word's way than the right way.
+
  == Example

  The `spec/examples` directory includes `rice.doc` and its source files: this Word document has been generated from `rice.html` through a call to html2doc from https://github.com/riboseinc/asciidoctor-iso. (The source document `rice.html` was itself generated from Asciidoc, rather than being hand-crafted.)
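
The Usage section of the README above switches to a keyword-style call. As a rough illustration only, the sketch below shows how such arguments arrive in a method declared with a single trailing hash parameter. `describe_call` is a hypothetical stand-in and does not call html2doc; the `_files` default is an assumption taken from the README's description of `dir`, not from the gem's `create_dir` code.

```ruby
# Hypothetical stand-in for a method shaped like `process(result, hash)`.
# It only reports which options it received; it is not part of html2doc.
def describe_call(result, hash)
  # Assumption from the README text: without :dir, ancillary files
  # are expected in "<filename>_files".
  dir = hash[:dir] || "#{hash[:filename]}_files"
  {
    input_size: result.bytesize,
    output: "#{hash[:filename]}.doc",
    dir: dir,
    asciimath: !hash[:asciimathdelims].nil?,
    liststyles: hash[:liststyles] || {},
  }
end

html = "<html><body><p>Hello</p></body></html>"

# Trailing key: value pairs are collected into one positional hash.
summary = describe_call(html, filename: "report",
                              asciimathdelims: ["{{", "}}"],
                              liststyles: { ol: "l1" })
```

Options that are left out simply read as `nil` from the hash, which is why the README can describe `stylesheet`, `dir`, `asciimathdelims` and `liststyles` as optional.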
data/lib/html2doc.rb CHANGED
@@ -2,3 +2,5 @@ require_relative "html2doc/version"
  require_relative "html2doc/base"
  require_relative "html2doc/mime"
  require_relative "html2doc/notes"
+ require_relative "html2doc/math"
+ require_relative "html2doc/lists"
data/lib/html2doc/base.rb CHANGED
@@ -1,6 +1,6 @@
  require "uuidtools"
  require "asciimath"
- require "image_size"
+ require "htmlentities"
  require "nokogiri"
  require "xml/xslt"
  require "pp"
@@ -9,16 +9,15 @@ module Html2Doc
    @xslt = XML::XSLT.new
    @xslt.xsl = File.read(File.join(File.dirname(__FILE__), "mathml2omml.xsl"))

-   def self.process(result, filename, stylesheet, header_file, dir = nil,
-                    asciimathdelims = nil)
-     dir1 = create_dir(filename, dir)
-     result = process_html(result, filename, stylesheet, header_file,
-                           dir1, asciimathdelims)
-     system "cp #{header_file} #{dir1}/header.html" unless header_file.nil?
-     generate_filelist(filename, dir1)
-     File.open("#{filename}.htm", "w") { |f| f.write(result) }
-     mime_package result, filename, dir1
-     rm_temp_files(filename, dir, dir1)
+   def self.process(result, hash)
+     hash[:dir1] = create_dir(hash[:filename], hash[:dir])
+     result = process_html(result, hash)
+     hash[:header_file].nil? ||
+       system("cp #{hash[:header_file]} #{hash[:dir1]}/header.html")
+     generate_filelist(hash[:filename], hash[:dir1])
+     File.open("#{hash[:filename]}.htm", "w") { |f| f.write(result) }
+     mime_package result, hash[:filename], hash[:dir1]
+     rm_temp_files(hash[:filename], hash[:dir], hash[:dir1])
    end

    def self.create_dir(filename, dir)
@@ -28,11 +27,9 @@ module Html2Doc
      dir
    end

-   def self.process_html(result, filename, stylesheet, header_file, dir,
-                         asciimathdelims)
-     # docxml = Nokogiri::XML(asciimath_to_mathml(result, asciimathdelims))
-     docxml = to_xhtml(asciimath_to_mathml(result, asciimathdelims))
-     define_head(cleanup(docxml, dir), dir, filename, stylesheet, header_file)
+   def self.process_html(result, hash)
+     docxml = to_xhtml(asciimath_to_mathml(result, hash[:asciimathdelims]))
+     define_head(cleanup(docxml, hash), hash)
      msword_fix(from_xhtml(docxml))
    end

@@ -41,33 +38,15 @@ module Html2Doc
      system "rm -r #{dir1}" unless dir
    end

-   def self.cleanup(docxml, dir)
-     image_cleanup(docxml, dir)
+   def self.cleanup(docxml, hash)
+     image_cleanup(docxml, hash[:dir1])
      mathml_to_ooml(docxml)
+     lists(docxml, hash[:liststyles])
      footnotes(docxml)
      msonormal(docxml)
      docxml
    end

-   def self.asciimath_to_mathml(doc, delims)
-     return doc if delims.nil? || delims.size < 2
-     doc.split(/(#{delims[0]}|#{delims[1]})/).each_slice(4).map do |a|
-       a[2].nil? || a[2] = AsciiMath.parse(a[2]).to_mathml.
-         gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
-       a.size > 1 ? a[0] + a[2] : a[0]
-     end.join
-   end
-
-   def self.mathml_to_ooml(docxml)
-     docxml.xpath("//*[local-name() = 'math']").each do |m|
-       @xslt.xml = m.to_s.
-         gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
-       ooml = @xslt.serve.gsub(/<\?[^>]+>\s*/, "").
-         gsub(/ xmlns:[^=]+="[^"]+"/, "")
-       m.swap(ooml)
-     end
-   end
-
    NOKOHEAD = <<~HERE.freeze
      <!DOCTYPE html SYSTEM
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
@@ -109,32 +88,6 @@ module Html2Doc
      r
    end

-   def self.image_resize(i, maxheight, maxwidth)
-     size = [i["width"].to_i, i["height"].to_i]
-     size = ImageSize.path(i["src"]).size if size[0].zero? && size[1].zero?
-     # max height for Word document is 400, max width is 680
-     if size[0] > maxheight
-       size = [maxheight, (size[1] * maxheight / size[0]).ceil]
-     end
-     if size[1] > maxwidth
-       size = [(size[0] * maxwidth / size[1]).ceil, maxwidth]
-     end
-     size
-   end
-
-   def self.image_cleanup(docxml, dir)
-     docxml.xpath("//*[local-name() = 'img']").each do |i|
-       matched = /\.(?<suffix>\S+)$/.match i["src"]
-       uuid = UUIDTools::UUID.random_create.to_s
-       new_full_filename = File.join(dir, "#{uuid}.#{matched[:suffix]}")
-       # presupposes that the image source is local
-       system "cp #{i['src']} #{new_full_filename}"
-       i["width"], i["height"] = image_resize(i, 400, 680)
-       i["src"] = new_full_filename
-     end
-     docxml
-   end
-
    PRINT_VIEW = <<~XML.freeze
      <!--[if gte mso 9]>
      <xml>
@@ -151,7 +104,7 @@ module Html2Doc
    def self.define_head1(docxml, dir)
      docxml.xpath("//*[local-name() = 'head']").each do |h|
        h.children.first.add_previous_sibling <<~XML
-         #{PRINT_VIEW}
+         #{PRINT_VIEW}
          <link rel="File-List" href="#{dir}/filelist.xml"/>
        XML
      end
@@ -176,12 +129,12 @@ module Html2Doc
      xml.root.to_s
    end

-   def self.define_head(docxml, dir, filename, cssname, header_file)
+   def self.define_head(docxml, hash)
      title = docxml.at("//*[local-name() = 'head']/*[local-name() = 'title']")
      head = docxml.at("//*[local-name() = 'head']")
-     css = stylesheet(filename, header_file, cssname)
+     css = stylesheet(hash[:filename], hash[:header_file], hash[:stylesheet])
      add_stylesheet(head, title, css)
-     define_head1(docxml, dir)
+     define_head1(docxml, hash[:dir1])
      namespace(docxml.root)
    end

@@ -204,18 +157,6 @@ module Html2Doc
      root.add_namespace(nil, "http://www.w3.org/TR/REC-html40")
    end

-   def self.generate_filelist(filename, dir)
-     File.open(File.join(dir, "filelist.xml"), "w") do |f|
-       f.write %{<xml xmlns:o="urn:schemas-microsoft-com:office:office">
-       <o:MainFile HRef="../#{filename}.htm"/>}
-       Dir.foreach(dir) do |item|
-         next if item == "." || item == ".." || /^\./.match(item)
-         f.write %{ <o:File HRef="#{item}"/>\n}
-       end
-       f.write("</xml>\n")
-     end
-   end
-
    def self.msonormal(docxml)
      docxml.xpath("//*[local-name() = 'p'][not(self::*[@class])]").each do |p|
        p["class"] = "MsoNormal"
@@ -0,0 +1,39 @@
+ require "uuidtools"
+ require "asciimath"
+ require "htmlentities"
+ require "nokogiri"
+ require "xml/xslt"
+ require "pp"
+
+ module Html2Doc
+   def self.style_list(li, level, listno)
+     return unless listno
+     if li["style"]
+       li["style"] += ";"
+     else
+       li["style"] = ""
+     end
+     # I don't know what the lfo-n attribute is. I doubt Micro$oft now does either.
+     li["style"] += "mso-list:#{listno} level#{level} lfo1;"
+   end
+
+   def self.list_add(xpath, liststyles, listtype, level)
+     xpath.each do |list|
+       (list.xpath(".//li") - list.xpath(".//ol//li | .//ul//li")).each do |li|
+         style_list(li, level, liststyles[listtype])
+         list_add(li.xpath(".//ul") - li.xpath(".//ul//ul | .//ol//ul"), liststyles, :ul, level + 1)
+         list_add(li.xpath(".//ol") - li.xpath(".//ul//ol | .//ol//ol"), liststyles, :ol, level + 1)
+       end
+     end
+   end
+
+   def self.lists(docxml, liststyles)
+     return if liststyles.nil?
+     if liststyles.has_key?(:ul)
+       list_add(docxml.xpath("//ul[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ul, 1)
+     end
+     if liststyles.has_key?(:ol)
+       list_add(docxml.xpath("//ol[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ol, 1)
+     end
+   end
+ end
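
To make the effect of `style_list` in the new file above easier to see, here is a minimal sketch of the string it appears to build, written against plain strings instead of Nokogiri nodes. `with_mso_list` is a hypothetical helper name, not part of the gem, and unlike the gem it does not add a separator to an empty `style` value.

```ruby
# Hypothetical helper: returns the style attribute value a list item would
# carry once a Word list id (e.g. "l1") and a nesting level are attached.
def with_mso_list(style, list_id, level)
  # No list id configured for this list type: leave the style untouched.
  return style if list_id.nil?
  prefix = style.nil? || style.empty? ? "" : "#{style};"
  # "lfo1" is copied verbatim from the gem's own string.
  "#{prefix}mso-list:#{list_id} level#{level} lfo1;"
end

top_level = with_mso_list(nil, "l1", 1)
nested    = with_mso_list("color:red", "l1", 2)
```

Here `top_level` is the value for a first-level item with no prior style, and `nested` shows an existing declaration being kept in front of the `mso-list` declaration.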
@@ -0,0 +1,45 @@
+ require "uuidtools"
+ require "asciimath"
+ require "htmlentities"
+ require "nokogiri"
+ require "xml/xslt"
+ require "pp"
+
+ module Html2Doc
+   @xslt = XML::XSLT.new
+   @xslt.xsl = File.read(File.join(File.dirname(__FILE__), "mathml2omml.xsl"))
+
+   def self.asciimath_to_mathml1(x)
+     AsciiMath.parse(HTMLEntities.new.decode(x)).to_mathml.
+       gsub(/<math>/, "<math xmlns='http://www.w3.org/1998/Math/MathML'>")
+   end
+
+   def self.asciimath_to_mathml(doc, delims)
+     return doc if delims.nil? || delims.size < 2
+     doc.split(/(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/).
+       each_slice(4).map do |a|
+       a[2].nil? || a[2] = asciimath_to_mathml1(a[2])
+       a.size > 1 ? a[0] + a[2] : a[0]
+     end.join
+   end
+
+   # random fixes that OOXML needs to render properly
+   def self.ooxml_cleanup(m)
+     m.xpath(".//xmlns:msup[name(preceding-sibling::*[1])='munderover']",
+             m.document.collect_namespaces).each do |x|
+       x1 = x.replace("<mrow></mrow>").first
+       x1.children = x
+     end
+     m.add_namespace(nil, "http://www.w3.org/1998/Math/MathML")
+     m.to_s
+   end
+
+   def self.mathml_to_ooml(docxml)
+     docxml.xpath("//*[local-name() = 'math']").each do |m|
+       @xslt.xml = ooxml_cleanup(m)
+       ooxml = @xslt.serve.gsub(/<\?[^>]+>\s*/, "").
+         gsub(/ xmlns:[^=]+="[^"]+"/, "")
+       m.swap(ooxml)
+     end
+   end
+ end
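
The delimiter handling in `asciimath_to_mathml` above relies on splitting with a capturing pattern and walking the pieces four at a time. The following is a minimal sketch of that technique with a stand-in converter in place of the AsciiMath parser; `convert_delimited` and `wrap` are hypothetical names, and the sketch passes unterminated text through unchanged, which is a choice made here rather than the gem's behaviour.

```ruby
# Replace every span between an opening and a closing delimiter with the
# result of the block; text outside the delimiters is left untouched.
def convert_delimited(doc, delims)
  return doc if delims.nil? || delims.size < 2
  # Regexp.escape keeps characters such as "(" literal inside the pattern.
  pattern = /(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/
  # With a capture group, split keeps the delimiters, so each slice of four
  # is [text before, opening delimiter, delimited content, closing delimiter].
  doc.split(pattern).each_slice(4).map do |before, _open, content, _close|
    content.nil? ? before : before + yield(content)
  end.join
end

# Stand-in converter; the gem parses the content as AsciiMath at this point.
wrap = ->(src) { "<math>#{src}</math>" }

converted = convert_delimited("a (#(x^2)#) b (#(y)#) c", ["(#(", ")#)"]) do |src|
  wrap.call(src)
end
```

The delimiters chosen for the example contain parentheses, which is the case the `Regexp.escape` calls added in this version guard against: interpolated unescaped, they would be read as regular-expression syntax rather than literal text.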
data/lib/html2doc/mime.rb CHANGED
@@ -1,6 +1,7 @@
  require "uuidtools"
  require "base64"
  require "mime/types"
+ require "image_size"

  module Html2Doc
    def self.mime_preamble(boundary, filename, result)
@@ -49,10 +50,49 @@ module Html2Doc
      mhtml = mime_preamble(boundary, filename, result)
      mhtml += mime_attachment(boundary, filename, "filelist.xml", dir)
      Dir.foreach(dir) do |item|
-       next if item == "." || item == ".." || /^\./.match(item) || item == "filelist.xml"
+       next if item == "." || item == ".." || /^\./.match(item) ||
+         item == "filelist.xml"
        mhtml += mime_attachment(boundary, filename, item, dir)
      end
      mhtml += "--#{boundary}--"
      File.open("#{filename}.doc", "w") { |f| f.write mhtml }
    end
+
+   def self.image_resize(i, maxheight, maxwidth)
+     size = [i["width"].to_i, i["height"].to_i]
+     size = ImageSize.path(i["src"]).size if size[0].zero? && size[1].zero?
+     # max height for Word document is 400, max width is 680
+     if size[0] > maxheight
+       size = [maxheight, (size[1] * maxheight / size[0]).ceil]
+     end
+     if size[1] > maxwidth
+       size = [(size[0] * maxwidth / size[1]).ceil, maxwidth]
+     end
+     size
+   end
+
+   def self.image_cleanup(docxml, dir)
+     docxml.xpath("//*[local-name() = 'img']").each do |i|
+       matched = /\.(?<suffix>\S+)$/.match i["src"]
+       uuid = UUIDTools::UUID.random_create.to_s
+       new_full_filename = File.join(dir, "#{uuid}.#{matched[:suffix]}")
+       # presupposes that the image source is local
+       system "cp #{i['src']} #{new_full_filename}"
+       i["width"], i["height"] = image_resize(i, 400, 680)
+       i["src"] = new_full_filename
+     end
+     docxml
+   end
+
+   def self.generate_filelist(filename, dir)
+     File.open(File.join(dir, "filelist.xml"), "w") do |f|
+       f.write %{<xml xmlns:o="urn:schemas-microsoft-com:office:office">
+       <o:MainFile HRef="../#{filename}.htm"/>}
+       Dir.entries(dir).sort.each do |item|
+         next if item == "." || item == ".." || /^\./.match(item)
+         f.write %{ <o:File HRef="#{item}"/>\n}
+       end
+       f.write("</xml>\n")
+     end
+   end
  end
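
The `image_resize` method moved into this file scales an image down in two passes while keeping its proportions. As a hedged illustration of that general technique, not a copy of the gem's method, the sketch below uses explicit width and height names and float division. `fit_within` is a hypothetical name; the limits 400 and 680 come from the `image_resize(i, 400, 680)` call, and pairing 400 with width and 680 with height is an assumption.

```ruby
# Scale [width, height] down so that neither exceeds its limit,
# preserving the aspect ratio. Images already within bounds are unchanged.
def fit_within(width, height, max_width, max_height)
  if width > max_width
    # Shrink the height by the same factor as the width.
    height = (height * max_width / width.to_f).ceil
    width = max_width
  end
  if height > max_height
    # Second pass: the height may still be too large after the first.
    width = (width * max_height / height.to_f).ceil
    height = max_height
  end
  [width, height]
end

landscape = fit_within(800, 600, 400, 680)
portrait  = fit_within(300, 1360, 400, 680)
small     = fit_within(100, 100, 400, 680)
```

Applying the width limit first and the height limit second means a very tall image is only reduced once, by the tighter of the two constraints.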
@@ -1,3 +1,3 @@
  module Html2Doc
-   VERSION = "0.6.2".freeze
+   VERSION = "0.6.5".freeze
  end
data/spec/19160-8.jpg ADDED
Binary file
@@ -97,59 +97,178 @@ Content-Type: text/html charset="utf-8"
97
97
  PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiDQp4bWxuczpvPSJ1
98
98
  cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTpvZmZpY2UiDQp4bWxuczp3PSJ1cm46c2No
99
99
  ZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTp3b3JkIg0KeG1sbnM6bT0iaHR0cDovL3NjaGVtYXMu
100
- bWljcm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIg0KeG1sbnM9Imh0dHA6Ly93d3cudzMu
101
- b3JnL1RSL1JFQy1odG1sNDAiPg0KDQo8aGVhZD4NCjxtZXRhIGh0dHAtZXF1aXY9Q29udGVudC1U
102
- eXBlIGNvbnRlbnQ9InRleHQvaHRtbDsgY2hhcnNldD11dGYtOCI+DQo8bWV0YSBuYW1lPVByb2dJ
103
- ZCBjb250ZW50PVdvcmQuRG9jdW1lbnQ+DQo8bWV0YSBuYW1lPUdlbmVyYXRvciBjb250ZW50PSJN
104
- aWNyb3NvZnQgV29yZCAxNSI+DQo8bWV0YSBuYW1lPU9yaWdpbmF0b3IgY29udGVudD0iTWljcm9z
105
- b2Z0IFdvcmQgMTUiPg0KPGxpbmsgaWQ9TWFpbi1GaWxlIHJlbD1NYWluLUZpbGUgaHJlZj0iLi4v
106
- cmljZS5nYi5odG1sIj4NCjwhLS1baWYgZ3RlIG1zbyA5XT48eG1sPg0KIDxvOnNoYXBlZGVmYXVs
107
- dHMgdjpleHQ9ImVkaXQiIHNwaWRtYXg9IjIwNDkiLz4NCjwveG1sPjwhW2VuZGlmXS0tPg0KPC9o
108
- ZWFkPg0KDQo8Ym9keSBsYW5nPVpIIGxpbms9Ymx1ZSB2bGluaz1wdXJwbGU+DQoNCjxkaXYgc3R5
109
- bGU9J21zby1lbGVtZW50OmZvb3Rub3RlLXNlcGFyYXRvcicgaWQ9ZnM+DQoNCjxwIGNsYXNzPU1z
110
- b05vcm1hbD48c3BhbiBsYW5nPUVOLVVTPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lhbC1jaGFyYWN0
111
- ZXI6Zm9vdG5vdGUtc2VwYXJhdG9yJz48IVtpZiAhc3VwcG9ydEZvb3Rub3Rlc10+DQoNCjxociBh
112
- bGlnbj1sZWZ0IHNpemU9MSB3aWR0aD0iMzMlIj4NCg0KPCFbZW5kaWZdPjwvc3Bhbj48L3NwYW4+
113
- PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6Zm9vdG5vdGUtY29udGlu
114
- dWF0aW9uLXNlcGFyYXRvcicgaWQ9ZmNzPg0KDQo8cCBjbGFzcz1Nc29Ob3JtYWw+PHNwYW4gbGFu
115
- Zz1FTi1VUz48c3BhbiBzdHlsZT0nbXNvLXNwZWNpYWwtY2hhcmFjdGVyOmZvb3Rub3RlLWNvbnRp
116
- bnVhdGlvbi1zZXBhcmF0b3InPjwhW2lmICFzdXBwb3J0Rm9vdG5vdGVzXT4NCg0KPGhyIGFsaWdu
117
- PWxlZnQgc2l6ZT0xPg0KDQo8IVtlbmRpZl0+PC9zcGFuPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0K
118
- DQo8ZGl2IHN0eWxlPSdtc28tZWxlbWVudDplbmRub3RlLXNlcGFyYXRvcicgaWQ9ZXM+DQoNCjxw
119
- IGNsYXNzPU1zb05vcm1hbD48c3BhbiBsYW5nPUVOLVVTPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lh
120
- bC1jaGFyYWN0ZXI6Zm9vdG5vdGUtc2VwYXJhdG9yJz48IVtpZiAhc3VwcG9ydEZvb3Rub3Rlc10+
121
- DQoNCjxociBhbGlnbj1sZWZ0IHNpemU9MSB3aWR0aD0iMzMlIj4NCg0KPCFbZW5kaWZdPjwvc3Bh
122
- bj48L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6ZW5kbm90
123
- ZS1jb250aW51YXRpb24tc2VwYXJhdG9yJyBpZD1lY3M+DQoNCjxwIGNsYXNzPU1zb05vcm1hbD48
124
- c3BhbiBsYW5nPUVOLVVTPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lhbC1jaGFyYWN0ZXI6Zm9vdG5v
100
+ bWljcm9zb2Z0LmNvbS9vZmZpY2UvMjAwNC8xMi9vbW1sIg0KeG1sbnM6bXY9Imh0dHA6Ly9tYWNW
101
+ bWxTY2hlbWFVcmkiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy9UUi9SRUMtaHRtbDQwIj4NCg0K
102
+ PGhlYWQ+DQo8bWV0YSBuYW1lPVRpdGxlIGNvbnRlbnQ9IiI+DQo8bWV0YSBuYW1lPUtleXdvcmRz
103
+ IGNvbnRlbnQ9IiI+DQo8bWV0YSBodHRwLWVxdWl2PUNvbnRlbnQtVHlwZSBjb250ZW50PSJ0ZXh0
104
+ L2h0bWw7IGNoYXJzZXQ9dXRmLTgiPg0KPG1ldGEgbmFtZT1Qcm9nSWQgY29udGVudD1Xb3JkLkRv
105
+ Y3VtZW50Pg0KPG1ldGEgbmFtZT1HZW5lcmF0b3IgY29udGVudD0iTWljcm9zb2Z0IFdvcmQgMTUi
106
+ Pg0KPG1ldGEgbmFtZT1PcmlnaW5hdG9yIGNvbnRlbnQ9Ik1pY3Jvc29mdCBXb3JkIDE1Ij4NCjxs
107
+ aW5rIGlkPU1haW4tRmlsZSByZWw9TWFpbi1GaWxlIGhyZWY9IkZJTEVOQU1FLmh0bWwiPg0KPCEt
108
+ LVtpZiBndGUgbXNvIDldPjx4bWw+DQogPG86c2hhcGVkZWZhdWx0cyB2OmV4dD0iZWRpdCIgc3Bp
109
+ ZG1heD0iMjA0OSIvPg0KPC94bWw+PCFbZW5kaWZdLS0+DQo8L2hlYWQ+DQoNCjxib2R5IGxhbmc9
110
+ RU4gbGluaz1ibHVlIHZsaW5rPSIjOTU0RjcyIj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6
111
+ Zm9vdG5vdGUtc2VwYXJhdG9yJyBpZD1mcz4NCg0KPHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdt
112
+ YXJnaW4tYm90dG9tOjBjbTttYXJnaW4tYm90dG9tOi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3Jt
113
+ YWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNwYW4gc3R5bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpm
114
+ b290bm90ZS1zZXBhcmF0b3InPjwhW2lmICFzdXBwb3J0Rm9vdG5vdGVzXT4NCg0KPGhyIGFsaWdu
115
+ PWxlZnQgc2l6ZT0xIHdpZHRoPSIzMyUiPg0KDQo8IVtlbmRpZl0+PC9zcGFuPjwvc3Bhbj48L3A+
116
+ DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxlPSdtc28tZWxlbWVudDpmb290bm90ZS1jb250aW51YXRp
117
+ b24tc2VwYXJhdG9yJyBpZD1mY3M+DQoNCjxwIGNsYXNzPU1zb05vcm1hbCBzdHlsZT0nbWFyZ2lu
118
+ LWJvdHRvbTowY207bWFyZ2luLWJvdHRvbTouMDAwMXB0O2xpbmUtaGVpZ2h0Og0Kbm9ybWFsJz48
119
+ c3BhbiBsYW5nPUVOLUdCPjxzcGFuIHN0eWxlPSdtc28tc3BlY2lhbC1jaGFyYWN0ZXI6Zm9vdG5v
125
120
  dGUtY29udGludWF0aW9uLXNlcGFyYXRvcic+PCFbaWYgIXN1cHBvcnRGb290bm90ZXNdPg0KDQo8
126
121
  aHIgYWxpZ249bGVmdCBzaXplPTE+DQoNCjwhW2VuZGlmXT48L3NwYW4+PC9zcGFuPjwvcD4NCg0K
127
- PC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmhlYWRlcicgaWQ9aDI+DQoNCjxwIGNs
128
- YXNzPU1zb0hlYWRlcj48c3BhbiBsYW5nPUVOLVVTPkRCMTEvQ0QgMTczMDEtMTwvc3Bhbj48c3Bh
129
- biBsYW5nPUVOLVVTDQpzdHlsZT0nZm9udC1mYW1pbHk6IlRpbWVzIE5ldyBSb21hbiIsc2VyaWY7
130
- bXNvLWFzY2lpLWZvbnQtZmFtaWx5OlNpbUhlaSc+4oCUPC9zcGFuPjxzcGFuDQpsYW5nPUVOLVVT
131
- PjIwMTY8L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6Zm9v
132
- dGVyJyBpZD1mMj4NCg0KPHAgY2xhc3M9TXNvRm9vdGVyPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+
133
- PHNwYW4gbGFuZz1FTi1VUz48c3BhbiBzdHlsZT0nbXNvLWVsZW1lbnQ6DQpmaWVsZC1iZWdpbic+
134
- PC9zcGFuPjxzcGFuIHN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7CoDwvc3Bhbj5QQUdFPHNwYW4N
135
- CnN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7CoCA8L3NwYW4+XCogTUVSR0VGT1JNQVQgPHNwYW4g
136
- c3R5bGU9J21zby1lbGVtZW50OmZpZWxkLXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48IVtlbmRp
137
- Zl0tLT48c3Bhbg0KbGFuZz1lbCBzdHlsZT0nbXNvLWFuc2ktbGFuZ3VhZ2U6IzA0MDA7bXNvLWZh
138
- cmVhc3QtbGFuZ3VhZ2U6IzA0MDA7bXNvLW5vLXByb29mOg0KeWVzJz40Mjwvc3Bhbj48IS0tW2lm
139
- IHN1cHBvcnRGaWVsZHNdPjxzcGFuIGxhbmc9RU4tVVM+PHNwYW4gc3R5bGU9J21zby1lbGVtZW50
140
- Og0KZmllbGQtZW5kJz48L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjwvcD4NCg0KPC9kaXY+DQoN
141
- CjwvYm9keT4NCg0KPC9odG1sPg0K
122
+ PC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmVuZG5vdGUtc2VwYXJhdG9yJyBpZD1l
123
+ cz4NCg0KPHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdtYXJnaW4tYm90dG9tOjBjbTttYXJnaW4t
124
+ Ym90dG9tOi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNw
125
+ YW4gc3R5bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpmb290bm90ZS1zZXBhcmF0b3InPjwhW2lm
126
+ ICFzdXBwb3J0Rm9vdG5vdGVzXT4NCg0KPGhyIGFsaWduPWxlZnQgc2l6ZT0xIHdpZHRoPSIzMyUi
127
+ Pg0KDQo8IVtlbmRpZl0+PC9zcGFuPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
128
+ PSdtc28tZWxlbWVudDplbmRub3RlLWNvbnRpbnVhdGlvbi1zZXBhcmF0b3InIGlkPWVjcz4NCg0K
129
+ PHAgY2xhc3M9TXNvTm9ybWFsIHN0eWxlPSdtYXJnaW4tYm90dG9tOjBjbTttYXJnaW4tYm90dG9t
130
+ Oi4wMDAxcHQ7bGluZS1oZWlnaHQ6DQpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0I+PHNwYW4gc3R5
131
+ bGU9J21zby1zcGVjaWFsLWNoYXJhY3Rlcjpmb290bm90ZS1jb250aW51YXRpb24tc2VwYXJhdG9y
132
+ Jz48IVtpZiAhc3VwcG9ydEZvb3Rub3Rlc10+DQoNCjxociBhbGlnbj1sZWZ0IHNpemU9MT4NCg0K
133
+ PCFbZW5kaWZdPjwvc3Bhbj48L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNv
134
+ LWVsZW1lbnQ6aGVhZGVyJyBpZD1laDE+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1sZWZ0
135
+ IHN0eWxlPSd0ZXh0LWFsaWduOmxlZnQ7bGluZS1oZWlnaHQ6MTIuMHB0Ow0KbXNvLWxpbmUtaGVp
136
+ Z2h0LXJ1bGU6ZXhhY3RseSc+PHNwYW4gbGFuZz1FTi1HQj5JU08vSUVDJm5ic3A7Q0QgMTczMDEt
137
+ MToyMDE2KEUpPC9zcGFuPjwvcD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50
138
+ OmhlYWRlcicgaWQ9aDE+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBzdHlsZT0nbWFyZ2luLWJvdHRv
139
+ bToxOC4wcHQnPjxzcGFuIGxhbmc9RU4tR0INCnN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1i
140
+ aWRpLWZvbnQtc2l6ZToxMS4wcHQ7Zm9udC13ZWlnaHQ6bm9ybWFsJz7CqQ0KSVNPL0lFQyZuYnNw
141
+ OzIwMTYmbmJzcDvigJMgQWxsIHJpZ2h0cyByZXNlcnZlZDwvc3Bhbj48c3BhbiBsYW5nPUVOLUdC
142
+ DQpzdHlsZT0nZm9udC13ZWlnaHQ6bm9ybWFsJz48bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwv
143
+ ZGl2Pg0KDQo8ZGl2IHN0eWxlPSdtc28tZWxlbWVudDpmb290ZXInIGlkPWVmMT4NCg0KPHAgY2xh
144
+ c3M9TXNvRm9vdGVyIHN0eWxlPSdtYXJnaW4tdG9wOjEyLjBwdDtsaW5lLWhlaWdodDoxMi4wcHQ7
145
+ bXNvLWxpbmUtaGVpZ2h0LXJ1bGU6DQpleGFjdGx5Jz48IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxi
146
+ IHN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuDQpsYW5nPUVOLUdCIHN0
147
+ eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpz
148
+ dHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtYmVnaW4nPjwvc3Bhbj48c3Bhbg0Kc3R5bGU9J21zby1z
149
+ cGFjZXJ1bjp5ZXMnPsKgPC9zcGFuPlBBR0U8c3BhbiBzdHlsZT0nbXNvLXNwYWNlcnVuOnllcyc+
150
+ wqDCoA0KPC9zcGFuPlwqIE1FUkdFRk9STUFUIDxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpmaWVs
151
+ ZC1zZXBhcmF0b3InPjwvc3Bhbj48L3NwYW4+PC9iPjwhW2VuZGlmXS0tPjxiDQpzdHlsZT0nbXNv
152
+ LWJpZGktZm9udC13ZWlnaHQ6bm9ybWFsJz48c3BhbiBsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNp
153
+ emU6MTAuMHB0Ow0KbXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4gc3R5bGU9J21zby1u
154
+ by1wcm9vZjp5ZXMnPjI8L3NwYW4+PC9zcGFuPjwvYj48IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxi
155
+ DQpzdHlsZT0nbXNvLWJpZGktZm9udC13ZWlnaHQ6bm9ybWFsJz48c3BhbiBsYW5nPUVOLUdCIHN0
156
+ eWxlPSdmb250LXNpemU6MTAuMHB0Ow0KbXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4g
157
+ c3R5bGU9J21zby1lbGVtZW50OmZpZWxkLWVuZCc+PC9zcGFuPjwvc3Bhbj48L2I+PCFbZW5kaWZd
158
+ LS0+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9u
159
+ dC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxlPSdtc28tdGFiLWNvdW50OjEnPsKgwqDCoMKgwqDC
160
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
161
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
162
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
163
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
164
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
165
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqAgPC9zcGFuPsKpDQpJ
166
+ U08vSUVDJm5ic3A7MjAxNiZuYnNwO+KAkyBBbGwgcmlnaHRzIHJlc2VydmVkPG86cD48L286cD48
167
+ L3NwYW4+PC9wPg0KDQo8L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6aGVhZGVyJyBp
168
+ ZD1laDI+DQoNCjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1sZWZ0IHN0eWxlPSd0ZXh0LWFsaWdu
169
+ OmxlZnQ7bGluZS1oZWlnaHQ6MTIuMHB0Ow0KbXNvLWxpbmUtaGVpZ2h0LXJ1bGU6ZXhhY3RseSc+
170
+ PHNwYW4gbGFuZz1FTi1HQj5JU08vSUVDJm5ic3A7Q0QgMTczMDEtMToyMDE2KEUpPC9zcGFuPjwv
171
+ cD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmhlYWRlcicgaWQ9aDI+DQoN
172
+ CjxwIGNsYXNzPU1zb0hlYWRlciBhbGlnbj1yaWdodCBzdHlsZT0ndGV4dC1hbGlnbjpyaWdodDts
173
+ aW5lLWhlaWdodDoxMi4wcHQ7DQptc28tbGluZS1oZWlnaHQtcnVsZTpleGFjdGx5Jz48c3BhbiBs
174
+ YW5nPUVOLUdCPklTTy9JRUMmbmJzcDtDRCAxNzMwMS0xOjIwMTYoRSk8L3NwYW4+PC9wPg0KDQo8
175
+ L2Rpdj4NCg0KPGRpdiBzdHlsZT0nbXNvLWVsZW1lbnQ6Zm9vdGVyJyBpZD1lZjI+DQoNCjxwIGNs
176
+ YXNzPU1zb0Zvb3RlciBzdHlsZT0nbGluZS1oZWlnaHQ6MTIuMHB0O21zby1saW5lLWhlaWdodC1y
177
+ dWxlOmV4YWN0bHknPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4NCmxhbmc9RU4tR0Igc3R5
178
+ bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4NCnN0
179
+ eWxlPSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9zcGFuPjxzcGFuDQpzdHlsZT0nbXNvLXNw
180
+ YWNlcnVuOnllcyc+wqA8L3NwYW4+UEFHRTxzcGFuIHN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7C
181
+ oMKgDQo8L3NwYW4+XCogTUVSR0VGT1JNQVQgPHNwYW4gc3R5bGU9J21zby1lbGVtZW50OmZpZWxk
182
+ LXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48IVtlbmRpZl0tLT48c3Bhbg0KbGFuZz1FTi1HQiBz
183
+ dHlsZT0nZm9udC1zaXplOjEwLjBwdDttc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3Bhbg0K
184
+ c3R5bGU9J21zby1uby1wcm9vZjp5ZXMnPmlpPC9zcGFuPjwvc3Bhbj48IS0tW2lmIHN1cHBvcnRG
185
+ aWVsZHNdPjxzcGFuDQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRp
186
+ LWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48
187
+ L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFuIGxhbmc9RU4tR0INCnN0eWxlPSdmb250LXNp
188
+ emU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuIHN0eWxlPSdtc28tdGFi
189
+ LWNvdW50Og0KMSc+wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
190
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
191
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
192
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
193
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
194
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
195
+ wqDCoMKgwqDCoCA8L3NwYW4+wqkNCklTTy9JRUMmbmJzcDsyMDE2Jm5ic3A74oCTIEFsbCByaWdo
196
+ dHMgcmVzZXJ2ZWQ8bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
197
+ PSdtc28tZWxlbWVudDpmb290ZXInIGlkPWYyPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9
198
+ J2xpbmUtaGVpZ2h0OjEyLjBwdCc+PHNwYW4gbGFuZz1FTi1HQg0Kc3R5bGU9J2ZvbnQtc2l6ZTox
199
+ MC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+wqkgSVNPL0lFQyZuYnNwOzIwMTYmbmJz
200
+ cDvigJMgQWxsDQpyaWdodHMgcmVzZXJ2ZWQ8c3BhbiBzdHlsZT0nbXNvLXRhYi1jb3VudDoxJz7C
201
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
202
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
203
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
204
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
205
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
206
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoCA8L3Nw
207
+ YW4+PC9zcGFuPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9
208
+ J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxl
209
+ PSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9zcGFuPiBQQUdFPHNwYW4gc3R5bGU9J21zby1z
210
+ cGFjZXJ1bjp5ZXMnPsKgwqANCjwvc3Bhbj5cKiBNRVJHRUZPUk1BVCA8c3BhbiBzdHlsZT0nbXNv
211
+ LWVsZW1lbnQ6ZmllbGQtc2VwYXJhdG9yJz48L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFu
212
+ DQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZTox
213
+ MS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLW5vLXByb29mOnllcyc+aWlpPC9zcGFuPjwvc3Bhbj48
214
+ IS0tW2lmIHN1cHBvcnRGaWVsZHNdPjxzcGFuDQpsYW5nPUVOLUdCIHN0eWxlPSdmb250LXNpemU6
215
+ MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuDQpzdHlsZT0nbXNvLWVsZW1l
216
+ bnQ6ZmllbGQtZW5kJz48L3NwYW4+PC9zcGFuPjwhW2VuZGlmXS0tPjxzcGFuIGxhbmc9RU4tR0IN
217
+ CnN0eWxlPSdmb250LXNpemU6MTAuMHB0O21zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxvOnA+
218
+ PC9vOnA+PC9zcGFuPjwvcD4NCg0KPC9kaXY+DQoNCjxkaXYgc3R5bGU9J21zby1lbGVtZW50OmZv
219
+ b3RlcicgaWQ9ZWYzPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9J21hcmdpbi10b3A6MTIu
220
+ MHB0O2xpbmUtaGVpZ2h0OjEyLjBwdDttc28tbGluZS1oZWlnaHQtcnVsZToNCmV4YWN0bHknPjwh
221
+ LS1baWYgc3VwcG9ydEZpZWxkc10+PGIgc3R5bGU9J21zby1iaWRpLWZvbnQtd2VpZ2h0Om5vcm1h
222
+ bCc+PHNwYW4NCmxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7bXNvLWJpZGktZm9u
223
+ dC1zaXplOjExLjBwdCc+PHNwYW4NCnN0eWxlPSdtc28tZWxlbWVudDpmaWVsZC1iZWdpbic+PC9z
224
+ cGFuPjxzcGFuDQpzdHlsZT0nbXNvLXNwYWNlcnVuOnllcyc+wqA8L3NwYW4+UEFHRTxzcGFuIHN0
225
+ eWxlPSdtc28tc3BhY2VydW46eWVzJz7CoMKgDQo8L3NwYW4+XCogTUVSR0VGT1JNQVQgPHNwYW4g
226
+ c3R5bGU9J21zby1lbGVtZW50OmZpZWxkLXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48L2I+PCFb
227
+ ZW5kaWZdLS0+PGINCnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxh
228
+ bmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEu
229
+ MHB0Jz48c3BhbiBzdHlsZT0nbXNvLW5vLXByb29mOnllcyc+Mjwvc3Bhbj48L3NwYW4+PC9iPjwh
230
+ LS1baWYgc3VwcG9ydEZpZWxkc10+PGINCnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3Jt
231
+ YWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1m
232
+ b250LXNpemU6MTEuMHB0Jz48c3BhbiBzdHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48L3Nw
233
+ YW4+PC9zcGFuPjwvYj48IVtlbmRpZl0tLT48c3Bhbg0KbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1z
234
+ aXplOjEwLjBwdDttc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3Bhbg0Kc3R5bGU9J21zby10
235
+ YWItY291bnQ6MSc+wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
236
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
237
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
238
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
239
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
240
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
241
+ wqDCoMKgwqDCoCA8L3NwYW4+wqkNCklTTy9JRUMmbmJzcDsyMDE2Jm5ic3A74oCTIEFsbCByaWdo
242
+ dHMgcmVzZXJ2ZWQ8bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8ZGl2IHN0eWxl
243
+ PSdtc28tZWxlbWVudDpmb290ZXInIGlkPWYzPg0KDQo8cCBjbGFzcz1Nc29Gb290ZXIgc3R5bGU9
244
+ J2xpbmUtaGVpZ2h0OjEyLjBwdCc+PHNwYW4gbGFuZz1FTi1HQg0Kc3R5bGU9J2ZvbnQtc2l6ZTox
245
+ MC4wcHQ7bXNvLWJpZGktZm9udC1zaXplOjExLjBwdCc+wqkgSVNPL0lFQyZuYnNwOzIwMTYmbmJz
246
+ cDvigJMgQWxsDQpyaWdodHMgcmVzZXJ2ZWQ8c3BhbiBzdHlsZT0nbXNvLXRhYi1jb3VudDoxJz7C
247
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
248
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
249
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
250
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
251
+ oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
252
+ wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgIDwv
253
+ c3Bhbj48L3NwYW4+PCEtLVtpZiBzdXBwb3J0RmllbGRzXT48Yg0Kc3R5bGU9J21zby1iaWRpLWZv
254
+ bnQtd2VpZ2h0Om5vcm1hbCc+PHNwYW4gbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1zaXplOjEwLjBw
255
+ dDsNCm1zby1iaWRpLWZvbnQtc2l6ZToxMS4wcHQnPjxzcGFuIHN0eWxlPSdtc28tZWxlbWVudDpm
256
+ aWVsZC1iZWdpbic+PC9zcGFuPg0KUEFHRTxzcGFuIHN0eWxlPSdtc28tc3BhY2VydW46eWVzJz7C
257
+ oMKgIDwvc3Bhbj5cKiBNRVJHRUZPUk1BVCA8c3Bhbg0Kc3R5bGU9J21zby1lbGVtZW50OmZpZWxk
258
+ LXNlcGFyYXRvcic+PC9zcGFuPjwvc3Bhbj48L2I+PCFbZW5kaWZdLS0+PGINCnN0eWxlPSdtc28t
259
+ YmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5bGU9J2ZvbnQtc2l6
260
+ ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3BhbiBzdHlsZT0nbXNvLW5v
261
+ LXByb29mOnllcyc+Mzwvc3Bhbj48L3NwYW4+PC9iPjwhLS1baWYgc3VwcG9ydEZpZWxkc10+PGIN
262
+ CnN0eWxlPSdtc28tYmlkaS1mb250LXdlaWdodDpub3JtYWwnPjxzcGFuIGxhbmc9RU4tR0Igc3R5
263
+ bGU9J2ZvbnQtc2l6ZToxMC4wcHQ7DQptc28tYmlkaS1mb250LXNpemU6MTEuMHB0Jz48c3BhbiBz
264
+ dHlsZT0nbXNvLWVsZW1lbnQ6ZmllbGQtZW5kJz48L3NwYW4+PC9zcGFuPjwvYj48IVtlbmRpZl0t
265
+ LT48c3Bhbg0KbGFuZz1FTi1HQiBzdHlsZT0nZm9udC1zaXplOjEwLjBwdDttc28tYmlkaS1mb250
266
+ LXNpemU6MTEuMHB0Jz48bzpwPjwvbzpwPjwvc3Bhbj48L3A+DQoNCjwvZGl2Pg0KDQo8L2JvZHk+
267
+ DQoNCjwvaHRtbD4NCg==
142
268
 
143
269
  ------=_NextPart_--
144
270
  FTR
145
271
 
146
- WORD_FTR3 = <<~FTR
147
- ------=_NextPart_
148
- Content-Location: file:///C:/Doc/test_files/609e8807-c2d0-450c-b60b-d995a0f8dcaf.png
149
- Content-Transfer-Encoding: base64
150
- Content-Type: image/png
151
- FTR
152
-
153
272
  WORD_FTR3 = <<~FTR
154
273
  ------=_NextPart_
155
274
  Content-Location: file:///C:/Doc/test_files/filelist.xml
@@ -190,7 +309,7 @@ RSpec.describe Html2Doc do
190
309
  end
191
310
 
192
311
  it "processes a blank document" do
193
- Html2Doc.process(html_input(""), "test", nil, nil, nil, nil)
312
+ Html2Doc.process(html_input(""), filename: "test")
194
313
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
195
314
  to match_fuzzy(<<~OUTPUT)
196
315
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
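Review note: every call site in this spec moves from a run of positional arguments, mostly `nil`, to keyword arguments. The sketch below shows one way a caller might translate old-style argument lists while migrating. `legacy_to_keywords` is a hypothetical helper, not part of html2doc, and it uses only the keyword names visible in this diff (`filename`, `stylesheet`, `header_file`, `asciimathdelims`). The fifth positional slot is always `nil` here and is never named, so the sketch refuses to guess a keyword for it.

```ruby
# Hypothetical shim: map the old positional call shape seen on the removed
# (-) lines onto the keyword hash used on the added (+) lines.
def legacy_to_keywords(filename, stylesheet = nil, header_file = nil,
                       fifth = nil, asciimathdelims = nil)
  # The diff never names the fifth positional argument, so do not guess.
  unless fifth.nil?
    raise ArgumentError, "no known keyword for the fifth positional argument"
  end

  {
    filename: filename,
    stylesheet: stylesheet,
    header_file: header_file,
    asciimathdelims: asciimathdelims,
  }.compact # drop unset options; the new call sites simply omit them
end

opts = legacy_to_keywords("test", "lib/html2doc/wordstyle.css")
# opts could then be splatted into a new-style call, for example:
#   Html2Doc.process(html, **opts)
```

Note that the header test also changes the path it passes (`"header.html"` becomes `"spec/header.html"`), so a mechanical translation of the arguments would not reproduce that hunk on its own.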
@@ -200,14 +319,14 @@ RSpec.describe Html2Doc do
200
319
 
201
320
  it "removes any temp files" do
202
321
  File.delete("test.doc")
203
- Html2Doc.process(html_input(""), "test", nil, nil, nil, nil)
322
+ Html2Doc.process(html_input(""), filename: "test")
204
323
  expect(File.exist?("test.doc")).to be true
205
324
  expect(File.exist?("test.htm")).to be false
206
325
  expect(File.exist?("test_files")).to be false
207
326
  end
208
327
 
209
328
  it "processes a stylesheet in an HTML document with a title" do
210
- Html2Doc.process(html_input(""), "test", "lib/html2doc/wordstyle.css", nil, nil, nil)
329
+ Html2Doc.process(html_input(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
211
330
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
212
331
  to match_fuzzy(<<~OUTPUT)
213
332
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -216,7 +335,7 @@ RSpec.describe Html2Doc do
216
335
  end
217
336
 
218
337
  it "processes a stylesheet in an HTML document without a title" do
219
- Html2Doc.process(html_input_no_title(""), "test", "lib/html2doc/wordstyle.css", nil, nil, nil)
338
+ Html2Doc.process(html_input_no_title(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
220
339
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
221
340
  to match_fuzzy(<<~OUTPUT)
222
341
  #{WORD_HDR.sub("<title>blank</title>", "")}
@@ -226,7 +345,7 @@ RSpec.describe Html2Doc do
226
345
  end
227
346
 
228
347
  it "processes a stylesheet in an HTML document with an empty head" do
229
- Html2Doc.process(html_input_empty_head(""), "test", "lib/html2doc/wordstyle.css", nil, nil, nil)
348
+ Html2Doc.process(html_input_empty_head(""), filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
230
349
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
231
350
  to match_fuzzy(<<~OUTPUT)
232
351
  #{WORD_HDR.sub("<title>blank</title>", "")}
@@ -237,7 +356,7 @@ RSpec.describe Html2Doc do
237
356
  end
238
357
 
239
358
  it "processes a header" do
240
- Html2Doc.process(html_input(""), "test", nil, "header.html", nil, nil)
359
+ Html2Doc.process(html_input(""), filename: "test", header_file: "spec/header.html")
241
360
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
242
361
  to match_fuzzy(<<~OUTPUT)
243
362
  #{WORD_HDR} #{DEFAULT_STYLESHEET.gsub(/FILENAME/, "test")}
@@ -248,7 +367,7 @@ RSpec.describe Html2Doc do
248
367
  it "processes a populated document" do
249
368
  simple_body = "<h1>Hello word!</h1>
250
369
  <div>This is a very simple document</div>"
251
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
370
+ Html2Doc.process(html_input(simple_body), filename: "test")
252
371
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
253
372
  to match_fuzzy(<<~OUTPUT)
254
373
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -258,11 +377,23 @@ RSpec.describe Html2Doc do
258
377
  end
259
378
 
260
379
  it "processes AsciiMath" do
261
- Html2Doc.process(html_input("<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), "test", nil, nil, nil, ["{{", "}}"])
380
+ Html2Doc.process(html_input("<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), filename: "test", asciimathdelims: ["{{", "}}"])
381
+ expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
382
+ to match_fuzzy(<<~OUTPUT)
383
+ #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
384
+ #{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="&#x2211;"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=1</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup></m:e></m:nary><m:r><m:t>=</m:t></m:r><m:sSup><m:e><m:r><m:t>(</m:t></m:r><m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num><m:r><m:t>n</m:t></m:r><m:r><m:t>(n+1)</m:t></m:r></m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f><m:r><m:t>)</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup></m:oMath>
385
+ </div>', '<div style="mso-element:footnote-list"/>')}
386
+ #{WORD_FTR1}
387
+ OUTPUT
388
+ end
389
+
390
+ it "wraps msup after munderover in MathML" do
391
+ Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
392
+ <munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mn>2</mn><mrow><mi>i</mi></mrow></msup></math></div>"), filename: "test", asciimathdelims: ["{{", "}}"])
262
393
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
263
394
  to match_fuzzy(<<~OUTPUT)
264
395
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
265
- #{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="&#x2211;"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=1</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e></m:e></m:nary><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup><m:r><m:t>=</m:t></m:r><m:sSup><m:e><m:r><m:t>(</m:t></m:r><m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num><m:r><m:t>n</m:t></m:r><m:r><m:t>(n+1)</m:t></m:r></m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f><m:r><m:t>)</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup></m:oMath>
396
+ #{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="&#x2211;"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=0</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>2</m:t></m:r></m:e><m:sup><m:r><m:t>i</m:t></m:r></m:sup></m:sSup></m:e></m:nary></m:oMath>
266
397
  </div>', '<div style="mso-element:footnote-list"/>')}
267
398
  #{WORD_FTR1}
268
399
  OUTPUT
@@ -271,7 +402,7 @@ RSpec.describe Html2Doc do
271
402
  it "processes tabs" do
272
403
  simple_body = "<h1>Hello word!</h1>
273
404
  <div>This is a very &tab; simple document</div>"
274
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
405
+ Html2Doc.process(html_input(simple_body), filename: "test")
275
406
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
276
407
  to match_fuzzy(<<~OUTPUT)
277
408
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -284,7 +415,7 @@ RSpec.describe Html2Doc do
284
415
  simple_body = '<h1>Hello word!</h1>
285
416
  <p>This is a very simple document</p>
286
417
  <p class="x">This style stays</p>'
287
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
418
+ Html2Doc.process(html_input(simple_body), filename: "test")
288
419
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
289
420
  to match_fuzzy(<<~OUTPUT)
290
421
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -299,7 +430,7 @@ RSpec.describe Html2Doc do
299
430
  <li>This is a very simple document</li>
300
431
  <li class="x">This style stays</li>
301
432
  </ul>'
302
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
433
+ Html2Doc.process(html_input(simple_body), filename: "test")
303
434
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
304
435
  to match_fuzzy(<<~OUTPUT)
305
436
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -310,7 +441,7 @@ RSpec.describe Html2Doc do
310
441
 
311
442
  it "resizes images for height" do
312
443
  simple_body = '<img src="spec/19160-6.png">'
313
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
444
+ Html2Doc.process(html_input(simple_body), filename: "test")
314
445
  testdoc = File.read("test.doc", encoding: "utf-8")
315
446
  expect(testdoc).to match(%r{Content-Type: image/png})
316
447
  expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -322,7 +453,7 @@ RSpec.describe Html2Doc do
322
453
 
323
454
  it "resizes images for width" do
324
455
  simple_body = '<img src="spec/19160-7.gif">'
325
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
456
+ Html2Doc.process(html_input(simple_body), filename: "test")
326
457
  testdoc = File.read("test.doc", encoding: "utf-8")
327
458
  expect(testdoc).to match(%r{Content-Type: image/gif})
328
459
  expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -334,7 +465,7 @@ RSpec.describe Html2Doc do
334
465
 
335
466
  it "resizes images for height" do
336
467
  simple_body = '<img src="spec/19160-8.jpg">'
337
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
468
+ Html2Doc.process(html_input(simple_body), filename: "test")
338
469
  testdoc = File.read("test.doc", encoding: "utf-8")
339
470
  expect(testdoc).to match(%r{Content-Type: image/jpeg})
340
471
  expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -349,7 +480,7 @@ RSpec.describe Html2Doc do
349
480
  document<a epub:type="footnote" href="#a1">1</a> allegedly<a epub:type="footnote" href="#a2">2</a></div>
350
481
  <aside id="a1">Footnote</aside>
351
482
  <aside id="a2">Other Footnote</aside>'
352
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
483
+ Html2Doc.process(html_input(simple_body), filename: "test")
353
484
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
354
485
  to match_fuzzy(<<~OUTPUT)
355
486
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -369,7 +500,7 @@ RSpec.describe Html2Doc do
369
500
  document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
370
501
  <aside id="a1">Footnote</aside>
371
502
  <aside id="a2">Other Footnote</aside>'
372
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
503
+ Html2Doc.process(html_input(simple_body), filename: "test")
373
504
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
374
505
  to match_fuzzy(<<~OUTPUT)
375
506
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -389,7 +520,7 @@ RSpec.describe Html2Doc do
389
520
  document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
390
521
  <aside id="a1"><p>Footnote</p></aside>
391
522
  <div id="a2"><p>Other Footnote</p></div>'
392
- Html2Doc.process(html_input(simple_body), "test", nil, nil, nil, nil)
523
+ Html2Doc.process(html_input(simple_body), filename: "test")
393
524
  expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
394
525
  to match_fuzzy(<<~OUTPUT)
395
526
  #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -403,4 +534,21 @@ RSpec.describe Html2Doc do
403
534
  #{WORD_FTR1}
404
535
  OUTPUT
405
536
  end
537
+
538
+ it "labels lists with list styles" do
539
+ simple_body = <<~BODY
540
+ <div><ul>
541
+ <li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
542
+ BODY
543
+ Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2"})
544
+ expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
545
+ to match_fuzzy(<<~OUTPUT)
546
+ #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
547
+ #{word_body('<div><ul>
548
+ <li style="mso-list:l1 level1 lfo1;" class="MsoNormal"><div><p class="MsoNormal"><ol><li style="mso-list:l2 level2 lfo1;" class="MsoNormal"><ul><li style="mso-list:l1 level3 lfo1;" class="MsoNormal"><p class="MsoNormal"><ol><li style="mso-list:l2 level4 lfo1;" class="MsoNormal"><ol><li style="mso-list:l2 level5 lfo1;" class="MsoNormal">A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>',
549
+ '<div style="mso-element:footnote-list"/>')}
550
+ #{WORD_FTR1}
551
+ OUTPUT
552
+ end
553
+
406
554
  end
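Review note: the new "labels lists with list styles" spec expects each `<li>` to carry an `mso-list` style. The list name comes from the `liststyles` entry for the enclosing list type, and the level counts nesting across both `ul` and `ol`. The sketch below reproduces only that labelling rule from the expected output. `list_item_style` is a hypothetical helper, and the constant `lfo1` suffix is an assumption based on it being the only value shown in the spec.

```ruby
# Hypothetical helper: build the style attribute the expected output attaches
# to each <li>. `kind` is :ul or :ol; `level` is the 1-based nesting depth
# counted across both list types.
def list_item_style(kind, level, liststyles)
  "mso-list:#{liststyles.fetch(kind)} level#{level} lfo1;"
end

# Nesting order taken from the spec body: ul > ol > ul > ol > ol
liststyles = { ul: "l1", ol: "l2" }
styles = %i[ul ol ul ol ol].each_with_index.map do |kind, depth|
  list_item_style(kind, depth + 1, liststyles)
end
puts styles
```

The five strings printed correspond, in order, to the `style` attributes on the five `<li>` elements in the spec's expected output.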
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: html2doc
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.6.2
4
+ version: 0.6.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ribose Inc.
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-02-23 00:00:00.000000000 Z
11
+ date: 2018-03-08 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: htmlentities
@@ -305,6 +305,8 @@ files:
305
305
  - html2doc.gemspec
306
306
  - lib/html2doc.rb
307
307
  - lib/html2doc/base.rb
308
+ - lib/html2doc/lists.rb
309
+ - lib/html2doc/math.rb
308
310
  - lib/html2doc/mathml2omml.xsl
309
311
  - lib/html2doc/mime.rb
310
312
  - lib/html2doc/notes.rb
@@ -312,6 +314,7 @@ files:
312
314
  - lib/html2doc/wordstyle.css
313
315
  - spec/19160-6.png
314
316
  - spec/19160-7.gif
317
+ - spec/19160-8.jpg
315
318
  - spec/examples/header.html
316
319
  - spec/examples/rice.doc
317
320
  - spec/examples/rice.html