RubyGems - html2doc - Versions diffs - 1.3.0.1 → 1.4.0.1 - Mend

html2doc 1.3.0.1 → 1.4.0.1

Files changed (11)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6f471eed9c61de156ee7aa0dd279d21335fd193dda721727197c2dc508a7bf56
-  data.tar.gz: 79365ba28433486f8d6442aa939729af2456c570baccbb88b9ddb835d1f94172
+  metadata.gz: 39f218409dbbaa66345a38c3ac768bde9def9ffffa4fb40b388366994c28cba3
+  data.tar.gz: c3eb9ec0b62796ca8cd165c787f03a752e3ba92ff01f4fcedac7c47099e3b6f9
 SHA512:
-  metadata.gz: ccdfd5dc149461e651cf9a0ca22504052e76f3980c75498b05f1720451694acc41c8a91c16f92a2611e553a9d93b9caa482c93b47cdbd95288767588895a22ac
-  data.tar.gz: 14b4833c7d5fd9f179b9101b240acca1f2f2e8e40ef6521f0bd244d6a5f526255a2a34928960403b28575685e1d6e29d84156577a54fa13bedea1048eb9eeafb
+  metadata.gz: 5de663e28833714b38e902ecb78f567f1e6734ab7998c12f37aec432a44694b8f5e8f868b861a443716b1a88d05c60e92fc298b2f9d608e6370c74d0ad0170f6
+  data.tar.gz: 5e508c5589940b6b50cb16de23d8de07717d82807308e4980ae8206a219778a0ef3ae835afd87d021f835ffd218675e557cea0fb396fc85707e6df9502cab118
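
For readers who want to check a downloaded package against the new digests, the comparison can be sketched with Ruby's standard `digest` library. This is only a minimal sketch: the helper name `digest_matches?` and the local path `data.tar.gz` are illustrative assumptions, and it presumes `data.tar.gz` has already been extracted from the `.gem` archive.

```ruby
require "digest"

# Hypothetical helper (not part of RubyGems or html2doc): returns true when
# the file at `path` hashes to `expected_hex` under the given algorithm.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.downcase
end

# SHA256 of data.tar.gz as listed for 1.4.0.1 in the hunk above.
EXPECTED_SHA256 =
  "c3eb9ec0b62796ca8cd165c787f03a752e3ba92ff01f4fcedac7c47099e3b6f9".freeze

# Only runs the check when the extracted file is actually present.
if File.exist?("data.tar.gz")
  ok = digest_matches?("data.tar.gz", EXPECTED_SHA256)
  puts ok ? "SHA256 ok" : "SHA256 mismatch"
end
```

Passing `algorithm: Digest::SHA512` checks the SHA512 entry the same way.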

data/README.adoc CHANGED

@@ -58,14 +58,14 @@ There there are two other Microsoft Word vendors in the Ruby ecosystem.
 --
 require "html2doc"
-Html2Doc.process(result, filename: filename, imagedir: imagedir, stylesheet: stylesheet, header_filename: header_filename, dir: dir, asciimathdelims: asciimathdelims, liststyles: liststyles)
+Html2Doc.new(filename: filename, imagedir: imagedir, stylesheet: stylesheet, header_file: header_filename, dir: dir, asciimathdelims: asciimathdelims, liststyles: liststyles).process(result)
 --
 result:: is the Html document to be converted into Word, as a string.
 filename:: is the name the document is to be saved as, without a file suffix
 imagedir:: base directory for local image file names in source XML
 stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-specific styles. If this is not provided, the program will use the default stylesheet included in the gem, `lib/html2doc/wordstyle.css`. The stylesheet provided must match this stylesheet; you can obtain one by saving a Word document with your desired styles to HTML, and extracting the style definitions from the HTML document header.
-header_filename:: is the filename of the HTML document containing header and footer for the document, as well as footnote/endnote separators; if there is none, use nil. To generate your own such document, save a Word document with headers/footers and/or footnote/endnote separators as an HTML document; the `header.html` will be in the `{filename}.fld` folder generated along with the HTML. A sample file is available at https://github.com/metanorma/metanorma-iso/blob/master/lib/asciidoctor/iso/word/header.html
+header_file:: is the filename of the HTML document containing header and footer for the document, as well as footnote/endnote separators; if there is none, use nil. To generate your own such document, save a Word document with headers/footers and/or footnote/endnote separators as an HTML document; the `header.html` will be in the `{filename}.fld` folder generated along with the HTML. A sample file is available at https://github.com/metanorma/metanorma-iso/blob/master/lib/asciidoctor/iso/word/header.html
 dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided, it will be created as `{filename}_files`. Anything in the directory will be attached to the Word document; so this folder should only contain the images that accompany the document. (If the images are elsewhere on the local drive, the gem will move them into the folder. External URL images are left alone, and are not downloaded.)
 asciimathdelims:: are the AsciiMath delimiters used in the text (an array of an opening and a closing delimiter). If none are provided, no AsciiMath conversion is attempted.
 liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul`, `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`. The lists that the `ul` and `ol` list styles are applied to are assumed not to have any CSS class. If there are any additional hash keys, they are assumed to be classes applied to the topmost ordered or unordered list; e.g. `liststyles:{steps: "l5"}` means that any list with class `steps` at the topmost level has the list style `l5` recursively applied to it. Any top-level lists without a class named in liststyles will be treated like lists with no CSS class.
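
Callers upgrading from the module-level `Html2Doc.process(result, ...)` call could bridge the two calling styles with a small wrapper. The sketch below is an illustration under stated assumptions, not part of the gem: the `Html2DocCompat` module, its `RENAMED_KEYS` table, and the `converter_class:` parameter are all hypothetical names. The converter class is passed in explicitly so the sketch runs without the gem installed; in real use it would presumably be `Html2Doc`. The `header_filename` to `header_file` mapping follows the README change above; the 1.3.0.1 code in `base.rb` already read `hash[:header_file]`, so the mapping only matters for callers who followed the older README wording.

```ruby
# Hypothetical compatibility shim; none of these names exist in html2doc.
module Html2DocCompat
  # Option key as written in the 1.3.0.1 README => key in the 1.4.0.1 README.
  RENAMED_KEYS = { header_filename: :header_file }.freeze

  # Returns a new options hash with old key names translated.
  def self.normalize(options)
    options.each_with_object({}) do |(key, value), out|
      out[RENAMED_KEYS.fetch(key, key)] = value
    end
  end

  # Old calling convention (document first, options after) mapped onto the
  # new one (options to the constructor, document to #process).
  def self.process(result, converter_class:, **options)
    converter_class.new(normalize(options)).process(result)
  end
end

# With the gem loaded, a call might look like:
#   Html2DocCompat.process(html, converter_class: Html2Doc, filename: "test")
```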

data/bin/html2doc CHANGED

@@ -21,8 +21,7 @@ if ARGV.length < 1
 end
 Html2Doc.process(
-  File.read(ARGV[0], encoding: "utf-8"),
   filename: ARGV[0].gsub(/\.html?$/, ""),
   stylesheet: options[:stylesheet],
   header: options[:header],
-)
+).process(File.read(ARGV[0], encoding: "utf-8"))

data/lib/html2doc/base.rb CHANGED

@@ -4,27 +4,41 @@ require "htmlentities"
 require "nokogiri"
 require "fileutils"
-module Html2Doc
-  def self.process(result, hash)
-    hash[:dir1] = create_dir(hash[:filename], hash[:dir])
-    result = process_html(result, hash)
-    process_header(hash[:header_file], hash)
-    generate_filelist(hash[:filename], hash[:dir1])
-    File.open("#{hash[:filename]}.htm", "w:UTF-8") { |f| f.write(result) }
-    mime_package result, hash[:filename], hash[:dir1]
-    rm_temp_files(hash[:filename], hash[:dir], hash[:dir1]) unless hash[:debug]
-  end
-  def self.process_header(headerfile, hash)
+class Html2Doc
+  def initialize(hash)
+    @filename = hash[:filename]
+    @dir = hash[:dir]
+    @dir1 = create_dir(@filename, @dir)
+    @header_file = hash[:header_file]
+    @asciimathdelims = hash[:asciimathdelims]
+    @imagedir = hash[:imagedir]
+    @debug = hash[:debug]
+    @liststyles = hash[:liststyles]
+    @stylesheet = hash[:stylesheet]
+    @xsltemplate =
+      Nokogiri::XSLT(File.read(File.join(File.dirname(__FILE__), "mml2omml.xsl"),
+                               encoding: "utf-8"))
+  end
+  def process(result)
+    result = process_html(result)
+    process_header(@header_file)
+    generate_filelist(@filename, @dir1)
+    File.open("#{@filename}.htm", "w:UTF-8") { |f| f.write(result) }
+    mime_package result, @filename, @dir1
+    rm_temp_files(@filename, @dir, @dir1) unless @debug
+  end
+  def process_header(headerfile)
     return if headerfile.nil?
     doc = File.read(headerfile, encoding: "utf-8")
-    doc = header_image_cleanup(doc, hash[:dir1], hash[:filename],
-                               File.dirname(hash[:filename]))
-    File.open("#{hash[:dir1]}/header.html", "w:UTF-8") { |f| f.write(doc) }
+    doc = header_image_cleanup(doc, @dir1, @filename,
+                               File.dirname(@filename))
+    File.open("#{@dir1}/header.html", "w:UTF-8") { |f| f.write(doc) }
   end
-  def self.clear_dir(dir)
+  def clear_dir(dir)
     Dir.foreach(dir) do |f|
       fn = File.join(dir, f)
       File.delete(fn) if f != "." && f != ".."
@@ -32,30 +46,30 @@ module Html2Doc
     dir
   end
-  def self.create_dir(filename, dir)
+  def create_dir(filename, dir)
     dir and return clear_dir(dir)
     dir = "#{filename}_files"
     Dir.mkdir(dir) unless File.exists?(dir)
     clear_dir(dir)
   end
-  def self.process_html(result, hash)
-    docxml = to_xhtml(asciimath_to_mathml(result, hash[:asciimathdelims]))
-    define_head(cleanup(docxml, hash), hash)
+  def process_html(result)
+    docxml = to_xhtml(asciimath_to_mathml(result, @asciimathdelims))
+    define_head(cleanup(docxml))
     msword_fix(from_xhtml(docxml))
   end
-  def self.rm_temp_files(filename, dir, dir1)
+  def rm_temp_files(filename, dir, dir1)
     FileUtils.rm "#{filename}.htm"
     FileUtils.rm_f "#{dir1}/header.html"
     FileUtils.rm_r dir1 unless dir
   end
-  def self.cleanup(docxml, hash)
+  def cleanup(docxml)
     namespace(docxml.root)
-    image_cleanup(docxml, hash[:dir1], hash[:imagedir])
+    image_cleanup(docxml, @dir1, @imagedir)
     mathml_to_ooml(docxml)
-    lists(docxml, hash[:liststyles])
+    lists(docxml, @liststyles)
     footnotes(docxml)
     bookmarks(docxml)
     msonormal(docxml)
@@ -70,7 +84,7 @@ module Html2Doc
     <body> </body> </html>
   HERE
-  def self.to_xhtml(xml)
+  def to_xhtml(xml)
     xml.gsub!(/<\?xml[^>]*>/, "")
     unless /<!DOCTYPE /.match? xml
       xml = '<!DOCTYPE html SYSTEM
@@ -85,7 +99,7 @@ module Html2Doc
     <!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   DOCTYPE
-  def self.from_xhtml(xml)
+  def from_xhtml(xml)
     xml.to_xml.sub(%{ xmlns="http://www.w3.org/1999/xhtml"}, "")
       .sub(DOCTYPE, "").gsub(%{ />}, "/>")
       .gsub(/<!-- MSWORD-COMMENT (.+?) -->/, "<!--[\\1]>")
@@ -93,7 +107,7 @@ module Html2Doc
       .gsub("\n--&gt;\n", "\n-->\n")
   end
-  def self.msword_fix(doc)
+  def msword_fix(doc)
     # brain damage in MSWord parser
     doc.gsub!(%r{<w:DoNotOptimizeForBrowser></w:DoNotOptimizeForBrowser>},
               "<w:DoNotOptimizeForBrowser/>")
@@ -133,7 +147,7 @@ module Html2Doc
     <meta http-equiv='Content-Type' content="text/html; charset=utf-8"/>
   XML
-  def self.define_head1(docxml, _dir)
+  def define_head1(docxml, _dir)
     docxml.xpath("//*[local-name() = 'head']").each do |h|
       h.children.first.add_previous_sibling <<~XML
         #{PRINT_VIEW}
@@ -142,7 +156,7 @@ module Html2Doc
     end
   end
-  def self.filename_substitute(head, header_filename)
+  def filename_substitute(head, header_filename)
     return if header_filename.nil?
     head.xpath(".//*[local-name() = 'style']").each do |s|
@@ -153,30 +167,30 @@ module Html2Doc
     end
   end
-  def self.stylesheet(_filename, _header_filename, cssname)
+  def stylesheet(_filename, _header_filename, cssname)
     (cssname.nil? || cssname.empty?) and
       cssname = File.join(File.dirname(__FILE__), "wordstyle.css")
     stylesheet = File.read(cssname, encoding: "UTF-8")
     xml = Nokogiri::XML("<style/>")
-    #s = Nokogiri::XML::CDATA.new(xml, "\n#{stylesheet}\n")
-    #xml.children.first << Nokogiri::XML::Comment.new(xml, s)
+    # s = Nokogiri::XML::CDATA.new(xml, "\n#{stylesheet}\n")
+    # xml.children.first << Nokogiri::XML::Comment.new(xml, s)
     xml.children.first << Nokogiri::XML::CDATA
       .new(xml, "\n<!--\n#{stylesheet}\n-->\n")
     xml.root.to_s
   end
-  def self.define_head(docxml, hash)
+  def define_head(docxml)
     title = docxml.at("//*[local-name() = 'head']/*[local-name() = 'title']")
     head = docxml.at("//*[local-name() = 'head']")
-    css = stylesheet(hash[:filename], hash[:header_file], hash[:stylesheet])
+    css = stylesheet(@filename, @header_file, @stylesheet)
     add_stylesheet(head, title, css)
-    filename_substitute(head, hash[:header_file])
-    define_head1(docxml, hash[:dir1])
+    filename_substitute(head, @header_file)
+    define_head1(docxml, @dir1)
     rootnamespace(docxml.root)
   end
-  def self.add_stylesheet(head, title, css)
+  def add_stylesheet(head, title, css)
     if head.children.empty?
       head.add_child css
     elsif title.nil?
@@ -186,7 +200,7 @@ module Html2Doc
     end
   end
-  def self.namespace(root)
+  def namespace(root)
     {
       o: "urn:schemas-microsoft-com:office:office",
       w: "urn:schemas-microsoft-com:office:word",
@@ -195,11 +209,11 @@ module Html2Doc
     }.each { |k, v| root.add_namespace_definition(k.to_s, v) }
   end
-  def self.rootnamespace(root)
+  def rootnamespace(root)
     root.add_namespace(nil, "http://www.w3.org/TR/REC-html40")
   end
-  def self.bookmarks(docxml)
+  def bookmarks(docxml)
     docxml.xpath("//*[@id][not(@name)][not(@style = 'mso-element:footnote')]")
       .each do |x|
       next if x["id"].empty? ||
@@ -212,7 +226,7 @@ module Html2Doc
     end
   end
-  def self.msonormal(docxml)
+  def msonormal(docxml)
     docxml.xpath("//*[local-name() = 'p'][not(self::*[@class])]").each do |p|
       p["class"] = "MsoNormal"
     end
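
One consequence of calling `create_dir` from `initialize` is that constructing an `Html2Doc` object touches the filesystem before `process` is ever called. The directory logic can be exercised in isolation with the standard library only. The sketch below mirrors `create_dir` and `clear_dir` as shown in this diff, with two labeled deviations: the functions are free-standing rather than instance methods, and `File.exist?` replaces `File.exists?`, since the latter was removed in Ruby 3.2.

```ruby
require "tmpdir"

# Mirrors clear_dir from the diff: delete every entry in dir, return dir.
# As in the original, File.delete only handles plain files, not subdirectories.
def clear_dir(dir)
  Dir.foreach(dir) do |f|
    File.delete(File.join(dir, f)) if f != "." && f != ".."
  end
  dir
end

# Mirrors create_dir from the diff, using File.exist? instead of File.exists?.
def create_dir(filename, dir)
  return clear_dir(dir) if dir

  dir = "#{filename}_files"
  Dir.mkdir(dir) unless File.exist?(dir)
  clear_dir(dir)
end

Dir.mktmpdir do |tmp|
  base = File.join(tmp, "test")
  work = create_dir(base, nil)            # creates "<tmp>/test_files"
  File.write(File.join(work, "stale.png"), "old")
  create_dir(base, nil)                   # a second call empties the directory
  puts Dir.children(work).empty?          # prints "true"
end
```

The same behaviour explains why the README asks that a caller-supplied `dir` contain only the document's images: an explicitly passed directory is emptied as well.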

data/lib/html2doc/lists.rb CHANGED

@@ -3,8 +3,8 @@ require "asciimath"
 require "htmlentities"
 require "nokogiri"
-module Html2Doc
-  def self.style_list(elem, level, liststyle, listnumber)
+class Html2Doc
+  def style_list(elem, level, liststyle, listnumber)
     return unless liststyle
     if elem["style"]
@@ -15,7 +15,7 @@ module Html2Doc
     elem["style"] += "mso-list:#{liststyle} level#{level} lfo#{listnumber};"
   end
-  def self.list_add1(elem, liststyles, listtype, level)
+  def list_add1(elem, liststyles, listtype, level)
     if %i[ul ol].include? listtype
       list_add(elem.xpath(".//ul") - elem.xpath(".//ul//ul | .//ol//ul"),
                liststyles, :ul, level + 1)
@@ -29,7 +29,7 @@ module Html2Doc
     end
   end
-  def self.list_add(xpath, liststyles, listtype, level)
+  def list_add(xpath, liststyles, listtype, level)
     xpath.each_with_index do |l, _i|
       @listnumber += 1 if level == 1
       l["seen"] = true if level == 1
@@ -46,7 +46,7 @@ module Html2Doc
     end
   end
-  def self.list2para(list)
+  def list2para(list)
     return if list.xpath("./li").empty?
     list.xpath("./li").first["class"] ||= "MsoListParagraphCxSpFirst"
@@ -63,7 +63,7 @@ module Html2Doc
   TOPLIST = "[not(ancestor::ul) and not(ancestor::ol)]".freeze
-  def self.lists1(docxml, liststyles, style)
+  def lists1(docxml, liststyles, style)
     case style
     when :ul then list_add(docxml.xpath("//ul[not(@class)]#{TOPLIST}"),
                            liststyles, :ul, 1)
@@ -76,7 +76,7 @@ module Html2Doc
     end
   end
-  def self.lists_unstyled(docxml, liststyles)
+  def lists_unstyled(docxml, liststyles)
     liststyles.has_key?(:ul) and
       list_add(docxml.xpath("//ul#{TOPLIST}[not(@seen)]"),
                liststyles, :ul, 1)
@@ -88,7 +88,7 @@ module Html2Doc
     end
   end
-  def self.lists(docxml, liststyles)
+  def lists(docxml, liststyles)
     return if liststyles.nil?
     @listnumber = 0

data/lib/html2doc/math.rb CHANGED

@@ -4,12 +4,8 @@ require "htmlentities"
 require "nokogiri"
 require "plane1converter"
-module Html2Doc
-  @xsltemplate =
-    Nokogiri::XSLT(File.read(File.join(File.dirname(__FILE__), "mml2omml.xsl"),
-                             encoding: "utf-8"))
-  def self.asciimath_to_mathml1(expr)
+class Html2Doc
+  def asciimath_to_mathml1(expr)
     AsciiMath::MathMLBuilder.new(msword: true).append_expression(
       AsciiMath.parse(HTMLEntities.new.decode(expr)).ast,
     ).to_s
@@ -20,7 +16,7 @@ module Html2Doc
     raise e
   end
-  def self.asciimath_to_mathml(doc, delims)
+  def asciimath_to_mathml(doc, delims)
     return doc if delims.nil? || delims.size < 2
     m = doc.split(/(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/)
@@ -31,13 +27,13 @@ module Html2Doc
     end.join
   end
-  def self.progress_conv(idx, step, total, threshold, msg)
+  def progress_conv(idx, step, total, threshold, msg)
     return unless (idx % step).zero? && total > threshold && idx.positive?
     warn "#{msg} #{idx} of #{total}"
   end
-  def self.unwrap_accents(doc)
+  def unwrap_accents(doc)
     doc.xpath("//*[@accent = 'true']").each do |x|
       x.elements.length > 1 or next
       x.elements[1].name == "mrow" and
@@ -47,7 +43,7 @@ module Html2Doc
   end
   # random fixes to MathML input that OOXML needs to render properly
-  def self.ooxml_cleanup(math, docnamespaces)
+  def ooxml_cleanup(math, docnamespaces)
     math = unwrap_accents(
       mathml_preserve_space(
         mathml_insert_rows(math, docnamespaces), docnamespaces
@@ -57,7 +53,7 @@ module Html2Doc
     math
   end
-  def self.mathml_insert_rows(math, docnamespaces)
+  def mathml_insert_rows(math, docnamespaces)
     math.xpath(%w(msup msub msubsup munder mover munderover)
             .map { |m| ".//xmlns:#{m}" }.join(" | "), docnamespaces).each do |x|
       next unless x.next_element && x.next_element != "mrow"
@@ -67,7 +63,7 @@ module Html2Doc
     math
   end
-  def self.mathml_preserve_space(math, docnamespaces)
+  def mathml_preserve_space(math, docnamespaces)
     math.xpath(".//xmlns:mtext", docnamespaces).each do |x|
       x.children = x.children.to_xml.gsub(/^\s/, "&#xA0;").gsub(/\s$/, "&#xA0;")
     end
@@ -76,7 +72,7 @@ module Html2Doc
   HTML_NS = 'xmlns="http://www.w3.org/1999/xhtml"'.freeze
-  def self.unitalic(math)
+  def unitalic(math)
     math.xpath(".//xmlns:r[xmlns:rPr[not(xmlns:scr)]/xmlns:sty[@m:val = 'p']]").each do |x|
       x.wrap("<span #{HTML_NS} style='font-style:normal;'></span>")
     end
@@ -122,7 +118,7 @@ module Html2Doc
     math
   end
-  def self.to_plane1(xml, font)
+  def to_plane1(xml, font)
     xml.traverse do |n|
       next unless n.text?
@@ -131,7 +127,7 @@ module Html2Doc
     xml
   end
-  def self.mathml_to_ooml(docxml)
+  def mathml_to_ooml(docxml)
     docnamespaces = docxml.collect_namespaces
     m = docxml.xpath("//*[local-name() = 'math']")
     m.each_with_index do |x, i|
@@ -140,28 +136,45 @@ module Html2Doc
     end
   end
-  # We need span and em not to be namespaced. Word can't deal with explicit
+  # We need span and em not to be namespaced. Word can't deal with explicit
   # namespaces.
   # We will end up stripping them out again under Nokogiri 1.11, which correctly
   # insists on inheriting namespace from parent.
-  def self.ooml_clean(xml)
+  def ooml_clean(xml)
     xml.to_s
       .gsub(/<\?[^>]+>\s*/, "")
       .gsub(/ xmlns(:[^=]+)?="[^"]+"/, "")
       .gsub(%r{<(/)?(?!span)(?!em)([a-z])}, "<\\1m:\\2")
   end
-  def self.mathml_to_ooml1(xml, docnamespaces)
+  def mathml_to_ooml1(xml, docnamespaces)
     doc = Nokogiri::XML::Document::new
     doc.root = ooxml_cleanup(xml, docnamespaces)
-      ooxml = ooml_clean(unitalic(esc_space(@xsltemplate.transform(doc))))
+    ooxml = ooml_clean(unitalic(esc_space(accent_tr(@xsltemplate.transform(doc)))))
     ooxml = uncenter(xml, ooxml)
     xml.swap(ooxml)
   end
+  def accent_tr(xml)
+    xml.xpath(".//*[local-name()='accPr']/*[local-name()='chr']").each do |x|
+      x["m:val"] &&= accent_tr1(x["m:val"])
+      x["val"] &&= accent_tr1(x["val"])
+    end
+    xml
+  end
+  def accent_tr1(accent)
+    case accent
+    when "\u2192" then "\u20D7"
+    when "^" then "\u0302"
+    when "~" then "\u0303"
+    else accent
+    end
+  end
   # escape space as &#x32;; we are removing any spaces generated by
   # XML indentation
-  def self.esc_space(xml)
+  def esc_space(xml)
     xml.traverse do |n|
       next unless n.text?
@@ -172,7 +185,7 @@ module Html2Doc
   # if oomml has no siblings, by default it is centered; override this with
   # left/right if parent is so tagged
-  def self.uncenter(math, ooxml)
+  def uncenter(math, ooxml)
     alignnode = math.at(".//ancestor::*[@style][local-name() = 'p' or "\
                         "local-name() = 'div' or local-name() = 'td']/@style")
     return ooxml unless alignnode && (math.next == nil && math.previous == nil)
@@ -180,7 +193,7 @@ module Html2Doc
     %w(left right).each do |dir|
       if alignnode.text.include? ("text-align:#{dir}")
         ooxml = "<m:oMathPara><m:oMathParaPr><m:jc "\
-          "m:val='#{dir}'/></m:oMathParaPr>#{ooxml}</m:oMathPara>"
+                "m:val='#{dir}'/></m:oMathParaPr>#{ooxml}</m:oMathPara>"
       end
     end
     ooxml
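
The new `accent_tr1` maps three spacing characters to their combining equivalents (rightwards arrow to U+20D7, circumflex to U+0302, tilde to U+0303) and passes anything else through. A minimal sketch of that mapping, and of the `&&=` guard that `accent_tr` uses so that only attributes already present get rewritten, follows. A plain `Hash` stands in for the Nokogiri node here, which is an assumption made so the example runs without Nokogiri.

```ruby
# Same mapping as accent_tr1 in the diff, written as a free-standing function.
def accent_tr1(accent)
  case accent
  when "\u2192" then "\u20D7" # rightwards arrow -> combining right arrow above
  when "^" then "\u0302"      # circumflex -> combining circumflex accent
  when "~" then "\u0303"      # tilde -> combining tilde
  else accent
  end
end

# A Hash stands in for an element's attributes (assumption for illustration).
attrs = { "m:val" => "^" }
attrs["m:val"] &&= accent_tr1(attrs["m:val"]) # present: rewritten
attrs["val"] &&= accent_tr1(attrs["val"])     # absent: no key is added

p attrs.keys                 # prints ["m:val"]
p attrs["m:val"] == "\u0302" # prints true
```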

data/lib/html2doc/mime.rb CHANGED

@@ -4,8 +4,8 @@ require "mime/types"
 require "image_size"
 require "fileutils"
-module Html2Doc
-  def self.mime_preamble(boundary, filename, result)
+class Html2Doc
+  def mime_preamble(boundary, filename, result)
     <<~"PREAMBLE"
       MIME-Version: 1.0
       Content-Type: multipart/related; boundary="#{boundary}"
@@ -20,7 +20,7 @@ module Html2Doc
     PREAMBLE
   end
-  def self.mime_attachment(boundary, _filename, item, dir)
+  def mime_attachment(boundary, _filename, item, dir)
     content_type = mime_type(item)
     text_mode = %w[text application].any? { |p| content_type.start_with? p }
@@ -40,19 +40,19 @@ module Html2Doc
     FILE
   end
-  def self.mime_type(item)
+  def mime_type(item)
     types = MIME::Types.type_for(item)
     type = types ? types.first.to_s : 'text/plain; charset="utf-8"'
     type = %(#{type} charset="utf-8") if /^text/.match(type) && types
     type
   end
-  def self.mime_boundary
+  def mime_boundary
     salt = UUIDTools::UUID.random_create.to_s.gsub(/-/, ".")[0..17]
     "----=_NextPart_#{salt}"
   end
-  def self.mime_package(result, filename, dir)
+  def mime_package(result, filename, dir)
     boundary = mime_boundary
     mhtml = mime_preamble(boundary, "#{filename}.htm", result)
     mhtml += mime_attachment(boundary, "#{filename}.htm", "filelist.xml", dir)
@@ -66,7 +66,7 @@ module Html2Doc
     File.open("#{filename}.doc", "w:UTF-8") { |f| f.write contentid(mhtml) }
   end
-  def self.contentid(mhtml)
+  def contentid(mhtml)
     mhtml.gsub %r{(<img[^>]*?src=")([^\"']+)(['"])}m do |m|
       repl = "#{$1}cid:#{File.basename($2)}#{$3}"
       /^data:|^https?:/.match($2) ? m : repl
@@ -77,7 +77,7 @@ module Html2Doc
   end
   # max width for Word document is 400, max height is 680
-  def self.image_resize(img, path, maxheight, maxwidth)
+  def image_resize(img, path, maxheight, maxwidth)
     realsize = ImageSize.path(path).size
     s = [img["width"].to_i, img["height"].to_i]
     s = realsize if s[0].zero? && s[1].zero?
@@ -92,27 +92,28 @@ module Html2Doc
   IMAGE_PATH = "//*[local-name() = 'img' or local-name() = 'imagedata']".freeze
-  def self.mkuuid
+  def mkuuid
     UUIDTools::UUID.random_create.to_s
   end
-  def self.warnsvg(src)
+  def warnsvg(src)
     warn "#{src}: SVG not supported" if /\.svg$/i.match?(src)
   end
-  def self.localname(src, localdir)
+  def localname(src, localdir)
     %r{^([A-Z]:)?/}.match?(src) ? src : File.join(localdir, src)
   end
   # only processes locally stored images
-  def self.image_cleanup(docxml, dir, localdir)
+  def image_cleanup(docxml, dir, localdir)
     docxml.traverse do |i|
+      src = i["src"]
       next unless i.element? && %w(img v:imagedata).include?(i.name)
-      next if /^http/.match? i["src"]
-      next if %r{^data:(image|application)/[^;]+;base64}.match? i["src"]
+      next if src.nil? || src.empty? || /^http/.match?(src)
+      next if %r{^data:(image|application)/[^;]+;base64}.match? src
-      local_filename = localname(i["src"], localdir)
-      new_filename = "#{mkuuid}#{File.extname(i['src'])}"
+      local_filename = localname(src, localdir)
+      new_filename = "#{mkuuid}#{File.extname(src)}"
       FileUtils.cp local_filename, File.join(dir, new_filename)
       i["width"], i["height"] = image_resize(i, local_filename, 680, 400)
       i["src"] = File.join(File.basename(dir), new_filename)
@@ -122,13 +123,13 @@ module Html2Doc
   # do not parse the header through Nokogiri, since it will contain
   # non-XML like <![if !supportFootnotes]>
-  def self.header_image_cleanup(doc, dir, filename, localdir)
+  def header_image_cleanup(doc, dir, filename, localdir)
     doc.split(%r{(<img [^>]*>|<v:imagedata [^>]*>)}).each_slice(2).map do |a|
       header_image_cleanup1(a, dir, filename, localdir)
     end.join
   end
-  def self.header_image_cleanup1(a, dir, _filename, localdir)
+  def header_image_cleanup1(a, dir, _filename, localdir)
     if a.size == 2 && !(/ src="https?:/.match a[1]) &&
         !(%r{ src="data:(image|application)/[^;]+;base64}.match a[1])
       m = / src=['"](?<src>[^"']+)['"]/.match a[1]
@@ -140,7 +141,7 @@ module Html2Doc
     a.join
   end
-  def self.generate_filelist(filename, dir)
+  def generate_filelist(filename, dir)
     File.open(File.join(dir, "filelist.xml"), "w") do |f|
       f.write %{<xml xmlns:o="urn:schemas-microsoft-com:office:office">
         <o:MainFile HRef="../#{filename}.htm"/>}
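
The `image_cleanup` hunk above adds a guard for image elements whose `src` is missing or empty. Without it, a nil `src` would get past the two regular-expression checks (`Regexp#match?` returns false for nil) and then fail in `File.join` or `File.extname`, both of which raise `TypeError` when given nil. The sketch below isolates that decision; `local_image?` is a hypothetical helper name, while `localname` is copied from the diff as a free-standing function.

```ruby
# Copied from the diff as a free-standing function: absolute paths
# (optionally with a drive letter) are kept, relative ones are joined to localdir.
def localname(src, localdir)
  %r{^([A-Z]:)?/}.match?(src) ? src : File.join(localdir, src)
end

# Hypothetical helper: true only for sources that image_cleanup would copy locally.
def local_image?(src)
  return false if src.nil? || src.empty? || /^http/.match?(src)
  return false if %r{^data:(image|application)/[^;]+;base64}.match?(src)

  true
end

p local_image?(nil)                         # prints false
p local_image?("https://example.com/a.png") # prints false
p local_image?("figures/a.png")             # prints true
p localname("figures/a.png", "img")         # prints "img/figures/a.png"
p localname("C:/docs/a.png", "img")         # prints "C:/docs/a.png"
```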

data/lib/html2doc/notes.rb CHANGED

@@ -1,7 +1,7 @@
 require "uuidtools"
-module Html2Doc
-  def self.footnotes(docxml)
+class Html2Doc
+  def footnotes(docxml)
     i = 1
     fn = []
     docxml.xpath("//a").each do |a|
@@ -12,7 +12,7 @@ module Html2Doc
     process_footnote_texts(docxml, fn)
   end
-  def self.process_footnote_texts(docxml, footnotes)
+  def process_footnote_texts(docxml, footnotes)
     body = docxml.at("//body")
     list = body.add_child("<div style='mso-element:footnote-list'/>")
     footnotes.each_with_index do |f, i|
@@ -23,7 +23,7 @@ module Html2Doc
     footnote_cleanup(docxml)
   end
-  def self.footnote_div_to_p(elem)
+  def footnote_div_to_p(elem)
     if %w{div aside}.include? elem.name
       if elem.at(".//p")
         elem.replace(elem.children)
@@ -37,7 +37,7 @@ module Html2Doc
   FN = "<span class='MsoFootnoteReference'>"\
     "<span style='mso-special-character:footnote'/></span>".freeze
-  def self.footnote_container(docxml, idx)
+  def footnote_container(docxml, idx)
     ref = docxml&.at("//a[@href='#_ftn#{idx}']")&.children&.to_xml(indent: 0)
       &.gsub(/>\n</, "><") || FN
     <<~DIV
@@ -47,7 +47,7 @@ module Html2Doc
     DIV
   end
-  def self.process_footnote_link(docxml, elem, idx, footnote)
+  def process_footnote_link(docxml, elem, idx, footnote)
     return false unless footnote?(elem)
     href = elem["href"].gsub(/^#/, "")
@@ -62,7 +62,7 @@ module Html2Doc
     footnote << transform_footnote_text(note)
   end
-  def self.process_footnote_link1(elem)
+  def process_footnote_link1(elem)
     elem.children.each do |c|
       if c.name == "span" && c["class"] == "MsoFootnoteReference"
         c.replace(FN)
@@ -72,7 +72,7 @@ module Html2Doc
     end
   end
-  def self.transform_footnote_text(note)
+  def transform_footnote_text(note)
     note["id"] = ""
     note.xpath(".//div").each { |div| div.replace(div.children) }
     note.xpath(".//aside | .//p").each do |p|
@@ -82,12 +82,12 @@ module Html2Doc
     note.remove
   end
-  def self.footnote?(elem)
+  def footnote?(elem)
     elem["epub:type"]&.casecmp("footnote")&.zero? ||
       elem["class"]&.casecmp("footnote")&.zero?
   end
-  def self.set_footnote_link_attrs(elem, idx)
+  def set_footnote_link_attrs(elem, idx)
     elem["style"] = "mso-footnote-id:ftn#{idx}"
     elem["href"] = "#_ftn#{idx}"
     elem["name"] = "_ftnref#{idx}"
@@ -99,7 +99,7 @@ module Html2Doc
   # to p). We do not expect any <a name> or links back to text; if they
   # are present in the HTML, they need to have been cleaned out before
   # passing to this gem
-  def self.footnote_cleanup(docxml)
+  def footnote_cleanup(docxml)
     docxml.xpath('//div[@style="mso-element:footnote"]/a')
       .each do |x|
       n = x.next_element

data/lib/html2doc/version.rb CHANGED

@@ -1,3 +1,3 @@
-module Html2Doc
-  VERSION = "1.3.0.1".freeze
+class Html2Doc
+  VERSION = "1.4.0.1".freeze
 end
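
Because `Html2Doc` changes from a module to a class, downstream code that reopens it with `module Html2Doc` (for example to add helper methods) would be expected to fail at load time: Ruby raises `TypeError` when a constant defined as a class is reopened as a module. The sketch below demonstrates this with a stand-in constant, `Converter`, a hypothetical name used so the example does not depend on the gem.

```ruby
# Stand-in for a library constant that is now a class (hypothetical name).
class Converter
  VERSION = "1.4.0.1".freeze
end

# Tries to reopen the constant as a module. The definition is evaluated from
# a string so the error is raised when the method is called, not at parse time.
def reopening_as_module_fails?
  Object.class_eval("module Converter; end")
  false
rescue TypeError
  true
end

puts reopening_as_module_fails? # prints "true"
```

Extensions written against 1.3.0.1 would need `class Html2Doc` instead, with their `def self.` helpers reviewed against the new instance methods.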

data/spec/html2doc_spec.rb CHANGED

@@ -76,7 +76,7 @@ WORD_FTR1 = <<~FTR.freeze
   Content-ID: <filelist.xml>
   Content-Disposition: inline; filename="filelist.xml"
   Content-Transfer-Encoding: base64
-  Content-Type: #{Html2Doc::mime_type('filelist.xml')}
+  Content-Type: #{Html2Doc.new({}).mime_type('filelist.xml')}
   PHhtbCB4bWxuczpvPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTpvZmZpY2UiPgog
   ICAgICAgIDxvOk1haW5GaWxlIEhSZWY9Ii4uL3Rlc3QuaHRtIi8+ICA8bzpGaWxlIEhSZWY9ImZp
@@ -90,7 +90,7 @@ WORD_FTR2 = <<~FTR.freeze
   Content-ID: <filelist.xml>
   Content-Disposition: inline; filename="filelist.xml"
   Content-Transfer-Encoding: base64
-  Content-Type: #{Html2Doc::mime_type('filelist.xml')}
+  Content-Type: #{Html2Doc.new({}).mime_type('filelist.xml')}
   PHhtbCB4bWxuczpvPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTpvZmZpY2UiPgog
   ICAgICAgIDxvOk1haW5GaWxlIEhSZWY9Ii4uL3Rlc3QuaHRtIi8+ICA8bzpGaWxlIEhSZWY9ImZp
   bGVsaXN0LnhtbCIvPgogIDxvOkZpbGUgSFJlZj0iaGVhZGVyLmh0bWwiLz4KPC94bWw+Cg==
@@ -102,7 +102,7 @@ WORD_FTR3 = <<~FTR.freeze
   Content-ID: <filelist.xml>
   Content-Disposition: inline; filename="filelist.xml"
   Content-Transfer-Encoding: base64
-  Content-Type: #{Html2Doc::mime_type('filelist.xml')}
+  Content-Type: #{Html2Doc.new({}).mime_type('filelist.xml')}
   PHhtbCB4bWxuczpvPSJ1cm46c2NoZW1hcy1taWNyb3NvZnQtY29tOm9mZmljZTpvZmZpY2UiPgog
   ICAgICAgIDxvOk1haW5GaWxlIEhSZWY9Ii4uL3Rlc3QuaHRtIi8+ICA8bzpGaWxlIEhSZWY9IjFh
@@ -278,18 +278,18 @@ RSpec.describe Html2Doc do
   end
   it "preserves Word HTML directives" do
-    Html2Doc.process(html_input(%[A<!--[if gte mso 9]>X<![endif]-->B]), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(%[A<!--[if gte mso 9]>X<![endif]-->B]))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
         #{word_body(%{A<!--[if gte mso 9]>X<![endif]-->B},
-                   '<div style="mso-element:footnote-list"/>')}
+                    '<div style="mso-element:footnote-list"/>')}
         #{WORD_FTR1}
       OUTPUT
   end
   it "processes a blank document" do
-    Html2Doc.process(html_input(""), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(""))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -299,15 +299,15 @@ RSpec.describe Html2Doc do
   it "removes any temp files" do
     File.delete("test.doc")
-    Html2Doc.process(html_input(""), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(""))
     expect(File.exist?("test.doc")).to be true
     expect(File.exist?("test.htm")).to be false
     expect(File.exist?("test_files")).to be false
   end
   it "processes a stylesheet in an HTML document with a title" do
-    Html2Doc.process(html_input(""),
-                     filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
+    Html2Doc.new(filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
+      .process(html_input(""))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -316,9 +316,11 @@ RSpec.describe Html2Doc do
   end
   it "processes a stylesheet in an HTML document without a title" do
-    Html2Doc.process(html_input_no_title(""),
-                     filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
-    expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
+    Html2Doc.new(filename: "test",
+                 stylesheet: "lib/html2doc/wordstyle.css")
+      .process(html_input_no_title(""))
+    expect(guid_clean(File.read("test.doc",
+                                encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR.sub('<title>blank</title>', '')}
         #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -327,12 +329,14 @@ RSpec.describe Html2Doc do
   end
   it "processes a stylesheet in an HTML document with an empty head" do
-    Html2Doc.process(html_input_empty_head(""),
-                     filename: "test", stylesheet: "lib/html2doc/wordstyle.css")
+    Html2Doc.new(filename: "test",
+                 stylesheet: "lib/html2doc/wordstyle.css")
+      .process(html_input_empty_head(""))
     word_hdr_end = WORD_HDR_END
       .sub(%(<meta name="Originator" content="Me"/>\n), "")
       .sub("</style>\n</head>", "</style></head>")
-    expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
+    expect(guid_clean(File.read("test.doc",
+                                encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR.sub('<title>blank</title>', '')}
         #{DEFAULT_STYLESHEET}
@@ -342,8 +346,9 @@ RSpec.describe Html2Doc do
   end
   it "processes a header" do
-    Html2Doc.process(html_input(""),
-                     filename: "test", header_file: "spec/header.html")
+    Html2Doc.new(filename: "test",
+                 header_file: "spec/header.html")
+      .process(html_input(""))
     html = guid_clean(File.read("test.doc", encoding: "utf-8"))
     hdr = Base64.decode64(
       html
@@ -365,8 +370,9 @@ RSpec.describe Html2Doc do
   end
   it "processes a header with an image" do
-    Html2Doc.process(html_input(""),
-                     filename: "test", header_file: "spec/header_img.html")
+    Html2Doc.new(filename: "test",
+                 header_file: "spec/header_img.html")
+      .process(html_input(""))
     doc = guid_clean(File.read("test.doc", encoding: "utf-8"))
     expect(doc).to match(%r{Content-Type: image/png})
     expect(doc).to match(%r{iVBORw0KGgoAAAANSUhEUgAAA5cAAAN7CAYAAADRE24cAAAgAElEQVR4XuydB5gUxdaGC65gTogB})
@@ -381,8 +387,9 @@ RSpec.describe Html2Doc do
                                            "19160-6.png"))),
       )
     end
-    Html2Doc.process(html_input(""),
-                     filename: "test", header_file: "spec/header_img1.html")
+    Html2Doc.new(filename: "test",
+                 header_file: "spec/header_img1.html")
+      .process(html_input(""))
     doc = guid_clean(File.read("test.doc", encoding: "utf-8"))
     expect(doc).to match(%r{Content-Type: image/png})
     expect(doc).to match(%r{iVBORw0KGgoAAAANSUhEUgAAA5cAAAN7CAYAAADRE24cAAAgAElEQVR4XuydB5gUxdaGC65gTogB})
@@ -391,7 +398,7 @@ RSpec.describe Html2Doc do
   it "processes a populated document" do
     simple_body = "<h1>Hello word!</h1>
     <div>This is a very simple document</div>"
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -401,9 +408,11 @@ RSpec.describe Html2Doc do
   end
   it "processes AsciiMath" do
-    Html2Doc.process(html_input(%[<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2 text("integer"))}}</div>]),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
-    expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
+    Html2Doc.new(filename: "test",
+                 asciimathdelims: ["{{", "}}"])
+      .process(html_input(%[<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2 text("integer"))}}</div>]))
+    expect(guid_clean(File.read("test.doc",
+                                encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
         #{word_body(%{
@@ -416,8 +425,8 @@ RSpec.describe Html2Doc do
   end
   it "processes mstyle" do
-    Html2Doc.process(html_input(%[<div>{{bb (-log_2 (p_u)) bb "BB" bbb "BBB" cc "CC" bcc "BCC" tt "TT" fr "FR" bfr "BFR" sf "SF" bsf "BSFα" sfi "SFI" sfbi "SFBIα" bii "BII" ii "II"}}</div>]),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input(%[<div>{{bb (-log_2 (p_u)) bb "BB" bbb "BBB" cc "CC" bcc "BCC" tt "TT" fr "FR" bfr "BFR" sf "SF" bsf "BSFα" sfi "SFI" sfbi "SFBIα" bii "BII" ii "II"}}</div>]))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -431,8 +440,8 @@ RSpec.describe Html2Doc do
   end
   it "processes spaces in AsciiMath" do
-    Html2Doc.process(html_input(%[<div>{{text " integer ")}}</div>]),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input(%[<div>{{text " integer ")}}</div>]))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -446,10 +455,10 @@ RSpec.describe Html2Doc do
   end
   it "processes spaces in MathML mtext" do
-    Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
                                 <mrow><mi>H</mi><mtext> original </mtext><mi>J</mi></mrow>
-                                </math></div>"),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
+                                </math></div>"))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -461,15 +470,16 @@ RSpec.describe Html2Doc do
       OUTPUT
   end
-  it "unwraps accent in MathML" do
-    Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
+  it "unwraps and converts accent in MathML" do
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
                                 <mover accent='true'><mrow><mi>p</mi></mrow><mrow><mo>^</mo></mrow></mover>
-</math></div>"), filename: "test", asciimathdelims: ["{{", "}}"])
+</math></div>"))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
         #{word_body('<div><m:oMath>
-                <m:acc><m:accPr><m:chr m:val="^"></m:chr></m:accPr><m:e><m:r><m:t>p</m:t></m:r></m:e></m:acc>
+                <m:acc><m:accPr><m:chr m:val="&#x302;"></m:chr></m:accPr><m:e><m:r><m:t>p</m:t></m:r></m:e></m:acc>
                 </m:oMath>
                 </div>', '<div style="mso-element:footnote-list"/>')}
         #{WORD_FTR1}
@@ -477,8 +487,8 @@ RSpec.describe Html2Doc do
   end
   it "left-aligns AsciiMath" do
-    Html2Doc.process(html_input("<div style='text-align:left;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input("<div style='text-align:left;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -492,9 +502,11 @@ RSpec.describe Html2Doc do
   end
   it "right-aligns AsciiMath" do
-    Html2Doc.process(html_input("<div style='text-align:right;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
-    expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
+    Html2Doc.new(filename: "test",
+                 asciimathdelims: ["{{", "}}"])
+      .process(html_input("<div style='text-align:right;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"))
+    expect(guid_clean(File.read("test.doc",
+                                encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
         #{word_body(%{
@@ -509,21 +521,21 @@ RSpec.describe Html2Doc do
   it "raises error in processing of broken AsciiMath" do
     begin
       expect do
-        Html2Doc.process(html_input(%[<div style='text-align:right;'>{{u_c = 6.6"unitsml(kHz)}}</div>]),
-                         filename: "test", asciimathdelims: ["{{", "}}"])
+        Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+          .process(html_input(%[<div style='text-align:right;'>{{u_c = 6.6"unitsml(kHz)}}</div>]))
       end.to output('parsing: u_c = 6.6"unitsml(kHz)').to_stderr
     rescue StandardError
     end
     expect do
-      Html2Doc.process(html_input(%[<div style='text-align:right;'>{{u_c = 6.6"unitsml(kHz)}}</div>]),
-                       filename: "test", asciimathdelims: ["{{", "}}"])
+      Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+        .process(html_input(%[<div style='text-align:right;'>{{u_c = 6.6"unitsml(kHz)}}</div>]))
     end.to raise_error(StandardError)
   end
   it "wraps msup after munderover in MathML" do
-    Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
-<munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mn>2</mn><mrow><mi>i</mi></mrow></msup></math></div>"),
-                     filename: "test", asciimathdelims: ["{{", "}}"])
+    Html2Doc.new(filename: "test", asciimathdelims: ["{{", "}}"])
+      .process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
+<munderover><mo>&#x2211;</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mn>2</mn><mrow><mi>i</mi></mrow></msup></math></div>"))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -537,7 +549,7 @@ RSpec.describe Html2Doc do
   it "processes tabs" do
     simple_body = "<h1>Hello word!</h1>
     <div>This is a very &tab; simple document</div>"
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -550,7 +562,7 @@ RSpec.describe Html2Doc do
     simple_body = '<h1>Hello word!</h1>
     <p>This is a very simple document</p>
     <p class="x">This style stays</p>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -565,7 +577,7 @@ RSpec.describe Html2Doc do
     <li>This is a very simple document</li>
     <li class="x">This style stays</li>
     </ul>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -576,8 +588,8 @@ RSpec.describe Html2Doc do
   it "resizes images for height, in a file in a subdirectory" do
     simple_body = '<img src="19160-6.png">'
-    Html2Doc.process(html_input(simple_body), filename: "spec/test",
-                                              imagedir: "spec")
+    Html2Doc.new(filename: "spec/test", imagedir: "spec")
+      .process(html_input(simple_body))
     testdoc = File.read("spec/test.doc", encoding: "utf-8")
     expect(testdoc).to match(%r{Content-Type: image/png})
     expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -589,7 +601,8 @@ RSpec.describe Html2Doc do
   it "resizes images for width" do
     simple_body = '<img src="spec/19160-7.gif">'
-    Html2Doc.process(html_input(simple_body), filename: "test", imagedir: ".")
+    Html2Doc.new(filename: "test", imagedir: ".")
+      .process(html_input(simple_body))
     testdoc = File.read("test.doc", encoding: "utf-8")
     expect(testdoc).to match(%r{Content-Type: image/gif})
     expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -601,7 +614,8 @@ RSpec.describe Html2Doc do
   it "resizes images for height" do
     simple_body = '<img src="spec/19160-8.jpg">'
-    Html2Doc.process(html_input(simple_body), filename: "test", imagedir: ".")
+    Html2Doc.new(filename: "test", imagedir: ".")
+      .process(html_input(simple_body))
     testdoc = File.read("test.doc", encoding: "utf-8")
     expect(testdoc).to match(%r{Content-Type: image/jpeg})
     expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -613,48 +627,49 @@ RSpec.describe Html2Doc do
   it "resizes images with missing or auto sizes" do
     image = { "src" => "spec/19160-8.jpg" }
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
     image["width"] = "20"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [20, 65]
     image.delete("width")
     image["height"] = "50"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [15, 50]
     image.delete("height")
     image["width"] = "500"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
     image.delete("width")
     image["height"] = "500"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
     image["width"] = "20"
     image["height"] = "auto"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [20, 65]
     image["width"] = "auto"
     image["height"] = "50"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [15, 50]
     image["width"] = "500"
     image["height"] = "auto"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
     image["width"] = "auto"
     image["height"] = "500"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
     image["width"] = "auto"
     image["height"] = "auto"
-    expect(Html2Doc.image_resize(image, "spec/19160-8.jpg", 100, 100))
+    expect(Html2Doc.new({}).image_resize(image, "spec/19160-8.jpg", 100, 100))
       .to eq [30, 100]
   end
   it "does not move images if they are external URLs" do
     simple_body = '<img src="https://example.com/19160-6.png">'
-    Html2Doc.process(html_input(simple_body), filename: "test", imagedir: ".")
+    Html2Doc.new(filename: "test", imagedir: ".")
+      .process(html_input(simple_body))
     testdoc = File.read("test.doc", encoding: "utf-8")
     expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
       #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -665,8 +680,8 @@ RSpec.describe Html2Doc do
   it "deals with absolute image locations" do
     simple_body = %{<img src="#{__dir__}/19160-6.png">}
-    Html2Doc.process(html_input(simple_body), filename: "spec/test",
-                                              imagedir: ".")
+    Html2Doc.new(filename: "spec/test", imagedir: ".")
+      .process(html_input(simple_body))
     testdoc = File.read("spec/test.doc", encoding: "utf-8")
     expect(testdoc).to match(%r{Content-Type: image/png})
     expect(image_clean(guid_clean(testdoc))).to match_fuzzy(<<~OUTPUT)
@@ -687,7 +702,7 @@ RSpec.describe Html2Doc do
      document<a epub:type="footnote" href="#a1">1</a> allegedly<a epub:type="footnote" href="#a2">2</a></div>
      <aside id="a1">Footnote</aside>
      <aside id="a2">Other Footnote</aside>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -707,7 +722,7 @@ RSpec.describe Html2Doc do
      document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
      <aside id="a1">Footnote</aside>
      <aside id="a2">Other Footnote</aside>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -727,7 +742,7 @@ RSpec.describe Html2Doc do
      document<a class="footnote" href="#a1">(<span class="MsoFootnoteReference">1</span>)</a> allegedly<a class="footnote" href="#a2">2</a></div>
      <aside id="a1">Footnote</aside>
      <aside id="a2">Other Footnote</aside>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -747,7 +762,7 @@ RSpec.describe Html2Doc do
      document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
      <aside id="a1"><p>Footnote</p></aside>
      <div id="a2"><p>Other Footnote</p></div>'
-    Html2Doc.process(html_input(simple_body), filename: "test")
+    Html2Doc.new(filename: "test").process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -767,8 +782,8 @@ RSpec.describe Html2Doc do
       <div><ul id="0">
       <li><div><p><ol id="1"><li><ul id="2"><li><p><ol id="3"><li><ol id="4"><li>A</li><li><p>B</p><p>B2</p></li><li>C</li></ol></li></ol></p></li></ul></li></ol></p></div></li><div><ul id="5"><li>C</li></ul></div>
     BODY
-    Html2Doc.process(html_input(simple_body),
-                     filename: "test", liststyles: { ul: "l1", ol: "l2" })
+    Html2Doc.new(filename: "test", liststyles: { ul: "l1", ol: "l2" })
+      .process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -786,8 +801,8 @@ RSpec.describe Html2Doc do
       <ol id="1"><li><div><p><ol id="2"><li><ul id="3"><li><p><ol id="4"><li><ol id="5"><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol>
       <ol id="6"><li><div><p><ol id="7"><li><ul id="8"><li><p><ol id="9"><li><ol id="10"><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol></div>
     BODY
-    Html2Doc.process(html_input(simple_body),
-                     filename: "test", liststyles: { ul: "l1", ol: "l2" })
+    Html2Doc.new(filename: "test", liststyles: { ul: "l1", ol: "l2" })
+      .process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -808,9 +823,10 @@ RSpec.describe Html2Doc do
       <div><ul class="other" id="10">
       <li><div><p><ol id="11"><li><ul id="12"><li><p><ol id="13"><li><ol id="14"><li>A</li><li><p>B</p><p>B2</p></li><li>C</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
     BODY
-    Html2Doc.process(html_input(simple_body),
-                     filename: "test",
-                     liststyles: { ul: "l1", ol: "l2", steps: "l3" })
+    Html2Doc.new(filename: "test",
+                 liststyles: { ul: "l1", ol: "l2",
+                               steps: "l3" })
+      .process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -832,8 +848,8 @@ RSpec.describe Html2Doc do
         <p id="b"/>
       </div>
     BODY
-    Html2Doc.process(html_input(simple_body),
-                     filename: "test", liststyles: { ul: "l1", ol: "l2" })
+    Html2Doc.new(filename: "test", liststyles: { ul: "l1", ol: "l2" })
+      .process(html_input(simple_body))
     expect(guid_clean(File.read("test.doc", encoding: "utf-8")))
       .to match_fuzzy(<<~OUTPUT)
         #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
@@ -848,8 +864,8 @@ RSpec.describe Html2Doc do
   it "test image base64 image encoding" do
     simple_body = '<img src="19160-6.png">'
-    Html2Doc.process(html_input(simple_body),
-                     filename: "spec/test", debug: true, imagedir: "spec")
+    Html2Doc.new(filename: "spec/test", debug: true, imagedir: "spec")
+      .process(html_input(simple_body))
     testdoc = File.read("spec/test.doc", encoding: "utf-8")
     base64_image = testdoc[/image\/png\n\n(.*?)\n\n----/m, 1].gsub!("\n", "")
     base64_image_basename = testdoc[%r{Content-ID: <([0-9a-z\-]+)\.png}m, 1]

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: html2doc
 version: !ruby/object:Gem::Version
-  version: 1.3.0.1
+  version: 1.4.0.1
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-01-22 00:00:00.000000000 Z
+date: 2022-05-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: asciimath
@@ -334,7 +334,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.32
+rubygems_version: 3.3.9
 signing_key:
 specification_version: 4
 summary: Convert HTML document to Microsoft Word document
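
The spec changes in this diff consistently replace the class-level call `Html2Doc.process(html, options)` with an instance call, `Html2Doc.new(options).process(html)`. The sketch below illustrates that call shape and one possible way to keep old-style call sites working during a migration. It is only an illustration under stated assumptions: `StubConverter` and `LegacyProcess` are hypothetical names that do not exist in html2doc, and the stub returns a string rather than writing a `.doc` file.

```ruby
# Hypothetical stand-in for the converter; NOT the real Html2Doc class.
# It only mimics the call shape seen in the specs: options are passed to
# `new`, and the HTML string is passed to `process`.
class StubConverter
  attr_reader :options

  # Accepts a single options hash, so both `new(filename: "test")` and
  # `new({})` work, as in the updated specs.
  def initialize(options)
    @options = options
  end

  # Returns a short description instead of writing a Word file.
  def process(html)
    "#{@options[:filename]}.doc <- #{html.length} chars"
  end
end

# Hypothetical shim: keeps the 1.3-style argument order (HTML first,
# options after) on top of the 1.4-style instance API.
module LegacyProcess
  def self.call(converter_class, html, **options)
    converter_class.new(options).process(html)
  end
end

# 1.4-style call shape: options first, then the document.
new_style = StubConverter.new(filename: "test").process("<p>Hi</p>")

# 1.3-style call shape routed through the shim.
old_style = LegacyProcess.call(StubConverter, "<p>Hi</p>", filename: "test")

puts new_style
puts old_style
```

Both calls reach the same `new(options).process(html)` path, so a shim of this kind would let call sites be migrated one at a time. Whether the real gem needs or offers such a shim is not stated in this diff.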