RubyGems - html2doc - Versions diffs - 1.9.2 → 1.10.0 - Mend

html2doc 1.9.2 → 1.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a78b87b6a657f39c29e047bd6eddebefb3fad81ef91a863207bc4c10ff4c9670
-  data.tar.gz: 0ebf87be9c727d47d82f80d24d3ad2464d6fb2f63d148cad8bf4ae35774c6d86
+  metadata.gz: a9ef55b2a805994f5ca26253285cdc062ff7d9a16c12ce009935956513d3eb3c
+  data.tar.gz: 8229037b442e8c9fd93790b14ec25daff760b4f98697b787da2ae8daef344699
 SHA512:
-  metadata.gz: d069eacfc9454730817db1f5e3e81b197917f2a1a3d093892e2ae2142b9b530f60a70d0b283a109c7f403da4232bdbbd9fc238a34abb8b996888f712f10b0c63
-  data.tar.gz: e4a83b5cf61a6270d849356f5c54b72381963b51671bea6f14d11348acee83536854e01d3d3a1a0841bde2d23eb6793826d1cfbd7215fa50986330fe9e9db817
+  metadata.gz: 293dcfb6a88743f5a3a5e8bc7042d3a5f7c1e7f93683b22d21153591f2e82fdae5d3bb2c446817b2a06372469db09554078e2c5a804ff54cf2ba4ee54706e0bf
+  data.tar.gz: 4bfac10267769fecca5b85fe68319bf17fce9ed021e6c1534cf7d76cee34193468919f3d4e1f9114257884d09661d8502a7a0f1de06ab05586842c360a4a7228
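A published `.gem` is a tar archive whose `checksums.yaml.gz` member records digests of the other two members, `metadata.gz` and `data.tar.gz`. That is why all four values change between releases: both the gemspec and the packaged files differ. A minimal sketch of checking such a document, assuming the member contents have already been read into memory (`verify_checksums` is a hypothetical helper, not a RubyGems API):

```ruby
require "digest"
require "yaml"

# Hypothetical helper, not part of RubyGems: true when every digest listed
# in a checksums.yaml document matches the supplied member contents.
# `contents` maps member names ("metadata.gz", "data.tar.gz") to raw bytes.
def verify_checksums(checksums_yaml, contents)
  digests = { "SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512 }
  YAML.safe_load(checksums_yaml).all? do |algorithm, members|
    klass = digests.fetch(algorithm)
    members.all? do |name, expected|
      klass.hexdigest(contents.fetch(name)) == expected.to_s
    end
  end
end

# Toy member contents standing in for the real archive entries.
members = { "metadata.gz" => "meta", "data.tar.gz" => "data" }
yaml = <<~YAML
  ---
  SHA256:
    metadata.gz: #{Digest::SHA256.hexdigest('meta')}
    data.tar.gz: #{Digest::SHA256.hexdigest('data')}
YAML
p verify_checksums(yaml, members)
```

Any single changed byte in either member changes its digest, so a mismatch on either algorithm makes the whole check fail.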

data/README.adoc CHANGED

@@ -31,6 +31,7 @@ The gem currently does the following:
 * Resize any local images in the HTML file to fit within the maximum page size. (Word will otherwise crash on reading the document.)
 * Optionally apply list styles with predefined bullet and numbering from a Word CSS to the unordered and ordered lists in the document, restarting numbering for each ordered list.
 * Convert all lists to native Word HTML rendering (using paragraphs with `MsoListParagraphCxSpFirst, MsoListParagraphCxSpMiddle, MsoListParagraphCxSpLast` styles)
+* Generate additional list styles in CSS for any ordered lists with a new start number.
 * Convert any internal `@id` anchors to `a@name` anchors; Word only hyperlinks to the latter.
 * Generate a filelist.xml listing of all files to be bundled into the Word document.
 * Assign the class `MsoNormal` to any paragraphs that do not have a class, so that they can be treated as Normal Style when editing the Word document.
@@ -43,7 +44,7 @@ For a representative generator of HTML that uses this gem in postprocessing, see
 This gem generates `.doc` documents. Future versions may upgrade the output to `docx`.
-Because `.doc` is the format of an older version of Microsoft Word, the output of this gem do *not* support SVG graphics. (Word itself converts SVG into PNG when it saves documents as Word HTML, which is the input to this gem.)
+Because `.doc` is the format of an older version of Microsoft Word, the output of this gem do *not* support SVG graphics. Word itself converts SVG into PNG when it saves documents as Word HTML, which is the input to this gem. External consumers of this gem in Metanorma convert SVG to EMF.
 There there are two other Microsoft Word vendors in the Ruby ecosystem.
@@ -150,7 +151,9 @@ left-aligned or right-aligned, add `style="text-align:left"` or
 === Lists
-Natively, Word does not use `<ol>`, `<ul>`, or `<dl>` lists in its HTML exports at all: it uses paragraphs styled with list styles. If you save a Word document as HTML in order to use its CSS for Word documents generated by HTML, those styles will still work (with the caveat that you will need to extract the `@list` style specific to ordered and unordered lists, and pass it as a `liststyles` parameter to the conversion). Word HTML understands `<ol>, <ul>, <li>`, but its rendering is fragile: in particular, any instance of `<p>` within a `<li>` is treated as a new list item (so Word HTML will not let you have multi-paragraph list items if you use native HTML.) This gem now exports lists as Word HTML prefers to see them, with `MsoListParagraphCxSpFirst, MsoListParagraphCxSpMiddle, MsoListParagraphCxSpLast` styles. You will need to include these in the CSS stylesheet you supply, in order to get the right indentation for lists.
+Natively, Word does not use `<ol>`, `<ul>`, or `<dl>` lists in its HTML exports at all: it uses paragraphs styled with list styles. If you save a Word document as HTML in order to use its CSS for Word documents generated by HTML, those styles will still work (with the caveat that you will need to extract the `@list` style specific to ordered and unordered lists, and pass it as a `liststyles` parameter to the conversion). The gem will duplicate the ordered list style definition to provide new styles, in order to deal with custom numbering.
+Word HTML understands `<ol>, <ul>, <li>`, but its rendering is fragile: in particular, any instance of `<p>` within a `<li>` is treated as a new list item (so Word HTML will not let you have multi-paragraph list items if you use native HTML.) This gem now exports lists as Word HTML prefers to see them, with `MsoListParagraphCxSpFirst, MsoListParagraphCxSpMiddle, MsoListParagraphCxSpLast` styles. You will need to include these in the CSS stylesheet you supply, in order to get the right indentation for lists.
 == Example
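The new README text says the gem duplicates the ordered-list `@list` definition to cope with custom numbering. Going by `newliststyle` in the `lists.rb` hunk below, the copy is renamed, its top-level `@list` block gets fresh `mso-list-id` and `mso-list-template-ids` values, and `mso-level-start-at` is inserted into the block for the list's level. A simplified sketch of that rewrite, under stated assumptions: the function name `derive_list_style`, the explicit `list_id:` keyword (the gem draws random ids) and the sample CSS are illustrative, and the template-id line is left out:

```ruby
# Hypothetical sketch of the CSS rewrite for an <ol start="N">: copy the
# @list definition `old_name` as `new_name`, give the copy its own list id,
# and restart numbering at `start` on the given level.
def derive_list_style(defn, old_name, new_name, start, list_id:, level: 1)
  old_re = Regexp.escape(old_name)
  new_re = Regexp.escape(new_name)
  # Rename "@list l1" and "@list l1:levelN"; \b keeps "l1" from matching "l10".
  css = defn.gsub(/@list\s+#{old_re}\b/, "@list #{new_name}")
  # Replace the top-level block so Word treats the copy as a separate list.
  css = css.sub(/@list\s+#{new_re}\s*\{[^}]*\}/,
                "@list #{new_name}\n{mso-list-id:#{list_id};}")
  # Insert the start value at the top of the requested level's block.
  css.sub(/@list\s+#{new_re}:level#{level}\s*\{/,
          "\\0mso-level-start-at:#{start};\n")
end

ol_style = <<~CSS
  @list l1
  {mso-list-id:111;
  mso-list-template-ids:222;}
  @list l1:level1
  {mso-level-tab-stop:36.0pt;}
CSS
puts derive_list_style(ol_style, "l1", "l3", 4, list_id: 999)
```

Copying the whole definition, rather than editing it in place, leaves every other list that uses `l1` numbering from its original start value.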

data/lib/html2doc/base.rb CHANGED

@@ -13,7 +13,7 @@ class Html2Doc
     @imagedir = hash[:imagedir]
     @debug = hash[:debug]
     @liststyles = hash[:liststyles]
-    @stylesheet = hash[:stylesheet]
+    @stylesheet = read_stylesheet(hash[:stylesheet])
     @c = HTMLEntities.new
   end
@@ -74,8 +74,7 @@ class Html2Doc
   end
   def locate_landscape(_docxml)
-    css = read_stylesheet(@stylesheet)
-    @landscape = css.scan(/div\.\S+\s+\{\s*page:\s*[^;]+L;\s*\}/m)
+    @landscape = @stylesheet.scan(/div\.\S+\s+\{\s*page:\s*[^;]+L;\s*\}/m)
       .map { |e| e.sub(/^div\.(\S+).*$/m, "\\1") }
   end
@@ -99,11 +98,9 @@ class Html2Doc
     end
   end
-  def stylesheet(_filename, _header_filename, cssname)
-    stylesheet = read_stylesheet(cssname)
+  def stylesheet(_filename, _header_filename, _cssname)
+    stylesheet = "#{@stylesheet}\n#{@newliststyledefs}"
     xml = Nokogiri::XML("<style/>")
-    # s = Nokogiri::XML::CDATA.new(xml, "\n#{stylesheet}\n")
-    # xml.children.first << Nokogiri::XML::Comment.new(xml, s)
     xml.children.first << Nokogiri::XML::CDATA
       .new(xml, "\n<!--\n#{stylesheet}\n-->\n")
     xml.root.to_s
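With this change the constructor reads the stylesheet once (`read_stylesheet(hash[:stylesheet])`), and `locate_landscape`, `stylesheet` and `page_dimensions` reuse the resulting CSS text instead of re-reading the file. The landscape check is then a plain regex scan over that text. A standalone sketch reusing the two regexes from `locate_landscape`; the helper name `landscape_sections` and the sample CSS are illustrative. As written, the pattern only recognises sections whose `page:` name ends in `L` and that have whitespace between the selector and `{`:

```ruby
# Same pattern as locate_landscape: a div.<Section> rule whose page name ends in "L".
LANDSCAPE_SECTION = /div\.\S+\s+\{\s*page:\s*[^;]+L;\s*\}/m

# Hypothetical helper: class names of landscape sections found in the CSS.
def landscape_sections(css)
  css.scan(LANDSCAPE_SECTION).map { |rule| rule.sub(/^div\.(\S+).*$/m, "\\1") }
end

css = <<~CSS
  div.WordSection1
  {page:WordSection1;}
  div.WordSection2
  {page:WordSection2L;}
CSS
p landscape_sections(css)
```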

data/lib/html2doc/lists.rb CHANGED

@@ -4,8 +4,7 @@ require "nokogiri"
 class Html2Doc
   def style_list(elem, level, liststyle, listnumber)
-    return unless liststyle
+    liststyle or return
     if elem["style"]
       elem["style"] += ";"
     else
@@ -30,16 +29,37 @@ class Html2Doc
   def list_add(xpath, liststyles, listtype, level)
     xpath.each do |l|
-      level == 1 and l["seen"] = true and @listnumber += 1
+      level == 1 && l["seen"] = true and @listnumber += 1
       l["id"] ||= UUIDTools::UUID.random_create
+      liststyle = derive_liststyle(l, liststyles[listtype], level)
       (l.xpath(".//li") - l.xpath(".//ol//li | .//ul//li")).each do |li|
-        style_list(li, level, liststyles[listtype], @listnumber)
+        style_list(li, level, liststyle, @listnumber)
         list_add1(li, liststyles, listtype, level)
       end
       list_add_tail(l, liststyles, listtype, level)
     end
   end
+  def derive_liststyle(list, liststyle, level)
+    list["start"] && list["start"] != "1" or return liststyle
+    @liststyledefsidx += 1
+    ret = "l#{@liststyledefsidx}"
+    @newliststyledefs += newliststyle(list["start"], liststyle, ret, level)
+    ret
+  end
+  def newliststyle(start, liststyle, newstylename, level)
+    s = @liststyledefs[liststyle]
+      .gsub(/@list\s+#{liststyle}/, "@list #{newstylename}")
+      .sub(/@list\s+#{newstylename}\s+\{[^}]*\}/m, <<~LISTSTYLE)
+        @list #{newstylename}\n{mso-list-id:#{rand(100_000_000..999_999_999)};
+        mso-list-template-ids:#{rand(100_000_000..999_999_999)};}
+      LISTSTYLE
+      .sub(/@list\s+#{newstylename}:level#{level}\s+\{/m,
+           "\\0mso-level-start-at:#{start};\n")
+    "#{s}\n"
+  end
   def list_add_tail(list, liststyles, listtype, level)
     list.xpath(".//ul[not(ancestor::li/ancestor::*/@id = '#{list['id']}')] | "\
                ".//ol[not(ancestor::li/ancestor::*/@id = '#{list['id']}')]")
@@ -49,16 +69,15 @@ class Html2Doc
   end
   def list2para(list)
-    return if list.xpath("./li").empty?
+    list.xpath("./li").empty? and return
     list2para_position(list)
     list.xpath("./li").each do |l|
       l.name = "p"
       l["class"] ||= "MsoListParagraphCxSpMiddle"
-      next unless l.first_element_child&.name == "p"
+      l.first_element_child&.name == "p" or next
       l["style"] ||= ""
-      l["style"] += (l.first_element_child["style"]&.sub(/mso-list[^;]+;/, "") || "")
+      l["style"] += l.first_element_child["style"]
+        &.sub(/mso-list[^;]+;/, "") || ""
       l.first_element_child.replace(l.first_element_child.children)
     end
     list.replace(list.children)
@@ -100,12 +119,82 @@ class Html2Doc
   end
   def lists(docxml, liststyles)
-    return if liststyles.nil?
-    @listnumber = 0
+    liststyles.nil? and return
+    parse_stylesheet_line_styles
     liststyles.each_key { |k| lists1(docxml, liststyles, k) }
     lists_unstyled(docxml, liststyles)
     liststyles.has_key?(:ul) and docxml.xpath("//ul").each { |u| list2para(u) }
     liststyles.has_key?(:ol) and docxml.xpath("//ol").each { |u| list2para(u) }
   end
+  def parse_stylesheet_line_styles
+    @listnumber = 0
+    result = process_stylesheet_lines(@stylesheet.split("\n"))
+    @liststyledefs = clean_result_content(result)
+    @newliststyledefs = ""
+    @liststyledefsidx = @liststyledefs.keys.map do |k|
+      k.sub(/^.*(\d+)$/, "\\1").to_i
+    end.max
+  end
+  private
+  def extract_list_name(line)
+    match = line.match(/^\s*@list\s+([^:\s]+)(?::.*)?/)
+    match ? match[1] : nil
+  end
+  def list_declaration?(line)
+    !extract_list_name(line).nil?
+  end
+  def save_current_list(result, current_base, current_content)
+    current_base.nil? || current_content.empty? and return result
+    if result[current_base]
+      result[current_base] += current_content
+    else
+      result[current_base] = current_content
+    end
+    result
+  end
+  def process_stylesheet_lines(lines)
+    result = {}
+    current_base = nil
+    current_content = ""
+    parsing_active = false
+    lines.each do |line|
+      if list_declaration?(line)
+        base_name = extract_list_name(line)
+        if current_base == base_name
+          current_content += "#{line}\n"
+        else
+          # save accumulated list style definition, new list style
+          save_current_list(result, current_base, current_content)
+          current_base = base_name
+          current_content = "#{line}\n"
+        end
+        parsing_active = true
+      elsif parsing_active && line.include?("}")
+        # End of current block - add this line and stop parsing
+        current_content += "#{line}\n"
+        parsing_active = false
+      elsif parsing_active
+        # Continue adding content while parsing is active
+        current_content += "#{line}\n"
+      end
+      # If parsing_active is false and no @list declaration, skip the line
+    end
+    # Save the last list if we were still parsing
+    save_current_list(result, current_base, current_content)
+    result
+  end
+  def clean_result_content(result)
+    result.each { |k, v| result[k] = v.rstrip }
+    result
+  end
 end
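`parse_stylesheet_line_styles` splits the stylesheet into per-list chunks: every `@list lN` or `@list lN:levelK` rule is filed under its base name `lN`, and the highest existing index seeds the names of derived styles. A compact sketch of the same grouping, with hypothetical names `group_list_styles` and `next_list_index`. It differs from `process_stylesheet_lines` as written in one respect: a rule that opens and closes on the same line also ends the capture. For the index, note that a greedy `^.*(\d+)$` pattern captures only the final digit (`"l10"` yields `0`), so the sketch anchors on `(\d+)\z` instead:

```ruby
LIST_DECL = /^\s*@list\s+([^:\s]+)/

# Hypothetical sketch: collect the text of every @list rule, keyed by base
# name, so "@list l1" and "@list l1:level2" both end up under "l1".
def group_list_styles(css)
  groups = Hash.new { |h, k| h[k] = +"" }
  current = nil # base name of the rule currently being read, if any
  css.each_line do |line|
    m = LIST_DECL.match(line)
    current = m[1] if m
    next unless current

    groups[current] << line
    current = nil if line.include?("}") # the rule's block has closed
  end
  groups.transform_values(&:rstrip)
end

# Next free numeric suffix; (\d+)\z takes all trailing digits, e.g. 10 from "l10".
def next_list_index(names)
  names.map { |n| n[/(\d+)\z/, 1].to_i }.max.to_i + 1
end

css = <<~CSS
  p.MsoNormal {margin:0cm;}
  @list l0
  {mso-list-id:1;}
  @list l0:level1
  {mso-level-number-format:bullet;}
  @list l10:level1
  {mso-level-tab-stop:36.0pt;
  mso-level-number-position:left;}
CSS
styles = group_list_styles(css)
p styles.keys
p next_list_index(styles.keys)
```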

data/lib/html2doc/mime.rb CHANGED

@@ -135,8 +135,7 @@ class Html2Doc
   # Scan both @stylesheet and docxml.to_xml (where @standardstylesheet has ended up)
   # Allow 0.9 * height to fit caption
   def page_dimensions(docxml)
-    stylesheet = read_stylesheet(@stylesheet)
-    page_size = find_page_size_in_doc(stylesheet, docxml.to_xml) or
+    page_size = find_page_size_in_doc(@stylesheet, docxml.to_xml) or
       return [680, 400]
     m_size = /size:\s*(\S+)\s+(\S+)\s*;/.match(page_size) or return [680, 400]
     m_marg = /margin:\s*(\S+)\s+(\S+)\s*(\S+)\s*(\S+)\s*;/.match(page_size) or
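`page_dimensions` now passes the cached `@stylesheet` straight to `find_page_size_in_doc` and then extracts the page `size` and `margin` declarations with the two regexes shown above. The rest of the method, which turns these values into the dimensions it returns (falling back to `[680, 400]`), lies outside this hunk. As a hypothetical illustration only, this sketch applies the same regexes to compute a printable area in points. It reads the margin shorthand in CSS order (top, right, bottom, left) and keeps 90% of the height for a caption, as the method's comment describes. The helper name, the assumption that every length is in `pt`, and the sample rule are not from the gem:

```ruby
SIZE_RE = /size:\s*(\S+)\s+(\S+)\s*;/
MARGIN_RE = /margin:\s*(\S+)\s+(\S+)\s*(\S+)\s*(\S+)\s*;/

# Hypothetical helper: printable [width, height] in points, or nil when the
# page rule lacks a size or margin declaration.
# Assumes every length is given in pt ("72.0pt".to_f == 72.0).
def printable_area_pt(page_rule)
  size = SIZE_RE.match(page_rule) or return nil
  marg = MARGIN_RE.match(page_rule) or return nil
  width, height = size.captures.map(&:to_f)
  top, right, bottom, left = marg.captures.map(&:to_f)
  # Keep 90% of the usable height so a caption still fits below an image.
  [width - left - right, (height - top - bottom) * 0.9]
end

rule = "@page WordSection1 {size:595.3pt 841.9pt; margin:72.0pt 90.0pt 72.0pt 90.0pt;}"
p printable_area_pt(rule)
```

Because `MARGIN_RE` allows zero whitespace between its later groups, a two-value shorthand such as `margin:72pt 90pt;` still matches, with `90pt` split across the last three captures. The sketch inherits that behaviour, so it expects four explicit margin values.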

data/lib/html2doc/version.rb CHANGED

@@ -1,3 +1,3 @@
 class Html2Doc
-  VERSION = "1.9.2".freeze
+  VERSION = "1.10.0".freeze
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: html2doc
 version: !ruby/object:Gem::Version
-  version: 1.9.2
+  version: 1.10.0
 platform: ruby
 authors:
 - Ribose Inc.
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2025-06-05 00:00:00.000000000 Z
+date: 2025-07-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: base64