html2doc 0.9.0 → 0.9.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 736a1b4a315563665d82191a2a21a63f1db634b9e847a145607cd8f0aa93d794
- data.tar.gz: 317869cf8d2ae0a7c4d7fee710bc29a472f2c09027d4845b3a1f70c2bee42df4
+ metadata.gz: f1c601730137c578b807990f177d8ded7b35c644d9239b57f22207d61d0a080d
+ data.tar.gz: 93f31b6831c3b9a32eb26165e0d702e7d0a92335fffc8660e3fd8215fbfc1d85
  SHA512:
- metadata.gz: e299333ccf126d00439413fd2e9fc45dc87eeb5756798ace28ec863b3463052dfcade119a3730140067cd48f6e37d468934d80f8432d882a974753e900adec2f
- data.tar.gz: 9a2337da976ae160b46284bc47f6f5131451bafff06cb1cc65d1cc9732d5e9122964b765bfabb25014c3179427f3170ba5f981b93f73d4ca4b50449b0e335383
+ metadata.gz: fcbc14ccff1c8ac0e11796def1bc0b15a64425ed09a5a1960ec5a26b1b486f7d7cf924ef19906c654729d6f4040a903006c1028189b6bbc3c76af78e0e612f11
+ data.tar.gz: 07f2968ad98ee9607c5b2c3b07c978c763cd276c83415399501f09a59a649b6a30a05b2bcbbeaa99e7dd50a148a6997c0f6d2f990406aefd40a5067a2bdb7fa1
@@ -3,14 +3,11 @@
  https://github.com/metanorma/html2doc/workflows/main/badge.svg
 
  image:https://img.shields.io/gem/v/html2doc.svg["Gem Version", link="https://rubygems.org/gems/html2doc"]
- image:https://github.com/metanorma/html2doc/workflows/ubuntu/badge.svg["Build Status", link="https://github.com/metanorma/html2doc/actions?workflow=ubuntu"]
- image:https://github.com/metanorma/html2doc/workflows/macos/badge.svg["Build Status", link="https://github.com/metanorma/html2doc/actions?workflow=macos"]
- image:https://github.com/metanorma/html2doc/workflows/windows/badge.svg["Build Status", link="https://github.com/metanorma/html2doc/actions?workflow=windows"]
+ image:https://travis-ci.com/metanorma/html2doc.svg["Build Status", link="https://travis-ci.com/metanorma/html2doc"]
+ image:https://ci.appveyor.com/api/projects/status/aspj42o70q3dnkf1?svg=true["Appveyor Build Status", link="https://ci.appveyor.com/project/metanorma/html2doc"]
  image:https://codeclimate.com/github/metanorma/html2doc/badges/gpa.svg["Code Climate", link="https://codeclimate.com/github/metanorma/html2doc"]
-
- ////
- image:https://ci.appveyor.com/api/projects/status/reqae7y99cfd0yod?svg=true["Appveyor Build Status", link="https://ci.appveyor.com/project/ribose/html2doc"]
- ////
+ image:https://img.shields.io/github/issues-pr-raw/metanorma/html2doc.svg["Pull Requests", link="https://github.com/metanorma/html2doc/pulls"]
+ image:https://img.shields.io/github/commits-since/metanorma/html2doc/latest.svg["Commits since latest",link="https://github.com/metanorma/html2doc/releases"]
 
  == Purpose
 
@@ -68,7 +65,7 @@ stylesheet:: is the full path filename of the CSS stylesheet for Microsoft Word-
  header_filename:: is the filename of the HTML document containing header and footer for the document, as well as footnote/endnote separators; if there is none, use nil. To generate your own such document, save a Word document with headers/footers and/or footnote/endnote separators as an HTML document; the `header.html` will be in the `{filename}.fld` folder generated along with the HTML. A sample file is available at https://github.com/metanorma/metanorma-iso/blob/master/lib/asciidoctor/iso/word/header.html
  dir:: is the folder that any ancillary files (images, headers, filelist) are to be saved to. If not provided, it will be created as `{filename}_files`. Anything in the directory will be attached to the Word document; so this folder should only contain the images that accompany the document. (If the images are elsewhere on the local drive, the gem will move them into the folder. External URL images are left alone, and are not downloaded.)
  asciimathdelims:: are the AsciiMath delimiters used in the text (an array of an opening and a closing delimiter). If none are provided, no AsciiMath conversion is attempted.
- liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul`, `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`.
+ liststyles:: a hash of list style labels in Word CSS, which are used to define the behaviour of list item labels (e.g. _i)_ vs _i._). The gem recognises the hash keys `ul` and `ol`. So if the appearance of an ordered list's item labels in the supplied stylesheet is governed by style `@list l1` (e.g. `@list l1:level1 {mso-level-text:"%1\)";}` appears in the stylesheet), call the method with `liststyles:{ol: "l1"}`. The lists that the `ul` and `ol` list styles are applied to are assumed not to have any CSS class. If there are any additional hash keys, they are assumed to be classes applied to the topmost ordered or unordered list; e.g. `liststyles:{steps: "l5"}` means that any list with class `steps` at the topmost level has the list style `l5` recursively applied to it. Any top-level lists without a class named in `liststyles` will be treated like lists with no CSS class.
 
  Note that the local CSS stylesheet file contains a variable `FILENAME` for the location of footnote/endnote separators and headers/footers, which are provided in the header HTML file. The gem replaces `FILENAME` with the file name that the document will be saved as. If you supply your own stylesheet and also wish to use separators or headers/footers, you will likewise need to replace the document name mentioned in your stylesheet with a `FILENAME` string.
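The class-keyed behaviour described in the `liststyles` entry can be modelled in plain Ruby. This is an illustrative sketch only: `governing_style` is a hypothetical helper, not part of html2doc, and it merely resolves which style name applies to a list (from the class of its topmost list and its own tag) without reading or modifying any HTML.

```ruby
# Hypothetical illustration of the README rule for `liststyles`; not html2doc API.
# Returns the Word list style name (e.g. "l3") that governs a list, given the
# CSS class of its topmost ancestor list (nil if none) and its own tag (:ul/:ol).
def governing_style(liststyles, top_class, tag)
  class_key = top_class&.to_sym
  if class_key && !%i[ul ol].include?(class_key) && liststyles.key?(class_key)
    # A class named in liststyles applies recursively to every nested list.
    liststyles[class_key]
  else
    # No class, or a class not named in liststyles: each list uses the style
    # registered for its own tag (nil if there is none).
    liststyles[tag]
  end
end

styles = { ul: "l1", ol: "l2", steps: "l3" }
puts governing_style(styles, "steps", :ol) # prints l3 (ol nested under ul.steps)
puts governing_style(styles, "other", :ol) # prints l2 (unknown class)
puts governing_style(styles, nil, :ul)     # prints l1 (plain top-level ul)
```

The sketch assumes, as the text states, that a class key only matters at the topmost level and that an unrecognised class falls back to the plain `ul`/`ol` styles.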
 
@@ -15,13 +15,23 @@ module Html2Doc
  li["style"] += "mso-list:#{liststyle} level#{level} lfo#{listnumber};"
  end
 
- def self.list_add(xpath, liststyles, listtype, level, listnumber)
+ def self.list_add(xpath, liststyles, listtype, level)
  xpath.each_with_index do |list, i|
- listnumber = i + 1 if level == 1
+ @listnumber += 1 if level == 1
+ list["seen"] = true if level == 1
  (list.xpath(".//li") - list.xpath(".//ol//li | .//ul//li")).each do |li|
- style_list(li, level, liststyles[listtype], listnumber)
- list_add(li.xpath(".//ul") - li.xpath(".//ul//ul | .//ol//ul"), liststyles, :ul, level + 1, listnumber)
- list_add(li.xpath(".//ol") - li.xpath(".//ul//ol | .//ol//ol"), liststyles, :ol, level + 1, listnumber)
+ style_list(li, level, liststyles[listtype], @listnumber)
+ if [:ul, :ol].include? listtype
+ list_add(li.xpath(".//ul") - li.xpath(".//ul//ul | .//ol//ul"),
+ liststyles, :ul, level + 1)
+ list_add(li.xpath(".//ol") - li.xpath(".//ul//ol | .//ol//ol"),
+ liststyles, :ol, level + 1)
+ else
+ list_add(li.xpath(".//ul") - li.xpath(".//ul//ul | .//ol//ul"),
+ liststyles, listtype, level + 1)
+ list_add(li.xpath(".//ol") - li.xpath(".//ul//ol | .//ol//ol"),
+ liststyles, listtype, level + 1)
+ end
  end
  end
  end
@@ -34,17 +44,42 @@ module Html2Doc
  u.xpath("./li").each do |l|
  l.name = "p"
  l["class"] ||= "MsoListParagraphCxSpMiddle"
- l&.first_element_child&.name == "p" and l.first_element_child.replace(l.first_element_child.children)
+ l&.first_element_child&.name == "p" and
+ l.first_element_child.replace(l.first_element_child.children)
  end
  u.replace(u.children)
  end
 
+ TOPLIST = "[not(ancestor::ul) and not(ancestor::ol)]".freeze
+
+ def self.lists1(docxml, liststyles, k)
+ case k
+ when :ul then list_add(docxml.xpath("//ul[not(@class)]#{TOPLIST}"),
+ liststyles, :ul, 1)
+ when :ol then list_add(docxml.xpath("//ol[not(@class)]#{TOPLIST}"),
+ liststyles, :ol, 1)
+ else
+ list_add(docxml.xpath("//ol[@class = '#{k.to_s}']#{TOPLIST} | "\
+ "//ul[@class = '#{k.to_s}']#{TOPLIST}"),
+ liststyles, k, 1)
+ end
+ end
+
+ def self.lists_unstyled(docxml, liststyles)
+ list_add(docxml.xpath("//ul#{TOPLIST}[not(@seen)]"),
+ liststyles, :ul, 1) if liststyles.has_key?(:ul)
+ list_add(docxml.xpath("//ol#{TOPLIST}[not(@seen)]"),
+ liststyles, :ul, 1) if liststyles.has_key?(:ol)
+ docxml.xpath("//ul[@seen] | //ol[@seen]").each do |l|
+ l.delete("seen")
+ end
+ end
+
  def self.lists(docxml, liststyles)
  return if liststyles.nil?
- liststyles.has_key?(:ul) and
- list_add(docxml.xpath("//ul[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ul, 1, nil)
- liststyles.has_key?(:ol) and
- list_add(docxml.xpath("//ol[not(ancestor::ul) and not(ancestor::ol)]"), liststyles, :ol, 1, nil)
+ @listnumber = 0
+ liststyles.each_key { |k| lists1(docxml, liststyles, k) }
+ lists_unstyled(docxml, liststyles)
  liststyles.has_key?(:ul) and docxml.xpath("//ul").each { |u| list2para(u) }
  liststyles.has_key?(:ol) and docxml.xpath("//ol").each { |u| list2para(u) }
  end
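In the reworked `lists`, the order in which `liststyles` keys are visited determines the `lfoN` numbers Word sees. Each key's top-level lists are numbered first, in hash order, and only then are leftover top-level lists picked up by `lists_unstyled`. The plain-Ruby sketch below models that ordering on a flat array of top-level lists. `number_top_level` and the Hash representation of a list are hypothetical, no HTML is parsed, and leftover `ol` lists take the `ol` style, as the README describes. Run on the three lists from the new spec, it gives the same `l3 lfo2` / `l1 lfo1` / `l1 lfo3` assignment as the expected output.

```ruby
# Hypothetical model of the 0.9.1 numbering order; not html2doc API.
# Each top-level list is a Hash with :tag (:ul or :ol) and an optional
# :class. Returns { index => [style, lfo_number] }.
def number_top_level(toplists, liststyles)
  counter = 0
  assigned = {}

  # Pass 1 (cf. lists1): one sweep per liststyles key, in hash order.
  # :ul/:ol match unclassed lists of that tag; other keys match by class.
  liststyles.each_key do |key|
    toplists.each_with_index do |list, i|
      match =
        if %i[ul ol].include?(key)
          list[:tag] == key && list[:class].nil?
        else
          list[:class] == key.to_s
        end
      next unless match

      counter += 1
      assigned[i] = [liststyles[key], counter]
    end
  end

  # Pass 2 (cf. lists_unstyled): lists not matched above, all uls first,
  # then all ols, styled like unclassed lists of the same tag.
  %i[ul ol].each do |tag|
    next unless liststyles.key?(tag)

    toplists.each_with_index do |list, i|
      next if assigned.key?(i) || list[:tag] != tag

      counter += 1
      assigned[i] = [liststyles[tag], counter]
    end
  end

  assigned
end

lists = [{ tag: :ul, class: "steps" }, { tag: :ul }, { tag: :ul, class: "other" }]
number_top_level(lists, { ul: "l1", ol: "l2", steps: "l3" }).sort.each do |i, (style, lfo)|
  puts "list #{i}: #{style} lfo#{lfo}"
end
```

Because `liststyles.each_key` follows insertion order, reordering the keys of the hash changes which lists receive the lower `lfo` numbers in this model.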
@@ -4,7 +4,9 @@ require "htmlentities"
  require "nokogiri"
 
  module Html2Doc
- @xsltemplate = Nokogiri::XSLT(File.read(File.join(File.dirname(__FILE__), "mml2omml.xsl"), encoding: "utf-8"))
+ @xsltemplate =
+ Nokogiri::XSLT(File.read(File.join(File.dirname(__FILE__), "mml2omml.xsl"),
+ encoding: "utf-8"))
 
  def self.asciimath_to_mathml1(x)
  AsciiMath.parse(HTMLEntities.new.decode(x)).to_mathml.
@@ -15,7 +17,8 @@ module Html2Doc
  return doc if delims.nil? || delims.size < 2
  m = doc.split(/(#{Regexp.escape(delims[0])}|#{Regexp.escape(delims[1])})/)
  m.each_slice(4).map.with_index do |(*a), i|
- warn "MathML #{i} of #{(m.size / 4).floor}" if i % 500 == 0 && m.size > 1000 && i > 0
+ i % 500 == 0 && m.size > 1000 && i > 0 and
+ warn "MathML #{i} of #{(m.size / 4).floor}"
  a[2].nil? || a[2] = asciimath_to_mathml1(a[2])
  a.size > 1 ? a[0] + a[2] : a[0]
  end.join
@@ -23,10 +26,10 @@ module Html2Doc
 
  # random fixes to MathML input that OOXML needs to render properly
  def self.ooxml_cleanup(m, docnamespaces)
- m.xpath(".//xmlns:msup[name(preceding-sibling::*[1])='munderover']",
- docnamespaces).each do |x|
- x1 = x.replace("<mrow></mrow>").first
- x1.children = x
+ m.xpath(%w(msup msub msubsup munder mover munderover).
+ map { |m| ".//xmlns:#{m}" }.join(" | "), docnamespaces).each do |x|
+ next unless x.next_element && x.next_element != "mrow"
+ x.next_element.wrap("<mrow/>")
  end
  m.add_namespace(nil, "http://www.w3.org/1998/Math/MathML")
  m
@@ -36,22 +39,22 @@ module Html2Doc
  docnamespaces = docxml.collect_namespaces
  m = docxml.xpath("//*[local-name() = 'math']")
  m.each_with_index do |x, i|
- warn "Math OOXML #{i} of #{m.size}" if i % 100 == 0 && m.size > 500 && i > 0
+ i % 100 == 0 && m.size > 500 && i > 0 and
+ warn "Math OOXML #{i} of #{m.size}"
  element = ooxml_cleanup(x, docnamespaces)
-
  doc = Nokogiri::XML::Document::new()
  doc.root = element
-
- ooxml = @xsltemplate.transform(doc).to_s.
+ ooxml = (esc_space(@xsltemplate.transform(doc))).to_s.
  gsub(/<\?[^>]+>\s*/, "").
  gsub(/ xmlns(:[^=]+)?="[^"]+"/, "").
  gsub(%r{<(/)?([a-z])}, "<\\1m:\\2")
- ooxml = uncenter(esc_space(x), ooxml)
+ ooxml = uncenter(x, ooxml)
  x.swap(ooxml)
  end
  end
 
- # escape space as &#x32;; we are removing any spaces generated by XML indentation
+ # escape space as &#x32;; we are removing any spaces generated by
+ # XML indentation
  def self.esc_space(xml)
  xml.traverse do |n|
  next unless n.text?
@@ -64,8 +67,9 @@ module Html2Doc
  # left/right if parent is so tagged
  def self.uncenter(m, ooxml)
  if m.next == nil && m.previous == nil
- alignnode = m.at(".//ancestor::*[@style][local-name() = 'p' or local-name() = "\
- "'div' or local-name() = 'td']/@style") or return ooxml
+ alignnode = m.at(".//ancestor::*[@style][local-name() = 'p' or "\
+ "local-name() = 'div' or local-name() = 'td']/@style")
+ return ooxml unless alignnode
  if alignnode.text.include? ("text-align:left")
  ooxml = "<m:oMathPara><m:oMathParaPr><m:jc "\
  "m:val='left'/></m:oMathParaPr>#{ooxml}</m:oMathPara>"
@@ -1,3 +1,3 @@
  module Html2Doc
- VERSION = "0.9.0".freeze
+ VERSION = "0.9.1".freeze
  end
@@ -641,6 +641,30 @@ RSpec.describe Html2Doc do
  OUTPUT
  end
 
+ it "labels lists with multiple list styles" do
+ simple_body = <<~BODY
+ <div><ul class="steps">
+ <li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li><li><p>B</p><p>B2</p></li><li>C</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
+ <div><ul>
+ <li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li><li><p>B</p><p>B2</p></li><li>C</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
+ <div><ul class="other">
+ <li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li><li><p>B</p><p>B2</p></li><li>C</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
+ BODY
+ Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2", steps: "l3"})
+ expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
+ to match_fuzzy(<<~OUTPUT)
+ #{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
+ #{word_body('<div>
+ <p style="mso-list:l3 level1 lfo2;" class="MsoListParagraphCxSpFirst"><div><p class="MsoNormal"><p style="mso-list:l3 level2 lfo2;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l3 level4 lfo2;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l3 level5 lfo2;" class="MsoListParagraphCxSpFirst">A</p><p style="mso-list:l3 level5 lfo2;" class="MsoListParagraphCxSpMiddle">B<p class="MsoListParagraphCxSpMiddle">B2</p></p><p style="mso-list:l3 level5 lfo2;" class="MsoListParagraphCxSpLast">C</p></p></p></p></div></p></div>
+ <div>
+ <p style="mso-list:l1 level1 lfo1;" class="MsoListParagraphCxSpFirst"><div><p class="MsoNormal"><p style="mso-list:l2 level2 lfo1;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l2 level4 lfo1;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l2 level5 lfo1;" class="MsoListParagraphCxSpFirst">A</p><p style="mso-list:l2 level5 lfo1;" class="MsoListParagraphCxSpMiddle">B<p class="MsoListParagraphCxSpMiddle">B2</p></p><p style="mso-list:l2 level5 lfo1;" class="MsoListParagraphCxSpLast">C</p></p></p></p></div></p></div>
+ <div>
+ <p style="mso-list:l1 level1 lfo3;" class="MsoListParagraphCxSpFirst"><div><p class="MsoNormal"><p style="mso-list:l2 level2 lfo3;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l2 level4 lfo3;" class="MsoListParagraphCxSpFirst"><p style="mso-list:l2 level5 lfo3;" class="MsoListParagraphCxSpFirst">A</p><p style="mso-list:l2 level5 lfo3;" class="MsoListParagraphCxSpMiddle">B<p class="MsoListParagraphCxSpMiddle">B2</p></p><p style="mso-list:l2 level5 lfo3;" class="MsoListParagraphCxSpLast">C</p></p></p></p></div></p></div>',
+ '<div style="mso-element:footnote-list"/>')}
+ #{WORD_FTR1}
+ OUTPUT
+ end
+
  it "replaces id attributes with explicit a@name bookmarks" do
  simple_body = <<~BODY
  <div>
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: html2doc
  version: !ruby/object:Gem::Version
- version: 0.9.0
+ version: 0.9.1
  platform: ruby
  authors:
  - Ribose Inc.
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-10-27 00:00:00.000000000 Z
+ date: 2019-11-08 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: htmlentities