html2doc 0.6.9 → 0.7.0
- checksums.yaml +4 -4
- data/README.adoc +15 -6
- data/bin/html2doc +29 -0
- data/lib/html2doc/base.rb +3 -0
- data/lib/html2doc/math.rb +18 -0
- data/lib/html2doc/mime.rb +1 -1
- data/lib/html2doc/version.rb +1 -1
- data/spec/html2doc_spec.rb +101 -52
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 0f88d61ea0bbf6baa2a355ec6d5abfd35399c91e
|
4
|
+
data.tar.gz: 1f135f2907b6a7d51196dd6774fc0b81b2e51e34
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 69b5db648e40a37723b5591caa28c3f3131b40ce777245b80ff578cb80de7e183dbdfad677e13dccbdcb5aa2327520f386d52b975ea930d676f250113c2163a3
|
7
|
+
data.tar.gz: eca2bc4cb73d72345b362ec2f3eaa6389073aba39c77d20fbe2bd413c73347b10f2c6f4a9b2e216016a3c1f74cf4ed461b6f812ba525d3b44c9083ed3c49684a
|
data/README.adoc
CHANGED
@@ -26,18 +26,14 @@ The gem currently does the following:
|
|
26
26
|
|
27
27
|
For a representative generator of HTML that uses this gem in postprocessing, see https://github.com/riboseinc/asciidoctor-iso
|
28
28
|
|
29
|
-
Work to be done:
|
30
|
-
|
31
|
-
* Render (editorial) comments
|
32
|
-
|
33
29
|
== Constraints
|
34
30
|
|
35
|
-
This generates .doc documents. Future versions
|
31
|
+
This generates .doc documents. Future versions may upgrade the output to docx.
|
36
32
|
|
37
33
|
There are two other Microsoft Word vendors in the Ruby ecosystem.
|
38
34
|
|
39
35
|
* https://github.com/jetruby/puredocx generates Word documents from a ruby struct as a DSL, rather than converting a preexisting html document. That constrains its coverage to what is explicitly catered for in the DSL.
|
40
|
-
* https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML)
|
36
|
+
* https://github.com/MuhammetDilmac/Html2Docx is a much simpler wrapper around html: it does not do any of the added functionality described above (image resizing, converting footnotes, AsciiMath and MathML). However it does already generate docx, which involves many more auxiliary files than the .doc format. (Any attempt to generate docx through this gem will likely involve Html2Docx.)
|
41
37
|
|
42
38
|
== Usage
|
43
39
|
|
@@ -58,6 +54,13 @@ liststyles:: a hash of list style labels in Word CSS, which are used to define t
|
|
58
54
|
|
59
55
|
Note that the local CSS stylesheet file contains a variable `FILENAME` for the location of footnote/endnote separators and headers/footers, which are provided in the header HTML file. The gem replaces `FILENAME` with the file name that the document will be saved as. If you supply your own stylesheet and also wish to use separators or headers/footers, you will likewise need to replace the document name mentioned in your stylesheet with a `FILENAME` string.
|
60
56
|
|
57
|
+
We include a script in this distribution that processes files from the command line, optionally including header and stylesheet:
|
58
|
+
|
59
|
+
[source,console]
|
60
|
+
--
|
61
|
+
$ bin/html2doc --header header.html --stylesheet stylesheet.css filename.html
|
62
|
+
--
|
63
|
+
|
61
64
|
== Caveats
|
62
65
|
|
63
66
|
=== HTML
|
@@ -80,6 +83,12 @@ The good news is that the stylesheet is not identical to the stylesheet `mathml2
|
|
80
83
|
|
81
84
|
The bad news is that the stylesheet is not identical to the stylesheet `mathml2omml.xsl` that is published with Microsoft Word, so it isn't guaranteed to have identical output. If you want to make sure that your MathML import is identical to what Word currently uses, replace `mml2omml.xsl` with `mathml2omml.xsl`, and edit the gem accordingly for your local installation. On Windows, you will find the stylesheet in the same directory as the `winword.exe` executable. On Mac, right-click on the Word application, and select "Show Package Contents"; you will find the stylesheet under `Contents/Resources`.
|
82
85
|
|
86
|
+
=== Lists
|
87
|
+
Natively, Word does not use `<ol>`, `<ul>`, or `<dl>` lists in its HTML exports at all: it uses paragraphs styled with list styles. If you save a Word document as HTML in order to use its CSS for Word documents generated by HTML, those styles will still work (with the caveat that you will need to extract the `@list` style specific to ordered and unordered lists, and pass it as a `liststyles` parameter to the conversion). *However*, Word applies a default indentation to all instances of `<ol>`, `<ul>` and `<dl>`, which the CSS stylesheet of a Word HTML will not have accounted for (because the Word HTML does not use lists at all.) If you are going to reuse that CSS for generating new documents using lists, you will need to add the rule `margin-left:0pt` to `ul`, `ol`, `dl` in the CSS stylesheet you supply, so that the margins in the Word-exported CSS remain correct.
|
88
|
+
|
89
|
+
=== Math Positioning
|
90
|
+
By default, mathematical formulas that are the only content of their paragraph are rendered as centered in Word. If you want your AsciiMath or MathML to be left-aligned or right-aligned, add `style="text-align:left"` or `style="text-align:right"` to its ancestor `div`, `p` or `td` node in HTML.
|
91
|
+
|
83
92
|
== Example
|
84
93
|
|
85
94
|
The `spec/examples` directory includes `rice.doc` and its source files: this Word document has been generated from `rice.html` through a call to html2doc from https://github.com/riboseinc/asciidoctor-iso. (The source document `rice.html` was itself generated from Asciidoc, rather than being hand-crafted.)
|
data/bin/html2doc
ADDED
@@ -0,0 +1,29 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require "html2doc"
|
4
|
+
require "optparse"
|
5
|
+
|
6
|
+
options = {}
|
7
|
+
OptionParser.new do |opts|
|
8
|
+
opts.banner = "Usage: bin/html2doc filename [options]"
|
9
|
+
|
10
|
+
opts.on("--stylesheet FILE.CSS", "Use the provided stylesheet") do |v|
|
11
|
+
options[:stylesheet] = v
|
12
|
+
end
|
13
|
+
opts.on("--header HEADER.HTML", "Use the provided stylesheet") do |v|
|
14
|
+
options[:header] = v
|
15
|
+
end
|
16
|
+
end.parse!
|
17
|
+
|
18
|
+
if ARGV.length < 1
|
19
|
+
puts "Usage: bin/html2doc filename [options]"
|
20
|
+
exit
|
21
|
+
end
|
22
|
+
|
23
|
+
Html2Doc.process(
|
24
|
+
File.read(ARGV[0], encoding: "utf-8"),
|
25
|
+
filename: ARGV[0].gsub(/\.html?$/, ""),
|
26
|
+
stylesheet: options[:stylesheet],
|
27
|
+
header: options[:header],
|
28
|
+
)
|
29
|
+
|
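The option handling in the `bin/html2doc` script above can be sketched as a small function that takes an argument array instead of reading the global `ARGV`, which makes it easier to exercise in isolation. The helper name `parse_html2doc_args` and the `:source` key are hypothetical and not part of the gem. The script's help text for `--header` repeats the stylesheet description; the sketch words it as a header instead.

```ruby
require "optparse"

# Hypothetical helper (not part of the gem): parse html2doc-style arguments
# from an array rather than the global ARGV.
def parse_html2doc_args(argv)
  options = {}
  # OptionParser#parse returns the non-option arguments and leaves argv intact.
  rest = OptionParser.new do |opts|
    opts.banner = "Usage: bin/html2doc filename [options]"
    opts.on("--stylesheet FILE.CSS", "Use the provided stylesheet") do |v|
      options[:stylesheet] = v
    end
    opts.on("--header HEADER.HTML", "Use the provided header") do |v|
      options[:header] = v
    end
  end.parse(argv)
  return nil if rest.empty? # no input file given

  # Same extension stripping as the script: drop a trailing .htm or .html.
  options.merge(source: rest[0], filename: rest[0].gsub(/\.html?$/, ""))
end
```

Called with the arguments from the README example (`--header header.html --stylesheet stylesheet.css filename.html`), this returns a hash whose `:filename` is `"filename"`, which is the value the script passes to `Html2Doc.process`.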
data/lib/html2doc/base.rb
CHANGED
@@ -32,6 +32,7 @@ module Html2Doc
|
|
32
32
|
|
33
33
|
def self.rm_temp_files(filename, dir, dir1)
|
34
34
|
system "rm #{filename}.htm"
|
35
|
+
system "rm -r #{dir1}/header.html"
|
35
36
|
system "rm -r #{dir1}" unless dir
|
36
37
|
end
|
37
38
|
|
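The cleanup in `rm_temp_files` shells out to `rm`. As a point of comparison only, the same three removals could be sketched with Ruby's standard `FileUtils`, which avoids the shell and so does not depend on an `rm` binary or on paths being free of spaces. The method name `rm_temp_files_sketch` is hypothetical. This is an alternative under those assumptions, not what the gem does.

```ruby
require "fileutils"

# Hypothetical FileUtils-based variant of the cleanup in the hunk above.
# filename: output path without extension; dir: caller-supplied directory
# (nil when none was given); dir1: the directory holding auxiliary files.
def rm_temp_files_sketch(filename, dir, dir1)
  FileUtils.rm_f("#{filename}.htm")                # ignore a missing file
  FileUtils.rm_rf(File.join(dir1, "header.html"))  # mirrors `rm -r`
  FileUtils.rm_rf(dir1) unless dir                 # keep caller-supplied dirs
end
```

As in the original, the auxiliary directory is only deleted when no `dir` was passed in.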
@@ -82,6 +83,7 @@ module Html2Doc
|
|
82
83
|
r.gsub!(%r{<link rel="File-List"}, "<link rel=File-List")
|
83
84
|
r.gsub!(%r{<meta http-equiv="Content-Type"},
|
84
85
|
"<meta http-equiv=Content-Type")
|
86
|
+
r.gsub!(%r{></m:jc>}, "/>")
|
85
87
|
r.gsub!(%r{&tab;|&tab;}, '<span style="mso-tab-count:1">  </span>')
|
86
88
|
r
|
87
89
|
end
|
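The substitution added in this hunk rewrites an empty `<m:jc ...></m:jc>` element into its self-closing form. The diff does not state the reason. A minimal sketch of the effect on a sample fragment follows; the helper name `collapse_empty_jc` is hypothetical.

```ruby
# Hypothetical helper applying the same substitution as the added line.
def collapse_empty_jc(xml)
  xml.gsub(%r{></m:jc>}, "/>")
end

fragment = '<m:oMathParaPr><m:jc m:val="left"></m:jc></m:oMathParaPr>'
collapse_empty_jc(fragment)
# => '<m:oMathParaPr><m:jc m:val="left"/></m:oMathParaPr>'
```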
@@ -150,6 +152,7 @@ module Html2Doc
|
|
150
152
|
{
|
151
153
|
o: "urn:schemas-microsoft-com:office:office",
|
152
154
|
w: "urn:schemas-microsoft-com:office:word",
|
155
|
+
v: "urn:schemas-microsoft-com:vml",
|
153
156
|
m: "http://schemas.microsoft.com/office/2004/12/omml",
|
154
157
|
}.each { |k, v| root.add_namespace_definition(k.to_s, v) }
|
155
158
|
root.add_namespace(nil, "http://www.w3.org/TR/REC-html40")
|
data/lib/html2doc/math.rb
CHANGED
@@ -41,7 +41,25 @@ module Html2Doc
|
|
41
41
|
ooxml = @xslt.serve.gsub(/<\?[^>]+>\s*/, "").
|
42
42
|
gsub(/ xmlns(:[^=]+)?="[^"]+"/, "").
|
43
43
|
gsub(%r{<(/)?([a-z])}, "<\\1m:\\2")
|
44
|
+
ooxml = uncenter(m, ooxml)
|
44
45
|
m.swap(ooxml)
|
45
46
|
end
|
46
47
|
end
|
48
|
+
|
49
|
+
# if oomml has no siblings, by default it is centered; override this with
|
50
|
+
# left/right if parent is so tagged
|
51
|
+
def self.uncenter(m, ooxml)
|
52
|
+
if m.next == nil && m.previous == nil
|
53
|
+
alignnode = m.at(".//ancestor::*[@style][local-name() = 'p' or local-name() = "\
|
54
|
+
"'div' or local-name() = 'td']/@style") or return ooxml
|
55
|
+
if alignnode.text.include? ("text-align:left")
|
56
|
+
ooxml = "<m:oMathPara><m:oMathParaPr><m:jc "\
|
57
|
+
"m:val='left'/></m:oMathParaPr>#{ooxml}</m:oMathPara>"
|
58
|
+
elsif alignnode.text.include? ("text-align:right")
|
59
|
+
ooxml = "<m:oMathPara><m:oMathParaPr><m:jc "\
|
60
|
+
"m:val='right'/></m:oMathParaPr>#{ooxml}</m:oMathPara>"
|
61
|
+
end
|
62
|
+
end
|
63
|
+
ooxml
|
64
|
+
end
|
47
65
|
end
|
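The alignment override in `uncenter` can be illustrated without Nokogiri by working on plain strings. In this sketch, `style` stands in for the ancestor's `style` attribute and `only_child` for the "no siblings" check. The name `wrap_math_alignment` and both parameters are hypothetical.

```ruby
# Hypothetical string-only version of the alignment override.
# style:      the ancestor's style attribute text, or nil if there is none
# ooxml:      the already converted OMML fragment
# only_child: true when the formula has no sibling nodes
def wrap_math_alignment(style, ooxml, only_child: true)
  return ooxml unless only_child && style

  # As in the diff, this is a plain substring test, so "text-align: left"
  # (with a space after the colon) would not match.
  align = %w[left right].find { |a| style.include?("text-align:#{a}") }
  return ooxml unless align

  "<m:oMathPara><m:oMathParaPr><m:jc m:val='#{align}'/></m:oMathParaPr>" \
    "#{ooxml}</m:oMathPara>"
end
```

With `style` set to `"text-align:left;"`, the fragment is wrapped in an `m:oMathPara` carrying `m:jc m:val='left'`. Any other style, a missing style, or a formula with siblings returns the fragment unchanged, which leaves Word's default centering in place.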
data/lib/html2doc/mime.rb
CHANGED
@@ -72,7 +72,7 @@ module Html2Doc
|
|
72
72
|
end
|
73
73
|
|
74
74
|
def self.image_cleanup(docxml, dir)
|
75
|
-
docxml.xpath("//*[local-name() = 'img']").each do |i|
|
75
|
+
docxml.xpath("//*[local-name() = 'img' or local-name() = 'imagedata']").each do |i|
|
76
76
|
matched = /\.(?<suffix>\S+)$/.match i["src"]
|
77
77
|
uuid = UUIDTools::UUID.random_create.to_s
|
78
78
|
new_full_filename = File.join(dir, "#{uuid}.#{matched[:suffix]}")
|
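The renaming step shown in the context lines of `image_cleanup` takes the suffix of each `src` and builds a UUID-based file name. The sketch below reproduces it using the standard library's `SecureRandom` in place of the `UUIDTools` call in the gem; that swap, the helper name `uuid_image_name`, and the `nil` return for an unmatched `src` are assumptions of this sketch.

```ruby
require "securerandom"

# Hypothetical helper: build a UUID-based target path for an image source.
def uuid_image_name(src, dir)
  # Same pattern as the diff. It matches from the first dot that is followed
  # only by non-whitespace characters, so "a.b.png" yields the suffix "b.png".
  matched = /\.(?<suffix>\S+)$/.match(src.to_s)
  return nil unless matched # the original would raise on matched[:suffix]

  File.join(dir, "#{SecureRandom.uuid}.#{matched[:suffix]}")
end
```

For a `src` with several dots, `File.extname` may be a safer way to get the extension. With this hunk the same loop also visits `imagedata` elements, presumably the VML form of embedded images, given the `v:` namespace added in `base.rb`.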
data/lib/html2doc/version.rb
CHANGED
data/spec/html2doc_spec.rb
CHANGED
@@ -38,7 +38,7 @@ Content-Location: file:///C:/Doc/test.htm
|
|
38
38
|
Content-Type: text/html; charset="utf-8"
|
39
39
|
|
40
40
|
<?xml version="1.0"?>
|
41
|
-
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><!--[if gte mso 9]>
|
41
|
+
<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><!--[if gte mso 9]>
|
42
42
|
<xml>
|
43
43
|
<w:WordDocument>
|
44
44
|
<w:View>Print</w:View>
|
@@ -376,7 +376,6 @@ RSpec.describe Html2Doc do
|
|
376
376
|
OUTPUT
|
377
377
|
end
|
378
378
|
|
379
|
-
#{word_body('<div><m:oMath><m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="1"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub><m:r><m:t>i=1</m:t></m:r></m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup></m:e></m:nary><m:r><m:t>=</m:t></m:r><m:sSup><m:e><m:r><m:t>(</m:t></m:r><m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num><m:r><m:t>n</m:t></m:r><m:r><m:t>(n+1)</m:t></m:r></m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f><m:r><m:t>)</m:t></m:r></m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup></m:oMath>
|
380
379
|
it "processes AsciiMath" do
|
381
380
|
Html2Doc.process(html_input("<div>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
382
381
|
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
@@ -402,6 +401,56 @@ RSpec.describe Html2Doc do
|
|
402
401
|
OUTPUT
|
403
402
|
end
|
404
403
|
|
404
|
+
it "left-aligns AsciiMath" do
|
405
|
+
Html2Doc.process(html_input("<div style='text-align:left;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
406
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
407
|
+
to match_fuzzy(<<~OUTPUT)
|
408
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
409
|
+
#{word_body('
|
410
|
+
<div style="text-align:left;"><m:oMathPara><m:oMathParaPr><m:jc m:val="left"></m:jc></m:oMathParaPr><m:oMath>
|
411
|
+
<m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="on"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub>
|
412
|
+
<m:r><m:t>i=1</m:t></m:r>
|
413
|
+
</m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup></m:e></m:nary>
|
414
|
+
<m:r><m:t>=</m:t></m:r>
|
415
|
+
<m:sSup><m:e>
|
416
|
+
<m:r><m:t>(</m:t></m:r>
|
417
|
+
<m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num>
|
418
|
+
<m:r><m:t>n</m:t></m:r>
|
419
|
+
<m:r><m:t>(n+1)</m:t></m:r>
|
420
|
+
</m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f>
|
421
|
+
<m:r><m:t>)</m:t></m:r>
|
422
|
+
</m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup>
|
423
|
+
</m:oMath>
|
424
|
+
</m:oMathPara></div>', '<div style="mso-element:footnote-list"/>')}
|
425
|
+
#{WORD_FTR1}
|
426
|
+
OUTPUT
|
427
|
+
end
|
428
|
+
|
429
|
+
it "right-aligns AsciiMath" do
|
430
|
+
Html2Doc.process(html_input("<div style='text-align:right;'>{{sum_(i=1)^n i^3=((n(n+1))/2)^2}}</div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
431
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
432
|
+
to match_fuzzy(<<~OUTPUT)
|
433
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
434
|
+
#{word_body('
|
435
|
+
<div style="text-align:right;"><m:oMathPara><m:oMathParaPr><m:jc m:val="right"></m:jc></m:oMathParaPr><m:oMath>
|
436
|
+
<m:nary><m:naryPr><m:chr m:val="∑"></m:chr><m:limLoc m:val="undOvr"></m:limLoc><m:grow m:val="on"></m:grow><m:subHide m:val="off"></m:subHide><m:supHide m:val="off"></m:supHide></m:naryPr><m:sub>
|
437
|
+
<m:r><m:t>i=1</m:t></m:r>
|
438
|
+
</m:sub><m:sup><m:r><m:t>n</m:t></m:r></m:sup><m:e><m:sSup><m:e><m:r><m:t>i</m:t></m:r></m:e><m:sup><m:r><m:t>3</m:t></m:r></m:sup></m:sSup></m:e></m:nary>
|
439
|
+
<m:r><m:t>=</m:t></m:r>
|
440
|
+
<m:sSup><m:e>
|
441
|
+
<m:r><m:t>(</m:t></m:r>
|
442
|
+
<m:f><m:fPr><m:type m:val="bar"></m:type></m:fPr><m:num>
|
443
|
+
<m:r><m:t>n</m:t></m:r>
|
444
|
+
<m:r><m:t>(n+1)</m:t></m:r>
|
445
|
+
</m:num><m:den><m:r><m:t>2</m:t></m:r></m:den></m:f>
|
446
|
+
<m:r><m:t>)</m:t></m:r>
|
447
|
+
</m:e><m:sup><m:r><m:t>2</m:t></m:r></m:sup></m:sSup>
|
448
|
+
</m:oMath>
|
449
|
+
</m:oMathPara></div>', '<div style="mso-element:footnote-list"/>')}
|
450
|
+
#{WORD_FTR1}
|
451
|
+
OUTPUT
|
452
|
+
end
|
453
|
+
|
405
454
|
it "wraps msup after munderover in MathML" do
|
406
455
|
Html2Doc.process(html_input("<div><math xmlns='http://www.w3.org/1998/Math/MathML'>
|
407
456
|
<munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mi>n</mi></mrow></munderover><msup><mn>2</mn><mrow><mi>i</mi></mrow></msup></math></div>"), filename: "test", asciimathdelims: ["{{", "}}"])
|
@@ -502,7 +551,7 @@ RSpec.describe Html2Doc do
|
|
502
551
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
503
552
|
#{word_body('<div>This is a very simple
|
504
553
|
document<a epub:type="footnote" href="#_ftn1" style="mso-footnote-id:ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a> allegedly<a epub:type="footnote" href="#_ftn2" style="mso-footnote-id:ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a></div>',
|
505
|
-
'<div style="mso-element:footnote-list"><div style="mso-element:footnote" id="ftn1">
|
554
|
+
'<div style="mso-element:footnote-list"><div style="mso-element:footnote" id="ftn1">
|
506
555
|
<p id="" class="MsoFootnoteText"><a style="mso-footnote-id:ftn1" href="#_ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Footnote</p></div>
|
507
556
|
<div style="mso-element:footnote" id="ftn2">
|
508
557
|
<p id="" class="MsoFootnoteText"><a style="mso-footnote-id:ftn2" href="#_ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Other Footnote</p></div>
|
@@ -511,7 +560,7 @@ RSpec.describe Html2Doc do
|
|
511
560
|
OUTPUT
|
512
561
|
end
|
513
562
|
|
514
|
-
|
563
|
+
it "processes class footnotes" do
|
515
564
|
simple_body = '<div>This is a very simple
|
516
565
|
document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
|
517
566
|
<aside id="a1">Footnote</aside>
|
@@ -522,7 +571,7 @@ RSpec.describe Html2Doc do
|
|
522
571
|
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
523
572
|
#{word_body('<div>This is a very simple
|
524
573
|
document<a class="footnote" href="#_ftn1" style="mso-footnote-id:ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a> allegedly<a class="footnote" href="#_ftn2" style="mso-footnote-id:ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a></div>',
|
525
|
-
'<div style="mso-element:footnote-list"><div style="mso-element:footnote" id="ftn1">
|
574
|
+
'<div style="mso-element:footnote-list"><div style="mso-element:footnote" id="ftn1">
|
526
575
|
<p id="" class="MsoFootnoteText"><a style="mso-footnote-id:ftn1" href="#_ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Footnote</p></div>
|
527
576
|
<div style="mso-element:footnote" id="ftn2">
|
528
577
|
<p id="" class="MsoFootnoteText"><a style="mso-footnote-id:ftn2" href="#_ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Other Footnote</p></div>
|
@@ -531,79 +580,79 @@ RSpec.describe Html2Doc do
|
|
531
580
|
OUTPUT
|
532
581
|
end
|
533
582
|
|
534
|
-
|
535
|
-
|
583
|
+
it "extracts paragraphs from footnotes" do
|
584
|
+
simple_body = '<div>This is a very simple
|
536
585
|
document<a class="footnote" href="#a1">1</a> allegedly<a class="footnote" href="#a2">2</a></div>
|
537
586
|
<aside id="a1"><p>Footnote</p></aside>
|
538
587
|
<div id="a2"><p>Other Footnote</p></div>'
|
539
|
-
|
540
|
-
|
541
|
-
|
542
|
-
|
543
|
-
|
588
|
+
Html2Doc.process(html_input(simple_body), filename: "test")
|
589
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
590
|
+
to match_fuzzy(<<~OUTPUT)
|
591
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
592
|
+
#{word_body('<div>This is a very simple
|
544
593
|
document<a class="footnote" href="#_ftn1" style="mso-footnote-id:ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a> allegedly<a class="footnote" href="#_ftn2" style="mso-footnote-id:ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a></div>',
|
545
|
-
|
594
|
+
'<div style="mso-element:footnote-list"><div style="mso-element:footnote" id="ftn1">
|
546
595
|
<p class="MsoFootnoteText"><a style="mso-footnote-id:ftn1" href="#_ftn1" name="_ftnref1" title="" id="_ftnref1"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Footnote</p></div>
|
547
596
|
<div style="mso-element:footnote" id="ftn2">
|
548
597
|
<p class="MsoFootnoteText"><a style="mso-footnote-id:ftn2" href="#_ftn2" name="_ftnref2" title="" id="_ftnref2"><span class="MsoFootnoteReference"><span style="mso-special-character:footnote"></span></span></a>Other Footnote</p></div>
|
549
598
|
</div>')}
|
550
|
-
|
551
|
-
|
552
|
-
|
599
|
+
#{WORD_FTR1}
|
600
|
+
OUTPUT
|
601
|
+
end
|
553
602
|
|
554
|
-
|
555
|
-
|
603
|
+
it "labels lists with list styles" do
|
604
|
+
simple_body = <<~BODY
|
556
605
|
<div><ul>
|
557
606
|
<li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>
|
558
|
-
|
559
|
-
|
560
|
-
|
561
|
-
|
562
|
-
|
563
|
-
|
607
|
+
BODY
|
608
|
+
Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2"})
|
609
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
610
|
+
to match_fuzzy(<<~OUTPUT)
|
611
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
612
|
+
#{word_body('<div><ul>
|
564
613
|
<li style="mso-list:l1 level1 lfo1;" class="MsoNormal"><div><p class="MsoNormal"><ol><li style="mso-list:l2 level2 lfo1;" class="MsoNormal"><ul><li style="mso-list:l1 level3 lfo1;" class="MsoNormal"><p class="MsoNormal"><ol><li style="mso-list:l2 level4 lfo1;" class="MsoNormal"><ol><li style="mso-list:l2 level5 lfo1;" class="MsoNormal">A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ul></div>',
|
565
|
-
|
566
|
-
|
567
|
-
|
568
|
-
|
614
|
+
'<div style="mso-element:footnote-list"/>')}
|
615
|
+
#{WORD_FTR1}
|
616
|
+
OUTPUT
|
617
|
+
end
|
569
618
|
|
570
619
|
|
571
|
-
|
572
|
-
|
620
|
+
it "restarts numbering of lists with list styles" do
|
621
|
+
simple_body = <<~BODY
|
573
622
|
<div>
|
574
623
|
<ol><li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol>
|
575
624
|
<ol><li><div><p><ol><li><ul><li><p><ol><li><ol><li>A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol></div>
|
576
|
-
|
577
|
-
|
578
|
-
|
579
|
-
|
580
|
-
|
581
|
-
|
625
|
+
BODY
|
626
|
+
Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2"})
|
627
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
628
|
+
to match_fuzzy(<<~OUTPUT)
|
629
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
630
|
+
#{word_body('<div>
|
582
631
|
<ol><li style="mso-list:l2 level1 lfo1;" class="MsoNormal"><div><p class="MsoNormal"><ol><li style="mso-list:l2 level2 lfo1;" class="MsoNormal"><ul><li style="mso-list:l1 level3 lfo1;" class="MsoNormal"><p class="MsoNormal"><ol><li style="mso-list:l2 level4 lfo1;" class="MsoNormal"><ol><li style="mso-list:l2 level5 lfo1;" class="MsoNormal">A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol>
|
583
632
|
<ol><li style="mso-list:l2 level1 lfo2;" class="MsoNormal"><div><p class="MsoNormal"><ol><li style="mso-list:l2 level2 lfo2;" class="MsoNormal"><ul><li style="mso-list:l1 level3 lfo2;" class="MsoNormal"><p class="MsoNormal"><ol><li style="mso-list:l2 level4 lfo2;" class="MsoNormal"><ol><li style="mso-list:l2 level5 lfo2;" class="MsoNormal">A</li></ol></li></ol></p></li></ul></li></ol></p></div></li></ol></div>',
|
584
|
-
|
585
|
-
|
586
|
-
|
587
|
-
|
633
|
+
'<div style="mso-element:footnote-list"/>')}
|
634
|
+
#{WORD_FTR1}
|
635
|
+
OUTPUT
|
636
|
+
end
|
588
637
|
|
589
|
-
|
590
|
-
|
638
|
+
it "replaces id attributes with explicit a@name bookmarks" do
|
639
|
+
simple_body = <<~BODY
|
591
640
|
<div>
|
592
641
|
<p id="a">Hello</p>
|
593
642
|
<p id="b"/>
|
594
643
|
</div>
|
595
|
-
|
596
|
-
|
597
|
-
|
598
|
-
|
599
|
-
|
600
|
-
|
644
|
+
BODY
|
645
|
+
Html2Doc.process(html_input(simple_body), filename: "test", liststyles: {ul: "l1", ol: "l2"})
|
646
|
+
expect(guid_clean(File.read("test.doc", encoding: "utf-8"))).
|
647
|
+
to match_fuzzy(<<~OUTPUT)
|
648
|
+
#{WORD_HDR} #{DEFAULT_STYLESHEET} #{WORD_HDR_END}
|
649
|
+
#{word_body('<div>
|
601
650
|
<p class="MsoNormal"><a name="a" id="a"></a>Hello</p>
|
602
651
|
<p class="MsoNormal"><a name="b" id="b"></a></p>
|
603
652
|
</div>',
|
604
|
-
|
605
|
-
|
606
|
-
|
607
|
-
|
653
|
+
'<div style="mso-element:footnote-list"/>')}
|
654
|
+
#{WORD_FTR1}
|
655
|
+
OUTPUT
|
656
|
+
end
|
608
657
|
|
609
658
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: html2doc
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.7.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ribose Inc.
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-
|
11
|
+
date: 2018-04-17 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: htmlentities
|
@@ -301,6 +301,7 @@ files:
|
|
301
301
|
- README.adoc
|
302
302
|
- Rakefile
|
303
303
|
- bin/console
|
304
|
+
- bin/html2doc
|
304
305
|
- bin/rspec
|
305
306
|
- bin/setup
|
306
307
|
- html2doc.gemspec
|