sablon 0.0.22 → 0.1.0
- checksums.yaml +4 -4
- data/Gemfile.lock +1 -1
- data/README.md +94 -22
- data/lib/sablon.rb +1 -0
- data/lib/sablon/configuration/configuration.rb +73 -1
- data/lib/sablon/content.rb +77 -3
- data/lib/sablon/environment.rb +3 -0
- data/lib/sablon/html/ast.rb +249 -76
- data/lib/sablon/html/ast_builder.rb +2 -7
- data/lib/sablon/html/node_properties.rb +91 -0
- data/lib/sablon/relationship.rb +47 -0
- data/lib/sablon/template.rb +30 -0
- data/lib/sablon/version.rb +1 -1
- data/test/content_test.rb +121 -42
- data/test/fixtures/html/html_test_content.html +106 -15
- data/test/fixtures/html_sample.docx +0 -0
- data/test/fixtures/insertion_template.docx +0 -0
- data/test/html/ast_builder_test.rb +0 -5
- data/test/html/ast_test.rb +35 -0
- data/test/html/converter_style_test.rb +535 -0
- data/test/html/converter_test.rb +412 -528
- data/test/html/node_properties_test.rb +21 -0
- data/test/html_test.rb +12 -3
- data/test/test_helper.rb +16 -0
- metadata +7 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7bfea533a76e6d1eea7475e9916cce5c9d0b8be1
+  data.tar.gz: 58707c6b8a095d4e1e7d3be17bc39bec340d8ad0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 42659e24a433a882c3d5c10f8dc9204a9e2a701e87d504c6fa3aa1d080c86f3a141de937df4e76db43ae21c527ea1eb2d816f0a766b24754ac6553e19d18ec7d
+  data.tar.gz: 5a4077e98f5214a8ac0603376ca90dbbc90726caa30013b66d480b994dff020541b891da30e3057f48bb077ebd1698f74a809c6c171070f8ce824519bcc74d1e
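checksums.yaml records SHA1 and SHA512 digests of the two archives packed inside the .gem file (`metadata.gz` and `data.tar.gz`). The sketch below shows one way to re-check one of those digests with Ruby's standard `digest` library. It assumes the archive has already been extracted from the .gem, which is a plain tar file. The helper name `checksum_matches?` is hypothetical.

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper: only the two algorithms listed in checksums.yaml.
DIGESTS = { SHA1: Digest::SHA1, SHA512: Digest::SHA512 }.freeze

# Compares the digest of the file at +path+ with a hex value copied
# from checksums.yaml.
def checksum_matches?(path, expected_hex, algorithm: :SHA512)
  digest_class = DIGESTS.fetch(algorithm) # KeyError for unknown algorithms
  digest_class.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `checksum_matches?("data.tar.gz", "58707c6b8a095d4e1e7d3be17bc39bec340d8ad0", algorithm: :SHA1)` should return true for an untampered 0.1.0 archive.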
data/Gemfile.lock
CHANGED
data/README.md
CHANGED
@@ -114,15 +114,31 @@ word_processing_ml = <<-XML.gsub("\n", "")
 </w:p>
 XML
 
+context = {
+long_description: Sablon.content(:word_ml, word_processing_ml)
+}
+template.render_to_file File.expand_path("~/Desktop/output.docx"), context
+```
+In the example above the entire paragraph will be replaced because all of the nodes being inserted aren't valid children of a paragraph (w:p) element. The example below shows inline insertion, where only runs are added and instead of replacing the entire paragraph only the merge field gets removed.
+
+**Important:** All text must be wrapped in a run tag for valid inline insertion because WordML is still inserted directly into the document "as is" without any structure transformations other than run properties being merged.
+
+```ruby
+word_processing_ml = <<-XML.gsub("\n", "")
+<w:r w:rsidRPr="00B97C39">
+<w:rPr>
+<w:b />
+</w:rPr>
+<w:t>this is bold text</w:t>
+</w:r>
+XML
+
 context = {
 long_description: Sablon.content(:word_ml, word_processing_ml)
 }
 template.render_to_file File.expand_path("~/Desktop/output.docx"), context
 ```
 
-IMPORTANT: This feature is very much *experimental*. Currently, the insertion
-will replace the containing paragraph. This means that other content in the same
-paragraph is discarded.
 
 ##### HTML
 
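The README now requires inline WordML to have all of its text inside runs (`w:r`). When that text comes from user data, building runs by hand invites unescaped `&` or `<` characters. The sketch below is a minimal helper that produces a single escaped run, mirroring the bold run in the README example. `word_ml_run` is a hypothetical name and is not part of Sablon. `xml:space="preserve"` is the WordprocessingML attribute that keeps leading and trailing spaces inside `w:t`.

```ruby
# Hypothetical helper (not part of Sablon): wraps plain text in a WordML run
# so the result is a valid child of w:p and can be inserted inline.
def word_ml_run(text, bold: false)
  # Optional run properties; w:b switches bold on, as in the README example.
  props = bold ? "<w:rPr><w:b/></w:rPr>" : ""
  # encode(xml: :text) escapes &, < and > so user data cannot break the XML.
  escaped = text.to_s.encode(xml: :text)
  # xml:space="preserve" keeps leading and trailing spaces in the text.
  %(<w:r>#{props}<w:t xml:space="preserve">#{escaped}</w:t></w:r>)
end
```

With the gem loaded, the result could be passed as `Sablon.content(:word_ml, word_ml_run("Tom & Jerry", bold: true))`.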
@@ -136,12 +152,43 @@ is sufficient:
 To use HTML insertion prepare the context like so:
 
 ```ruby
-html_body = <<-HTML
-<div>
-
-<
-
-
+html_body = <<-HTML.strip
+<div>
+This text can contain <em>additional formatting</em> according to the
+<strong>HTML</strong> specification. As well as links to external
+<a href="https://github.com/senny/sablon">websites</a>, don't forget
+the "http/https" bit.
+</div>
+
+<p style="text-align: right; background-color: #FFFF00">
+Right aligned content with a yellow background color.
+</p>
+
+<div>
+<span style="color: #123456">Inline styles</span> are possible as well
+</div>
+
+<table style="border: 1px solid #0000FF;">
+<caption>Table's can also be created via HTML</caption>
+<tr>
+<td>Cell 1 only text</td>
+<td>
+<ul>
+<li>List in Table - 1</li>
+<li>List in Table - 2</li>
+</ul>
+</td>
+</tr>
+<tr>
+<td></td>
+<td>
+<table style="border: 1px solid #FF0000;">
+<tr><th>A</th><th>B</th></tr>
+<tr><td>C</td><td>D</td></tr>
+</table>
+</td>
+</tr>
+</table>
 HTML
 context = {
 article: Sablon.content(:html, html_body) }
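The HTML example switches from `<<-HTML` to `<<-HTML.strip`. A `<<-` heredoc only allows the closing delimiter to be indented. The body keeps its own indentation and its final newline, and `.strip` removes that surrounding whitespace before the string is handed to `Sablon.content`. The stdlib-only illustration below also shows the squiggly `<<~` form for comparison: it removes common indentation but still keeps the trailing newline.

```ruby
# <<- keeps the body's indentation and its trailing newline.
raw = <<-HTML
  <p>Hello</p>
  HTML

# .strip removes leading and trailing whitespace, including that newline.
stripped = <<-HTML.strip
  <p>Hello</p>
  HTML

# <<~ removes the common indentation but keeps the trailing newline.
squiggly = <<~HTML
  <p>Hello</p>
HTML

puts raw.inspect      # "  <p>Hello</p>\n"
puts stripped.inspect # "<p>Hello</p>"
puts squiggly.inspect # "<p>Hello</p>\n"
```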
@@ -151,24 +198,49 @@ context = {
 template.render_to_file File.expand_path("~/Desktop/output.docx"), context
 ```
 
-
-
-
-
-
+There is no 1:1 conversion between HTML and Open Office XML however, a large
+number of tags are very similar. HTML insertion is relatively complete
+covering several key content structures such as paragraphs, tables and lists.
+The snippet above showcases some of the capabilities present, for a comprehensive
+example please see the html insertion test fixture [here](test/fixtures/html/html_test_content.html).
+All html element conversions are defined in [configuration.rb](lib/sablon/configuration/configuration.rb)
+with their matching AST classes defined in [ast.rb](lib/sablon/html/ast.rb).
+
+Basic conversion of CSS inline styles into matching WordML properties is possible
+using the `style=" ... "` attribute in the HTML markup. Not all CSS properties
+are supported as only a small subset of CSS styles have a direct Open Office XML
+equivalent. Styles are passed onto nested elements if the parent can't use them.
+The currently supported styles are also defined in [configuration.rb](lib/sablon/configuration/configuration.rb). Toggle
+properties that aren't directly supported can be added using the
+`text-decoration: ` style attribute with the proper XML tag name as the
+value (i.e. `text-decoration: dstrike` for `w:dstrike`). Simple single value properties that do not need a conversion can be added using the XML property name directly, omitting the `w:` prefix i.e.
+(`highlight: cyan` for `w:highlight`).
+
+Table, Paragraph and Run property references can be found at:
 * http://officeopenxml.com/WPparagraphProperties.php
 * http://officeopenxml.com/WPtextFormatting.php
+* http://officeopenxml.com/WPtableProperties.php
+
+The full Open Office XML specification used to develop the HTML converter
+can be found [here](https://www.ecma-international.org/publications/standards/Ecma-376.htm) (3rd Edition).
 
-If you wish to write out your HTML code in an indented human readable fashion, or you are pulling content from the ERB templating engine in rails the following regular expression can help eliminate extraneous whitespace in the final document.
-```ruby
-# combine all white space
-html_str = html_str.gsub(/\s+/, ' ')
-# clear any white space between block level tags and other content
-html_str.gsub(%r{\s*<(/?(?:h\d|div|p|br|ul|ol|li).*?)>\s*}, '<\1>')
-```
 
-
+The example above shows an HTML insertion operation that will replace the entire paragraph. In the same fashion as WordML, inline HTML insertion is possible where only the merge field is replaced as long as only "inline" elements are used. "Inline" in this context does not necessarily mean the same thing as it does in CSS, in this case it means that once the HTML is converted to WordML only valid children of a paragraph (w:p) tag exist. Unlike WordML insertion plain text can be used without being wrapped in tags when working with HTML, see the example below:
 
+```ruby
+inline_html = <<-HTML.strip
+This text can contain <em>additional formatting</em> according to the
+<strong>HTML</strong> specification. As well as links to external
+<a href="https://github.com/senny/sablon">websites</a>, don't forget
+the "http/https" bit.
+HTML
+context = {
+article: Sablon.content(:html, inline_html) }
+# alternative method using special key format
+# 'html:article' => html_body
+}
+template.render_to_file File.expand_path("~/Desktop/output.docx"), context
+```
 
 #### Conditionals
 
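The README describes how inline `style` attributes are mapped onto WordML properties. Unconverted single-value properties are passed through by name (`highlight: cyan` becomes `w:highlight`), and extra toggle properties can be supplied through `text-decoration`. The sketch below covers those two steps. It assumes plain `key: value;` syntax with no quoted values. `parse_style`, `word_ml_property` and `word_ml_toggle` are hypothetical helpers, not Sablon's own parser.

```ruby
# Hypothetical: split an inline CSS style attribute into a hash.
# Assumes plain "key: value; key: value" syntax (no quoted semicolons).
def parse_style(style)
  style.to_s.split(";").each_with_object({}) do |declaration, props|
    key, value = declaration.split(":", 2).map(&:strip)
    next if key.nil? || key.empty? || value.nil? || value.empty?

    props[key] = value
  end
end

# Single-value property with no conversion: the CSS key is used as the
# tag name and the CSS value becomes w:val, e.g. highlight: cyan.
def word_ml_property(key, value)
  %(<w:#{key} w:val="#{value}"/>)
end

# Toggle property named via text-decoration, e.g. dstrike -> <w:dstrike/>.
def word_ml_toggle(tag)
  "<w:#{tag}/>"
end

styles = parse_style("highlight: cyan; text-decoration: dstrike;")
puts word_ml_property("highlight", styles["highlight"]) # <w:highlight w:val="cyan"/>
puts word_ml_toggle(styles["text-decoration"])          # <w:dstrike/>
```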
data/lib/sablon.rb
CHANGED
data/lib/sablon/configuration/configuration.rb
CHANGED
@@ -53,11 +53,16 @@ module Sablon
 @permitted_html_tags = {}
 tags = {
 # special tag used for elements with no parent, i.e. top level
-'#document-fragment' => { type: :block, ast_class: :root, allowed_children:
+'#document-fragment' => { type: :block, ast_class: :root, allowed_children: %i[_block _inline] },
 
 # block level tags
+table: { type: :block, ast_class: :table, allowed_children: %i[caption thead tbody tfoot tr ]},
+tr: { type: :block, ast_class: :table_row, allowed_children: %i[th td] },
+th: { type: :block, ast_class: :table_cell, properties: { b: nil, jc: 'center' }, allowed_children: %i[_block _inline] },
+td: { type: :block, ast_class: :table_cell, allowed_children: %i[_block _inline] },
 div: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Normal' }, allowed_children: :_inline },
 p: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Paragraph' }, allowed_children: :_inline },
+caption: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Caption' }, allowed_children: :_inline },
 h1: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading1' }, allowed_children: :_inline },
 h2: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading2' }, allowed_children: :_inline },
 h3: { type: :block, ast_class: :paragraph, properties: { pStyle: 'Heading3' }, allowed_children: :_inline },
@@ -68,6 +73,11 @@ module Sablon
 ul: { type: :block, ast_class: :list, properties: { pStyle: 'ListBullet' }, allowed_children: %i[ul li] },
 li: { type: :block, ast_class: :list_paragraph },
 
+# inline style tags for tables
+thead: { type: :inline, ast_class: nil, properties: { tblHeader: nil }, allowed_children: :tr },
+tbody: { type: :inline, ast_class: nil, properties: {}, allowed_children: :tr },
+tfoot: { type: :inline, ast_class: nil, properties: {}, allowed_children: :tr },
+
 # inline style tags
 span: { type: :inline, ast_class: nil, properties: {} },
 strong: { type: :inline, ast_class: nil, properties: { b: nil } },
@@ -80,6 +90,7 @@ module Sablon
 sup: { type: :inline, ast_class: nil, properties: { vertAlign: 'superscript' } },
 
 # inline content tags
+a: { type: :inline, ast_class: :hyperlink, properties: { rStyle: 'Hyperlink' } },
 text: { type: :inline, ast_class: :run, properties: {}, allowed_children: [] },
 br: { type: :inline, ast_class: :newline, properties: {}, allowed_children: [] }
 }
@@ -122,6 +133,67 @@ module Sablon
 },
 'text-align' => ->(v) { return 'jc', v }
 },
+# Styles specific to the Table AST class
+table: {
+'border' => lambda { |v|
+props = @defined_style_conversions[:node][:_border].call(v)
+#
+return 'tblBorders', [
+{ top: props }, { start: props }, { bottom: props },
+{ end: props }, { insideH: props }, { insideV: props }
+]
+},
+'margin' => lambda { |v|
+vals = v.split.map do |s|
+@defined_style_conversions[:node][:_sz].call(s)
+end
+#
+props = [vals[0], vals[0], vals[0], vals[0]] if vals.length == 1
+props = [vals[0], vals[1], vals[0], vals[1]] if vals.length == 2
+props = [vals[0], vals[1], vals[2], vals[1]] if vals.length == 3
+props = [vals[0], vals[1], vals[2], vals[3]] if vals.length > 3
+return 'tblCellMar', [
+{ top: { w: props[0], type: 'dxa' } },
+{ end: { w: props[1], type: 'dxa' } },
+{ bottom: { w: props[2], type: 'dxa' } },
+{ start: { w: props[3], type: 'dxa' } }
+]
+},
+'cellspacing' => lambda { |v|
+v = @defined_style_conversions[:node][:_sz].call(v)
+return 'tblCellSpacing', { w: v, type: 'dxa' }
+},
+'width' => lambda { |v|
+v = @defined_style_conversions[:node][:_sz].call(v)
+return 'tblW', { w: v, type: 'dxa' }
+}
+},
+# Styles specific to the TableCell AST class
+table_cell: {
+'border' => lambda { |v|
+value = @defined_style_conversions[:table]['border'].call(v)[1]
+return 'tcBorders', value
+},
+'colspan' => ->(v) { return 'gridSpan', v },
+'margin' => lambda { |v|
+value = @defined_style_conversions[:table]['margin'].call(v)[1]
+return 'tcMar', value
+},
+'rowspan' => lambda { |v|
+return 'vMerge', 'restart' if v == 'start'
+return 'vMerge', v if v == 'continue'
+return 'vMerge', nil if v == 'end'
+},
+'vertical-align' => ->(v) { return 'vAlign', v },
+'white-space' => lambda { |v|
+return 'noWrap', nil if v == 'nowrap'
+return 'tcFitText', 'true' if v == 'fit'
+},
+'width' => lambda { |v|
+value = @defined_style_conversions[:table]['width'].call(v)[1]
+return 'tcW', value
+}
+},
 # Styles specific to the Paragraph AST class
 paragraph: {
 'border' => lambda { |v|
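The new table `'margin'` lambda expands a CSS box shorthand of one to four values into the four `tblCellMar` sides. It uses CSS order (top, right, bottom, left) and stores the sides as top, end, bottom, start. The standalone sketch below performs the same expansion. It skips the `_sz` unit conversion, which is not shown in this diff, and assumes the values are already in twentieths of a point (`dxa`). `expand_box_shorthand` and `cell_margins` are hypothetical names.

```ruby
# Hypothetical: expand a CSS box shorthand ("10", "10 20", "10 20 30",
# "10 20 30 40") into [top, right, bottom, left], following the same
# branches as the 'margin' lambda in configuration.rb.
def expand_box_shorthand(value)
  vals = value.to_s.split
  raise ArgumentError, "empty margin value" if vals.empty?

  case vals.length
  when 1 then [vals[0]] * 4
  when 2 then [vals[0], vals[1], vals[0], vals[1]]
  when 3 then [vals[0], vals[1], vals[2], vals[1]]
  else vals.first(4) # values beyond the fourth are ignored
  end
end

# Hypothetical: build tblCellMar entries in the same shape the lambda
# returns. Values are assumed to already be in dxa (twips).
def cell_margins(value)
  top, right, bottom, left = expand_box_shorthand(value)
  [
    { top: { w: top, type: "dxa" } },
    { end: { w: right, type: "dxa" } },
    { bottom: { w: bottom, type: "dxa" } },
    { start: { w: left, type: "dxa" } }
  ]
end
```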
data/lib/sablon/content.rb
CHANGED
@@ -71,11 +71,85 @@ module Sablon
 def self.id; :word_ml end
 def self.wraps?(value) false end
 
+def initialize(value)
+super Nokogiri::XML.fragment(value)
+end
+
 def append_to(paragraph, display_node, env)
-
-
+# if all nodes are inline then add them to the existing paragraph
+# otherwise replace the paragraph with the new content.
+if all_inline?
+pr_tag = display_node.parent.at_xpath('./w:rPr')
+add_siblings_to(display_node.parent, pr_tag)
+display_node.parent.remove
+else
+add_siblings_to(paragraph)
+paragraph.remove
+end
+end
+
+# This allows proper equality checks with other WordML content objects.
+# Due to the fact the `xml` attribute is a live Nokogiri object
+# the default `==` comparison returns false unless it is the exact
+# same object being compared. This method instead checks if the XML
+# being added to the document is the same when the `other` object is
+# an instance of the WordML content class.
+def ==(other)
+if other.class == self.class
+xml.to_s == other.xml.to_s
+else
+super
+end
+end
+
+private
+
+# Returns `true` if all of the xml nodes to be inserted are
+def all_inline?
+(xml.children.map(&:node_name) - inline_tags).empty?
+end
+
+# Array of tags allowed to be a child of the w:p XML tag as defined
+# by the Open XML specification
+def inline_tags
+%w[w:bdo w:bookmarkEnd w:bookmarkStart w:commentRangeEnd
+w:commentRangeStart w:customXml
+w:customXmlDelRangeEnd w:customXmlDelRangeStart
+w:customXmlInsRangeEnd w:customXmlInsRangeStart
+w:customXmlMoveFromRangeEnd w:customXmlMoveFromRangeStart
+w:customXmlMoveToRangeEnd w:customXmlMoveToRangeStart
+w:del w:dir w:fldSimple w:hyperlink w:ins w:moveFrom
+w:moveFromRangeEnd w:moveFromRangeStart w:moveTo
+w:moveToRangeEnd w:moveToRangeStart m:oMath m:oMathPara
+w:pPr w:proofErr w:r w:sdt w:smartTag]
+end
+
+# Adds the XML to be inserted in the document as siblings to the
+# node passed in. Run properties are merged here because of namespace
+# issues when working with a document fragment
+def add_siblings_to(node, rpr_tag = nil)
+xml.children.reverse.each do |child|
+node.add_next_sibling child
+# merge properties
+next unless rpr_tag
+merge_rpr_tags(child, rpr_tag.children)
+end
+end
+
+# Merges the provided properties into the run proprties of the
+# node passed in. Properties are only added if they are not already
+# defined on the node itself.
+def merge_rpr_tags(node, props)
+# first assert that all child runs (w:r tags) have a w:rPr tag
+node.xpath('.//w:r').each do |child|
+child.prepend_child '<w:rPr></w:rPr>' unless child.at_xpath('./w:rPr')
+end
+#
+# merge run props, only adding them if they aren't already defined
+node.xpath('.//w:rPr').each do |pr_tag|
+existing = pr_tag.children.map(&:node_name)
+props.map { |pr| pr_tag << pr unless existing.include? pr.node_name }
 end
-paragraph.remove
 end
 end
 
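The new `append_to` logic makes two decisions. The first is whether every top-level node in the WordML fragment may sit inside a `w:p`. If so, the nodes are added next to the merge field's run and that run is removed, instead of the whole paragraph being replaced. The second is how the replaced run's properties are merged into the inserted runs without overriding properties those runs already define. The stdlib-only sketch below models both decisions on plain tag names and hashes rather than live Nokogiri nodes. `PARAGRAPH_CHILDREN` is an abbreviated subset of the `inline_tags` list, and the method names are hypothetical.

```ruby
# Abbreviated subset of the w:p child tags listed in inline_tags.
PARAGRAPH_CHILDREN = %w[w:r w:hyperlink w:bookmarkStart w:bookmarkEnd
                        w:proofErr w:fldSimple].freeze

# Inline insertion is possible only when no top-level tag falls outside
# the permitted list (the same array-difference test as all_inline?).
def inline_insertion?(top_level_tags)
  (top_level_tags - PARAGRAPH_CHILDREN).empty?
end

# Merge inherited run properties into a run's own properties. Properties
# already present on the run win and missing ones are appended, matching
# the "only add if not already defined" rule in merge_rpr_tags.
def merge_run_properties(own, inherited)
  own.merge(inherited) { |_name, own_value, _inherited_value| own_value }
end
```

Under this array-difference test, an empty fragment also counts as inline.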
data/lib/sablon/environment.rb
CHANGED
@@ -5,6 +5,7 @@ module Sablon
 attr_reader :template
 attr_reader :numbering
 attr_reader :context
+attr_reader :relationship
 
 # returns a new environment with merged contexts
 def alter_context(context = {})
@@ -20,9 +21,11 @@ module Sablon
 if parent_env
 @template = parent_env.template
 @numbering = parent_env.numbering
+@relationship = parent_env.relationship
 else
 @template = template
 @numbering = Numbering.new
+@relationship = Relationship.new
 end
 #
 @context = Context.transform_hash(context)
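With this change, an environment built from a parent (as `alter_context` appears to do) reuses the parent's `relationship` object instead of creating a new one. Hyperlink relationships registered while rendering in a derived environment therefore accumulate in a single list for the whole document. The sketch below trims that constructor pattern down. `MiniEnvironment`, `MiniRelationship` and `derive` are hypothetical stand-ins for the real classes.

```ruby
# Hypothetical stand-in for Sablon::Relationship: an array of hashes.
class MiniRelationship
  attr_reader :relationships

  def initialize
    @relationships = []
  end
end

# Hypothetical, trimmed-down environment: like the updated constructor,
# a child shares the parent's relationship object instead of copying it.
class MiniEnvironment
  attr_reader :relationship

  def initialize(parent_env = nil)
    @relationship = parent_env ? parent_env.relationship : MiniRelationship.new
  end

  # Derive a child environment, in the spirit of alter_context.
  def derive
    MiniEnvironment.new(self)
  end
end

root = MiniEnvironment.new
child = root.derive
child.relationship.relationships << { Id: "rId1" }
puts root.relationship.relationships.size # 1
```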
data/lib/sablon/html/ast.rb
CHANGED
@@ -1,4 +1,5 @@
 require "sablon/html/ast_builder"
+require "sablon/html/node_properties"
 
 module Sablon
 class HTMLConverter
@@ -90,81 +91,6 @@ module Sablon
 end
 end
 
-# Manages the properties for an AST node
-class NodeProperties
-attr_reader :transferred_properties
-
-def self.paragraph(properties)
-new('w:pPr', properties, Paragraph::PROPERTIES)
-end
-
-def self.run(properties)
-new('w:rPr', properties, Run::PROPERTIES)
-end
-
-def initialize(tagname, properties, whitelist)
-@tagname = tagname
-filter_properties(properties, whitelist)
-end
-
-def inspect
-@properties.map { |k, v| v ? "#{k}=#{v}" : k }.join(';')
-end
-
-def [](key)
-@properties[key]
-end
-
-def []=(key, value)
-@properties[key] = value
-end
-
-def to_docx
-"<#{@tagname}>#{properties_word_ml}</#{@tagname}>" unless @properties.empty?
-end
-
-private
-
-# processes properties adding those on the whitelist to the
-# properties instance variable and those not to the transferred_properties
-# isntance variable
-def filter_properties(properties, whitelist)
-@transferred_properties = {}
-@properties = {}
-#
-properties.each do |key, value|
-if whitelist.include? key.to_s
-@properties[key] = value
-else
-@transferred_properties[key] = value
-end
-end
-end
-
-# processes attributes defined on the node into wordML property syntax
-def properties_word_ml
-@properties.map { |k, v| transform_attr(k, v) }.join
-end
-
-# properties that have a list as the value get nested in tags and
-# each entry in the list is transformed. When a value is a hash the
-# keys in the hash are used to explicitly build the XML tag attributes.
-def transform_attr(key, value)
-if value.is_a? Array
-sub_attrs = value.map do |sub_prop|
-sub_prop.map { |k, v| transform_attr(k, v) }
-end
-"<w:#{key}>#{sub_attrs.join}</w:#{key}>"
-elsif value.is_a? Hash
-props = value.map { |k, v| format('w:%s="%s"', k, v) if v }
-"<w:#{key} #{props.compact.join(' ')} />"
-else
-value = format('w:val="%s" ', value) if value
-"<w:#{key} #{value}/>"
-end
-end
-end
-
 # A container for an array of AST nodes with convenience methods to
 # work with the internal array as if it were a regular node
 class Collection < Node
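`NodeProperties` moves out of ast.rb into the new node_properties.rb, which is not shown here. The removed `transform_attr` method turns a property hash into WordML: arrays become a wrapper tag around nested entries, hashes become explicit `w:`-prefixed attributes, and any other value becomes a `w:val` attribute. A `nil` value produces a bare toggle tag. Below is a standalone copy of that removed logic for experimenting outside Sablon; the relocated version may differ.

```ruby
# Standalone copy of the transform_attr logic removed from ast.rb.
def transform_attr(key, value)
  if value.is_a?(Array)
    # Each array entry is a hash rendered as one or more nested tags.
    sub_attrs = value.map do |sub_prop|
      sub_prop.map { |k, v| transform_attr(k, v) }
    end
    "<w:#{key}>#{sub_attrs.join}</w:#{key}>"
  elsif value.is_a?(Hash)
    # Hash keys become explicit attributes; nil values are skipped.
    props = value.map { |k, v| format('w:%s="%s"', k, v) if v }
    "<w:#{key} #{props.compact.join(' ')} />"
  else
    # Scalars become w:val; nil produces a bare toggle tag such as <w:b />.
    value = format('w:val="%s" ', value) if value
    "<w:#{key} #{value}/>"
  end
end

puts transform_attr(:jc, "center")                   # <w:jc w:val="center" />
puts transform_attr(:tblW, { w: 5000, type: "dxa" }) # <w:tblW w:w="5000" w:type="dxa" />
```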
@@ -189,6 +115,10 @@ module Sablon
 def inspect
 "[#{nodes.map(&:inspect).join(', ')}]"
 end
+
+def <<(node)
+@nodes << node
+end
 end
 
 # Stores all of the AST nodes from the current fragment of HTML being
@@ -217,10 +147,23 @@ module Sablon
 # An AST node representing the top level content container for a word
 # document. These cannot be nested within other paragraph elements
 class Paragraph < Node
+attr_accessor :runs
+
 PROPERTIES = %w[framePr ind jc keepLines keepNext numPr
 outlineLvl pBdr pStyle rPr sectPr shd spacing
 tabs textAlignment].freeze
-
+
+# Permitted child tags defined by the OpenXML spec
+CHILD_TAGS = %w[w:bdo w:bookmarkEnd w:bookmarkStart w:commentRangeEnd
+w:commentRangeStart w:customXml
+w:customXmlDelRangeEnd w:customXmlDelRangeStart
+w:customXmlInsRangeEnd w:customXmlInsRangeStart
+w:customXmlMoveFromRangeEnd w:customXmlMoveFromRangeStart
+w:customXmlMoveToRangeEnd w:customXmlMoveToRangeStart
+w:del w:dir w:fldSimple w:hyperlink w:ins w:moveFrom
+w:moveFromRangeEnd w:moveFromRangeStart w:moveTo
+w:moveToRangeEnd w:moveToRangeStart m:oMath m:oMathPara
+w:pPr w:proofErr w:r w:sdt w:smartTag]
 
 def initialize(env, node, properties)
 super
@@ -340,6 +283,195 @@ module Sablon
 end
 end
 
+# Builds a table from html table tags
+class Table < Node
+PROPERTIES = %w[jc shd tblBorders tblCaption tblCellMar tblCellSpacing
+tblInd tblLayout tblLook tblOverlap tblpPr tblStyle
+tblStyleColBandSize tblStyleRowBandSize tblW].freeze
+
+def initialize(env, node, properties)
+super
+
+# Process properties
+properties = self.class.process_properties(properties)
+@properties = NodeProperties.table(properties)
+trans_props = transferred_properties
+
+# Pull out the caption node if it exists and convert it separately.
+# If multiple caption tags are defined, only the first one is kept.
+@caption = node.xpath('./caption').remove
+@caption = nil if @caption.empty?
+if @caption
+cap_side_pat = /caption-side: ?(top|bottom)/
+@cap_side = @caption.attr('style').to_s.match(cap_side_pat).to_a[1]
+node.add_previous_sibling @caption
+@caption = ASTBuilder.html_to_ast(env, @caption, trans_props)[0]
+end
+
+# convert remaining child nodes and pass on transferrable properties
+@children = ASTBuilder.html_to_ast(env, node.children, trans_props)
+@children = Collection.new(@children)
+end
+
+def to_docx
+if @caption && @cap_side == 'bottom'
+super('w:tbl') + @caption.to_docx
+elsif @caption
+# caption always goes above table unless explicitly set to "bottom"
+@caption.to_docx + super('w:tbl')
+else
+super('w:tbl')
+end
+end
+
+def accept(visitor)
+super
+@children.accept(visitor)
+end
+
+def inspect
+if @caption && @cap_side == 'bottom'
+"<Table{#{@properties.inspect}}: #{@children.inspect}, #{@caption.inspect}>"
+elsif @caption
+"<Table{#{@properties.inspect}}: #{@caption.inspect}, #{@children.inspect}>"
+else
+"<Table{#{@properties.inspect}}: #{@children.inspect}>"
+end
+end
+
+private
+
+def children_to_docx
+@children.to_docx
+end
+end
+
+# Converts html table rows into wordML table rows
+class TableRow < Node
+PROPERTIES = %w[cantSplit hidden jc tblCellSpacing tblHeader
+trHeight tblPrEx].freeze
+
+def initialize(env, node, properties)
+super
+properties = self.class.process_properties(properties)
+@properties = NodeProperties.table_row(properties)
+#
+trans_props = transferred_properties
+@children = ASTBuilder.html_to_ast(env, node.children, trans_props)
+@children = Collection.new(@children)
+end
+
+def to_docx
+super('w:tr')
+end
+
+def accept(visitor)
+super
+@children.accept(visitor)
+end
+
+def inspect
+"<TableRow{#{@properties.inspect}}: #{@children.inspect}>"
+end
+
+private
+
+def children_to_docx
+@children.to_docx
+end
+end
+
+# Converts html table cells into wordML table cells
+class TableCell < Node
+PROPERTIES = %w[gridSpan hideMark noWrap shd tcBorders tcFitText
+tcMar tcW vAlign vMerge].freeze
+
+# Permitted child tags defined by the OpenXML spec
+CHILD_TAGS = %w[w:altChunk w:bookmarkEnd w:bookmarkStart w:commentRangeEnd
+w:commentRangeStart w:customXml w:customXmlDelRangeEnd
+w:customXmlDelRangeStart w:customXmlInsRangeEnd
+w:customXmlInsRangeStart w:customXmlMoveFromRangeEnd
+w:customXmlMoveFromRangeStart w:customXmlMoveToRangeEnd
+w:customXmlMoveToRangeStart w:del w:ins w:moveFrom
+w:moveFromRangeEnd w:moveFromRangeStart w:moveTo
+w:moveToRangeEnd w:moveToRangeStart m:oMath m:oMathPara
+w:p w:permEnd w:permStart w:proofErr w:sdt w:tbl w:tcPr]
+
+def initialize(env, node, properties)
+super
+properties = self.class.process_properties(properties)
+@properties = NodeProperties.table_cell(properties)
+#
+# Nodes are processed first "as is" and then based on the XML
+# generated wrapped by paragraphs.
+trans_props = transferred_properties
+@children = ASTBuilder.html_to_ast(env, node.children, trans_props)
+@children = wrap_with_paragraphs(env, @children)
+end
+
+def to_docx
+super('w:tc')
+end
+
+def accept(visitor)
+super
+@children.accept(visitor)
+end
+
+def inspect
+"<TableCell{#{@properties.inspect}}: #{@children.inspect}>"
+end
+
+private
+
+# Wraps nodes in Paragraph AST nodes if needed to produced a valid
+# document
+def wrap_with_paragraphs(env, nodes)
+# convert all nodes to live xml, and use first node to determine
+# if that AST node should be wrapped in a paragraph
+nodes_xml = nodes.map { |n| Nokogiri::XML.fragment(n.to_docx) }
+#
+para = nil
+new_nodes = []
+nodes_xml.each_with_index do |n, i|
+next unless n.children.first
+# add all nodes that need wrapped to a paragraph sequentially.
+# New paragraphs are created when something that doesn't need
+# wrapped is encountered to retain proper content ordering.
+first_node_name = n.children.first.node_name
+if wrapped_by_paragraph.include? first_node_name
+if para.nil?
+para = new_paragraph(env)
+new_nodes << para
+end
+para.runs << nodes[i]
+else
+new_nodes << nodes[i]
+para = nil
+end
+end
+# Ensure the table cell has an empty paragraph if nothing else
+new_nodes << new_paragraph(env) if new_nodes.empty?
+# filter nils and return
+Collection.new(new_nodes.compact)
+end
+
+# Returns a list of child tags that need to be wrapped in a paragraph
+def wrapped_by_paragraph
+Paragraph::CHILD_TAGS - self.class::CHILD_TAGS
+end
+
+# Creates a new Paragraph AST node, with no children
+def new_paragraph(env)
+para = Nokogiri::HTML.fragment('<p></p>').first_element_child
+ASTBuilder.html_to_ast(env, [para], transferred_properties).first
+end
+
+def children_to_docx
+@children.to_docx
+end
+end
+
 # Create a run of text in the document, runs cannot be nested within
 # each other
 class Run < Node
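`TableCell#wrap_with_paragraphs` exists because a `w:tc` may not contain runs directly. Tags that are valid inside `w:p` but not inside `w:tc` (the difference `Paragraph::CHILD_TAGS - TableCell::CHILD_TAGS`, which includes `w:r` and `w:hyperlink`) must be wrapped in a paragraph. Consecutive wrappable nodes share one paragraph, any other node closes it, and a cell with no content still gets an empty paragraph. The sketch below models that grouping on plain `[tag, payload]` pairs instead of AST nodes. The method name and data shape are hypothetical, and the sketch does not model how the original skips nodes that render to empty XML.

```ruby
# Hypothetical sketch of the grouping in TableCell#wrap_with_paragraphs.
# items: [[tag_name, payload], ...]; wrappable: tags that need a w:p parent.
def group_into_paragraphs(items, wrappable)
  result = []
  open_paragraph = nil
  items.each do |tag, payload|
    if wrappable.include?(tag)
      # Start a paragraph only when none is open, so consecutive runs share it.
      if open_paragraph.nil?
        open_paragraph = [:paragraph, []]
        result << open_paragraph
      end
      open_paragraph[1] << payload
    else
      # Block-level content (e.g. a nested w:tbl) closes the open paragraph.
      result << [tag, payload]
      open_paragraph = nil
    end
  end
  # A cell with no content still gets one empty paragraph.
  result << [:paragraph, []] if result.empty?
  result
end
```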
@@ -387,5 +519,46 @@ module Sablon
 "<w:br/>"
 end
 end
+
+# Creates a clickable URL in the word document, this only supports external
+# urls only
+class Hyperlink < Node
+def initialize(env, node, properties)
+super
+# properties are passed directly to runs because hyperlink nodes
+# don't have a corresponding property tag like runs or paragraphs.
+@runs = ASTBuilder.html_to_ast(env, node.children, properties)
+@runs = Collection.new(@runs)
+@target = node.attributes['href'].value
+#
+hyperlink_relation = {
+Id: 'rId' + SecureRandom.uuid.delete('-'),
+Type: 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink',
+Target: @target,
+TargetMode: 'External'
+}
+env.relationship.relationships << hyperlink_relation
+@attributes = { 'r:id' => hyperlink_relation[:Id] }
+end
+
+def to_docx
+super('w:hyperlink')
+end
+
+def inspect
+"<Hyperlink{target:#{@target}}: #{@runs.inspect}>"
+end
+
+def accept(visitor)
+super
+@runs.accept(visitor)
+end
+
+private
+
+def children_to_docx
+@runs.to_docx
+end
+end
 end
 end
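The new `Hyperlink` node registers an external relationship with a random `rId`, the officeDocument hyperlink relationship type and `TargetMode: 'External'`, then points its `w:hyperlink` at that id through `r:id`. The code that writes relationships into the document's `.rels` part lives in relationship.rb, which is not shown here. The sketch below therefore assumes the standard OPC `<Relationship .../>` element form. It escapes attribute values because URLs often contain `&`. `relationship_xml` is a hypothetical serializer. `hyperlink_relation` builds the same hash shape as in `Hyperlink#initialize`.

```ruby
require "securerandom"

HYPERLINK_TYPE =
  "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink".freeze

# Same hash shape as hyperlink_relation in Hyperlink#initialize.
def hyperlink_relation(target)
  {
    Id: "rId" + SecureRandom.uuid.delete("-"),
    Type: HYPERLINK_TYPE,
    Target: target,
    TargetMode: "External"
  }
end

# Hypothetical serializer: encode(xml: :attr) escapes the value and wraps
# it in double quotes, so "&" in a query string stays valid XML.
def relationship_xml(rel)
  attrs = rel.map { |name, value| "#{name}=#{value.to_s.encode(xml: :attr)}" }
  "<Relationship #{attrs.join(' ')}/>"
end

rel = hyperlink_relation("https://example.com/?a=1&b=2")
puts relationship_xml(rel)
```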