ghostwriter 0.4.2 → 1.0.0
- checksums.yaml +4 -4
- data/README.md +164 -76
- data/RELEASE_NOTES.md +17 -0
- data/lib/ghostwriter/version.rb +1 -1
- data/lib/ghostwriter/writer.rb +63 -22
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 80d5aced9b18684b3640c28ce1e86c8e9859942f57fefbd26dd1a1f3e7791eaf
+  data.tar.gz: 0f71619c0a7e247cf163f074ca7c8bb54fa8c93daaa5649df8f853fd0ccab1da
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f9760753d4ffc30bee200a33347cd9aeda0b4593304f07ff9ce53c4ca1f971d51b50644feb763ffc51ec5635202e971704592f45d0cbe021693a7e790e39c7e9
+  data.tar.gz: 203df0d639d25f35a73dcdde0f12c037e8f508e086c34cec5e2da7325a8b0e173db10590da50550a50e853f74f0753ce231b36646619fc3c1f166bdd986e7186
data/README.md
CHANGED
@@ -2,13 +2,14 @@
 
 Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.
 
-It's sort of like a reverse-markdown.
+It's sort of like a reverse-markdown or a very, very simple screen reader.
 
 ## But Why, Though?
 
-* Spam filters tend to like emails with a plain text alternative
 * Some email clients won't or can’t handle HTML at all
 * Some people explicitly choose plaintext just by preference or accessibility
+* Spam filters tend to like emails with a plain text alternative (but if you use this gem to help you spam people, I
+will yell at you)
 
 ## Installation
 
@@ -28,14 +29,16 @@ Or install it manually with:
 
 ## Usage
 
-Create a `Ghostwriter::Writer` with the html you want modified
+Create a `Ghostwriter::Writer` and call `#textify` with the html you want modified:
 
 ```ruby
 html = '<html><body><p>This is some markup <a href="tenjin.ca">and a link</a></p><p>Other tags translate, too</p></body></html>'
 
-Ghostwriter::Writer.new(html)
+Ghostwriter::Writer.new.textify(html)
 ```
+
 Produces:
+
 ```
 This is some markup and a link (tenjin.ca)
 
@@ -46,93 +49,153 @@ Other tags translate, too
 
 Links are converted to the link text followed by the link target in brackets:
 
-```
-
-
+```html
+
+<html>
+<body>
+Visit our <a href="https://example.com">Website</a>
+<body>
+</html>
 ```
 
-
+Becomes:
+
 ```
 Visit our Website (https://example.com)
 ```
 
 #### Relative Links
+
 Since emails are wholly distinct from your web address, relative links might break.
 
 To avoid this problem, either use the `<base>` header tag:
 
-```
-
-
-
-
-
-
-
-
-
-HTML
-
-Ghostwriter::Writer.new(html).textify
+```html
+
+<html>
+<head>
+<base href="https://www.example.com">
+</head>
+<body>
+Use the base tag to <a href="/contact">expand</a> links.
+</body>
+</html>
 ```
-
+
+Becomes:
+
 ```
-
+Use the base tag to expand (https://www.example.com/contact) links
 ```
 
-Or you can use the `link_base`
+Or you can use the `link_base` configuration:
+
 ```ruby
-
+Ghostwriter::Writer.new(link_base: 'tenjin.ca').textify(html)
+```
 
-
+### Images
+
+Images with alt text are converted:
+
+```html
+<img src="logo.jpg" alt="ACME Anvils" />
 ```
 
-
+Becomes:
+
 ```
-
+ACME Anvils (logo.jpg)
 ```
 
-
-Tables are often used email structuring because support for more modern CSS is inconsistent.
+But images lacking alt text or with a presentation ARIA role are ignored:
 
-
+```html
+<!-- these will just become an empty string -->
+<img src="decoration.jpg">
+<img src="logo.jpg" role="presentation">
+```
 
-
-
-
-
-<base href="https://www.example.com/">
-</head>
-<body>
-<table>
-<thead>
-<tr>
-<th>Ship</th>
-<th>Captain</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td>Enterprise</td>
-<td>Jean-Luc Picard</td>
-</tr>
-<tr>
-<td>TARDIS</td>
-<td>The Doctor</td>
-</tr>
-<tr>
-<td>Planet Express Ship</td>
-<td>Turanga Leela</td>
-</tr>
-</tbody>
-</table>
-</body>
-</html>
-HTML
-
-Ghostwriter::Writer.new(html).textify
+And images with data URIs won't include the data portion.
+
+```html
+<img src="" alt="Data picture"/>
 ```
-
+
+Becomes:
+
+```
+Data picture (embedded)
+```
+
+
+### Lists
+
+Lists are converted, too. They are padded with newlines and are given simple markers:
+
+```html
+
+<ul>
+<li>Planes</li>
+<li>Trains</li>
+<li>Automobiles</li>
+</ul>
+<ol>
+<li>I get knocked down</li>
+<li>I get up again</li>
+<li>Never gonna keep me down</li>
+</ol>
+```
+
+Becomes:
+
+```
+
+- Planes
+- Trains
+- Automobiles
+
+1. I get knocked down
+2. I get up again
+3. Never gonna keep me down
+
+```
+
+### Tables
+
+Tables are still often used in email structuring because support for more modern HTML and CSS is inconsistent. If your
+table is purely presentational, mark it with `role="presentation"`. See below for details.
+
+For real data tables, Ghostwriter tries to maintain table structure for simple tables:
+
+```html
+
+<table>
+<thead>
+<tr>
+<th>Ship</th>
+<th>Captain</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>Enterprise</td>
+<td>Jean-Luc Picard</td>
+</tr>
+<tr>
+<td>TARDIS</td>
+<td>The Doctor</td>
+</tr>
+<tr>
+<td>Planet Express Ship</td>
+<td>Turanga Leela</td>
+</tr>
+</tbody>
+</table>
+```
+
+Becomes:
+
 ```
 | Ship | Captain |
 |---------------------|-----------------|
@@ -141,6 +204,30 @@ Produces:
 | Planet Express Ship | Turanga Leela |
 ```
 
+### Presentation ARIA Role
+
+Lists and tables with `role="presentation"` will be treated as a simple container and the normal behaviour will be
+suppressed.
+
+```html
+
+<table role="presentation">
+<tr>
+<td>The table is a lie</td>
+</tr>
+</table>
+<ul role="presentation">
+<li>No such list</li>
+</ul>
+```
+
+Becomes:
+
+```
+The table is a lie
+No such list
+```
+
 ### Mail Gem Example
 
 To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by pasisng the html
@@ -149,7 +236,8 @@ through Ghostwriter:
 ```ruby
 require 'mail'
 
-html
+html = 'My email and a <a href="https://tenjin.ca">link</a>'
+ghostwriter = Ghostwriter::Writer.new
 
 Mail.deliver do
 to 'bob@example.com'
@@ -162,7 +250,7 @@ Mail.deliver do
 end
 
 text_part do
-body
+body ghostwriter.textify(html)
 end
 end
 
@@ -181,19 +269,19 @@ After checking out the repo, run `bundle install` to install dependencies. Then,
 can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
 #### Local Install
-To install this gem onto your local machine only, run
 
-
+To install this gem onto your local machine only, run
+
+`bundle exec rake install`
 
 #### Gem Release
+
 To release a gem to the world at large
 
-
-
-
-
-and push the `.gem` file to [rubygems.org](https://rubygems.org).
-3. Do a wee dance
+1. Update the version number in `version.rb`,
+2. Run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push
+the `.gem` file to [rubygems.org](https://rubygems.org).
+3. Do a wee dance
 
 ## License
 
data/RELEASE_NOTES.md
CHANGED
@@ -1,5 +1,22 @@
 # Release Notes
 
+## 1.0.0 (2021-03-21)
+
+### Major
+
+* Moved `link_base` parameter to constructor
+* Moved input HTML parameter to `#textify`
+
+### Minor
+
+* Treats tables and lists with role="presentation" as simple containers
+* Now handles ordered and unordered lists
+* Images are now replaced with their alt text
+
+### Bugfixes
+
+* none
+
 ## 0.4.2 (2021-03-17)
 
 ### Major
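The image behaviour listed under 1.0.0 Minor can be pictured without Nokogiri. The sketch below is not the gem's code: `image_text` is a hypothetical helper that applies the rule described in the README portion of this diff (alt text followed by the source in brackets, nothing for decorative images, `embedded` in place of a data URI) to plain strings.

```ruby
# Hypothetical helper, for illustration only: returns the text an <img>
# would contribute, given its attributes as plain strings.
def image_text(src:, alt:, role: nil)
  # Presentational images and images without alt text contribute nothing.
  return '' if role == 'presentation'
  return '' if alt.nil? || alt.empty?

  # Data URIs are summarised rather than printed in full.
  # (to_s guards against a missing src; that guard is an addition of this sketch.)
  shown_src = src.to_s.start_with?('data:') ? 'embedded' : src
  "#{alt} (#{shown_src})"
end

puts image_text(src: 'logo.jpg', alt: 'ACME Anvils')
puts image_text(src: 'data:image/png;base64,AAAA', alt: 'Data picture')
```

Returning an empty string for decorative images keeps the caller simple: the result can always be concatenated into the output text.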
data/lib/ghostwriter/version.rb
CHANGED
data/lib/ghostwriter/writer.rb
CHANGED
@@ -3,23 +3,31 @@
 module Ghostwriter
 # Main Ghostwriter converter object.
 class Writer
-
-
+# Creates a new ghostwriter
+#
+# @param [String] link_base the url to prefix relative links with
+def initialize(link_base: '')
+@link_base = link_base
+@list_marker = '-'
 end
 
 # Strips HTML down to plain text.
 #
-# @param
-
-
+# @param html [String] the HTML to be convert to text
+#
+# @return converted text
+def textify(html)
+doc = Nokogiri::HTML(normalize_whitespace(html).gsub('</p>', "</p>\n\n"))
+
+doc.search('style, script').remove
 
-doc
+replace_anchors(doc)
+replace_images(doc)
 
-doc
-doc.search('script').remove
+simple_replace(doc, '*[role="presentation"]', "\n")
 
-replace_anchors(doc, link_base)
 replace_headers(doc)
+replace_lists(doc)
 replace_tables(doc)
 
 simple_replace(doc, 'hr', "\n----------\n")
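The first added line of `textify` does its string preparation before Nokogiri sees the markup. A minimal sketch of just that step follows; `normalize_whitespace` is copied from the context lines of the next hunk, while `prepare_markup` is a hypothetical name used here for illustration.

```ruby
# Collapse every whitespace character to a space, then squeeze runs of spaces.
def normalize_whitespace(html)
  html.gsub(/\s/, ' ').squeeze(' ')
end

# Hypothetical wrapper: normalise, then force a blank line after each
# paragraph so paragraphs stay separated once the tags are stripped.
def prepare_markup(html)
  normalize_whitespace(html).gsub('</p>', "</p>\n\n")
end

puts prepare_markup("<p>One\n   two</p><p>Three</p>").inspect
```

Doing the normalisation first means the only newlines left in the string are the ones inserted deliberately after `</p>`.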
@@ -39,16 +47,9 @@ module Ghostwriter
 html.gsub(/\s/, ' ').squeeze(' ')
 end
 
-def replace_anchors(doc
-base = get_link_base(doc, default: link_base)
-
+def replace_anchors(doc)
 doc.search('a').each do |link_node|
-
-href = URI(link_node['href'])
-href = base + href.to_s unless href.absolute?
-rescue URI::InvalidURIError
-href = link_node['href'].gsub(/^(tel|mailto):/, '').strip
-end
+href = get_link_target(link_node, get_link_base(doc))
 
 link_node.inner_html = if link_matches(href, link_node.inner_html)
 href.to_s
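The anchor handling in this hunk can be tried on plain strings. The sketch below is an approximation rather than the gem's code: `link_target` follows the `get_link_target` method added later in this file's diff but takes an href string instead of a Nokogiri node and always returns a string, `link_matches` is copied from the context lines, and `anchor_text` is a hypothetical name. The `text (target)` form of the non-matching branch is assumed from the README examples, since that branch is not visible in the diff.

```ruby
require 'uri'

# Resolve an href against a base; absolute URIs pass through unchanged.
def link_target(href, base)
  uri = URI(href)
  uri.absolute? ? uri.to_s : base + uri.to_s
rescue URI::InvalidURIError
  # Unparseable values such as "tel: 555 1234" fall back to the raw value
  # with a leading tel:/mailto: prefix removed.
  href.gsub(/^(tel|mailto):/, '').strip
end

# True when target and link text are the same apart from scheme and trailing slash.
def link_matches(first, second)
  first.to_s.gsub(%r{^https?://}, '').chomp('/') == second.gsub(%r{^https?://}, '').chomp('/')
end

# Hypothetical helper: the text that would replace an <a> element.
def anchor_text(href, text, base: '')
  target = link_target(href, base)
  link_matches(target, text) ? target : "#{text} (#{target})"
end

puts anchor_text('/contact', 'expand', base: 'https://www.example.com')
puts anchor_text('https://tenjin.ca/', 'tenjin.ca')
```

The match check avoids output such as `tenjin.ca (https://tenjin.ca/)`, where the bracketed target would only repeat the visible text.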
@@ -62,16 +63,56 @@ module Ghostwriter
 first.to_s.gsub(%r{^https?://}, '').chomp('/') == second.gsub(%r{^https?://}, '').chomp('/')
 end
 
-def get_link_base(doc
+def get_link_base(doc)
 # <base> node is unique by W3C spec
 base_node = doc.search('base').first
 
-base_node ? base_node['href'] :
+base_node ? base_node['href'] : @link_base
+end
+
+def get_link_target(link_node, base)
+href = URI(link_node['href'])
+if href.absolute?
+href
+else
+base + href.to_s
+end
+rescue URI::InvalidURIError
+link_node['href'].gsub(/^(tel|mailto):/, '').strip
 end
 
 def replace_headers(doc)
 doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
-node.inner_html = "
+node.inner_html = "-- #{ node.inner_html } --\n".squeeze(' ')
+end
+end
+
+def replace_images(doc)
+doc.search('img[role=presentation]').remove
+
+doc.search('img').each do |img_node|
+src = img_node['src']
+alt = img_node['alt']
+
+src = 'embedded' if src.start_with? 'data:'
+
+img_node.replace("#{ alt } (#{ src })") unless alt.nil? || alt.empty?
+end
+end
+
+def replace_lists(doc)
+doc.search('ul, ol').each do |list_node|
+list_node.search('./li').each_with_index do |list_item, i|
+marker = if list_node.node_name == 'ol'
+"#{ i + 1 }."
+else
+@list_marker
+end
+
+list_item.inner_html = "#{ marker } #{ list_item.inner_html }\n".squeeze(' ')
+end
+
+list_node.replace("\n#{ list_node.inner_html }\n")
 end
 end
 
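The marker logic in `replace_lists` above can be exercised without a parsed document. The following is a sketch under that assumption: `render_list` is a hypothetical helper that takes the item texts as an array, where the real method rewrites Nokogiri nodes in place.

```ruby
# Hypothetical helper mirroring the marker choice in replace_lists:
# ordered lists get "1.", "2.", ... and unordered lists get a fixed marker.
def render_list(items, ordered: false, marker: '-')
  lines = items.each_with_index.map do |item, i|
    prefix = ordered ? "#{i + 1}." : marker
    # squeeze(' ') collapses repeated spaces, as the original line does;
    # this also affects doubled spaces inside the item text.
    "#{prefix} #{item}\n".squeeze(' ')
  end

  # The whole list is padded with a newline on each side.
  "\n#{lines.join}\n"
end

print render_list(%w[Planes Trains Automobiles])
print render_list(['I get knocked down', 'I get up again'], ordered: true)
```

Numbering comes from the item's position, so any `start` or `value` attributes on the HTML list would play no part in this approach.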
@@ -120,7 +161,7 @@ module Ghostwriter
 
 def simple_replace(doc, tag, replacement)
 doc.search(tag).each do |node|
-node.replace(replacement)
+node.replace(node.inner_html + replacement)
 end
 end
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: ghostwriter
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 1.0.0
 platform: ruby
 authors:
 - Robin Miller
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2021-03-
+date: 2021-03-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri