ghostwriter 0.4.2 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: d48bada0259aa38eb1cf98bfbf101970dce0f979ffb7293a8c23b5d5ca393d7d
4
- data.tar.gz: d1f4ac853988b75497b4c1ff7c4a16ba8cddd47be93e0112001fd5960a61a565
3
+ metadata.gz: 80d5aced9b18684b3640c28ce1e86c8e9859942f57fefbd26dd1a1f3e7791eaf
4
+ data.tar.gz: 0f71619c0a7e247cf163f074ca7c8bb54fa8c93daaa5649df8f853fd0ccab1da
5
5
  SHA512:
6
- metadata.gz: 5d85b5f1e5ef90c91fd03288012805c195c0a9c6ea8a3d7efb10b03b55443e060252f3f1e82f18f2854234174aa9daa80cc35a7b15f774798af5593a84b55881
7
- data.tar.gz: afaf20dbb5876e667fc33d3ff35eff58d1cb076e5bec941863a4b54689ac99bbdf2716cd52145bc7e5cd99f64c0d5954023cf160e01240c24c6a16036fca2eee
6
+ metadata.gz: f9760753d4ffc30bee200a33347cd9aeda0b4593304f07ff9ce53c4ca1f971d51b50644feb763ffc51ec5635202e971704592f45d0cbe021693a7e790e39c7e9
7
+ data.tar.gz: 203df0d639d25f35a73dcdde0f12c037e8f508e086c34cec5e2da7325a8b0e173db10590da50550a50e853f74f0753ce231b36646619fc3c1f166bdd986e7186
data/README.md CHANGED
@@ -2,13 +2,14 @@
2
2
 
3
3
  Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.
4
4
 
5
- It's sort of like a reverse-markdown.
5
+ It's sort of like a reverse-markdown or a very, very simple screen reader.
6
6
 
7
7
  ## But Why, Though?
8
8
 
9
- * Spam filters tend to like emails with a plain text alternative
10
9
  * Some email clients won't or can’t handle HTML at all
11
10
  * Some people explicitly choose plaintext just by preference or accessibility
11
+ * Spam filters tend to like emails with a plain text alternative (but if you use this gem to help you spam people, I
12
+ will yell at you)
12
13
 
13
14
  ## Installation
14
15
 
@@ -28,14 +29,16 @@ Or install it manually with:
28
29
 
29
30
  ## Usage
30
31
 
31
- Create a `Ghostwriter::Writer` with the html you want modified, and call `#textify`:
32
+ Create a `Ghostwriter::Writer` and call `#textify` with the html you want modified:
32
33
 
33
34
  ```ruby
34
35
  html = '<html><body><p>This is some markup <a href="tenjin.ca">and a link</a></p><p>Other tags translate, too</p></body></html>'
35
36
 
36
- Ghostwriter::Writer.new(html).textify
37
+ Ghostwriter::Writer.new.textify(html)
37
38
  ```
39
+
38
40
  Produces:
41
+
39
42
  ```
40
43
  This is some markup and a link (tenjin.ca)
41
44
 
@@ -46,93 +49,153 @@ Other tags translate, too
46
49
 
47
50
  Links are converted to the link text followed by the link target in brackets:
48
51
 
49
- ```ruby
50
- html = '<html><body>Visit our <a href="https://example.com">Website</a><body></html>'
51
- Ghostwriter::Writer.new(html).textify
52
+ ```html
53
+
54
+ <html>
55
+ <body>
56
+ Visit our <a href="https://example.com">Website</a>
57
+ <body>
58
+ </html>
52
59
  ```
53
60
 
54
- Produces:
61
+ Becomes:
62
+
55
63
  ```
56
64
  Visit our Website (https://example.com)
57
65
  ```
58
66
 
59
67
  #### Relative Links
68
+
60
69
  Since emails are wholly distinct from your web address, relative links might break.
61
70
 
62
71
  To avoid this problem, either use the `<base>` header tag:
63
72
 
64
- ```ruby
65
- html = <<~HTML
66
- <html>
67
- <head>
68
- <base href="https://www.example.com/">
69
- </head>
70
- <body>
71
- Relative links get <a href="/contact">expanded</a> using the head's base tag.
72
- </body>
73
- </html>
74
- HTML
75
-
76
- Ghostwriter::Writer.new(html).textify
73
+ ```html
74
+
75
+ <html>
76
+ <head>
77
+ <base href="https://www.example.com">
78
+ </head>
79
+ <body>
80
+ Use the base tag to <a href="/contact">expand</a> links.
81
+ </body>
82
+ </html>
77
83
  ```
78
- Produces:
84
+
85
+ Becomes:
86
+
79
87
  ```
80
- Relative links get expanded (https://www.example.com//contact) using the head's base tag.
88
+ Use the base tag to expand (https://www.example.com/contact) links
81
89
  ```
82
90
 
83
- Or you can use the `link_base` parameter:
91
+ Or you can use the `link_base` configuration:
92
+
84
93
  ```ruby
85
- html = '<html><body>Relative links get <a href="/contact">expanded</a></body></html> using the link_base parmeter, too.'
94
+ Ghostwriter::Writer.new(link_base: 'tenjin.ca').textify(html)
95
+ ```
86
96
 
87
- Ghostwriter::Writer.new(html).textify(link_base: 'tenjin.ca')
97
+ ### Images
98
+
99
+ Images with alt text are converted:
100
+
101
+ ```html
102
+ <img src="logo.jpg" alt="ACME Anvils" />
88
103
  ```
89
104
 
90
- Produces:
105
+ Becomes:
106
+
91
107
  ```
92
- "Relative links get expanded (tenjin.ca/contact) using the link_base parmeter, too."
108
+ ACME Anvils (logo.jpg)
93
109
  ```
94
110
 
95
- ### Tables
96
- Tables are often used email structuring because support for more modern CSS is inconsistent.
111
+ But images lacking alt text or with a presentation ARIA role are ignored:
97
112
 
98
- Ghostwriter tries to maintain table structure, but this will quickly devolve for complex structures.
113
+ ```html
114
+ <!-- these will just become an empty string -->
115
+ <img src="decoration.jpg">
116
+ <img src="logo.jpg" role="presentation">
117
+ ```
99
118
 
100
- ```ruby
101
- html = <<~HTML
102
- <html>
103
- <head>
104
- <base href="https://www.example.com/">
105
- </head>
106
- <body>
107
- <table>
108
- <thead>
109
- <tr>
110
- <th>Ship</th>
111
- <th>Captain</th>
112
- </tr>
113
- </thead>
114
- <tbody>
115
- <tr>
116
- <td>Enterprise</td>
117
- <td>Jean-Luc Picard</td>
118
- </tr>
119
- <tr>
120
- <td>TARDIS</td>
121
- <td>The Doctor</td>
122
- </tr>
123
- <tr>
124
- <td>Planet Express Ship</td>
125
- <td>Turanga Leela</td>
126
- </tr>
127
- </tbody>
128
- </table>
129
- </body>
130
- </html>
131
- HTML
132
-
133
- Ghostwriter::Writer.new(html).textify
119
+ And images with data URIs won't include the data portion.
120
+
121
+ ```html
122
+ <img src="" alt="Data picture"/>
134
123
  ```
135
- Produces:
124
+
125
+ Becomes:
126
+
127
+ ```
128
+ Data picture (embedded)
129
+ ```
130
+
131
+
132
+ ### Lists
133
+
134
+ Lists are converted, too. They are padded with newlines and are given simple markers:
135
+
136
+ ```html
137
+
138
+ <ul>
139
+ <li>Planes</li>
140
+ <li>Trains</li>
141
+ <li>Automobiles</li>
142
+ </ul>
143
+ <ol>
144
+ <li>I get knocked down</li>
145
+ <li>I get up again</li>
146
+ <li>Never gonna keep me down</li>
147
+ </ol>
148
+ ```
149
+
150
+ Becomes:
151
+
152
+ ```
153
+
154
+ - Planes
155
+ - Trains
156
+ - Automobiles
157
+
158
+ 1. I get knocked down
159
+ 2. I get up again
160
+ 3. Never gonna keep me down
161
+
162
+ ```
163
+
164
+ ### Tables
165
+
166
+ Tables are still often used in email structuring because support for more modern HTML and CSS is inconsistent. If your
167
+ table is purely presentational, mark it with `role="presentation"`. See below for details.
168
+
169
+ For real data tables, Ghostwriter tries to maintain table structure for simple tables:
170
+
171
+ ```html
172
+
173
+ <table>
174
+ <thead>
175
+ <tr>
176
+ <th>Ship</th>
177
+ <th>Captain</th>
178
+ </tr>
179
+ </thead>
180
+ <tbody>
181
+ <tr>
182
+ <td>Enterprise</td>
183
+ <td>Jean-Luc Picard</td>
184
+ </tr>
185
+ <tr>
186
+ <td>TARDIS</td>
187
+ <td>The Doctor</td>
188
+ </tr>
189
+ <tr>
190
+ <td>Planet Express Ship</td>
191
+ <td>Turanga Leela</td>
192
+ </tr>
193
+ </tbody>
194
+ </table>
195
+ ```
196
+
197
+ Becomes:
198
+
136
199
  ```
137
200
  | Ship | Captain |
138
201
  |---------------------|-----------------|
@@ -141,6 +204,30 @@ Produces:
141
204
  | Planet Express Ship | Turanga Leela |
142
205
  ```
143
206
 
207
+ ### Presentation ARIA Role
208
+
209
+ Lists and tables with `role="presentation"` will be treated as a simple container and the normal behaviour will be
210
+ suppressed.
211
+
212
+ ```html
213
+
214
+ <table role="presentation">
215
+ <tr>
216
+ <td>The table is a lie</td>
217
+ </tr>
218
+ </table>
219
+ <ul role="presentation">
220
+ <li>No such list</li>
221
+ </ul>
222
+ ```
223
+
224
+ Becomes:
225
+
226
+ ```
227
+ The table is a lie
228
+ No such list
229
+ ```
230
+
144
231
  ### Mail Gem Example
145
232
 
146
233
  To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by passing the html
@@ -149,7 +236,8 @@ through Ghostwriter:
149
236
  ```ruby
150
237
  require 'mail'
151
238
 
152
- html = 'My email and a <a href="https://tenjin.ca">link</a>'
239
+ html = 'My email and a <a href="https://tenjin.ca">link</a>'
240
+ ghostwriter = Ghostwriter::Writer.new
153
241
 
154
242
  Mail.deliver do
155
243
  to 'bob@example.com'
@@ -162,7 +250,7 @@ Mail.deliver do
162
250
  end
163
251
 
164
252
  text_part do
165
- body Ghostwriter::Writer.new(html).textify
253
+ body ghostwriter.textify(html)
166
254
  end
167
255
  end
168
256
 
@@ -181,19 +269,19 @@ After checking out the repo, run `bundle install` to install dependencies. Then,
181
269
  can also run `bin/console` for an interactive prompt that will allow you to experiment.
182
270
 
183
271
  #### Local Install
184
- To install this gem onto your local machine only, run
185
272
 
186
- `bundle exec rake install`
273
+ To install this gem onto your local machine only, run
274
+
275
+ `bundle exec rake install`
187
276
 
188
277
  #### Gem Release
278
+
189
279
  To release a gem to the world at large
190
280
 
191
- 1. Update the version number in `version.rb`,
192
- 2. Run `bundle exec rake release`,
193
- which will create a git tag for the version,
194
- push git commits and tags,
195
- and push the `.gem` file to [rubygems.org](https://rubygems.org).
196
- 3. Do a wee dance
281
+ 1. Update the version number in `version.rb`,
282
+ 2. Run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push
283
+ the `.gem` file to [rubygems.org](https://rubygems.org).
284
+ 3. Do a wee dance
197
285
 
198
286
  ## License
199
287
 
data/RELEASE_NOTES.md CHANGED
@@ -1,5 +1,22 @@
1
1
  # Release Notes
2
2
 
3
+ ## 1.0.0 (2021-03-21)
4
+
5
+ ### Major
6
+
7
+ * Moved `link_base` parameter to constructor
8
+ * Moved input HTML parameter to `#textify`
9
+
10
+ ### Minor
11
+
12
+ * Treats tables and lists with role="presentation" as simple containers
13
+ * Now handles ordered and unordered lists
14
+ * Images are now replaced with their alt text
15
+
16
+ ### Bugfixes
17
+
18
+ * none
19
+
3
20
  ## 0.4.2 (2021-03-17)
4
21
 
5
22
  ### Major
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Ghostwriter
4
- VERSION = '0.4.2'
4
+ VERSION = '1.0.0'
5
5
  end
@@ -3,23 +3,31 @@
3
3
  module Ghostwriter
4
4
  # Main Ghostwriter converter object.
5
5
  class Writer
6
- def initialize(html)
7
- @source_html = html
6
+ # Creates a new ghostwriter
7
+ #
8
+ # @param [String] link_base the url to prefix relative links with
9
+ def initialize(link_base: '')
10
+ @link_base = link_base
11
+ @list_marker = '-'
8
12
  end
9
13
 
10
14
  # Strips HTML down to plain text.
11
15
  #
12
- # @param link_base the url to prefix relative links with
13
- def textify(link_base: '')
14
- html = normalize_whitespace(@source_html).gsub('</p>', "</p>\n\n")
16
+ # @param html [String] the HTML to be convert to text
17
+ #
18
+ # @return converted text
19
+ def textify(html)
20
+ doc = Nokogiri::HTML(normalize_whitespace(html).gsub('</p>', "</p>\n\n"))
21
+
22
+ doc.search('style, script').remove
15
23
 
16
- doc = Nokogiri::HTML(html)
24
+ replace_anchors(doc)
25
+ replace_images(doc)
17
26
 
18
- doc.search('style').remove
19
- doc.search('script').remove
27
+ simple_replace(doc, '*[role="presentation"]', "\n")
20
28
 
21
- replace_anchors(doc, link_base)
22
29
  replace_headers(doc)
30
+ replace_lists(doc)
23
31
  replace_tables(doc)
24
32
 
25
33
  simple_replace(doc, 'hr', "\n----------\n")
@@ -39,16 +47,9 @@ module Ghostwriter
39
47
  html.gsub(/\s/, ' ').squeeze(' ')
40
48
  end
41
49
 
42
- def replace_anchors(doc, link_base)
43
- base = get_link_base(doc, default: link_base)
44
-
50
+ def replace_anchors(doc)
45
51
  doc.search('a').each do |link_node|
46
- begin
47
- href = URI(link_node['href'])
48
- href = base + href.to_s unless href.absolute?
49
- rescue URI::InvalidURIError
50
- href = link_node['href'].gsub(/^(tel|mailto):/, '').strip
51
- end
52
+ href = get_link_target(link_node, get_link_base(doc))
52
53
 
53
54
  link_node.inner_html = if link_matches(href, link_node.inner_html)
54
55
  href.to_s
@@ -62,16 +63,56 @@ module Ghostwriter
62
63
  first.to_s.gsub(%r{^https?://}, '').chomp('/') == second.gsub(%r{^https?://}, '').chomp('/')
63
64
  end
64
65
 
65
- def get_link_base(doc, default:)
66
+ def get_link_base(doc)
66
67
  # <base> node is unique by W3C spec
67
68
  base_node = doc.search('base').first
68
69
 
69
- base_node ? base_node['href'] : default
70
+ base_node ? base_node['href'] : @link_base
71
+ end
72
+
73
+ def get_link_target(link_node, base)
74
+ href = URI(link_node['href'])
75
+ if href.absolute?
76
+ href
77
+ else
78
+ base + href.to_s
79
+ end
80
+ rescue URI::InvalidURIError
81
+ link_node['href'].gsub(/^(tel|mailto):/, '').strip
70
82
  end
71
83
 
72
84
  def replace_headers(doc)
73
85
  doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
74
- node.inner_html = "- #{ node.inner_html } -\n".squeeze(' ')
86
+ node.inner_html = "-- #{ node.inner_html } --\n".squeeze(' ')
87
+ end
88
+ end
89
+
90
+ def replace_images(doc)
91
+ doc.search('img[role=presentation]').remove
92
+
93
+ doc.search('img').each do |img_node|
94
+ src = img_node['src']
95
+ alt = img_node['alt']
96
+
97
+ src = 'embedded' if src.start_with? 'data:'
98
+
99
+ img_node.replace("#{ alt } (#{ src })") unless alt.nil? || alt.empty?
100
+ end
101
+ end
102
+
103
+ def replace_lists(doc)
104
+ doc.search('ul, ol').each do |list_node|
105
+ list_node.search('./li').each_with_index do |list_item, i|
106
+ marker = if list_node.node_name == 'ol'
107
+ "#{ i + 1 }."
108
+ else
109
+ @list_marker
110
+ end
111
+
112
+ list_item.inner_html = "#{ marker } #{ list_item.inner_html }\n".squeeze(' ')
113
+ end
114
+
115
+ list_node.replace("\n#{ list_node.inner_html }\n")
75
116
  end
76
117
  end
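
The marker logic added in `replace_lists` can be tried without Nokogiri. The following is a minimal sketch, not the gem's code: `format_list` is a hypothetical helper that takes plain strings instead of `<li>` nodes and joins the items itself rather than relying on the surrounding HTML.

```ruby
# Hypothetical stand-in for the marker selection in replace_lists.
# Unordered lists get a fixed marker; ordered lists get "1.", "2.", ...
def format_list(items, ordered: false, marker: '-')
  lines = items.each_with_index.map do |item, i|
    prefix = ordered ? "#{i + 1}." : marker
    # squeeze(' ') collapses runs of spaces, as the diff does per item
    "#{prefix} #{item}".squeeze(' ')
  end

  # The list is padded with a newline on each side
  "\n#{lines.join("\n")}\n"
end

puts format_list(%w[Planes Trains Automobiles])
puts format_list(['I get knocked down', 'I get up again'], ordered: true)
```

The ordered and unordered cases differ only in the prefix, which is why the diff stores a single `@list_marker` and derives the numeric marker from the item index.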
77
118
 
@@ -120,7 +161,7 @@ module Ghostwriter
120
161
 
121
162
  def simple_replace(doc, tag, replacement)
122
163
  doc.search(tag).each do |node|
123
- node.replace(replacement)
164
+ node.replace(node.inner_html + replacement)
124
165
  end
125
166
  end
126
167
  end
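
For reviewers who want to check the relative-link behaviour in isolation, here is a rough standalone sketch of the logic in `get_link_target`. `link_target` is a hypothetical name, and it takes a raw href string rather than a Nokogiri node so that it runs with only the standard library.

```ruby
require 'uri'

# Hypothetical mirror of get_link_target from the diff above.
# Absolute hrefs are kept; relative ones are prefixed with the base.
def link_target(href, base)
  uri = URI(href)
  uri.absolute? ? uri.to_s : base + uri.to_s
rescue URI::InvalidURIError
  # Hrefs that do not parse fall back to the raw text,
  # minus a leading tel: or mailto: scheme
  href.gsub(/^(tel|mailto):/, '').strip
end

puts link_target('https://example.com', 'tenjin.ca')
puts link_target('/contact', 'https://www.example.com')
puts link_target('/contact', 'https://www.example.com/')
```

Because the base and the href are joined by plain string concatenation, a base with a trailing slash yields a double slash. That matches the `https://www.example.com//contact` output shown in the 0.4.2 README, and it is presumably why the 1.0.0 README example drops the trailing slash from `<base href>`.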
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ghostwriter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.2
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Robin Miller
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2021-03-18 00:00:00.000000000 Z
11
+ date: 2021-03-22 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri