ghostwriter 0.4.2 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: d48bada0259aa38eb1cf98bfbf101970dce0f979ffb7293a8c23b5d5ca393d7d
-   data.tar.gz: d1f4ac853988b75497b4c1ff7c4a16ba8cddd47be93e0112001fd5960a61a565
+   metadata.gz: 80d5aced9b18684b3640c28ce1e86c8e9859942f57fefbd26dd1a1f3e7791eaf
+   data.tar.gz: 0f71619c0a7e247cf163f074ca7c8bb54fa8c93daaa5649df8f853fd0ccab1da
  SHA512:
-   metadata.gz: 5d85b5f1e5ef90c91fd03288012805c195c0a9c6ea8a3d7efb10b03b55443e060252f3f1e82f18f2854234174aa9daa80cc35a7b15f774798af5593a84b55881
-   data.tar.gz: afaf20dbb5876e667fc33d3ff35eff58d1cb076e5bec941863a4b54689ac99bbdf2716cd52145bc7e5cd99f64c0d5954023cf160e01240c24c6a16036fca2eee
+   metadata.gz: f9760753d4ffc30bee200a33347cd9aeda0b4593304f07ff9ce53c4ca1f971d51b50644feb763ffc51ec5635202e971704592f45d0cbe021693a7e790e39c7e9
+   data.tar.gz: 203df0d639d25f35a73dcdde0f12c037e8f508e086c34cec5e2da7325a8b0e173db10590da50550a50e853f74f0753ce231b36646619fc3c1f166bdd986e7186
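A gem's `checksums.yaml` records digests of the two archives inside the `.gem` file, `metadata.gz` and `data.tar.gz`. The sketch below shows one way to compare a locally extracted archive against a listed SHA256 value. The helper name `checksum_matches?` and the file path in the usage comment are illustrative assumptions, not part of RubyGems or of this package.

```ruby
require 'digest'

# Hypothetical helper: true when the file's SHA256 digest equals the
# expected hex string (comparison ignores case and surrounding whitespace).
def checksum_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage, assuming data.tar.gz was extracted from a downloaded 1.0.0 gem:
#   checksum_matches?(
#     'data.tar.gz',
#     '0f71619c0a7e247cf163f074ca7c8bb54fa8c93daaa5649df8f853fd0ccab1da'
#   )
```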
data/README.md CHANGED
@@ -2,13 +2,14 @@

  Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.

- It's sort of like a reverse-markdown.
+ It's sort of like a reverse-markdown or a very, very simple screen reader.

  ## But Why, Though?

- * Spam filters tend to like emails with a plain text alternative
  * Some email clients won't or can’t handle HTML at all
  * Some people explicitly choose plaintext just by preference or accessibility
+ * Spam filters tend to like emails with a plain text alternative (but if you use this gem to help you spam people, I
+   will yell at you)

  ## Installation

@@ -28,14 +29,16 @@ Or install it manually with:

  ## Usage

- Create a `Ghostwriter::Writer` with the html you want modified, and call `#textify`:
+ Create a `Ghostwriter::Writer` and call `#textify` with the html you want modified:

  ```ruby
  html = '<html><body><p>This is some markup <a href="tenjin.ca">and a link</a></p><p>Other tags translate, too</p></body></html>'

- Ghostwriter::Writer.new(html).textify
+ Ghostwriter::Writer.new.textify(html)
  ```
+
  Produces:
+
  ```
  This is some markup and a link (tenjin.ca)

@@ -46,93 +49,153 @@ Other tags translate, too

  Links are converted to the link text followed by the link target in brackets:

- ```ruby
- html = '<html><body>Visit our <a href="https://example.com">Website</a><body></html>'
- Ghostwriter::Writer.new(html).textify
+ ```html
+
+ <html>
+ <body>
+ Visit our <a href="https://example.com">Website</a>
+ </body>
+ </html>
  ```

- Produces:
+ Becomes:
+
  ```
  Visit our Website (https://example.com)
  ```

  #### Relative Links
+
  Since emails are wholly distinct from your web address, relative links might break.

  To avoid this problem, either use the `<base>` header tag:

- ```ruby
- html = <<~HTML
- <html>
- <head>
- <base href="https://www.example.com/">
- </head>
- <body>
- Relative links get <a href="/contact">expanded</a> using the head's base tag.
- </body>
- </html>
- HTML
-
- Ghostwriter::Writer.new(html).textify
+ ```html
+
+ <html>
+ <head>
+ <base href="https://www.example.com">
+ </head>
+ <body>
+ Use the base tag to <a href="/contact">expand</a> links.
+ </body>
+ </html>
  ```
- Produces:
+
+ Becomes:
+
  ```
- Relative links get expanded (https://www.example.com//contact) using the head's base tag.
+ Use the base tag to expand (https://www.example.com/contact) links
  ```

- Or you can use the `link_base` parameter:
+ Or you can use the `link_base` configuration:
+
  ```ruby
- html = '<html><body>Relative links get <a href="/contact">expanded</a></body></html> using the link_base parmeter, too.'
+ Ghostwriter::Writer.new(link_base: 'tenjin.ca').textify(html)
+ ```

- Ghostwriter::Writer.new(html).textify(link_base: 'tenjin.ca')
+ ### Images
+
+ Images with alt text are converted:
+
+ ```html
+ <img src="logo.jpg" alt="ACME Anvils" />
  ```

- Produces:
+ Becomes:
+
  ```
- "Relative links get expanded (tenjin.ca/contact) using the link_base parmeter, too."
+ ACME Anvils (logo.jpg)
  ```

- ### Tables
- Tables are often used email structuring because support for more modern CSS is inconsistent.
+ But images lacking alt text or with a presentation ARIA role are ignored:

- Ghostwriter tries to maintain table structure, but this will quickly devolve for complex structures.
+ ```html
+ <!-- these will just become an empty string -->
+ <img src="decoration.jpg">
+ <img src="logo.jpg" role="presentation">
+ ```

- ```ruby
- html = <<~HTML
- <html>
- <head>
- <base href="https://www.example.com/">
- </head>
- <body>
- <table>
- <thead>
- <tr>
- <th>Ship</th>
- <th>Captain</th>
- </tr>
- </thead>
- <tbody>
- <tr>
- <td>Enterprise</td>
- <td>Jean-Luc Picard</td>
- </tr>
- <tr>
- <td>TARDIS</td>
- <td>The Doctor</td>
- </tr>
- <tr>
- <td>Planet Express Ship</td>
- <td>Turanga Leela</td>
- </tr>
- </tbody>
- </table>
- </body>
- </html>
- HTML
-
- Ghostwriter::Writer.new(html).textify
+ And images with data URIs won't include the data portion.
+
+ ```html
+ <img src="" alt="Data picture"/>
  ```
- Produces:
+
+ Becomes:
+
+ ```
+ Data picture (embedded)
+ ```
+
+
+ ### Lists
+
+ Lists are converted, too. They are padded with newlines and are given simple markers:
+
+ ```html
+
+ <ul>
+ <li>Planes</li>
+ <li>Trains</li>
+ <li>Automobiles</li>
+ </ul>
+ <ol>
+ <li>I get knocked down</li>
+ <li>I get up again</li>
+ <li>Never gonna keep me down</li>
+ </ol>
+ ```
+
+ Becomes:
+
+ ```
+
+ - Planes
+ - Trains
+ - Automobiles
+
+ 1. I get knocked down
+ 2. I get up again
+ 3. Never gonna keep me down
+
+ ```
+
+ ### Tables
+
+ Tables are still often used in email structuring because support for more modern HTML and CSS is inconsistent. If your
+ table is purely presentational, mark it with `role="presentation"`. See below for details.
+
+ For real data tables, Ghostwriter tries to maintain table structure for simple tables:
+
+ ```html
+
+ <table>
+ <thead>
+ <tr>
+ <th>Ship</th>
+ <th>Captain</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Enterprise</td>
+ <td>Jean-Luc Picard</td>
+ </tr>
+ <tr>
+ <td>TARDIS</td>
+ <td>The Doctor</td>
+ </tr>
+ <tr>
+ <td>Planet Express Ship</td>
+ <td>Turanga Leela</td>
+ </tr>
+ </tbody>
+ </table>
+ ```
+
+ Becomes:
+
  ```
  | Ship                | Captain         |
  |---------------------|-----------------|
@@ -141,6 +204,30 @@ Produces:
  | Planet Express Ship | Turanga Leela   |
  ```

+ ### Presentation ARIA Role
+
+ Lists and tables with `role="presentation"` will be treated as a simple container and the normal behaviour will be
+ suppressed.
+
+ ```html
+
+ <table role="presentation">
+ <tr>
+ <td>The table is a lie</td>
+ </tr>
+ </table>
+ <ul role="presentation">
+ <li>No such list</li>
+ </ul>
+ ```
+
+ Becomes:
+
+ ```
+ The table is a lie
+ No such list
+ ```
+
  ### Mail Gem Example

  To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by passing the html
@@ -149,7 +236,8 @@ through Ghostwriter:
  ```ruby
  require 'mail'

- html = 'My email and a <a href="https://tenjin.ca">link</a>'
+ html        = 'My email and a <a href="https://tenjin.ca">link</a>'
+ ghostwriter = Ghostwriter::Writer.new

  Mail.deliver do
    to 'bob@example.com'
@@ -162,7 +250,7 @@ Mail.deliver do
    end

    text_part do
-     body Ghostwriter::Writer.new(html).textify
+     body ghostwriter.textify(html)
    end
  end

@@ -181,19 +269,19 @@ After checking out the repo, run `bundle install` to install dependencies. Then,
  can also run `bin/console` for an interactive prompt that will allow you to experiment.

  #### Local Install
- To install this gem onto your local machine only, run

- `bundle exec rake install`
+ To install this gem onto your local machine only, run
+
+ `bundle exec rake install`

  #### Gem Release
+
  To release a gem to the world at large

- 1. Update the version number in `version.rb`,
- 2. Run `bundle exec rake release`,
- which will create a git tag for the version,
- push git commits and tags,
- and push the `.gem` file to [rubygems.org](https://rubygems.org).
- 3. Do a wee dance
+ 1. Update the version number in `version.rb`,
+ 2. Run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push
+    the `.gem` file to [rubygems.org](https://rubygems.org).
+ 3. Do a wee dance

  ## License

data/RELEASE_NOTES.md CHANGED
@@ -1,5 +1,22 @@
  # Release Notes

+ ## 1.0.0 (2021-03-21)
+
+ ### Major
+
+ * Moved `link_base` parameter to constructor
+ * Moved input HTML parameter to `#textify`
+
+ ### Minor
+
+ * Treats tables and lists with role="presentation" as simple containers
+ * Now handles ordered and unordered lists
+ * Images are now replaced with their alt text
+
+ ### Bugfixes
+
+ * none
+
  ## 0.4.2 (2021-03-17)

  ### Major
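
The two "Major" entries for 1.0.0 swap where arguments go: `link_base` moves to the constructor and the HTML moves to `#textify`. For callers that cannot be updated all at once, a small adapter along the following lines might keep the 0.4.2 call shape working. This is only a sketch: `LegacyStyleWriter` and `EchoWriter` are hypothetical names that do not exist in the gem, and `EchoWriter` is a stand-in used so the example runs without Ghostwriter installed. In real use, `Ghostwriter::Writer` would be passed as `writer_class`.

```ruby
# Hypothetical adapter (not part of the gem): accepts arguments in the
# 0.4.2 positions and forwards them in the 1.0.0 positions.
class LegacyStyleWriter
  def initialize(html, writer_class:)
    @html         = html
    @writer_class = writer_class
  end

  # 0.4.2 style: link_base is given to #textify.
  def textify(link_base: '')
    # 1.0.0 style: link_base goes to the constructor, html to #textify.
    @writer_class.new(link_base: link_base).textify(@html)
  end
end

# Stand-in with the 1.0.0 call shape; it only echoes its inputs.
class EchoWriter
  def initialize(link_base: '')
    @link_base = link_base
  end

  def textify(html)
    "#{@link_base}|#{html}"
  end
end

puts LegacyStyleWriter.new('<p>Hi</p>', writer_class: EchoWriter).textify(link_base: 'tenjin.ca')
# prints "tenjin.ca|<p>Hi</p>"
```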
@@ -1,5 +1,5 @@
  # frozen_string_literal: true

  module Ghostwriter
-   VERSION = '0.4.2'
+   VERSION = '1.0.0'
  end
@@ -3,23 +3,31 @@
  module Ghostwriter
    # Main Ghostwriter converter object.
    class Writer
-     def initialize(html)
-       @source_html = html
+     # Creates a new ghostwriter
+     #
+     # @param [String] link_base the url to prefix relative links with
+     def initialize(link_base: '')
+       @link_base   = link_base
+       @list_marker = '-'
      end

      # Strips HTML down to plain text.
      #
-     # @param link_base the url to prefix relative links with
-     def textify(link_base: '')
-       html = normalize_whitespace(@source_html).gsub('</p>', "</p>\n\n")
+     # @param html [String] the HTML to be converted to text
+     #
+     # @return converted text
+     def textify(html)
+       doc = Nokogiri::HTML(normalize_whitespace(html).gsub('</p>', "</p>\n\n"))
+
+       doc.search('style, script').remove

-       doc = Nokogiri::HTML(html)
+       replace_anchors(doc)
+       replace_images(doc)

-       doc.search('style').remove
-       doc.search('script').remove
+       simple_replace(doc, '*[role="presentation"]', "\n")

-       replace_anchors(doc, link_base)
        replace_headers(doc)
+       replace_lists(doc)
        replace_tables(doc)

        simple_replace(doc, 'hr', "\n----------\n")
@@ -39,16 +47,9 @@ module Ghostwriter
        html.gsub(/\s/, ' ').squeeze(' ')
      end

-     def replace_anchors(doc, link_base)
-       base = get_link_base(doc, default: link_base)
-
+     def replace_anchors(doc)
        doc.search('a').each do |link_node|
-         begin
-           href = URI(link_node['href'])
-           href = base + href.to_s unless href.absolute?
-         rescue URI::InvalidURIError
-           href = link_node['href'].gsub(/^(tel|mailto):/, '').strip
-         end
+         href = get_link_target(link_node, get_link_base(doc))

          link_node.inner_html = if link_matches(href, link_node.inner_html)
                                   href.to_s
@@ -62,16 +63,56 @@ module Ghostwriter
        first.to_s.gsub(%r{^https?://}, '').chomp('/') == second.gsub(%r{^https?://}, '').chomp('/')
      end

-     def get_link_base(doc, default:)
+     def get_link_base(doc)
        # <base> node is unique by W3C spec
        base_node = doc.search('base').first

-       base_node ? base_node['href'] : default
+       base_node ? base_node['href'] : @link_base
+     end
+
+     def get_link_target(link_node, base)
+       href = URI(link_node['href'])
+       if href.absolute?
+         href
+       else
+         base + href.to_s
+       end
+     rescue URI::InvalidURIError
+       link_node['href'].gsub(/^(tel|mailto):/, '').strip
      end

      def replace_headers(doc)
        doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
-         node.inner_html = "- #{ node.inner_html } -\n".squeeze(' ')
+         node.inner_html = "-- #{ node.inner_html } --\n".squeeze(' ')
+       end
+     end
+
+     def replace_images(doc)
+       doc.search('img[role=presentation]').remove
+
+       doc.search('img').each do |img_node|
+         src = img_node['src']
+         alt = img_node['alt']
+
+         src = 'embedded' if src.start_with? 'data:'
+
+         img_node.replace("#{ alt } (#{ src })") unless alt.nil? || alt.empty?
+       end
+     end
+
+     def replace_lists(doc)
+       doc.search('ul, ol').each do |list_node|
+         list_node.search('./li').each_with_index do |list_item, i|
+           marker = if list_node.node_name == 'ol'
+                      "#{ i + 1 }."
+                    else
+                      @list_marker
+                    end
+
+           list_item.inner_html = "#{ marker } #{ list_item.inner_html }\n".squeeze(' ')
+         end
+
+         list_node.replace("\n#{ list_node.inner_html }\n")
        end
      end

@@ -120,7 +161,7 @@ module Ghostwriter

      def simple_replace(doc, tag, replacement)
        doc.search(tag).each do |node|
-         node.replace(replacement)
+         node.replace(node.inner_html + replacement)
        end
      end
    end
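
The `get_link_target` and `replace_images` helpers added in the hunks above encode two small rules: relative hrefs are prefixed with the base, and images are rendered as `alt (src)` with data URIs shortened to `embedded`. The following standalone sketch restates those rules using only Ruby's standard `uri` library, so it runs without Nokogiri. The method names `resolve_link_target` and `image_label` are hypothetical. Both always return plain strings (the diffed helper returns a `URI` object for absolute links), and `image_label` guards against a missing `src`, which the diffed code does not appear to do.

```ruby
require 'uri'

# Sketch of the link rule: absolute hrefs pass through, relative hrefs get
# the base prepended, and hrefs that URI cannot parse (for example a phone
# number containing spaces) have a leading tel:/mailto: prefix stripped.
def resolve_link_target(href, base)
  uri = URI(href)
  uri.absolute? ? uri.to_s : base + uri.to_s
rescue URI::InvalidURIError
  href.gsub(/^(tel|mailto):/, '').strip
end

# Sketch of the image rule: no alt text means no output; data URIs are
# replaced by the word "embedded" instead of the raw data.
def image_label(src, alt)
  return '' if alt.nil? || alt.empty?

  src = 'embedded' if src.to_s.start_with?('data:')
  "#{alt} (#{src})"
end

puts resolve_link_target('/contact', 'https://www.example.com')
# prints "https://www.example.com/contact"
puts resolve_link_target('https://example.com', 'tenjin.ca')
# prints "https://example.com"
puts image_label('logo.jpg', 'ACME Anvils')
# prints "ACME Anvils (logo.jpg)"
puts image_label('data:image/gif;base64,R0lG', 'Data picture')
# prints "Data picture (embedded)"
```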
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ghostwriter
  version: !ruby/object:Gem::Version
-   version: 0.4.2
+   version: 1.0.0
  platform: ruby
  authors:
  - Robin Miller
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2021-03-18 00:00:00.000000000 Z
+ date: 2021-03-22 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri