ghostwriter 0.3.0 → 1.0.1
- checksums.yaml +5 -13
- data/.rubocop.yml +1 -0
- data/.ruby-version +1 -1
- data/Gemfile +2 -0
- data/README.md +291 -39
- data/RELEASE_NOTES.md +91 -0
- data/Rakefile +3 -1
- data/bin/console +3 -1
- data/dirt-textify.gemspec +31 -19
- data/lib/ghostwriter.rb +2 -0
- data/lib/ghostwriter/version.rb +3 -1
- data/lib/ghostwriter/writer.rb +138 -28
- metadata +59 -27
checksums.yaml
CHANGED
@@ -1,15 +1,7 @@
 ---
-
-  metadata.gz:
-
-  data.tar.gz: !binary |-
-    NjBjZWU4OWExMmYyOGU1NDMzNTI4MDhkNzEzMGU3YWFhNjdiM2M0MA==
+SHA256:
+  metadata.gz: 2868e207e695355f8e9f40521b38d1f96b72b45c9ab73207396a87e5b4d535cd
+  data.tar.gz: cf111d734daa4bf94e4d9c2924dbd6c7d12b38b2b26a55db79110730f306fccc
 SHA512:
-  metadata.gz:
-
-    NDc3ZmY5NjgyNzkxOTUzMDg4YWU2NmNkZDgyMDdkNjc4OTMzMmIzYWY1MWY2
-    OWRhMzgxNjc5ODM4Yjc4NGE5ZTBmODBmNGUzOGU3NDY2YTA5NDk=
-  data.tar.gz: !binary |-
-    NTIzYTdjNmRmZTRkMTc4NGIxZWJiYjBhYWVkMDI4ZDM1NTc4NDQ4ZTdkZDZk
-    YTVlN2Y1OGMzYTg3MjlkYmNhMjUyMGQwNTZlMmIzNjYxYmQyMDIxMjJiOTVi
-    N2YzMzczODBkZWMxZmY2MmJkYzJkYjQxZjJlZjBjZTM2OWJkMjQ=
+  metadata.gz: 8d71989a44d8d2da33496172c600ec38063100c53642850982a9a93ccefbea37e40847e93edfde09d3b0f0dad98f457296f01c070387d86b11cc06e2ee9e04c1
+  data.tar.gz: 0767f0d24a895477aee922bd960380608185b741c0d87975bba65eaa03270539e2af2776bbcef2f689b604b76fde7882dbe9322e522b190858c2699359ce8a3b
|
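The removed `!binary` entries and the added plain entries store the same kind of value: a hex digest. The old file wrapped it in base64, the new one writes it directly. As a rough illustration using only core Ruby and the `digest` standard library, the sketch below decodes the one removed `data.tar.gz` value that survives intact in this diff. That it is a SHA-1 digest is an inference from its 40-character length, not something the diff states, and the `'example'` input is a placeholder.

```ruby
require 'digest'

# Value copied from the removed `data.tar.gz: !binary |-` entry in this diff.
old_entry = 'NjBjZWU4OWExMmYyOGU1NDMzNTI4MDhkNzEzMGU3YWFhNjdiM2M0MA=='

# The 'm' unpack directive decodes base64; the result is an ordinary hex string.
decoded = old_entry.unpack1('m')
puts decoded         # 60cee89a12f28e543352808d7130e7aaa67b3c40
puts decoded.length  # 40 hex characters, the length of a SHA-1 digest

# The new file stores hex digests as-is; their length depends on the algorithm.
sha256_length = Digest::SHA256.hexdigest('example').length
sha512_length = Digest::SHA512.hexdigest('example').length
puts sha256_length   # 64
puts sha512_length   # 128
```

The 64 and 128 character lengths match the `SHA256` and `SHA512` entries added in 1.0.1.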
data/.rubocop.yml
ADDED
@@ -0,0 +1 @@
+inherit_from: ../.rubocop.yml
data/.ruby-version
CHANGED
@@ -1 +1 @@
-ruby-
+ruby-2.7.1
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -1,6 +1,15 @@
 # Ghostwriter

-
+A ruby gem that converts HTML to plain text, preserving as much legibility and functionality as possible.
+
+It's sort of like a reverse-markdown or a very, very simple screen reader.
+
+## But Why, Though?
+
+* Some email clients won't or can’t handle HTML at all
+* Some people explicitly choose plaintext just by preference or accessibility
+* Spam filters tend to prefer emails with a plain text alternative (but if you use this gem to spam people, I will yell
+  at you)

 ## Installation

@@ -12,86 +21,329 @@ gem 'ghostwriter'

 And then execute:

-
+bundle

 Or install it manually with:

-
+gem install ghostwriter

 ## Usage

-
+Create a `Ghostwriter::Writer` and call `#textify` with the html string you want modified:
+
+```ruby
+html = <<~HTML
+  <html>
+    <body>
+      <p>This is some text with <a href="tenjin.ca">a link</a></p>
+      <p>It handles other stuff, too.</p>
+      <hr>
+      <h1>Stuff Like</h1>
+      <ul>
+        <li>Images</li>
+        <li>Lists</li>
+        <li>Tables</li>
+        <li>And more</li>
+      </ul>
+    </body>
+  </html>
+HTML
+
+ghostwriter = Ghostwriter::Writer.new
+
+puts ghostwriter.textify(html)
+```
+
+Produces:

-
-
+```
+This is some text with a link (tenjin.ca)

-
+It handles other stuff, too.

-* Spam filters prefer included plain text alternative
-* Some email clients and apps can’t handle HTML
-* Some people explicitly choose plaintext, either by requirement or simple preference

-
+----------

-
-
+-- Stuff Like --
+- Images
+- Lists
+- Tables
+- And more
+```
+
+### Links
+
+Links are converted to the link text followed by the link target in brackets:
+
+```html
+Visit our <a href="https://example.com">Website</a>
+```
+
+Becomes:
+
+```
+Visit our Website (https://example.com)
+```
+
+#### Relative Links
+
+Since emails are wholly distinct from your web address, relative links might break.
+
+To avoid this problem, either use the `<base>` header tag:
+
+```html
+
+<html>
+<head>
+  <base href="https://www.example.com">
+</head>
+<body>
+  Use the base tag to <a href="/contact">expand</a> links.
+</body>
+</html>
+```

-
+Becomes:

-
-
+```
+Use the base tag to expand (https://www.example.com/contact) links.
 ```

-
+Or you can use the `link_base` configuration:

 ```ruby
-
+Ghostwriter::Writer.new(link_base: 'tenjin.ca').textify(html)
+```
+
+### Images
+
+Images with alt text are converted:
+
+```html
+<img src="logo.jpg" alt="ACME Anvils" />
+```
+
+Becomes:
+
+```
+ACME Anvils (logo.jpg)
+```
+
+But images lacking alt text or with a presentation ARIA role are ignored:
+
+```html
+<!-- these will just become an empty string -->
+<img src="decoration.jpg">
+<img src="logo.jpg" role="presentation">
+```
+
+And images with data URIs won't include the data portion.
+
+```html
+
+<img src=""
+     alt="Data picture" />
+```
+
+Becomes:
+
+```
+Data picture (embedded)
+```
+
+### Paragraphs and Linebreaks
+
+Paragraphs are padded with a newline at the end. Line break tags add an empty line.
+
+```html
+<p>I would like to propose a toast.</p>
+<p>This meal we enjoy together would be improved by one.</p>
+<br />
+<p>... Plug in the toaster and I'll get the bread.</p>
+```
+
+```
+I would like to propose a toast.
+
+This meal we enjoy together would be improved by one.
+
+
+... Plug in the toaster and I'll get the bread.
+
+```
+
+### Headers
+
+For now, headers are all treated the same and given a simple marker:
+
+```html
+<h1>Dog Maintenance and Repair</h1>
+<h2>Food Input Port</h2>
+<h3>Exhaust Port Considerations</h3>
+```
+
+Becomes:
+
+```
+-- Dog Maintenance and Repair --
+-- Food Input Port --
+-- Exhaust Port Considerations --
+```
+
+### Lists
+
+Lists are converted, too. They are padded with newlines and are given simple markers:
+
+```html
+
+<ul>
+  <li>Planes</li>
+  <li>Trains</li>
+  <li>Automobiles</li>
+</ul>
+<ol>
+  <li>I get knocked down</li>
+  <li>I get up again</li>
+  <li>Never gonna keep me down</li>
+</ol>
+```
+
+Becomes:
+
+```

-
+- Planes
+- Trains
+- Automobiles

-
+1. I get knocked down
+2. I get up again
+3. Never gonna keep me down

 ```

-
+### Tables
+
+Tables are still often used in email structuring because support for more modern HTML and CSS is inconsistent. If your
+table is purely presentational, mark it with `role="presentation"`. See below for details.
+
+For real data tables, Ghostwriter tries to maintain table structure for simple tables:
+
+```html
+
+<table>
+  <thead>
+    <tr>
+      <th>Ship</th>
+      <th>Captain</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Enterprise</td>
+      <td>Jean-Luc Picard</td>
+    </tr>
+    <tr>
+      <td>TARDIS</td>
+      <td>The Doctor</td>
+    </tr>
+    <tr>
+      <td>Planet Express Ship</td>
+      <td>Turanga Leela</td>
+    </tr>
+  </tbody>
+</table>
+```
+
+Becomes:
+
+```
+| Ship                | Captain         |
+|---------------------|-----------------|
+| Enterprise          | Jean-Luc Picard |
+| TARDIS              | The Doctor      |
+| Planet Express Ship | Turanga Leela   |
+```
+
+### Presentation ARIA Role
+
+Lists and tables with `role="presentation"` will be treated as a simple container and the normal behaviour will be
+suppressed.
+
+```html
+
+<table role="presentation">
+  <tr>
+    <td>The table is a lie</td>
+  </tr>
+</table>
+<ul role="presentation">
+  <li>No such list</li>
+</ul>
+```
+
+Becomes:
+
+```
+The table is a lie
+No such list
+```
+
+### Mail Gem Example

-To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by pasisng the html
+To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by pasisng the html
+through Ghostwriter:

 ```ruby
 require 'mail'

-html
+html = 'My email and a <a href="https://tenjin.ca">link</a>'
+ghostwriter = Ghostwriter::Writer.new

-
-
-
-
+Mail.deliver do
+  to 'bob@example.com'
+  from 'dot@example.com'
+  subject 'Using Ghostwriter with Mail'

-
+  html_part do
     content_type 'text/html; charset=UTF-8'
     body html
-
-
-
-
-
+  end
+
+  text_part do
+    body ghostwriter.textify(html)
+  end
 end

 ```

 ## Contributing
+
 Bug reports and pull requests are welcome on GitHub at https://github.com/TenjinInc/ghostwriter

-This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
+This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
 [Contributor Covenant](contributor-covenant.org) code of conduct.

 ### Core Developers
-After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests.
-You can also run `bin/console` for an interactive prompt that will allow you to experiment.

-
-
-
+After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests. You
+can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+#### Local Install
+
+To install this gem onto your local machine only, run
+
+`bundle exec rake install`
+
+#### Gem Release
+
+To release a gem to the world at large
+
+1. Update the version number in `version.rb`,
+2. Run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push
+   the `.gem` file to [rubygems.org](https://rubygems.org).
+3. Do a wee dance

 ## License
+
 The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
|
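The "Relative Links" section of the new README says relative targets are expanded against a base. The following is a minimal sketch of that rule using only Ruby's `uri` standard library, loosely modelled on the `get_link_target` method visible in the `writer.rb` diff. `expand_link` is a hypothetical helper name for illustration and is not part of the gem's public API.

```ruby
require 'uri'

# Hypothetical standalone helper: absolute targets pass through unchanged,
# relative ones are prefixed with the base by plain string concatenation.
def expand_link(href, base)
  uri = URI(href)
  uri.absolute? ? uri.to_s : base + uri.to_s
end

puts expand_link('/contact', 'https://www.example.com')
# https://www.example.com/contact
puts expand_link('https://tenjin.ca/about', 'https://www.example.com')
# https://tenjin.ca/about
puts expand_link('contact', 'tenjin.ca')
# tenjin.cacontact
```

If the gem's concatenation is as plain as it appears in the diff, the last case suggests that a base without a trailing slash combined with an href without a leading slash will run together, so it may be worth writing one or the other with the slash included.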
data/RELEASE_NOTES.md
ADDED
@@ -0,0 +1,91 @@
+# Release Notes
+
+## 1.0.1 (2021-03-22)
+
+### Major
+
+* none
+
+### Minor
+
+* Updated README
+
+### Bugfixes
+
+* Fixed hr padding behaviour
+
+## 1.0.0 (2021-03-21)
+
+### Major
+
+* Moved `link_base` parameter to constructor
+* Moved input HTML parameter to `#textify`
+
+### Minor
+
+* Treats tables and lists with role="presentation" as simple containers
+* Now handles ordered and unordered lists
+* Images are now replaced with their alt text
+
+### Bugfixes
+
+* none
+
+## 0.4.2 (2021-03-17)
+
+### Major
+
+* none
+
+### Minor
+
+* none
+
+### Bugfixes
+
+* Works with links using `tel:` and `mailto:` schemas.
+
+## 0.4.1 (2021-03-17)
+
+### Major
+
+* none
+
+### Minor
+
+* No longer provides link target in brackets after link text when they are the same
+
+### Bugfixes
+
+* Added explicit testing for HTML entity interpretation
+
+## 0.4.0 (2021-03-16)
+
+### Major
+
+* Updated gem dependencies
+
+### Minor
+
+* Updated docs
+* Added support for tables
+
+### Bugfixes
+
+* none
+
+## 0.3.0 (2016-03-06)
+
+### Major
+
+* Renamed to Ghostwriter
+
+### Minor
+
+* Docs: Added instruction for using textify with mail gem
+
+### Bugfixes
+
+* none
+
+
|
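The 0.4.1 note says the bracketed target is dropped when it is the same as the link text. A rough sketch of one way such a comparison can work, in plain Ruby, is shown here. It is adapted from the `link_matches` method in the `writer.rb` diff, but `same_target?` and `render_link` are hypothetical names, and the sketch anchors its pattern with `\A` rather than `^`.

```ruby
# Hypothetical helpers for illustration only; not the gem's API.
# Two strings count as the same target once the http(s) scheme and a
# trailing slash are ignored.
def same_target?(href, text)
  normalize = ->(value) { value.to_s.sub(%r{\Ahttps?://}, '').chomp('/') }
  normalize.call(href) == normalize.call(text)
end

def render_link(text, href)
  same_target?(href, text) ? href : "#{text} (#{href})"
end

puts render_link('Website', 'https://example.com')
# Website (https://example.com)
puts render_link('example.com', 'https://example.com/')
# https://example.com/
```

In the second call the link text already names the target, so only the target itself is printed.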
data/Rakefile
CHANGED
data/bin/console
CHANGED
data/dirt-textify.gemspec
CHANGED
@@ -1,27 +1,39 @@
-#
-
+# frozen_string_literal: true
+
+lib = File.expand_path('lib', __dir__)
 $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
 require 'ghostwriter/version'

-Gem::Specification.new do |
-
-
-
-
+Gem::Specification.new do |spec|
+  spec.name = 'ghostwriter'
+  spec.version = Ghostwriter::VERSION
+  spec.authors = ['Robin Miller']
+  spec.email = ['robin@tenjin.ca']
+
+  spec.summary = 'Converts HTML to plain text'
+  spec.description = <<~DESC
+    Converts HTML to plain text, preserving as much legibility and functionality as possible.
+
+    Ideal for providing a plaintext multipart segment of email messages.
+  DESC
+  spec.homepage = 'https://github.com/TenjinInc/ghostwriter'
+  spec.license = 'MIT'
+
+  spec.files = `git ls-files -z`.split("\x0").reject do |f|
+    f.match(%r{^(test|spec|features)/})
+  end

-
-
-
-  gemspec.license = 'MIT'
+  spec.bindir = 'exe'
+  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ['lib']

-
-  gemspec.bindir = 'exe'
-  gemspec.executables = gemspec.files.grep(%r{^exe/}) { |f| File.basename(f) }
-  gemspec.require_paths = ['lib']
+  spec.required_ruby_version = '~> 2.4'

-
+  spec.add_dependency 'nokogiri', '= 1.8.4'

-
-
-
+  spec.add_development_dependency 'bundler', '~> 2.2'
+  spec.add_development_dependency 'rake', '~> 13.0'
+  spec.add_development_dependency 'rspec', '~> 3.3'
+  spec.add_development_dependency 'rubocop', '~> 1.11'
+  spec.add_development_dependency 'rubocop-performance', '~> 1.10'
 end
|
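The new gemspec constrains the Ruby version with `~> 2.4` and pins nokogiri with `= 1.8.4`. To see what those two operators accept, the sketch below checks a few versions with RubyGems' own `Gem::Requirement` class. The version numbers tested are arbitrary examples, apart from 2.7.1, which is the value in the new `.ruby-version`.

```ruby
require 'rubygems'

# '~> 2.4' is the pessimistic operator: at least 2.4, below 3.0.
ruby_req = Gem::Requirement.new('~> 2.4')
puts ruby_req.satisfied_by?(Gem::Version.new('2.7.1'))  # true
puts ruby_req.satisfied_by?(Gem::Version.new('2.3.8'))  # false
puts ruby_req.satisfied_by?(Gem::Version.new('3.0.0'))  # false

# '= 1.8.4' is an exact pin: no other release satisfies it.
nokogiri_req = Gem::Requirement.new('= 1.8.4')
puts nokogiri_req.satisfied_by?(Gem::Version.new('1.8.4'))  # true
puts nokogiri_req.satisfied_by?(Gem::Version.new('1.8.5'))  # false
```

One consequence worth noting is that, as declared, this release cannot be installed on Ruby 3.x, and a project that needs a different nokogiri version cannot resolve alongside it.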
data/lib/ghostwriter.rb
CHANGED
data/lib/ghostwriter/version.rb
CHANGED
data/lib/ghostwriter/writer.rb
CHANGED
@@ -1,54 +1,164 @@
+# frozen_string_literal: true
+
 module Ghostwriter
+  # Main Ghostwriter converter object.
   class Writer
-
-
+    # Creates a new ghostwriter
+    #
+    # @param [String] link_base the url to prefix relative links with
+    def initialize(link_base: '')
+      @link_base = link_base
+      @list_marker = '-'
     end

-    #
+    # Strips HTML down to plain text.
+    #
+    # @param html [String] the HTML to be convert to text
     #
-    #
-
-
-      html = @source_html.dup
+    # @return converted text
+    def textify(html)
+      doc = Nokogiri::HTML(html.gsub(/\s+/, ' '))

-
-      html.squeeze!(' ')
+      doc.search('style, script').remove

-
+      replace_anchors(doc)
+      replace_images(doc)

-      doc =
+      simple_replace(doc, '*[role="presentation"]', "\n")

-      doc
-      doc
+      replace_headers(doc)
+      replace_lists(doc)
+      replace_tables(doc)

-
+      simple_replace(doc, 'hr', "\n----------\n\n")
+      simple_replace(doc, 'br', "\n")
+      simple_replace(doc, 'p', "\n\n")

-
+      doc.text.strip.split("\n").collect(&:strip).join("\n").concat("\n")
+    end
+
+    private

+    def normalize_whitespace(html)
+      html.gsub(/\s/, ' ').squeeze(' ')
+    end
+
+    def replace_anchors(doc)
       doc.search('a').each do |link_node|
-        href =
-
+        href = get_link_target(link_node, get_link_base(doc))
+
+        link_node.inner_html = if link_matches(href, link_node.inner_html)
+                                 href.to_s
+                               else
+                                 "#{ link_node.inner_html } (#{ href })"
+                               end
+      end
+    end
+
+    def link_matches(first, second)
+      first.to_s.gsub(%r{^https?://}, '').chomp('/') == second.gsub(%r{^https?://}, '').chomp('/')
+    end
+
+    def get_link_base(doc)
+      # <base> node is unique by W3C spec
+      base_node = doc.search('base').first
+
+      base_node ? base_node['href'] : @link_base
+    end

-
+    def get_link_target(link_node, base)
+      href = URI(link_node['href'])
+      if href.absolute?
+        href
+      else
+        base + href.to_s
       end
+    rescue URI::InvalidURIError
+      link_node['href'].gsub(/^(tel|mailto):/, '').strip
+    end

+    def replace_headers(doc)
       doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
-        node.inner_html = "
+        node.inner_html = "-- #{ node.inner_html } --\n".squeeze(' ')
       end
+    end

-
-
+    def replace_images(doc)
+      doc.search('img[role=presentation]').remove
+
+      doc.search('img').each do |img_node|
+        src = img_node['src']
+        alt = img_node['alt']
+
+        src = 'embedded' if src.start_with? 'data:'
+
+        img_node.replace("#{ alt } (#{ src })") unless alt.nil? || alt.empty?
       end
+    end
+
+    def replace_lists(doc)
+      doc.search('ul, ol').each do |list_node|
+        list_node.search('./li').each_with_index do |list_item, i|
+          marker = if list_node.node_name == 'ol'
+                     "#{ i + 1 }."
+                   else
+                     @list_marker
+                   end

-
-
+          list_item.inner_html = "#{ marker } #{ list_item.inner_html }\n".squeeze(' ')
+        end
+
+        list_node.replace("#{ list_node.inner_html }\n")
       end
+    end
+
+    def replace_tables(doc)
+      doc.css('table').each do |table|
+        column_sizes = calculate_column_sizes(table)

-
-
-      # end
+        table.search('./thead/tr', './tbody/tr', './tr').each do |row|
+          replace_table_nodes(row, column_sizes)

-
+          row.inner_html = "#{ row.inner_html }|\n"
+        end
+
+        add_table_header_underline(table, column_sizes)
+
+        table.inner_html = "#{ table.inner_html }\n"
+      end
+    end
+
+    def calculate_column_sizes(table)
+      column_sizes = table.search('tr').collect do |row|
+        row.search('th', 'td').collect do |node|
+          node.inner_html.length
+        end
+      end
+
+      column_sizes.transpose.collect(&:max)
+    end
+
+    def add_table_header_underline(table, column_sizes)
+      table.search('./thead').each do |row|
+        header_bottom = "|#{ column_sizes.collect { |len| ('-' * (len + 2)) }.join('|') }|"
+
+        row.inner_html = "#{ row.inner_html }#{ header_bottom }\n"
+      end
+    end
+
+    def replace_table_nodes(row, column_sizes)
+      row.search('th', 'td').each_with_index do |node, i|
+        new_content = "| #{ node.inner_html }".squeeze(' ')
+
+        # +2 for the extra spacing between text and pipe
+        node.inner_html = new_content.ljust(column_sizes[i] + 2)
+      end
+    end
+
+    def simple_replace(doc, tag, replacement)
+      doc.search(tag).each do |node|
+        node.replace(node.inner_html + replacement)
+      end
     end
   end
-end
+end
|
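The table handling added in `writer.rb` sizes each column by measuring every cell, transposing rows into columns, and taking the maximum. The sketch below illustrates that sizing idea on plain arrays of strings instead of Nokogiri nodes. `render_table` is a hypothetical helper, and its cell spacing is chosen to reproduce the README's sample output rather than copied from the gem.

```ruby
# Illustrative only: rows are arrays of strings, and every row is assumed to
# have the same number of cells (Array#transpose raises otherwise).
def render_table(header, rows)
  all_rows = [header] + rows

  # Widest cell per column: lengths per row, transposed into columns, then max.
  widths = all_rows.map { |row| row.map(&:length) }.transpose.map(&:max)

  format_row = lambda do |row|
    cells = row.each_with_index.map { |cell, i| cell.ljust(widths[i]) }
    "| #{cells.join(' | ')} |"
  end

  # +2 covers the single space of padding on each side of a cell.
  underline = "|#{widths.map { |width| '-' * (width + 2) }.join('|')}|"

  [format_row.call(header), underline, *rows.map(&format_row)].join("\n")
end

table = render_table(
  %w[Ship Captain],
  [
    ['Enterprise', 'Jean-Luc Picard'],
    ['TARDIS', 'The Doctor'],
    ['Planet Express Ship', 'Turanga Leela']
  ]
)
puts table
```

Measuring the header together with the body rows keeps the underline the same width as every data row.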
metadata
CHANGED
@@ -1,86 +1,119 @@
 --- !ruby/object:Gem::Specification
 name: ghostwriter
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 1.0.1
 platform: ruby
 authors:
 - Robin Miller
 autorequire:
 bindir: exe
 cert_chain: []
-date:
+date: 2021-03-23 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '='
       - !ruby/object:Gem::Version
-        version:
+        version: 1.8.4
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - -
+    - - '='
       - !ruby/object:Gem::Version
-        version:
+        version: 1.8.4
 - !ruby/object:Gem::Dependency
   name: bundler
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.2'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '2.2'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '
+        version: '13.0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
         version: '3.3'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - ~>
+    - - "~>"
       - !ruby/object:Gem::Version
         version: '3.3'
-
+- !ruby/object:Gem::Dependency
+  name: rubocop
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.11'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.11'
+- !ruby/object:Gem::Dependency
+  name: rubocop-performance
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+description: |
+  Converts HTML to plain text, preserving as much legibility and functionality as possible.
+
+  Ideal for providing a plaintext multipart segment of email messages.
 email:
 - robin@tenjin.ca
 executables: []
 extensions: []
 extra_rdoc_files: []
 files:
-- .gitignore
-- .rspec
-- .
-- .
+- ".gitignore"
+- ".rspec"
+- ".rubocop.yml"
+- ".ruby-version"
+- ".travis.yml"
 - CODE_OF_CONDUCT.md
 - Gemfile
 - LICENSE.txt
 - README.md
+- RELEASE_NOTES.md
 - Rakefile
 - bin/console
 - bin/setup
@@ -98,18 +131,17 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - -
+  - - "~>"
     - !ruby/object:Gem::Version
-      version: '
+      version: '2.4'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - -
+  - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.4.3
+rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
-summary:
+summary: Converts HTML to plain text
 test_files: []