ghostwriter 1.0.0 → 1.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 80d5aced9b18684b3640c28ce1e86c8e9859942f57fefbd26dd1a1f3e7791eaf
- data.tar.gz: 0f71619c0a7e247cf163f074ca7c8bb54fa8c93daaa5649df8f853fd0ccab1da
+ metadata.gz: 2868e207e695355f8e9f40521b38d1f96b72b45c9ab73207396a87e5b4d535cd
+ data.tar.gz: cf111d734daa4bf94e4d9c2924dbd6c7d12b38b2b26a55db79110730f306fccc
  SHA512:
- metadata.gz: f9760753d4ffc30bee200a33347cd9aeda0b4593304f07ff9ce53c4ca1f971d51b50644feb763ffc51ec5635202e971704592f45d0cbe021693a7e790e39c7e9
- data.tar.gz: 203df0d639d25f35a73dcdde0f12c037e8f508e086c34cec5e2da7325a8b0e173db10590da50550a50e853f74f0753ce231b36646619fc3c1f166bdd986e7186
+ metadata.gz: 8d71989a44d8d2da33496172c600ec38063100c53642850982a9a93ccefbea37e40847e93edfde09d3b0f0dad98f457296f01c070387d86b11cc06e2ee9e04c1
+ data.tar.gz: 0767f0d24a895477aee922bd960380608185b741c0d87975bba65eaa03270539e2af2776bbcef2f689b604b76fde7882dbe9322e522b190858c2699359ce8a3b
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # Ghostwriter
 
- Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.
+ A ruby gem that converts HTML to plain text, preserving as much legibility and functionality as possible.
 
  It's sort of like a reverse-markdown or a very, very simple screen reader.
 
@@ -8,8 +8,8 @@ It's sort of like a reverse-markdown or a very, very simple screen reader.
 
  * Some email clients won't or can’t handle HTML at all
  * Some people explicitly choose plaintext just by preference or accessibility
- * Spam filters tend to like emails with a plain text alternative (but if you use this gem to help you spam people, I
- will yell at you)
+ * Spam filters tend to prefer emails with a plain text alternative (but if you use this gem to spam people, I will yell
+ at you)
 
  ## Installation
 
@@ -29,20 +29,46 @@ Or install it manually with:
 
  ## Usage
 
- Create a `Ghostwriter::Writer` and call `#textify` with the html you want modified:
+ Create a `Ghostwriter::Writer` and call `#textify` with the html string you want modified:
 
  ```ruby
- html = '<html><body><p>This is some markup <a href="tenjin.ca">and a link</a></p><p>Other tags translate, too</p></body></html>'
+ html = <<~HTML
+ <html>
+ <body>
+ <p>This is some text with <a href="tenjin.ca">a link</a></p>
+ <p>It handles other stuff, too.</p>
+ <hr>
+ <h1>Stuff Like</h1>
+ <ul>
+ <li>Images</li>
+ <li>Lists</li>
+ <li>Tables</li>
+ <li>And more</li>
+ </ul>
+ </body>
+ </html>
+ HTML
 
- Ghostwriter::Writer.new.textify(html)
+ ghostwriter = Ghostwriter::Writer.new
+
+ puts ghostwriter.textify(html)
  ```
 
  Produces:
 
  ```
- This is some markup and a link (tenjin.ca)
+ This is some text with a link (tenjin.ca)
+
+ It handles other stuff, too.
+
 
- Other tags translate, too
+ ----------
+
+ -- Stuff Like --
+ - Images
+ - Lists
+ - Tables
+ - And more
  ```
 
  ### Links
@@ -50,12 +76,7 @@ Other tags translate, too
  Links are converted to the link text followed by the link target in brackets:
 
  ```html
-
- <html>
- <body>
  Visit our <a href="https://example.com">Website</a>
- <body>
- </html>
  ```
 
  Becomes:
@@ -85,7 +106,7 @@ Use the base tag to <a href="/contact">expand</a> links.
  Becomes:
 
  ```
- Use the base tag to expand (https://www.example.com/contact) links
+ Use the base tag to expand (https://www.example.com/contact) links.
  ```
 
  Or you can use the `link_base` configuration:
@@ -119,7 +140,9 @@ But images lacking alt text or with a presentation ARIA role are ignored:
  And images with data URIs won't include the data portion.
 
  ```html
- <img src="data:image/gif;base64,R0lGODdhIwAjAMZ/AAkMBxETEBUUDBoaExkaGCIcFx4fGCEfFCcfECkjHiUlHiglGikmFjAqFi8pJCsrJT8sCjMzLDUzJzs0GjkzLTszKTM1Mzg4MD48Mzs+O0tAIElCJ1NCGVdBHUtEMkNFQjlHTFJDOkdGPT1ISUxLRENOT1tMI01PTGdLKk1RU0hTVEtTT0NVVFRTTExYWE9YVGhVP1VZXGFYTWhaMFRcWHFYL1FdXV1dRHdZMVRgYFhgXFdiY11hY1tkX31hJltmZ2pnWnloLGFrbG9oYXlqN3NqTnBqWHxqRItvRIh0Nod0ToF2U5J4LX55Xm97e4B5aZqAQpGAdqOCOZKEYZ2FOJyEVoyKbqiOXpySbLCVcLCXaKWbdKCdfZyhi66dksGdc76fbbije7mkdLOmgq6ogrCpibyvirexisWvhs2vgsGyiLq1lce1lMC5ks28nsfBmcHDq9bAl9PDmMnFo9TGh8zIoM7Jm9vLs9nRo93QqtfSquLQpdXUs+fdterlw////ywAAAAAIwAjAAAH/oArOTo6PYaGOz08P0KMOTZCOzw7PzY/Pz2JPYSDhTSFPTSXPY0tIiIfJz05o5Q/O7A5moc6O4Q0oS8uQisXGCItwTItP5OxOrKjhzSfLzYvgz85ERQXJKcSIkZeJDqOl43StrSEKzo2LhkOGBISDw40JyIVFVEyorBCkZmwtCsrtnLQSJCAwoMFCiwoiECPAr0TjPrtECJwXLMVNARlUCBhQAEFC2SsgWPGDBs3d2RcorSD1SVGr3qskOkihoIH70DO0cOHDx48evD0KQONmQ0aORZJE3VLRYoPBRwoUCCCSx07eoL+xLNnj5UfNFry4BHuR6EcK0qkKJFhAYUE/g+cdHlz1efPrnvM2MjhQlYOWTxktXThIoUKhQoKDHBi5Y0dO0CD5smzJ46NvWJfjYW1w4WKEiWkKkgw9UYdPXTo8Mn6042bvX9pTHoFa5GKzykekP5owEidN1u6PKnzMw+QJ3ttUPr7qKUs0C5KHOyoAMMaNWrmjKlSRYscMFm+nBBUybkLSYsIl3DxwAgcKwWMzGnz5kqTK1e09AEDI0uGE8rJEgNfsuxVggoujGABF1xMoYAVc9RRhxxq5JGVHn3EEYcIGfT1igvGKLfDZyWMkMINa5QhQRNz9CQhT1n5URmHJ8Sygw2BSWLDbaCpgEFPNzxBV4QwApVhHBhg/vABZ0pJIhuCoI0wQhFlkLEGGWfQ9wZ2W6KRBhoUJKncKyK2tMOBPI6wwAxltInlG1uKcQUUV3xpwQUXACSJjbCAxgJoJShggBVtnmGGlm/M4UYcX14QQQQ1PpJjUjmsd5sKCg5gBRdkYMlGG2KwoUYWWYARxgXVnODXqmP9CWgJIESwxhJTbEHGGGbMsSWpaRRBQQQXpPKIiJOgg+BnI4AwwhxcHFHrGGN0KYYYaEhAzQX/7flIDMqx4CoIJY7QxhpY0GorXXXwkUcRj1Lg7gfMDavcCSx4BqsIHpyxRhtT1FCDEmNgF4YY1j6KZ4eXXTast9GVcAIHG2TZRhlT/qCAAg5IZIzCA+1QQ0EGKbgAG7c0pPOAAgQcwEQSZ2R5RhlYVIFEFVccAQEAAASgWEIrXEZYDDHQYAEBAQSAcxBUbCExGWVsMfMVCHSA89QCbHBDX4QRRsPURuMcQBBQYLHGHGuwoYUYVdQQxAIOBCCACVLUgDMBS7rwwgtENHDAAEYLMIAAHhABRRVYKFEDDjjU0AA9HiQhxQQOCDC1BXe/UAQVVATRwAIDDGCAAAd0EAQTTEgBBQ4IIFSBFHFPdYEIFJBAQOUE1K5AAyZgnsQME/jNwAG/e7QBFT4sYEABBiQv6ANDDLDCCwPULr0ADYyeOQcMLMAAAxNAIQUHJwckYEDn5CfvgAEKvECA3+R7nrwB2k+ggQkmaLB3++Sz3zkMIawQCAA7" 
alt="Data picture"/>
+
+ <img src="data:image/gif;base64,R0lGODdhIwAjAMZ/AAkMBxETEBUUDBoaExkaGCIcFx4fGCEfFCcfECkjHiUlHiglGikmFjAqFi8pJCsrJT8sCjMzLDUzJzs0GjkzLTszKTM1Mzg4MD48Mzs+O0tAIElCJ1NCGVdBHUtEMkNFQjlHTFJDOkdGPT1ISUxLRENOT1tMI01PTGdLKk1RU0hTVEtTT0NVVFRTTExYWE9YVGhVP1VZXGFYTWhaMFRcWHFYL1FdXV1dRHdZMVRgYFhgXFdiY11hY1tkX31hJltmZ2pnWnloLGFrbG9oYXlqN3NqTnBqWHxqRItvRIh0Nod0ToF2U5J4LX55Xm97e4B5aZqAQpGAdqOCOZKEYZ2FOJyEVoyKbqiOXpySbLCVcLCXaKWbdKCdfZyhi66dksGdc76fbbije7mkdLOmgq6ogrCpibyvirexisWvhs2vgsGyiLq1lce1lMC5ks28nsfBmcHDq9bAl9PDmMnFo9TGh8zIoM7Jm9vLs9nRo93QqtfSquLQpdXUs+fdterlw////ywAAAAAIwAjAAAH/oArOTo6PYaGOz08P0KMOTZCOzw7PzY/Pz2JPYSDhTSFPTSXPY0tIiIfJz05o5Q/O7A5moc6O4Q0oS8uQisXGCItwTItP5OxOrKjhzSfLzYvgz85ERQXJKcSIkZeJDqOl43StrSEKzo2LhkOGBISDw40JyIVFVEyorBCkZmwtCsrtnLQSJCAwoMFCiwoiECPAr0TjPrtECJwXLMVNARlUCBhQAEFC2SsgWPGDBs3d2RcorSD1SVGr3qskOkihoIH70DO0cOHDx48evD0KQONmQ0aORZJE3VLRYoPBRwoUCCCSx07eoL+xLNnj5UfNFry4BHuR6EcK0qkKJFhAYUE/g+cdHlz1efPrnvM2MjhQlYOWTxktXThIoUKhQoKDHBi5Y0dO0CD5smzJ46NvWJfjYW1w4WKEiWkKkgw9UYdPXTo8Mn6042bvX9pTHoFa5GKzykekP5owEidN1u6PKnzMw+QJ3ttUPr7qKUs0C5KHOyoAMMaNWrmjKlSRYscMFm+nBBUybkLSYsIl3DxwAgcKwWMzGnz5kqTK1e09AEDI0uGE8rJEgNfsuxVggoujGABF1xMoYAVc9RRhxxq5JGVHn3EEYcIGfT1igvGKLfDZyWMkMINa5QhQRNz9CQhT1n5URmHJ8Sygw2BSWLDbaCpgEFPNzxBV4QwApVhHBhg/vABZ0pJIhuCoI0wQhFlkLEGGWfQ9wZ2W6KRBhoUJKncKyK2tMOBPI6wwAxltInlG1uKcQUUV3xpwQUXACSJjbCAxgJoJShggBVtnmGGlm/M4UYcX14QQQQ1PpJjUjmsd5sKCg5gBRdkYMlGG2KwoUYWWYARxgXVnODXqmP9CWgJIESwxhJTbEHGGGbMsSWpaRRBQQQXpPKIiJOgg+BnI4AwwhxcHFHrGGN0KYYYaEhAzQX/7flIDMqx4CoIJY7QxhpY0GorXXXwkUcRj1Lg7gfMDavcCSx4BqsIHpyxRhtT1FCDEmNgF4YY1j6KZ4eXXTast9GVcAIHG2TZRhlT/qCAAg5IZIzCA+1QQ0EGKbgAG7c0pPOAAgQcwEQSZ2R5RhlYVIFEFVccAQEAAASgWEIrXEZYDDHQYAEBAQSAcxBUbCExGWVsMfMVCHSA89QCbHBDX4QRRsPURuMcQBBQYLHGHGuwoYUYVdQQxAIOBCCACVLUgDMBS7rwwgtENHDAAEYLMIAAHhABRRVYKFEDDjjU0AA9HiQhxQQOCDC1BXe/UAQVVATRwAIDDGCAAAd0EAQTTEgBBQ4IIFSBFHFPdYEIFJBAQOUE1K5AAyZgnsQME/jNwAG/e7QBFT4sYEABBiQv6ANDDLDCCwPULr0ADYyeOQcMLMAAAxNAIQUHJwckYEDn5CfvgAEKvECA3+R7nrwB2k+ggQkmaLB3++Sz3zkMIawQCAA7"
+ alt="Data picture" />
  ```
 
  Becomes:
@@ -128,6 +151,44 @@ Becomes:
  Data picture (embedded)
  ```
 
+ ### Paragraphs and Linebreaks
+
+ Paragraphs are padded with a newline at the end. Line break tags add an empty line.
+
+ ```html
+ <p>I would like to propose a toast.</p>
+ <p>This meal we enjoy together would be improved by one.</p>
+ <br />
+ <p>... Plug in the toaster and I'll get the bread.</p>
+ ```
+
+ ```
+ I would like to propose a toast.
+
+ This meal we enjoy together would be improved by one.
+
+
+ ... Plug in the toaster and I'll get the bread.
+
+ ```
+
+ ### Headers
+
+ For now, headers are all treated the same and given a simple marker:
+
+ ```html
+ <h1>Dog Maintenance and Repair</h1>
+ <h2>Food Input Port</h2>
+ <h3>Exhaust Port Considerations</h3>
+ ```
+
+ Becomes:
+
+ ```
+ -- Dog Maintenance and Repair --
+ -- Food Input Port --
+ -- Exhaust Port Considerations --
+ ```
 
  ### Lists
 
data/RELEASE_NOTES.md CHANGED
@@ -1,5 +1,19 @@
  # Release Notes
 
+ ## 1.0.1 (2021-03-22)
+
+ ### Major
+
+ * none
+
+ ### Minor
+
+ * Updated README
+
+ ### Bugfixes
+
+ * Fixed hr padding behaviour
+
  ## 1.0.0 (2021-03-21)
 
  ### Major
data/dirt-textify.gemspec CHANGED
@@ -10,9 +10,11 @@ Gem::Specification.new do |spec|
  spec.authors = ['Robin Miller']
  spec.email = ['robin@tenjin.ca']
 
- spec.summary = 'Intelligently extracts plaintext from an HTML document.'
+ spec.summary = 'Converts HTML to plain text'
  spec.description = <<~DESC
- Transforms HTML into plaintext while preserving legibility and functionality.
+ Converts HTML to plain text, preserving as much legibility and functionality as possible.
+
+ Ideal for providing a plaintext multipart segment of email messages.
  DESC
  spec.homepage = 'https://github.com/TenjinInc/ghostwriter'
  spec.license = 'MIT'
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Ghostwriter
- VERSION = '1.0.0'
+ VERSION = '1.0.1'
  end
@@ -17,7 +17,7 @@ module Ghostwriter
  #
  # @return converted text
  def textify(html)
- doc = Nokogiri::HTML(normalize_whitespace(html).gsub('</p>', "</p>\n\n"))
+ doc = Nokogiri::HTML(html.gsub(/\s+/, ' '))
 
  doc.search('style, script').remove
 
@@ -30,15 +30,11 @@ module Ghostwriter
  replace_lists(doc)
  replace_tables(doc)
 
- simple_replace(doc, 'hr', "\n----------\n")
+ simple_replace(doc, 'hr', "\n----------\n\n")
  simple_replace(doc, 'br', "\n")
+ simple_replace(doc, 'p', "\n\n")
 
- # doc.search('p').each do |link_node|
- # link_node.inner_html = link_node.inner_html + "\n\n"
- # end
-
- # trim, but only single-space character
- doc.text.gsub(/^ +| +$/, '')
+ doc.text.strip.split("\n").collect(&:strip).join("\n").concat("\n")
  end
 
  private
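
Note on the changed return line of `textify` in the hunk above: the removed `gsub(/^ +| +$/, '')` only trims runs of spaces at the edges of each line, while the added chain also strips the text as a whole and appends a single newline. The sketch below wraps both expressions so they can be compared on a plain string without Nokogiri. The helper names `old_trim` and `new_trim` and the sample text are assumptions made for illustration; they are not part of the gem.

```ruby
# Hypothetical helpers (not part of ghostwriter) wrapping the two
# expressions from the diff so they can be compared on a plain string.

# 1.0.0 expression: remove runs of spaces at the start and end of each line.
def old_trim(text)
  text.gsub(/^ +| +$/, '')
end

# 1.0.1 expression: strip the whole text, strip each line,
# then append one trailing newline.
def new_trim(text)
  text.strip.split("\n").collect(&:strip).join("\n").concat("\n")
end

# Made-up sample, loosely resembling document text after tag replacement.
sample = "\n  First paragraph. \n\n Second paragraph.\n\n\n----------\n\n"

p old_trim(sample)
p new_trim(sample)
```

On this sample the old expression keeps the leading newline and both trailing newlines, whereas the new one returns text that begins at the first visible character and ends with exactly one newline. Blank lines between blocks are preserved by both.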
@@ -112,7 +108,7 @@ module Ghostwriter
  list_item.inner_html = "#{ marker } #{ list_item.inner_html }\n".squeeze(' ')
  end
 
- list_node.replace("\n#{ list_node.inner_html }\n")
+ list_node.replace("#{ list_node.inner_html }\n")
  end
  end
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: ghostwriter
  version: !ruby/object:Gem::Version
- version: 1.0.0
+ version: 1.0.1
  platform: ruby
  authors:
  - Robin Miller
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2021-03-22 00:00:00.000000000 Z
+ date: 2021-03-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -94,9 +94,10 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1.10'
- description: 'Transforms HTML into plaintext while preserving legibility and functionality.
+ description: |
+ Converts HTML to plain text, preserving as much legibility and functionality as possible.
 
- '
+ Ideal for providing a plaintext multipart segment of email messages.
  email:
  - robin@tenjin.ca
  executables: []
@@ -142,5 +143,5 @@ requirements: []
  rubygems_version: 3.1.2
  signing_key:
  specification_version: 4
- summary: Intelligently extracts plaintext from an HTML document.
+ summary: Converts HTML to plain text
  test_files: []