ghostwriter 0.3.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- !binary "U0hBMQ==":
3
- metadata.gz: !binary |-
4
- MjAxZjQzNDU2NTEyMDgwODc3NTAyYjQyYTRhNGZjOTEwMzg4MTE0Yw==
5
- data.tar.gz: !binary |-
6
- NjBjZWU4OWExMmYyOGU1NDMzNTI4MDhkNzEzMGU3YWFhNjdiM2M0MA==
2
+ SHA256:
3
+ metadata.gz: afedaad685b3c06c7baf6e7b1a7b00d8397f9ad1446ddfdcf835dfc99df19969
4
+ data.tar.gz: 786a612861e5d11e19672057371720c57dd7dc67e1f9b7333bf9269b819b0548
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- ZDcxMDQ5N2IzMTU0YzlhMzZmMDBkODk2NDVhZjEwZTkyOTY0ZDFmMzZiMmYw
10
- NDc3ZmY5NjgyNzkxOTUzMDg4YWU2NmNkZDgyMDdkNjc4OTMzMmIzYWY1MWY2
11
- OWRhMzgxNjc5ODM4Yjc4NGE5ZTBmODBmNGUzOGU3NDY2YTA5NDk=
12
- data.tar.gz: !binary |-
13
- NTIzYTdjNmRmZTRkMTc4NGIxZWJiYjBhYWVkMDI4ZDM1NTc4NDQ4ZTdkZDZk
14
- YTVlN2Y1OGMzYTg3MjlkYmNhMjUyMGQwNTZlMmIzNjYxYmQyMDIxMjJiOTVi
15
- N2YzMzczODBkZWMxZmY2MmJkYzJkYjQxZjJlZjBjZTM2OWJkMjQ=
6
+ metadata.gz: 610473e1214cd68edd7b1ce952fd43c44c22a29bc0895bebcc57b4e5e00385bccaec2008e23252eaf438a8b9a1ebd56af0a4c3956be53791ef1b5ee8b75dc1a3
7
+ data.tar.gz: 3f3bee72a1515077e7ccff0b1baae94607e33c84c83a84f8b51fdf3234aeaecaa48ddf0d6cfc2a38584ce6c0d0d7f52db5b4021638eb611089087ee638ba2d16
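The checksums.yaml hunk drops SHA1 in favour of SHA256 and removes the `!binary` wrapping. The old values were base64 text. Decoding them gives back ordinary strings: the key `U0hBMQ==` is just `SHA1`, and each value is a hex digest.

The sketch below shows both layouts. It assumes the archive members have already been extracted from the `.gem` file. `decode_legacy_digest` and `checksum_matches?` are hypothetical helper names, not RubyGems API.

```ruby
require 'digest/sha2'
require 'yaml'

# 0.3.0 layout: keys and values were YAML !binary scalars (base64 text).
# String#unpack1('m') is core Ruby's base64 decoder.
def decode_legacy_digest(b64)
  b64.unpack1('m')
end

# 0.4.0 layout: plain hex digests grouped under SHA256 and SHA512.
# Hypothetical helper: compares one archive member's bytes with the listing.
def checksum_matches?(checksums_yaml, algorithm, member, bytes)
  expected = YAML.safe_load(checksums_yaml).fetch(algorithm).fetch(member)
  digest = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }.fetch(algorithm)
  digest.hexdigest(bytes) == expected
end

puts decode_legacy_digest('U0hBMQ==') # prints SHA1 (the old top-level key)
puts decode_legacy_digest('MjAxZjQzNDU2NTEyMDgwODc3NTAyYjQyYTRhNGZjOTEwMzg4MTE0Yw==')
```

The decoded metadata.gz value is a 40-character hex string, the length of a SHA1 digest. The new SHA256 entries are 64 hex characters and the SHA512 entries are 128.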
data/.rubocop.yml ADDED
@@ -0,0 +1 @@
1
+ inherit_from: ../.rubocop.yml
data/.ruby-version CHANGED
@@ -1 +1 @@
1
- ruby-1.9.3
1
+ ruby-2.7.1
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  source 'https://rubygems.org'
2
4
 
3
5
  # Specify gem dependencies in ghostwriter.gemspec
data/README.md CHANGED
@@ -1,6 +1,14 @@
1
1
  # Ghostwriter
2
2
 
3
- Ghostwriter rewrites your emails to conform to varying email client requirements.
3
+ Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.
4
+
5
+ It's sort of like a reverse-markdown.
6
+
7
+ ## But Why, Though?
8
+
9
+ * Spam filters tend to like emails with a plain text alternative
10
+ * Some email clients won't or can’t handle HTML at all
11
+ * Some people explicitly choose plaintext just by preference or accessibility
4
12
 
5
13
  ## Installation
6
14
 
@@ -12,25 +20,14 @@ gem 'ghostwriter'
12
20
 
13
21
  And then execute:
14
22
 
15
- $ bundle
23
+ bundle
16
24
 
17
25
  Or install it manually with:
18
26
 
19
- $ gem install ghostwriter
27
+ gem install ghostwriter
20
28
 
21
29
  ## Usage
22
30
 
23
- ###Stripping HTML
24
-
25
- Transform HTML into plaintext while preserving as much legibility and functionality as possible.
26
- It's prime use is in quickly producing an automatic plaintext version of HTML emails.
27
-
28
- Why offer plaintext?
29
-
30
- * Spam filters prefer included plain text alternative
31
- * Some email clients and apps can’t handle HTML
32
- * Some people explicitly choose plaintext, either by requirement or simple preference
33
-
34
31
  Create a `Ghostwriter::Writer` with the html you want modified, and call `#textify`:
35
32
 
36
33
  ```ruby
@@ -39,10 +36,10 @@ html = '<html><body>This is some markup <a href="tenjin.ca">and a link</a><p>Oth
39
36
  Ghostwriter::Writer.new(html).textify
40
37
 
41
38
  => "This is some markup and a link (tenjin.ca)\nOther tags translate, too\n\n"
42
-
39
+
43
40
  ```
44
41
 
45
- `#textify` will use a `<base>` tag if included in the HTML source, or if one is provided explicitly:
42
+ `#textify` will use the `<base>` tag if found in the HTML source, or if one is provided explicitly:
46
43
 
47
44
  ```ruby
48
45
  html = '<html><body>Relative links <a href="/contact">Link</a></body></html>'
@@ -53,9 +50,10 @@ Ghostwriter::Writer.new(html).textify(link_base: 'tenjin.ca')
53
50
 
54
51
  ```
55
52
 
56
- #### Mail Gem Example
53
+ ### Mail Gem Example
57
54
 
58
- To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by pasisng the html through Ghostwriter:
55
+ To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by pasisng the html
56
+ through Ghostwriter:
59
57
 
60
58
  ```ruby
61
59
  require 'mail'
@@ -63,35 +61,49 @@ require 'mail'
63
61
  html = 'My email and a <a href="http://tenjin.ca">link</a>'
64
62
 
65
63
  mail = Mail.deliver do
66
- to 'bob@example.com'
67
- from 'dot@example.com'
68
- subject 'Using Ghostwriter with Mail'
64
+ to 'bob@example.com'
65
+ from 'dot@example.com'
66
+ subject 'Using Ghostwriter with Mail'
69
67
 
70
- html_part do
68
+ html_part do
71
69
  content_type 'text/html; charset=UTF-8'
72
70
  body html
73
- end
74
-
75
- text_part do
76
- body Ghostwriter::Writer.new(html).textify
77
- end
71
+ end
72
+
73
+ text_part do
74
+ body Ghostwriter::Writer.new(html).textify
75
+ end
78
76
  end
79
77
 
80
78
  ```
81
79
 
82
80
  ## Contributing
81
+
83
82
  Bug reports and pull requests are welcome on GitHub at https://github.com/TenjinInc/ghostwriter
84
83
 
85
- This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
84
+ This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
86
85
  [Contributor Covenant](contributor-covenant.org) code of conduct.
87
86
 
88
87
  ### Core Developers
89
- After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests.
90
- You can also run `bin/console` for an interactive prompt that will allow you to experiment.
91
88
 
92
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the
93
- version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,
94
- push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
89
+ After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests. You
90
+ can also run `bin/console` for an interactive prompt that will allow you to experiment.
91
+
92
+ #### Local Install
93
+ To install this gem onto your local machine only, run
94
+
95
+ `bundle exec rake install`
96
+
97
+ #### Gem Release
98
+ To release a gem to the world at large
99
+
100
+ 1. Update the version number in `version.rb`,
101
+ 2. Run `bundle exec rake release`,
102
+ which will create a git tag for the version,
103
+ push git commits and tags,
104
+ and push the `.gem` file to [rubygems.org](https://rubygems.org).
105
+ 3. Do a wee dance
95
106
 
96
107
  ## License
108
+
97
109
  The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
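The README says `#textify` completes relative links using the document's `<base>` tag or the explicit `link_base:` option. In the `Writer` hunk further down, `replace_anchors` joins them by plain string concatenation (`base_url + href.to_s`), not by RFC 3986 resolution. The way a base is written therefore matters.

The sketch below restates that rule without Nokogiri. `resolve_href` and `resolve_href_rfc` are illustrative names. The second function uses Ruby's standard `URI.join` purely for comparison; it is not what the gem calls.

```ruby
require 'uri'

# Same rule as Writer#replace_anchors in 0.4.0: absolute hrefs pass through,
# relative hrefs get the base prepended by string concatenation.
def resolve_href(href, base_url)
  uri = URI(href)
  uri.absolute? ? uri.to_s : base_url + uri.to_s
end

# For comparison only: standard RFC 3986 resolution from Ruby's stdlib.
def resolve_href_rfc(href, base_url)
  URI.join(base_url, href).to_s
end

puts resolve_href('/contact', 'tenjin.ca')                  # tenjin.ca/contact
puts resolve_href('/contact', 'http://tenjin.ca/blog/')     # http://tenjin.ca/blog//contact
puts resolve_href_rfc('/contact', 'http://tenjin.ca/blog/') # http://tenjin.ca/contact
```

The README's own `link_base: 'tenjin.ca'` example works because the base has no trailing path. Also, `URI(nil)` raises `ArgumentError`, so an `<a>` element without an `href` appears likely to abort the conversion. That is worth verifying before relying on it.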
data/RELEASE_NOTES.md ADDED
@@ -0,0 +1,32 @@
1
+ # Release Notes
2
+
3
+ ## 0.4.0 (2021-03-16)
4
+
5
+ ### Major
6
+
7
+ * Updated gem dependencies
8
+
9
+ ### Minor
10
+
11
+ * Updated docs
12
+ * Added support for tables
13
+
14
+ ### Bugfixes
15
+
16
+ * none
17
+
18
+ ## 0.3.0 (2016-03-06)
19
+
20
+ ### Major
21
+
22
+ * Renamed to Ghostwriter
23
+
24
+ ### Minor
25
+
26
+ * Docs: Added instruction for using textify with mail gem
27
+
28
+ ### Bugfixes
29
+
30
+ * none
31
+
32
+
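The table support listed under 0.4.0 comes from `replace_table` in the `Writer` hunk. That method works in four steps:

- measure every cell;
- transpose the row-major sizes into columns and keep each column's maximum;
- pad cells to that width;
- append a dashed separator after `<thead>`.

Below is a stand-alone sketch of the same column-width technique, operating on plain arrays of strings instead of Nokogiri nodes. `render_table` is a hypothetical name. The padding is chosen so the separator lines up with the rows, so this is not a byte-for-byte copy of what 0.4.0 emits. The gem also measures `inner_html`, so any markup inside a cell counts toward its width.

```ruby
# Hypothetical stand-alone sketch of the column-width technique behind
# Writer#replace_table, using arrays of strings instead of Nokogiri nodes.
def render_table(header, rows)
  # Cell lengths per row, transposed into per-column lists, then the max of each.
  widths = [header, *rows].map { |row| row.map(&:length) }.transpose.map(&:max)

  # "| " + left-justified cell + " ", closed with a final pipe.
  format_row = lambda do |row|
    cells = row.each_with_index.map { |cell, i| "| #{cell.ljust(widths[i])} " }
    "#{cells.join}|"
  end

  # Each dashed segment spans the cell width plus its two padding spaces.
  separator = "|#{widths.map { |w| '-' * (w + 2) }.join('|')}|"

  [format_row.call(header), separator, *rows.map(&format_row)].join("\n") + "\n"
end

puts render_table(%w[Name Qty], [%w[Apple 3], %w[Kiwi 12]])
# | Name  | Qty |
# |-------|-----|
# | Apple | 3   |
# | Kiwi  | 12  |
```

`Array#transpose` raises `IndexError` when rows differ in length. This sketch therefore does not handle ragged tables or `colspan`, and neither does the gem's `column_sizes.transpose` call.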
data/Rakefile CHANGED
@@ -1,6 +1,8 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'bundler/gem_tasks'
2
4
  require 'rspec/core/rake_task'
3
5
 
4
6
  RSpec::Core::RakeTask.new(:spec)
5
7
 
6
- task :default => :spec
8
+ task default: :spec
data/bin/console CHANGED
@@ -1,4 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
+ #
3
+ # frozen_string_literal: true
2
4
 
3
5
  require 'bundler/setup'
4
6
  require 'ghostwriter'
@@ -10,5 +12,5 @@ require 'ghostwriter'
10
12
  # require "pry"
11
13
  # Pry.start
12
14
 
13
- require "irb"
15
+ require 'irb'
14
16
  IRB.start
data/dirt-textify.gemspec CHANGED
@@ -1,27 +1,37 @@
1
- # coding: utf-8
2
- lib = File.expand_path('../lib', __FILE__)
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path('lib', __dir__)
3
4
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
5
  require 'ghostwriter/version'
5
6
 
6
- Gem::Specification.new do |gemspec|
7
- gemspec.name = 'ghostwriter'
8
- gemspec.version = Ghostwriter::VERSION
9
- gemspec.authors = ['Robin Miller']
10
- gemspec.email = ['robin@tenjin.ca']
7
+ Gem::Specification.new do |spec|
8
+ spec.name = 'ghostwriter'
9
+ spec.version = Ghostwriter::VERSION
10
+ spec.authors = ['Robin Miller']
11
+ spec.email = ['robin@tenjin.ca']
12
+
13
+ spec.summary = 'Intelligently extracts plaintext from an HTML document.'
14
+ spec.description = <<~DESC
15
+ Transforms HTML into plaintext while preserving legibility and functionality.
16
+ DESC
17
+ spec.homepage = 'https://github.com/TenjinInc/ghostwriter'
18
+ spec.license = 'MIT'
19
+
20
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
21
+ f.match(%r{^(test|spec|features)/})
22
+ end
11
23
 
12
- gemspec.summary = %q{Intelligently extracts plaintext from an HTML document.}
13
- gemspec.description = %q{Transforms HTML into plaintext while preserving legibility and functionality. }
14
- gemspec.homepage = 'https://github.com/TenjinInc/ghostwriter'
15
- gemspec.license = 'MIT'
24
+ spec.bindir = 'exe'
25
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
26
+ spec.require_paths = ['lib']
16
27
 
17
- gemspec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
- gemspec.bindir = 'exe'
19
- gemspec.executables = gemspec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
- gemspec.require_paths = ['lib']
28
+ spec.required_ruby_version = '~> 2.4'
21
29
 
22
- gemspec.add_dependency 'nokogiri', '~> 1.6'
30
+ spec.add_dependency 'nokogiri', '= 1.8.4'
23
31
 
24
- gemspec.add_development_dependency 'bundler', '~> 1.10'
25
- gemspec.add_development_dependency 'rake', '~> 10.0'
26
- gemspec.add_development_dependency 'rspec', '~> 3.3'
32
+ spec.add_development_dependency 'bundler', '~> 2.2'
33
+ spec.add_development_dependency 'rake', '~> 13.0'
34
+ spec.add_development_dependency 'rspec', '~> 3.3'
35
+ spec.add_development_dependency 'rubocop', '~> 1.11'
36
+ spec.add_development_dependency 'rubocop-performance', '~> 1.10'
27
37
  end
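The gemspec now uses the pessimistic operator for Ruby itself and an exact pin for Nokogiri:

- `'~> 2.4'` means at least 2.4 but below 3.0, so Ruby 3.x is excluded.
- `'= 1.8.4'` admits only that single Nokogiri release. The old `'~> 1.6'` accepted any 1.x release from 1.6 upward.

RubyGems' own `Gem::Requirement` class evaluates these strings. `allows?` is only a hypothetical wrapper for the demonstration.

```ruby
require 'rubygems' # loaded by default in most Ruby processes; explicit for clarity

# Hypothetical helper: does `version` satisfy the gemspec-style `requirement`?
def allows?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

puts allows?('~> 2.4', '2.7.1')  # true  (the .ruby-version in this diff)
puts allows?('~> 2.4', '3.0.0')  # false (pessimistic upper bound)
puts allows?('~> 1.6', '1.8.4')  # true  (old Nokogiri constraint)
puts allows?('= 1.8.4', '1.8.5') # false (new exact pin)
```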
data/lib/ghostwriter.rb CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'ghostwriter/version'
2
4
  require 'ghostwriter/writer'
3
5
  require 'nokogiri'
data/lib/ghostwriter/version.rb CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Ghostwriter
2
- VERSION = '0.3.0'
4
+ VERSION = '0.4.0'
3
5
  end
data/lib/ghostwriter/writer.rb CHANGED
@@ -1,54 +1,102 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Ghostwriter
4
+ # Main Ghostwriter converter object.
2
5
  class Writer
3
6
  def initialize(html)
4
7
  @source_html = html
5
8
  end
6
9
 
7
- # Intelligently strips HTML down to text.
10
+ # Strips HTML down to plain text.
8
11
  #
9
- # Options:
10
- # link_base: the url to prefix relative links with
11
- def textify(options={})
12
- html = @source_html.dup
13
-
14
- html.gsub!(/\n|\t/, ' ')
15
- html.squeeze!(' ')
16
-
17
- html.gsub!('</p>', "</p>\n\n")
12
+ # @param link_base the url to prefix relative links with
13
+ def textify(link_base: '')
14
+ html = normalize_whitespace(@source_html).gsub('</p>', "</p>\n\n")
18
15
 
19
16
  doc = Nokogiri::HTML(html)
20
17
 
21
18
  doc.search('style').remove
22
19
  doc.search('script').remove
23
20
 
24
- base = doc.search('base').first #<base> is unique by W3C spec
21
+ replace_anchors(doc, link_base)
22
+ replace_headers(doc)
23
+ replace_table(doc)
24
+
25
+ simple_replace(doc, 'hr', "\n----------\n")
26
+ simple_replace(doc, 'br', "\n")
25
27
 
26
- base_url = base ? base['href'] : options[:link_base] || ''
28
+ # doc.search('p').each do |link_node|
29
+ # link_node.inner_html = link_node.inner_html + "\n\n"
30
+ # end
31
+
32
+ # trim, but only single-space character
33
+ doc.text.gsub(/^ +| +$/, '')
34
+ end
35
+
36
+ private
37
+
38
+ def normalize_whitespace(html)
39
+ html.gsub(/\s/, ' ').squeeze(' ')
40
+ end
41
+
42
+ def replace_anchors(doc, link_base)
43
+ # <base> node is unique by W3C spec
44
+ base = doc.search('base').first
45
+ base_url = base ? base['href'] : link_base
27
46
 
28
47
  doc.search('a').each do |link_node|
29
48
  href = URI(link_node['href'])
30
49
  href = base_url + href.to_s unless href.absolute?
31
50
 
32
- link_node.inner_html = "#{link_node.inner_html} (#{href})"
51
+ link_node.inner_html = "#{ link_node.inner_html } (#{ href })"
33
52
  end
53
+ end
34
54
 
55
+ def replace_headers(doc)
35
56
  doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
36
- node.inner_html = "- #{node.inner_html} -\n".squeeze(' ')
57
+ node.inner_html = "- #{ node.inner_html } -\n".squeeze(' ')
37
58
  end
59
+ end
38
60
 
39
- doc.search('hr').each do |node|
40
- node.replace "\n----------\n"
41
- end
61
+ def replace_table(doc)
62
+ doc.css('table').each do |table|
63
+ column_sizes = table.search('tr').collect do |row|
64
+ row.search('th', 'td').collect do |node|
65
+ node.inner_html.length
66
+ end
67
+ end
68
+
69
+ column_sizes = column_sizes.transpose.collect(&:max)
70
+
71
+ table.search('./thead/tr', './tbody/tr', './tr').each do |row|
72
+ replace_table_nodes(row, column_sizes)
73
+
74
+ row.inner_html = "#{ row.inner_html }|\n"
75
+ end
42
76
 
43
- doc.search('br').each do |node|
44
- node.replace "\n"
77
+ table.search('./thead').each do |row|
78
+ header_bottom = "|#{ column_sizes.collect { |len| ('-' * (len + 2)) }.join('|') }|"
79
+
80
+ row.inner_html = "#{ row.inner_html }#{ header_bottom }\n"
81
+ end
82
+
83
+ table.inner_html = "#{ table.inner_html }\n"
45
84
  end
85
+ end
46
86
 
47
- # doc.search('p').each do |link_node|
48
- # link_node.inner_html = link_node.inner_html + "\n\n"
49
- # end
87
+ def replace_table_nodes(row, column_sizes)
88
+ row.search('th', 'td').each_with_index do |node, i|
89
+ new_content = "| #{ node.inner_html }".squeeze(' ')
50
90
 
51
- doc.text.gsub(/^[ ]+|[ ]+$/, '')
91
+ # +2 for the extra spacing between text and pipe
92
+ node.inner_html = new_content.ljust(column_sizes[i] + 2)
93
+ end
94
+ end
95
+
96
+ def simple_replace(doc, tag, replacement)
97
+ doc.search(tag).each do |node|
98
+ node.replace(replacement)
99
+ end
52
100
  end
53
101
  end
54
- end
102
+ end
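Two small behaviour details in the rewritten `Writer` are easy to miss:

- **Whitespace normalisation.** `normalize_whitespace` replaces every `\s` character with a space. Version 0.3.0 replaced only `\n` and `\t`, so carriage returns from CRLF line endings are now collapsed too.
- **Line trimming.** The final `gsub(/^ +| +$/, '')` trims only literal spaces at the start and end of each line.

The methods below restate both steps for illustration. The `_v03`/`_v04` suffixes and `trim_lines` are hypothetical names; in the gem these steps live inside `Writer`.

```ruby
# 0.3.0 behaviour (condensed from the dup / gsub! / squeeze! sequence):
# only newlines and tabs become spaces.
def normalize_whitespace_v03(html)
  html.gsub(/\n|\t/, ' ').squeeze(' ')
end

# 0.4.0 behaviour: every \s character ([ \t\r\n\f\v]) becomes a space,
# then runs of spaces are squeezed to one.
def normalize_whitespace_v04(html)
  html.gsub(/\s/, ' ').squeeze(' ')
end

# Last step of #textify: drop spaces (and only spaces) at each line's edges.
def trim_lines(text)
  text.gsub(/^ +| +$/, '')
end

crlf = "<p>one\r\ntwo</p>"
p normalize_whitespace_v03(crlf) # "<p>one\r two</p>"
p normalize_whitespace_v04(crlf) # "<p>one two</p>"
p trim_lines("  one  \n two\n")  # "one\ntwo\n"
```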
metadata CHANGED
@@ -1,86 +1,118 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ghostwriter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Robin Miller
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-03-07 00:00:00.000000000 Z
11
+ date: 2021-03-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ~>
17
+ - - '='
18
18
  - !ruby/object:Gem::Version
19
- version: '1.6'
19
+ version: 1.8.4
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - '='
25
25
  - !ruby/object:Gem::Version
26
- version: '1.6'
26
+ version: 1.8.4
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: bundler
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - ~>
31
+ - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.10'
33
+ version: '2.2'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - ~>
38
+ - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.10'
40
+ version: '2.2'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: rake
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '10.0'
47
+ version: '13.0'
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '10.0'
54
+ version: '13.0'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: rspec
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - ~>
59
+ - - "~>"
60
60
  - !ruby/object:Gem::Version
61
61
  version: '3.3'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - ~>
66
+ - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: '3.3'
69
- description: ! 'Transforms HTML into plaintext while preserving legibility and functionality. '
69
+ - !ruby/object:Gem::Dependency
70
+ name: rubocop
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.11'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.11'
83
+ - !ruby/object:Gem::Dependency
84
+ name: rubocop-performance
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '1.10'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '1.10'
97
+ description: 'Transforms HTML into plaintext while preserving legibility and functionality.
98
+
99
+ '
70
100
  email:
71
101
  - robin@tenjin.ca
72
102
  executables: []
73
103
  extensions: []
74
104
  extra_rdoc_files: []
75
105
  files:
76
- - .gitignore
77
- - .rspec
78
- - .ruby-version
79
- - .travis.yml
106
+ - ".gitignore"
107
+ - ".rspec"
108
+ - ".rubocop.yml"
109
+ - ".ruby-version"
110
+ - ".travis.yml"
80
111
  - CODE_OF_CONDUCT.md
81
112
  - Gemfile
82
113
  - LICENSE.txt
83
114
  - README.md
115
+ - RELEASE_NOTES.md
84
116
  - Rakefile
85
117
  - bin/console
86
118
  - bin/setup
@@ -98,17 +130,16 @@ require_paths:
98
130
  - lib
99
131
  required_ruby_version: !ruby/object:Gem::Requirement
100
132
  requirements:
101
- - - ! '>='
133
+ - - "~>"
102
134
  - !ruby/object:Gem::Version
103
- version: '0'
135
+ version: '2.4'
104
136
  required_rubygems_version: !ruby/object:Gem::Requirement
105
137
  requirements:
106
- - - ! '>='
138
+ - - ">="
107
139
  - !ruby/object:Gem::Version
108
140
  version: '0'
109
141
  requirements: []
110
- rubyforge_project:
111
- rubygems_version: 2.4.3
142
+ rubygems_version: 3.1.2
112
143
  signing_key:
113
144
  specification_version: 4
114
145
  summary: Intelligently extracts plaintext from an HTML document.
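The `description` entry in the regenerated metadata looks different from the old one for two reasons:

- **Trailing newline.** The gemspec now builds the description with a squiggly heredoc (`<<~DESC`). That strips the common indentation but keeps the final newline, which is why the serialized value shows a blank line before its closing quote.
- **No trailing space.** The old `%q{...}` value ended with a space instead.

The check below uses only the standard library. The YAML snippet re-creates the single-quoted form shown in the metadata; its indentation is an assumption, because the diff view does not preserve leading whitespace.

```ruby
require 'yaml'

old_description = %q{Transforms HTML into plaintext while preserving legibility and functionality. }

new_description = <<~DESC
  Transforms HTML into plaintext while preserving legibility and functionality.
DESC

# Single-quoted YAML scalar in the same shape as the 0.4.0 metadata:
# a line break followed by an empty line folds into one "\n".
metadata_snippet = "description: 'Transforms HTML into plaintext while preserving " \
                   "legibility and functionality.\n\n  '\n"

p old_description.end_with?(' ')                                     # true
p new_description.end_with?("\n")                                    # true
p YAML.safe_load(metadata_snippet)['description'] == new_description # true
```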