ghostwriter 0.3.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,15 +1,7 @@
1
1
  ---
2
- !binary "U0hBMQ==":
3
- metadata.gz: !binary |-
4
- MjAxZjQzNDU2NTEyMDgwODc3NTAyYjQyYTRhNGZjOTEwMzg4MTE0Yw==
5
- data.tar.gz: !binary |-
6
- NjBjZWU4OWExMmYyOGU1NDMzNTI4MDhkNzEzMGU3YWFhNjdiM2M0MA==
2
+ SHA256:
3
+ metadata.gz: afedaad685b3c06c7baf6e7b1a7b00d8397f9ad1446ddfdcf835dfc99df19969
4
+ data.tar.gz: 786a612861e5d11e19672057371720c57dd7dc67e1f9b7333bf9269b819b0548
7
5
  SHA512:
8
- metadata.gz: !binary |-
9
- ZDcxMDQ5N2IzMTU0YzlhMzZmMDBkODk2NDVhZjEwZTkyOTY0ZDFmMzZiMmYw
10
- NDc3ZmY5NjgyNzkxOTUzMDg4YWU2NmNkZDgyMDdkNjc4OTMzMmIzYWY1MWY2
11
- OWRhMzgxNjc5ODM4Yjc4NGE5ZTBmODBmNGUzOGU3NDY2YTA5NDk=
12
- data.tar.gz: !binary |-
13
- NTIzYTdjNmRmZTRkMTc4NGIxZWJiYjBhYWVkMDI4ZDM1NTc4NDQ4ZTdkZDZk
14
- YTVlN2Y1OGMzYTg3MjlkYmNhMjUyMGQwNTZlMmIzNjYxYmQyMDIxMjJiOTVi
15
- N2YzMzczODBkZWMxZmY2MmJkYzJkYjQxZjJlZjBjZTM2OWJkMjQ=
6
+ metadata.gz: 610473e1214cd68edd7b1ce952fd43c44c22a29bc0895bebcc57b4e5e00385bccaec2008e23252eaf438a8b9a1ebd56af0a4c3956be53791ef1b5ee8b75dc1a3
7
+ data.tar.gz: 3f3bee72a1515077e7ccff0b1baae94607e33c84c83a84f8b51fdf3234aeaecaa48ddf0d6cfc2a38584ce6c0d0d7f52db5b4021638eb611089087ee638ba2d16
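Note on the checksum format: the 0.3.0 entries are hex digests wrapped in YAML `!binary` base64 (the key `U0hBMQ==` decodes to `SHA1`), while 0.4.0 stores plain hex SHA256 and SHA512 digests. Below is a minimal Ruby sketch of both forms, assuming the digests are taken over the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. The helper names `decode_binary_entry` and `hex_checksums` are made up for illustration and are not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: decode an old-style `!binary` value.
# The 'm' directive of String#unpack1 decodes base64.
def decode_binary_entry(base64_text)
  base64_text.unpack1('m')
end

# Hypothetical helper: compute new-style plain hex digests for one file.
def hex_checksums(path)
  {
    'SHA256' => Digest::SHA256.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

decode_binary_entry('U0hBMQ==') # => "SHA1"
```

To check a downloaded gem, unpack the `.gem` archive, call `hex_checksums('metadata.gz')`, and compare the result with the values listed above.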
data/.rubocop.yml ADDED
@@ -0,0 +1 @@
1
+ inherit_from: ../.rubocop.yml
data/.ruby-version CHANGED
@@ -1 +1 @@
1
- ruby-1.9.3
1
+ ruby-2.7.1
data/Gemfile CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  source 'https://rubygems.org'
2
4
 
3
5
  # Specify gem dependencies in ghostwriter.gemspec
data/README.md CHANGED
@@ -1,6 +1,14 @@
1
1
  # Ghostwriter
2
2
 
3
- Ghostwriter rewrites your emails to conform to varying email client requirements.
3
+ Ghostwriter rewrites HTML as plain text while preserving as much legibility and functionality as possible.
4
+
5
+ It's sort of like a reverse-markdown.
6
+
7
+ ## But Why, Though?
8
+
9
+ * Spam filters tend to like emails with a plain text alternative
10
+ * Some email clients won't or can’t handle HTML at all
11
+ * Some people explicitly choose plaintext just by preference or accessibility
4
12
 
5
13
  ## Installation
6
14
 
@@ -12,25 +20,14 @@ gem 'ghostwriter'
12
20
 
13
21
  And then execute:
14
22
 
15
- $ bundle
23
+ bundle
16
24
 
17
25
  Or install it manually with:
18
26
 
19
- $ gem install ghostwriter
27
+ gem install ghostwriter
20
28
 
21
29
  ## Usage
22
30
 
23
- ###Stripping HTML
24
-
25
- Transform HTML into plaintext while preserving as much legibility and functionality as possible.
26
- It's prime use is in quickly producing an automatic plaintext version of HTML emails.
27
-
28
- Why offer plaintext?
29
-
30
- * Spam filters prefer included plain text alternative
31
- * Some email clients and apps can’t handle HTML
32
- * Some people explicitly choose plaintext, either by requirement or simple preference
33
-
34
31
  Create a `Ghostwriter::Writer` with the html you want modified, and call `#textify`:
35
32
 
36
33
  ```ruby
@@ -39,10 +36,10 @@ html = '<html><body>This is some markup <a href="tenjin.ca">and a link</a><p>Oth
39
36
  Ghostwriter::Writer.new(html).textify
40
37
 
41
38
  => "This is some markup and a link (tenjin.ca)\nOther tags translate, too\n\n"
42
-
39
+
43
40
  ```
44
41
 
45
- `#textify` will use a `<base>` tag if included in the HTML source, or if one is provided explicitly:
42
+ `#textify` will use the `<base>` tag if found in the HTML source, or if one is provided explicitly:
46
43
 
47
44
  ```ruby
48
45
  html = '<html><body>Relative links <a href="/contact">Link</a></body></html>'
@@ -53,9 +50,10 @@ Ghostwriter::Writer.new(html).textify(link_base: 'tenjin.ca')
53
50
 
54
51
  ```
55
52
 
56
- #### Mail Gem Example
53
+ ### Mail Gem Example
57
54
 
58
- To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by passing the html through Ghostwriter:
55
+ To use `#textify` with the [mail](https://github.com/mikel/mail) gem, just provide the text-part by passing the html
56
+ through Ghostwriter:
59
57
 
60
58
  ```ruby
61
59
  require 'mail'
@@ -63,35 +61,49 @@ require 'mail'
63
61
  html = 'My email and a <a href="http://tenjin.ca">link</a>'
64
62
 
65
63
  mail = Mail.deliver do
66
- to 'bob@example.com'
67
- from 'dot@example.com'
68
- subject 'Using Ghostwriter with Mail'
64
+ to 'bob@example.com'
65
+ from 'dot@example.com'
66
+ subject 'Using Ghostwriter with Mail'
69
67
 
70
- html_part do
68
+ html_part do
71
69
  content_type 'text/html; charset=UTF-8'
72
70
  body html
73
- end
74
-
75
- text_part do
76
- body Ghostwriter::Writer.new(html).textify
77
- end
71
+ end
72
+
73
+ text_part do
74
+ body Ghostwriter::Writer.new(html).textify
75
+ end
78
76
  end
79
77
 
80
78
  ```
81
79
 
82
80
  ## Contributing
81
+
83
82
  Bug reports and pull requests are welcome on GitHub at https://github.com/TenjinInc/ghostwriter
84
83
 
85
- This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
84
+ This project is intended to be a friendly space for collaboration, and contributors are expected to adhere to the
86
85
  [Contributor Covenant](contributor-covenant.org) code of conduct.
87
86
 
88
87
  ### Core Developers
89
- After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests.
90
- You can also run `bin/console` for an interactive prompt that will allow you to experiment.
91
88
 
92
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the
93
- version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version,
94
- push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
89
+ After checking out the repo, run `bundle install` to install dependencies. Then, run `rake spec` to run the tests. You
90
+ can also run `bin/console` for an interactive prompt that will allow you to experiment.
91
+
92
+ #### Local Install
93
+ To install this gem onto your local machine only, run
94
+
95
+ `bundle exec rake install`
96
+
97
+ #### Gem Release
98
+ To release a gem to the world at large
99
+
100
+ 1. Update the version number in `version.rb`,
101
+ 2. Run `bundle exec rake release`,
102
+ which will create a git tag for the version,
103
+ push git commits and tags,
104
+ and push the `.gem` file to [rubygems.org](https://rubygems.org).
105
+ 3. Do a wee dance
95
106
 
96
107
  ## License
108
+
97
109
  The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
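The README's Usage section says `#textify` prefixes relative links with the `<base>` href or the `link_base` option. Going by the `replace_anchors` code later in this diff, that prefixing is plain string concatenation rather than full URL resolution. A minimal standalone sketch of that rule follows, using only the standard library; `resolve_href` is a hypothetical helper name, not part of the gem.

```ruby
require 'uri'

# Hypothetical helper mirroring the link rule in Writer#replace_anchors:
# absolute URLs pass through, anything else gets the base prepended as-is.
def resolve_href(href, base_url = '')
  uri = URI(href)
  uri.absolute? ? uri.to_s : base_url + uri.to_s
end

resolve_href('/contact', 'tenjin.ca')                 # => "tenjin.ca/contact"
resolve_href('http://tenjin.ca/about', 'example.com') # => "http://tenjin.ca/about"
```

Because the base is only prepended, a base with a trailing slash combined with an href with a leading slash would yield a double slash. Callers may want to normalize the base first.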
data/RELEASE_NOTES.md ADDED
@@ -0,0 +1,32 @@
1
+ # Release Notes
2
+
3
+ ## 0.4.0 (2021-03-16)
4
+
5
+ ### Major
6
+
7
+ * Updated gem dependencies
8
+
9
+ ### Minor
10
+
11
+ * Updated docs
12
+ * Added support for tables
13
+
14
+ ### Bugfixes
15
+
16
+ * none
17
+
18
+ ## 0.3.0 (2016-03-06)
19
+
20
+ ### Major
21
+
22
+ * Renamed to Ghostwriter
23
+
24
+ ### Minor
25
+
26
+ * Docs: Added instruction for using textify with mail gem
27
+
28
+ ### Bugfixes
29
+
30
+ * none
31
+
32
+
data/Rakefile CHANGED
@@ -1,6 +1,8 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'bundler/gem_tasks'
2
4
  require 'rspec/core/rake_task'
3
5
 
4
6
  RSpec::Core::RakeTask.new(:spec)
5
7
 
6
- task :default => :spec
8
+ task default: :spec
data/bin/console CHANGED
@@ -1,4 +1,6 @@
1
1
  #!/usr/bin/env ruby
2
+ #
3
+ # frozen_string_literal: true
2
4
 
3
5
  require 'bundler/setup'
4
6
  require 'ghostwriter'
@@ -10,5 +12,5 @@ require 'ghostwriter'
10
12
  # require "pry"
11
13
  # Pry.start
12
14
 
13
- require "irb"
15
+ require 'irb'
14
16
  IRB.start
data/dirt-textify.gemspec CHANGED
@@ -1,27 +1,37 @@
1
- # coding: utf-8
2
- lib = File.expand_path('../lib', __FILE__)
1
+ # frozen_string_literal: true
2
+
3
+ lib = File.expand_path('lib', __dir__)
3
4
  $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
5
  require 'ghostwriter/version'
5
6
 
6
- Gem::Specification.new do |gemspec|
7
- gemspec.name = 'ghostwriter'
8
- gemspec.version = Ghostwriter::VERSION
9
- gemspec.authors = ['Robin Miller']
10
- gemspec.email = ['robin@tenjin.ca']
7
+ Gem::Specification.new do |spec|
8
+ spec.name = 'ghostwriter'
9
+ spec.version = Ghostwriter::VERSION
10
+ spec.authors = ['Robin Miller']
11
+ spec.email = ['robin@tenjin.ca']
12
+
13
+ spec.summary = 'Intelligently extracts plaintext from an HTML document.'
14
+ spec.description = <<~DESC
15
+ Transforms HTML into plaintext while preserving legibility and functionality.
16
+ DESC
17
+ spec.homepage = 'https://github.com/TenjinInc/ghostwriter'
18
+ spec.license = 'MIT'
19
+
20
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
21
+ f.match(%r{^(test|spec|features)/})
22
+ end
11
23
 
12
- gemspec.summary = %q{Intelligently extracts plaintext from an HTML document.}
13
- gemspec.description = %q{Transforms HTML into plaintext while preserving legibility and functionality. }
14
- gemspec.homepage = 'https://github.com/TenjinInc/ghostwriter'
15
- gemspec.license = 'MIT'
24
+ spec.bindir = 'exe'
25
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
26
+ spec.require_paths = ['lib']
16
27
 
17
- gemspec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
- gemspec.bindir = 'exe'
19
- gemspec.executables = gemspec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
- gemspec.require_paths = ['lib']
28
+ spec.required_ruby_version = '~> 2.4'
21
29
 
22
- gemspec.add_dependency 'nokogiri', '~> 1.6'
30
+ spec.add_dependency 'nokogiri', '= 1.8.4'
23
31
 
24
- gemspec.add_development_dependency 'bundler', '~> 1.10'
25
- gemspec.add_development_dependency 'rake', '~> 10.0'
26
- gemspec.add_development_dependency 'rspec', '~> 3.3'
32
+ spec.add_development_dependency 'bundler', '~> 2.2'
33
+ spec.add_development_dependency 'rake', '~> 13.0'
34
+ spec.add_development_dependency 'rspec', '~> 3.3'
35
+ spec.add_development_dependency 'rubocop', '~> 1.11'
36
+ spec.add_development_dependency 'rubocop-performance', '~> 1.10'
27
37
  end
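The constraint changes in this gemspec narrow what can be installed: `~> 2.4` admits Ruby 2.4 up to, but not including, 3.0, and `= 1.8.4` pins nokogiri to a single release where `~> 1.6` previously allowed any 1.x from 1.6 on. A short sketch using RubyGems' own `Gem::Requirement` shows the effect; `allows?` is a hypothetical convenience wrapper.

```ruby
require 'rubygems'

# Hypothetical wrapper: does `version` satisfy the requirement string?
def allows?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

allows?('~> 2.4', '2.7.1')  # => true
allows?('~> 2.4', '3.0.0')  # => false, Ruby 3 is excluded
allows?('= 1.8.4', '1.8.5') # => false, nokogiri patch releases are excluded
allows?('~> 1.6', '1.8.5')  # => true under the old 0.3.0 constraint
```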
data/lib/ghostwriter.rb CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  require 'ghostwriter/version'
2
4
  require 'ghostwriter/writer'
3
5
  require 'nokogiri'
data/lib/ghostwriter/version.rb CHANGED
@@ -1,3 +1,5 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Ghostwriter
2
- VERSION = '0.3.0'
4
+ VERSION = '0.4.0'
3
5
  end
data/lib/ghostwriter/writer.rb CHANGED
@@ -1,54 +1,102 @@
1
+ # frozen_string_literal: true
2
+
1
3
  module Ghostwriter
4
+ # Main Ghostwriter converter object.
2
5
  class Writer
3
6
  def initialize(html)
4
7
  @source_html = html
5
8
  end
6
9
 
7
- # Intelligently strips HTML down to text.
10
+ # Strips HTML down to plain text.
8
11
  #
9
- # Options:
10
- # link_base: the url to prefix relative links with
11
- def textify(options={})
12
- html = @source_html.dup
13
-
14
- html.gsub!(/\n|\t/, ' ')
15
- html.squeeze!(' ')
16
-
17
- html.gsub!('</p>', "</p>\n\n")
12
+ # @param link_base the url to prefix relative links with
13
+ def textify(link_base: '')
14
+ html = normalize_whitespace(@source_html).gsub('</p>', "</p>\n\n")
18
15
 
19
16
  doc = Nokogiri::HTML(html)
20
17
 
21
18
  doc.search('style').remove
22
19
  doc.search('script').remove
23
20
 
24
- base = doc.search('base').first #<base> is unique by W3C spec
21
+ replace_anchors(doc, link_base)
22
+ replace_headers(doc)
23
+ replace_table(doc)
24
+
25
+ simple_replace(doc, 'hr', "\n----------\n")
26
+ simple_replace(doc, 'br', "\n")
25
27
 
26
- base_url = base ? base['href'] : options[:link_base] || ''
28
+ # doc.search('p').each do |link_node|
29
+ # link_node.inner_html = link_node.inner_html + "\n\n"
30
+ # end
31
+
32
+ # trim, but only single-space character
33
+ doc.text.gsub(/^ +| +$/, '')
34
+ end
35
+
36
+ private
37
+
38
+ def normalize_whitespace(html)
39
+ html.gsub(/\s/, ' ').squeeze(' ')
40
+ end
41
+
42
+ def replace_anchors(doc, link_base)
43
+ # <base> node is unique by W3C spec
44
+ base = doc.search('base').first
45
+ base_url = base ? base['href'] : link_base
27
46
 
28
47
  doc.search('a').each do |link_node|
29
48
  href = URI(link_node['href'])
30
49
  href = base_url + href.to_s unless href.absolute?
31
50
 
32
- link_node.inner_html = "#{link_node.inner_html} (#{href})"
51
+ link_node.inner_html = "#{ link_node.inner_html } (#{ href })"
33
52
  end
53
+ end
34
54
 
55
+ def replace_headers(doc)
35
56
  doc.search('header, h1, h2, h3, h4, h5, h6').each do |node|
36
- node.inner_html = "- #{node.inner_html} -\n".squeeze(' ')
57
+ node.inner_html = "- #{ node.inner_html } -\n".squeeze(' ')
37
58
  end
59
+ end
38
60
 
39
- doc.search('hr').each do |node|
40
- node.replace "\n----------\n"
41
- end
61
+ def replace_table(doc)
62
+ doc.css('table').each do |table|
63
+ column_sizes = table.search('tr').collect do |row|
64
+ row.search('th', 'td').collect do |node|
65
+ node.inner_html.length
66
+ end
67
+ end
68
+
69
+ column_sizes = column_sizes.transpose.collect(&:max)
70
+
71
+ table.search('./thead/tr', './tbody/tr', './tr').each do |row|
72
+ replace_table_nodes(row, column_sizes)
73
+
74
+ row.inner_html = "#{ row.inner_html }|\n"
75
+ end
42
76
 
43
- doc.search('br').each do |node|
44
- node.replace "\n"
77
+ table.search('./thead').each do |row|
78
+ header_bottom = "|#{ column_sizes.collect { |len| ('-' * (len + 2)) }.join('|') }|"
79
+
80
+ row.inner_html = "#{ row.inner_html }#{ header_bottom }\n"
81
+ end
82
+
83
+ table.inner_html = "#{ table.inner_html }\n"
45
84
  end
85
+ end
46
86
 
47
- # doc.search('p').each do |link_node|
48
- # link_node.inner_html = link_node.inner_html + "\n\n"
49
- # end
87
+ def replace_table_nodes(row, column_sizes)
88
+ row.search('th', 'td').each_with_index do |node, i|
89
+ new_content = "| #{ node.inner_html }".squeeze(' ')
50
90
 
51
- doc.text.gsub(/^[ ]+|[ ]+$/, '')
91
+ # +2 for the extra spacing between text and pipe
92
+ node.inner_html = new_content.ljust(column_sizes[i] + 2)
93
+ end
94
+ end
95
+
96
+ def simple_replace(doc, tag, replacement)
97
+ doc.search(tag).each do |node|
98
+ node.replace(replacement)
99
+ end
52
100
  end
53
101
  end
54
- end
102
+ end
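The new table support computes the widest cell in each column and then left-pads every cell to that width. The sketch below reproduces that arithmetic on plain arrays of strings instead of Nokogiri nodes. It is a simplification under stated assumptions: the method names are illustrative, and the gem's real output may differ because it works on the document's inner HTML.

```ruby
# Widest cell per column. Array#transpose raises IndexError when rows
# have different cell counts, so this assumes a rectangular table.
def column_sizes(rows)
  rows.map { |row| row.map(&:length) }.transpose.map(&:max)
end

# Prefix each cell with "| " and pad to column width + 2, then close the row.
def pad_row(row, sizes)
  cells = row.each_with_index.map { |cell, i| "| #{cell}".ljust(sizes[i] + 2) }
  "#{cells.join}|"
end

# Rule drawn under the header row: dashes of column width + 2 between pipes.
def header_rule(sizes)
  "|#{sizes.map { |len| '-' * (len + 2) }.join('|')}|"
end

sizes = column_sizes([%w[Ship Class], %w[Enterprise Constitution]])
# sizes => [10, 12]
```

One consequence of the `transpose` step is that a table whose rows have differing numbers of `th`/`td` cells, for example one using `colspan`, would raise `IndexError` in this approach rather than degrade gracefully.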
metadata CHANGED
@@ -1,86 +1,118 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ghostwriter
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.4.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Robin Miller
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-03-07 00:00:00.000000000 Z
11
+ date: 2021-03-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ~>
17
+ - - '='
18
18
  - !ruby/object:Gem::Version
19
- version: '1.6'
19
+ version: 1.8.4
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - '='
25
25
  - !ruby/object:Gem::Version
26
- version: '1.6'
26
+ version: 1.8.4
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: bundler
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - ~>
31
+ - - "~>"
32
32
  - !ruby/object:Gem::Version
33
- version: '1.10'
33
+ version: '2.2'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - ~>
38
+ - - "~>"
39
39
  - !ruby/object:Gem::Version
40
- version: '1.10'
40
+ version: '2.2'
41
41
  - !ruby/object:Gem::Dependency
42
42
  name: rake
43
43
  requirement: !ruby/object:Gem::Requirement
44
44
  requirements:
45
- - - ~>
45
+ - - "~>"
46
46
  - !ruby/object:Gem::Version
47
- version: '10.0'
47
+ version: '13.0'
48
48
  type: :development
49
49
  prerelease: false
50
50
  version_requirements: !ruby/object:Gem::Requirement
51
51
  requirements:
52
- - - ~>
52
+ - - "~>"
53
53
  - !ruby/object:Gem::Version
54
- version: '10.0'
54
+ version: '13.0'
55
55
  - !ruby/object:Gem::Dependency
56
56
  name: rspec
57
57
  requirement: !ruby/object:Gem::Requirement
58
58
  requirements:
59
- - - ~>
59
+ - - "~>"
60
60
  - !ruby/object:Gem::Version
61
61
  version: '3.3'
62
62
  type: :development
63
63
  prerelease: false
64
64
  version_requirements: !ruby/object:Gem::Requirement
65
65
  requirements:
66
- - - ~>
66
+ - - "~>"
67
67
  - !ruby/object:Gem::Version
68
68
  version: '3.3'
69
- description: ! 'Transforms HTML into plaintext while preserving legibility and functionality. '
69
+ - !ruby/object:Gem::Dependency
70
+ name: rubocop
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.11'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.11'
83
+ - !ruby/object:Gem::Dependency
84
+ name: rubocop-performance
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '1.10'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '1.10'
97
+ description: 'Transforms HTML into plaintext while preserving legibility and functionality.
98
+
99
+ '
70
100
  email:
71
101
  - robin@tenjin.ca
72
102
  executables: []
73
103
  extensions: []
74
104
  extra_rdoc_files: []
75
105
  files:
76
- - .gitignore
77
- - .rspec
78
- - .ruby-version
79
- - .travis.yml
106
+ - ".gitignore"
107
+ - ".rspec"
108
+ - ".rubocop.yml"
109
+ - ".ruby-version"
110
+ - ".travis.yml"
80
111
  - CODE_OF_CONDUCT.md
81
112
  - Gemfile
82
113
  - LICENSE.txt
83
114
  - README.md
115
+ - RELEASE_NOTES.md
84
116
  - Rakefile
85
117
  - bin/console
86
118
  - bin/setup
@@ -98,17 +130,16 @@ require_paths:
98
130
  - lib
99
131
  required_ruby_version: !ruby/object:Gem::Requirement
100
132
  requirements:
101
- - - ! '>='
133
+ - - "~>"
102
134
  - !ruby/object:Gem::Version
103
- version: '0'
135
+ version: '2.4'
104
136
  required_rubygems_version: !ruby/object:Gem::Requirement
105
137
  requirements:
106
- - - ! '>='
138
+ - - ">="
107
139
  - !ruby/object:Gem::Version
108
140
  version: '0'
109
141
  requirements: []
110
- rubyforge_project:
111
- rubygems_version: 2.4.3
142
+ rubygems_version: 3.1.2
112
143
  signing_key:
113
144
  specification_version: 4
114
145
  summary: Intelligently extracts plaintext from an HTML document.