html_massage 0.2.1 → 0.3.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 9a3fc809a01adfac1d58cd81df25ac7aaea1ebc7
+   data.tar.gz: 00662d083333766c7b79f16e70b8ce5f565bebc3
+ SHA512:
+   metadata.gz: 06ff063b79bc5f7a12058ff179d2d65c30f089d2225e8452c616393d9e27136a6f7b4924dfb4890e9d9ca4f75b1537c53b4d9d10fab1434f8e8eb828dcd5e4ee
+   data.tar.gz: de7cbde0140cfb12581f3bf97b3534742ce248a16d7efa92b84c2475625554efc0ef6cf894a78108b75c48d91a0397b5f1a15f2ee2490da0eafbc5d0abb6c1ac
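The digests recorded in `checksums.yaml` can be compared against the `metadata.gz` and `data.tar.gz` files unpacked from a downloaded `.gem` archive. A minimal sketch using Ruby's standard `digest` library follows; the helper name `digests_for` and the throwaway sample file are illustrative assumptions, not part of html_massage or RubyGems.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: returns the SHA1 and SHA512 hex digests of a file,
# the two algorithms listed in checksums.yaml.
def digests_for(path)
  {
    'SHA1'   => Digest::SHA1.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Demonstration on a throwaway file. For a real check, point this at the
# metadata.gz or data.tar.gz extracted from the .gem archive and compare
# the result with the values recorded in checksums.yaml.
Tempfile.create('sample') do |f|
  f.write('abc')
  f.flush
  digests_for(f.path).each { |algorithm, hex| puts "#{algorithm}: #{hex}" }
end
```

A mismatch between a computed digest and the recorded one would suggest the archive was altered or corrupted after it was built.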
data/.travis.yml ADDED
@@ -0,0 +1,6 @@
+ rvm:
+ - 1.9.2
+ - 1.9.3
+ - 2.0.0
+
+ script: "bundle exec rspec --color --format nested spec"
data/README.md CHANGED
@@ -1,81 +1,125 @@
- # html_massage
+ # HTML Massage [![Build Status](https://secure.travis-ci.org/harlantwood/html_massage.png)](https://travis-ci.org/harlantwood/html_massage) [![Gem Version](https://badge.fury.io/rb/html_massage.png)](http://badge.fury.io/rb/html_massage)
 
- Give your HTML a massage, in just the ways it loves:
+ ## Supported Ruby versions
+
+ - 1.9.2
+ - 1.9.3
+ - 2.0.0
+
+ Note that ruby 1.8.x is _not_ supported.
+
+ ## Summary
 
  * Remove headers and footers and navigation, and strip to only the "content" part of the HTML
  * Sanitize tags, removing javascript and styling
- * Convert your HTML to nicely-formatted plain text
+ * Convert HTML to markdown, plain text, or sanitized HTML
+
+ ## Massaging from the command line
+
+ html_massage html https://en.wikipedia.org/wiki/Technological_singularity > singularity.html
+ html_massage text https://en.wikipedia.org/wiki/Technological_singularity > singularity.txt
+ html_massage markdown https://en.wikipedia.org/wiki/Technological_singularity > singularity.md
+
+ These files will look something like:
+
+ ==> singularity.html <==
+ <h1 id="firstHeading" class="firstHeading"><span dir="auto">Technological singularity</span></h1>
 
- ## Sample Usage
+ <p>The <b>technological singularity</b> is the theoretical emergence of greater-than-human <a href="/wiki/Superintelligence" title="Superintelligence">superintelligence</a> through technological means.<sup id="cite_ref-1" class="reference"><a href="#cite_note-1"><span>[</span>1<span>]</span></a></sup> Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual <a href="/wiki/Event_horizon" title="Event horizon">event horizon</a>, beyond which events cannot be predicted or understood.</p>
+ ...
+
+ ==> singularity.md <==
+ # Technological singularity
+
+ The **technological singularity** is the theoretical emergence of greater-than-human [superintelligence](https://en.wikipedia.org/wiki/Superintelligence "Superintelligence") through technological means. [1] Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual [event horizon](https://en.wikipedia.org/wiki/Event_horizon "Event horizon") , beyond which events cannot be predicted or understood.
+ ...
+
+ ==> singularity.txt <==
+ Technological singularity
+
+ The technological singularity is the theoretical emergence of greater-than-human superintelligence through technological means.[1] Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual event horizon, beyond which events cannot be predicted or understood.
+ ...
+
+ ## Massaging from Ruby
 
  ### Full Massage
 
- require 'html_massage'
+ * Use default whitelist of tags and attributes to sanitize HTML
+ * Use default selectors (both include and exclude lists) to attempt to capture only the "content" part of the HTML page
+
+ ```ruby
+ require 'html_massage'
 
- html = %{
- <html>
- <head>
- <script type="text/javascript">document.write('I am a bad script');</script>
- </head>
- <body>
- <div id="header">My Site</div>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
+ html = %{
+ <html>
+ <head>
+ <script type="text/javascript">document.write('I am a bad script');</script>
+ </head>
+ <body>
+ <div id="header">My Site</div>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
 
- HtmlMassage.html( html )
- # => "<div>This is some <i>great</i> content!</div>"
+ HtmlMassage.html( html )
+ # => "<div>This is some <i>great</i> content!</div>"
 
- HtmlMassage.markdown( html )
- # => "This is some _great_ content!"
+ HtmlMassage.markdown( html )
+ # => "This is some _great_ content!"
 
- HtmlMassage.text( html )
- # => "This is some great content!"
+ HtmlMassage.text( html )
+ # => "This is some great content!"
+ ```
 
  ### Custom includes and excludes
 
- html = %{
- <html>
- <body>
- <div class="custom_navigation">some links to other pages...</div>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.exclude!( [ '.custom_navigation' ] )
- html_massage.include!( [ 'body' ] )
- html_massage.to_html
- # => <div>This is some <i>great</i> content!</div>
+ ```ruby
+ html = %{
+ <html>
+ <body>
+ <div class="custom_navigation">some links to other pages...</div>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
+
+ html_massage = HtmlMassage.new( html )
+ html_massage.exclude!( [ '.custom_navigation' ] )
+ html_massage.include!( [ 'body' ] )
+ html_massage.to_html
+ # => <div>This is some <i>great</i> content!</div>
+ ```
 
  ### Sanitize HTML
 
- html = %{
- <html>
- <head>
- <script type="text/javascript">document.write('I am a bad script');</script>
- </head>
- <body>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.sanitize!( :elements => ['div'] )
- html_massage.to_html
- # => <div>This is some <i>great</i> content!</div>
+ ```ruby
+ html = %{
+ <html>
+ <head>
+ <script type="text/javascript">document.write('I am a bad script');</script>
+ </head>
+ <body>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
+
+ html_massage = HtmlMassage.new( html )
+ html_massage.sanitize!( :elements => ['div'] )
+ html_massage.to_html
+ # => <div>This is some <i>great</i> content!</div>
+ ```
 
  ### Make Links Absolute
 
- html = %{
- <a href ="/foo/bar.html">Click this link</a>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.absolutify_links!( 'http://example.com/joe/page1.html' )
- html_massage.to_html
- # <a href ="http://example.com/foo/bar.html">Click this link</a>
+ ```ruby
+ html = %{
+ <a href ="/foo/bar.html">Click this link</a>
+ }
 
+ html_massage = HtmlMassage.new( html )
+ html_massage.absolutify_links!( 'http://example.com/joe/page1.html' )
+ html_massage.to_html
+ # => <a href ="http://example.com/foo/bar.html">Click this link</a>
+ ```
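The "Make Links Absolute" example in the README resolves a relative `href` against the URL of the page it came from. The general resolution rule can be sketched with Ruby's standard `uri` library. This only illustrates how relative references resolve; it is not necessarily how `absolutify_links!` is implemented, and the `http://other.example/x` URL is made up for the demonstration.

```ruby
require 'uri'

base = 'http://example.com/joe/page1.html'

# A root-relative path replaces the whole path of the base URL.
puts URI.join(base, '/foo/bar.html')          # http://example.com/foo/bar.html

# A document-relative path is resolved against the base URL's directory.
puts URI.join(base, 'bar.html')               # http://example.com/joe/bar.html

# A reference that is already absolute is returned unchanged.
puts URI.join(base, 'http://other.example/x') # http://other.example/x
```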
@@ -1,3 +1,3 @@
  module HtmlMassager
- VERSION = "0.2.1"
+ VERSION = "0.3.0"
  end
metadata CHANGED
@@ -1,113 +1,100 @@
  --- !ruby/object:Gem::Specification
  name: html_massage
  version: !ruby/object:Gem::Version
- version: 0.2.1
- prerelease:
+ version: 0.3.0
  platform: ruby
  authors:
  - Harlan T Wood
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-25 00:00:00.000000000 Z
+ date: 2013-08-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.4'
  - !ruby/object:Gem::Dependency
  name: sanitize
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.0'
  - !ruby/object:Gem::Dependency
  name: thor
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: rest-client
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.6'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.6'
  - !ruby/object:Gem::Dependency
  name: reverse_markdown
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0.4'
  - !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.5'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.5'
- description: ! 'Massages HTML how you want to: sanitize tags, remove headers and footers;
+ description: 'Massages HTML how you want to: sanitize tags, remove headers and footers;
  output to html, markdown, or plain text.'
  email:
  - code@harlantwood.net
@@ -117,6 +104,7 @@ extensions: []
  extra_rdoc_files: []
  files:
  - .gitignore
+ - .travis.yml
  - Gemfile
  - License-MIT
  - README.md
@@ -130,27 +118,26 @@ files:
  - spec/html_massage_spec.rb
  homepage: https://github.com/harlantwood/html_massage
  licenses: []
+ metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 1.8.24
+ rubygems_version: 2.0.7
  signing_key:
- specification_version: 3
+ specification_version: 4
  summary: Massages HTML how you want to.
  test_files:
  - spec/html_massage_spec.rb
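
Every dependency in this metadata carries an open-ended `>=` constraint. What such a constraint accepts can be checked with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. In the sketch below the constraint is the nokogiri one from the metadata, while the candidate version numbers are made up for illustration.

```ruby
require 'rubygems'

# The nokogiri constraint recorded in the gem metadata.
requirement = Gem::Requirement.new('>= 1.4')

# Hypothetical candidate versions, chosen only to exercise the constraint.
%w[1.3.9 1.4 1.6.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok ? 'satisfies' : 'does not satisfy'} #{requirement}"
end
```

Because `>=` has no upper bound, any later release of a dependency, including one with breaking changes, would also satisfy it.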