html_massage 0.2.1 → 0.3.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 9a3fc809a01adfac1d58cd81df25ac7aaea1ebc7
+   data.tar.gz: 00662d083333766c7b79f16e70b8ce5f565bebc3
+ SHA512:
+   metadata.gz: 06ff063b79bc5f7a12058ff179d2d65c30f089d2225e8452c616393d9e27136a6f7b4924dfb4890e9d9ca4f75b1537c53b4d9d10fab1434f8e8eb828dcd5e4ee
+   data.tar.gz: de7cbde0140cfb12581f3bf97b3534742ce248a16d7efa92b84c2475625554efc0ef6cf894a78108b75c48d91a0397b5f1a15f2ee2490da0eafbc5d0abb6c1ac
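The checksums.yaml entries are hex digests of the two archives packed inside the gem. As a minimal sketch of how such a digest could be checked with Ruby's standard `digest` library (the helper name `digest_matches?` is hypothetical and is not part of RubyGems or of this gem):

```ruby
require 'digest'

# Hypothetical helper: true when the file's SHA512 hex digest
# equals the expected value (for example, one listed in checksums.yaml).
def digest_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex
end
```

Called with the path to an unpacked `data.tar.gz` and the SHA512 value listed for it, this would return `true` only if the archive is byte-for-byte the one that was published.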
data/.travis.yml ADDED
@@ -0,0 +1,6 @@
+ rvm:
+ - 1.9.2
+ - 1.9.3
+ - 2.0.0
+
+ script: "bundle exec rspec --color --format nested spec"
data/README.md CHANGED
@@ -1,81 +1,125 @@
- # html_massage
+ # HTML Massage [![Build Status](https://secure.travis-ci.org/harlantwood/html_massage.png)](https://travis-ci.org/harlantwood/html_massage) [![Gem Version](https://badge.fury.io/rb/html_massage.png)](http://badge.fury.io/rb/html_massage)
 
- Give your HTML a massage, in just the ways it loves:
+ ## Supported Ruby versions
+
+ - 1.9.2
+ - 1.9.3
+ - 2.0.0
+
+ Note that ruby 1.8.x is _not_ supported.
+
+ ## Summary
 
  * Remove headers and footers and navigation, and strip to only the "content" part of the HTML
  * Sanitize tags, removing javascript and styling
- * Convert your HTML to nicely-formatted plain text
+ * Convert HTML to markdown, plain text, or sanitized HTML
+
+ ## Massaging from the command line
+
+ html_massage html https://en.wikipedia.org/wiki/Technological_singularity > singularity.html
+ html_massage text https://en.wikipedia.org/wiki/Technological_singularity > singularity.txt
+ html_massage markdown https://en.wikipedia.org/wiki/Technological_singularity > singularity.md
+
+ These files will look something like:
+
+ ==> singularity.html <==
+ <h1 id="firstHeading" class="firstHeading"><span dir="auto">Technological singularity</span></h1>
 
- ## Sample Usage
+ <p>The <b>technological singularity</b> is the theoretical emergence of greater-than-human <a href="/wiki/Superintelligence" title="Superintelligence">superintelligence</a> through technological means.<sup id="cite_ref-1" class="reference"><a href="#cite_note-1"><span>[</span>1<span>]</span></a></sup> Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual <a href="/wiki/Event_horizon" title="Event horizon">event horizon</a>, beyond which events cannot be predicted or understood.</p>
+ ...
+
+ ==> singularity.md <==
+ # Technological singularity
+
+ The **technological singularity** is the theoretical emergence of greater-than-human [superintelligence](https://en.wikipedia.org/wiki/Superintelligence "Superintelligence") through technological means. [1] Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual [event horizon](https://en.wikipedia.org/wiki/Event_horizon "Event horizon") , beyond which events cannot be predicted or understood.
+ ...
+
+ ==> singularity.txt <==
+ Technological singularity
+
+ The technological singularity is the theoretical emergence of greater-than-human superintelligence through technological means.[1] Since the capabilities of such intelligence would be difficult for an unaided human mind to comprehend, the occurrence of a technological singularity is seen as an intellectual event horizon, beyond which events cannot be predicted or understood.
+ ...
+
+ ## Massaging from Ruby
 
  ### Full Massage
 
- require 'html_massage'
+ * Use default whitelist of tags and attributes to sanitize HTML
+ * Use default selectors (both include and exclude lists) to attempt to capture only the "content" part of the HTML page
+
+ ```ruby
+ require 'html_massage'
 
- html = %{
- <html>
- <head>
- <script type="text/javascript">document.write('I am a bad script');</script>
- </head>
- <body>
- <div id="header">My Site</div>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
+ html = %{
+ <html>
+ <head>
+ <script type="text/javascript">document.write('I am a bad script');</script>
+ </head>
+ <body>
+ <div id="header">My Site</div>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
 
- HtmlMassage.html( html )
- # => "<div>This is some <i>great</i> content!</div>"
+ HtmlMassage.html( html )
+ # => "<div>This is some <i>great</i> content!</div>"
 
- HtmlMassage.markdown( html )
- # => "This is some _great_ content!"
+ HtmlMassage.markdown( html )
+ # => "This is some _great_ content!"
 
- HtmlMassage.text( html )
- # => "This is some great content!"
+ HtmlMassage.text( html )
+ # => "This is some great content!"
+ ```
 
  ### Custom includes and excludes
 
- html = %{
- <html>
- <body>
- <div class="custom_navigation">some links to other pages...</div>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.exclude!( [ '.custom_navigation' ] )
- html_massage.include!( [ 'body' ] )
- html_massage.to_html
- # => <div>This is some <i>great</i> content!</div>
+ ```ruby
+ html = %{
+ <html>
+ <body>
+ <div class="custom_navigation">some links to other pages...</div>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
+
+ html_massage = HtmlMassage.new( html )
+ html_massage.exclude!( [ '.custom_navigation' ] )
+ html_massage.include!( [ 'body' ] )
+ html_massage.to_html
+ # => <div>This is some <i>great</i> content!</div>
+ ```
 
  ### Sanitize HTML
 
- html = %{
- <html>
- <head>
- <script type="text/javascript">document.write('I am a bad script');</script>
- </head>
- <body>
- <div>This is some <i>great</i> content!</div>
- </body>
- </html>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.sanitize!( :elements => ['div'] )
- html_massage.to_html
- # => <div>This is some <i>great</i> content!</div>
+ ```ruby
+ html = %{
+ <html>
+ <head>
+ <script type="text/javascript">document.write('I am a bad script');</script>
+ </head>
+ <body>
+ <div>This is some <i>great</i> content!</div>
+ </body>
+ </html>
+ }
+
+ html_massage = HtmlMassage.new( html )
+ html_massage.sanitize!( :elements => ['div'] )
+ html_massage.to_html
+ # => <div>This is some <i>great</i> content!</div>
+ ```
 
  ### Make Links Absolute
 
- html = %{
- <a href ="/foo/bar.html">Click this link</a>
- }
-
- html_massage = HtmlMassage.new( html )
- html_massage.absolutify_links!( 'http://example.com/joe/page1.html' )
- html_massage.to_html
- # <a href ="http://example.com/foo/bar.html">Click this link</a>
+ ```ruby
+ html = %{
+ <a href ="/foo/bar.html">Click this link</a>
+ }
 
+ html_massage = HtmlMassage.new( html )
+ html_massage.absolutify_links!( 'http://example.com/joe/page1.html' )
+ html_massage.to_html
+ # => <a href ="http://example.com/foo/bar.html">Click this link</a>
+ ```
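To illustrate what making links absolute involves, here is a minimal sketch that uses only Ruby's standard `uri` library. It is not the gem's implementation of `absolutify_links!`, and the helper name `absolutify` is hypothetical; it only shows how a relative `href` resolves against the URL of the page it was found on.

```ruby
require 'uri'

# Hypothetical helper: resolve an href against the URL of its source page.
def absolutify(page_url, href)
  URI.join(page_url, href).to_s
end

# A root-relative path replaces the whole path of the page URL.
absolutify('http://example.com/joe/page1.html', '/foo/bar.html')
# => "http://example.com/foo/bar.html"

# A bare file name is resolved relative to the page's directory.
absolutify('http://example.com/joe/page1.html', 'page2.html')
# => "http://example.com/joe/page2.html"
```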
@@ -1,3 +1,3 @@
  module HtmlMassager
- VERSION = "0.2.1"
+ VERSION = "0.3.0"
  end
metadata CHANGED
@@ -1,113 +1,100 @@
  --- !ruby/object:Gem::Specification
  name: html_massage
  version: !ruby/object:Gem::Version
- version: 0.2.1
- prerelease:
+ version: 0.3.0
  platform: ruby
  authors:
  - Harlan T Wood
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-25 00:00:00.000000000 Z
+ date: 2013-08-29 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.4'
  - !ruby/object:Gem::Dependency
  name: sanitize
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.0'
  - !ruby/object:Gem::Dependency
  name: thor
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: rest-client
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.6'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '1.6'
  - !ruby/object:Gem::Dependency
  name: reverse_markdown
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0.4'
  - !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.5'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '2.5'
- description: ! 'Massages HTML how you want to: sanitize tags, remove headers and footers;
+ description: 'Massages HTML how you want to: sanitize tags, remove headers and footers;
  output to html, markdown, or plain text.'
  email:
  - code@harlantwood.net
@@ -117,6 +104,7 @@ extensions: []
  extra_rdoc_files: []
  files:
  - .gitignore
+ - .travis.yml
  - Gemfile
  - License-MIT
  - README.md
@@ -130,27 +118,26 @@ files:
  - spec/html_massage_spec.rb
  homepage: https://github.com/harlantwood/html_massage
  licenses: []
+ metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 1.8.24
+ rubygems_version: 2.0.7
  signing_key:
- specification_version: 3
+ specification_version: 4
  summary: Massages HTML how you want to.
  test_files:
  - spec/html_massage_spec.rb
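Every dependency in this metadata uses an open-ended `>=` constraint. As a short sketch of how RubyGems evaluates such a constraint, using `Gem::Requirement` and `Gem::Version` (the helper name `satisfies?` is hypothetical and the candidate version numbers are made-up examples):

```ruby
require 'rubygems'

# Hypothetical helper: does the given version string meet the constraint?
def satisfies?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

satisfies?('>= 1.4', '1.6.0')  # => true
satisfies?('>= 1.4', '1.3.3')  # => false
satisfies?('>= 0', '0.0.1')    # => true, since '>= 0' accepts any release version
```

Because none of the constraints has an upper bound, a future major release of any dependency would still satisfy them.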