html-proofer 2.4.2 → 2.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 110942931188ba50ea9cfa54113e766fdb2e82ab
- data.tar.gz: 302722012ef4ae3c3357312bf48cf13214b43696
+ metadata.gz: 1b2c47bd5d3ea56e4d6f2a5f4ee3ba01ed43b337
+ data.tar.gz: 381527e029581dcb0249372f345d838a2950f136
  SHA512:
- metadata.gz: aca39deda0c1c82c566d9b1d4723c632343c5b68193aa0923e6f4ad9ba9042d8eb6ef12a980627c1b021432f01c7bdabbb64f4516dd472a2072750e94526c3e6
- data.tar.gz: 64e204388cff5f32d1995fae38a85ea981bc52f9a420802ae21b499da859ed525a32d23afca8ae4d51c1ab215987439623007279fb468606f50f24e2092ad8e4
+ metadata.gz: 0a72bfe1b0702177843865d6b36008a1136a1e8366c37ffce4ea37a91d8dc876e394167328a77068ae1f2b70d2fa48f7fa0bcd82d0d8b7518d219949ae8cd056
+ data.tar.gz: 25a3d37a3dcf3ff4f8ea439ac1344bdac5f55b4d5621017f7b8b22d4e963fb5f7c8e49d774a223a251117e7407afad464aae471839bdb04f4174edaa3f170370
data/README.md CHANGED
@@ -22,16 +22,10 @@ Or install it yourself as:
 
  **NOTE:** When installation speed matters, set `NOKOGIRI_USE_SYSTEM_LIBRARIES` to `true` in your environment. This is useful for increasing the speed of your Continuous Integration builds.
 
- ### Real-life examples
-
- Project | Repository
- :--- | :---
- [Raspberry Pi documentation](http://www.raspberrypi.org/documentation/) | [raspberrypi/documentation]( https://github.com/raspberrypi/documentation)
- [Open Whisper Systems website](https://whispersystems.org/) | [WhisperSystems/whispersystems.org](https://github.com/WhisperSystems/whispersystems.org)
- [Jekyll website](http://jekyllrb.com/) | [jekyll/jekyll](https://github.com/jekyll/jekyll)
-
  ## What's Tested?
 
+ You can enable or disable most of the following checks.
+
  ### Images
 
  `img` elements:
@@ -44,24 +38,24 @@ Project | Repository
 
  `a`, `link` elements:
 
- * Whether your internal links are not broken; this includes hash references (`#linkToMe`)
+ * Whether your internal links are working
+ * Whether your internal hash references (`#linkToMe`) are working
  * Whether external links are working
 
  ### Scripts
 
  `script` elements:
 
- * Whether your internal script references are not broken
+ * Whether your internal script references are working
  * Whether external scripts are loading
 
  ### Favicon
 
- Checks if your favicons are valid. This is an optional feature, set the `check_favicon` option to turn it on.
+ * Whether your favicons are valid.
 
  ### HTML
 
- Nokogiri looks at the markup and [provides errors](http://www.nokogiri.org/tutorials/ensuring_well_formed_markup.html) when parsing your document.
- This is an optional feature, set the `check_html` option to enable validation errors from Nokogiri.
+ * Whether your HTML markup is valid. This is done via [Nokogiri, to ensure well-formed markup](http://www.nokogiri.org/tutorials/ensuring_well_formed_markup.html).
 
  ## Usage
 
@@ -125,12 +119,34 @@ task :test do
  end
  ```
 
- Don't have or want a `Rakefile`? You _could_ also do something like the following:
+ Don't have or want a `Rakefile`? You can also do something like the following:
 
  ```bash
  htmlproof ./_site
  ```
 
+ ### Array of links
+
+ Instead of a directory as the first argument, you can also pass in an array of links:
+
+ ``` ruby
+ HTML::Proofer.new(["http://github.com", "http://jekyllrb.com"])
+ ```
+
+ This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you'll need to pass a special `--as-links` argument:
+
+ ``` bash
+ htmlproof www.google.com,www.github.com --as-links
+ ```
+
+ ## Ignoring content
+
+ Add the `data-proofer-ignore` attribute to any tag to ignore it from every check.
+
+ ``` html
+ <a href="http://notareallink" data-proofer-ignore>Not checked.</a>
+ ```
+
  ## Configuration
 
  The `HTML::Proofer` constructor takes an optional hash of additional options:
@@ -148,12 +164,13 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
  | `error_sort` | Defines the sort order for error output. Can be `:path`, `:desc`, or `:status`. | `:path`
  | `ext` | The extension of your HTML files including the dot. | `.html`
  | `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
- | `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. | `[]` |
- | `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. | `{}` |
+ | `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. **Will be renamed in a future release.** | `[]` |
+ | `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. **Will be renamed in a future release.** | `{}` |
  | `ignore_script_embeds` | When `check_html` is enabled, `script` tags containing markup [are reported as errors](http://git.io/vOovv). Enabling this option ignores those errors. | `false`
  | `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
  | `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
- | `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. | `false` |
+ | `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
+ | `verbosity` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
 
  ### Configuring Typhoeus and Hydra
 
@@ -163,7 +180,7 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
  HTML::Proofer.new("out/", {:ext => ".htm", :typhoeus => { :verbose => true, :ssl_verifyhost => 2 } })
  ```
 
- This sets `HTML::Proofer`'s extensions to use _.htm_, and gives Typhoeus a configuration for it to be verbose, and use specific SSL settings. Check [the Typhoeus documentation](https://github.com/typhoeus/typhoeus#other-curl-options) for more information on what options it can receive.
+ This sets `HTML::Proofer`'s extensions to use _.htm_, gives Typhoeus a configuration for it to be verbose, and use specific SSL settings. Check [the Typhoeus documentation](https://github.com/typhoeus/typhoeus#other-curl-options) for more information on what options it can receive.
 
  You can similarly pass in a `:hydra` option with a hash configuration for Hydra.
 
@@ -171,40 +188,26 @@ The default value is `typhoeus => { :followlocation => true }`.
 
  ### Configuring Parallel
 
- [Parallel](https://github.com/grosser/parallel) is being used to speed internal file checks. You can pass in any of its options with the options "namespace" `:parallel`. For example:
+ [Parallel](https://github.com/grosser/parallel) is used to speed internal file checks. You can pass in any of its options with the options namespace `:parallel`. For example:
 
  ``` ruby
  HTML::Proofer.new("out/", {:ext => ".htm", :parallel => { :in_processes => 3} })
  ```
 
- `:in_processes => 3` will be passed into Parallel as a configuration option.
-
- ### Array of links
-
- Instead of a directory as the first argument, you can also pass in an array of links:
-
- ``` ruby
- HTML::Proofer.new(["http://github.com", "http://jekyllrb.com"])
- ```
-
- This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you'll need to pass a special `--as-links` argument:
+ In this example, `:in_processes => 3` is passed into Parallel as a configuration option.
 
- ``` bash
- bin/htmlproof www.google.com,www.github.com --as-links
- ```
+ ## Logging
 
- ## Ignoring content
+ HTML-Proofer can be as noisy or as quiet as you'd like. There are two ways to log information:
 
- Add the `data-proofer-ignore` attribute to any tag to ignore it from the checks.
+ * If you set the `:verbose` option to `true`, HTML-Proofer will provide some debug information.
+ * If you set the `:verbosity` option, you can better define the level of logging. See the configuration table above for more information.
 
-
- ``` html
- <a href="http://notareallink" data-proofer-ignore>Not checked.</a>
- ```
+ `:verbosity` is newer and offers better configuration. `:verbose` will be deprecated in a future 3.x.x release.
 
  ## Custom tests
 
- Want to write your own test? Sure! Just create two classes--one that inherits from `HTML::Proofer::CheckRunner`, and another that inherits from `HTML::Proofer::Checkable`.
+ Want to write your own test? Sure! Just create two classes--one that inherits from `HTML::Proofer::Checkable`, and another that inherits from `HTML::Proofer::CheckRunner`.
 
  The `CheckRunner` subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you catch a broken issue, call `add_issue(message)` to explain the error.
 
@@ -214,7 +217,6 @@ Here's an example custom test that protects against `mailto` links that point to
 
  ``` ruby
  class OctocatLinkCheck < ::HTML::Proofer::Checkable
-
  def mailto?
  return false if @data_ignore_proofer || @href.nil? || @href.empty?
  return @href.match /^mailto\:/
@@ -227,13 +229,13 @@ class OctocatLinkCheck < ::HTML::Proofer::Checkable
  end
 
  class MailToOctocat < ::HTML::Proofer::CheckRunner
-
  def run
- @html.css('a').each do |l|
- link = OctocatLinkCheck.new l, self
+ @html.css('a').each do |node|
+ link = OctocatLinkCheck.new(node, self)
+ line = node.line
 
  if link.mailto? && link.octocat?
- return add_issue("Don't email the Octocat directly!")
+ return add_issue("Don't email the Octocat directly!", line)
  end
  end
  end
@@ -244,7 +246,7 @@ end
 
  ### Certificates
 
- To ignore certificates turn off the Typhoeus SSL verification:
+ To ignore certificates, turn off Typhoeus' SSL verification:
 
  ``` ruby
  HTML::Proofer.new("out/", {
@@ -264,3 +266,11 @@ HTML::Proofer.new("out/", {
  :headers => { "User-Agent" => "Mozilla/5.0 (compatible; My New User-Agent)" }
  }}).run
  ```
+
+ ## Real-life examples
+
+ Project | Repository
+ :--- | :---
+ [Raspberry Pi documentation](http://www.raspberrypi.org/documentation/) | [raspberrypi/documentation]( https://github.com/raspberrypi/documentation)
+ [Open Whisper Systems website](https://whispersystems.org/) | [WhisperSystems/whispersystems.org](https://github.com/WhisperSystems/whispersystems.org)
+ [Jekyll website](http://jekyllrb.com/) | [jekyll/jekyll](https://github.com/jekyll/jekyll)
data/bin/htmlproof CHANGED
@@ -8,7 +8,7 @@ require 'mercenary'
  require 'rubygems'
 
  def to_regex?(item)
- if item.start_with? '/' and item.end_with? '/'
+ if item.start_with?('/') && item.end_with?('/')
  Regexp.new item[1...-1]
  else
  item
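A note on the hunk above: `to_regex?` is what lets comma-separated command-line values such as `--url-ignore` carry either plain strings or regular expressions. A minimal sketch of that behavior follows. The closing `end` lines are assumed, since the hunk stops at `item`, and the sample values are made up for illustration.

```ruby
# Reproduction of the helper from the hunk above; the two closing `end`
# lines are an assumption because the diff context is cut off.
def to_regex?(item)
  if item.start_with?('/') && item.end_with?('/')
    # Strip the surrounding slashes and compile the rest as a pattern.
    Regexp.new item[1...-1]
  else
    item
  end
end

# Hypothetical values, as they might arrive from a comma-separated option.
raw_values = ['/\.pdf$/', 'example.com']
converted = raw_values.map { |value| to_regex?(value) }

pdf_pattern = converted[0]   # a Regexp built from "\.pdf$"
plain_string = converted[1]  # left untouched as a String
```

Values wrapped in slashes become `Regexp` objects; everything else passes through unchanged.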
@@ -40,6 +40,7 @@ Mercenary.program(:htmlproof) do |p|
  p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4x status code range.'
  p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing URLs that are safe to ignore.'
  p.option 'verbose', '--verbose', 'Enables more verbose logging.'
+ p.option 'verbosity', '--verbosity', String, 'Sets the logging level, as determined by Yell'
 
  p.action do |args, opts|
  args = ['.'] if args.empty?
@@ -47,7 +48,7 @@ Mercenary.program(:htmlproof) do |p|
 
  options = {}
 
- # prepare every to go to proofer
+ # prepare everything to go to proofer
  p.options.select { |o| !opts[o.config_key].nil? }.each do |option|
  if option.return_type.to_s == 'Array' # TODO: is_a? doesn't work here?
  opts[option.config_key] = opts[option.config_key].map { |i| to_regex?(i) }
@@ -65,6 +66,7 @@ Mercenary.program(:htmlproof) do |p|
  end
 
  options[:error_sort] = opts['error-sort'].to_sym unless opts['error-sort'].nil?
+ options[:verbosity] = opts['verbosity'].to_sym unless opts['verbosity'].nil?
 
  path = path.delete(' ').split(',') if opts['as_links']
 
data/lib/html/proofer.rb CHANGED
@@ -20,20 +20,27 @@ rescue LoadError; end
  module HTML
 
  class Proofer
- include Utils
+ include HTML::Proofer::Utils
 
- attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts
+ attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts, :external_urls
 
  TYPHOEUS_DEFAULTS = {
  :followlocation => true,
  :headers => {
- "User-Agent" => "Mozilla/5.0 (compatible; HTML Proofer/#{VERSION}; +https://github.com/gjtorikian/html-proofer)"
+ 'User-Agent' => "Mozilla/5.0 (compatible; HTML Proofer/#{VERSION}; +https://github.com/gjtorikian/html-proofer)"
  }
  }
 
  def initialize(src, opts = {})
  @src = src
 
+ if opts[:verbose]
+ warn '`@options[:verbose]` will be removed in a future 3.x.x release: http://git.io/vGHHh'
+ end
+ if opts[:href_ignore]
+ warn '`@options[:href_ignore]` will be renamed in a future 3.x.x release: http://git.io/vGHHy'
+ end
+
  @proofer_opts = {
  :ext => '.html',
  :check_favicon => false,
@@ -72,7 +79,7 @@ module HTML
  end
 
  def logger
- @logger ||= HTML::Proofer::Log.new(@options[:verbose])
+ @logger ||= HTML::Proofer::Log.new(@options[:verbose], @options[:verbosity])
  end
 
  def run
@@ -95,27 +102,27 @@ module HTML
 
  def check_list_of_links
  if @options[:href_swap]
- @src = @src.map do |external_url|
- swap(external_url, @options[:href_swap])
+ @src = @src.map do |url|
+ swap(url, @options[:href_swap])
  end
  end
- external_urls = Hash[*@src.map { |s| [s, nil] }.flatten]
- validate_urls(external_urls)
+ @external_urls = Hash[*@src.map { |s| [s, nil] }.flatten]
+ validate_urls
  end
 
  # Collects any external URLs found in a directory of files. Also collectes
  # every failed test from check_files_for_internal_woes.
  # Sends the external URLs to Typhoeus for batch processing.
  def check_directory_of_files
- external_urls = {}
+ @external_urls = {}
  results = check_files_for_internal_woes
 
  results.each do |item|
- external_urls.merge!(item[:external_urls])
+ @external_urls.merge!(item[:external_urls])
  @failed_tests.concat(item[:failed_tests])
  end
 
- validate_urls(external_urls) unless @options[:disable_external]
+ validate_urls unless @options[:disable_external]
 
  logger.log :info, :blue, "Ran on #{files.length} files!\n\n"
  end
@@ -137,8 +144,8 @@ module HTML
  end
  end
 
- def validate_urls(external_urls)
- url_validator = HTML::Proofer::UrlValidator.new(logger, external_urls, @options, @typhoeus_opts, @hydra_opts)
+ def validate_urls
+ url_validator = HTML::Proofer::UrlValidator.new(logger, @external_urls, @options, @typhoeus_opts, @hydra_opts)
  @failed_tests.concat(url_validator.run)
  end
 
@@ -0,0 +1,16 @@
+ module HTML
+ class Proofer
+ module Cache
+ def create_nokogiri(path)
+ if File.exist? path
+ content = File.open(path).read
+ else
+ content = path
+ end
+
+ Nokogiri::HTML(content)
+ end
+ module_function :create_nokogiri
+ end
+ end
+ end
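A note on the new file above: `create_nokogiri` accepts either a file path or a raw markup string and decides between them with `File.exist?`. The sketch below isolates that branch without Nokogiri so it runs on the standard library alone. `read_path_or_content` is a hypothetical name, not part of the gem. It uses `File.read`, which closes the file handle itself, whereas `File.open(path).read` leaves the handle to the garbage collector.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the path-or-content branch of
# create_nokogiri; the Nokogiri::HTML call is deliberately left out.
def read_path_or_content(path_or_markup)
  if File.exist?(path_or_markup)
    # Treat the argument as a path and load the file contents.
    File.read(path_or_markup)
  else
    # Otherwise assume the argument already is the markup.
    path_or_markup
  end
end

# A throwaway file stands in for a generated HTML page.
page = Tempfile.new(['page', '.html'])
page.write('<p>from disk</p>')
page.close

from_file = read_path_or_content(page.path)
from_string = read_path_or_content('<p>inline</p>')
page.unlink
```

One consequence of this design is that a string which happens to name an existing file is always read from disk, so callers passing markup should not pass bare file names.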
@@ -6,7 +6,8 @@ module HTML
  class CheckRunner
 
  attr_reader :issues, :src, :path, :options, :typhoeus_opts, :hydra_opts, :parallel_opts, \
- :validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore
+ :validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, \
+ :empty_alt_ignore
 
  def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts, validation_opts)
  @src = src
@@ -23,6 +24,7 @@ module HTML
  @alt_ignores = @options[:alt_ignore]
  @empty_alt_ignore = @options[:empty_alt_ignore]
  @external_urls = {}
+ @external_domain_paths_with_queries = {}
  end
 
  def run
@@ -33,14 +35,45 @@ module HTML
  @issues << Issue.new(@path, desc, line_number, status)
  end
 
- def add_to_external_urls(href)
- if @external_urls[href]
- @external_urls[href] << @path
+ def add_to_external_urls(url, line)
+ return if @external_urls[url]
+ uri = Addressable::URI.parse(url)
+
+ if uri.query.nil?
+ add_path_for_url(url)
+ else
+ new_url_query_values?(uri, url)
+ end
+ end
+
+ def add_path_for_url(url)
+ if @external_urls[url]
+ @external_urls[url] << @path
  else
- @external_urls[href] = [@path]
+ @external_urls[url] = [@path]
  end
  end
 
+ def new_url_query_values?(uri, url)
+ queries = uri.query_values.keys.join('-')
+ domain_path = extract_domain_path(uri)
+ if @external_domain_paths_with_queries[domain_path].nil?
+ add_path_for_url(url)
+ # remember queries we've seen, ignore future ones
+ @external_domain_paths_with_queries[domain_path] = [queries]
+ else
+ # add queries we haven't seen
+ unless @external_domain_paths_with_queries[domain_path].include?(queries)
+ add_path_for_url(url)
+ @external_domain_paths_with_queries[domain_path] << queries
+ end
+ end
+ end
+
+ def extract_domain_path(uri)
+ uri.host + uri.path
+ end
+
  def self.checks
  classes = []
 
@@ -52,7 +85,7 @@ module HTML
  classes
  end
 
- private
+ private
 
  def remove_ignored(html)
  html.css('code, pre').each(&:unlink)
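A note on the `add_to_external_urls` change in this file: external URLs with a query string are now recorded only once per combination of host, path, and query parameter names, so `?q=a` and `?q=b` on the same page count as one URL to check. The sketch below reproduces that bookkeeping under stated assumptions: it swaps in the standard library's `URI` for `Addressable::URI` so it runs without extra gems, and `UrlCollector` is a hypothetical stand-in for `CheckRunner`, not a class from the gem.

```ruby
require 'uri'

# Hypothetical stand-in for the URL bookkeeping in CheckRunner.
# Uses stdlib URI rather than Addressable (an assumption for portability).
class UrlCollector
  attr_reader :external_urls

  def initialize(path)
    @path = path
    @external_urls = {}
    @seen_query_keys = {} # "host/path" => list of joined parameter names
  end

  def add(url)
    return if @external_urls[url]

    uri = URI.parse(url)
    if uri.query.nil?
      record(url)
    else
      # Only the parameter names matter, not their values.
      keys = URI.decode_www_form(uri.query).map(&:first).uniq.join('-')
      domain_path = "#{uri.host}#{uri.path}"
      seen = (@seen_query_keys[domain_path] ||= [])
      unless seen.include?(keys)
        record(url)
        seen << keys
      end
    end
  end

  private

  def record(url)
    (@external_urls[url] ||= []) << @path
  end
end

collector = UrlCollector.new('index.html')
collector.add('http://example.com/search?q=a')
collector.add('http://example.com/search?q=b')        # same parameter names, skipped
collector.add('http://example.com/search?q=a&page=2') # new parameter names, kept
collector.add('http://example.com/about')             # no query, kept
collected = collector.external_urls.keys
```

The trade-off is fewer external requests at the cost of never checking the skipped variants, which matters only if a site answers differently for different parameter values.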
@@ -5,7 +5,8 @@ module HTML
  class Proofer
  # Represents the superclass from which all checks derive.
  class Checkable
- include HTML::Utils
+ include HTML::Proofer::Utils
+
  attr_reader :line
 
  def initialize(obj, check)
@@ -75,12 +76,12 @@ module HTML
  return true if ignores_pattern_check(@check.url_ignores)
 
  # ignore user defined hrefs
- if 'LinkCheckable' === @type
+ if 'LinkCheckable' == @type
  return true if ignores_pattern_check(@check.href_ignores)
  end
 
  # ignore user defined alts
- if 'ImageCheckable' === @type
+ if 'ImageCheckable' == @type
  return true if ignores_pattern_check(@check.alt_ignores)
  end
  end
@@ -102,7 +103,7 @@ module HTML
  def file_path
  return if path.nil?
 
- if path =~ /^\// # path relative to root
+ if path =~ %r{^/} # path relative to root
  base = File.directory?(@check.src) ? @check.src : File.dirname(@check.src)
  elsif File.exist?(File.expand_path path, @check.src) # relative links, path is a file
  base = File.dirname @check.path
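A note on the two stylistic changes in this file (`===` to `==` on string literals, and `/^\//` to `%r{^/}`): both look behavior-preserving. The quick check below is a sketch with made-up sample paths, not a test taken from the gem.

```ruby
# The two literals are written differently but match the same strings;
# %r{} only avoids having to escape the forward slash.
old_style = /^\//
new_style = %r{^/}

# Hypothetical sample paths.
paths = ['/css/site.css', 'css/site.css', '../index.html']
root_relative_old = paths.select { |p| p =~ old_style }
root_relative_new = paths.select { |p| p =~ new_style }

# For a String receiver, === is plain equality, so == gives the same answer.
type = 'LinkCheckable'
same_for_match = ('LinkCheckable' === type) == ('LinkCheckable' == type)
same_for_mismatch = ('ImageCheckable' === type) == ('ImageCheckable' == type)
```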
@@ -158,7 +159,6 @@ module HTML
  def real_attr(attr)
  attr.to_s unless attr.nil? || attr.empty?
  end
-
  end
  end
  end