html-proofer 2.4.2 → 2.5.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 110942931188ba50ea9cfa54113e766fdb2e82ab
4
- data.tar.gz: 302722012ef4ae3c3357312bf48cf13214b43696
3
+ metadata.gz: 1b2c47bd5d3ea56e4d6f2a5f4ee3ba01ed43b337
4
+ data.tar.gz: 381527e029581dcb0249372f345d838a2950f136
5
5
  SHA512:
6
- metadata.gz: aca39deda0c1c82c566d9b1d4723c632343c5b68193aa0923e6f4ad9ba9042d8eb6ef12a980627c1b021432f01c7bdabbb64f4516dd472a2072750e94526c3e6
7
- data.tar.gz: 64e204388cff5f32d1995fae38a85ea981bc52f9a420802ae21b499da859ed525a32d23afca8ae4d51c1ab215987439623007279fb468606f50f24e2092ad8e4
6
+ metadata.gz: 0a72bfe1b0702177843865d6b36008a1136a1e8366c37ffce4ea37a91d8dc876e394167328a77068ae1f2b70d2fa48f7fa0bcd82d0d8b7518d219949ae8cd056
7
+ data.tar.gz: 25a3d37a3dcf3ff4f8ea439ac1344bdac5f55b4d5621017f7b8b22d4e963fb5f7c8e49d774a223a251117e7407afad464aae471839bdb04f4174edaa3f170370
data/README.md CHANGED
@@ -22,16 +22,10 @@ Or install it yourself as:
22
22
 
23
23
  **NOTE:** When installation speed matters, set `NOKOGIRI_USE_SYSTEM_LIBRARIES` to `true` in your environment. This is useful for increasing the speed of your Continuous Integration builds.
24
24
 
25
- ### Real-life examples
26
-
27
- Project | Repository
28
- :--- | :---
29
- [Raspberry Pi documentation](http://www.raspberrypi.org/documentation/) | [raspberrypi/documentation]( https://github.com/raspberrypi/documentation)
30
- [Open Whisper Systems website](https://whispersystems.org/) | [WhisperSystems/whispersystems.org](https://github.com/WhisperSystems/whispersystems.org)
31
- [Jekyll website](http://jekyllrb.com/) | [jekyll/jekyll](https://github.com/jekyll/jekyll)
32
-
33
25
  ## What's Tested?
34
26
 
27
+ You can enable or disable most of the following checks.
28
+
35
29
  ### Images
36
30
 
37
31
  `img` elements:
@@ -44,24 +38,24 @@ Project | Repository
44
38
 
45
39
  `a`, `link` elements:
46
40
 
47
- * Whether your internal links are not broken; this includes hash references (`#linkToMe`)
41
+ * Whether your internal links are working
42
+ * Whether your internal hash references (`#linkToMe`) are working
48
43
  * Whether external links are working
49
44
 
50
45
  ### Scripts
51
46
 
52
47
  `script` elements:
53
48
 
54
- * Whether your internal script references are not broken
49
+ * Whether your internal script references are working
55
50
  * Whether external scripts are loading
56
51
 
57
52
  ### Favicon
58
53
 
59
- Checks if your favicons are valid. This is an optional feature, set the `check_favicon` option to turn it on.
54
+ * Whether your favicons are valid.
60
55
 
61
56
  ### HTML
62
57
 
63
- Nokogiri looks at the markup and [provides errors](http://www.nokogiri.org/tutorials/ensuring_well_formed_markup.html) when parsing your document.
64
- This is an optional feature, set the `check_html` option to enable validation errors from Nokogiri.
58
+ * Whether your HTML markup is valid. This is done via [Nokogiri, to ensure well-formed markup](http://www.nokogiri.org/tutorials/ensuring_well_formed_markup.html).
65
59
 
66
60
  ## Usage
67
61
 
@@ -125,12 +119,34 @@ task :test do
125
119
  end
126
120
  ```
127
121
 
128
- Don't have or want a `Rakefile`? You _could_ also do something like the following:
122
+ Don't have or want a `Rakefile`? You can also do something like the following:
129
123
 
130
124
  ```bash
131
125
  htmlproof ./_site
132
126
  ```
133
127
 
128
+ ### Array of links
129
+
130
+ Instead of a directory as the first argument, you can also pass in an array of links:
131
+
132
+ ``` ruby
133
+ HTML::Proofer.new(["http://github.com", "http://jekyllrb.com"])
134
+ ```
135
+
136
+ This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you'll need to pass a special `--as-links` argument:
137
+
138
+ ``` bash
139
+ htmlproof www.google.com,www.github.com --as-links
140
+ ```
141
+
142
+ ## Ignoring content
143
+
144
+ Add the `data-proofer-ignore` attribute to any tag to ignore it from every check.
145
+
146
+ ``` html
147
+ <a href="http://notareallink" data-proofer-ignore>Not checked.</a>
148
+ ```
149
+
134
150
  ## Configuration
135
151
 
136
152
  The `HTML::Proofer` constructor takes an optional hash of additional options:
@@ -148,12 +164,13 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
148
164
  | `error_sort` | Defines the sort order for error output. Can be `:path`, `:desc`, or `:status`. | `:path`
149
165
  | `ext` | The extension of your HTML files including the dot. | `.html`
150
166
  | `file_ignore` | An array of Strings or RegExps containing file paths that are safe to ignore. | `[]` |
151
- | `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. | `[]` |
152
- | `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. | `{}` |
167
+ | `href_ignore` | An array of Strings or RegExps containing `href`s that are safe to ignore. Note that non-HTTP(S) URIs are always ignored. **Will be renamed in a future release.** | `[]` |
168
+ | `href_swap` | A hash containing key-value pairs of `RegExp => String`. It transforms links that match `RegExp` into `String` via `gsub`. **Will be renamed in a future release.** | `{}` |
153
169
  | `ignore_script_embeds` | When `check_html` is enabled, `script` tags containing markup [are reported as errors](http://git.io/vOovv). Enabling this option ignores those errors. | `false`
154
170
  | `only_4xx` | Only reports errors for links that fall within the 4xx status code range. | `false` |
155
171
  | `url_ignore` | An array of Strings or RegExps containing URLs that are safe to ignore. It affects all HTML attributes. Note that non-HTTP(S) URIs are always ignored. | `[]` |
156
- | `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. | `false` |
172
+ | `verbose` | If `true`, outputs extra information as the checking happens. Useful for debugging. **Will be deprecated in a future release.**| `false` |
173
+ | `verbosity` | Sets the logging level, as determined by [Yell](https://github.com/rudionrails/yell). | `:info`
157
174
 
158
175
  ### Configuring Typhoeus and Hydra
159
176
 
@@ -163,7 +180,7 @@ The `HTML::Proofer` constructor takes an optional hash of additional options:
163
180
  HTML::Proofer.new("out/", {:ext => ".htm", :typhoeus => { :verbose => true, :ssl_verifyhost => 2 } })
164
181
  ```
165
182
 
166
- This sets `HTML::Proofer`'s extensions to use _.htm_, and gives Typhoeus a configuration for it to be verbose, and use specific SSL settings. Check [the Typhoeus documentation](https://github.com/typhoeus/typhoeus#other-curl-options) for more information on what options it can receive.
183
+ This sets `HTML::Proofer`'s extension to _.htm_, and configures Typhoeus to be verbose and to use specific SSL settings. Check [the Typhoeus documentation](https://github.com/typhoeus/typhoeus#other-curl-options) for more information on what options it can receive.
167
184
 
168
185
  You can similarly pass in a `:hydra` option with a hash configuration for Hydra.
169
186
 
@@ -171,40 +188,26 @@ The default value is `typhoeus => { :followlocation => true }`.
171
188
 
172
189
  ### Configuring Parallel
173
190
 
174
- [Parallel](https://github.com/grosser/parallel) is being used to speed internal file checks. You can pass in any of its options with the options "namespace" `:parallel`. For example:
191
+ [Parallel](https://github.com/grosser/parallel) is used to speed internal file checks. You can pass in any of its options with the options namespace `:parallel`. For example:
175
192
 
176
193
  ``` ruby
177
194
  HTML::Proofer.new("out/", {:ext => ".htm", :parallel => { :in_processes => 3} })
178
195
  ```
179
196
 
180
- `:in_processes => 3` will be passed into Parallel as a configuration option.
181
-
182
- ### Array of links
183
-
184
- Instead of a directory as the first argument, you can also pass in an array of links:
185
-
186
- ``` ruby
187
- HTML::Proofer.new(["http://github.com", "http://jekyllrb.com"])
188
- ```
189
-
190
- This configures Proofer to just test those links to ensure they are valid. Note that for the command-line, you'll need to pass a special `--as-links` argument:
197
+ In this example, `:in_processes => 3` is passed into Parallel as a configuration option.
191
198
 
192
- ``` bash
193
- bin/htmlproof www.google.com,www.github.com --as-links
194
- ```
199
+ ## Logging
195
200
 
196
- ## Ignoring content
201
+ HTML-Proofer can be as noisy or as quiet as you'd like. There are two ways to log information:
197
202
 
198
- Add the `data-proofer-ignore` attribute to any tag to ignore it from the checks.
203
+ * If you set the `:verbose` option to `true`, HTML-Proofer will provide some debug information.
204
+ * If you set the `:verbosity` option, you can better define the level of logging. See the configuration table above for more information.
199
205
 
200
-
201
- ``` html
202
- <a href="http://notareallink" data-proofer-ignore>Not checked.</a>
203
- ```
206
+ `:verbosity` is newer and offers better configuration. `:verbose` will be deprecated in a future 3.x.x release.
204
207
 
205
208
  ## Custom tests
206
209
 
207
- Want to write your own test? Sure! Just create two classes--one that inherits from `HTML::Proofer::CheckRunner`, and another that inherits from `HTML::Proofer::Checkable`.
210
+ Want to write your own test? Sure! Just create two classes--one that inherits from `HTML::Proofer::Checkable`, and another that inherits from `HTML::Proofer::CheckRunner`.
208
211
 
209
212
  The `CheckRunner` subclass must define one method called `run`. This is called on your content, and is responsible for performing the validation on whatever elements you like. When you catch a broken issue, call `add_issue(message)` to explain the error.
210
213
 
@@ -214,7 +217,6 @@ Here's an example custom test that protects against `mailto` links that point to
214
217
 
215
218
  ``` ruby
216
219
  class OctocatLinkCheck < ::HTML::Proofer::Checkable
217
-
218
220
  def mailto?
219
221
  return false if @data_ignore_proofer || @href.nil? || @href.empty?
220
222
  return @href.match /^mailto\:/
@@ -227,13 +229,13 @@ class OctocatLinkCheck < ::HTML::Proofer::Checkable
227
229
  end
228
230
 
229
231
  class MailToOctocat < ::HTML::Proofer::CheckRunner
230
-
231
232
  def run
232
- @html.css('a').each do |l|
233
- link = OctocatLinkCheck.new l, self
233
+ @html.css('a').each do |node|
234
+ link = OctocatLinkCheck.new(node, self)
235
+ line = node.line
234
236
 
235
237
  if link.mailto? && link.octocat?
236
- return add_issue("Don't email the Octocat directly!")
238
+ return add_issue("Don't email the Octocat directly!", line)
237
239
  end
238
240
  end
239
241
  end
@@ -244,7 +246,7 @@ end
244
246
 
245
247
  ### Certificates
246
248
 
247
- To ignore certificates turn off the Typhoeus SSL verification:
249
+ To ignore certificates, turn off Typhoeus' SSL verification:
248
250
 
249
251
  ``` ruby
250
252
  HTML::Proofer.new("out/", {
@@ -264,3 +266,11 @@ HTML::Proofer.new("out/", {
264
266
  :headers => { "User-Agent" => "Mozilla/5.0 (compatible; My New User-Agent)" }
265
267
  }}).run
266
268
  ```
269
+
270
+ ## Real-life examples
271
+
272
+ Project | Repository
273
+ :--- | :---
274
+ [Raspberry Pi documentation](http://www.raspberrypi.org/documentation/) | [raspberrypi/documentation]( https://github.com/raspberrypi/documentation)
275
+ [Open Whisper Systems website](https://whispersystems.org/) | [WhisperSystems/whispersystems.org](https://github.com/WhisperSystems/whispersystems.org)
276
+ [Jekyll website](http://jekyllrb.com/) | [jekyll/jekyll](https://github.com/jekyll/jekyll)
data/bin/htmlproof CHANGED
@@ -8,7 +8,7 @@ require 'mercenary'
8
8
  require 'rubygems'
9
9
 
10
10
  def to_regex?(item)
11
- if item.start_with? '/' and item.end_with? '/'
11
+ if item.start_with?('/') && item.end_with?('/')
12
12
  Regexp.new item[1...-1]
13
13
  else
14
14
  item
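The `to_regex?` helper touched in this hunk decides whether a command-line value is treated as a pattern or as a literal string. The following is a minimal, standalone sketch of that behavior, reproduced from the hunk above; the sample input values are hypothetical and only illustrate what a comma-separated option such as `--url-ignore` might contain.

```ruby
# Mirrors the helper from bin/htmlproof: a value wrapped in slashes becomes a
# Regexp, anything else is passed through unchanged as a String.
def to_regex?(item)
  if item.start_with?('/') && item.end_with?('/')
    Regexp.new(item[1...-1]) # strip the leading and trailing slash
  else
    item
  end
end

# Hypothetical input, as if parsed from `--url-ignore /^mailto:/,example.com`
patterns = ['/^mailto:/', 'example.com'].map { |i| to_regex?(i) }

patterns.each { |pattern| puts "#{pattern.class}: #{pattern.inspect}" }
```

The rewritten condition uses `&&` with explicit parentheses, which avoids the low precedence of `and` and makes the argument boundaries unambiguous.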
@@ -40,6 +40,7 @@ Mercenary.program(:htmlproof) do |p|
40
40
  p.option 'only_4xx', '--only-4xx', 'Only reports errors for links that fall within the 4xx status code range.'
41
41
  p.option 'url_ignore', '--url-ignore link1,[link2,...]', Array, 'Comma-separated list of Strings or RegExps containing URLs that are safe to ignore.'
42
42
  p.option 'verbose', '--verbose', 'Enables more verbose logging.'
43
+ p.option 'verbosity', '--verbosity', String, 'Sets the logging level, as determined by Yell'
43
44
 
44
45
  p.action do |args, opts|
45
46
  args = ['.'] if args.empty?
@@ -47,7 +48,7 @@ Mercenary.program(:htmlproof) do |p|
47
48
 
48
49
  options = {}
49
50
 
50
- # prepare every to go to proofer
51
+ # prepare everything to go to proofer
51
52
  p.options.select { |o| !opts[o.config_key].nil? }.each do |option|
52
53
  if option.return_type.to_s == 'Array' # TODO: is_a? doesn't work here?
53
54
  opts[option.config_key] = opts[option.config_key].map { |i| to_regex?(i) }
@@ -65,6 +66,7 @@ Mercenary.program(:htmlproof) do |p|
65
66
  end
66
67
 
67
68
  options[:error_sort] = opts['error-sort'].to_sym unless opts['error-sort'].nil?
69
+ options[:verbosity] = opts['verbosity'].to_sym unless opts['verbosity'].nil?
68
70
 
69
71
  path = path.delete(' ').split(',') if opts['as_links']
70
72
 
data/lib/html/proofer.rb CHANGED
@@ -20,20 +20,27 @@ rescue LoadError; end
20
20
  module HTML
21
21
 
22
22
  class Proofer
23
- include Utils
23
+ include HTML::Proofer::Utils
24
24
 
25
- attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts
25
+ attr_reader :options, :typhoeus_opts, :hydra_opts, :parallel_opts, :validation_opts, :external_urls
26
26
 
27
27
  TYPHOEUS_DEFAULTS = {
28
28
  :followlocation => true,
29
29
  :headers => {
30
- "User-Agent" => "Mozilla/5.0 (compatible; HTML Proofer/#{VERSION}; +https://github.com/gjtorikian/html-proofer)"
30
+ 'User-Agent' => "Mozilla/5.0 (compatible; HTML Proofer/#{VERSION}; +https://github.com/gjtorikian/html-proofer)"
31
31
  }
32
32
  }
33
33
 
34
34
  def initialize(src, opts = {})
35
35
  @src = src
36
36
 
37
+ if opts[:verbose]
38
+ warn '`@options[:verbose]` will be removed in a future 3.x.x release: http://git.io/vGHHh'
39
+ end
40
+ if opts[:href_ignore]
41
+ warn '`@options[:href_ignore]` will be renamed in a future 3.x.x release: http://git.io/vGHHy'
42
+ end
43
+
37
44
  @proofer_opts = {
38
45
  :ext => '.html',
39
46
  :check_favicon => false,
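The constructor now warns when a soon-to-change option is passed. As a rough illustration of the same idea in isolation, the sketch below collects the messages in a table-driven way. The constant `DEPRECATED_OPTIONS` and the method `deprecation_messages` are hypothetical names invented for this example; the gem itself uses the two inline `if` checks shown in the hunk.

```ruby
# Hypothetical lookup table: option key => what will happen to it.
DEPRECATED_OPTIONS = {
  :verbose     => 'will be removed',
  :href_ignore => 'will be renamed'
}

# Returns one message per deprecated option that is set to a truthy value.
def deprecation_messages(opts)
  DEPRECATED_OPTIONS.select { |key, _| opts[key] }.map do |key, fate|
    "`@options[:#{key}]` #{fate} in a future 3.x.x release"
  end
end

# Kernel#warn writes to standard error, so normal output stays clean.
deprecation_messages(:verbose => true, :ext => '.htm').each { |msg| warn msg }
```

Checking `opts` before the defaults are merged, as the hunk does, means the warning only fires for options the caller actually supplied.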
@@ -72,7 +79,7 @@ module HTML
72
79
  end
73
80
 
74
81
  def logger
75
- @logger ||= HTML::Proofer::Log.new(@options[:verbose])
82
+ @logger ||= HTML::Proofer::Log.new(@options[:verbose], @options[:verbosity])
76
83
  end
77
84
 
78
85
  def run
@@ -95,27 +102,27 @@ module HTML
95
102
 
96
103
  def check_list_of_links
97
104
  if @options[:href_swap]
98
- @src = @src.map do |external_url|
99
- swap(external_url, @options[:href_swap])
105
+ @src = @src.map do |url|
106
+ swap(url, @options[:href_swap])
100
107
  end
101
108
  end
102
- external_urls = Hash[*@src.map { |s| [s, nil] }.flatten]
103
- validate_urls(external_urls)
109
+ @external_urls = Hash[*@src.map { |s| [s, nil] }.flatten]
110
+ validate_urls
104
111
  end
105
112
 
106
113
  # Collects any external URLs found in a directory of files. Also collects
107
114
  # every failed test from check_files_for_internal_woes.
108
115
  # Sends the external URLs to Typhoeus for batch processing.
109
116
  def check_directory_of_files
110
- external_urls = {}
117
+ @external_urls = {}
111
118
  results = check_files_for_internal_woes
112
119
 
113
120
  results.each do |item|
114
- external_urls.merge!(item[:external_urls])
121
+ @external_urls.merge!(item[:external_urls])
115
122
  @failed_tests.concat(item[:failed_tests])
116
123
  end
117
124
 
118
- validate_urls(external_urls) unless @options[:disable_external]
125
+ validate_urls unless @options[:disable_external]
119
126
 
120
127
  logger.log :info, :blue, "Ran on #{files.length} files!\n\n"
121
128
  end
@@ -137,8 +144,8 @@ module HTML
137
144
  end
138
145
  end
139
146
 
140
- def validate_urls(external_urls)
141
- url_validator = HTML::Proofer::UrlValidator.new(logger, external_urls, @options, @typhoeus_opts, @hydra_opts)
147
+ def validate_urls
148
+ url_validator = HTML::Proofer::UrlValidator.new(logger, @external_urls, @options, @typhoeus_opts, @hydra_opts)
142
149
  @failed_tests.concat(url_validator.run)
143
150
  end
144
151
 
@@ -0,0 +1,16 @@
1
+ module HTML
2
+ class Proofer
3
+ module Cache
4
+ def create_nokogiri(path)
5
+ if File.exist? path
6
+ content = File.open(path).read
7
+ else
8
+ content = path
9
+ end
10
+
11
+ Nokogiri::HTML(content)
12
+ end
13
+ module_function :create_nokogiri
14
+ end
15
+ end
16
+ end
@@ -6,7 +6,8 @@ module HTML
6
6
  class CheckRunner
7
7
 
8
8
  attr_reader :issues, :src, :path, :options, :typhoeus_opts, :hydra_opts, :parallel_opts, \
9
- :validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, :empty_alt_ignore
9
+ :validation_opts, :external_urls, :href_ignores, :url_ignores, :alt_ignores, \
10
+ :empty_alt_ignore
10
11
 
11
12
  def initialize(src, path, html, options, typhoeus_opts, hydra_opts, parallel_opts, validation_opts)
12
13
  @src = src
@@ -23,6 +24,7 @@ module HTML
23
24
  @alt_ignores = @options[:alt_ignore]
24
25
  @empty_alt_ignore = @options[:empty_alt_ignore]
25
26
  @external_urls = {}
27
+ @external_domain_paths_with_queries = {}
26
28
  end
27
29
 
28
30
  def run
@@ -33,14 +35,45 @@ module HTML
33
35
  @issues << Issue.new(@path, desc, line_number, status)
34
36
  end
35
37
 
36
- def add_to_external_urls(href)
37
- if @external_urls[href]
38
- @external_urls[href] << @path
38
+ def add_to_external_urls(url, line)
39
+ return if @external_urls[url]
40
+ uri = Addressable::URI.parse(url)
41
+
42
+ if uri.query.nil?
43
+ add_path_for_url(url)
44
+ else
45
+ new_url_query_values?(uri, url)
46
+ end
47
+ end
48
+
49
+ def add_path_for_url(url)
50
+ if @external_urls[url]
51
+ @external_urls[url] << @path
39
52
  else
40
- @external_urls[href] = [@path]
53
+ @external_urls[url] = [@path]
41
54
  end
42
55
  end
43
56
 
57
+ def new_url_query_values?(uri, url)
58
+ queries = uri.query_values.keys.join('-')
59
+ domain_path = extract_domain_path(uri)
60
+ if @external_domain_paths_with_queries[domain_path].nil?
61
+ add_path_for_url(url)
62
+ # remember queries we've seen, ignore future ones
63
+ @external_domain_paths_with_queries[domain_path] = [queries]
64
+ else
65
+ # add queries we haven't seen
66
+ unless @external_domain_paths_with_queries[domain_path].include?(queries)
67
+ add_path_for_url(url)
68
+ @external_domain_paths_with_queries[domain_path] << queries
69
+ end
70
+ end
71
+ end
72
+
73
+ def extract_domain_path(uri)
74
+ uri.host + uri.path
75
+ end
76
+
44
77
  def self.checks
45
78
  classes = []
46
79
 
@@ -52,7 +85,7 @@ module HTML
52
85
  classes
53
86
  end
54
87
 
55
- private
88
+ private
56
89
 
57
90
  def remove_ignored(html)
58
91
  html.css('code, pre').each(&:unlink)
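The new `add_to_external_urls` logic in this file appears to queue only one URL per combination of host, path, and set of query parameter names, so that many links differing only in query values are checked once. The sketch below illustrates that idea on its own. It is an approximation under stated assumptions: `UrlCollector` is a hypothetical class, and it uses Ruby's standard `URI` library in place of `Addressable::URI`, which the gem actually calls.

```ruby
require 'uri'

# Hypothetical stand-in for the URL bookkeeping done by CheckRunner.
class UrlCollector
  attr_reader :external_urls

  def initialize(path)
    @path = path            # file in which the links were found
    @external_urls = {}     # url => [paths]
    @seen_query_keys = {}   # "host/path" => ["key1-key2", ...]
  end

  def add(url)
    return if @external_urls[url]
    uri = URI.parse(url)

    # URLs without a query string are always recorded.
    if uri.query.nil?
      @external_urls[url] = [@path]
      return
    end

    # Only parameter names matter here, not their values.
    keys = URI.decode_www_form(uri.query).map(&:first).uniq.join('-')
    domain_path = uri.host.to_s + uri.path.to_s
    seen = (@seen_query_keys[domain_path] ||= [])
    return if seen.include?(keys)

    seen << keys
    @external_urls[url] = [@path]
  end
end

collector = UrlCollector.new('index.html')
collector.add('http://example.com/search?q=ruby')
collector.add('http://example.com/search?q=python')      # same names, skipped
collector.add('http://example.com/search?q=ruby&page=2') # new names, kept
collector.add('http://example.com/about')

puts collector.external_urls.keys
```

The trade-off is fewer external requests at the cost of not validating every distinct query value, which is usually acceptable for paginated or search-style links.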
@@ -5,7 +5,8 @@ module HTML
5
5
  class Proofer
6
6
  # Represents the superclass from which all checks derive.
7
7
  class Checkable
8
- include HTML::Utils
8
+ include HTML::Proofer::Utils
9
+
9
10
  attr_reader :line
10
11
 
11
12
  def initialize(obj, check)
@@ -75,12 +76,12 @@ module HTML
75
76
  return true if ignores_pattern_check(@check.url_ignores)
76
77
 
77
78
  # ignore user defined hrefs
78
- if 'LinkCheckable' === @type
79
+ if 'LinkCheckable' == @type
79
80
  return true if ignores_pattern_check(@check.href_ignores)
80
81
  end
81
82
 
82
83
  # ignore user defined alts
83
- if 'ImageCheckable' === @type
84
+ if 'ImageCheckable' == @type
84
85
  return true if ignores_pattern_check(@check.alt_ignores)
85
86
  end
86
87
  end
@@ -102,7 +103,7 @@ module HTML
102
103
  def file_path
103
104
  return if path.nil?
104
105
 
105
- if path =~ /^\// # path relative to root
106
+ if path =~ %r{^/} # path relative to root
106
107
  base = File.directory?(@check.src) ? @check.src : File.dirname(@check.src)
107
108
  elsif File.exist?(File.expand_path path, @check.src) # relative links, path is a file
108
109
  base = File.dirname @check.path
@@ -158,7 +159,6 @@ module HTML
158
159
  def real_attr(attr)
159
160
  attr.to_s unless attr.nil? || attr.empty?
160
161
  end
161
-
162
162
  end
163
163
  end
164
164
  end