w3clove 0.2.1 → 0.3.0
- data/README.rdoc +4 -16
- data/lib/w3clove.rb +1 -0
- data/lib/w3clove/page.rb +16 -5
- data/lib/w3clove/reporter.rb +21 -0
- data/lib/w3clove/sitemap.rb +6 -2
- data/lib/w3clove/templates/w3clove.html.erb +93 -0
- data/lib/w3clove/validator.rb +4 -54
- data/lib/w3clove/version.rb +1 -1
- metadata +6 -4
data/README.rdoc
CHANGED
@@ -2,21 +2,7 @@
 
 This is my {Ruby Mendicant University}[http://university.rubymendicant.com/] personal project, and is currently in alpha status.
 
-
-
-Currently, the official {W3C Validator site}[http://validator.w3.org/] only lets you validate one URL at a time, so when you want to validate all the pages on a web site, it can be a tedious process. There is a {related tool}[http://www.htmlhelp.com/tools/validator/batch.html.en] that lets you use a batch mode for this and submit a list of URLs to be checked, but it is still a semi-manual process, and the output is not very useful.
-
-My plan then is building a command line utility that would accept as input a XML sitemap file, or its URL, expecting it to be on the {Google Sitemap format}[http://en.wikipedia.org/wiki/Google_Sitemaps]. This utility will then check the markup validation of each URL on this sitemap querying the W3C Validator, and store all detected errors and warnings. After checking all the URLs, it will generate as output an HTML file, with a style similar to what RCov[https://github.com/relevance/rcov] produces, showing all these errors on an easy to read format, grouping common errors together, sorting them by popularity, and linking to the URLs and to the explanations on how to correct them.
-
-Internally, it would use the {w3c_validators gem}[http://rubygems.org/gems/w3c_validators] to do the individual checks, so my gem would be concerned only with the XML sitemap parsing, building the queue, storing the errors, grouping and sorting them, and producing the HTML output.
-
-I've already done something similar to this, I sent {a little contribution to docrails}[https://github.com/lifo/docrails/blob/master/railties/guides/w3c_validator.rb] that checks the generated guides using this gem.
-
-= Bonus points:
-
-* in addition to an XML file, accept as input the URL of a site and crawl the site to find all internal links
-* validate the markup locally, without querying the W3C site, for more speed and to not saturate the W3C site
-* store the results on a local database, so on subsequent checks, only the pages that had errors are re-checked (unless a --checkall force flag is passed). This way developers can check the whole site, get the errors, deploy the corrections, and recheck the site.
+It's a site-wide markup validator, a Ruby gem that lets you validate a whole web site against the W3C Markup Validator, from the command line, and generate a comprehensive report of all errors found.
 
 = Installation:
 
@@ -28,7 +14,9 @@ w3clove is a Ruby gem that can be installed on the usual way
 
 Pass it the url of an XML sitemap to be checked, like:
 
-w3clove
+w3clove https://github.com/jaimeiniesta/w3clove/raw/master/spec/samples/sitemap.xml
+
+This will validate all the URLs of your sitemap.xml (so be sure to pass it a short one or you'll wait for ages), and generate an HTML file with the full report.
 
 = Notes:
 
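The README asks for the URL of an XML sitemap. The parsing itself presumably happens inside `W3Clove::Sitemap`, which is not shown in this diff, so the following is only a minimal sketch, assuming the standard sitemaps.org `<urlset>/<url>/<loc>` layout, of how the page URLs could be pulled out with Ruby's REXML library. `sitemap_urls` is a hypothetical helper, not part of w3clove, and the sample XML is made up.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of w3clove): returns the text of every
# <loc> element in a sitemap document, in document order.
def sitemap_urls(xml)
  doc = REXML::Document.new(xml)
  urls = []
  # each_recursive visits every descendant element; matching on the local
  # name avoids having to deal with the sitemaps.org default namespace.
  doc.root.each_recursive do |element|
    urls << element.text.strip if element.name == 'loc' && element.text
  end
  urls
end

sample = <<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/about</loc></url>
</urlset>
XML

p sitemap_urls(sample)
```

Matching on the element's local name keeps the sketch short; a stricter version would also check the namespace URI.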
data/lib/w3clove.rb
CHANGED
data/lib/w3clove/page.rb
CHANGED
@@ -1,5 +1,6 @@
 # -*- encoding: utf-8 -*-
 
+require 'timeout'
 require 'w3c_validators'
 include W3CValidators
 
@@ -16,15 +17,19 @@ module W3Clove
     end
 
     ##
-    # Checks for errors and returns true if none found, false otherwise
-    #
-    # warnings but without errors will return true
+    # Checks for errors and returns true if none found, false otherwise.
+    # Warnings are not considered as validation errors so a page with
+    # warnings but without errors will return true.
     # If the validation goes well, errors should be an array. Otherwise
-    # it will still be nil, which will not be considered validated
+    # it will still be nil, which will not be considered validated.
     def valid?
       !errors.nil? && errors.empty?
     end
 
+    ##
+    # Returns the collection of errors from the validations of this page.
+    # If it has no validation errors, it will be an empty array.
+    # If an exception occurs, it will be nil.
     def errors
       @errors ||= validations.errors.map {|e|
         W3Clove::Message.new(e.message_id,
@@ -36,6 +41,10 @@ module W3Clove
       nil
     end
 
+    ##
+    # Returns the collection of warnings from the validations of this page.
+    # If it has no validation warnings, it will be an empty array.
+    # If an exception occurs, it will be nil.
     def warnings
       @warnings ||= validations.warnings.map {|w|
         W3Clove::Message.new(w.message_id,
@@ -49,8 +58,10 @@ module W3Clove
 
     private
 
+    ##
+    # Gets the validations for this page, ensuring it times out soon
     def validations
-      @validations ||= MarkupValidator.new.validate_uri(url)
+      @validations ||= Timeout::timeout(10) { MarkupValidator.new.validate_uri(url) }
     end
   end
 end
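The new `validations` method wraps the W3C request in `Timeout::timeout(10)`, which raises `Timeout::Error` if the block takes longer than ten seconds. The diff's comments say `errors` and `warnings` return nil when an exception occurs, though the rescue itself sits outside the hunks shown. The following standalone sketch illustrates that pattern with a simulated slow call in place of `MarkupValidator#validate_uri`; `slow_validation` and `safe_validation` are hypothetical names, and the short limits are only there to keep the demo fast.

```ruby
require 'timeout'

# Hypothetical stand-in for a slow network call such as a remote validation.
def slow_validation(seconds)
  sleep(seconds)
  :validated
end

# Runs the simulated validation under a time limit. On timeout the
# Timeout::Error is rescued and nil is returned, so callers can tell
# "no result" apart from "validated with no errors".
def safe_validation(seconds, limit)
  Timeout.timeout(limit) { slow_validation(seconds) }
rescue Timeout::Error
  nil
end

p safe_validation(0.01, 1)  # finishes within the limit
p safe_validation(2, 0.1)   # exceeds the limit
```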
data/lib/w3clove/reporter.rb
@@ -0,0 +1,21 @@
+# -*- encoding: utf-8 -*-
+
+require 'erb'
+
+module W3Clove
+  module Reporter
+    extend self
+
+    ##
+    # Create the html report for the sitemap
+    def generate_html(sitemap)
+      template = ERB.new(open(File.dirname(__FILE__)+'/templates/w3clove.html.erb').read)
+
+      File.open('w3clove.html', 'w') do |f|
+        f.write(template.result(sitemap.get_binding))
+      end
+    rescue Exception => e
+      puts "ERROR generating report: #{e}"
+    end
+  end
+end
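`Reporter.generate_html` evaluates the ERB template against `sitemap.get_binding`, so the template can call the sitemap's own methods (`errors`, `warnings`, `pages`) and read its instance variables such as `@url`. The following self-contained sketch reproduces that mechanism with a hypothetical `DemoSitemap` class and an inline template instead of the gem's template file. It returns the rendered string rather than writing `w3clove.html`, so the result is easy to inspect.

```ruby
require 'erb'

# Hypothetical stand-in for W3Clove::Sitemap: just enough state for a template.
class DemoSitemap
  attr_reader :errors, :warnings

  def initialize(url, errors, warnings)
    @url = url
    @errors = errors
    @warnings = warnings
  end

  # Exposes this object's binding, so the template sees its methods and ivars.
  def get_binding
    binding
  end
end

TEMPLATE = <<ERB_TEMPLATE
Report for <%= @url %>
TOTAL: <%= errors.length %> errors, <%= warnings.length %> warnings.
ERB_TEMPLATE

sitemap = DemoSitemap.new('http://example.com/sitemap.xml', %w[e1 e2], %w[w1])
report = ERB.new(TEMPLATE).result(sitemap.get_binding)
puts report
```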
data/lib/w3clove/sitemap.rb
CHANGED
@@ -19,11 +19,15 @@ module W3Clove
     end
 
     def errors
-      pages.map {|p| p.errors}.flatten
+      @errors ||= pages.map {|p| p.errors}.flatten.reject {|e| e.nil?}
     end
 
     def warnings
-      pages.map {|p| p.warnings}.flatten
+      @warnings ||= pages.map {|p| p.warnings}.flatten.reject {|e| e.nil?}
+    end
+
+    def get_binding
+      binding
     end
 
     private
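`Sitemap#errors` and `Sitemap#warnings` now memoize their result and drop the `nil` that a page returns when its validation raised an exception, so the report totals only count pages that were actually validated. A small sketch of the same aggregation, using a hypothetical `DemoPage` struct in place of `W3Clove::Page` and made-up error strings:

```ruby
# Hypothetical stand-in for W3Clove::Page: errors is an array, or nil when
# validation raised an exception.
DemoPage = Struct.new(:url, :errors)

pages = [
  DemoPage.new('/ok', []),
  DemoPage.new('/broken', ['sample error A', 'sample error B']),
  DemoPage.new('/failed', nil) # e.g. the validator request timed out
]

# Same shape as the new Sitemap#errors: flatten the per-page arrays, then
# discard the nil left behind by the failed page.
all_errors = pages.map { |p| p.errors }.flatten.reject { |e| e.nil? }

p all_errors
```

On an array, `reject { |e| e.nil? }` behaves like `compact`. Because an empty array is truthy in Ruby, `@errors ||=` also caches an empty result, so the pages are only traversed once.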
data/lib/w3clove/templates/w3clove.html.erb
@@ -0,0 +1,93 @@
+<html>
+  <head>
+    <title>W3Clove report for <%= @url %></title>
+    <link href='http://fonts.googleapis.com/css?family=Cabin+Sketch:bold' rel='stylesheet' type='text/css'>
+    <style type="text/css">
+      #header, #main, #footer {
+        padding: 5px; margin: 5px;
+        background-color: #eee;
+        border: 1px solid black;
+      }
+      #header, h1, h2, h3 {
+        font-family: 'Cabin Sketch', arial, serif;
+      }
+      #header {
+        font-size: 2.5em;
+      }
+      #footer {
+        font-size: .85em;
+        text-align: center;
+      }
+      .page {
+        padding: 5px; margin: 5px;
+        border: 1px solid black;
+        background-color: white;
+      }
+    </style>
+  </head>
+  <body>
+    <div id="header">
+      W3Clove :: site-wide markup validation
+    </div>
+
+    <div id="main">
+      <h1>Report for <%= @url %></h1>
+
+      <h2>SITEMAP SUMMARY</h2>
+      <p>TOTAL: <%= errors.length %> errors, <%= warnings.length %> warnings.</p>
+
+      <% if errors.length > 0 %>
+        <h2>POPULAR ERRORS</h2>
+        <ul>
+          <% errors.group_by {|e| e.message_id}.sort_by {|m,e| e.length}.reverse.each do |message_id, errors| %>
+            <li>error <a href="http://validator.w3.org/docs/errors.html#ve-<%= message_id %>"><%= message_id %></a> happens <%= errors.length %> times</li>
+          <% end %>
+        </ul>
+      <% end %>
+
+      <% if warnings.length > 0 %>
+        <h2>POPULAR WARNINGS</h2>
+        <ul>
+          <% warnings.group_by {|w| w.message_id}.sort_by {|m,w| w.length}.reverse.each do |message_id, warnings| %>
+            <li>warning <a href="http://validator.w3.org/docs/errors.html#ve-<%= message_id %>"><%= message_id %></a> found <%= warnings.length %> times</li>
+          <% end %>
+        </ul>
+      <% end %>
+
+      <% processed_pages = pages.select{|p| !p.exception} %>
+      <% if processed_pages.size > 0 %>
+        <h2>DETAILS PER PAGE</h2>
+        <% processed_pages.each do |page| %>
+          <div class='page'>
+            <h3><%= page.url %></h3>
+            <p class='page_summary'>
+              <%= "#{page.errors.length} errors, #{page.warnings.length} warnings" %>
+            </p>
+
+            <ul class="page_errors">
+              <% page.errors.each do |error| %>
+                <li>
+                  Error <a href="http://validator.w3.org/docs/errors.html#ve-<%= error.message_id %>"><%= error.message_id %></a> on line <%= error.line %>:
+                  <%= error.text %>
+                </li>
+              <% end %>
+            </ul>
+
+            <ul class="page_warnings">
+              <% page.warnings.each do |warning| %>
+                <li>
+                  Warning <a href="http://validator.w3.org/docs/errors.html#ve-<%= warning.message_id %>"><%= warning.message_id %></a> on line <%= warning.line %>:
+                  <%= warning.text %>
+                </li>
+              <% end %>
+            </ul>
+          </div>
+        <% end %>
+      <% end %>
+    </div>
+
+    <div id="footer">
+      <strong>w3clove</strong>@2011 <a href='http://jaimeiniesta.com'>Jaime Iniesta</a>. Get w3clove from <a href="http://github.com/jaimeiniesta/w3clove">github</a>.
+    </div>
+  </body>
+</html>
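The POPULAR ERRORS and POPULAR WARNINGS sections of the template group messages by `message_id` and list them from most to least frequent by chaining `group_by`, `sort_by` on the group size, and `reverse`. Here is a standalone sketch of the same pipeline. It uses a hypothetical `DemoMessage` struct in place of `W3Clove::Message`, and the ids and texts are placeholders, not real validator output.

```ruby
# Hypothetical stand-in for W3Clove::Message.
DemoMessage = Struct.new(:message_id, :line, :text)

errors = [
  DemoMessage.new('64', 10, 'sample message A'),
  DemoMessage.new('127', 12, 'sample message B'),
  DemoMessage.new('127', 30, 'sample message B'),
  DemoMessage.new('127', 41, 'sample message B'),
  DemoMessage.new('64', 55, 'sample message A')
]

# Group by message id, then order the groups by size, largest first.
popular = errors.group_by { |e| e.message_id }
                .sort_by { |_id, group| group.length }
                .reverse

popular.each do |message_id, group|
  puts "error #{message_id} happens #{group.length} times"
end
```

Groups of equal size may come out in either order, because Ruby's `sort_by` is not guaranteed to be stable.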
data/lib/w3clove/validator.rb
CHANGED
@@ -14,71 +14,21 @@ module W3Clove
     # Parses a remote xml sitemap and checks markup validation for each url
     # Shows progress on dot-style (...F...FFE..). A dot is a valid page,
     # an F is a page with errors, and an E is an exception
-    # After the checking is done, a detailed summary is
+    # After the checking is done, a detailed summary is generated
     def check(url)
       sitemap = W3Clove::Sitemap.new(url)
-      say "Validating #{sitemap.pages.length} pages
+      say "Validating #{sitemap.pages.length} pages"
 
       sitemap.pages.each do |page|
         say_inline page.valid? ? "." : (page.errors.nil? ? 'E' : 'F')
       end
 
-
+      W3Clove::Reporter.generate_html(sitemap)
+      say "\nValidation finished, see the report at w3clove.html"
     end
 
     private
 
-    ##
-    # Outputs the results of the validation
-    def show_results(sitemap)
-      show_sitemap_summary(sitemap)
-      show_popular_errors(sitemap)
-      show_popular_warnings(sitemap)
-      say "\n\nDETAILS PER PAGE"
-      sitemap.pages.select {|page| !page.errors.empty?}.each do |p|
-        show_page_summary(p)
-      end
-    end
-
-    def show_sitemap_summary(sitemap)
-      <<HEREDOC
-SITEMAP SUMMARY
-TOTAL: #{sitemap.errors.length} errors, #{sitemap.warnings.length} warnings
-HEREDOC
-    end
-
-    def show_popular_errors(sitemap)
-      say "\n\nMOST POPULAR ERRORS\n"
-      sitemap.errors.group_by {|e| e.message_id}
-        .sort_by {|m,e| e.length}
-        .reverse.each do |message_id, errors|
-          say "error #{message_id} happens #{errors.length} times"
-        end
-    end
-
-    def show_popular_warnings(sitemap)
-      say "\n\nMOST POPULAR WARNINGS\n"
-      sitemap.warnings.group_by {|e| e.message_id}
-        .sort_by {|m,e| e.length}
-        .reverse.each do |message_id, warnings|
-          say "warning #{message_id} happens #{warnings.length} times"
-        end
-    end
-
-    def show_page_summary(page)
-      say "\n ** #{page.url} **"
-      " #{page.errors.length} errors, #{page.warnings.length} warnings"
-      page.errors.each do |error|
-        say "\n Error #{error.message_id} on line #{error.line}:"
-        say " #{error.text}"
-      end
-
-      page.warnings.each do |warning|
-        say "\n Warning #{warning.message_id} on line #{warning.line}:"
-        say " #{warning.text}"
-      end
-    end
-
     def printer
       @printer ||= STDOUT
     end
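`Validator#check` prints one character per page: `.` for a valid page, `F` for a page with validation errors, and `E` when the page could not be validated (its `errors` is nil). The nested ternary can be read as the small function below. `progress_char` and `ProgressPage` are hypothetical names used only for this sketch, and `valid?` copies the rule from the diff's `Page#valid?`.

```ruby
# Hypothetical stand-in for W3Clove::Page with the same valid? rule.
ProgressPage = Struct.new(:errors) do
  # Valid only when validation ran (errors is an array) and found nothing.
  def valid?
    !errors.nil? && errors.empty?
  end
end

# Mirrors: page.valid? ? "." : (page.errors.nil? ? 'E' : 'F')
def progress_char(page)
  return '.' if page.valid?
  page.errors.nil? ? 'E' : 'F'
end

pages = [ProgressPage.new([]), ProgressPage.new(['err']), ProgressPage.new(nil), ProgressPage.new([])]
puts pages.map { |page| progress_char(page) }.join
```

Checking `valid?` first matters: an exception leaves `errors` as nil, which `valid?` already treats as not valid, so the remaining branch only has to tell `E` from `F`.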
data/lib/w3clove/version.rb
CHANGED
metadata
CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
   prerelease: false
   segments:
   - 0
-  -
-  -
-  version: 0.
+  - 3
+  - 0
+  version: 0.3.0
 platform: ruby
 authors:
 - Jaime Iniesta
@@ -14,7 +14,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2011-03-
+date: 2011-03-23 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -90,7 +90,9 @@ files:
 - lib/w3clove.rb
 - lib/w3clove/message.rb
 - lib/w3clove/page.rb
+- lib/w3clove/reporter.rb
 - lib/w3clove/sitemap.rb
+- lib/w3clove/templates/w3clove.html.erb
 - lib/w3clove/validator.rb
 - lib/w3clove/version.rb
 - spec/message_spec.rb