w3clove 0.2.1 → 0.3.0

@@ -2,21 +2,7 @@
2
2
 
3
3
  This is my {Ruby Mendicant University}[http://university.rubymendicant.com/] personal project, and is currently in alpha status.
4
4
 
5
- I want to build a site-wide markup validator, a Ruby gem that lets you validate a whole web site against the W3C Markup Validator, from the command line, and generate a comprehensive report of all errors found.
6
-
7
- Currently, the official {W3C Validator site}[http://validator.w3.org/] only lets you validate one URL at a time, so when you want to validate all the pages on a web site, it can be a tedious process. There is a {related tool}[http://www.htmlhelp.com/tools/validator/batch.html.en] that lets you use a batch mode for this and submit a list of URLs to be checked, but it is still a semi-manual process, and the output is not very useful.
8
-
9
- My plan then is building a command line utility that would accept as input a XML sitemap file, or its URL, expecting it to be on the {Google Sitemap format}[http://en.wikipedia.org/wiki/Google_Sitemaps]. This utility will then check the markup validation of each URL on this sitemap querying the W3C Validator, and store all detected errors and warnings. After checking all the URLs, it will generate as output an HTML file, with a style similar to what RCov[https://github.com/relevance/rcov] produces, showing all these errors on an easy to read format, grouping common errors together, sorting them by popularity, and linking to the URLs and to the explanations on how to correct them.
10
-
11
- Internally, it would use the {w3c_validators gem}[http://rubygems.org/gems/w3c_validators] to do the individual checks, so my gem would be concerned only with the XML sitemap parsing, building the queue, storing the errors, grouping and sorting them, and producing the HTML output.
12
-
13
- I've already done something similar to this, I sent {a little contribution to docrails}[https://github.com/lifo/docrails/blob/master/railties/guides/w3c_validator.rb] that checks the generated guides using this gem.
14
-
15
- = Bonus points:
16
-
17
- * in addition to an XML file, accept as input the URL of a site and crawl the site to find all internal links
18
- * validate the markup locally, without querying the W3C site, for more speed and to not saturate the W3C site
19
- * store the results on a local database, so on subsequent checks, only the pages that had errors are re-checked (unless a --checkall force flag is passed). This way developers can check the whole site, get the errors, deploy the corrections, and recheck the site.
5
+ It's a site-wide markup validator, a Ruby gem that lets you validate a whole web site against the W3C Markup Validator, from the command line, and generate a comprehensive report of all errors found.
20
6
 
21
7
  = Installation:
22
8
 
@@ -28,7 +14,9 @@ w3clove is a Ruby gem that can be installed on the usual way
28
14
 
29
15
  Pass it the url of an XML sitemap to be checked, like:
30
16
 
31
- w3clove http://www.ryanair.com/sitemap.xml
17
+ w3clove https://github.com/jaimeiniesta/w3clove/raw/master/spec/samples/sitemap.xml
18
+
19
+ This will validate all the URLs of your sitemap.xml (so be sure to pass it a short one or you'll wait for ages), and generate an HTML file with the full report.
32
20
 
33
21
  = Notes:
34
22
 
@@ -5,4 +5,5 @@ module W3Clove
5
5
  require_relative './w3clove/sitemap'
6
6
  require_relative './w3clove/page'
7
7
  require_relative './w3clove/message'
8
+ require_relative './w3clove/reporter'
8
9
  end
@@ -1,5 +1,6 @@
1
1
  # -*- encoding: utf-8 -*-
2
2
 
3
+ require 'timeout'
3
4
  require 'w3c_validators'
4
5
  include W3CValidators
5
6
 
@@ -16,15 +17,19 @@ module W3Clove
16
17
  end
17
18
 
18
19
  ##
19
- # Checks for errors and returns true if none found, false otherwise
20
- # warnings are not considered as validation errors so a page with
21
- # warnings but without errors will return true
20
+ # Checks for errors and returns true if none found, false otherwise.
21
+ # Warnings are not considered as validation errors so a page with
22
+ # warnings but without errors will return true.
22
23
  # If the validation goes well, errors should be an array. Otherwise
23
- # it will still be nil, which will not be considered validated
24
+ # it will still be nil, which will not be considered validated.
24
25
  def valid?
25
26
  !errors.nil? && errors.empty?
26
27
  end
27
28
 
29
+ ##
30
+ # Returns the collection of errors from the validations of this page.
31
+ # If it has no validation errors, it will be an empty array.
32
+ # If an exception occurs, it will be nil.
28
33
  def errors
29
34
  @errors ||= validations.errors.map {|e|
30
35
  W3Clove::Message.new(e.message_id,
@@ -36,6 +41,10 @@ module W3Clove
36
41
  nil
37
42
  end
38
43
 
44
+ ##
45
+ # Returns the collection of warnings from the validations of this page.
46
+ # If it has no validation warnings, it will be an empty array.
47
+ # If an exception occurs, it will be nil.
39
48
  def warnings
40
49
  @warnings ||= validations.warnings.map {|w|
41
50
  W3Clove::Message.new(w.message_id,
@@ -49,8 +58,10 @@ module W3Clove
49
58
 
50
59
  private
51
60
 
61
+ ##
62
+ # Gets the validations for this page, ensuring it times out soon
52
63
  def validations
53
- @validations ||= MarkupValidator.new.validate_uri(url)
64
+ @validations ||= Timeout::timeout(10) { MarkupValidator.new.validate_uri(url) }
54
65
  end
55
66
  end
56
67
  end
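The timeout change above can be illustrated with a minimal, self-contained sketch. It is not the gem's code: `SlowPage`, its `delay` argument, and the 0.1 second limit are hypothetical stand-ins for a page whose remote validation may hang, and the rescue clause is an assumption about how a timeout ends up as `nil` errors.

```ruby
require 'timeout'

# Hypothetical stand-in for a page whose validation may hang.
class SlowPage
  def initialize(delay)
    @delay = delay
  end

  # Returns an array of errors, or nil when the check timed out.
  def errors
    @errors ||= validations
  rescue Timeout::Error
    nil
  end

  # A page is valid only when the check finished and found no errors.
  def valid?
    !errors.nil? && errors.empty?
  end

  private

  # Simulates the remote call; aborts after 0.1 seconds.
  def validations
    Timeout.timeout(0.1) { sleep(@delay); [] }
  end
end

fast = SlowPage.new(0)
slow = SlowPage.new(1)
puts fast.valid?
puts slow.errors.inspect
```

Note that in this sketch a `nil` result is not memoized by `||=`, so each call to `errors` on a timed-out page retries the check.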
@@ -0,0 +1,21 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ require 'erb'
4
+
5
+ module W3Clove
6
+ module Reporter
7
+ extend self
8
+
9
+ ##
10
+ # Create the html report for the sitemap
11
+ def generate_html(sitemap)
12
+ template = ERB.new(open(File.dirname(__FILE__)+'/templates/w3clove.html.erb').read)
13
+
14
+ File.open('w3clove.html', 'w') do |f|
15
+ f.write(template.result(sitemap.get_binding))
16
+ end
17
+ rescue Exception => e
18
+ puts "ERROR generating report: #{e}"
19
+ end
20
+ end
21
+ end
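The reporter renders the template against the sitemap's own binding, so instance variables and methods of the sitemap are visible inside the ERB. A minimal sketch of that technique follows; `TinySitemap`, the inline template string, and the sample URL are hypothetical and only illustrate the mechanism.

```ruby
require 'erb'

# Hypothetical miniature of an object that exposes its binding to ERB.
class TinySitemap
  attr_reader :errors

  def initialize(url, errors)
    @url = url
    @errors = errors
  end

  # Hands out the object's binding so a template can see @url and errors.
  def get_binding
    binding
  end
end

template = ERB.new("Report for <%= @url %>: <%= errors.length %> errors")
sitemap = TinySitemap.new("http://example.com/sitemap.xml", ["e1", "e2"])

# The template is evaluated in the context of the sitemap instance.
report = template.result(sitemap.get_binding)
puts report
```

Exposing the binding keeps the template free of explicit locals, at the cost of giving it access to everything the object can reach.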
@@ -19,11 +19,15 @@ module W3Clove
19
19
  end
20
20
 
21
21
  def errors
22
- pages.map {|p| p.errors}.flatten
22
+ @errors ||= pages.map {|p| p.errors}.flatten.reject {|e| e.nil?}
23
23
  end
24
24
 
25
25
  def warnings
26
- pages.map {|p| p.warnings}.flatten
26
+ @warnings ||= pages.map {|p| p.warnings}.flatten.reject {|e| e.nil?}
27
+ end
28
+
29
+ def get_binding
30
+ binding
27
31
  end
28
32
 
29
33
  private
@@ -0,0 +1,93 @@
1
+ <html>
2
+ <head>
3
+ <title>W3Clove report for <%= @url %></title>
4
+ <link href='http://fonts.googleapis.com/css?family=Cabin+Sketch:bold' rel='stylesheet' type='text/css'>
5
+ <style type="text/css">
6
+ #header, #main, #footer {
7
+ padding: 5px; margin: 5px;
8
+ background-color: #eee;
9
+ border: 1px solid black;
10
+ }
11
+ #header, h1, h2, h3 {
12
+ font-family: 'Cabin Sketch', arial, serif;
13
+ }
14
+ #header {
15
+ font-size: 2.5em;
16
+ }
17
+ #footer {
18
+ font-size: .85em;
19
+ text-align: center;
20
+ }
21
+ .page {
22
+ padding: 5px; margin: 5px;
23
+ border: 1px solid black;
24
+ background-color: white;
25
+ }
26
+ </style>
27
+ </head>
28
+ <body>
29
+ <div id="header">
30
+ W3Clove :: site-wide markup validation
31
+ </div>
32
+
33
+ <div id="main">
34
+ <h1>Report for <%= @url %></h1>
35
+
36
+ <h2>SITEMAP SUMMARY</h2>
37
+ <p>TOTAL: <%= errors.length %> errors, <%= warnings.length %> warnings.</p>
38
+
39
+ <% if errors.length > 0 %>
40
+ <h2>POPULAR ERRORS</h2>
41
+ <ul>
42
+ <% errors.group_by {|e| e.message_id}.sort_by {|m,e| e.length}.reverse.each do |message_id, errors| %>
43
+ <li>error <a href="http://validator.w3.org/docs/errors.html#ve-<%= message_id %>"><%= message_id %></a> happens <%= errors.length %> times</li>
44
+ <% end %>
45
+ </ul>
46
+ <% end %>
47
+
48
+ <% if warnings.length > 0 %>
49
+ <h2>POPULAR WARNINGS</h2>
50
+ <ul>
51
+ <% warnings.group_by {|w| w.message_id}.sort_by {|m,w| w.length}.reverse.each do |message_id, warnings| %>
52
+ <li>warning <a href="http://validator.w3.org/docs/errors.html#ve-<%= message_id %>"><%= message_id %></a> found <%= warnings.length %> times</li>
53
+ <% end %>
54
+ </ul>
55
+ <% end %>
56
+
57
+ <% processed_pages = pages.select{|p| !p.exception} %>
58
+ <% if processed_pages.size > 0 %>
59
+ <h2>DETAILS PER PAGE</h2>
60
+ <% processed_pages.each do |page| %>
61
+ <div class='page'>
62
+ <h3><%= page.url %></h3>
63
+ <p class='page_summary'>
64
+ <%= "#{page.errors.length} errors, #{page.warnings.length} warnings" %>
65
+ </p>
66
+
67
+ <ul class="page_errors">
68
+ <% page.errors.each do |error| %>
69
+ <li>
70
+ Error <a href="http://validator.w3.org/docs/errors.html#ve-<%= error.message_id %>"><%= error.message_id %></a> on line <%= error.line %>:
71
+ <%= error.text %>
72
+ </li>
73
+ <% end %>
74
+ </ul>
75
+
76
+ <ul class="page_warnings">
77
+ <% page.warnings.each do |warning| %>
78
+ <li>
79
+ Warning <a href="http://validator.w3.org/docs/errors.html#ve-<%= warning.message_id %>"><%= warning.message_id %></a> on line <%= warning.line %>:
80
+ <%= warning.text %>
81
+ </li>
82
+ <% end %>
83
+ </ul>
84
+ </div>
85
+ <% end %>
86
+ <% end %>
87
+ </div>
88
+
89
+ <div id="footer">
90
+ <strong>w3clove</strong>@2011 <a href='http://jaimeiniesta.com'>Jaime Iniesta</a>. Get w3clove from <a href="http://github.com/jaimeiniesta/w3clove">github</a>.
91
+ </div>
92
+ </body>
93
+ </html>
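The "popular errors" list in the template groups messages by `message_id` and orders the groups by size. A small sketch of that grouping, using a hypothetical `Message` struct and made-up ids and texts rather than real validator output:

```ruby
# Hypothetical message type; only message_id matters for the grouping.
Message = Struct.new(:message_id, :text)

errors = [
  Message.new("127", "sample text A"),
  Message.new("65",  "sample text B"),
  Message.new("127", "sample text A"),
  Message.new("76",  "sample text C"),
  Message.new("65",  "sample text B"),
  Message.new("127", "sample text A")
]

# Group by id, then order the groups from most to least frequent.
popular = errors.group_by { |e| e.message_id }
                .sort_by { |_id, group| -group.length }
                .map { |id, group| [id, group.length] }

popular.each { |id, count| puts "error #{id} happens #{count} times" }
```

Sorting on the negated length gives the same order as the template's `sort_by` followed by `reverse` when the counts are distinct, as they are here.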
@@ -14,71 +14,21 @@ module W3Clove
14
14
  # Parses a remote xml sitemap and checks markup validation for each url
15
15
  # Shows progress on dot-style (...F...FFE..). A dot is a valid page,
16
16
  # an F is a page with errors, and an E is an exception
17
- # After the checking is done, a detailed summary is shown
17
+ # After the checking is done, a detailed summary is generated
18
18
  def check(url)
19
19
  sitemap = W3Clove::Sitemap.new(url)
20
- say "Validating #{sitemap.pages.length} pages..."
20
+ say "Validating #{sitemap.pages.length} pages"
21
21
 
22
22
  sitemap.pages.each do |page|
23
23
  say_inline page.valid? ? "." : (page.errors.nil? ? 'E' : 'F')
24
24
  end
25
25
 
26
- show_results(sitemap)
26
+ W3Clove::Reporter.generate_html(sitemap)
27
+ say "\nValidation finished, see the report at w3clove.html"
27
28
  end
28
29
 
29
30
  private
30
31
 
31
- ##
32
- # Outputs the results of the validation
33
- def show_results(sitemap)
34
- show_sitemap_summary(sitemap)
35
- show_popular_errors(sitemap)
36
- show_popular_warnings(sitemap)
37
- say "\n\nDETAILS PER PAGE"
38
- sitemap.pages.select {|page| !page.errors.empty?}.each do |p|
39
- show_page_summary(p)
40
- end
41
- end
42
-
43
- def show_sitemap_summary(sitemap)
44
- <<HEREDOC
45
- SITEMAP SUMMARY
46
- TOTAL: #{sitemap.errors.length} errors, #{sitemap.warnings.length} warnings
47
- HEREDOC
48
- end
49
-
50
- def show_popular_errors(sitemap)
51
- say "\n\nMOST POPULAR ERRORS\n"
52
- sitemap.errors.group_by {|e| e.message_id}
53
- .sort_by {|m,e| e.length}
54
- .reverse.each do |message_id, errors|
55
- say "error #{message_id} happens #{errors.length} times"
56
- end
57
- end
58
-
59
- def show_popular_warnings(sitemap)
60
- say "\n\nMOST POPULAR WARNINGS\n"
61
- sitemap.warnings.group_by {|e| e.message_id}
62
- .sort_by {|m,e| e.length}
63
- .reverse.each do |message_id, warnings|
64
- say "warning #{message_id} happens #{warnings.length} times"
65
- end
66
- end
67
-
68
- def show_page_summary(page)
69
- say "\n ** #{page.url} **"
70
- " #{page.errors.length} errors, #{page.warnings.length} warnings"
71
- page.errors.each do |error|
72
- say "\n Error #{error.message_id} on line #{error.line}:"
73
- say " #{error.text}"
74
- end
75
-
76
- page.warnings.each do |warning|
77
- say "\n Warning #{warning.message_id} on line #{warning.line}:"
78
- say " #{warning.text}"
79
- end
80
- end
81
-
82
32
  def printer
83
33
  @printer ||= STDOUT
84
34
  end
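The dot-style progress line in `check` follows from the `valid?` and `errors` contract: a dot for a valid page, `F` for a page with errors, `E` when errors is `nil` because an exception occurred. A minimal sketch with a hypothetical `StubPage` and `marker` helper (neither is part of the gem):

```ruby
# Hypothetical page stub: errors is an array, or nil when validation raised.
StubPage = Struct.new(:errors) do
  def valid?
    !errors.nil? && errors.empty?
  end
end

# Maps a page to its progress marker.
def marker(page)
  return "." if page.valid?
  page.errors.nil? ? "E" : "F"
end

pages = [StubPage.new([]), StubPage.new(["bad markup"]), StubPage.new(nil)]
progress = pages.map { |page| marker(page) }.join
puts progress
```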
@@ -1,5 +1,5 @@
1
1
  # -*- encoding: utf-8 -*-
2
2
 
3
3
  module W3Clove
4
- VERSION = "0.2.1"
4
+ VERSION = "0.3.0"
5
5
  end
metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
4
4
  prerelease: false
5
5
  segments:
6
6
  - 0
7
- - 2
8
- - 1
9
- version: 0.2.1
7
+ - 3
8
+ - 0
9
+ version: 0.3.0
10
10
  platform: ruby
11
11
  authors:
12
12
  - Jaime Iniesta
@@ -14,7 +14,7 @@ autorequire:
14
14
  bindir: bin
15
15
  cert_chain: []
16
16
 
17
- date: 2011-03-22 00:00:00 +01:00
17
+ date: 2011-03-23 00:00:00 +01:00
18
18
  default_executable:
19
19
  dependencies:
20
20
  - !ruby/object:Gem::Dependency
@@ -90,7 +90,9 @@ files:
90
90
  - lib/w3clove.rb
91
91
  - lib/w3clove/message.rb
92
92
  - lib/w3clove/page.rb
93
+ - lib/w3clove/reporter.rb
93
94
  - lib/w3clove/sitemap.rb
95
+ - lib/w3clove/templates/w3clove.html.erb
94
96
  - lib/w3clove/validator.rb
95
97
  - lib/w3clove/version.rb
96
98
  - spec/message_spec.rb