blinkr 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d8e918082f460b113ad7517dc20f4838c1b3052d
4
- data.tar.gz: 523df75b0eb689beb7ce691371c4e0fc8eb422c0
3
+ metadata.gz: 8f345eab68efa8b4884096359a7bf64c0286687f
4
+ data.tar.gz: 2366eee2265b2033197e68341ded32dad7c68195
5
5
  SHA512:
6
- metadata.gz: aac832c6fb06a704a95d91249b64d278c694c6b990b30954ac1daacb8f93f721595f2fa9ae2daecc474eb5bd72415b315fc509a528dbb019b7a339326e9aa679
7
- data.tar.gz: ad1d2fc269e08db56f02eee1b0f4d101435d026c4de6357e750ea96f3e2dd8888aa332f686fa8ca453f7635a3e7bfb7acb9e82fbb4fba502ce88d3e61ea2ab7b
6
+ metadata.gz: 455e68a0cf36690b202beced966708b49aeca94d77ccbd46d01cff174731630175eb6e1e17b7437df9fedb5266ad9f4ba2fe1b20aa7ac7f73f98c24318fc40c0
7
+ data.tar.gz: 538f12426478bf7cc97208d3c244016ec6176c7d932a399c87555f13e5d69dbe56a065a620b929e0b47d37d050a51cd597724ab49297ca8c4a69ab81fe79ef17
data/README.md CHANGED
@@ -87,6 +87,8 @@ ignore_fragments: true
87
87
  # Control the number of threads used to run phantomjs. By default 8.
88
88
  phantomjs_threads: 8
89
89
 
90
+ # Export the report to phantomjs
91
+
90
92
  ````
91
93
 
92
94
  You can specify a custom config file on the command line:
@@ -114,6 +116,70 @@ mode (this is very verbose, so normally used with `-s`):
114
116
  blinkr -c my_blinkr.yaml -s http://www.acme.com/corp -v
115
117
  ````
116
118
 
119
+ ## Extending Blinkr
120
+
121
+ Blinkr is based around a pipeline. Issues with the pages are *collected*,
122
+ *analysed*, and then passed to the report for *transformation* and rendering.
123
+ Additional sections may be *appended* to the report.
124
+
125
+ To add extensions to blinkr, you need to define a custom pipeline. The pipeline
126
+ is defined in a Ruby file (e.g. `blinkr.rb`):
127
+
128
+ ````
129
+ require 'acme/spellcheck'
130
+
131
+ Blinkr::Extensions::Pipeline.new do |config|
132
+ # define the default extensions
133
+ extension Blinkr::Extensions::Links.new config
134
+ extension Blinkr::Extensions::JavaScript.new config
135
+ extension Blinkr::Extensions::Resources.new config
136
+
137
+ # define custom extensions
138
+ extension ACME::Extensions::SpellCheck.new config
139
+ end
140
+ ````
141
+
142
+ NOTE: A custom pipeline replaces the default one, so you must add the default
143
+ extensions to it explicitly for them to be executed.
144
+
145
+ Point `blinkr.yaml` at the pipeline file (the path is resolved relative to the config file):
146
+
147
+ ````
148
+ # Use a custom pipeline
149
+ pipeline: blinkr.rb
150
+ ````
151
+
152
+ An extension is just a standard Ruby class. It should declare an
153
+ `initialize(config)` method, and may declare one or more of:
154
+
155
+ * `collect(page)`
156
+ * `analyze(context, typhoeus)`
157
+ * `transform(page, error, default_html)`
158
+ * `append(context)`
159
+
160
+ Each method is called as the pipeline progresses. Arguments passed are:
161
+
162
+ * `page` - an object containing the typhoeus `response`, the page `body` (as a
163
+ Nokogiri HTML document), an array of `errors` for the page, any
164
+ `resource_errors` which occurred when the page was loaded, and any
165
+ `javascript_errors` which occurred when the page was loaded
166
+ * `context` - a map of `url` => `page`s which are being analysed. After the
167
+ analyze phase, and before the transform phase, any pages with no errors
168
+ are removed from the context
169
+ * `typhoeus` - a wrapper around typhoeus, defining a `process` method and
170
+ a `process_all` method, both of which take a `url` and a `retry` limit, and
171
+ accept a block to execute when a response is returned.
172
+ * `error` - an individual error, consisting of a `type`, a `url`, a `title`, a
173
+ `code`, a `message`, a `detail`, a `snippet` and a Font Awesome `icon` class
174
+ * `default_html` - the default HTML used to display the error
175
+
176
+ `transform` should return the HTML used to display the error. `append` should
177
+ return any HTML to be appended to the report. A templating language, such as
178
+ slim or haml may be used to generate the HTML.
179
+
180
+ The built-in extensions, in `lib/blinkr/extensions`, are good examples of how
181
+ extensions can perform broken link analysis, or collect and format resource
182
+ loading and javascript execution errors.
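
As a minimal sketch of the extension contract described in the README text above, the class below implements `transform` and `append` using only the Ruby standard library. The class name `ACME::Extensions::SeveritySummary` and its markup are hypothetical. The `severity` field and the `context.pages` accessor are taken from the engine and built-in extensions in this release rather than from the README wording, so treat them as assumptions.

```ruby
require 'ostruct'

module ACME
  module Extensions
    # Hypothetical extension, not part of blinkr itself.
    class SeveritySummary
      def initialize(config)
        @config = config
      end

      # Wrap the default error HTML in a container tagged with the severity.
      def transform(page, error, default_html)
        %(<div class="severity-#{error.severity}">#{default_html}</div>)
      end

      # Append a per-severity count of the errors on all pages in the context.
      def append(context)
        counts = Hash.new(0)
        context.pages.each_value do |page|
          page.errors.each { |error| counts[error.severity] += 1 }
        end
        items = counts.map { |severity, n| "<li>#{severity}: #{n}</li>" }
        %(<ul class="severity-summary">#{items.join}</ul>)
      end
    end
  end
end

# Stand-ins for the objects the engine would pass in.
error   = OpenStruct.new(severity: 'danger', message: 'target cannot be loaded')
page    = OpenStruct.new(errors: [error])
context = OpenStruct.new(pages: { 'http://www.acme.com/' => page })

ext = ACME::Extensions::SeveritySummary.new(OpenStruct.new)
puts ext.transform(page, error, '<p>default</p>')
puts ext.append(context)
```

An extension like this would be registered in the pipeline file next to the default extensions, since only the methods it defines are called.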
117
183
 
118
184
  ## Contributing
119
185
 
data/lib/blinkr.rb CHANGED
@@ -1,21 +1,25 @@
1
1
  require 'blinkr/version'
2
- require 'blinkr/check'
2
+ require 'blinkr/engine'
3
3
  require 'blinkr/report'
4
+ require 'blinkr/config'
5
+ require 'blinkr/typhoeus_wrapper'
4
6
  require 'yaml'
5
7
 
6
8
  module Blinkr
7
9
  def self.run(base_url, config = 'blinkr.yaml', single, verbose, vverbose)
10
+ args = { :base_url => base_url, :verbose => verbose, :vverbose => vverbose }
8
11
  if !config.nil? && File.exists?(config)
9
- config = YAML.load_file(config)
12
+ config = Blinkr::Config.read config, args
10
13
  else
11
- config = {}
14
+ config = Blinkr::Config.new args
12
15
  end
13
- blinkr = Blinkr::Check.new(base_url || config['base_url'], sitemap: config['sitemap'], skips: config['skips'], max_retrys: config['max_retrys'], max_page_retrys: config['max_page_retrys'], verbose: verbose, vverbose: vverbose, browser: config['browser'], viewport: config['viewport'], ignore_fragments: config['ignore_fragments'], ignores: config['ignores'], phantomjs_threads: config['phantomjs_threads'])
16
+
14
17
  if single.nil?
15
- Blinkr::Report.render(blinkr.check, config['report'])
18
+ Blinkr::Engine.new(config).run
16
19
  else
17
- blinkr.single single
20
+ Blinkr::TyphoeusWrapper.new(config).debug(single)
18
21
  end
19
22
  end
20
23
 
21
24
  end
25
+
@@ -0,0 +1,46 @@
1
+ require 'ostruct'
2
+
3
+ module Blinkr
4
+ class Config < OpenStruct
5
+
6
+ def self.read file, args
7
+ raise "Cannot read #{file}" unless File.exists? file
8
+ Config.new(YAML.load_file(file).merge(args).merge({ :config_file => file }))
9
+ end
10
+
11
+ DEFAULTS = {:skips => [], :ignores => [], :max_retrys => 3, :browser => 'typhoeus', :viewport => 1200, :phantomjs_threads => 8, :report => 'blinkr.html'}
12
+
13
+ def initialize(hash={})
14
+ super(DEFAULTS.merge(hash))
15
+ end
16
+
17
+ def validate
18
+ ignores.each {|ignore| raise "An ignore must be a hash" unless ignore.is_a? Hash}
19
+ raise "Must specify base_url" if base_url.nil?
20
+ raise "Must specify sitemap" if sitemap.nil?
21
+ self
22
+ end
23
+
24
+ def sitemap
25
+ if super.nil?
26
+ URI.join(base_url, 'sitemap.xml').to_s
27
+ else
28
+ super
29
+ end
30
+ end
31
+
32
+ def max_page_retrys
33
+ self[:max_page_retrys] || self[:max_retrys]
34
+ end
35
+
36
+ def ignored? url, code, message
37
+ ignores.any? { |ignore| ( !url.nil? && ignore.has_key?('url') ? !ignore['url'].match(url).nil? : true ) && ( !code.nil? && ignore.has_key?('code') ? ignore['code'] == code : true ) && ( !message.nil? && ignore.has_key?('message') ? !ignore['message'].match(message).nil? : true ) }
38
+ end
39
+
40
+ def skipped? url
41
+ skips.any? { |regex| regex.match(url) }
42
+ end
43
+
44
+ end
45
+ end
46
+
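
The one-line `ignored?` predicate above is dense, so here is a standalone sketch of the same matching rule: an ignore entry matches when every criterion it defines agrees with the error, and a criterion is skipped when the entry omits it or the error value is `nil`. The free function and the sample data are hypothetical, and the sketch assumes the `url` and `message` criteria are `Regexp` objects.

```ruby
# Standalone restatement of the rule in Blinkr::Config#ignored?.
# Assumption: 'url' and 'message' criteria are Regexp objects.
def ignored?(ignores, url, code, message)
  ignores.any? do |ignore|
    url_ok     = url.nil? || !ignore.key?('url') || !ignore['url'].match(url).nil?
    code_ok    = code.nil? || !ignore.key?('code') || ignore['code'] == code
    message_ok = message.nil? || !ignore.key?('message') || !ignore['message'].match(message).nil?
    url_ok && code_ok && message_ok
  end
end

# Hypothetical ignore list: 404s under /legacy/ are expected.
ignores = [{ 'url' => %r{/legacy/}, 'code' => 404 }]

puts ignored?(ignores, 'http://www.acme.com/legacy/a', 404, 'Not Found')
puts ignored?(ignores, 'http://www.acme.com/legacy/a', 500, 'Server Error')
puts ignored?(ignores, 'http://www.acme.com/docs/a', 404, 'Not Found')
```

One consequence of this rule is that an empty ignore entry (`{}`) matches every error, because it defines no criteria to disagree with.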
@@ -0,0 +1,142 @@
1
+ require 'nokogiri'
2
+ require 'blinkr/phantomjs_wrapper'
3
+ require 'blinkr/typhoeus_wrapper'
4
+ require 'blinkr/http_utils'
5
+ require 'blinkr/sitemap'
6
+ require 'blinkr/report'
7
+ require 'blinkr/extensions/links'
8
+ require 'blinkr/extensions/javascript'
9
+ require 'blinkr/extensions/resources'
10
+ require 'blinkr/extensions/pipeline'
11
+ require 'json'
12
+ require 'ostruct'
13
+
14
+ # Monkeypatch OpenStruct
15
+ class OpenStruct
16
+
17
+ EXCEPT = [:response, :body, :resource_errors, :javascript_errors]
18
+
19
+ def to_json(*args)
20
+ to_h.delete_if{ |k, v| EXCEPT.include?(k) }.to_json(*args)
21
+ end
22
+
23
+ end
24
+
25
+ module Blinkr
26
+ class Engine
27
+ include HttpUtils
28
+ include Sitemap
29
+
30
+ def initialize config
31
+ @config = config.validate
32
+ @extensions = []
33
+ load_pipeline
34
+ end
35
+
36
+ def run
37
+ context = OpenStruct.new({:pages => {}})
38
+ typhoeus = browser = TyphoeusWrapper.new(@config, context)
39
+ browser = PhantomJSWrapper.new(@config, context) if @config.browser == 'phantomjs'
40
+ page_count = 0
41
+ browser.process_all(sitemap_locations, @config.max_page_retrys) do |response, resource_errors, javascript_errors|
42
+ if response.success?
43
+ url = response.request.base_url
44
+ puts "Loaded page #{url}" if @config.verbose
45
+ body = Nokogiri::HTML(response.body)
46
+ page = OpenStruct.new({ :response => response, :body => body, :errors => ErrorArray.new(@config), :resource_errors => resource_errors || [], :javascript_errors => javascript_errors || [] })
47
+ context.pages[url] = page
48
+ collect page
49
+ page_count += 1
50
+ else
51
+ puts "#{response.code} #{response.status_message} Unable to load page #{response.request.base_url} #{'(' + response.return_message + ')' unless response.return_message.nil?}"
52
+ end
53
+ end
54
+ typhoeus.hydra.run if @config.browser == 'typhoeus'
55
+ analyze context, typhoeus
56
+ puts "Loaded #{page_count} pages using #{browser.name}. Performed #{typhoeus.count} requests using typhoeus."
57
+ context.pages.reject! { |url, page| page.errors.empty? }
58
+ unless @config.export.nil?
59
+ File.open(@config.export, 'w') do |file|
60
+ file.write(context.to_json)
61
+ end
62
+ end
63
+ Blinkr::Report.new(context, self, @config).render
64
+ end
65
+
66
+ def append context
67
+ exec :append, context
68
+ end
69
+
70
+ def transform page, error, &block
71
+ default = yield
72
+ result = exec(:transform, page, error, default)
73
+ if result.empty?
74
+ default
75
+ else
76
+ result.join
77
+ end
78
+ end
79
+
80
+ def analyze context, typhoeus
81
+ exec :analyze, context, typhoeus
82
+ end
83
+
84
+ def collect page
85
+ exec :collect, page
86
+ end
87
+
88
+ private
89
+
90
+ class ErrorArray < Array
91
+
92
+ def initialize config
93
+ @config = config
94
+ end
95
+
96
+ def << error
97
+ unless @config.ignored?(error.url, error.code, error.message)
98
+ super
99
+ else
100
+ self
101
+ end
102
+ end
103
+
104
+ end
105
+
106
+ def extension ext
107
+ @extensions << ext
108
+ end
109
+
110
+ def default_pipeline
111
+ extension Blinkr::Extensions::Links.new @config
112
+ extension Blinkr::Extensions::JavaScript.new @config
113
+ extension Blinkr::Extensions::Resources.new @config
114
+ end
115
+
116
+ def exec method, *args
117
+ result = []
118
+ @extensions.each do |e|
119
+ result << e.send(method, *args) if e.respond_to? method
120
+ end
121
+ result
122
+ end
123
+
124
+ def load_pipeline
125
+ unless @config.pipeline.nil?
126
+ pipeline_file = File.join(File.dirname(@config.config_file), @config.pipeline)
127
+ if File.exists?( pipeline_file )
128
+ p = eval(File.read( pipeline_file ), nil, pipeline_file, 1).load @config
129
+ p.extensions.each do |e|
130
+ extension( e )
131
+ end
132
+ else
133
+ raise "Cannot find pipeline file #{pipeline_file}"
134
+ end
135
+ else
136
+ default_pipeline
137
+ end
138
+ end
139
+
140
+ end
141
+ end
142
+
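
The engine's `exec` and `transform` methods above can be illustrated in isolation. The sketch below is a hypothetical stand-in (`MiniPipeline`, `Upcase` and `CollectOnly` are not blinkr classes) showing the two behaviours: a phase is only dispatched to extensions that respond to it, and `transform` falls back to the default HTML when no extension implements it.

```ruby
# Minimal stand-in for the engine's extension dispatch (names are hypothetical).
class MiniPipeline
  def initialize(*extensions)
    @extensions = extensions
  end

  # Call `method` on every extension that implements it and collect the results.
  def exec(method, *args)
    @extensions.select { |e| e.respond_to?(method) }
               .map { |e| e.send(method, *args) }
  end

  # Use the extensions' HTML if any extension implements transform,
  # otherwise fall back to the default produced by the block.
  def transform(page, error)
    default = yield
    result = exec(:transform, page, error, default)
    result.empty? ? default : result.join
  end
end

# An extension that only takes part in the transform phase.
class Upcase
  def transform(_page, _error, default_html)
    default_html.upcase
  end
end

# An extension that only takes part in the collect phase.
class CollectOnly
  def collect(page)
    page[:seen] = true
  end
end

with_transform    = MiniPipeline.new(CollectOnly.new, Upcase.new)
without_transform = MiniPipeline.new(CollectOnly.new)

puts with_transform.transform({}, nil) { '<p>error</p>' }
puts without_transform.transform({}, nil) { '<p>error</p>' }
```

Because the results of all extensions are joined, two extensions that both implement `transform` would each contribute HTML for the same error.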
@@ -0,0 +1,17 @@
1
+ module Blinkr
2
+ module Extensions
3
+ class ATitle
4
+
5
+ def initialize config
6
+ @config = config
7
+ end
8
+
9
+ def collect page
10
+ page.body.css('a:not([title])').each do |a|
11
+ page.errors << OpenStruct.new({ :severity => 'info', :category => 'SEO', :type => '<a title=""> missing', :title => "#{a['href']} (line #{a.line})", :message => '<a title=""> missing', :snippet => a.to_s, :icon => 'fa-info' })
12
+ end
13
+ end
14
+
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,19 @@
1
+ module Blinkr
2
+ module Extensions
3
+ class EmptyAHref
4
+
5
+ def initialize config
6
+ @config = config
7
+ end
8
+
9
+ def collect page
10
+ page.body.css('a[href]').each do |a|
11
+ if a['href'].empty?
12
+ page.errors << OpenStruct.new({ :severity => 'info', :category => 'HTML Compatibility/Correctness', :type => '<a href=""> empty', :title => %Q{<a href=""> empty (line #{a.line})}, :message => %Q{<a href=""> empty}, :snippet => a.to_s, :icon => 'fa-info' })
13
+ end
14
+ end
15
+ end
16
+
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,17 @@
1
+ module Blinkr
2
+ module Extensions
3
+ class ImgAlt
4
+
5
+ def initialize config
6
+ @config = config
7
+ end
8
+
9
+ def collect page
10
+ page.body.css('img:not([alt])').each do |img|
11
+ page.errors << OpenStruct.new({ :severity => 'warning', :category => 'SEO', :type => '<img alt=""> missing', :title => "#{img['src']} (line #{img.line})", :message => '<img alt=""> missing', :snippet => img.to_s, :icon => 'fa-info' })
12
+ end
13
+ end
14
+
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,21 @@
1
+ module Blinkr
2
+ module Extensions
3
+ class InlineCss
4
+
5
+ def initialize config
6
+ @config = config
7
+ end
8
+
9
+ def collect page
10
+ page.body.css('[style]').each do |elm|
11
+ if elm['style'] == ""
12
+ page.errors << OpenStruct.new({ :severity => 'info', :category => 'HTML Compatibility/Correctness', :type => 'style attribute is empty', :title => %Q{"#{elm['style']}" (line #{elm.line})}, :message => 'style attribute is empty', :snippet => elm.to_s, :icon => 'fa-info' })
13
+ else
14
+ page.errors << OpenStruct.new({ :severity => 'info', :category => 'HTML Compatibility/Correctness', :type => 'Inline CSS detected', :title => %Q{"#{elm['style']}" (line #{elm.line})}, :message => 'inline style', :snippet => elm.to_s, :icon => 'fa-info' })
15
+ end
16
+ end
17
+ end
18
+
19
+ end
20
+ end
21
+ end
@@ -0,0 +1,17 @@
1
+ module Blinkr
2
+ module Extensions
3
+ class JavaScript
4
+
5
+ def initialize config
6
+ @config = config
7
+ end
8
+
9
+ def collect page
10
+ page.javascript_errors.each do |error|
11
+ page.errors << OpenStruct.new({ :severity => 'danger', :category => 'JavaScript', :type => 'JavaScript error', :title => error.msg, :snippet => error.trace, :icon => 'fa-gears' })
12
+ end
13
+ end
14
+
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,53 @@
1
+ require 'blinkr/http_utils'
2
+
3
+ module Blinkr
4
+ module Extensions
5
+ class Links
6
+ include HttpUtils
7
+
8
+ def initialize config
9
+ @config = config
10
+ @links = {}
11
+ end
12
+
13
+ def collect page
14
+ page.body.css('a[href]').each do |a|
15
+ attr = a.attribute('href')
16
+ src = page.response.effective_url
17
+ url = attr.value
18
+ url = sanitize url, src
19
+ unless url.nil? || @config.skipped?(url)
20
+ @links[url] ||= []
21
+ @links[url] << {:page => page, :line => attr.line, :snippet => attr.parent.to_s}
22
+ end
23
+ end
24
+ end
25
+
26
+ def analyze context, typhoeus
27
+ puts "----------------------" if @config.verbose
28
+ puts " #{@links.length} links to check " if @config.verbose
29
+ puts "----------------------" if @config.verbose
30
+ @links.each do |url, metadata|
31
+ typhoeus.process(url, @config.max_retrys) do |resp|
32
+ puts "Loaded #{url} via typhoeus #{'(cached)' if resp.cached?}" if @config.verbose
33
+ unless resp.success? || resp.code == 200
34
+ metadata.each do |src|
35
+ code = resp.code.to_i unless resp.code.nil? || resp.code == 0
36
+ if resp.status_message.nil?
37
+ message = resp.return_message
38
+ else
39
+ message = resp.status_message
40
+ detail = resp.return_message unless resp.return_message == "No error"
41
+ end
42
+ src[:page].errors << OpenStruct.new({ :severity => 'danger', :category => 'Resources missing', :type => '<a href=""> target cannot be loaded', :url => url, :title => "#{url} (line #{src[:line]})", :code => code, :message => message, :detail => detail, :snippet => src[:snippet], :icon => 'fa-bookmark-o' })
43
+ end
44
+ end
45
+ end
46
+ end
47
+ typhoeus.hydra.run
48
+ end
49
+
50
+ end
51
+ end
52
+ end
53
+
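
The Links extension above splits its work in two: `collect` groups every occurrence of a link by target URL, and `analyze` requests each distinct URL once and reports a failure against every page that referenced it. A minimal sketch of that grouping, with a block standing in for the HTTP request, might look like this. `LinkIndex` and the sample URLs are hypothetical, and no typhoeus calls are made.

```ruby
# Hypothetical stand-in for the collect/analyze split in the Links extension.
class LinkIndex
  def initialize
    @links = Hash.new { |hash, url| hash[url] = [] }
  end

  # collect phase: remember every place a URL was referenced.
  def add(url, page_url, line)
    @links[url] << { page: page_url, line: line }
  end

  # analyze phase: check each distinct URL once (the block returns true when
  # the URL loads), then report a failure for every page and line that used it.
  def failures
    @links.each_with_object([]) do |(url, sources), out|
      next if yield(url)
      sources.each { |src| out << src.merge(url: url) }
    end
  end
end

index = LinkIndex.new
index.add('http://www.acme.com/missing', 'http://www.acme.com/', 12)
index.add('http://www.acme.com/missing', 'http://www.acme.com/about', 40)
index.add('http://www.acme.com/ok', 'http://www.acme.com/', 15)

checked = []
failed = index.failures do |url|
  checked << url
  !url.end_with?('/missing') # pretend only /missing fails to load
end

puts "checked #{checked.length} URLs, reported #{failed.length} errors"
```

Grouping first keeps the number of requests proportional to the number of distinct targets rather than the number of anchors on the site.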