doubletake 0.1.0
- checksums.yaml +15 -0
- data/.gitignore +11 -0
- data/.rspec +2 -0
- data/Gemfile +4 -0
- data/README.md +147 -0
- data/Rakefile +1 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/doubletake.gemspec +35 -0
- data/exe/doubletake +108 -0
- data/lib/config.rb +76 -0
- data/lib/crawler_lib.rb +66 -0
- data/lib/doubletake.rb +8 -0
- data/lib/doubletake/version.rb +3 -0
- data/lib/site_context.rb +248 -0
- metadata +158 -0
checksums.yaml
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
---
|
2
|
+
!binary "U0hBMQ==":
|
3
|
+
metadata.gz: !binary |-
|
4
|
+
ZTg2Nzg5ZDExOGQzN2QwNjdiNGMyODNmNTAwYTRhMGU4MGQzNDA1Zg==
|
5
|
+
data.tar.gz: !binary |-
|
6
|
+
OTM4MmRmM2FhOGExMGI4MzM4OWYxYWMwYzMzYWFmMTczNDdjZWM5OA==
|
7
|
+
SHA512:
|
8
|
+
metadata.gz: !binary |-
|
9
|
+
NTM2Y2M2ZGVmNmIwMzM4NTk1MjFjZWNkYmNmNTY0MDg3NDg2NWZhNGJhZmRm
|
10
|
+
YmY5Mjk3ODRhYjEwNTY0YzE5ODA5ZDg1MWU3ZTI0OTBjNTU1ZjIxOGI3MzQy
|
11
|
+
ODJkN2YyNjlmZGM1MDRmYzFkNGQ2NDJmZGNhZTYyMGNjYWU0ZDM=
|
12
|
+
data.tar.gz: !binary |-
|
13
|
+
OWE4Y2UxMmYyMDk4N2M1NGZkM2UxNzE3Zjk5MDgxZTg3OTU5MGZhZThiYjUx
|
14
|
+
ZDllNTIzM2VmYmU3MTYxODc2M2VlZGE5NmI4NWZhNTdkZTlhMjNkNDIyNDhh
|
15
|
+
OWU4YzgyZTBmZmRmZDgzZWMwODk5OTM3NWJlYTM2ZjFkNjA2YTU=
|
data/.gitignore
ADDED
data/.rspec
ADDED
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,147 @@
|
|
1
|
+
# DoubleTake
|
2
|
+
|
3
|
+
DoubleTake is a visual regression testing tool for web applications. It is packaged and distributed as a Ruby gem.
|
4
|
+
|
5
|
+
|
6
|
+
## Installation
|
7
|
+
|
8
|
+
|
9
|
+
Install it using:
|
10
|
+
|
11
|
+
$ gem install doubletake
|
12
|
+
|
13
|
+
## Usage
|
14
|
+
|
15
|
+
To list all available features and functionality:
|
16
|
+
$ doubletake help
|
17
|
+
|
18
|
+
To see details on usage of each feature:
|
19
|
+
$ doubletake help [COMMAND]
|
20
|
+
|
21
|
+
## Quick Start Guide
|
22
|
+
|
23
|
+
### Compare
|
24
|
+
|
25
|
+
#### Why?
|
26
|
+
|
27
|
+
CSS makes changes easy to apply, but it is very difficult to make sure those changes don't produce unexpected
|
28
|
+
consequences elsewhere in the project.
|
29
|
+
|
30
|
+
#### How?
|
31
|
+
|
32
|
+
DoubleTake is a command line tool and is configured using a YAML file.
|
33
|
+
|
34
|
+
Generating and editing a config file is easy.
|
35
|
+
```
|
36
|
+
Run:
|
37
|
+
$ doubletake generate --file /tmp/mysite_config.yml
|
38
|
+
```
|
39
|
+
|
40
|
+
Use your preferred file editor and edit the config file.
|
41
|
+
```
|
42
|
+
$ vim /tmp/mysite_config.yml
|
43
|
+
stage: 'https://mysite-stage-env.com'
|
44
|
+
prod: 'https://mysite-prod-env.com'
|
45
|
+
```
|
46
|
+
*NOTE*: Always include the scheme (http:// or https://) in the site URLs when configuring.
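
The scheme requirement can be checked before a scan is started. A minimal sketch, assuming only Ruby's standard `uri` library; `scheme_present?` is a hypothetical helper name and is not part of DoubleTake:

```ruby
require 'uri'

# Hypothetical helper (not part of DoubleTake): true when the URL
# carries an explicit http or https scheme.
def scheme_present?(url)
  %w[http https].include?(URI.parse(url).scheme)
end

# The second entry deliberately omits the scheme.
%w[https://mysite-stage-env.com mysite-prod-env.com].each do |url|
  puts "#{url}: #{scheme_present?(url) ? 'ok' : 'missing scheme'}"
end
```

A URL without a scheme parses as a relative reference, so its scheme is `nil` and the check fails.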
|
47
|
+
|
48
|
+
Edit the ignored list to skip URIs or file formats. DoubleTake generates sensible defaults that can be extended.
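
One way to picture the ignored list is as a set of plain substrings tested against each link. This is a sketch under that assumption, not necessarily how DoubleTake applies the list internally; `ignored?` is a hypothetical helper name:

```ruby
# Hypothetical helper: true when the link contains any entry of the
# ignored list. Assumes entries are matched as plain substrings.
def ignored?(link, ignored)
  ignored.any? { |pattern| link.include?(pattern) }
end

# A shortened version of the generated defaults.
ignored = ['.css', '.pdf', '.js', '.jpg', '.png', 'user/logout', '?', '#']

puts ignored?('https://mysite.com/style.css', ignored)
puts ignored?('https://mysite.com/about', ignored)
```

Substring matching is broad: an entry such as `.js` would also match a link ending in `.json`, so entries are best kept as specific as possible.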
|
49
|
+
|
50
|
+
The window resolution can also be changed. The default desktop resolution is 1400x800.
|
51
|
+
|
52
|
+
DoubleTake is able to authenticate with user credentials provided in the config file. This allows more in-depth scans.
|
53
|
+
|
54
|
+
To do this, simply make the following changes to the YAML file.
|
55
|
+
```
|
56
|
+
LOGIN: true
|
57
|
+
LOGIN_URI: user/login
|
58
|
+
USER_DOM_ID: edit-name
|
59
|
+
USER_VALUE: username
|
60
|
+
PASS_DOM_ID: edit-pass
|
61
|
+
PASS_VALUE: secret_password
|
62
|
+
```
|
63
|
+
LOGIN_URI is the path where a user has to navigate to reach the login page. For example:
|
64
|
+
if https://mysite.com/login is the complete URL, then the LOGIN_URI would be:
|
65
|
+
```
|
66
|
+
LOGIN_URI: login
|
67
|
+
```
|
68
|
+
USER_DOM_ID & PASS_DOM_ID are the IDs of the web elements that Selenium uses to find the fields and type in the credentials.
|
69
|
+
USER_VALUE & PASS_VALUE are the actual username and password to login.
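
The gem source builds the login address by concatenating the site URL and LOGIN_URI directly, so whether a slash ends up between them depends on how both values are written in the config. The sketch below shows one defensive way to join them; `login_url` is a hypothetical helper and is not part of DoubleTake:

```ruby
# Hypothetical helper: joins the site URL and LOGIN_URI with exactly one
# slash, whatever combination of trailing/leading slashes is supplied.
def login_url(site, login_uri)
  "#{site.sub(%r{/+\z}, '')}/#{login_uri.sub(%r{\A/+}, '')}"
end

puts login_url('https://mysite.com', 'login')
puts login_url('https://mysite.com/', '/user/login')
```

If the values are joined without normalization, writing the site URL with a trailing slash (or the LOGIN_URI with a leading one) keeps the resulting address valid.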
|
70
|
+
|
71
|
+
If you want DoubleTake to assert that it logged in successfully with the given details:
|
72
|
+
```
|
73
|
+
LOGIN_CONFIRM: true
|
74
|
+
LOGIN_CONFIRM_CHECK: homepage-onsite-team
|
75
|
+
```
|
76
|
+
|
77
|
+
LOGIN_CONFIRM_CHECK is the ID of the web element DoubleTake should find if it logged in successfully.
|
78
|
+
|
79
|
+
If you don't want to use this feature, you can turn it off by setting:
|
80
|
+
```
|
81
|
+
LOGIN_CONFIRM: false
|
82
|
+
```
|
83
|
+
There are other values in the config file such as
|
84
|
+
```
|
85
|
+
bad_links: []
|
86
|
+
to_be_scraped: []
|
87
|
+
scraped: []
|
88
|
+
LOGGED_IN: false
|
89
|
+
```
|
90
|
+
These are auto populated by DoubleTake during runtime and used with the Resume feature. Resume is covered later in this guide.
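
To get a feel for these runtime values, the sketch below summarizes how far a scan has progressed. It assumes a plain YAML mapping with the keys listed above; the progress file DoubleTake actually writes is a serialized configuration object, so its layout may differ. The sample data and the `progress_summary` helper are hypothetical:

```ruby
require 'yaml'

# Sample data only; the keys mirror the runtime values listed above.
progress_yaml = <<-PROGRESS
bad_links:
  - https://mysite.com/style.css
to_be_scraped:
  - https://mysite.com/contact
scraped:
  - https://mysite.com/
  - https://mysite.com/about
LOGGED_IN: false
PROGRESS

# Hypothetical helper: counts the entries in each runtime list.
def progress_summary(yaml_text)
  data = YAML.load(yaml_text)
  %w[scraped to_be_scraped bad_links].each_with_object({}) do |key, summary|
    summary[key] = Array(data[key]).length
  end
end

progress_summary(progress_yaml).each do |key, count|
  puts "#{key}: #{count}"
end
```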
|
91
|
+
|
92
|
+
Now that we have our config file ready we can launch our first scan.
|
93
|
+
|
94
|
+
$ doubletake compare --conf /tmp/mysite_config.yml
|
95
|
+
|
96
|
+
If the gem installed correctly and Selenium was able to find the Firefox binary, you will see two browsers loading pages and
|
97
|
+
DoubleTake taking screenshots and comparing the images.
|
98
|
+
|
99
|
+
Pages that have changed will have their screenshots saved in the DoubleTake_data folder in your home directory.
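
In the gem source, a page counts as changed when the mean squared error between the stage and prod screenshots (computed with RMagick's `compare_channel` and `Magick::MeanSquaredErrorMetric`) exceeds `IMAGE_THRESHOLD`. The sketch below illustrates the same decision on plain arrays of normalized pixel samples. It does not use RMagick, and the method names are hypothetical:

```ruby
# Hypothetical helper: mean squared error between two equally sized
# arrays of pixel samples (each sample assumed to lie in 0.0..1.0).
def mean_squared_error(a, b)
  raise ArgumentError, 'images must have the same number of samples' unless a.length == b.length
  return 0.0 if a.empty?

  a.zip(b).map { |x, y| (x - y)**2 }.reduce(0.0, :+) / a.length
end

# A page is reported as changed when the error exceeds the threshold.
def changed?(a, b, threshold = 0)
  mean_squared_error(a, b) > threshold
end

stage = [0.0, 0.5, 1.0, 1.0]
prod  = [0.0, 0.5, 1.0, 0.0]
puts changed?(stage, stage)
puts changed?(stage, prod)
```

With a threshold of 0, any pixel difference marks the page as changed; raising the threshold tolerates small rendering differences.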
|
100
|
+
|
101
|
+
### Scrape
|
102
|
+
|
103
|
+
#### Why?
|
104
|
+
|
105
|
+
Short answer is: I have been asked several times in the past to make complete site backups and preserve how they render in
|
106
|
+
multiple browsers. This was generally when a major revamp of the site was going to be made or as evidence to show to the client.
|
107
|
+
There are also times when the UI/UX developers need to quickly see how all the pages look without having to browse every single
|
108
|
+
page and prefer to see screenshots.
|
109
|
+
|
110
|
+
#### How?
|
111
|
+
|
112
|
+
All features in DoubleTake make use of the same config file format. We can reuse the config file we made earlier.
|
113
|
+
|
114
|
+
Scrape only looks at the "stage: " value in the config and ignores "prod: ".
|
115
|
+
|
116
|
+
To scrape the URL:
|
117
|
+
$ doubletake scrape --conf /tmp/mysite_config.yml
|
118
|
+
|
119
|
+
This time you should see only one Firefox browser launch and all the screenshots will be stored in DoubleTake_data folder in
|
120
|
+
your home directory.
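
The screenshot file names follow from the gem source: the part of the URL after the stage domain has the characters `:`, `/`, `%`, `\` and `.` removed, and the result is stored under `DoubleTake_data/desktop/<hostname>/` with a `_stage.png` or `_prod.png` suffix. A small sketch of that naming scheme, where the helper names are hypothetical:

```ruby
# Mirrors the gem's sanitize step: strips ":", "/", "%", "\" and "."
def screenshot_name(uri)
  uri.gsub(%r{[:/%\\.]}, '')
end

# Hypothetical helper: full path of a screenshot for a given site and URI.
def screenshot_path(home, site, uri, suffix = 'stage')
  File.join(home, 'DoubleTake_data', 'desktop', site, "#{screenshot_name(uri)}_#{suffix}.png")
end

puts screenshot_path('/home/me', 'mysite.com', '/about/team.html')
```

Because the separators are removed, distinct URIs such as `/a/b` and `/ab` map to the same file name, so the later screenshot would overwrite the earlier one.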
|
121
|
+
|
122
|
+
### Resume
|
123
|
+
|
124
|
+
#### Why?
|
125
|
+
|
126
|
+
Because shit happens, and you don't want to restart a scan of a 4000-page e-commerce site after it has already scanned 2500 pages.
|
127
|
+
|
128
|
+
#### How?
|
129
|
+
|
130
|
+
A progress config file is auto-generated when a Compare or Scrape scan is started. These files can be found in DoubleTake_data
|
131
|
+
folder in the home directory.
|
132
|
+
|
133
|
+
Ex: ~/DoubleTake_data/desktop/progress_mysite.com.yml
|
134
|
+
|
135
|
+
In addition to this progress config file, we also need to know the type of the scan that is being resumed, i.e., compare or scrape.
|
136
|
+
|
137
|
+
To Resume a scan:
|
138
|
+
$ doubletake resume --type compare --conf ~/DoubleTake_data/desktop/progress_mysite.com.yml
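
The progress file is named after the host of the stage URL, so the resume command can be derived from the original config. A minimal sketch, assuming the `progress_<hostname>.yml` naming used in the gem source; both helper names are hypothetical:

```ruby
require 'uri'

VALID_TYPES = %w[compare scrape].freeze

# Hypothetical helper: location of the progress file for a stage URL.
def progress_file_for(stage_url, home = Dir.home)
  host = URI.parse(stage_url).host
  File.join(home, 'DoubleTake_data', 'desktop', "progress_#{host}.yml")
end

# Hypothetical helper: builds the resume command line, rejecting
# scan types other than compare and scrape.
def resume_command(type, stage_url, home = Dir.home)
  raise ArgumentError, "--type should be either 'scrape' or 'compare'" unless VALID_TYPES.include?(type)

  "doubletake resume --type #{type} --conf #{progress_file_for(stage_url, home)}"
end

puts resume_command('compare', 'https://mysite.com', '/home/me')
```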
|
139
|
+
|
140
|
+
|
141
|
+
## Contributing
|
142
|
+
|
143
|
+
1. Fork it ( https://github.com/MelchiSalins/doubletake/fork )
|
144
|
+
2. Create your feature branch (`git checkout -b my-new-feature`)
|
145
|
+
3. Commit your changes (`git commit -am 'Add some feature'`)
|
146
|
+
4. Push to the branch (`git push origin my-new-feature`)
|
147
|
+
5. Create a new Pull Request
|
data/Rakefile
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
require "bundler/gem_tasks"
|
data/bin/console
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require "bundler/setup"
|
4
|
+
require "doubletake"
|
5
|
+
|
6
|
+
# You can add fixtures and/or initialization code here to make experimenting
|
7
|
+
# with your gem easier. You can also use a different console, if you like.
|
8
|
+
|
9
|
+
# (If you use this, don't forget to add pry to your Gemfile!)
|
10
|
+
# require "pry"
|
11
|
+
# Pry.start
|
12
|
+
|
13
|
+
require "irb"
|
14
|
+
IRB.start
|
data/bin/setup
ADDED
data/doubletake.gemspec
ADDED
@@ -0,0 +1,35 @@
|
|
1
|
+
# coding: utf-8
|
2
|
+
lib = File.expand_path('../lib', __FILE__)
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
4
|
+
require 'doubletake/version'
|
5
|
+
|
6
|
+
Gem::Specification.new do |spec|
|
7
|
+
spec.name = "doubletake"
|
8
|
+
spec.version = Doubletake::VERSION
|
9
|
+
spec.authors = ["Melchi Salins"]
|
10
|
+
spec.email = ["melchisalins@gmail.com"]
|
11
|
+
|
12
|
+
spec.summary = %q{Visual regression testing tool and more }
|
13
|
+
spec.homepage = "http://melchisalins.users.sf.net/"
|
14
|
+
spec.license = "MIT"
|
15
|
+
spec.description = "This is test description"
|
16
|
+
|
17
|
+
spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
|
18
|
+
spec.bindir = "exe"
|
19
|
+
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
20
|
+
spec.require_paths = ["lib"]
|
21
|
+
|
22
|
+
if spec.respond_to?(:metadata)
|
23
|
+
spec.metadata['allowed_push_host'] = "https://rubygems.org"
|
24
|
+
end
|
25
|
+
|
26
|
+
spec.add_development_dependency "bundler", "~> 1.9"
|
27
|
+
spec.add_development_dependency "rake", "~> 10.0"
|
28
|
+
spec.add_runtime_dependency "pry", "~> 0.10.1"
|
29
|
+
spec.add_runtime_dependency 'rmagick', '>= 2.13.4', '~> 2.13.4'
|
30
|
+
spec.add_runtime_dependency 'selenium-webdriver', '>= 2.45.0', '~> 2.45.0'
|
31
|
+
spec.add_runtime_dependency 'thor', '~> 0.19.1'
|
32
|
+
spec.post_install_message = "Thanks for installing!"
|
33
|
+
spec.required_ruby_version = '>= 1.9.3'
|
34
|
+
spec.requirements << 'Firefox browser'
|
35
|
+
end
|
data/exe/doubletake
ADDED
@@ -0,0 +1,108 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require 'doubletake'
|
4
|
+
require 'thor'
|
5
|
+
require 'uri'
|
6
|
+
require 'pry'
|
7
|
+
|
8
|
+
class DoubleTake < Thor
|
9
|
+
include CrawlerHelper
|
10
|
+
|
11
|
+
desc "compare", "Starts comparing the stage and prod site configured in the YAML file"
|
12
|
+
option :conf, :desc => "--conf config_file.yml", :aliases => "-c", :required => true
|
13
|
+
def compare
|
14
|
+
puts "* DoubleTake Compare Initializing..."
|
15
|
+
$config = YAML.load(File.open(options[:conf], "r"))
|
16
|
+
$config.LOGGED_IN = false # forcing this variable to false.
|
17
|
+
# puts $config.inspect
|
18
|
+
start
|
19
|
+
end # compare
|
20
|
+
|
21
|
+
desc "scrape ", "Scrape the specified Domain."
|
22
|
+
option :conf, :desc => "--conf config_file.yml",
|
23
|
+
:aliases => "-c", :required => true
|
24
|
+
def scrape
|
25
|
+
#TODO: This command doesn't support resume capability yet.
|
26
|
+
# Need to make resume a subcommand of compare and scrape.
|
27
|
+
puts "* DoubleTake Scrape Initializing..."
|
28
|
+
$config = YAML.load(File.open(options[:conf], "r"))
|
29
|
+
$config.LOGGED_IN = false # forcing this variable to false.
|
30
|
+
# puts $config.inspect
|
31
|
+
start
|
32
|
+
end
|
33
|
+
|
34
|
+
desc "resume ", "Resumes a previous compare or scrape session. Progress YAML files are located in ~/DoubleTake_data/desktop/."
|
35
|
+
option :conf, :required => true, :desc => "resume --conf progress_timestamp.yml",
|
36
|
+
:aliases => "-c"
|
37
|
+
option :type, :required => true, :desc => "Specify the type of scan being resumed (scrape/compare).",
|
38
|
+
:aliases => "-t"
|
39
|
+
def resume
|
40
|
+
puts "* Resuming previous session."
|
41
|
+
$config = YAML.load(File.open(options[:conf], "r"))
|
42
|
+
# puts $config.inspect
|
43
|
+
if ["scrape", "compare"].include? options[:type]
|
44
|
+
$cf = options[:type]
|
45
|
+
else
|
46
|
+
puts "--type should be either 'scrape' or 'compare'"
|
47
|
+
exit 1
|
48
|
+
end
|
49
|
+
start
|
50
|
+
end
|
51
|
+
|
52
|
+
desc "generate", "Generate a config file template."
|
53
|
+
option :file, :required => true, :desc => "Filename and path to write to. Ex: ~/config.yml"
|
54
|
+
def generate
|
55
|
+
begin
|
56
|
+
c = Configuration.new
|
57
|
+
File.open(options[:file],"w") {|f| f.write(c.to_yaml)}
|
58
|
+
rescue StandardError => error
|
59
|
+
puts error
|
60
|
+
exit 1
|
61
|
+
end
|
62
|
+
end
|
63
|
+
no_commands do
|
64
|
+
def start
|
65
|
+
begin
|
66
|
+
$cf = caller[0][/`([^']*)'/, 1] unless $cf# Setting Calling function unless already set.
|
67
|
+
threads = []
|
68
|
+
$domains_list = {}
|
69
|
+
if $config.all_good?
|
70
|
+
puts "* $config.all_good? returned true."
|
71
|
+
else
|
72
|
+
puts "- There are error(s) in the supplied config file. Cannot proceed."
|
73
|
+
exit(1)
|
74
|
+
end
|
75
|
+
$hostname = URI.parse($config.stage).host # This is required to name folders to store scan results.
|
76
|
+
puts $hostname
|
77
|
+
$domains_list[$hostname] = [$config.stage, $config.prod]
|
78
|
+
|
79
|
+
$domains_list.each do |site, urls|
|
80
|
+
# Here we pick each domain and spawn new threads
|
81
|
+
# for the regression tests.
|
82
|
+
# TODO: This is useful only when stage and prod are a list
|
83
|
+
# of domains.
|
84
|
+
threads << Thread.new {task_1 = Crawler.new(site, urls[0], urls[1])
|
85
|
+
task_1.crawl if $config.LOGGED_IN == false
|
86
|
+
# Add code to check if creds are provided and then execute following:
|
87
|
+
if $config.LOGIN
|
88
|
+
task_1.login_to_as(urls[0], task_1.driver1)
|
89
|
+
task_1.login_to_as(urls[1], task_1.driver2) if $cf == "compare"
|
90
|
+
$config.to_be_scraped << task_1.driver1.current_url
|
91
|
+
task_1.crawl
|
92
|
+
end
|
93
|
+
task_1.clean_up
|
94
|
+
}
|
95
|
+
puts "* Thread Dispatched"
|
96
|
+
end
|
97
|
+
threads.each {|thr| thr.join}
|
98
|
+
rescue Interrupt
|
99
|
+
puts "\n* Exiting."
|
100
|
+
rescue Exception => e
|
101
|
+
puts e
|
102
|
+
puts "\n* Exiting."
|
103
|
+
end
|
104
|
+
end
|
105
|
+
end
|
106
|
+
end
|
107
|
+
|
108
|
+
DoubleTake.start(ARGV)
|
data/lib/config.rb
ADDED
@@ -0,0 +1,76 @@
|
|
1
|
+
require 'yaml'
|
2
|
+
require 'pry'
|
3
|
+
require_relative 'crawler_lib' # resolve relative to this file, not the current working directory
|
4
|
+
|
5
|
+
class Configuration
|
6
|
+
attr_accessor :stage, :prod, :ignored, :DESKTOP, :to_be_scraped
|
7
|
+
attr_accessor :scraped, :bad_links, :LOGGED_IN
|
8
|
+
attr_reader :LOGIN_URI, :USER_VALUE, :USER_DOM_ID
|
9
|
+
attr_reader :PASS_VALUE, :PASS_DOM_ID
|
10
|
+
attr_reader :LOGIN_CONFIRM_CHECK, :LOGIN_CONFIRM, :LOGIN
|
11
|
+
attr_reader :URI_THRESHOLD, :IMAGE_THRESHOLD
|
12
|
+
|
13
|
+
def initialize
|
14
|
+
@stage = ""
|
15
|
+
@prod = ""
|
16
|
+
@ignored = ["ignore_me", "not_important_url_prefix",".css", ".pdf", ".js", ".jpg", ".png", "video/pop", "user/logout", "?", "=", "#"]
|
17
|
+
@SCREEN_RESOLUTION = {:desktop => [1400,800]}
|
18
|
+
@IMAGE_THRESHOLD = 0
|
19
|
+
@LOGIN = true
|
20
|
+
@LOGIN_URI = 'user/login' # http://example.com/login
|
21
|
+
@USER_DOM_ID = 'edit-name'
|
22
|
+
@USER_VALUE = 'melchisalins'
|
23
|
+
@PASS_DOM_ID = 'edit-pass'
|
24
|
+
@PASS_VALUE = 'secret_password'
|
25
|
+
@LOGIN_CONFIRM = false
|
26
|
+
@LOGIN_CONFIRM_CHECK = 'homepage-onsite-team'
|
27
|
+
@bad_links = []
|
28
|
+
@to_be_scraped = []
|
29
|
+
@scraped = []
|
30
|
+
@LOGGED_IN = false
|
31
|
+
end
|
32
|
+
|
33
|
+
def all_good?
|
34
|
+
begin
|
35
|
+
# Fixes scheme of the URL if not present. This is needed by Selenium
|
36
|
+
return_value = false
|
37
|
+
if @stage.length <= 0 && @prod.length <= 0
|
38
|
+
puts "Stage and Production URL missing."
|
39
|
+
return_value = false
|
40
|
+
return return_value
|
41
|
+
else
|
42
|
+
@stage = fix_scheme(@stage) if URI.parse(@stage).scheme == nil
|
43
|
+
@prod = fix_scheme(@prod) if URI.parse(@prod).scheme == nil
|
44
|
+
return_value = true
|
45
|
+
end
|
46
|
+
|
47
|
+
if !@LOGIN || [@LOGIN_URI, @USER_DOM_ID, @USER_VALUE, @PASS_DOM_ID, @PASS_VALUE].none?(&:nil?) # login settings are required only when LOGIN is true
|
48
|
+
return_value = true
|
49
|
+
else
|
50
|
+
puts "Please configure LOGIN parameters"
|
51
|
+
return_value = false
|
52
|
+
return return_value
|
53
|
+
end
|
54
|
+
|
55
|
+
if @LOGIN_CONFIRM && @LOGIN_CONFIRM_CHECK
|
56
|
+
return_value = true
|
57
|
+
elsif @LOGIN_CONFIRM == false
|
58
|
+
return_value = true
|
59
|
+
else
|
60
|
+
puts "Please configure LOGIN_CONFIRM_CHECK value"
|
61
|
+
return_value = false
|
62
|
+
return return_value
|
63
|
+
end
|
64
|
+
|
65
|
+
return return_value
|
66
|
+
rescue Exception => e
|
67
|
+
puts e
|
68
|
+
return false
|
69
|
+
end
|
70
|
+
end
|
71
|
+
end
|
72
|
+
# c = Configuration.new
|
73
|
+
# File.open("test_yaml.yml","w") {|f| f.write(c.to_yaml)}
|
74
|
+
# y = YAML.load(File.open("test_yaml.yml", "r"))
|
75
|
+
# puts y.inspect
|
76
|
+
# y.validate
|
data/lib/crawler_lib.rb
ADDED
@@ -0,0 +1,66 @@
|
|
1
|
+
require_relative 'config' # resolve relative to this file, not the current working directory
|
2
|
+
require 'uri'
|
3
|
+
require 'selenium-webdriver'
|
4
|
+
|
5
|
+
module CrawlerHelper
|
6
|
+
def test
|
7
|
+
puts "Test call"
|
8
|
+
end
|
9
|
+
|
10
|
+
def fix_scheme(url)
|
11
|
+
puts "- No scheme provided for #{url}, trying to fix it."
|
12
|
+
driver = Selenium::WebDriver.for :firefox
|
13
|
+
driver.get("http://"+url) #assumes redirect to https is setup if it exists.
|
14
|
+
url_tmp = driver.current_url
|
15
|
+
scheme = URI.parse(url_tmp).scheme
|
16
|
+
driver.quit
|
17
|
+
puts "scheme is: #{scheme}"
|
18
|
+
return scheme+"://"+url
|
19
|
+
end
|
20
|
+
|
21
|
+
|
22
|
+
def sanitize(link)
|
23
|
+
# puts link
|
24
|
+
name = link.gsub(":", "")
|
25
|
+
name = name.gsub("/", "")
|
26
|
+
name = name.gsub("%", "")
|
27
|
+
name = name.gsub('\\', "")
|
28
|
+
name = name.gsub('.', "")
|
29
|
+
# puts name
|
30
|
+
return name
|
31
|
+
end
|
32
|
+
#
|
33
|
+
# def bad_link?(each_link)
|
34
|
+
# # return True if these characters exists in
|
35
|
+
# # the URL: $, #, png, css, js, jpg, pdf
|
36
|
+
# # Check for both upper and lower case ^
|
37
|
+
# begin
|
38
|
+
# each = each_link.upcase
|
39
|
+
# if each.include?("?") || each.include?("#") || each.include?(".PNG") || each.include?(".CSS") || each.include?("JS") || each.include?("JPG") || each.include?("PDF") || each.include?("/VIDEO/POP")
|
40
|
+
# # puts "Bad Link: " + each.to_s
|
41
|
+
# return true
|
42
|
+
# else
|
43
|
+
# return false
|
44
|
+
# end
|
45
|
+
# rescue Exception => e
|
46
|
+
# puts e.message
|
47
|
+
# puts e.backtrace.inspect
|
48
|
+
# end
|
49
|
+
#
|
50
|
+
# end
|
51
|
+
|
52
|
+
def do_not_ignore?(each_link, scraped)
|
53
|
+
# This checks if the passed link should be
|
54
|
+
# scraped or not based on:
|
55
|
+
# Has it already been scraped, is it bad_link?
|
56
|
+
# puts each_link
|
57
|
+
# puts scraped.class
|
58
|
+
if scraped.include?(each_link)
|
59
|
+
return false
|
60
|
+
elsif bad_link?(each_link)
|
61
|
+
return false
|
62
|
+
else
|
63
|
+
return true
|
64
|
+
end
|
65
|
+
end
|
66
|
+
end
|
data/lib/doubletake.rb
ADDED
data/lib/site_context.rb
ADDED
@@ -0,0 +1,248 @@
|
|
1
|
+
require 'fileutils'
|
2
|
+
require 'selenium-webdriver'
|
3
|
+
require_relative 'crawler_lib' # resolve relative to this file, not the current working directory
|
4
|
+
require_relative 'config' # resolve relative to this file, not the current working directory
|
5
|
+
require 'pry'
|
6
|
+
require 'RMagick'
|
7
|
+
require 'yaml'
|
8
|
+
require 'csv'
|
9
|
+
|
10
|
+
# require 'pry-debugger'
|
11
|
+
|
12
|
+
# class Configuration
|
13
|
+
# # attr_accessor :stage, :prod, :ignored, :DESKTOP
|
14
|
+
# def initialize
|
15
|
+
# @stage = "https://rialto-stage.equiem.com.au"
|
16
|
+
# @prod = "https://atrialto.com"
|
17
|
+
# @ignored = ["ignore_me", "not_important_url_prefix",".css", ".pdf", ".js", ".jpg", ".png", "video/pop", "user/logout", "?", "=", "#"]
|
18
|
+
# @SCREEN_RESOLUTION = {:desktop => [1400,800], :mobile => [300,150]}
|
19
|
+
# @IMAGE_THRESHOLD = 0
|
20
|
+
# @LOGIN = true
|
21
|
+
# @LOGIN_URI = 'login' # http://example.com/login
|
22
|
+
# @USER_DOM_ID = 'edit-name'
|
23
|
+
# @USER_VALUE = 'melchisalins'
|
24
|
+
# @PASS_DOM_ID = 'edit-pass'
|
25
|
+
# @PASS_VALUE = 'secret_password'
|
26
|
+
# @LOGIN_CONFIRM = true
|
27
|
+
# @LOGIN_CONFIRM_CHECK = 'homepage-onsite-team'
|
28
|
+
# end
|
29
|
+
# end
|
30
|
+
|
31
|
+
class SiteContext
|
32
|
+
attr_accessor :driver
|
33
|
+
|
34
|
+
def initialize
|
35
|
+
puts "SiteContext initializes!"
|
36
|
+
end
|
37
|
+
|
38
|
+
def set_driver(browser = :chrome, remote = "http://192.168.15.43:4444/wd/hub/")
|
39
|
+
# driver = Selenium::WebDriver.for(:remote, :url => remote, :desired_capabilities => browser)
|
40
|
+
driver = Selenium::WebDriver.for :firefox
|
41
|
+
return driver
|
42
|
+
end
|
43
|
+
|
44
|
+
def login_to_as(site, driver)
|
45
|
+
$config.LOGGED_IN = true
|
46
|
+
driver.get(site + $config.LOGIN_URI)
|
47
|
+
username = driver.find_element(:id, $config.USER_DOM_ID)
|
48
|
+
username.clear
|
49
|
+
username.send_keys($config.USER_VALUE)
|
50
|
+
password = driver.find_element(:id, $config.PASS_DOM_ID)
|
51
|
+
password.clear
|
52
|
+
password.send_keys($config.PASS_VALUE+"\n")
|
53
|
+
|
54
|
+
if driver.title.include?("Terms and Conditions")
|
55
|
+
driver.find_element(:id, "edit-legal-accept").click
|
56
|
+
driver.find_element(:id, "edit-save").click
|
57
|
+
end
|
58
|
+
|
59
|
+
if !driver.find_elements(:class, $config.LOGIN_CONFIRM_CHECK).empty? # find_element raises when nothing matches, so it never returns nil
|
60
|
+
return true
|
61
|
+
else
|
62
|
+
return false
|
63
|
+
end
|
64
|
+
end #End of Method login_to_as
|
65
|
+
end
|
66
|
+
|
67
|
+
class Crawler < SiteContext
|
68
|
+
include CrawlerHelper
|
69
|
+
include Magick
|
70
|
+
attr_accessor :site, :driver1, :driver2, :progress
|
71
|
+
|
72
|
+
class Progress
|
73
|
+
attr_accessor :driver1_type, :driver2_type
|
74
|
+
attr_accessor :stage, :prod
|
75
|
+
attr_accessor :bad_links, :to_be_scraped, :scraped
|
76
|
+
|
77
|
+
def initialize
|
78
|
+
@driver1_type = ""
|
79
|
+
@driver2_type = ""
|
80
|
+
@stage = ""
|
81
|
+
@prod = ""
|
82
|
+
@bad_links = []
|
83
|
+
@to_be_scraped = []
|
84
|
+
@scraped = []
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
88
|
+
|
89
|
+
def initialize(site, test, base, browser = :firefox)
|
90
|
+
@site = site.to_s
|
91
|
+
FileUtils::mkdir_p "#{ENV['HOME']}/DoubleTake_data/desktop/"+@site
|
92
|
+
puts "* Screenshots are saved in #{ENV['HOME']}/DoubleTake_data/desktop/"+@site
|
93
|
+
$config.to_be_scraped << test
|
94
|
+
@test_domain_length = test.length
|
95
|
+
puts "Crawler initialized"
|
96
|
+
@driver1 = SiteContext.new
|
97
|
+
@driver1 = @driver1.set_driver(browser)
|
98
|
+
@driver1.get(test)
|
99
|
+
@driver1.manage.window.resize_to(1400, 800) #Desktop size
|
100
|
+
unless $cf == "scrape"
|
101
|
+
@driver2 = SiteContext.new
|
102
|
+
@driver2 = @driver2.set_driver(browser)
|
103
|
+
@driver2.get(base)
|
104
|
+
@driver2.manage.window.resize_to(1400, 800) #Desktop size
|
105
|
+
end
|
106
|
+
end
|
107
|
+
|
108
|
+
def clean_up
|
109
|
+
@driver1.quit
|
110
|
+
@driver2.quit unless $cf == "scrape"
|
111
|
+
puts "Destroyed WebDriver instances."
|
112
|
+
end
|
113
|
+
|
114
|
+
def bad_link?(link)
|
115
|
+
# This should populate bad_links[] and return bool
|
116
|
+
# regarding the link being passed in.
|
117
|
+
# $config.scraped << link
|
118
|
+
# Bad link could be parameterised URLs(?, #) or
|
119
|
+
# External domains such as facebook, twitter etc. or
|
120
|
+
# link is non http Ex: mailto: ftp: file: etc.
|
121
|
+
# puts $config.scraped
|
122
|
+
if link.include?("$") || link.include?("#") || link.include?(".png") || link.include?(".js")
|
123
|
+
puts "Bad Link: "+link
|
124
|
+
$config.scraped << link if ($config.scraped.include?(link) == false)
|
125
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
126
|
+
return true
|
127
|
+
elsif link.include?(".pdf") || link.include?(".jpeg") || link.include?(".css") || link.include?(".jpg")
|
128
|
+
puts "Bad Link: "+link
|
129
|
+
$config.scraped << link if ($config.scraped.include?(link) == false)
|
130
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
131
|
+
return true
|
132
|
+
elsif link.include?("video/pop") || link.include?("?") || link.include?("/user/logout")
|
133
|
+
puts "Bad Link: "+link
|
134
|
+
$config.scraped << link if ($config.scraped.include?(link) == false)
|
135
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
136
|
+
return true
|
137
|
+
elsif link[0..3] != "http" #TODO: This doesn't seem to work.
|
138
|
+
puts "Bad Link: "+link
|
139
|
+
$config.scraped << link if ($config.scraped.include?(link) == false)
|
140
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
141
|
+
return true
|
142
|
+
elsif link[0..@test_domain_length-1] != $config.stage || link.include?("%")
|
143
|
+
puts "Out of Scope: "+link
|
144
|
+
$config.scraped << link if ($config.scraped.include?(link) == false)
|
145
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
146
|
+
return true
|
147
|
+
elsif $config.scraped.include?(link)
|
148
|
+
puts "Already scraped: "+link
|
149
|
+
$config.bad_links << link if ($config.bad_links.include?(link) == false)
|
150
|
+
return true
|
151
|
+
else
|
152
|
+
puts "Good Link: "+link
|
153
|
+
return false
|
154
|
+
end #End of `if`
|
155
|
+
|
156
|
+
end
|
157
|
+
|
158
|
+
def crawl
|
159
|
+
# unless $config.to_be_scraped.empty?
|
160
|
+
loop do
|
161
|
+
puts "length of progress.to_be_scraped: #{$config.to_be_scraped.length.to_s}"
|
162
|
+
break if $config.to_be_scraped.length < 1
|
163
|
+
puts "length of $config.to_be_scraped: #{$config.to_be_scraped.length.to_s}"
|
164
|
+
$config.to_be_scraped.each do |each_link|
|
165
|
+
puts "* length of $config.to_be_scraped: #{$config.to_be_scraped.length.to_s}"
|
166
|
+
puts "** length of $config.scraped: #{$config.scraped.length.to_s}"
|
167
|
+
puts "Working on: #{each_link}"
|
168
|
+
begin
|
169
|
+
@driver1.get(each_link)
|
170
|
+
#### This Code block collects New Links and cleans
|
171
|
+
# $config.to_be_scraped Array.
|
172
|
+
all_a_objs = @driver1.find_elements(:xpath, '//a')
|
173
|
+
all_a_objs.each do |each_a_obj|
|
174
|
+
if each_a_obj.attribute("href") != nil #Why? Cause some link obj are dicks and don't have a href
|
175
|
+
# TODO: ^^ This if should be changed to begin - rescue
|
176
|
+
$config.to_be_scraped << each_a_obj.attribute("href") if (each_a_obj.attribute("href").include?("http") && bad_link?(each_a_obj.attribute("href")) == false)
|
177
|
+
end
|
178
|
+
end #all_a_objs.each do |each_a_obj|
|
179
|
+
$config.to_be_scraped.uniq! # Remove duplicate links.
|
180
|
+
$config.to_be_scraped.each do |each_new_link|
|
181
|
+
#This code block cleans the $config.to_be_scraped Array
|
182
|
+
$config.to_be_scraped = $config.to_be_scraped - [each_new_link] if ($config.scraped.include?(each_new_link) || bad_link?(each_new_link))
|
183
|
+
end #$config.to_be_scraped.each do |each_new_link|
|
184
|
+
if $config.scraped.include?(each_link)
|
185
|
+
# In case a bad link makes it into the loop
|
186
|
+
# This code-block will skip over it.
|
187
|
+
# It will also delete it from the $config.to_be_scraped Array
|
188
|
+
$config.to_be_scraped = $config.to_be_scraped - [each_link]
|
189
|
+
puts "Already scraped link crept in: #{each_link}"
|
190
|
+
# next
|
191
|
+
end
|
192
|
+
|
193
|
+
#
|
194
|
+
### End of Code Block to collect URL's to be scraped
|
195
|
+
stage_uri = each_link[@test_domain_length..-1]
|
196
|
+
prod_link = $config.prod + stage_uri
|
197
|
+
# *****************************************
|
198
|
+
if $cf == "scrape"
|
199
|
+
warning_log = CSV.open("#{ENV['HOME']}/DoubleTake_data/desktop/#{@site}.csv", "a")
|
200
|
+
name = sanitize(stage_uri)
|
201
|
+
@driver1.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
|
202
|
+
@driver1.find_elements(:css, ".messages.error").each do |ele|
|
203
|
+
warning_log << [@driver1.current_url, ele.text] unless ele.nil?
|
204
|
+
end
|
205
|
+
warning_log.close
|
206
|
+
else
|
207
|
+
@driver2.get(prod_link)
|
208
|
+
image_stuff(stage_uri)
|
209
|
+
end
|
210
|
+
# *****************************************
|
211
|
+
$config.scraped << each_link # Last Step: Marking the URL as scraped!
|
212
|
+
$config.scraped.uniq! # bad_link? may add duplicate entries
|
213
|
+
$config.to_be_scraped = $config.to_be_scraped - [nil] # This was issue when .delete() was used which resulted in element replaced by nil
|
214
|
+
$config.to_be_scraped = $config.to_be_scraped - [each_link]
|
215
|
+
$config.to_be_scraped.uniq!
|
216
|
+
File.open("#{ENV['HOME']}/DoubleTake_data/desktop/progress_#{@site}.yml", "w") {|f| f.write($config.to_yaml)}
|
217
|
+
rescue Selenium::WebDriver::Error::StaleElementReferenceError => e
|
218
|
+
puts "Stale element error occurred, moving to next link: #{stage_uri}"
|
219
|
+
puts e
|
220
|
+
next
|
221
|
+
rescue Exception => e
|
222
|
+
puts "Generic Exception occurred"
|
223
|
+
puts e.message
|
224
|
+
puts e.backtrace
|
225
|
+
next
|
226
|
+
end #End of begin
|
227
|
+
end #$config.to_be_scraped.each do |each_link|
|
228
|
+
end # Loop do
|
229
|
+
puts "to_be_scraped: " + $config.to_be_scraped.to_s
|
230
|
+
puts "scraped : " + $config.scraped.to_s
|
231
|
+
end # Crawl Ending
|
232
|
+
|
233
|
+
def image_stuff(stage_uri)
|
234
|
+
name = sanitize(stage_uri)
|
235
|
+
@driver1.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
|
236
|
+
@driver2.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
|
237
|
+
# a, b = IO.read("#{ENV['HOME']}/DoubleTake_data/desktop/stage_"+@site+"/"+name+".png")[0x10..0x18].unpack('NN')
|
238
|
+
img1 = ImageList.new("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
|
239
|
+
img2 = ImageList.new("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
|
240
|
+
diff_img, diff_metric = img1[0].compare_channel( img2[0], Magick::MeanSquaredErrorMetric )
|
241
|
+
if diff_metric > $config.IMAGE_THRESHOLD
|
242
|
+
diff_img.write("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_diff.png")
|
243
|
+
else
|
244
|
+
File.delete("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
|
245
|
+
File.delete("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
|
246
|
+
end # if diff_metric > $IMAGE_THRESHOLD
|
247
|
+
end #def image_stuff(image1, image2)
|
248
|
+
end #Class Crawler < SiteContext
|
metadata
ADDED
@@ -0,0 +1,158 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: doubletake
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.1.0
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- Melchi Salins
|
8
|
+
autorequire:
|
9
|
+
bindir: exe
|
10
|
+
cert_chain: []
|
11
|
+
date: 2015-04-26 00:00:00.000000000 Z
|
12
|
+
dependencies:
|
13
|
+
- !ruby/object:Gem::Dependency
|
14
|
+
name: bundler
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
16
|
+
requirements:
|
17
|
+
- - ~>
|
18
|
+
- !ruby/object:Gem::Version
|
19
|
+
version: '1.9'
|
20
|
+
type: :development
|
21
|
+
prerelease: false
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
23
|
+
requirements:
|
24
|
+
- - ~>
|
25
|
+
- !ruby/object:Gem::Version
|
26
|
+
version: '1.9'
|
27
|
+
- !ruby/object:Gem::Dependency
|
28
|
+
name: rake
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
30
|
+
requirements:
|
31
|
+
- - ~>
|
32
|
+
- !ruby/object:Gem::Version
|
33
|
+
version: '10.0'
|
34
|
+
type: :development
|
35
|
+
prerelease: false
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
37
|
+
requirements:
|
38
|
+
- - ~>
|
39
|
+
- !ruby/object:Gem::Version
|
40
|
+
version: '10.0'
|
41
|
+
- !ruby/object:Gem::Dependency
|
42
|
+
name: pry
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
44
|
+
requirements:
|
45
|
+
- - ~>
|
46
|
+
- !ruby/object:Gem::Version
|
47
|
+
version: 0.10.1
|
48
|
+
type: :runtime
|
49
|
+
prerelease: false
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
51
|
+
requirements:
|
52
|
+
- - ~>
|
53
|
+
- !ruby/object:Gem::Version
|
54
|
+
version: 0.10.1
|
55
|
+
- !ruby/object:Gem::Dependency
|
56
|
+
name: rmagick
|
57
|
+
requirement: !ruby/object:Gem::Requirement
|
58
|
+
requirements:
|
59
|
+
- - ! '>='
|
60
|
+
- !ruby/object:Gem::Version
|
61
|
+
version: 2.13.4
|
62
|
+
- - ~>
|
63
|
+
- !ruby/object:Gem::Version
|
64
|
+
version: 2.13.4
|
65
|
+
type: :runtime
|
66
|
+
prerelease: false
|
67
|
+
version_requirements: !ruby/object:Gem::Requirement
|
68
|
+
requirements:
|
69
|
+
- - ! '>='
|
70
|
+
- !ruby/object:Gem::Version
|
71
|
+
version: 2.13.4
|
72
|
+
- - ~>
|
73
|
+
- !ruby/object:Gem::Version
|
74
|
+
version: 2.13.4
|
75
|
+
- !ruby/object:Gem::Dependency
|
76
|
+
name: selenium-webdriver
|
77
|
+
requirement: !ruby/object:Gem::Requirement
|
78
|
+
requirements:
|
79
|
+
- - ! '>='
|
80
|
+
- !ruby/object:Gem::Version
|
81
|
+
version: 2.45.0
|
82
|
+
- - ~>
|
83
|
+
- !ruby/object:Gem::Version
|
84
|
+
version: 2.45.0
|
85
|
+
type: :runtime
|
86
|
+
prerelease: false
|
87
|
+
version_requirements: !ruby/object:Gem::Requirement
|
88
|
+
requirements:
|
89
|
+
- - ! '>='
|
90
|
+
- !ruby/object:Gem::Version
|
91
|
+
version: 2.45.0
|
92
|
+
- - ~>
|
93
|
+
- !ruby/object:Gem::Version
|
94
|
+
version: 2.45.0
|
95
|
+
- !ruby/object:Gem::Dependency
|
96
|
+
name: thor
|
97
|
+
requirement: !ruby/object:Gem::Requirement
|
98
|
+
requirements:
|
99
|
+
- - ~>
|
100
|
+
- !ruby/object:Gem::Version
|
101
|
+
version: 0.19.1
|
102
|
+
type: :runtime
|
103
|
+
prerelease: false
|
104
|
+
version_requirements: !ruby/object:Gem::Requirement
|
105
|
+
requirements:
|
106
|
+
- - ~>
|
107
|
+
- !ruby/object:Gem::Version
|
108
|
+
version: 0.19.1
|
109
|
+
description: This is test description
|
110
|
+
email:
|
111
|
+
- melchisalins@gmail.com
|
112
|
+
executables:
|
113
|
+
- doubletake
|
114
|
+
extensions: []
|
115
|
+
extra_rdoc_files: []
|
116
|
+
files:
|
117
|
+
- .gitignore
|
118
|
+
- .rspec
|
119
|
+
- .travis.yml
|
120
|
+
- Gemfile
|
121
|
+
- README.md
|
122
|
+
- Rakefile
|
123
|
+
- bin/console
|
124
|
+
- bin/setup
|
125
|
+
- doubletake.gemspec
|
126
|
+
- exe/doubletake
|
127
|
+
- lib/config.rb
|
128
|
+
- lib/crawler_lib.rb
|
129
|
+
- lib/doubletake.rb
|
130
|
+
- lib/doubletake/version.rb
|
131
|
+
- lib/site_context.rb
|
132
|
+
homepage: http://melchisalins.users.sf.net/
|
133
|
+
licenses:
|
134
|
+
- MIT
|
135
|
+
metadata:
|
136
|
+
allowed_push_host: https://rubygems.org
|
137
|
+
post_install_message: Thanks for installing!
|
138
|
+
rdoc_options: []
|
139
|
+
require_paths:
|
140
|
+
- lib
|
141
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
142
|
+
requirements:
|
143
|
+
- - ! '>='
|
144
|
+
- !ruby/object:Gem::Version
|
145
|
+
version: 1.9.3
|
146
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
147
|
+
requirements:
|
148
|
+
- - ! '>='
|
149
|
+
- !ruby/object:Gem::Version
|
150
|
+
version: '0'
|
151
|
+
requirements:
|
152
|
+
- Firefox browser
|
153
|
+
rubyforge_project:
|
154
|
+
rubygems_version: 2.4.6
|
155
|
+
signing_key:
|
156
|
+
specification_version: 4
|
157
|
+
summary: Visual regression testing tool and more
|
158
|
+
test_files: []
|