doubletake 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,15 @@
1
+ ---
2
+ !binary "U0hBMQ==":
3
+ metadata.gz: !binary |-
4
+ ZTg2Nzg5ZDExOGQzN2QwNjdiNGMyODNmNTAwYTRhMGU4MGQzNDA1Zg==
5
+ data.tar.gz: !binary |-
6
+ OTM4MmRmM2FhOGExMGI4MzM4OWYxYWMwYzMzYWFmMTczNDdjZWM5OA==
7
+ SHA512:
8
+ metadata.gz: !binary |-
9
+ NTM2Y2M2ZGVmNmIwMzM4NTk1MjFjZWNkYmNmNTY0MDg3NDg2NWZhNGJhZmRm
10
+ YmY5Mjk3ODRhYjEwNTY0YzE5ODA5ZDg1MWU3ZTI0OTBjNTU1ZjIxOGI3MzQy
11
+ ODJkN2YyNjlmZGM1MDRmYzFkNGQ2NDJmZGNhZTYyMGNjYWU0ZDM=
12
+ data.tar.gz: !binary |-
13
+ OWE4Y2UxMmYyMDk4N2M1NGZkM2UxNzE3Zjk5MDgxZTg3OTU5MGZhZThiYjUx
14
+ ZDllNTIzM2VmYmU3MTYxODc2M2VlZGE5NmI4NWZhNTdkZTlhMjNkNDIyNDhh
15
+ OWU4YzgyZTBmZmRmZDgzZWMwODk5OTM3NWJlYTM2ZjFkNjA2YTU=
data/.gitignore ADDED
@@ -0,0 +1,11 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ ./data/*
11
+ .travis.yml
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in doubletake.gemspec
4
+ gemspec
data/README.md ADDED
@@ -0,0 +1,147 @@
1
+ # DoubleTake
2
+
3
+ DoubleTake is a visual regression testing tool for web applications. It is packaged and distributed as a Ruby gem.
4
+
5
+
6
+ ## Installation
7
+
8
+
9
+ Install it using:
10
+
11
+ $ gem install doubletake
12
+
13
+ ## Usage
14
+
15
+ To list all available features and functionality:
16
+ $ doubletake help
17
+
18
+ To see details on usage of each feature:
19
+ $ doubletake help [COMMAND]
20
+
21
+ ## Quick Start Guide
22
+
23
+ ### Compare
24
+
25
+ #### Why?
26
+
27
+ CSS makes changes easy to make, but it is very difficult to make sure those changes don't produce unexpected
28
+ consequences elsewhere in the project.
29
+
30
+ #### How?
31
+
32
+ DoubleTake is a command line tool and is configured using a YAML file.
33
+
34
+ Generating and editing a config file is easy.
35
+ ```
36
+ Run:
37
+ $ doubletake generate --file /tmp/mysite_config.yml
38
+ ```
39
+
40
+ Open the config file in your preferred editor and set the stage and prod URLs:
41
+ ```
42
+ $ vim /tmp/mysite_config.yml
43
+ stage: 'https://mysite-stage-env.com'
44
+ prod: 'https://mysite-prod-env.com'
45
+ ```
46
+ *NOTE*: Always include the URL scheme (http:// or https://) when configuring a site.
47
+
48
+ Edit the ignored list to skip URIs or file formats. DoubleTake generates sensible defaults that can be extended.
49
+
50
+ The window resolution can also be changed. The default desktop resolution is 1400x800.
51
+
52
+ DoubleTake is capable of authenticating with user credentials provided in the config file. This allows more in-depth scans.
53
+
54
+ To do this, simply make the following changes to the YAML file:
55
+ ```
56
+ LOGIN: true
57
+ LOGIN_URI: user/login
58
+ USER_DOM_ID: edit-name
59
+ USER_VALUE: username
60
+ PASS_DOM_ID: edit-pass
61
+ PASS_VALUE: secret_password
62
+ ```
63
+ LOGIN_URI is the path a user navigates to in order to reach the login page. For example, if
64
+ https://mysite.com/login is the complete URL, then the LOGIN_URI would be:
65
+ ```
66
+ LOGIN_URI: login
67
+ ```
68
+ USER_DOM_ID & PASS_DOM_ID are the IDs of the web elements Selenium uses to find the login fields and type in the credentials.
69
+ USER_VALUE & PASS_VALUE are the actual username and password used to log in.
70
+
71
+ If you want DoubleTake to confirm that it successfully logged in with the given details:
72
+ ```
73
+ LOGIN_CONFIRM: true
74
+ LOGIN_CONFIRM_CHECK: homepage-onsite-team
75
+ ```
76
+
77
+ LOGIN_CONFIRM_CHECK is the class name of the web element DoubleTake should find after a successful login.
78
+
79
+ If you don't want to use this feature, turn it off by setting:
80
+ ```
81
+ LOGIN_CONFIRM: false
82
+ ```
83
+ There are other values in the config file, such as:
84
+ ```
85
+ bad_links: []
86
+ to_be_scraped: []
87
+ scraped: []
88
+ LOGGED_IN: false
89
+ ```
90
+ These are populated automatically by DoubleTake at runtime and are used by the Resume feature, which is covered later in this guide.
91
+
92
+ Now that our config file is ready, we can launch our first scan.
93
+
94
+ $ doubletake compare --conf /tmp/mysite_config.yml
95
+
96
+ If the gem installed correctly and Selenium was able to find the Firefox binary, you will see two browsers loading pages while
97
+ DoubleTake takes screenshots and compares them.
98
+
99
+ Screenshots of pages that have changed are saved in the DoubleTake_data folder in your home directory.
100
+
101
+ ### Scrape
102
+
103
+ #### Why?
104
+
105
+ The short answer: I have been asked several times in the past to make complete site backups and preserve how they render in
106
+ multiple browsers. This was generally when a major revamp of the site was going to be made or as evidence to show to the client.
107
+ There are also times when the UI/UX developers need to quickly see how all the pages look without having to browse every single
108
+ page and prefer to see screenshots.
109
+
110
+ #### How?
111
+
112
+ All features in DoubleTake make use of the same config file format. We can reuse the config file we made earlier.
113
+
114
+ Scrape only looks at the "stage: " value in the config and ignores "prod: ".
115
+
116
+ To scrape the URL:
117
+ $ doubletake scrape --conf /tmp/mysite_config.yml
118
+
119
+ This time you should see only one Firefox browser launch, and all the screenshots will be stored in the DoubleTake_data folder in
120
+ your home directory.
121
+
122
+ ### Resume
123
+
124
+ #### Why?
125
+
126
+ Because stuff happens, and you don't want to restart a scan of a 4000-page e-commerce site after it has already scanned 2500 pages.
127
+
128
+ #### How?
129
+
130
+ A progress config file is auto-generated when a Compare or Scrape scan is started. These files can be found in the DoubleTake_data
131
+ folder in your home directory.
132
+
133
+ Ex: ~/DoubleTake_data/desktop/progress_mysite.com.yml
134
+
135
+ In addition to this progress config file, we also need to know the type of scan being resumed (compare or scrape).
136
+
137
+ To resume a scan:
138
+ $ doubletake resume --type compare --conf ~/DoubleTake_data/desktop/progress_mysite.com.yml
139
+
140
+
141
+ ## Contributing
142
+
143
+ 1. Fork it ( https://github.com/MelchiSalins/doubletake/fork )
144
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
145
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
146
+ 4. Push to the branch (`git push origin my-new-feature`)
147
+ 5. Create a new Pull Request
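For context on the image comparison step described in the README above: in this release, `Crawler#image_stuff` saves both screenshots, compares them with RMagick's `compare_channel` using `Magick::MeanSquaredErrorMetric`, and keeps the files (plus a diff image) only when the metric is greater than `IMAGE_THRESHOLD`, which defaults to 0. The sketch below illustrates mean squared error in plain Ruby on two equally sized grayscale images given as flat arrays of intensities in 0.0..1.0. The helpers `mse` and `changed?` are hypothetical, and RMagick's own normalization may differ from this simplified version.

```ruby
# Illustrative mean squared error between two grayscale images, each given
# as a flat Array of pixel intensities already normalized to 0.0..1.0.
# Hypothetical helpers: DoubleTake itself uses RMagick's MeanSquaredErrorMetric.
def mse(pixels_a, pixels_b)
  unless pixels_a.size == pixels_b.size
    raise ArgumentError, 'images must have the same number of pixels'
  end
  return 0.0 if pixels_a.empty?

  sum = pixels_a.zip(pixels_b).inject(0.0) { |acc, (a, b)| acc + (a - b)**2 }
  sum / pixels_a.size
end

# Mirrors the rule in Crawler#image_stuff: a page counts as changed only when
# the difference metric is strictly greater than the threshold.
def changed?(pixels_a, pixels_b, threshold = 0)
  mse(pixels_a, pixels_b) > threshold
end

stage = [0.0, 0.5, 1.0, 1.0]
prod  = [0.0, 0.5, 1.0, 0.0]
puts mse(stage, prod)        # 0.25
puts changed?(stage, stage)  # false
puts changed?(stage, prod)   # true
```

With the default threshold of 0, any pixel difference counts as a change; raising the threshold lets small rendering noise through.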
data/Rakefile ADDED
@@ -0,0 +1 @@
1
+ require "bundler/gem_tasks"
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "doubletake"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/doubletake.gemspec ADDED
@@ -0,0 +1,35 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'doubletake/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "doubletake"
8
+ spec.version = Doubletake::VERSION
9
+ spec.authors = ["Melchi Salins"]
10
+ spec.email = ["melchisalins@gmail.com"]
11
+
12
+ spec.summary = %q{Visual regression testing tool and more }
13
+ spec.homepage = "http://melchisalins.users.sf.net/"
14
+ spec.license = "MIT"
15
+ spec.description = "This is test description"
16
+
17
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
18
+ spec.bindir = "exe"
19
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
20
+ spec.require_paths = ["lib"]
21
+
22
+ if spec.respond_to?(:metadata)
23
+ spec.metadata['allowed_push_host'] = "https://rubygems.org"
24
+ end
25
+
26
+ spec.add_development_dependency "bundler", "~> 1.9"
27
+ spec.add_development_dependency "rake", "~> 10.0"
28
+ spec.add_runtime_dependency "pry", "~> 0.10.1"
29
+ spec.add_runtime_dependency 'rmagick', '>= 2.13.4', '~> 2.13.4'
30
+ spec.add_runtime_dependency 'selenium-webdriver', '>= 2.45.0', '~> 2.45.0'
31
+ spec.add_runtime_dependency 'thor', '~> 0.19.1'
32
+ spec.post_install_message = "Thanks for installing!"
33
+ spec.required_ruby_version = '>= 1.9.3'
34
+ spec.requirements << 'Firefox browser'
35
+ end
data/exe/doubletake ADDED
@@ -0,0 +1,108 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'doubletake'
4
+ require 'thor'
5
+ require 'uri'
6
+ require 'pry'
7
+
8
+ class DoubleTake < Thor
9
+ include CrawlerHelper
10
+
11
+ desc "compare", "Starts comparing the stage and prod site configured in the YAML file"
12
+ option :conf, :desc => "--conf config_file.yml", :aliases => "-c", :required => true
13
+ def compare
14
+ puts "* DoubleTake Compare Initializing..."
15
+ $config = YAML.load(File.open(options[:conf], "r"))
16
+ $config.LOGGED_IN = false # forcing this variable to false.
17
+ # puts $config.inspect
18
+ start
19
+ end # compare
20
+
21
+ desc "scrape ", "Scrape the specified Domain."
22
+ option :conf, :desc => "--conf config_file.yml",
23
+ :aliases => "-c", :required => true
24
+ def scrape
25
+ #TODO: This command doesn't support resume capability yet.
26
+ # Need to make resume a subcommand of compare and scrape.
27
+ puts "* DoubleTake Scrape Initializing..."
28
+ $config = YAML.load(File.open(options[:conf], "r"))
29
+ $config.LOGGED_IN = false # forcing this variable to false.
30
+ # puts $config.inspect
31
+ start
32
+ end
33
+
34
+ desc "resume ", "Resumes a previous compare or scrape session. Progress YAML files are located in ~/DoubleTake_data/desktop/."
35
+ option :conf, :required => true, :desc => "resume --conf progress_timestamp.yml",
36
+ :aliases => "-c"
37
+ option :type, :required => true, :desc => "Specify the type of scan being resumed (scrape/compare).",
38
+ :aliases => "-t"
39
+ def resume
40
+ puts "* Resuming previous session."
41
+ $config = YAML.load(File.open(options[:conf], "r"))
42
+ # puts $config.inspect
43
+ if ["scrape", "compare"].include? options[:type]
44
+ $cf = options[:type]
45
+ else
46
+ puts "--type should be either 'scrape' or 'compare'"
47
+ exit 1
48
+ end
49
+ start
50
+ end
51
+
52
+ desc "generate", "Generate a config file template."
53
+ option :file, :required => true, :desc => "Filename and path to write to. Ex: ~/config.yml"
54
+ def generate
55
+ begin
56
+ c = Configuration.new
57
+ File.open(options[:file],"w") {|f| f.write(c.to_yaml)}
58
+ rescue Exception => error
59
+ puts error
60
+ exit 1
61
+ end
62
+ end
63
+ no_commands do
64
+ def start
65
+ begin
66
+ $cf = caller[0][/`([^']*)'/, 1] unless $cf# Setting Calling function unless already set.
67
+ threads = []
68
+ $domains_list = {}
69
+ if $config.all_good?
70
+ puts "* $config.all_good? returned true."
71
+ else
72
+ puts "- There are error(s) in the supplied config file. Cannot proceed."
73
+ exit(1)
74
+ end
75
+ $hostname = URI.parse($config.stage).host # This is required to name folders to store scan results.
76
+ puts $hostname
77
+ $domains_list[$hostname] = [$config.stage, $config.prod]
78
+
79
+ $domains_list.each do |site, urls|
80
+ # Here we pick each domain and spawn new threads
81
+ # for the regression tests.
82
+ # TODO: This is useful only when stage and prod are a list
83
+ # of domains.
84
+ threads << Thread.new {task_1 = Crawler.new(site, urls[0], urls[1])
85
+ task_1.crawl if $config.LOGGED_IN == false
86
+ # Add code to check if creds are provided and then execute following:
87
+ if $config.LOGIN
88
+ task_1.login_to_as(urls[0], task_1.driver1)
89
+ task_1.login_to_as(urls[1], task_1.driver2) if $cf == "compare"
90
+ $config.to_be_scraped << task_1.driver1.current_url
91
+ task_1.crawl
92
+ end
93
+ task_1.clean_up
94
+ }
95
+ puts "* Thread Dispatched"
96
+ end
97
+ threads.each {|thr| thr.join}
98
+ rescue Interrupt
99
+ puts "\n* Exiting."
100
+ rescue Exception => e
101
+ puts e
102
+ puts "\n* Exiting."
103
+ end
104
+ end
105
+ end
106
+ end
107
+
108
+ DoubleTake.start(ARGV)
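When LOGIN is true, `start` passes the stage URL (and, for compare, the prod URL) to `login_to_as`, which loads `site + $config.LOGIN_URI` by plain string concatenation. With the README's example values (`stage: 'https://mysite-stage-env.com'` and `LOGIN_URI: user/login`) that produces `https://mysite-stage-env.comuser/login` unless the configured URL ends in a slash. A minimal sketch of a more forgiving join using Ruby's standard `URI.join`; `login_url` is a hypothetical helper, not part of DoubleTake.

```ruby
require 'uri'

# Hypothetical helper (not part of DoubleTake): resolve LOGIN_URI against a
# base URL whether or not the base ends in "/" or LOGIN_URI starts with "/".
def login_url(base, login_uri)
  # A trailing slash makes URI.join append to the base path instead of
  # replacing its last segment.
  base = base.end_with?('/') ? base : "#{base}/"
  URI.join(base, login_uri.sub(%r{\A/+}, '')).to_s
end

puts 'https://mysite.com' + 'login'                  # https://mysite.comlogin
puts login_url('https://mysite.com', 'login')        # https://mysite.com/login
puts login_url('https://mysite.com/', '/user/login') # https://mysite.com/user/login
```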
data/lib/config.rb ADDED
@@ -0,0 +1,76 @@
1
+ require 'yaml'
2
+ require 'pry'
3
+ require './lib/crawler_lib.rb'
4
+
5
+ class Configuration
6
+ attr_accessor :stage, :prod, :ignored, :DESKTOP, :to_be_scraped
7
+ attr_accessor :scraped, :bad_links, :LOGGED_IN
8
+ attr_reader :LOGIN_URI, :USER_VALUE, :USER_DOM_ID
9
+ attr_reader :PASS_VALUE, :PASS_DOM_ID
10
+ attr_reader :LOGIN_CONFIRM_CHECK, :LOGIN_CONFIRM, :LOGIN
11
+ attr_reader :URI_THRESHOLD, :IMAGE_THRESHOLD
12
+
13
+ def initialize
14
+ @stage = ""
15
+ @prod = ""
16
+ @ignored = ["ignore_me", "not_important_url_prefix",".css", ".pdf", ".js", ".jpg", ".png", "video/pop", "user/logout", "?", "=", "#"]
17
+ @SCREEN_RESOLUTION = {:desktop => [1400,800]}
18
+ @IMAGE_THRESHOLD = 0
19
+ @LOGIN = true
20
+ @LOGIN_URI = 'user/login' # http://example.com/login
21
+ @USER_DOM_ID = 'edit-name'
22
+ @USER_VALUE = 'melchisalins'
23
+ @PASS_DOM_ID = 'edit-pass'
24
+ @PASS_VALUE = 'secret_password'
25
+ @LOGIN_CONFIRM = false
26
+ @LOGIN_CONFIRM_CHECK = 'homepage-onsite-team'
27
+ @bad_links = []
28
+ @to_be_scraped = []
29
+ @scraped = []
30
+ @LOGGED_IN = false
31
+ end
32
+
33
+ def all_good?
34
+ begin
35
+ # Fixes scheme of the URL if not present. This is needed by Selenium
36
+ return_value = false
37
+ if @stage.length <= 0 && @prod.length <= 0
38
+ puts "Stage and Production URL missing."
39
+ return_value = false
40
+ return return_value
41
+ else
42
+ @stage = fix_scheme(@stage) if URI.parse(@stage).scheme == nil
43
+ @prod = fix_scheme(@prod) if URI.parse(@prod).scheme == nil
44
+ return_value = true
45
+ end
46
+
47
+ if @LOGIN && @LOGIN_URI.nil? == false && @USER_DOM_ID.nil? == false && @USER_VALUE.nil? == false && @PASS_DOM_ID.nil? == false && @PASS_VALUE.nil? == false
48
+ return_value = true
49
+ else
50
+ puts "Please configure LOGIN parameters"
51
+ return_value = false
52
+ return return_value
53
+ end
54
+
55
+ if @LOGIN_CONFIRM && @LOGIN_CONFIRM_CHECK
56
+ return_value = true
57
+ elsif @LOGIN_CONFIRM == false
58
+ return_value = true
59
+ else
60
+ puts "Please configure LOGIN_CONFIRM_CHECK value"
61
+ return_value = false
62
+ return return_value
63
+ end
64
+
65
+ return return_value
66
+ rescue Exception => e
67
+ puts e
68
+ return false
69
+ end
70
+ end
71
+ end
72
+ # c = Configuration.new
73
+ # File.open("test_yaml.yml","w") {|f| f.write(c.to_yaml)}
74
+ # y = YAML.load(File.open("test_yaml.yml", "r"))
75
+ # puts y.inspect
76
+ # y.validate
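`Configuration#all_good?` runs its checks in sequence and prints the first problem it finds. Two details are easy to miss: it tries to repair a missing scheme by calling `fix_scheme` (which starts Firefox), and its LOGIN branch also reports "Please configure LOGIN parameters" when LOGIN is false. The sketch below is a side-effect-free variant that collects every problem at once. It assumes the config is a plain Hash with the same keys as the generated YAML and that login fields are required only when LOGIN is true; `config_errors` and `LOGIN_FIELDS` are hypothetical names.

```ruby
require 'uri'

# Keys that must be present when LOGIN is true (assumption about intent).
LOGIN_FIELDS = %w[LOGIN_URI USER_DOM_ID USER_VALUE PASS_DOM_ID PASS_VALUE].freeze

# Hypothetical, side-effect-free version of the all_good? checks.
# Returns an Array of messages; an empty Array means the config looks usable.
def config_errors(cfg)
  errors = []
  stage = cfg['stage'].to_s
  prod  = cfg['prod'].to_s
  errors << 'Stage and Production URL missing.' if stage.empty? && prod.empty?

  # Report (rather than repair) URLs without an http:// or https:// scheme.
  [['stage', stage], ['prod', prod]].each do |key, url|
    next if url.empty?
    errors << "#{key} is missing a scheme (http:// or https://)" if URI.parse(url).scheme.nil?
  end

  if cfg['LOGIN']
    missing = LOGIN_FIELDS.select { |field| cfg[field].nil? }
    errors << "Please configure LOGIN parameters: #{missing.join(', ')}" unless missing.empty?
  end

  if cfg['LOGIN_CONFIRM'] && cfg['LOGIN_CONFIRM_CHECK'].nil?
    errors << 'Please configure LOGIN_CONFIRM_CHECK value'
  end
  errors
end

cfg = {
  'stage' => 'https://mysite-stage-env.com', 'prod' => 'mysite-prod-env.com',
  'LOGIN' => true, 'LOGIN_URI' => 'user/login', 'USER_DOM_ID' => 'edit-name',
  'USER_VALUE' => 'username', 'PASS_DOM_ID' => 'edit-pass', 'PASS_VALUE' => nil,
  'LOGIN_CONFIRM' => false
}
puts config_errors(cfg)
```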
data/lib/crawler_lib.rb ADDED
@@ -0,0 +1,66 @@
1
+ require './lib/config.rb'
2
+ require 'uri'
3
+ require 'selenium-webdriver'
4
+
5
+ module CrawlerHelper
6
+ def test
7
+ puts "Test call"
8
+ end
9
+
10
+ def fix_scheme(url)
11
+ puts "- No scheme provided for #{url}, trying to fix it."
12
+ driver = Selenium::WebDriver.for :firefox
13
+ driver.get("http://"+url) #assumes redirect to https is setup if it exists.
14
+ url_tmp = driver.current_url
15
+ scheme = URI.parse(url_tmp).scheme
16
+ driver.quit
17
+ puts "scheme is: #{scheme}"
18
+ return scheme+"://"+url
19
+ end
20
+
21
+
22
+ def sanitize(link)
23
+ # puts link
24
+ name = link.gsub(":", "")
25
+ name = name.gsub("/", "")
26
+ name = name.gsub("%", "")
27
+ name = name.gsub('\\', "")
28
+ name = name.gsub('.', "")
29
+ # puts name
30
+ return name
31
+ end
32
+ #
33
+ # def bad_link?(each_link)
34
+ # # return True if these characters exists in
35
+ # # the URL: $, #, png, css, js, jpg, pdf
36
+ # # Check for both upper and lower case ^
37
+ # begin
38
+ # each = each_link.upcase
39
+ # if each.include?("?") || each.include?("#") || each.include?(".PNG") || each.include?(".CSS") || each.include?("JS") || each.include?("JPG") || each.include?("PDF") || each.include?("/VIDEO/POP")
40
+ # # puts "Bad Link: " + each.to_s
41
+ # return true
42
+ # else
43
+ # return false
44
+ # end
45
+ # rescue Exception => e
46
+ # puts e.message
47
+ # puts e.backtrace.inspect
48
+ # end
49
+ #
50
+ # end
51
+
52
+ def do_not_ignore?(each_link, scraped)
53
+ # This checks if the passed link should be
54
+ # scraped or not based on:
55
+ # Has it already been scraped, is it bad_link?
56
+ # puts each_link
57
+ # puts scraped.class
58
+ if scraped.include?(each_link)
59
+ return false
60
+ elsif bad_link?(each_link)
61
+ return false
62
+ else
63
+ return true
64
+ end
65
+ end
66
+ end
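`sanitize` turns a page path into a screenshot file name by deleting `:`, `/`, `%`, `\` and `.`. Because characters are only removed, different paths can map to the same name (`/a/bc` and `/ab/c` both become `abc`), so one page's screenshot can overwrite another's. The sketch reproduces that behavior in a single `gsub` and adds a hypothetical `safe_name` that appends a short SHA-1 digest of the original path to keep names distinct; the digest suffix is an assumption, not something DoubleTake does.

```ruby
require 'digest/sha1'

# Same effect as CrawlerHelper#sanitize: strip "/", ":", "%", "." and "\".
def sanitize(link)
  link.gsub(%r{[/:%.\\]}, '')
end

# Hypothetical collision-resistant variant: readable prefix plus the first
# 8 hex characters of the SHA-1 of the unmodified path.
def safe_name(link)
  "#{sanitize(link)}_#{Digest::SHA1.hexdigest(link)[0, 8]}"
end

puts sanitize('/about/team')                   # aboutteam
puts sanitize('/a/bc') == sanitize('/ab/c')    # true
puts safe_name('/a/bc') == safe_name('/ab/c')  # false
```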
data/lib/doubletake.rb ADDED
@@ -0,0 +1,8 @@
1
+ require "doubletake/version"
2
+ require './lib/crawler_lib.rb'
3
+ require './lib/config.rb'
4
+ require './lib/site_context.rb'
5
+
6
+ module Doubletake
7
+ # Your code goes here...
8
+ end
data/lib/doubletake/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Doubletake
2
+ VERSION = "0.1.0"
3
+ end
data/lib/site_context.rb ADDED
@@ -0,0 +1,248 @@
1
+ require 'fileutils'
2
+ require 'selenium-webdriver'
3
+ require './lib/crawler_lib.rb'
4
+ require './lib/config.rb'
5
+ require 'pry'
6
+ require 'RMagick'
7
+ require 'yaml'
8
+ require 'csv'
9
+
10
+ # require 'pry-debugger'
11
+
12
+ # class Configuration
13
+ # # attr_accessor :stage, :prod, :ignored, :DESKTOP
14
+ # def initialize
15
+ # @stage = "https://rialto-stage.equiem.com.au"
16
+ # @prod = "https://atrialto.com"
17
+ # @ignored = ["ignore_me", "not_important_url_prefix",".css", ".pdf", ".js", ".jpg", ".png", "video/pop", "user/logout", "?", "=", "#"]
18
+ # @SCREEN_RESOLUTION = {:desktop => [1400,800], :mobile => [300,150]}
19
+ # @IMAGE_THRESHOLD = 0
20
+ # @LOGIN = true
21
+ # @LOGIN_URI = 'login' # http://example.com/login
22
+ # @USER_DOM_ID = 'edit-name'
23
+ # @USER_VALUE = 'melchisalins'
24
+ # @PASS_DOM_ID = 'edit-pass'
25
+ # @PASS_VALUE = 'secret_password'
26
+ # @LOGIN_CONFIRM = true
27
+ # @LOGIN_CONFIRM_CHECK = 'homepage-onsite-team'
28
+ # end
29
+ # end
30
+
31
+ class SiteContext
32
+ attr_accessor :driver
33
+
34
+ def initialize
35
+ puts "SiteContext initializes!"
36
+ end
37
+
38
+ def set_driver(browser = :chrome, remote = "http://192.168.15.43:4444/wd/hub/")
39
+ # driver = Selenium::WebDriver.for(:remote, :url => remote, :desired_capabilities => browser)
40
+ driver = Selenium::WebDriver.for :firefox
41
+ return driver
42
+ end
43
+
44
+ def login_to_as(site, driver)
45
+ $config.LOGGED_IN = true
46
+ driver.get(site + $config.LOGIN_URI)
47
+ username = driver.find_element(:id, $config.USER_DOM_ID)
48
+ username.clear
49
+ username.send_keys($config.USER_VALUE)
50
+ password = driver.find_element(:id, $config.PASS_DOM_ID)
51
+ password.clear
52
+ password.send_keys($config.PASS_VALUE+"\n")
53
+
54
+ if driver.title.include?("Terms and Conditions")
55
+ driver.find_element(:id, "edit-legal-accept").click
56
+ driver.find_element(:id, "edit-save").click
57
+ end
58
+
59
+ if driver.find_element(:class, $config.LOGIN_CONFIRM_CHECK) != nil
60
+ return true
61
+ else
62
+ return false
63
+ end
64
+ end #End of Method login_to_as
65
+ end
66
+
67
+ class Crawler < SiteContext
68
+ include CrawlerHelper
69
+ include Magick
70
+ attr_accessor :site, :driver1, :driver2, :progress
71
+
72
+ class Progress
73
+ attr_accessor :driver1_type, :driver2_type
74
+ attr_accessor :stage, :prod
75
+ attr_accessor :bad_links, :to_be_scraped, :scraped
76
+
77
+ def initialize
78
+ @driver1_type = ""
79
+ @driver2_type = ""
80
+ @stage = ""
81
+ @prod = ""
82
+ @bad_links = []
83
+ @to_be_scraped = []
84
+ @scraped = []
85
+ end
86
+ end
87
+
88
+
89
+ def initialize(site, test, base, browser = :firefox)
90
+ @site = site.to_s
91
+ FileUtils::mkdir_p "#{ENV['HOME']}/DoubleTake_data/desktop/"+@site
92
+ puts "* Screenshots are saved in #{ENV['HOME']}/DoubleTake_data/desktop/"+@site
93
+ $config.to_be_scraped << test
94
+ @test_domain_length = test.length
95
+ puts "Crawler initialized"
96
+ @driver1 = SiteContext.new
97
+ @driver1 = @driver1.set_driver(browser)
98
+ @driver1.get(test)
99
+ @driver1.manage.window.resize_to(1400, 800) #Desktop size
100
+ unless $cf == "scrape"
101
+ @driver2 = SiteContext.new
102
+ @driver2 = @driver2.set_driver(browser)
103
+ @driver2.get(base)
104
+ @driver2.manage.window.resize_to(1400, 800) #Desktop size
105
+ end
106
+ end
107
+
108
+ def clean_up
109
+ @driver1.quit
110
+ @driver2.quit unless $cf == "scrape"
111
+ puts "Destroyed WebDriver instances."
112
+ end
113
+
114
+ def bad_link?(link)
115
+ # This should populate bad_links[] and return bool
116
+ # regarding the link being passed in.
117
+ # $config.scraped << link
118
+ # Bad link could be parameterised URLs(?, #) or
119
+ # External domains such as facebook, twitter etc. or
120
+ # link is non http Ex: mailto: ftp: file: etc.
121
+ # puts $config.scraped
122
+ if link.include?("$") || link.include?("#") || link.include?(".png") || link.include?(".js")
123
+ puts "Bad Link: "+link
124
+ $config.scraped << link if ($config.scraped.include?(link) == false)
125
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
126
+ return true
127
+ elsif link.include?(".pdf") || link.include?(".jpeg") || link.include?(".css") || link.include?(".jpg")
128
+ puts "Bad Link: "+link
129
+ $config.scraped << link if ($config.scraped.include?(link) == false)
130
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
131
+ return true
132
+ elsif link.include?("video/pop") || link.include?("?") || link.include?("/user/logout")
133
+ puts "Bad Link: "+link
134
+ $config.scraped << link if ($config.scraped.include?(link) == false)
135
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
136
+ return true
137
+ elsif link[0..3] != "http" #TODO: This doesn't seem to work.
138
+ puts "Bad Link: "+link
139
+ $config.scraped << link if ($config.scraped.include?(link) == false)
140
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
141
+ return true
142
+ elsif link[0..@test_domain_length-1] != $config.stage || link.include?("%")
143
+ puts "Out of Scope: "+link
144
+ $config.scraped << link if ($config.scraped.include?(link) == false)
145
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
146
+ return true
147
+ elsif $config.scraped.include?(link)
148
+ puts "Already scraped: "+link
149
+ $config.bad_links << link if ($config.bad_links.include?(link) == false)
150
+ return true
151
+ else
152
+ puts "Good Link: "+link
153
+ return false
154
+ end #End of `if`
155
+
156
+ end
157
+
158
+ def crawl
159
+ # unless $config.to_be_scraped.empty?
160
+ loop do
161
+ puts "length of progress.to_be_scraped: #{$config.to_be_scraped.length.to_s}"
162
+ break if $config.to_be_scraped.length < 1
163
+ puts "length of @to_be_scraped: #{$config.to_be_scraped.length.to_s}"
164
+ $config.to_be_scraped.each do |each_link|
165
+ puts "* length of @to_be_scraped: #{$config.to_be_scraped.length.to_s}"
166
+ puts "** length of $config.scraped: #{$config.scraped.length.to_s}"
167
+ puts "Working on: #{each_link}"
168
+ begin
169
+ @driver1.get(each_link)
170
+ #### This Code block collects New Links and cleans
171
+ # $config.to_be_scraped Array.
172
+ all_a_objs = @driver1.find_elements(:xpath, '//a')
173
+ all_a_objs.each do |each_a_obj|
174
+ if each_a_obj.attribute("href") != nil #Why? Cause some link obj are dicks and don't have a href
175
+ # TODO: ^^ This if should be changed to begin - rescue
176
+ $config.to_be_scraped << each_a_obj.attribute("href") if (each_a_obj.attribute("href").include?("http") && bad_link?(each_a_obj.attribute("href")) == false)
177
+ end
178
+ end #all_a_objs.each do |each_a_obj|
179
+ $config.to_be_scraped.uniq! # Remove duplicate links.
180
+ $config.to_be_scraped.each do |each_new_link|
181
+ #This code block cleans the $config.to_be_scraped Array
182
+ $config.to_be_scraped = $config.to_be_scraped - [each_new_link] if ($config.scraped.include?(each_new_link) || bad_link?(each_new_link))
183
+ end #$config.to_be_scraped.each do |each_new_link|
184
+ if $config.scraped.include?(each_link)
185
+ # In case a bad link makes it into the loop
186
+ # This code-block will skip over it.
187
+ # It will also delete it from the $config.to_be_scraped Array
188
+ $config.to_be_scraped = $config.to_be_scraped - [each_link]
189
+ puts "Already-scraped link crept in: #{each_link}"
190
+ # next
191
+ end
192
+
193
+ #
194
+ ### End of Code Block to collect URL's to be scraped
195
+ stage_uri = each_link[@test_domain_length..-1]
196
+ prod_link = $config.prod + stage_uri
197
+ # *****************************************
198
+ if $cf == "scrape"
199
+ warning_log = CSV.open("#{ENV['HOME']}/DoubleTake_data/desktop/#{@site}.csv", "a")
200
+ name = sanitize(stage_uri)
201
+ @driver1.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
202
+ @driver1.find_elements(:css, ".messages.error").each do |ele|
203
+ warning_log << [@driver1.current_url, ele.text] unless ele.nil?
204
+ end
205
+ warning_log.close
206
+ else
207
+ @driver2.get(prod_link)
208
+ image_stuff(stage_uri)
209
+ end
210
+ # *****************************************
211
+ $config.scraped << each_link # Last Step: Marking the URL as scraped!
212
+ $config.scraped.uniq! # bad_link? may add duplicate entries
213
+ $config.to_be_scraped = $config.to_be_scraped - [nil] # This was issue when .delete() was used which resulted in element replaced by nil
214
+ $config.to_be_scraped = $config.to_be_scraped - [each_link]
215
+ $config.to_be_scraped.uniq!
216
+ File.open("#{ENV['HOME']}/DoubleTake_data/desktop/progress_#{@site}.yml", "w") {|f| f.write($config.to_yaml)}
217
+ rescue Selenium::WebDriver::Error::StaleElementReferenceError => e
218
+ puts "Stale element error occurred, moving to next link: #{stage_uri}"
219
+ puts e
220
+ next
221
+ rescue Exception => e
222
+ puts "Generic Exception occurred"
223
+ binding.pry
224
+ puts e.backtrace
225
+ next
226
+ end #End of begin
227
+ end #$config.to_be_scraped.each do |each_link|
228
+ end # Loop do
229
+ puts "to_be_scraped: " + $config.to_be_scraped.to_s
230
+ puts "scraped : " + $config.scraped.to_s
231
+ end # Crawl Ending
232
+
233
+ def image_stuff(stage_uri)
234
+ name = sanitize(stage_uri)
235
+ @driver1.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
236
+ @driver2.save_screenshot("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
237
+ # a, b = IO.read("#{ENV['HOME']}/DoubleTake_data/desktop/stage_"+@site+"/"+name+".png")[0x10..0x18].unpack('NN')
238
+ img1 = ImageList.new("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
239
+ img2 = ImageList.new("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
240
+ diff_img, diff_metric = img1[0].compare_channel( img2[0], Magick::MeanSquaredErrorMetric )
241
+ if diff_metric > $config.IMAGE_THRESHOLD
242
+ diff_img.write("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_diff.png")
243
+ else
244
+ File.delete("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_stage.png")
245
+ File.delete("#{ENV['HOME']}/DoubleTake_data/desktop/"+@site+"/"+name+"_prod.png")
246
+ end # if diff_metric > $config.IMAGE_THRESHOLD
247
+ end # def image_stuff(stage_uri)
248
+ end #Class Crawler < SiteContext
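`Crawler#bad_link?` decides whether a discovered URL is skipped, and it also records skipped links in `$config.scraped` and `$config.bad_links` as a side effect. Its skip patterns are hard-coded substrings; in this release it does not read the `ignored` list from the config file. The sketch below is a side-effect-free approximation of the same checks in the same order, with the patterns passed in as a parameter (an assumption about how `ignored` might be wired in). `classify_link` and `DEFAULT_SKIP` are hypothetical names.

```ruby
# Substrings bad_link? currently hard-codes (collected here for illustration).
DEFAULT_SKIP = ['$', '#', '.png', '.js', '.pdf', '.jpeg', '.css', '.jpg',
                'video/pop', '?', '/user/logout'].freeze

# Hypothetical, side-effect-free classification of a discovered link.
# Returns :bad, :out_of_scope, :already_scraped or :good.
def classify_link(link, stage, scraped, skip = DEFAULT_SKIP)
  return :bad if skip.any? { |pattern| link.include?(pattern) }
  return :bad unless link.start_with?('http')  # mailto:, ftp:, javascript:, ...
  # Same test as link[0..stage.length-1] != stage in bad_link?.
  return :out_of_scope if !link.start_with?(stage) || link.include?('%')
  return :already_scraped if scraped.include?(link)
  :good
end

stage   = 'https://mysite.com'
scraped = ['https://mysite.com/about']
%w[
  https://mysite.com/contact
  https://mysite.com/about
  https://mysite.com/search?q=x
  https://twitter.com/mysite
  mailto:info@mysite.com
].each { |link| puts "#{classify_link(link, stage, scraped)}  #{link}" }
```

Because matching is by substring, a pattern such as `.js` also skips pages like `/page.jsp` or `/data.json`.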
metadata ADDED
@@ -0,0 +1,158 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: doubletake
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Melchi Salins
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2015-04-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ~>
18
+ - !ruby/object:Gem::Version
19
+ version: '1.9'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ~>
25
+ - !ruby/object:Gem::Version
26
+ version: '1.9'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ~>
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ~>
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: pry
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ~>
46
+ - !ruby/object:Gem::Version
47
+ version: 0.10.1
48
+ type: :runtime
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ~>
53
+ - !ruby/object:Gem::Version
54
+ version: 0.10.1
55
+ - !ruby/object:Gem::Dependency
56
+ name: rmagick
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ! '>='
60
+ - !ruby/object:Gem::Version
61
+ version: 2.13.4
62
+ - - ~>
63
+ - !ruby/object:Gem::Version
64
+ version: 2.13.4
65
+ type: :runtime
66
+ prerelease: false
67
+ version_requirements: !ruby/object:Gem::Requirement
68
+ requirements:
69
+ - - ! '>='
70
+ - !ruby/object:Gem::Version
71
+ version: 2.13.4
72
+ - - ~>
73
+ - !ruby/object:Gem::Version
74
+ version: 2.13.4
75
+ - !ruby/object:Gem::Dependency
76
+ name: selenium-webdriver
77
+ requirement: !ruby/object:Gem::Requirement
78
+ requirements:
79
+ - - ! '>='
80
+ - !ruby/object:Gem::Version
81
+ version: 2.45.0
82
+ - - ~>
83
+ - !ruby/object:Gem::Version
84
+ version: 2.45.0
85
+ type: :runtime
86
+ prerelease: false
87
+ version_requirements: !ruby/object:Gem::Requirement
88
+ requirements:
89
+ - - ! '>='
90
+ - !ruby/object:Gem::Version
91
+ version: 2.45.0
92
+ - - ~>
93
+ - !ruby/object:Gem::Version
94
+ version: 2.45.0
95
+ - !ruby/object:Gem::Dependency
96
+ name: thor
97
+ requirement: !ruby/object:Gem::Requirement
98
+ requirements:
99
+ - - ~>
100
+ - !ruby/object:Gem::Version
101
+ version: 0.19.1
102
+ type: :runtime
103
+ prerelease: false
104
+ version_requirements: !ruby/object:Gem::Requirement
105
+ requirements:
106
+ - - ~>
107
+ - !ruby/object:Gem::Version
108
+ version: 0.19.1
109
+ description: This is test description
110
+ email:
111
+ - melchisalins@gmail.com
112
+ executables:
113
+ - doubletake
114
+ extensions: []
115
+ extra_rdoc_files: []
116
+ files:
117
+ - .gitignore
118
+ - .rspec
119
+ - .travis.yml
120
+ - Gemfile
121
+ - README.md
122
+ - Rakefile
123
+ - bin/console
124
+ - bin/setup
125
+ - doubletake.gemspec
126
+ - exe/doubletake
127
+ - lib/config.rb
128
+ - lib/crawler_lib.rb
129
+ - lib/doubletake.rb
130
+ - lib/doubletake/version.rb
131
+ - lib/site_context.rb
132
+ homepage: http://melchisalins.users.sf.net/
133
+ licenses:
134
+ - MIT
135
+ metadata:
136
+ allowed_push_host: https://rubygems.org
137
+ post_install_message: Thanks for installing!
138
+ rdoc_options: []
139
+ require_paths:
140
+ - lib
141
+ required_ruby_version: !ruby/object:Gem::Requirement
142
+ requirements:
143
+ - - ! '>='
144
+ - !ruby/object:Gem::Version
145
+ version: 1.9.3
146
+ required_rubygems_version: !ruby/object:Gem::Requirement
147
+ requirements:
148
+ - - ! '>='
149
+ - !ruby/object:Gem::Version
150
+ version: '0'
151
+ requirements:
152
+ - Firefox browser
153
+ rubyforge_project:
154
+ rubygems_version: 2.4.6
155
+ signing_key:
156
+ specification_version: 4
157
+ summary: Visual regression testing tool and more
158
+ test_files: []