grell 1.5.1 → 1.6 (RubyGems)

Files changed (12)

checksums.yaml +4 -4
data/CHANGELOG.md +42 -38
data/README.md +20 -0
data/lib/grell/capybara_driver.rb +3 -0
data/lib/grell/crawler.rb +16 -7
data/lib/grell/page.rb +7 -2
data/lib/grell/page_collection.rb +6 -2
data/lib/grell/version.rb +1 -1
data/spec/lib/crawler_spec.rb +59 -28
data/spec/lib/page_collection_spec.rb +28 -18
data/spec/lib/page_spec.rb +92 -61
metadata +2 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: c0b801af08bb6f8c55bff8588ce08d7c814a5bad
-  data.tar.gz: 147b1eaa0520fdd48807cc0e846f033be7b2a529
+  metadata.gz: 2be8992c96b83e9b1a98474ada3b49ea7e5adb69
+  data.tar.gz: 3eed1bea205812e8e9ab7dc8678da57efea1fea1
 SHA512:
-  metadata.gz: 14660ba159234cd2d311f01073e71e195b644a22b955596fa0296e72f8c65fb5192d138d350be6e2505131fa7685573089faa33b6e7676b24fd2f42b36e6d1c6
-  data.tar.gz: e8a0fc782b08b415302e46a6563fcf8a0b9ef56ccaa528bfc8d773241bdd72103b131291fb37ab86c45b5dce067fdb6a8904264b591295be84e69d377ea8c40d
+  metadata.gz: baa6e37b2ce80491b05688618b6ad0576149236c2367b2f6c52a84dfeae25edb6d340abfdcae4e3b6f7363072db0dc0c8c052cd83410e1f28e1725305db99993
+  data.tar.gz: 7c246e8b2a02494d5e44dc6fc4b0029ab254e63764b46791e9135ed9ec1657627d4b6f7e5cd921a951c062cfe815ac1fd7b4e7d87ffb11f786e0989d44c3083a

data/CHANGELOG.md CHANGED

@@ -1,51 +1,55 @@
-* Version 1.5.1
-  Fixed deprecation warning (Thanks scott)
-  Updated Poltergeist dependency
+# 1.6
+  * Support custom URL comparison when adding new pages during crawling
+  * Don't rescue Timeout error, so that Delayed Job can properly terminate hanging jobs
+  * Fail early if Capybara doesn't initialize properly
-* Version 1.5.0
-  Grell will follow redirects.
-  Added #followed_redirects? #error? #current_url methods to the Page class
+# 1.5.1
+  * Fixed deprecation warning (Thanks scott)
+  * Updated Poltergeist dependency
-* Version 1.4.0
-  Added crawler.restart to restart browser process
-  The block of code can make grell retry any given page.
+# 1.5.0
+  * Grell will follow redirects.
+  * Added #followed_redirects? #error? #current_url methods to the Page class
-* Version 1.3.2
-  Rescue Timeout error and return an empty page when that happens
+# 1.4.0
+  * Added crawler.restart to restart browser process
+  * The block of code can make grell retry any given page.
-* Version 1.3.1
-  Added whitelisting and blacklisting
-  Better info in gemspec
+# 1.3.2
+  * Rescue Timeout error and return an empty page when that happens
-* Version 1.3
-  The Crawler object allows you to provide an external logger object.
-  Clearer semantics when an error happens, special headers are returned so the user can inspect the error
+# 1.3.1
+  * Added whitelisting and blacklisting
+  * Better info in gemspec
-  Caveats:
-  - The 'debug' option in the crawler does not have any effect anymore. Provide an external logger with 'logger' instead
-  - The errors provided in the headers by grell have changed from 'grell_status' to 'grellStatus'.
-  - The 'visited' property in the page was never supposed to be accessible. Use 'visited?' instead.
+# 1.3
+  * The Crawler object allows you to provide an external logger object.
+  * Clearer semantics when an error happens, special headers are returned so the user can inspect the error
+  * Caveats:
+    - The 'debug' option in the crawler does not have any effect anymore. Provide an external logger with 'logger' instead
+    - The errors provided in the headers by grell have changed from 'grell_status' to 'grellStatus'.
+    - The 'visited' property in the page was never supposed to be accessible. Use 'visited?' instead.
-* Version 1.2.1
-  Solve bug: URLs are case insensitive
+# 1.2.1
+  * Solve bug: URLs are case insensitive
-* Version 1.2
-  Grell now will consider two links to point to the same page only when the whole URL is exactly the same.
-  Versions previously would only consider two links to be the same when they shared the path.
+# 1.2
+  * Grell now will consider two links to point to the same page only when the whole URL is exactly the same.
+    Versions previously would only consider two links to be the same when they shared the path.
-* Version 1.1.2
-  Solve bug where we were adding links in heads as if there were normal links in the body
+# 1.1.2
+  * Solve bug where we were adding links in heads as if there were normal links in the body
-* Version 1.1.1
-  Solve bug with the new data-href functionality
+# 1.1.1
+  * Solve bug with the new data-href functionality
-* Version 1.1
-  Solve problem with randomly failing spec
-  Search for elements with 'href' or 'data-href' to find links
+# 1.1
+  * Solve problem with randomly failing spec
+  * Search for elements with 'href' or 'data-href' to find links
-* Version 1.0.1
-  Rescueing Javascript errors
+# 1.0.1
+  * Rescueing Javascript errors
-* Version 1.0
-  Initial implementation
-  Basic support to crawling pages.
+# 1.0
+  * Initial implementation
+  * Basic support to crawling pages.
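
The 1.6 changelog entry about no longer rescuing the Timeout error can be illustrated with a minimal sketch. The two `visit_*` methods and the `slow_page` lambda are hypothetical and are not part of Grell; the timeout is raised by hand so the behaviour is deterministic. The sketch assumes that an outer job runner, such as Delayed Job, relies on seeing the exception in order to terminate the job.

```ruby
require 'timeout'

# Hypothetical page visit that swallows the timeout (pre-1.6 style):
# the caller only ever sees a placeholder result.
def visit_swallowing
  yield
rescue Timeout::Error
  :empty_page
end

# Hypothetical page visit that lets the timeout propagate (1.6 style):
# the caller receives the exception and can stop the job.
def visit_propagating
  yield
end

# Stand-in for a page that never finishes loading.
slow_page = -> { raise Timeout::Error, 'execution expired' }

swallowed = visit_swallowing(&slow_page)

propagated =
  begin
    visit_propagating(&slow_page)
  rescue Timeout::Error => e
    e.message
  end

puts "swallowed:  #{swallowed.inspect}"
puts "propagated: #{propagated.inspect}"
```

In the first case the caller cannot tell that anything timed out, which is the situation the changelog entry describes as preventing hanging jobs from being terminated.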

data/README.md CHANGED

@@ -80,6 +80,26 @@ your are crawling. It will never follow links linking outside your site.
 If you want to further limit the amount of links crawled, you can use
 whitelisting, blacklisting or manual filtering.
+#### Custom URL Comparison
+By default, Grell will detect new URLs to visit by comparing the full URL
+with the URLs of the discovered and visited links. This functionality can
+be changed by passing a block of code to Grell's `start_crawling` method.
+In the below example, the path of the URLs (instead of the full URL) will
+be compared.
+```ruby
+require 'grell'
+crawler = Grell::Crawler.new
+add_match_block = Proc.new do |collection_page, page|
+  collection_page.path == page.path
+end
+crawler.start_crawling('http://www.google.com', add_match_block: add_match_block) do |current_page|
+...
+end
+```
 #### Whitelisting
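
To see what the README addition means in practice, here is a small self-contained sketch that does not need the gem. `SketchPage` is a hypothetical stand-in for a crawled page, and the lowercased full-URL proc is an assumption about the default comparison. The two URLs differ only in their query string.

```ruby
require 'uri'

# Hypothetical stand-in for a crawled page; not Grell's own Page class.
SketchPage = Struct.new(:url) do
  def path
    URI.parse(url).path
  end
end

# Compares full URLs, ignoring case.
full_url_match = proc do |collection_page, page|
  collection_page.url.downcase == page.url.downcase
end

# Compares only the path component, as in the README example.
path_match = proc do |collection_page, page|
  collection_page.path == page.path
end

known     = SketchPage.new('http://www.example.com/list?page=1')
candidate = SketchPage.new('http://www.example.com/list?page=2')

full_url_result = full_url_match.call(known, candidate)
path_result     = path_match.call(known, candidate)

puts "full URL match: #{full_url_result}"
puts "path match:     #{path_result}"
```

With the path-based block the second URL counts as already known, so a crawler using it would skip pages that differ only by query parameters.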

data/lib/grell/capybara_driver.rb CHANGED

@@ -29,6 +29,9 @@ module Grell
         "DNT" => 1,
         "User-Agent" => USER_AGENT
       }
+      fail "Poltergeist Driver could not be properly initialized" unless @poltergeist_driver
       @poltergeist_driver
     end
   end
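
The guard added above follows a common fail-early pattern: `fail` (an alias of `raise`) with a string raises a `RuntimeError`. A minimal sketch, with a hypothetical `build_driver` helper and factory lambdas in place of the real Poltergeist setup:

```ruby
# Hypothetical helper: builds a driver via the given factory and refuses
# to continue when the factory yields nothing usable.
def build_driver(factory)
  driver = factory.call
  fail 'Driver could not be properly initialized' unless driver
  driver
end

working = build_driver(-> { :fake_driver })

failure_message =
  begin
    build_driver(-> { nil })
  rescue RuntimeError => e
    e.message
  end

puts working.inspect
puts failure_message
```

Raising at construction time surfaces the problem where it happens, instead of as a `NoMethodError` on `nil` later in the crawl.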

data/lib/grell/crawler.rb CHANGED

@@ -15,8 +15,6 @@ module Grell
       else
         Grell.logger = Logger.new(STDOUT)
       end
-      @collection = PageCollection.new
     end
     # Restarts the PhantomJS process without modifying the state of visited and discovered pages.
@@ -37,13 +35,15 @@ module Grell
     end
     # Main method, it starts crawling on the given URL and calls a block for each of the pages found.
-    def start_crawling(url, &block)
+    def start_crawling(url, options = {}, &block)
       Grell.logger.info "GRELL Started crawling"
-      @collection = PageCollection.new
+      @collection = PageCollection.new(options[:add_match_block] || default_add_match)
       @collection.create_page(url, nil)
       while !@collection.discovered_pages.empty?
         crawl(@collection.next_page, block)
       end
       Grell.logger.info "GRELL finished crawling"
     end
@@ -53,7 +53,7 @@ module Grell
       filter!(site.links)
       if block #The user of this block can send us a :retry to retry accessing the page
-        while(block.call(site) == :retry)
+        while block.call(site) == :retry
           Grell.logger.info "Retrying our visit to #{site.url}"
           site.navigate
           filter!(site.links)
@@ -66,9 +66,18 @@ module Grell
     end
     private
     def filter!(links)
-      links.select!{ |link| link =~ @whitelist_regexp } if @whitelist_regexp
-      links.delete_if{ |link| link =~ @blacklist_regexp } if @blacklist_regexp
+      links.select! { |link| link =~ @whitelist_regexp } if @whitelist_regexp
+      links.delete_if { |link| link =~ @blacklist_regexp } if @blacklist_regexp
+    end
+    # If options[:add_match_block] is not provided, url matching to determine if a
+    # new page should be added to the page collection will default to this proc
+    def default_add_match
+      Proc.new do |collection_page, page|
+        collection_page.url.downcase == page.url.downcase
+      end
     end
   end
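
The option handling in `start_crawling` comes down to `options[:add_match_block] || default`. The sketch below reproduces that fallback outside the gem. `SketchUrlPage`, `choose_matcher` and `exact_match` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical page stand-in exposing only the url.
SketchUrlPage = Struct.new(:url)

# Case-insensitive full-URL comparison, like default_add_match in the diff.
DEFAULT_MATCH = proc do |collection_page, page|
  collection_page.url.downcase == page.url.downcase
end

# Returns the caller's block when one is given, otherwise the default.
def choose_matcher(options = {})
  options[:add_match_block] || DEFAULT_MATCH
end

a = SketchUrlPage.new('http://www.example.com/Test')
b = SketchUrlPage.new('http://www.example.com/test')

# A stricter, case-sensitive matcher supplied by the caller.
exact_match = proc { |x, y| x.url == y.url }

default_result = choose_matcher.call(a, b)
custom_result  = choose_matcher(add_match_block: exact_match).call(a, b)

puts "default matcher: #{default_result}"
puts "custom matcher:  #{custom_result}"
```

Because the options hash defaults to `{}`, existing callers that pass only a URL and a block keep the previous behaviour.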

data/lib/grell/page.rb CHANGED

@@ -44,8 +44,6 @@ module Grell
       unavailable_page(404, e)
     rescue Capybara::Poltergeist::StatusFailError => e
       unavailable_page(404, e)
-    rescue Timeout::Error => e #This error inherits from Interruption, do not inherit from StandardError
-      unavailable_page(404, e)
     end
     # Number of times we have retried the current page
@@ -68,6 +66,13 @@ module Grell
       !!(status.to_s =~ /[4|5]\d\d/)
     end
+    # Extracts the path (e.g. /actions/test_action) from the URL
+    def path
+      URI.parse(@url).path
+    rescue URI::InvalidURIError # Invalid URLs will be added and caught when we try to navigate to them
+      @url
+    end
     private
     def unavailable_page(status, exception)
       Grell.logger.warn "The page with the URL #{@url} was not available. Exception #{exception}"
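
The new `path` method can be tried in isolation with the standard library. `path_of` below is a hypothetical free-standing version of it. Note that a string without a scheme is still parsed by `URI.parse` as a relative reference whose path is the whole string, so the `rescue` branch is only reached for input that cannot be parsed at all, such as a URL containing a space.

```ruby
require 'uri'

# Returns the path of url, or the raw string when it cannot be parsed
# (the same fallback as the Page#path method added in this diff).
def path_of(url)
  URI.parse(url).path
rescue URI::InvalidURIError
  url
end

proper     = path_of('http://www.anyurl.com/path')
schemeless = path_of('www.an.asda.fasfasf.yurl.com/path')
unparsable = path_of('http://exa mple.com/path')

puts proper
puts schemeless
puts unparsable
```

In all three cases the method returns a string, so a comparison block that calls `path` never has to handle `nil` or an exception.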

data/lib/grell/page_collection.rb CHANGED

@@ -6,8 +6,11 @@ module Grell
   class PageCollection
     attr_reader :collection
-    def initialize
+    # A block containing the logic that determines if a new URL should be added
+    # to the collection or if it is already present will be passed to the initializer.
+    def initialize(add_match_block)
       @collection = []
+      @add_match_block = add_match_block
     end
     def create_page(url, parent_id)
@@ -39,8 +42,9 @@ module Grell
       # Although finding unique pages based on URL will add pages with different query parameters,
       # in some cases we do link to different pages depending on the query parameters like when using proxies
       new_url = @collection.none? do |collection_page|
-        collection_page.url.downcase == page.url.downcase
+        @add_match_block.call(collection_page, page)
       end
       if new_url
         @collection.push page
       end
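
The change above injects the comparison into the collection instead of hard-coding it. A reduced sketch of the same idea, where `SketchCollection` and `SketchEntry` are hypothetical and omit everything else the real `PageCollection` does:

```ruby
# Hypothetical page stand-in exposing only the url.
SketchEntry = Struct.new(:url)

# Hypothetical, reduced collection: stores pages and uses an injected
# block to decide whether a page is already present.
class SketchCollection
  attr_reader :collection

  def initialize(add_match_block)
    @collection = []
    @add_match_block = add_match_block
  end

  # Adds the page only when no stored page matches it.
  # Returns true when the page was added.
  def add(page)
    is_new = @collection.none? { |stored| @add_match_block.call(stored, page) }
    @collection.push(page) if is_new
    is_new
  end
end

case_insensitive = proc { |stored, page| stored.url.downcase == page.url.downcase }
pages = SketchCollection.new(case_insensitive)

first_added  = pages.add(SketchEntry.new('http://www.github.com/SomeUser'))
second_added = pages.add(SketchEntry.new('HTTP://WWW.GITHUB.COM/SOMEUSER'))
third_added  = pages.add(SketchEntry.new('http://www.github.com/OtherUser'))

puts [first_added, second_added, third_added].inspect
puts pages.collection.size
```

`none?` stops at the first stored page for which the block returns a truthy value, so the cost of adding a page grows with the number of pages already collected.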

data/lib/grell/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Grell
-  VERSION = "1.5.1"
+  VERSION = "1.6"
 end

data/spec/lib/crawler_spec.rb CHANGED

@@ -1,12 +1,17 @@
 RSpec.describe Grell::Crawler do
-  let(:page_id) { rand(10).floor + 10}
-  let(:parent_page_id) {rand(10).floor}
-  let(:page) {Grell::Page.new(url, page_id, parent_page_id)}
-  let(:host) {"http://www.example.com"}
-  let(:url) {"http://www.example.com/test"}
-  let(:crawler) { Grell::Crawler.new(logger: Logger.new(nil), external_driver: true)}
-  let(:body) {'body'}
+  let(:page_id) { rand(10).floor + 10 }
+  let(:parent_page_id) { rand(10).floor }
+  let(:page) { Grell::Page.new(url, page_id, parent_page_id) }
+  let(:host) { 'http://www.example.com' }
+  let(:url) { 'http://www.example.com/test' }
+  let(:crawler) { Grell::Crawler.new(logger: Logger.new(nil), external_driver: true) }
+  let(:body) { 'body' }
+  let(:custom_add_match) do
+    Proc.new do |collection_page, page|
+      collection_page.path == page.path
+    end
+  end
   before do
     proxy.stub(url).and_return(body: body, code: 200)
@@ -17,6 +22,7 @@ RSpec.describe Grell::Crawler do
       Grell::Crawler.new(external_driver: true, logger: 33)
       expect(Grell.logger).to eq(33)
     end
     it 'provides a stdout logger if nothing provided' do
       crawler
       expect(Grell.logger).to be_instance_of(Logger)
@@ -24,6 +30,10 @@ RSpec.describe Grell::Crawler do
   end
   describe '#crawl' do
+    before do
+      crawler.instance_variable_set('@collection', Grell::PageCollection.new(custom_add_match))
+    end
     it 'yields the result if a block is given' do
       result = []
       block = Proc.new {|n| result.push(n) }
@@ -62,7 +72,8 @@ RSpec.describe Grell::Crawler do
       </body></html>
       EOS
     end
-    let(:url_visited) {"http://www.example.com/musmis.html"}
+    let(:url_visited) { "http://www.example.com/musmis.html" }
     before do
       proxy.stub(url_visited).and_return(body: 'body', code: 200)
     end
@@ -75,6 +86,16 @@ RSpec.describe Grell::Crawler do
       expect(result[0].url).to eq(url)
       expect(result[1].url).to eq(url_visited)
     end
+    it 'can use a custom url add matcher block' do
+      expect(crawler).to_not receive(:default_add_match)
+      crawler.start_crawling(url, add_match_block: custom_add_match)
+    end
+    it 'uses a default url add matched if not provided' do
+      expect(crawler).to receive(:default_add_match).and_return(custom_add_match)
+      crawler.start_crawling(url)
+    end
   end
   shared_examples_for 'visits all available pages' do
@@ -82,6 +103,7 @@ RSpec.describe Grell::Crawler do
       crawler.start_crawling(url)
       expect(crawler.collection.visited_pages.size).to eq(visited_pages_count)
     end
     it 'has no more pages to discover' do
       crawler.start_crawling(url)
       expect(crawler.collection.discovered_pages.size).to eq(0)
@@ -100,13 +122,17 @@ RSpec.describe Grell::Crawler do
       Hello world!
       </body></html>"
     end
-    let(:visited_pages_count) {1}
-    let(:visited_pages) {['http://www.example.com/test']}
+    let(:visited_pages_count) { 1 }
+    let(:visited_pages) { ['http://www.example.com/test'] }
     it_behaves_like 'visits all available pages'
   end
   context 'the url has several links' do
+    let(:visited_pages_count) { 3 }
+    let(:visited_pages) do
+      ['http://www.example.com/test', 'http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
+    end
     let(:body) do
       "<html><head></head><body>
       <a href=\"/trusmis.html\">trusmis</a>
@@ -114,14 +140,11 @@ RSpec.describe Grell::Crawler do
       Hello world!
       </body></html>"
     end
     before do
       proxy.stub('http://www.example.com/trusmis.html').and_return(body: 'body', code: 200)
       proxy.stub('http://www.example.com/help.html').and_return(body: 'body', code: 200)
     end
-    let(:visited_pages_count) {3}
-    let(:visited_pages) do
-      ['http://www.example.com/test','http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
-    end
     it_behaves_like 'visits all available pages'
   end
@@ -144,9 +167,10 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist('/trusmis.html')
       end
-      let(:visited_pages_count) {2} #my own page + trusmis
+      let(:visited_pages_count) { 2 } # my own page + trusmis
       let(:visited_pages) do
-        ['http://www.example.com/test','http://www.example.com/trusmis.html']
+        ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
       end
       it_behaves_like 'visits all available pages'
@@ -156,9 +180,10 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist(['/trusmis.html', '/nothere', 'another.html'])
       end
-      let(:visited_pages_count) {2}
+      let(:visited_pages_count) { 2 }
       let(:visited_pages) do
-        ['http://www.example.com/test','http://www.example.com/trusmis.html']
+        ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
       end
       it_behaves_like 'visits all available pages'
@@ -168,9 +193,10 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist(/\/trusmis\.html/)
       end
-      let(:visited_pages_count) {2}
+      let(:visited_pages_count) { 2 }
       let(:visited_pages) do
-        ['http://www.example.com/test','http://www.example.com/trusmis.html']
+        ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
       end
       it_behaves_like 'visits all available pages'
@@ -180,9 +206,10 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist([/\/trusmis\.html/])
       end
-      let(:visited_pages_count) {2}
+      let(:visited_pages_count) { 2 }
       let(:visited_pages) do
-        ['http://www.example.com/test','http://www.example.com/trusmis.html']
+        ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
       end
       it_behaves_like 'visits all available pages'
@@ -192,7 +219,8 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist([])
       end
-      let(:visited_pages_count) {1} #my own page only
+      let(:visited_pages_count) { 1 } # my own page only
       let(:visited_pages) do
         ['http://www.example.com/test']
       end
@@ -204,7 +232,8 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.whitelist(['/trusmis', '/help'])
       end
-      let(:visited_pages_count) {3} #all links
+      let(:visited_pages_count) { 3 } # all links
       let(:visited_pages) do
         ['http://www.example.com/test','http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
       end
@@ -280,7 +309,7 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.blacklist([])
       end
-      let(:visited_pages_count) {3} #all links
+      let(:visited_pages_count) { 3 } # all links
       let(:visited_pages) do
         ['http://www.example.com/test','http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
       end
@@ -292,7 +321,7 @@ RSpec.describe Grell::Crawler do
       before do
         crawler.blacklist(['/trusmis', '/help'])
       end
-      let(:visited_pages_count) {1}
+      let(:visited_pages_count) { 1 }
       let(:visited_pages) do
         ['http://www.example.com/test']
       end
@@ -321,7 +350,8 @@ RSpec.describe Grell::Crawler do
         crawler.whitelist('/trusmis.html')
         crawler.blacklist('/trusmis.html')
       end
-      let(:visited_pages_count) {1}
+      let(:visited_pages_count) { 1 }
       let(:visited_pages) do
         ['http://www.example.com/test']
       end
@@ -334,7 +364,8 @@ RSpec.describe Grell::Crawler do
         crawler.whitelist('/trusmis.html')
         crawler.blacklist('/raistlin.html')
       end
-      let(:visited_pages_count) {2}
+      let(:visited_pages_count) { 2 }
       let(:visited_pages) do
         ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
       end

data/spec/lib/page_collection_spec.rb CHANGED

@@ -1,8 +1,14 @@
 RSpec.describe Grell::PageCollection do
-  let(:collection) {Grell::PageCollection.new}
-  let(:url) {'http://www.github.com/SomeUser/dragonlance?search=false'}
-  let(:url2) {'http://www.github.com/OtherUser/forgotten?search=false'}
+  let(:add_match_block) do
+    Proc.new do |collection_page, page|
+      collection_page.url.downcase == page.url.downcase
+    end
+  end
+  let(:collection) { Grell::PageCollection.new(add_match_block) }
+  let(:url) { 'http://www.github.com/SomeUser/dragonlance?search=false' }
+  let(:url2) { 'http://www.github.com/OtherUser/forgotten?search=false' }
   context 'empty collection' do
@@ -20,7 +26,8 @@ RSpec.describe Grell::PageCollection do
   end
   context 'one unvisited page' do
-    let(:page) {collection.create_page(url, 0)}
+    let(:page) { collection.create_page(url, 0) }
     before do
       allow(page).to receive(:visited?).and_return(false)
     end
@@ -40,7 +47,8 @@ RSpec.describe Grell::PageCollection do
   end
   context 'one visited page' do
-    let(:page) {collection.create_page(url, 0)}
+    let(:page) { collection.create_page(url, 0) }
     before do
       allow(page).to receive(:visited?).and_return(true)
     end
@@ -59,8 +67,9 @@ RSpec.describe Grell::PageCollection do
   end
   context 'one visited and one unvisited page with the same url' do
-    let(:page) {collection.create_page(url, 0)}
-    let(:unvisited)  {collection.create_page(url.upcase, 0)}
+    let(:page) { collection.create_page(url, 0) }
+    let(:unvisited) { collection.create_page(url.upcase, 0) }
     before do
       allow(page).to receive(:visited?).and_return(true)
       allow(unvisited).to receive(:visited?).and_return(false)
@@ -88,8 +97,9 @@ RSpec.describe Grell::PageCollection do
   end
   context 'one visited and one unvisited page with different URLs' do
-    let(:page) {collection.create_page(url, 0)}
-    let(:unvisited)  {collection.create_page(url2, 0)}
+    let(:page) { collection.create_page(url, 0) }
+    let(:unvisited) { collection.create_page(url2, 0) }
     before do
       allow(page).to receive(:visited?).and_return(true)
       allow(unvisited).to receive(:visited?).and_return(false)
@@ -109,9 +119,10 @@ RSpec.describe Grell::PageCollection do
   end
   context 'one visited and one unvisited page with different URLs only different by the query' do
-    let(:page) {collection.create_page(url, 0)}
-    let(:url3) {'http://www.github.com/SomeUser/dragonlance?search=true'}
-    let(:unvisited)  {collection.create_page(url3, 0)}
+    let(:page) { collection.create_page(url, 0) }
+    let(:url3) { 'http://www.github.com/SomeUser/dragonlance?search=true' }
+    let(:unvisited) { collection.create_page(url3, 0) }
     before do
       allow(page).to receive(:visited?).and_return(true)
       allow(unvisited).to receive(:visited?).and_return(false)
@@ -131,19 +142,18 @@ RSpec.describe Grell::PageCollection do
   end
   context 'several unvisited pages' do
-    let(:page) {collection.create_page(url, 2)}
-    let(:page2) {collection.create_page(url2, 0)}
+    let(:page) { collection.create_page(url, 2) }
+    let(:page2) { collection.create_page(url2, 0) }
     before do
       allow(page).to receive(:visited?).and_return(true)
       allow(page2).to receive(:visited?).and_return(false)
     end
-    it "returns the page which has an earlier parent" do
+    it 'returns the page which has an earlier parent' do
       expect(collection.next_page).to eq(page2)
     end
   end
-end
+end

data/spec/lib/page_spec.rb CHANGED

@@ -1,26 +1,31 @@
 RSpec.describe Grell::Page do
-  let(:page_id) { rand(10).floor + 10}
-  let(:parent_page_id) {rand(10).floor}
-  let(:page) {Grell::Page.new(url, page_id, parent_page_id)}
-  let(:host) {"http://www.example.com"}
-  let(:url) {"http://www.example.com/test"}
+  let(:page_id) { rand(10).floor + 10 }
+  let(:parent_page_id) { rand(10).floor }
+  let(:page) { Grell::Page.new(url, page_id, parent_page_id) }
+  let(:host) { 'http://www.example.com' }
+  let(:url) { 'http://www.example.com/test' }
   let(:returned_headers)  { { 'Other-Header' => 'yes', 'Content-Type' => 'text/html' }}
-  let(:now) {Time.now}
+  let(:now) { Time.now }
   before do
     allow(Time).to receive(:now).and_return(now)
-    Grell.logger = Logger.new(nil) #avoids noise in rspec output
+    Grell.logger = Logger.new(nil) # avoids noise in rspec output
   end
-  it "gives access to the url" do
+  it 'gives access to the url' do
     expect(page.url).to eq(url)
   end
-  it "gives access to the page id" do
+  it 'gives access to the path' do
+    expect(page.path).to eq('/test')
+  end
+  it 'gives access to the page id' do
     expect(page.id).to eq(page_id)
   end
-  it "gives access to the parent page id" do
+  it 'gives access to the parent page id' do
     expect(page.parent_id).to eq(parent_page_id)
   end
@@ -68,6 +73,7 @@ RSpec.describe Grell::Page do
         proxy.stub(url).and_return(body: '', code: 200, headers: {})
         page.navigate
       end
       it '#retries return 0' do
         expect(page.retries).to eq(0)
       end
@@ -79,6 +85,7 @@ RSpec.describe Grell::Page do
         page.navigate
         page.navigate
       end
       it '#retries return 1' do
         expect(page.retries).to eq(1)
       end
@@ -98,8 +105,8 @@ RSpec.describe Grell::Page do
     end
   end
-  [Capybara::Poltergeist::JavascriptError, Capybara::Poltergeist::BrowserError, URI::InvalidURIError,
-   Capybara::Poltergeist::TimeoutError, Capybara::Poltergeist::StatusFailError, Timeout::Error ].each do |error_type|
+  [ Capybara::Poltergeist::JavascriptError, Capybara::Poltergeist::BrowserError, URI::InvalidURIError,
+    Capybara::Poltergeist::TimeoutError, Capybara::Poltergeist::StatusFailError ].each do |error_type|
     context "#{error_type}" do
       let(:headers) do
@@ -109,25 +116,27 @@ RSpec.describe Grell::Page do
           errorMessage: error_message
         }
       end
-      let(:error_message) {'Trusmis broke it again'}
-      let(:now) {Time.now}
+      let(:error_message) { 'Trusmis broke it again' }
+      let(:now) { Time.now }
       before do
         allow_any_instance_of(Grell::RawPage).to receive(:navigate).and_raise(error_type, 'error')
         allow_any_instance_of(error_type).to receive(:message).and_return(error_message)
         page.navigate
       end
       it_behaves_like 'an errored grell page'
     end
   end
   context 'we have not yet navigated to the page' do
-    let(:visited) {false}
-    let(:status) {nil}
-    let(:body) {''}
-    let(:links) {[]}
-    let(:expected_headers) {{}}
-    let(:now) {nil}
+    let(:visited) { false }
+    let(:status) { nil }
+    let(:body) { '' }
+    let(:links) { [] }
+    let(:expected_headers) { {} }
+    let(:now) { nil }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -138,11 +147,11 @@ RSpec.describe Grell::Page do
   end
   context 'navigating to the URL we get a 404' do
-    let(:visited) {true}
-    let(:status) { 404}
-    let(:body) {'<html><head></head><body>nothing cool</body></html>'}
-    let(:links) {[]}
-    let(:expected_headers) {returned_headers}
+    let(:visited) { true }
+    let(:status) { 404 }
+    let(:body) { '<html><head></head><body>nothing cool</body></html>' }
+    let(:links) { [] }
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -154,17 +163,19 @@ RSpec.describe Grell::Page do
   end
   context 'navigating to an URL with redirects, follows them transparently' do
-    let(:visited) {true}
-    let(:status) { 200}
-    let(:body) {'<html><head></head><body>nothing cool</body></html>'}
-    let(:links) {[]}
-    let(:expected_headers) {returned_headers}
-    let(:real_url) {'http://example.com/other'}
+    let(:visited) { true }
+    let(:status) { 200 }
+    let(:body) { '<html><head></head><body>nothing cool</body></html>' }
+    let(:links) { [] }
+    let(:expected_headers) { returned_headers }
+    let(:real_url) { 'http://example.com/other' }
     before do
       proxy.stub(url).and_return(:redirect_to => real_url)
       proxy.stub(real_url).and_return(body: body, code: status, headers: returned_headers.dup)
       page.navigate
     end
     it_behaves_like 'a grell page'
     it 'followed_redirects? is true' do
@@ -178,11 +189,11 @@ RSpec.describe Grell::Page do
   #Here also add examples that may happen for almost all pages (no errors, no redirects)
   context 'navigating to the URL we get page with no links' do
-    let(:visited) {true}
-    let(:status) { 200}
-    let(:body) {'<html><head></head><body>nothing cool</body></html>'}
-    let(:links) {[]}
-    let(:expected_headers) {returned_headers}
+    let(:visited) { true }
+    let(:status) { 200 }
+    let(:body) { '<html><head></head><body>nothing cool</body></html>' }
+    let(:links) { [] }
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -205,8 +216,8 @@ RSpec.describe Grell::Page do
   end
   context 'navigating to the URL we get page with links using a elements' do
-    let(:visited) {true}
-    let(:status) { 200}
+    let(:visited) { true }
+    let(:status) { 200 }
     let(:body) do
       "<html><head></head><body>
       Hello world!
@@ -215,8 +226,8 @@ RSpec.describe Grell::Page do
       <a href=\"http://www.outsidewebsite.com/help.html\">help</a>
       </body></html>"
     end
-    let(:links) {["http://www.example.com/trusmis.html", "http://www.example.com/help.html"]}
-    let(:expected_headers) {returned_headers}
+    let(:links) { ['http://www.example.com/trusmis.html', 'http://www.example.com/help.html'] }
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -231,8 +242,8 @@ RSpec.describe Grell::Page do
   end
   context 'navigating to the URL we get page with links with absolute links' do
-    let(:visited) {true}
-    let(:status) { 200}
+    let(:visited) { true }
+    let(:status) { 200 }
     let(:body) do
       "<html><head></head><body>
       Hello world!
@@ -241,8 +252,8 @@ RSpec.describe Grell::Page do
       <a href=\"http://www.outsidewebsite.com/help.html\">help</a>
       </body></html>"
     end
-    let(:links) {["http://www.example.com/trusmis.html", "http://www.example.com/help.html"]}
-    let(:expected_headers) {returned_headers}
+    let(:links) { ['http://www.example.com/trusmis.html', 'http://www.example.com/help.html'] }
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -257,8 +268,8 @@ RSpec.describe Grell::Page do
   end
   context 'navigating to the URL we get page with links using a mix of elements' do
-    let(:visited) {true}
-    let(:status) { 200}
+    let(:visited) { true }
+    let(:status) { 200 }
     let(:body) do
       "<html><head></head><body>
       Hello world!
@@ -274,11 +285,10 @@ RSpec.describe Grell::Page do
       </body></html>"
     end
     let(:links) do
-      ["http://www.example.com/trusmis.html", "http://www.example.com/help.html",
-       'http://www.example.com/more_help.html', 'http://www.example.com/help_me.html'
-      ]
+      [ 'http://www.example.com/trusmis.html', 'http://www.example.com/help.html',
+        'http://www.example.com/more_help.html', 'http://www.example.com/help_me.html' ]
     end
-    let(:expected_headers) {returned_headers}
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -287,16 +297,36 @@ RSpec.describe Grell::Page do
     it_behaves_like 'a grell page'
+    describe '#path' do
+      context 'proper url' do
+        let(:url) { 'http://www.anyurl.com/path' }
+        let(:page) { Grell::Page.new(url, page_id, parent_page_id) }
+        it 'returns the path' do
+          expect(page.path).to eq('/path')
+        end
+      end
+      context 'broken url' do
+        let(:url) { 'www.an.asda.fasfasf.yurl.com/path' }
+        let(:page) { Grell::Page.new(url, page_id, parent_page_id) }
+        it 'returns the path' do
+          expect(page.path).to eq(url)
+        end
+      end
+    end
     it 'do not return links to external websites' do
       expect(page.links).to_not include('http://www.outsidewebsite.com/help.html')
     end
   end
  context 'navigating to the URL we get page with links inside the header section of the code' do
-    let(:visited) {true}
-    let(:status) { 200}
-    let(:css) {'/application.css'}
-    let(:favicon) {'/favicon.ico'}
+    let(:visited) { true }
+    let(:status) { 200 }
+    let(:css) { '/application.css' }
+    let(:favicon) { '/favicon.ico' }
     let(:body) do
       "<html><head>
       <title>mimi</title>
@@ -309,9 +339,9 @@ RSpec.describe Grell::Page do
       </body></html>"
     end
     let(:links) do
-      ["http://www.example.com/trusmis.html"]
+      ['http://www.example.com/trusmis.html']
     end
-    let(:expected_headers) {returned_headers}
+    let(:expected_headers) { returned_headers }
     before do
       proxy.stub(url).and_return(body: body, code: status, headers: returned_headers.dup)
@@ -338,11 +368,12 @@ RSpec.describe Grell::Page do
       proxy.stub(url).and_return(body: body, code: nil, headers: {})
       page.navigate
     end
-    let(:visited) {true}
-    let(:status) { nil}
-    let(:body) {''}
-    let(:links) {[]}
-    let(:expected_headers) {{}}
+    let(:visited) { true }
+    let(:status) { nil }
+    let(:body) { '' }
+    let(:links) { [] }
+    let(:expected_headers) { {} }
     it_behaves_like 'a grell page'
   end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: grell
 version: !ruby/object:Gem::Version
-  version: 1.5.1
+  version: '1.6'
 platform: ruby
 authors:
 - Jordi Polo Carres
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-11-18 00:00:00.000000000 Z
+date: 2016-02-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: capybara