grell 2.1.1 → 2.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 2017936bf360a1baa9b97e7534ab7c27598b04abca86c1f212d8e37c9761904c
- data.tar.gz: f3feb1d80bf9aa644c2040818330279dc3695978879b6f456f5ebc05cfc17dee
+ metadata.gz: c17856255ff1e871cc5e12cc2a9f0f4870156923ab924ea11db16b053a6742fb
+ data.tar.gz: d619076b40cbb4b057015a8bbcb8a07f555c282aa0ec971aa36b4e867fbfbd86
  SHA512:
- metadata.gz: c18a0f0dd5a9d9b2f6a71c5e41a23b219a31a38bb28c9405baf14357a4f47d4e23f7a77c3464673ff7e855da7f98f0c934d85d73d9797ec2c3179f2b9cd8a4b6
- data.tar.gz: fa3010a9e9c37bccc8cd7442fd96693f93fcc15a1d85f672b2f6dcfedf827629b67f897611d17f6eee693b0c55816e52d63e0c41dcdfd51149cbeff6842ed59e
+ metadata.gz: 28860f331fc02f6976bcfd8717bf8c33ca89984ae5d2ce9eede6abb31b5f06b44e2135468c6d75374dd649378cc3d719474979c2f27e67a5a7e5301fc561113f
+ data.tar.gz: 77f68dbdb006803c517de4e0b72a11ac9eba265781703f1b03f98af52b147cab7ba02429371038d426fed1073e1a5f3dcdc0a6838cbf93c64c0c4307f605eea6
data/.travis.yml CHANGED
@@ -1,20 +1,28 @@
  language: ruby
  cache: bundler
- sudo: false
+
  rvm:
- - 2.2.4
- - 2.3.0
- - 2.4.2
- script: bundle exec rspec
+ - 2.2.4
+ - 2.3.0
+ - 2.4.2
+
  before_install:
- - mkdir travis-phantomjs
- - wget https://github.com/JordiPolo/phantomjs/blob/master/phantomjs-2.1.1-linux-x86_64.tar.bz2?raw=true
- -O $PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
- - tar -xvf $PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2 -C $PWD/travis-phantomjs
- - export PATH=$PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64/bin:$PATH
+ - mkdir travis-phantomjs
+ - wget https://github.com/JordiPolo/phantomjs/blob/master/phantomjs-2.1.1-linux-x86_64.tar.bz2?raw=true
+ -O $PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2
+ - tar -xvf $PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64.tar.bz2 -C $PWD/travis-phantomjs
+ - export PATH=$PWD/travis-phantomjs/phantomjs-2.1.1-linux-x86_64/bin:$PATH
+
+ install:
+ - bundle install --jobs=3 --retry=3
+
+ script:
+ - bundle exec rspec
+
  deploy:
  provider: rubygems
  api_key:
  secure: czStDI0W6MWL70sDwu53oNNCc8vKtT61pgvii+ZWIC9A41C2p7BzmbtosXsnLk2ApxmpWvFIgtQE0XIH7jkM5mY05cHinXDphtOTkNLFVjck3ZOMkx/cc+QRFW8K4FHkrzFsC+/Xx4t2/Psh35LpzhfJd0XzKKoCstXUVgJsfGcAK3DMpjXHSUbwLXGDZ4lzmsk52OLf0oL+in2447TJfVOvGXtYmfh1PjXRwDxKB0dan7w5mVgajS52b6wUhVPTaMe/JgCbMuV7BaQ1Goq8u7V4aaxU+liPAhzHWfMB6tF4TEW8yu2tvGLdOA0+1jmM8E9Q5saPWtwKiHvBxN8CzRpkiNDzyFAf8ljrWT5yKX3aRQCyPp3NNyhoumWap36b+O/zwZ3HxoAe22Yg0rjz8z8NxMR/ELPvjPYjCiF5zY7fO9PAzmIynMRUrxDnFj+/JGHdzx0ZMo3fEXgHHSaHPNxIzEffVVQk4XLVnFHDjBLY4mVp4sbHbja5qnui20RkdM/H9Yi/fQyl1ODhk+LUPoh45ZneDZq7GPrl+WKK06oEjXIXLU+1iEuqnSqybbmJMTUJlUV+7EJdtq2DgfDB4KXwLm2LLOR/IX63AzEav4NIxx3hIXifSKa9rp6D7nMTzdQwF0FFzIj/Y3qLrAe1WWt0gx3Vxq67pSwOJthk5Fc=
  on:
  tags: true
+ rvm: 2.4.2
data/CHANGELOG.md CHANGED
@@ -1,3 +1,6 @@
+ # 2.1.2
+ * Change white/black lists to allow/deny lists
+
  # 2.1.1
  * Update phantomjs_options to use 'TLSv1.2'
 
data/README.md CHANGED
@@ -92,15 +92,15 @@ crawler.manager.quit # quits and destroys the crawler
  The `Grell:Crawler` class can be passed options to customize its behavior:
  - `logger`: Sets the logger object, for instance `Rails.logger`. Default: `Logger.new(STDOUT)`
  - `on_periodic_restart`: Sets periodic restarts of the crawler each certain number of visits. Default: 100 pages.
- - `whitelist`: Setups a whitelist filter for URLs to be visited. Default: all URLs are whitelisted.
- - `blacklist`: Setups a blacklist filter for URLs to be avoided. Default: no URL is blacklisted.
+ - `allowlist`: Sets an allowlist filter for URLs to be visited. Default: all URLs are allowlisted.
+ - `denylist`: Sets a denylist filter for URLs to be avoided. Default: no URL is denylisted.
  - `add_match_block`: Block evaluated to consider if a given page should be part of the pages to be visited. Default: add unique URLs.
  - `evaluate_in_each_page`: Javascript block to be evaluated on each page visited. Default: Nothing evaluated.
 
  Grell by default will follow all the links it finds in the site being crawled.
  It will never follow links linking outside your site.
  If you want to further limit the amount of links crawled, you can use
- whitelisting, blacklisting or manual filtering.
+ allowlisting, denylisting or manual filtering.
  Below further details on these and other options.
 
 
@@ -123,32 +123,32 @@ The crawler can be restarted manually by calling `crawler.manager.restart` or au
  between restarts. A restart will destroy the cookies so for instance this custom block can be used to relogin.
 
 
- #### Whitelisting
+ #### Allowlisting
 
  ```ruby
  require 'grell'
 
- crawler = Grell::Crawler.new(whitelist: [/games\/.*/, '/fun'])
+ crawler = Grell::Crawler.new(allowlist: [/games\/.*/, '/fun'])
  crawler.start_crawling('http://www.google.com')
  ```
 
  Grell here will only follow links to games and '/fun' and ignore all
  other links. You can provide a regexp, strings (if any part of the
- string match is whitelisted) or an array with regexps and/or strings.
+ string match is allowlisted) or an array with regexps and/or strings.
 
- #### Blacklisting
+ #### Denylisting
 
  ```ruby
  require 'grell'
 
- crawler = Grell::Crawler.new(blacklist: /games\/.*/)
+ crawler = Grell::Crawler.new(denylist: /games\/.*/)
  crawler.start_crawling('http://www.google.com')
  ```
 
- Similar to whitelisting. But now Grell will follow every other link in
+ Similar to allowlisting. But now Grell will follow every other link in
  this site which does not go to /games/...
 
- If you call both whitelist and blacklist then both will apply, a link
+ If you call both allowlist and denylist then both will apply, a link
  has to fullfill both conditions to survive. If you do not call any, then
  all links on this site will be crawled. Think of these methods as
  filters.
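
For anyone migrating from the 2.1.1 option names, the renamed keywords compose exactly as the README text above describes: a link must match the allowlist and must not match the denylist. A minimal sketch of combining both (the `example.com` URL and the `/games/ads/` pattern are illustrative, not taken from the gem's documentation):

```ruby
require 'grell'

# Both filters apply: follow only links matching the allowlist,
# then drop any of those that also match the denylist.
crawler = Grell::Crawler.new(
  allowlist: [/games\/.*/, '/fun'],
  denylist: /games\/ads\/.*/
)
crawler.start_crawling('http://www.example.com')
```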
data/grell.gemspec CHANGED
@@ -24,7 +24,7 @@ Gem::Specification.new do |spec|
  spec.add_dependency 'capybara', '~> 2.10'
  spec.add_dependency 'poltergeist', '~> 1.11'
 
- spec.add_development_dependency 'bundler', '~> 1.6'
+ # spec.add_development_dependency 'bundler', '~> 1.6'
  spec.add_development_dependency 'byebug', '~> 4.0'
  spec.add_development_dependency 'kender', '~> 0.2'
  spec.add_development_dependency 'rake', '~> 10.0'
data/lib/grell/crawler.rb CHANGED
@@ -7,15 +7,15 @@ module Grell
  # evaluate_in_each_page: javascript block to evaluate in each page we crawl
  # add_match_block: block to evaluate to consider if a page is part of the collection
  # manager_options: options passed to the manager class
- # whitelist: Setups a whitelist filter, allows a regexp, string or array of either to be matched.
- # blacklist: Setups a blacklist filter, allows a regexp, string or array of either to be matched.
- def initialize(evaluate_in_each_page: nil, add_match_block: nil, whitelist: /.*/, blacklist: /a^/, **manager_options)
+ # allowlist: Sets an allowlist filter, allows a regexp, string or array of either to be matched.
+ # denylist: Sets a denylist filter, allows a regexp, string or array of either to be matched.
+ def initialize(evaluate_in_each_page: nil, add_match_block: nil, allowlist: /.*/, denylist: /a^/, **manager_options)
  @collection = nil
  @manager = CrawlerManager.new(manager_options)
  @evaluate_in_each_page = evaluate_in_each_page
  @add_match_block = add_match_block
- @whitelist_regexp = Regexp.union(whitelist)
- @blacklist_regexp = Regexp.union(blacklist)
+ @allowlist_regexp = Regexp.union(allowlist)
+ @denylist_regexp = Regexp.union(denylist)
  end
 
  # Main method, it starts crawling on the given URL and calls a block for each of the pages found.
@@ -67,8 +67,8 @@ module Grell
  end
 
  def filter!(links)
- links.select! { |link| link =~ @whitelist_regexp } if @whitelist_regexp
- links.delete_if { |link| link =~ @blacklist_regexp } if @blacklist_regexp
+ links.select! { |link| link =~ @allowlist_regexp } if @allowlist_regexp
+ links.delete_if { |link| link =~ @denylist_regexp } if @denylist_regexp
  end
 
  # Store the resulting redirected URL along with the original URL
@@ -70,7 +70,7 @@ module Grell
  rescue Errno::ESRCH, Errno::ECHILD
  # successfully terminated
  rescue => e
- Grell.logger.exception e, "GRELL. PhantomJS process could not be killed"
+ Grell.logger.error ["GRELL. PhantomJS process could not be killed", e.message, *e.backtrace].join($/)
  end
 
  def force_kill(pid)
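
The renamed internals above keep the old semantics: `Regexp.union` collapses a string, regexp, or array of either into a single pattern, the default allowlist `/.*/` matches every link, and the default denylist `/a^/` can never match, so nothing is excluded unless you ask for it. A standalone sketch of the same `filter!` idea applied to a plain array (the URLs and patterns are hypothetical, not from the gem):

```ruby
# Illustrative only: the same select!/delete_if filtering on a plain array of links.
allowlist_regexp = Regexp.union(['/trusmis', /games\/.*/]) # strings and regexps mix freely
denylist_regexp  = Regexp.union(/games\/ads\/.*/)

links = [
  'http://example.com/trusmis.html',
  'http://example.com/games/chess',
  'http://example.com/games/ads/banner',
  'http://example.com/contact'
]

links.select!   { |link| link =~ allowlist_regexp } # keep only allowlisted links
links.delete_if { |link| link =~ denylist_regexp }  # then drop denylisted ones

p links # => ["http://example.com/trusmis.html", "http://example.com/games/chess"]
```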
data/lib/grell/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Grell
- VERSION = "2.1.1".freeze
+ VERSION = "2.1.2".freeze
  end
@@ -6,16 +6,16 @@ RSpec.describe Grell::Crawler do
  let(:host) { 'http://www.example.com' }
  let(:url) { 'http://www.example.com/test' }
  let(:add_match_block) { nil }
- let(:blacklist) { /a^/ }
- let(:whitelist) { /.*/ }
+ let(:denylist) { /a^/ }
+ let(:allowlist) { /.*/ }
  let(:crawler) do
  Grell::Crawler.new(
  logger: Logger.new(nil),
  driver: double(nil),
  evaluate_in_each_page: script,
  add_match_block: add_match_block,
- blacklist: blacklist,
- whitelist: whitelist)
+ denylist: denylist,
+ allowlist: allowlist)
  end
  let(:script) { nil }
  let(:body) { 'body' }
@@ -128,7 +128,7 @@ RSpec.describe Grell::Crawler do
  expect(crawler.collection.discovered_pages.size).to eq(0)
  end
 
- it 'contains the whitelisted page and the base page only' do
+ it 'contains the allowlisted page and the base page only' do
  crawler.start_crawling(url)
  expect(crawler.collection.visited_pages.map(&:url)).
  to eq(visited_pages)
@@ -168,7 +168,7 @@ RSpec.describe Grell::Crawler do
  it_behaves_like 'visits all available pages'
  end
 
- describe '#whitelist' do
+ describe '#allowlist' do
  let(:body) do
  "<html><head></head><body>
  <a href=\"/trusmis.html\">trusmis</a>
@@ -183,7 +183,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using a single string' do
- let(:whitelist) { '/trusmis.html' }
+ let(:allowlist) { '/trusmis.html' }
  let(:visited_pages_count) { 2 } # my own page + trusmis
  let(:visited_pages) do
  ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
@@ -193,7 +193,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an array of strings' do
- let(:whitelist) { ['/trusmis.html', '/nothere', 'another.html'] }
+ let(:allowlist) { ['/trusmis.html', '/nothere', 'another.html'] }
  let(:visited_pages_count) { 2 }
  let(:visited_pages) do
  ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
@@ -203,7 +203,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using a regexp' do
- let(:whitelist) { /\/trusmis\.html/ }
+ let(:allowlist) { /\/trusmis\.html/ }
  let(:visited_pages_count) { 2 }
  let(:visited_pages) do
  ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
@@ -213,7 +213,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an array of regexps' do
- let(:whitelist) { [/\/trusmis\.html/] }
+ let(:allowlist) { [/\/trusmis\.html/] }
  let(:visited_pages_count) { 2 }
  let(:visited_pages) do
  ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
@@ -223,7 +223,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an empty array' do
- let(:whitelist) { [] }
+ let(:allowlist) { [] }
  let(:visited_pages_count) { 1 } # my own page only
  let(:visited_pages) do
  ['http://www.example.com/test']
@@ -232,8 +232,8 @@ RSpec.describe Grell::Crawler do
  it_behaves_like 'visits all available pages'
  end
 
- context 'adding all links to the whitelist' do
- let(:whitelist) { ['/trusmis', '/help'] }
+ context 'adding all links to the allowlist' do
+ let(:allowlist) { ['/trusmis', '/help'] }
  let(:visited_pages_count) { 3 } # all links
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
@@ -244,7 +244,7 @@ RSpec.describe Grell::Crawler do
  end
 
 
- describe '#blacklist' do
+ describe '#denylist' do
  let(:body) do
  "<html><head></head><body>
  <a href=\"/trusmis.html\">trusmis</a>
@@ -259,7 +259,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using a single string' do
- let(:blacklist) { '/trusmis.html' }
+ let(:denylist) { '/trusmis.html' }
  let(:visited_pages_count) {2}
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/help.html']
@@ -269,7 +269,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an array of strings' do
- let(:blacklist) { ['/trusmis.html', '/nothere', 'another.html'] }
+ let(:denylist) { ['/trusmis.html', '/nothere', 'another.html'] }
  let(:visited_pages_count) {2}
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/help.html']
@@ -279,7 +279,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using a regexp' do
- let(:blacklist) { /\/trusmis\.html/ }
+ let(:denylist) { /\/trusmis\.html/ }
  let(:visited_pages_count) {2}
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/help.html']
@@ -289,7 +289,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an array of regexps' do
- let(:blacklist) { [/\/trusmis\.html/] }
+ let(:denylist) { [/\/trusmis\.html/] }
  let(:visited_pages_count) {2}
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/help.html']
@@ -299,7 +299,7 @@ RSpec.describe Grell::Crawler do
  end
 
  context 'using an empty array' do
- let(:blacklist) { [] }
+ let(:denylist) { [] }
  let(:visited_pages_count) { 3 } # all links
  let(:visited_pages) do
  ['http://www.example.com/test','http://www.example.com/trusmis.html', 'http://www.example.com/help.html']
@@ -308,8 +308,8 @@ RSpec.describe Grell::Crawler do
  it_behaves_like 'visits all available pages'
  end
 
- context 'adding all links to the blacklist' do
- let(:blacklist) { ['/trusmis', '/help'] }
+ context 'adding all links to the denylist' do
+ let(:denylist) { ['/trusmis', '/help'] }
  let(:visited_pages_count) { 1 }
  let(:visited_pages) do
  ['http://www.example.com/test']
@@ -320,7 +320,7 @@ RSpec.describe Grell::Crawler do
  end
 
 
- describe 'Whitelisting and blacklisting' do
+ describe 'allowlisting and denylisting' do
  let(:body) do
  "<html><head></head><body>
  <a href=\"/trusmis.html\">trusmis</a>
@@ -334,9 +334,9 @@ RSpec.describe Grell::Crawler do
  proxy.stub('http://www.example.com/help.html').and_return(body: 'body', code: 200)
  end
 
- context 'we blacklist the only whitelisted page' do
- let(:whitelist) { '/trusmis.html' }
- let(:blacklist) { '/trusmis.html' }
+ context 'we denylist the only allowlisted page' do
+ let(:allowlist) { '/trusmis.html' }
+ let(:denylist) { '/trusmis.html' }
  let(:visited_pages_count) { 1 }
  let(:visited_pages) do
  ['http://www.example.com/test']
@@ -345,9 +345,9 @@ RSpec.describe Grell::Crawler do
  it_behaves_like 'visits all available pages'
  end
 
- context 'we blacklist none of the whitelisted pages' do
- let(:whitelist) { '/trusmis.html' }
- let(:blacklist) { '/raistlin.html' }
+ context 'we denylist none of the allowlisted pages' do
+ let(:allowlist) { '/trusmis.html' }
+ let(:denylist) { '/raistlin.html' }
  let(:visited_pages_count) { 2 }
  let(:visited_pages) do
  ['http://www.example.com/test', 'http://www.example.com/trusmis.html']
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: grell
  version: !ruby/object:Gem::Version
- version: 2.1.1
+ version: 2.1.2
  platform: ruby
  authors:
  - Jordi Polo Carres
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-11-08 00:00:00.000000000 Z
+ date: 2021-02-17 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: capybara
@@ -38,20 +38,6 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '1.11'
- - !ruby/object:Gem::Dependency
- name: bundler
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.6'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '1.6'
  - !ruby/object:Gem::Dependency
  name: byebug
  requirement: !ruby/object:Gem::Requirement
@@ -215,8 +201,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.7.1
+ rubygems_version: 3.0.8
  signing_key:
  specification_version: 4
  summary: Ruby web crawler