spidey-mongo 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.travis.yml +11 -0
- data/CHANGELOG.md +15 -0
- data/CONTRIBUTING.md +116 -0
- data/Gemfile +13 -1
- data/LICENSE.txt +2 -2
- data/README.md +49 -43
- data/Rakefile +12 -1
- data/lib/spidey-mongo.rb +3 -1
- data/lib/spidey-mongo/version.rb +1 -1
- data/lib/spidey/strategies/mongo.rb +9 -12
- data/lib/spidey/strategies/mongo2.rb +59 -0
- data/lib/spidey/strategies/moped.rb +9 -12
- data/spec/spec_helper.rb +12 -2
- data/spec/spidey/strategies/mongo2_spec.rb +61 -0
- data/spec/spidey/strategies/mongo_spec.rb +24 -25
- data/spec/spidey/strategies/moped_spec.rb +24 -25
- data/spidey-mongo.gemspec +14 -17
- metadata +19 -75
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+ metadata.gz: 096f00db0e8368887d3546d1909984af270ef83e
+ data.tar.gz: dbc9ebec27c141264076557098c83e7a1576dc28
+ SHA512:
+ metadata.gz: 50b3fc0d2fa3ea7837ff06e5d2f3f5bebf99047d10a43ff27dff755df492a95c649532b0ab5d0b04b8d37755b96a5558237d4c303e89f89bafc1f602e00d0e24
+ data.tar.gz: 6392e63f6d3eba223dade821bedf9c56bf44b1cf48146e0906fb74186ff81d80368522b4f1dbfe9d2cd14335870e64455c66c2ab86c60abfc1c506de00d3e9a6
data/.travis.yml
ADDED
data/CHANGELOG.md
ADDED
@@ -0,0 +1,15 @@
+ ### Next
+
+ * Your contribution here...
+
+ ### 0.3.0
+
+ * [#3](https://github.com/joeyAghion/spidey-mongo/pull/3): Added support for Mongo Ruby Driver 2.x - [@dblock](https://github.com/dblock).
+
+ ### 0.2.0
+
+ * [#1](https://github.com/joeyAghion/spidey-mongo/pull/1): Added support for Moped - [@fancyremarker](https://github.com/fancyremarker).
+
+ ### 0.1.0
+
+ * Initial public release - [@joeyAghion](https://github.com/joeyAghion).
data/CONTRIBUTING.md
ADDED
@@ -0,0 +1,116 @@
+ Contributing
+ ============
+
+ This gem is work of [many of contributors](https://github.com/joeyAghion/spidey-mongo/graphs/contributors). You're encouraged to submit [pull requests](https://github.com/joeyAghion/spidey-mongo/pulls), [propose features, ask questions and discuss issues](https://github.com/joeyAghion/spidey-mongo/issues).
+
+ #### Fork the Project
+
+ Fork the [project on Github](https://github.com/joeyAghion/spidey-mongo) and check out your copy.
+
+ ```
+ git clone https://github.com/contributor/spidey-mongo.git
+ cd spidey-mongo
+ git remote add upstream https://github.com/joeyAghion/spidey-mongo.git
+ ```
+
+ #### Create a Topic Branch
+
+ Make sure your fork is up-to-date and create a topic branch for your feature or bug fix.
+
+ ```
+ git checkout master
+ git pull upstream master
+ git checkout -b my-feature-branch
+ ```
+
+ #### Bundle Install and Test
+
+ Ensure that you can build the project and run tests.
+
+ ```
+ bundle install
+ bundle exec rake
+ ```
+
+ #### Write Tests
+
+ Try to write a test that reproduces the problem you're trying to fix or describes a feature that you want to build. Add to [spec/mongoid](spec/mongoid).
+
+ We definitely appreciate pull requests that highlight or reproduce a problem, even without a fix.
+
+ #### Write Code
+
+ Implement your feature or bug fix.
+
+ Make sure that `bundle exec rake` completes without errors.
+
+ #### Write Documentation
+
+ Document any external behavior in the [README](README.md).
+
+ #### Update Changelog
+
+ Add a line to [CHANGELOG](CHANGELOG.md) under *Next*. Make it look like every other line, including your name and link to your Github account.
+
+ #### Commit Changes
+
+ Make sure git knows your name and email address:
+
+ ```
+ git config --global user.name "Your Name"
+ git config --global user.email "contributor@example.com"
+ ```
+
+ Writing good commit logs is important. A commit log should describe what changed and why.
+
+ ```
+ git add ...
+ git commit
+ ```
+
+ #### Push
+
+ ```
+ git push origin my-feature-branch
+ ```
+
+ #### Make a Pull Request
+
+ Go to https://github.com/contributor/spidey-mongo and select your feature branch. Click the 'Pull Request' button and fill out the form. Pull requests are usually reviewed within a few days.
+
+ #### Rebase
+
+ If you've been working on a change for a while, rebase with upstream/master.
+
+ ```
+ git fetch upstream
+ git rebase upstream/master
+ git push origin my-feature-branch -f
+ ```
+
+ #### Update CHANGELOG Again
+
+ Update the [CHANGELOG](CHANGELOG.md) with the pull request number. A typical entry looks as follows.
+
+ ```
+ * [#123](https://github.com/joeyAghion/spidey-mongo/pull/123): Reticulated splines - [@contributor](https://github.com/contributor).
+ ```
+
+ Amend your previous commit and force push the changes.
+
+ ```
+ git commit --amend
+ git push origin my-feature-branch -f
+ ```
+
+ #### Check on Your Pull Request
+
+ Go back to your pull request after a few minutes and see whether it passed muster with Travis-CI. Everything should look green, otherwise fix issues and amend your commit as described above.
+
+ #### Be Patient
+
+ It's likely that your change will not be merged and that the nitpicky maintainers will ask you to do more, or fix seemingly benign problems. Hang on there!
+
+ #### Thank You
+
+ Please do know that we really appreciate and value your time and work. We love you, really.
data/Gemfile
CHANGED
@@ -1,4 +1,16 @@
- source
+ source 'http://rubygems.org'
+
+ case version = ENV['MONGO_VERSION'] || 'mongo2'
+ when /^moped/
+ gem 'moped', '~> 2.0'
+ when /^mongo2/
+ gem 'mongo', '~> 2.0'
+ when /^mongo/
+ gem 'mongo', '~> 1.12'
+ gem 'bson_ext'
+ else
+ fail "Invalid MONGO_VERSION: #{ENV['MONGO_VERSION']}."
+ end

  # Specify your gem's dependencies in spidey-mongo.gemspec

data/LICENSE.txt
CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2012 Joey Aghion,
+ Copyright (c) 2012-2015 Joey Aghion, Artsy Inc., and Contributors

  Permission is hereby granted, free of charge, to any person obtaining
  a copy of this software and associated documentation files (the
@@ -17,4 +17,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
CHANGED
@@ -1,6 +1,9 @@
  Spidey-Mongo
  ============

+ [](https://travis-ci.org/joeyAghion/spidey-mongo)
+ [](https://badge.fury.io/rb/spidey-mongo)
+
  This gem implements a [MongoDB](http://www.mongodb.org/) back-end for [Spidey](https://github.com/joeyAghion/spidey), a very simple framework for crawling and scraping web sites.

  See [Spidey](https://github.com/joeyAghion/spidey)'s documentation for a basic example spider class.
@@ -12,45 +15,52 @@ Usage

  ### Install the gem

-
-
+ ``` ruby
+ gem install spidey-mongo
+ ```

  ### `mongo` versus `moped`

- Spidey-Mongo provides
+ Spidey-Mongo provides three strategies:

- * `Spidey::Strategies::Mongo`: Compatible with
- * `Spidey::Strategies::
+ * `Spidey::Strategies::Mongo`: Compatible with Mongo Ruby Driver 1.x, [`mongo`](https://github.com/mongodb/mongo-ruby-driver)
+ * `Spidey::Strategies::Mongo2`: Compatible with Mongo Ruby Driver 2.x, [`mongo`](https://github.com/mongodb/mongo-ruby-driver), e.g., for use with Mongoid 5.x
+ * `Spidey::Strategies::Moped`: Compatible with the [`moped`](https://github.com/mongoid/moped) 2.x, e.g., for use with Mongoid 3.x and 4.x

  You can include either strategy in your classes, as appropriate. All the examples in this README assume `Spidey::Strategies::Mongo`.

-
  ### Example spider class

-
-
-
-
-
-
-
-
-
+ ```ruby
+ class EbaySpider < Spidey::AbstractSpider
+ include Spidey::Strategies::Mongo
+
+ handle "http://www.ebay.com", :process_home
+
+ def process_home(page, default_data = {})
+ # ...
+ end
+ end
+ ```

  ### Invocation

  The spider's constructor accepts new parameters for each of the MongoDB collections to employ: `url_collection`, `result_collection`, and `error_collection`.

-
-
-
-
-
-
+ ```ruby
+ db = Mongo::Connection.new['example']
+
+ spider = EbaySpider.new(
+ url_collection: db['urls'],
+ result_collection: db['results'],
+ error_collection: db['errors'])
+ ```

  With persistent storage of the URL-crawling queue, it's now possible to stop crawling and resume at a later point. The `crawl` method accepts a new optional `crawl_for` parameter specifying the number of seconds after which to stop.

-
+ ```
+ spider.crawl crawl_for: 600 # seconds, or more conveniently (w/ActiveSupport): 10.minutes
+ ```

  (The base implementation's `max_urls` parameter is also useful for this purpose.)

@@ -58,32 +68,28 @@ With persistent storage of the URL-crawling queue, it's now possible to stop cra

  By default, invocations of `record(data)` by the spider simply insert new documents into the result collection. If corresponding results may already exist in the collection and should instead be updated, define a `result_key` method that returns a key by which to find the corresponding document. The method is called with a hash of the data being recorded:

-
-
-
- def result_key(data)
- data[:detail_url]
- end
+ ```ruby
+ class EbaySpider < Spidey::AbstractSpider
+ include Spidey::Strategies::Mongo

-
-
+ def result_key(data)
+ data[:detail_url]
+ end

-
+ # ...
+ end
+ ```

-
- -------
-
- bundle exec rspec
-
- Contributors
- ------------
+ This performs an `upsert` instead of the usual `insert` (i.e., an update if a result document matching the key already exists, or insert otherwise).

-
+ Contrbuting
+ -----------

-
- -----
- * Extract behaviors shared by `Mongo` and `Moped` strategies.
+ Please contribute! See [CONTRIBUTING](CONTRIBUTING.md) for details.

  Copyright
  ---------
-
+
+ Copyright (c) 2012-2015 Joey Aghion, Artsy Inc., and Contributors.
+
+ See [LICENSE.txt](LICENSE.txt) for further details.
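The README's invocation example above still targets the 1.x driver (`Mongo::Connection`). For the `Spidey::Strategies::Mongo2` strategy that this release adds, a minimal sketch of the equivalent wiring — assuming a local MongoDB instance and borrowing the `Mongo::Client` connection style used by the gem's new specs, with an illustrative database name — might look like:

```ruby
require 'mongo'
require 'spidey-mongo'

class EbaySpider < Spidey::AbstractSpider
  include Spidey::Strategies::Mongo2   # 2.x-driver back-end introduced in 0.3.0

  handle "http://www.ebay.com", :process_home

  def process_home(page, default_data = {})
    # ...
  end
end

# With the Mongo Ruby Driver 2.x, collections come from a Mongo::Client.
db = Mongo::Client.new('mongodb://127.0.0.1:27017/example')

spider = EbaySpider.new(
  url_collection: db['urls'],
  result_collection: db['results'],
  error_collection: db['errors'])

spider.crawl crawl_for: 600
```

The only strategy-level differences from the 1.x example are the `Mongo::Client` connection and the `Mongo2` mixin; the constructor parameters and `crawl` options are unchanged.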
data/Rakefile
CHANGED
@@ -1 +1,12 @@
- require
+ require 'bundler/gem_tasks'
+
+ Bundler.setup :default, :development
+
+ require 'rspec/core'
+ require 'rspec/core/rake_task'
+
+ RSpec::Core::RakeTask.new(:spec) do |spec|
+ spec.pattern = FileList["spec/**/#{ENV['MONGO_VERSION'] || 'mongo2'}_spec.rb"]
+ end
+
+ task default: :spec
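The Gemfile and Rakefile changes above key both dependency selection and the spec pattern off a single `MONGO_VERSION` environment variable (defaulting to `mongo2`). A plausible local workflow, using only the values those case statements recognize, would be:

```
# default: Mongo Ruby Driver 2.x strategy and specs
bundle install
bundle exec rake

# or exercise the legacy 1.x driver or Moped back-ends
MONGO_VERSION=mongo bundle install && MONGO_VERSION=mongo bundle exec rake
MONGO_VERSION=moped bundle install && MONGO_VERSION=moped bundle exec rake
```

The changed spec/spec_helper.rb below uses the same variable to decide which driver to require.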
data/lib/spidey-mongo.rb
CHANGED
data/lib/spidey-mongo/version.rb
CHANGED
data/lib/spidey/strategies/mongo.rb
CHANGED
@@ -18,8 +18,8 @@ module Spidey::Strategies
  def handle(url, handler, default_data = {})
  Spidey.logger.info "Queueing #{url.inspect[0..200]}..."
  url_collection.update(
- {'spider' => self.class.name, 'url' => url},
- {'$set' => {'handler' => handler, 'default_data' => default_data}},
+ { 'spider' => self.class.name, 'url' => url },
+ { '$set' => { 'handler' => handler, 'default_data' => default_data } },
  upsert: true
  )
  end
@@ -28,16 +28,16 @@ module Spidey::Strategies
  doc = data.merge('spider' => self.class.name)
  Spidey.logger.info "Recording #{doc.inspect[0..500]}..."
  if respond_to?(:result_key) && key = result_key(doc)
- result_collection.update({'key' => key}, {'$set' => doc}, upsert: true)
+ result_collection.update({ 'key' => key }, { '$set' => doc }, upsert: true)
  else
  result_collection.insert doc
  end
  end

- def each_url(&
+ def each_url(&_block)
  while url = get_next_url
- break if url['last_crawled_at'] && url['last_crawled_at'] >= @crawl_started_at
- url_collection.update({'_id' => url['_id']}, '$set' => {last_crawled_at: Time.now})
+ break if url['last_crawled_at'] && url['last_crawled_at'] >= @crawl_started_at # crawled already in this batch
+ url_collection.update({ '_id' => url['_id'] }, '$set' => { last_crawled_at: Time.now })
  yield url['url'], url['handler'], url['default_data'].symbolize_keys
  end
  end
@@ -49,14 +49,11 @@ module Spidey::Strategies
  Spidey.logger.error "Error on #{attrs[:url]}. #{error.class}: #{error.message}"
  end

-
+ private

  def get_next_url
- return nil if
- url_collection.find_one({spider: self.class.name},
- sort: [[:last_crawled_at, ::Mongo::ASCENDING], [:_id, ::Mongo::ASCENDING]]
- })
+ return nil if @until && Time.now >= @until # exceeded time bound
+ url_collection.find_one({ spider: self.class.name }, sort: [[:last_crawled_at, ::Mongo::ASCENDING], [:_id, ::Mongo::ASCENDING]])
  end
-
  end
  end
data/lib/spidey/strategies/mongo2.rb
ADDED
@@ -0,0 +1,59 @@
+ module Spidey::Strategies
+ module Mongo2
+ attr_accessor :url_collection, :result_collection, :error_collection
+
+ def initialize(attrs = {})
+ self.url_collection = attrs.delete(:url_collection)
+ self.result_collection = attrs.delete(:result_collection)
+ self.error_collection = attrs.delete(:error_collection)
+ super attrs
+ end
+
+ def crawl(options = {})
+ @crawl_started_at = Time.now
+ @until = Time.now + options[:crawl_for] if options[:crawl_for]
+ super options
+ end
+
+ def handle(url, handler, default_data = {})
+ Spidey.logger.info "Queueing #{url.inspect[0..200]}..."
+ url_collection.update_one(
+ { 'spider' => self.class.name, 'url' => url },
+ { '$set' => { 'handler' => handler, 'default_data' => default_data } },
+ upsert: true
+ )
+ end
+
+ def record(data)
+ doc = data.merge('spider' => self.class.name)
+ Spidey.logger.info "Recording #{doc.inspect[0..500]}..."
+ if respond_to?(:result_key) && key = result_key(doc)
+ result_collection.update_one({ 'key' => key }, { '$set' => doc }, upsert: true)
+ else
+ result_collection.insert_one doc
+ end
+ end
+
+ def each_url(&_block)
+ while url = get_next_url
+ break if url['last_crawled_at'] && url['last_crawled_at'] >= @crawl_started_at # crawled already in this batch
+ url_collection.update_one({ '_id' => url['_id'] }, '$set' => { last_crawled_at: Time.now })
+ yield url['url'], url['handler'], url['default_data'].symbolize_keys
+ end
+ end
+
+ def add_error(attrs)
+ error = attrs.delete(:error)
+ doc = attrs.merge(created_at: Time.now, error: error.class.name, message: error.message, spider: self.class.name)
+ error_collection.insert_one doc
+ Spidey.logger.error "Error on #{attrs[:url]}. #{error.class}: #{error.message}"
+ end
+
+ private
+
+ def get_next_url
+ return nil if @until && Time.now >= @until # exceeded time bound
+ url_collection.find({ spider: self.class.name }, sort: [[:last_crawled_at, ::Mongo::ASCENDING], [:_id, ::Mongo::ASCENDING]]).first
+ end
+ end
+ end
data/lib/spidey/strategies/moped.rb
CHANGED
@@ -18,9 +18,9 @@ module Spidey::Strategies
  def handle(url, handler, default_data = {})
  Spidey.logger.info "Queueing #{url.inspect[0..200]}..."
  url_collection.find(
-
+ 'spider' => self.class.name, 'url' => url
  ).upsert(
-
+ '$set' => { 'handler' => handler, 'default_data' => default_data }
  )
  end

@@ -28,16 +28,16 @@ module Spidey::Strategies
  doc = data.merge('spider' => self.class.name)
  Spidey.logger.info "Recording #{doc.inspect[0..500]}..."
  if respond_to?(:result_key) && key = result_key(doc)
- result_collection.find(
+ result_collection.find('key' => key).upsert('$set' => doc)
  else
  result_collection.insert doc
  end
  end

- def each_url(&
+ def each_url(&_block)
  while url = get_next_url
- break if url['last_crawled_at'] && url['last_crawled_at'] >= @crawl_started_at
- url_collection.find(
+ break if url['last_crawled_at'] && url['last_crawled_at'] >= @crawl_started_at # crawled already in this batch
+ url_collection.find('_id' => url['_id']).update('$set' => { last_crawled_at: Time.now })
  yield url['url'], url['handler'], url['default_data'].symbolize_keys
  end
  end
@@ -49,14 +49,11 @@ module Spidey::Strategies
  Spidey.logger.error "Error on #{attrs[:url]}. #{error.class}: #{error.message}"
  end

-
+ private

  def get_next_url
- return nil if
- url_collection.find(
- 'last_crawled_at' => 1, '_id' => 1
- }).first
+ return nil if @until && Time.now >= @until # exceeded time bound
+ url_collection.find(spider: self.class.name).sort('last_crawled_at' => 1, '_id' => 1).first
  end
-
  end
  end
data/spec/spec_helper.rb
CHANGED
@@ -1,8 +1,18 @@
-
+ $LOAD_PATH.unshift(File.dirname(__FILE__) + '/../lib')
+
+ case version = ENV['MONGO_VERSION'] || 'mongo2'
+ when /^moped/
+ require 'moped'
+ when /^mongo/
+ require 'mongo'
+ else
+ fail "Invalid MONGO_VERSION: #{ENV['MONGO_VERSION']}."
+ end
+
  require 'spidey-mongo'

  RSpec.configure do |config|
- config.treat_symbols_as_metadata_keys_with_true_values = true
  config.run_all_when_everything_filtered = true
  config.filter_run :focus
+ config.raise_errors_for_deprecations!
  end
data/spec/spidey/strategies/mongo2_spec.rb
ADDED
@@ -0,0 +1,61 @@
+ require 'spec_helper'
+ require 'mongo'
+
+ describe Spidey::Strategies::Mongo do
+ class TestMongoSpider < Spidey::AbstractSpider
+ include Spidey::Strategies::Mongo2
+ handle 'http://www.cnn.com', :process_home
+
+ def result_key(data)
+ data[:detail_url]
+ end
+ end
+
+ before(:each) do
+ @db = Mongo::Client.new('mongodb://127.0.0.1:27017/spidey-mongo-test')
+ @spider = TestMongoSpider.new(
+ url_collection: @db['urls'],
+ result_collection: @db['results'],
+ error_collection: @db['errors'])
+ end
+
+ after(:each) do
+ %w( urls results errors ).each { |col| @db[col].drop }
+ end
+
+ it 'should add initial URLs to collection' do
+ doc = @db['urls'].find(url: 'http://www.cnn.com').first
+ expect(doc['handler']).to eq(:process_home)
+ expect(doc['spider']).to eq('TestMongoSpider')
+ end
+
+ it 'should not add duplicate URLs' do
+ @spider.send :handle, 'http://www.cnn.com', :process_home
+ expect(@db['urls'].find(url: 'http://www.cnn.com').count).to eq(1)
+ end
+
+ it 'should add results' do
+ @spider.record detail_url: 'http://www.cnn.com', foo: 'bar'
+ expect(@db['results'].count).to eq(1)
+ doc = @db['results'].find.first
+ expect(doc['detail_url']).to eq('http://www.cnn.com')
+ expect(doc['foo']).to eq('bar')
+ expect(doc['spider']).to eq('TestMongoSpider')
+ end
+
+ it 'should update existing result' do
+ @db['results'].insert_one key: 'http://foo.bar', detail_url: 'http://foo.bar'
+ @spider.record detail_url: 'http://foo.bar', foo: 'bar'
+ expect(@db['results'].count).to eq(1)
+ end
+
+ it 'should add error' do
+ @spider.add_error error: Exception.new('WTF'), url: 'http://www.cnn.com', handler: :blah
+ doc = @db['errors'].find.first
+ expect(doc['error']).to eq('Exception')
+ expect(doc['url']).to eq('http://www.cnn.com')
+ expect(doc['handler']).to eq(:blah)
+ expect(doc['message']).to eq('WTF')
+ expect(doc['spider']).to eq('TestMongoSpider')
+ end
+ end
data/spec/spidey/strategies/mongo_spec.rb
CHANGED
@@ -4,7 +4,7 @@ require 'mongo'
  describe Spidey::Strategies::Mongo do
  class TestMongoSpider < Spidey::AbstractSpider
  include Spidey::Strategies::Mongo
- handle
+ handle 'http://www.cnn.com', :process_home

  def result_key(data)
  data[:detail_url]
@@ -20,43 +20,42 @@ describe Spidey::Strategies::Mongo do
  end

  after(:each) do
- %w
+ %w( urls results errors ).each { |col| @db[col].drop }
  end

- it
- doc = @db['urls'].find_one(url:
- doc['handler'].
- doc['spider'].
+ it 'should add initial URLs to collection' do
+ doc = @db['urls'].find_one(url: 'http://www.cnn.com')
+ expect(doc['handler']).to eq(:process_home)
+ expect(doc['spider']).to eq('TestMongoSpider')
  end

- it
- @spider.send :handle,
- @db['urls'].find(url:
+ it 'should not add duplicate URLs' do
+ @spider.send :handle, 'http://www.cnn.com', :process_home
+ expect(@db['urls'].find(url: 'http://www.cnn.com').count).to eq(1)
  end

- it
+ it 'should add results' do
  @spider.record detail_url: 'http://www.cnn.com', foo: 'bar'
- @db['results'].count.
+ expect(@db['results'].count).to eq(1)
  doc = @db['results'].find_one
- doc['detail_url'].
- doc['foo'].
- doc['spider'].
+ expect(doc['detail_url']).to eq('http://www.cnn.com')
+ expect(doc['foo']).to eq('bar')
+ expect(doc['spider']).to eq('TestMongoSpider')
  end

- it
+ it 'should update existing result' do
  @db['results'].insert key: 'http://foo.bar', detail_url: 'http://foo.bar'
  @spider.record detail_url: 'http://foo.bar', foo: 'bar'
- @db['results'].count.
+ expect(@db['results'].count).to eq(1)
  end

- it
- @spider.add_error error: Exception.new(
+ it 'should add error' do
+ @spider.add_error error: Exception.new('WTF'), url: 'http://www.cnn.com', handler: :blah
  doc = @db['errors'].find_one
- doc['error'].
- doc['url'].
- doc['handler'].
- doc['message'].
- doc['spider'].
+ expect(doc['error']).to eq('Exception')
+ expect(doc['url']).to eq('http://www.cnn.com')
+ expect(doc['handler']).to eq(:blah)
+ expect(doc['message']).to eq('WTF')
+ expect(doc['spider']).to eq('TestMongoSpider')
  end
-
- end
+ end
data/spec/spidey/strategies/moped_spec.rb
CHANGED
@@ -4,7 +4,7 @@ require 'moped'
  describe Spidey::Strategies::Moped do
  class TestMopedSpider < Spidey::AbstractSpider
  include Spidey::Strategies::Moped
- handle
+ handle 'http://www.cnn.com', :process_home

  def result_key(data)
  data[:detail_url]
@@ -21,43 +21,42 @@ describe Spidey::Strategies::Moped do
  end

  after(:each) do
- %w
+ %w( urls results errors ).each { |col| @db[col].drop }
  end

- it
- doc = @db['urls'].find(url:
- doc['handler'].
- doc['spider'].
+ it 'should add initial URLs to collection' do
+ doc = @db['urls'].find(url: 'http://www.cnn.com').first
+ expect(doc['handler']).to eq(:process_home)
+ expect(doc['spider']).to eq('TestMopedSpider')
  end

- it
- @spider.send :handle,
- @db['urls'].find(url:
+ it 'should not add duplicate URLs' do
+ @spider.send :handle, 'http://www.cnn.com', :process_home
+ expect(@db['urls'].find(url: 'http://www.cnn.com').count).to eq(1)
  end

- it
+ it 'should add results' do
  @spider.record detail_url: 'http://www.cnn.com', foo: 'bar'
- @db['results'].find.count.
+ expect(@db['results'].find.count).to eq(1)
  doc = @db['results'].find.first
- doc['detail_url'].
- doc['foo'].
- doc['spider'].
+ expect(doc['detail_url']).to eq('http://www.cnn.com')
+ expect(doc['foo']).to eq('bar')
+ expect(doc['spider']).to eq('TestMopedSpider')
  end

- it
+ it 'should update existing result' do
  @db['results'].insert key: 'http://foo.bar', detail_url: 'http://foo.bar'
  @spider.record detail_url: 'http://foo.bar', foo: 'bar'
- @db['results'].find.count.
+ expect(@db['results'].find.count).to eq(1)
  end

- it
- @spider.add_error error: Exception.new(
+ it 'should add error' do
+ @spider.add_error error: Exception.new('WTF'), url: 'http://www.cnn.com', handler: :blah
  doc = @db['errors'].find.first
- doc['error'].
- doc['url'].
- doc['handler'].
- doc['message'].
- doc['spider'].
+ expect(doc['error']).to eq('Exception')
+ expect(doc['url']).to eq('http://www.cnn.com')
+ expect(doc['handler']).to eq(:blah)
+ expect(doc['message']).to eq('WTF')
+ expect(doc['spider']).to eq('TestMopedSpider')
  end
-
- end
+ end
data/spidey-mongo.gemspec
CHANGED
@@ -1,29 +1,26 @@
  # -*- encoding: utf-8 -*-
-
- require
+ $LOAD_PATH.push File.expand_path('../lib', __FILE__)
+ require 'spidey-mongo/version'

  Gem::Specification.new do |s|
- s.name =
+ s.name = 'spidey-mongo'
  s.version = Spidey::Mongo::VERSION
- s.authors = [
- s.email = [
- s.homepage =
- s.summary =
- s.description =
+ s.authors = ['Joey Aghion']
+ s.email = ['joey@aghion.com']
+ s.homepage = 'https://github.com/joeyAghion/spidey-mongo'
+ s.summary = 'Implements a MongoDB back-end for Spidey, a framework for crawling and scraping web sites.'
+ s.description = 'Implements a MongoDB back-end for Spidey, a framework for crawling and scraping web sites.'
  s.license = 'MIT'

- s.rubyforge_project =
+ s.rubyforge_project = 'spidey-mongo'

  s.files = `git ls-files`.split("\n")
  s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
- s.executables = `git ls-files -- bin/*`.split("\n").map{ |f| File.basename(f) }
- s.require_paths = [
+ s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
+ s.require_paths = ['lib']

- s.add_development_dependency
- s.add_development_dependency
- s.add_development_dependency "mongo"
- s.add_development_dependency "bson_ext"
- s.add_development_dependency "moped"
+ s.add_development_dependency 'rake'
+ s.add_development_dependency 'rspec'

- s.add_runtime_dependency
+ s.add_runtime_dependency 'spidey', '>= 0.1.0'
  end
metadata
CHANGED
@@ -1,110 +1,55 @@
  --- !ruby/object:Gem::Specification
  name: spidey-mongo
  version: !ruby/object:Gem::Version
- version: 0.
- prerelease:
+ version: 0.3.0
  platform: ruby
  authors:
  - Joey Aghion
  autorequire:
  bindir: bin
  cert_chain: []
- date:
+ date: 2015-11-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: rspec
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: mongo
- requirement: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: bson_ext
- requirement: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: moped
- requirement: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- none: false
- requirements:
- - - ! '>='
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
  name: spidey
  requirement: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: 0.1.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: 0.1.0
  description: Implements a MongoDB back-end for Spidey, a framework for crawling and
@@ -116,6 +61,9 @@ extensions: []
  extra_rdoc_files: []
  files:
  - .gitignore
+ - .travis.yml
+ - CHANGELOG.md
+ - CONTRIBUTING.md
  - Gemfile
  - LICENSE.txt
  - README.md
@@ -123,44 +71,40 @@ files:
  - lib/spidey-mongo.rb
  - lib/spidey-mongo/version.rb
  - lib/spidey/strategies/mongo.rb
+ - lib/spidey/strategies/mongo2.rb
  - lib/spidey/strategies/moped.rb
  - spec/spec_helper.rb
+ - spec/spidey/strategies/mongo2_spec.rb
  - spec/spidey/strategies/mongo_spec.rb
  - spec/spidey/strategies/moped_spec.rb
  - spidey-mongo.gemspec
  homepage: https://github.com/joeyAghion/spidey-mongo
  licenses:
  - MIT
+ metadata: {}
  post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- segments:
- - 0
- hash: 987129952958952365
  required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
  requirements:
- - -
+ - - '>='
  - !ruby/object:Gem::Version
  version: '0'
- segments:
- - 0
- hash: 987129952958952365
  requirements: []
  rubyforge_project: spidey-mongo
- rubygems_version:
+ rubygems_version: 2.0.14
  signing_key:
- specification_version:
+ specification_version: 4
  summary: Implements a MongoDB back-end for Spidey, a framework for crawling and scraping
  web sites.
  test_files:
  - spec/spec_helper.rb
+ - spec/spidey/strategies/mongo2_spec.rb
  - spec/spidey/strategies/mongo_spec.rb
  - spec/spidey/strategies/moped_spec.rb