storexplore 0.0.1 → 0.1.0

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 4e30362dde2dd4fe6383bc76ff48c9de16642f54
4
- data.tar.gz: 7d81d28ab8f78dc61c8334239161d437f1ce1f0b
3
+ metadata.gz: f7da700982a0fbe6f80b7aa37ff7e4f231f43935
4
+ data.tar.gz: 5fdb4f7684e506964f3fff42bdde441f70e71e92
5
5
  SHA512:
6
- metadata.gz: 3d3b2741dd23d1142d54c99dbc4069445b46bc50b47dffed6f58a84d8f27fb3d70f3f359d980e1bb74ce433a83c32c80ffbba68bbc46723298df51cf3ad0d385
7
- data.tar.gz: 5f9dcd46b4549a41a966cee549b956b3f32defef3ca013aaf996028a1e0d7c9e9e7f1857de603bd28201570f964f0cb0bd86e36db3eae6e058e09c1ab49c70bd
6
+ metadata.gz: 97e8fe6604856ffe5c4baa1287fe40ce5cfe40c4a950af74496bf848b3e42210973b218a117fc9a011d17814f82ca499e141fe810035e78a1d76a6bc45c58170
7
+ data.tar.gz: c5645631ef4ec8569277f15c2d7b9354f26a134833ee79e9cefd52ac0b3fea37919fabe8d51eab67be94479dd18c8cf2400aa8ca362d6aca846c6c1e41971c79
checksums.yaml.gz.sig CHANGED
Binary file
data.tar.gz.sig CHANGED
Binary file
data/.travis.yml ADDED
@@ -0,0 +1,8 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ # no lazy for enumerators in jruby 2.0
5
+ # - jruby-20mode
6
+ # no rubynius working at the moment
7
+ # - rbx-20mode
8
+ script: bundle exec rake
data/README.md CHANGED
@@ -4,6 +4,9 @@ A declarative scrapping DSL that lets one define directory like apis to an onlin
4
4
 
5
5
  ## Installation
6
6
 
7
+ In order to be able to enumerate all items of a store in constant memory,
8
+ Storexplore requires Matz Ruby 2.0 for its lazy enumerators.
9
+
7
10
  Add this line to your application's Gemfile:
8
11
 
9
12
  gem 'storexplore'
@@ -18,7 +21,132 @@ Or install it yourself as:
18
21
 
19
22
  ## Usage
20
23
 
21
- TODO: Write usage instructions here
24
+ The library builds hierarchical APIs on online stores. Stores are typicaly
25
+ organized in the following way :
26
+
27
+ Store > Categories > ... > Sub Categories > Items
28
+
29
+ The store is like a root category. Any category, at any depth level can have
30
+ both children categories and items. Items cannot have children of any kind.
31
+ Both categories and items can have attributes.
32
+
33
+ All searching of children and attributes is done through mechanize/nokogiri
34
+ selectors (css or xpath).
35
+
36
+ Here is a sample store api declaration :
37
+
38
+ ```ruby
39
+ Storexplore::define_api 'dummy-store.com' do
40
+
41
+ categories 'a.category' do
42
+ attributes do
43
+ { :name => page.get_one("h1").content }
44
+ end
45
+
46
+ categories 'a.category' do
47
+ attributes do
48
+ { :name => page.get_one("h1").content }
49
+ end
50
+
51
+ items 'a.item' do
52
+ attributes do
53
+ {
54
+ :name => page.get_one('h1').content,
55
+ :brand => page.get_one('#brand').content,
56
+ :price => page.get_one('#price').content.to_f,
57
+ :image => page.get_one('#image').content,
58
+ :remote_id => page.get_one('#remote_id').content
59
+ }
60
+ end
61
+ end
62
+ end
63
+ end
64
+ end
65
+ ```
66
+
67
+ This build a hierarchical API on the 'dummy-store.com' online store. This
68
+ registers a new api definition that will be used to browse any store which
69
+ uri contains 'dummy-store.com'.
70
+
71
+ Now here is how this API can be accessed to pretty print all its content:
72
+
73
+ ```ruby
74
+ Api.browse('http://www.dummy-store.com').categories.each do |category|
75
+
76
+ puts "category: #{category.title}"
77
+ puts "attributes: #{category.attributes}"
78
+
79
+ category.categories.each do |sub_category|
80
+
81
+ puts " category: #{sub_category.title}"
82
+ puts " attributes: #{sub_category.attributes}"
83
+
84
+ sub_category.items.each do |item|
85
+
86
+ puts " item: #{item.title}"
87
+ puts " attributes: #{item.attributes}"
88
+
89
+ end
90
+ end
91
+ end
92
+ ```
93
+
94
+ ### Testing
95
+
96
+ Storexplore ships with some dummy store generation utilities. Dummy stores can
97
+ be generated to the file system using the Storexplore::Testing::DummyStore and
98
+ Storexplore::Testing::DummyStoreGenerator classes. This is particularly useful
99
+ while testing.
100
+
101
+ To use it, add the following, to your spec_helper.rb for example :
102
+
103
+ ```ruby
104
+ require 'storexplore/testing'
105
+
106
+ Storexplore::Testing.config do |config|
107
+ config.dummy_store_generation_dir= File.join(Rails.root, '../tmp')
108
+ end
109
+ ```
110
+
111
+ It is then possible to generate a store with the following :
112
+
113
+ ```ruby
114
+ DummyStore.wipe_out_store(store_name)
115
+ @store_generator = DummyStore.open(store_name)
116
+ @store_generator.generate(3).categories.and(3).categories.and(item_count).items
117
+ ```
118
+
119
+ It is also possibe to add elements with explicit values :
120
+
121
+ ```ruby
122
+ @store_generator.
123
+ category(cat_name = "extra long category name").
124
+ category(sub_cat_name = "extra long sub category name").
125
+ item(item_name = "super extra long item name").generate().
126
+ attributes(price: 12.3)
127
+ ```
128
+
129
+ Storexplore provides an api definition for dummy stores in
130
+ 'storexplore/testing/dummy_store_api'. It can be required independently if
131
+ needed.
132
+
133
+ ### RSpec shared examples
134
+
135
+ Storexplore also ships with an rspec shared examples macro. It can be used for
136
+ any custom store API definition.
137
+
138
+ ```ruby
139
+ require 'storexplore/testing'
140
+
141
+ describe "MyStoreApi" do
142
+ include Storexplore::Testing::ApiSpecMacros
143
+
144
+ it_should_behave_like_any_store_items_api
145
+
146
+ ...
147
+
148
+ end
149
+ ```
22
150
 
23
151
  ## Contributing
24
152
 
data/Rakefile CHANGED
@@ -1 +1,7 @@
1
1
  require "bundler/gem_tasks"
2
+ require 'rspec/core/rake_task'
3
+
4
+ desc 'Run specs'
5
+ RSpec::Core::RakeTask.new(:spec)
6
+
7
+ task :default => :spec
data/lib/storexplore.rb CHANGED
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # storexplore.rb
4
4
  #
5
- # Copyright (c) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -22,9 +22,9 @@
22
22
  require "storexplore/version"
23
23
  require "storexplore/array_utils"
24
24
  require "storexplore/api"
25
- require "storexplore/api_builder"
26
25
  require "storexplore/browsing_error"
27
26
  require "storexplore/digger"
27
+ require "storexplore/dsl"
28
28
  require "storexplore/hash_utils"
29
29
  require "storexplore/null_digger"
30
30
  require "storexplore/uri_utils"
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # api.rb
4
4
  #
5
- # Copyright (c) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -24,12 +24,18 @@ module Storexplore
24
24
  # Objects able to walk a store and discover available items
25
25
  class Api
26
26
 
27
+ def self.define(name, &block)
28
+ builder = Dsl.walker_builder(&block)
29
+
30
+ register_builder(name, builder)
31
+ end
32
+
27
33
  def self.browse(store_url)
28
- builder(store_url).new(WalkerPage.open(store_url))
34
+ builder(store_url).new_walker(WalkerPage.open(store_url))
29
35
  end
30
36
 
31
- def self.register_builder(name, builder)
32
- builders[name] = builder
37
+ def self.undef(name)
38
+ builders.delete(name)
33
39
  end
34
40
 
35
41
  # Uri of the main page of the store
@@ -46,6 +52,10 @@ module Storexplore
46
52
 
47
53
  private
48
54
 
55
+ def self.register_builder(name, builder)
56
+ builders[name] = builder
57
+ end
58
+
49
59
  def self.builder(store_url)
50
60
  builders.each do |name, builder|
51
61
  if store_url.include?(name)
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # digger.rb
4
4
  #
5
- # Copyright (c) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -21,14 +21,14 @@
21
21
 
22
22
  module Storexplore
23
23
  class Digger
24
- def initialize(selector, factory)
24
+ def initialize(selector, sub_walker_builder)
25
25
  @selector = selector
26
- @factory = factory
26
+ @sub_walker_builder = sub_walker_builder
27
27
  end
28
28
 
29
29
  def sub_walkers(page, father)
30
30
  page.search_links(@selector).each_with_index.to_a.lazy.map do |link, i|
31
- @factory.new(link, father, i)
31
+ @sub_walker_builder.new_walker(link, father, i)
32
32
  end
33
33
  end
34
34
  end
@@ -0,0 +1,61 @@
1
+ # -*- encoding: utf-8 -*-
2
+ #
3
+ # dsl.rb
4
+ #
5
+ # Copyright (c) 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
+ #
7
+ # This library is free software; you can redistribute it and/or
8
+ # modify it under the terms of the GNU Lesser General Public
9
+ # License as published by the Free Software Foundation; either
10
+ # version 3.0 of the License, or (at your option) any later version.
11
+ #
12
+ # This library is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
+ # Lesser General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Lesser General Public
18
+ # License along with this library; if not, write to the Free Software
19
+ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
+ # MA 02110-1301 USA
21
+
22
+ module Storexplore
23
+
24
+ class Dsl
25
+
26
+ def self.walker_builder(&block)
27
+ new.tap do |dsl|
28
+ dsl.instance_eval(&block)
29
+ end
30
+ end
31
+
32
+ def initialize()
33
+ @scrap_attributes_block = lambda do |_| {} end
34
+ @categories_digger = NullDigger.new
35
+ @items_digger = NullDigger.new
36
+ end
37
+
38
+ def attributes(&block)
39
+ @scrap_attributes_block = block
40
+ end
41
+
42
+ def categories(selector, &block)
43
+ @categories_digger = Digger.new(selector, Dsl.walker_builder(&block))
44
+ end
45
+
46
+ def items(selector, &block)
47
+ @items_digger = Digger.new(selector, Dsl.walker_builder(&block))
48
+ end
49
+
50
+ def new_walker(page_getter, father = nil, index = nil)
51
+ Walker.new(page_getter).tap do |walker|
52
+ walker.categories_digger = @categories_digger
53
+ walker.items_digger = @items_digger
54
+ walker.scrap_attributes_block = @scrap_attributes_block
55
+ walker.father = father
56
+ walker.index = index
57
+ end
58
+ end
59
+ end
60
+
61
+ end
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # dummy_store_api.rb
4
4
  #
5
- # Copyright (c) 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -24,7 +24,7 @@ require_relative 'dummy_store_constants'
24
24
  module Storexplore
25
25
  module Testing
26
26
 
27
- Storexplore::define_api DummyStoreConstants::NAME do
27
+ Storexplore::Api.define DummyStoreConstants::NAME do
28
28
 
29
29
  categories 'a.category' do
30
30
  attributes do
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # version.rb
4
4
  #
5
- # Copyright (c) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -20,5 +20,5 @@
20
20
  # MA 02110-1301 USA
21
21
 
22
22
  module Storexplore
23
- VERSION = "0.0.1"
23
+ VERSION = "0.1.0"
24
24
  end
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # walker_page.rb
4
4
  #
5
- # Copyright (c) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -21,19 +21,6 @@
21
21
 
22
22
  require 'mechanize'
23
23
 
24
- # monkey patch to avoid a regex uri encoding error when importing
25
- # incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string) (Encoding::CompatibilityError)
26
- # /home/philou/.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/webrick/httputils.rb:353:in `gsub'
27
- # /home/philou/.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/webrick/httputils.rb:353:in `_escape'
28
- # /home/philou/.rbenv/versions/1.9.3-p194/lib/ruby/1.9.1/webrick/httputils.rb:363:in `escape'
29
- # from uri method
30
- require "webrick/httputils"
31
- module WEBrick::HTTPUtils
32
- def self.escape(s)
33
- URI.escape(s)
34
- end
35
- end
36
-
37
24
  module Storexplore
38
25
 
39
26
  class WalkerPage
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # api_spec.rb
4
4
  #
5
- # Copyright (C) 2010, 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (C) 2010, 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -25,20 +25,27 @@ module Storexplore
25
25
 
26
26
  describe Api do
27
27
 
28
- before :each do
29
- Api.register_builder(my_store = "www.my-store.com", builder = double(ApiBuilder.class))
30
- @url = "http://#{my_store}"
31
- WalkerPage.stub(:open).with(@url).and_return(walker = double(WalkerPage))
32
- builder.stub(:new).with(walker).and_return(@store_api = double(ApiBuilder))
28
+ before :all do
29
+ Storexplore::Api.define 'cats' do
30
+ attributes do
31
+ {animal: :cats}
32
+ end
33
+ end
33
34
  end
34
35
 
35
36
  it "select the good store items api builder to browse a store" do
36
- expect(Api.browse(@url)).to eq @store_api
37
+ expect(Api.browse("http://www.cats.net").attributes[:animal]).to eq(:cats)
37
38
  end
38
39
 
39
40
  it "fails when it does not know how to browse a store" do
40
41
  expect(lambda { Api.browse("http://unknown.store.com") }).to raise_error(NotImplementedError)
41
42
  end
42
43
 
44
+ it "allows to unregister an installed api (mostly for testing)" do
45
+ Api.undef 'cats'
46
+
47
+ expect(lambda { Api.browse("http://www.cats.com") }).to raise_error(NotImplementedError)
48
+ end
49
+
43
50
  end
44
51
  end
@@ -0,0 +1,215 @@
1
+ # -*- encoding: utf-8 -*-
2
+ #
3
+ # dsl_spec.rb
4
+ #
5
+ # Copyright (C) 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
+ #
7
+ # This library is free software; you can redistribute it and/or
8
+ # modify it under the terms of the GNU Lesser General Public
9
+ # License as published by the Free Software Foundation; either
10
+ # version 3.0 of the License, or (at your option) any later version.
11
+ #
12
+ # This library is distributed in the hope that it will be useful,
13
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
+ # Lesser General Public License for more details.
16
+ #
17
+ # You should have received a copy of the GNU Lesser General Public
18
+ # License along with this library; if not, write to the Free Software
19
+ # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
+ # MA 02110-1301 USA
21
+
22
+ require "spec_helper"
23
+
24
+ module Storexplore
25
+
26
+ describe Dsl do
27
+
28
+ def browse
29
+ @walker = Storexplore::Api.browse("http://www.cats-surplus.com")
30
+ end
31
+
32
+ after :each do
33
+ Storexplore::Api.undef 'cats'
34
+ end
35
+
36
+ context 'a simple store' do
37
+ before :each do
38
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com", content_type: 'text/html', body: <<-eos)
39
+ <html>
40
+ <body>
41
+ <a href="category1.html" class="category">Cats with fur</a>
42
+ <a href="category2.html" class="category">Naked cats</a>
43
+ <a href="category3.html" class="category">Cats with feathers</a>
44
+
45
+ <a href="item1.html" class="item">The first thing we sell</a>
46
+ <a href="item2.html" class="item">The second thing we sell</a>
47
+ <a href="legal.html" class="legal">How we sell it</a>
48
+ </body>
49
+ </html>
50
+ eos
51
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/category1.html", content_type: 'text/html', body: <<-eos)
52
+ <html>
53
+ <body>
54
+ <a href="category4.html" class="sub-category">Cats with red fur</a>
55
+ <a href="category5.html" class="sub-category">Cats with green fur</a>
56
+
57
+ <a href="item3.html" class="item">The first thing we sell</a>
58
+ <a href="item4.html" class="item">The second thing we sell</a>
59
+ <a href="item5.html" class="item">The second thing we sell</a>
60
+ </body>
61
+ </html>
62
+ eos
63
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/category2.html", content_type: 'text/html', body: "")
64
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/category3.html", content_type: 'text/html', body: "")
65
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/category4.html", content_type: 'text/html', body: "")
66
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/category5.html", content_type: 'text/html', body: "")
67
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/item1.html", content_type: 'text/html', body: "")
68
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/item2.html", content_type: 'text/html', body: "")
69
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/item3.html", content_type: 'text/html', body: "")
70
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/item4.html", content_type: 'text/html', body: "")
71
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/item5.html", content_type: 'text/html', body: "")
72
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/legal.html", content_type: 'text/html', body: "")
73
+
74
+ Storexplore::Api.define 'cats' do
75
+ items 'a.item' do
76
+ attributes do
77
+ { page: page,
78
+ this_is_the: :item}
79
+ end
80
+ end
81
+ categories 'a.category' do
82
+ categories 'a.sub-category' do
83
+ end
84
+ items 'a.item' do
85
+ attributes do
86
+ raise WalkerPageError.new("Dummy error message")
87
+ end
88
+ end
89
+ attributes do
90
+ { page: page,
91
+ this_is_the: :category}
92
+ end
93
+ end
94
+ attributes do
95
+ { page: page,
96
+ this_is_the: :root}
97
+ end
98
+ end
99
+
100
+ browse
101
+ end
102
+
103
+ it "root walker has the uri of the home page" do
104
+ expect(@walker.uri).to eq URI("http://www.cats-surplus.com")
105
+ end
106
+
107
+ it "sub walkers have the uri of their page" do
108
+ expect(@walker.items.first.uri).to eq URI("http://www.cats-surplus.com/item1.html")
109
+ end
110
+
111
+ it "root walker title is the store uri" do
112
+ expect(@walker.title).to eq "http://www.cats-surplus.com"
113
+ end
114
+
115
+ it "sub walkers title is the text of its origin link" do
116
+ expect(@walker.items.first.title).to eq "The first thing we sell"
117
+ end
118
+
119
+ it "uses a selector to spot sub items" do
120
+ expect(@walker.items).to have_exactly(2).items
121
+ end
122
+ it "uses a selector to spot sub categories" do
123
+ expect(@walker.categories).to have_exactly(3).categories
124
+ end
125
+ it "root walker can have attributes" do
126
+ expect(@walker.attributes[:this_is_the]).to eq :root
127
+ expect(@walker.attributes[:page]).to be_instance_of(WalkerPage)
128
+ end
129
+ it "categories can have sub categories" do
130
+ expect(@walker.categories.first.categories).to have_exactly(2).categories
131
+ end
132
+ it "categories can have sub items" do
133
+ expect(@walker.categories.first.items).to have_exactly(3).items
134
+ end
135
+ it "categories can have attributes" do
136
+ expect(@walker.categories.first.attributes[:this_is_the]).to eq :category
137
+ expect(@walker.categories.first.attributes[:page]).to be_instance_of(WalkerPage)
138
+ end
139
+ it "items can have attributes" do
140
+ expect(@walker.items.first.attributes[:this_is_the]).to eq :item
141
+ expect(@walker.items.first.attributes[:page]).to be_instance_of(WalkerPage)
142
+ end
143
+
144
+
145
+ context "when troubleshooting" do
146
+
147
+ before :each do
148
+ @sub_walker = @walker.categories.first.items.drop(1).first
149
+ end
150
+
151
+ it "walkers have an index" do
152
+ expect(@sub_walker.index).to eq 1
153
+ end
154
+
155
+ it "has a meaningfull string representation" do
156
+ expect(@sub_walker.to_s).to include(Walker.to_s)
157
+ expect(@sub_walker.to_s).to include("##{@sub_walker.index}")
158
+ expect(@sub_walker.to_s).to include("@#{@sub_walker.uri}")
159
+ end
160
+
161
+ it "has a full genealogy" do
162
+ genealogy = @sub_walker.genealogy.split("\n")
163
+
164
+ expect(genealogy).to eq [@walker.to_s, @walker.categories.first.to_s, @sub_walker.to_s]
165
+ end
166
+
167
+ it "wraps parsing errors with debug errors" do
168
+ expect(lambda { @sub_walker.attributes }).to raise_error(BrowsingError, "Dummy error message\n#{@sub_walker.genealogy}")
169
+ end
170
+ end
171
+
172
+ end
173
+
174
+ context 'a redirected home page' do
175
+ before :each do
176
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com", status: [301, "Moved Permanently"], location: "http://www.cats-surplus.com/index.html")
177
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com/index.html", content_type: 'text/html', body: "")
178
+
179
+ Storexplore::Api.define 'cats' do
180
+ end
181
+
182
+ browse
183
+ end
184
+
185
+ it "root walker has the uri of finaly page" do
186
+ expect(@walker.uri).to eq URI("http://www.cats-surplus.com/index.html")
187
+ end
188
+
189
+ it "root walker title is the store uri" do
190
+ expect(@walker.title).to eq "http://www.cats-surplus.com"
191
+ end
192
+ end
193
+
194
+ context 'an empty store' do
195
+ before :each do
196
+ FakeWeb.register_uri(:get, "http://www.cats-surplus.com", content_type: 'text/html', body: "")
197
+
198
+ Storexplore::Api.define 'cats' do
199
+ end
200
+
201
+ browse
202
+ end
203
+
204
+ it "has no items" do
205
+ expect(@walker.items).to be_empty
206
+ end
207
+ it "has no sub categories" do
208
+ expect(@walker.categories).to be_empty
209
+ end
210
+ it "has no sub attributes" do
211
+ expect(@walker.attributes).to be_empty
212
+ end
213
+ end
214
+ end
215
+ end
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # dummy_store_api_spec.rb
4
4
  #
5
- # Copyright (c) 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -59,19 +59,28 @@ module Storexplore
59
59
  end
60
60
 
61
61
  it "should use constant memory" do
62
- warm_up_measure = memory_usage_for_items(1)
62
+ FEW = 10
63
+ MANY = 100
64
+ RUNS = 2
63
65
 
64
- small_inputs_memory = memory_usage_for_items(1)
65
- large_inputs_memory = memory_usage_for_items(200)
66
+ many_inputs_memory = memory_usage_for_items(MANY, RUNS)
67
+ few_inputs_memory = memory_usage_for_items(FEW, RUNS)
66
68
 
67
- expect(large_inputs_memory).to be <= small_inputs_memory * 1.25
69
+ slope = (many_inputs_memory - few_inputs_memory) / (MANY - FEW)
70
+
71
+ zero_inputs_memory = few_inputs_memory - FEW * slope
72
+
73
+ expect(slope).to be_within(zero_inputs_memory * 0.05).of(0.0)
68
74
  end
69
75
 
70
- def memory_usage_for_items(item_count)
76
+ def memory_usage_for_items(item_count, runs)
71
77
  generate_store(store_name = "www.spec-perf-store.com", item_count)
72
- memory_peak_of do
73
- walk_store(store_name)
78
+ data = runs.times.map do
79
+ memory_peak_of do
80
+ walk_store(store_name)
81
+ end
74
82
  end
83
+ mean(data)
75
84
  end
76
85
 
77
86
  def memory_peak_of
@@ -95,25 +104,38 @@ module Storexplore
95
104
  end
96
105
 
97
106
  def current_living_objects
98
- GC.start
99
107
  object_counts = ObjectSpace.count_objects
100
108
  object_counts[:TOTAL] - object_counts[:FREE]
101
109
  end
102
110
 
103
111
  def walk_store(store_name)
104
112
  new_store(store_name).categories.each do |category|
105
- @title = category.title
106
- @attributes = category.attributes
113
+ register(category)
114
+
107
115
  category.categories.each do |sub_category|
108
- @title = sub_category.title
109
- @attributes = sub_category.attributes
110
- category.items.each do |item|
111
- @title = item.title
112
- @attributes = item.attributes
116
+ register(sub_category)
117
+
118
+ sub_category.items.each do |item|
119
+ register(item)
113
120
  end
114
121
  end
115
122
  end
116
123
  end
124
+
125
+ def register(store_node)
126
+ @title = store_node.title
127
+ @attributes = store_node.attributes
128
+
129
+ # No GC is explicitly done, because:
130
+ # - large inputs forces it anyway
131
+ # - it greatly slows tests
132
+ # - GCing should not change the complexity of the system
133
+ # GC.start
134
+ end
135
+
136
+ def mean(data)
137
+ data.reduce(:+)/data.size
138
+ end
117
139
  end
118
140
 
119
141
  end
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # walker_page_spec.rb
4
4
  #
5
- # Copyright (c) 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2011, 2012, 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -23,7 +23,6 @@ require 'spec_helper'
23
23
 
24
24
  module Storexplore
25
25
 
26
- # @integration
27
26
  describe WalkerPage, slow: true do
28
27
 
29
28
  before :each do
data/spec/spec_helper.rb CHANGED
@@ -2,7 +2,7 @@
2
2
  #
3
3
  # spec_helper.rb
4
4
  #
5
- # Copyright (c) 2013 by Philippe Bourgau. All rights reserved.
5
+ # Copyright (c) 2013, 2014 by Philippe Bourgau. All rights reserved.
6
6
  #
7
7
  # This library is free software; you can redistribute it and/or
8
8
  # modify it under the terms of the GNU Lesser General Public
@@ -19,10 +19,20 @@
19
19
  # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
20
  # MA 02110-1301 USA
21
21
 
22
+ require 'fakeweb'
22
23
  require 'spec_combos'
23
24
  require 'storexplore'
24
25
  require 'storexplore/testing'
25
26
 
27
+ # Dummy store generation
26
28
  Storexplore::Testing.config do |config|
27
29
  config.dummy_store_generation_dir= File.join(File.dirname(__FILE__), '../tmp')
28
30
  end
31
+
32
+ # Clean up fakeweb registry after every test
33
+ FakeWeb.allow_net_connect = false
34
+ RSpec.configure do |config|
35
+ config.after(:each) do
36
+ FakeWeb.clean_registry
37
+ end
38
+ end
data/storexplore.gemspec CHANGED
@@ -24,4 +24,5 @@ Gem::Specification.new do |spec|
24
24
  spec.add_development_dependency "rake", "~> 10.1"
25
25
  spec.add_development_dependency "guard-rspec", "~> 4.0"
26
26
  spec.add_development_dependency "spec_combos", "~> 0.2"
27
+ spec.add_development_dependency "fakeweb", "~> 1.3"
27
28
  end
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: storexplore
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.1
4
+ version: 0.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Philou
@@ -31,7 +31,7 @@ cert_chain:
31
31
  yLcl1cmm5ALtJ/+Bkkmp0i4amXeTDMvq9r8PBsVsQwxYOYJBP+Umxz3PX6HjFHrQ
32
32
  XdkXx3oZ
33
33
  -----END CERTIFICATE-----
34
- date: 2013-12-29 00:00:00.000000000 Z
34
+ date: 2014-01-14 00:00:00.000000000 Z
35
35
  dependencies:
36
36
  - !ruby/object:Gem::Dependency
37
37
  name: mechanize
@@ -103,6 +103,20 @@ dependencies:
103
103
  - - ~>
104
104
  - !ruby/object:Gem::Version
105
105
  version: '0.2'
106
+ - !ruby/object:Gem::Dependency
107
+ name: fakeweb
108
+ requirement: !ruby/object:Gem::Requirement
109
+ requirements:
110
+ - - ~>
111
+ - !ruby/object:Gem::Version
112
+ version: '1.3'
113
+ type: :development
114
+ prerelease: false
115
+ version_requirements: !ruby/object:Gem::Requirement
116
+ requirements:
117
+ - - ~>
118
+ - !ruby/object:Gem::Version
119
+ version: '1.3'
106
120
  description: A declarative scrapping DSL that lets one define directory like apis
107
121
  to an online store
108
122
  email:
@@ -113,6 +127,7 @@ extra_rdoc_files: []
113
127
  files:
114
128
  - .gitignore
115
129
  - .rspec
130
+ - .travis.yml
116
131
  - Gemfile
117
132
  - Guardfile
118
133
  - LICENSE
@@ -120,10 +135,10 @@ files:
120
135
  - Rakefile
121
136
  - lib/storexplore.rb
122
137
  - lib/storexplore/api.rb
123
- - lib/storexplore/api_builder.rb
124
138
  - lib/storexplore/array_utils.rb
125
139
  - lib/storexplore/browsing_error.rb
126
140
  - lib/storexplore/digger.rb
141
+ - lib/storexplore/dsl.rb
127
142
  - lib/storexplore/hash_utils.rb
128
143
  - lib/storexplore/null_digger.rb
129
144
  - lib/storexplore/testing.rb
@@ -141,14 +156,12 @@ files:
141
156
  - lib/storexplore/walker.rb
142
157
  - lib/storexplore/walker_page.rb
143
158
  - lib/storexplore/walker_page_error.rb
144
- - spec/lib/storexplore/api_builder_spec.rb
145
159
  - spec/lib/storexplore/api_spec.rb
146
- - spec/lib/storexplore/digger_spec.rb
160
+ - spec/lib/storexplore/dsl_spec.rb
147
161
  - spec/lib/storexplore/store_walker_page_spec_fixture.html
148
162
  - spec/lib/storexplore/testing/dummy_store_api_spec.rb
149
163
  - spec/lib/storexplore/uri_utils_spec.rb
150
164
  - spec/lib/storexplore/walker_page_spec.rb
151
- - spec/lib/storexplore/walker_spec.rb
152
165
  - spec/spec_helper.rb
153
166
  - storexplore.gemspec
154
167
  homepage: https://github.com/philou/storexplore
@@ -176,12 +189,10 @@ signing_key:
176
189
  specification_version: 4
177
190
  summary: Online store scraping library
178
191
  test_files:
179
- - spec/lib/storexplore/api_builder_spec.rb
180
192
  - spec/lib/storexplore/api_spec.rb
181
- - spec/lib/storexplore/digger_spec.rb
193
+ - spec/lib/storexplore/dsl_spec.rb
182
194
  - spec/lib/storexplore/store_walker_page_spec_fixture.html
183
195
  - spec/lib/storexplore/testing/dummy_store_api_spec.rb
184
196
  - spec/lib/storexplore/uri_utils_spec.rb
185
197
  - spec/lib/storexplore/walker_page_spec.rb
186
- - spec/lib/storexplore/walker_spec.rb
187
198
  - spec/spec_helper.rb
metadata.gz.sig CHANGED
Binary file
@@ -1,68 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- #
3
- # api_builder.rb
4
- #
5
- # Copyright (c) 2011, 2012, 2013 by Philippe Bourgau. All rights reserved.
6
- #
7
- # This library is free software; you can redistribute it and/or
8
- # modify it under the terms of the GNU Lesser General Public
9
- # License as published by the Free Software Foundation; either
10
- # version 3.0 of the License, or (at your option) any later version.
11
- #
12
- # This library is distributed in the hope that it will be useful,
13
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
- # Lesser General Public License for more details.
16
- #
17
- # You should have received a copy of the GNU Lesser General Public
18
- # License along with this library; if not, write to the Free Software
19
- # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
- # MA 02110-1301 USA
21
-
22
- module Storexplore
23
-
24
- class ApiBuilder
25
-
26
- def self.define(api_class, digger_class, &block)
27
- new(api_class, digger_class).tap do |result|
28
- result.instance_eval(&block)
29
- end
30
- end
31
-
32
- def initialize(api_class, digger_class)
33
- @api_class = api_class
34
- @digger_class = digger_class
35
- @scrap_attributes_block = lambda do {} end
36
- @categories_digger = NullDigger.new
37
- @items_digger = NullDigger.new
38
- end
39
-
40
- def attributes(&block)
41
- @scrap_attributes_block = block
42
- end
43
-
44
- def categories(selector, &block)
45
- @categories_digger = @digger_class.new(selector, ApiBuilder.define(@api_class, @digger_class, &block))
46
- end
47
-
48
- def items(selector, &block)
49
- @items_digger = @digger_class.new(selector, ApiBuilder.define(@api_class, @digger_class, &block))
50
- end
51
-
52
- def new(page_getter, father = nil, index = nil)
53
- @api_class.new(page_getter).tap do |result|
54
- result.categories_digger = @categories_digger
55
- result.items_digger = @items_digger
56
- result.scrap_attributes_block = @scrap_attributes_block
57
- result.father = father
58
- result.index = index
59
- end
60
- end
61
- end
62
-
63
- def self.define_api(name, &block)
64
- builder = ApiBuilder.define(Walker, Digger, &block)
65
-
66
- Api.register_builder(name, builder)
67
- end
68
- end
@@ -1,99 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- #
3
- # api_builder_spec.rb
4
- #
5
- # Copyright (c) 2012, 2013 by Philippe Bourgau. All rights reserved.
6
- #
7
- # This library is free software; you can redistribute it and/or
8
- # modify it under the terms of the GNU Lesser General Public
9
- # License as published by the Free Software Foundation; either
10
- # version 3.0 of the License, or (at your option) any later version.
11
- #
12
- # This library is distributed in the hope that it will be useful,
13
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
- # Lesser General Public License for more details.
16
- #
17
- # You should have received a copy of the GNU Lesser General Public
18
- # License along with this library; if not, write to the Free Software
19
- # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
- # MA 02110-1301 USA
21
-
22
- require "spec_helper"
23
-
24
- module Storexplore
25
-
26
- describe ApiBuilder do
27
-
28
- before :each do
29
- @url = "http://www.mega-store.com"
30
- @api = double("Store api").as_null_object
31
- @api_class = double("Store api class")
32
- @api_class.stub(:new).with(@url).and_return(@api)
33
-
34
- @selector = "a.child"
35
- @digger = double("Digger")
36
- @digger_class = double("Digger class")
37
- end
38
-
39
- context "using define method" do
40
- it "creates new store api" do
41
- @builder = ApiBuilder.define(@api_class, Digger) { }
42
-
43
- expect(@builder.new(@url)).to eq @api
44
- end
45
-
46
- it "initializes nested definition through its block" do
47
- ApiBuilder.stub(:new).and_return(builder = double(ApiBuilder))
48
-
49
- expect(builder).to receive(:complex_builder_initialization)
50
-
51
- ApiBuilder.define(@api_class, Digger) do
52
- complex_builder_initialization
53
- end
54
- end
55
- end
56
-
57
- context "when nesting definitions" do
58
-
59
- before :each do
60
- @builder = ApiBuilder.new(@api_class, @digger_class)
61
- end
62
-
63
- after :each do
64
- @builder.new(@url)
65
- end
66
-
67
- [:categories, :items].each do |sub_definition|
68
-
69
- before :each do
70
- ApiBuilder.stub(:new).and_return(@sub_builder = double(ApiBuilder))
71
- @digger_class.stub(:new).with(@selector, @sub_builder).and_return(@digger)
72
- end
73
-
74
- it "tells the store api how to find sub #{sub_definition}" do
75
- expect(@api).to receive("#{sub_definition}_digger=").with(@digger)
76
-
77
- @builder.send(sub_definition, @selector) do end
78
- end
79
-
80
- it "initialises the sub #{sub_definition} builder" do
81
- expect(@sub_builder).to receive(:sub_builder_initialization)
82
-
83
- @builder.send(sub_definition, @selector) do
84
- sub_builder_initialization
85
- end
86
- end
87
- end
88
-
89
- it "tells the store api how to parse attributes" do
90
- scrap_attributes_block = Proc.new { |page| @scrap_attributes_block_is_unique = true }
91
-
92
- expect(@api).to receive(:scrap_attributes_block=).with(scrap_attributes_block)
93
-
94
- @builder.attributes(&scrap_attributes_block)
95
- end
96
-
97
- end
98
- end
99
- end
@@ -1,53 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- #
3
- # digger_spec.rb
4
- #
5
- # Copyright (c) 2012, 2013 by Philippe Bourgau. All rights reserved.
6
- #
7
- # This library is free software; you can redistribute it and/or
8
- # modify it under the terms of the GNU Lesser General Public
9
- # License as published by the Free Software Foundation; either
10
- # version 3.0 of the License, or (at your option) any later version.
11
- #
12
- # This library is distributed in the hope that it will be useful,
13
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
- # Lesser General Public License for more details.
16
- #
17
- # You should have received a copy of the GNU Lesser General Public
18
- # License along with this library; if not, write to the Free Software
19
- # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
- # MA 02110-1301 USA
21
-
22
- require "spec_helper"
23
-
24
- module Storexplore
25
-
26
- describe Digger do
27
-
28
- before :each do
29
- @digger = Digger.new(@selector = "a.items", @factory = double("Sub walker factory"))
30
- @page = double(WalkerPage)
31
- @page.stub(:search_links).with(@selector).and_return(@links = [double("Link"),double("Link")])
32
- end
33
-
34
- it "creates sub walkers for each link it finds" do
35
- @links.each do |link|
36
- expect(@factory).to receive(:new).with(link, anything, anything)
37
- end
38
-
39
- @digger.sub_walkers(@page, nil).to_a
40
- end
41
-
42
- it "for debugging purpose, provides father walker and link index to sub walkers" do
43
- father = double("Father walker")
44
-
45
- @links.each_with_index do |link, index|
46
- expect(@factory).to receive(:new).with(link, father, index)
47
- end
48
-
49
- @digger.sub_walkers(@page, father).to_a
50
- end
51
-
52
- end
53
- end
@@ -1,97 +0,0 @@
1
- # -*- encoding: utf-8 -*-
2
- #
3
- # walker_spec.rb
4
- #
5
- # Copyright (C) 2012, 2013 by Philippe Bourgau. All rights reserved.
6
- #
7
- # This library is free software; you can redistribute it and/or
8
- # modify it under the terms of the GNU Lesser General Public
9
- # License as published by the Free Software Foundation; either
10
- # version 3.0 of the License, or (at your option) any later version.
11
- #
12
- # This library is distributed in the hope that it will be useful,
13
- # but WITHOUT ANY WARRANTY; without even the implied warranty of
14
- # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
15
- # Lesser General Public License for more details.
16
- #
17
- # You should have received a copy of the GNU Lesser General Public
18
- # License along with this library; if not, write to the Free Software
19
- # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
20
- # MA 02110-1301 USA
21
-
22
- require "spec_helper"
23
-
24
- module Storexplore
25
-
26
- describe Walker do
27
-
28
- before :each do
29
- @page = double("Page", :uri => "http://www.maxi-discount.com")
30
- @page_getter = double("Getter", :get => @page, :text => "Conserves")
31
- @walker = Walker.new(@page_getter)
32
-
33
- @sub_walkers = [double("Sub walker")]
34
- @digger = double(Digger)
35
- @digger.stub(:sub_walkers).with(@page, @walker).and_return(@sub_walkers)
36
- end
37
-
38
- it "has the uri of its page" do
39
- expect(@walker.uri).to eq @page.uri
40
- end
41
-
42
- it "it uses the text of its origin (ex: link) as title" do
43
- expect(@walker.title).to eq @page_getter.text
44
- end
45
-
46
- context "by default" do
47
- it "has no items" do
48
- expect(@walker.items).to be_empty
49
- end
50
- it "has no sub categories" do
51
- expect(@walker.categories).to be_empty
52
- end
53
- it "has no attributes" do
54
- expect(@walker.attributes).to be_empty
55
- end
56
- end
57
-
58
- it "uses its items digger to collect its items" do
59
- @walker.items_digger = @digger
60
-
61
- expect(@walker.items).to eq @sub_walkers
62
- end
63
- it "uses its categories digger to collect its sub categories" do
64
- @walker.categories_digger = @digger
65
-
66
- expect(@walker.categories).to eq @sub_walkers
67
- end
68
- it "uses its scrap attributes block to collect its attributes" do
69
- attributes = { :name => "Candy" }
70
- @walker.scrap_attributes_block = lambda { |page| attributes }
71
-
72
- expect(@walker.attributes).to eq attributes
73
- end
74
-
75
- context "when troubleshooting" do
76
-
77
- it "has a meaningfull string representation" do
78
- walker = Walker.new(@page_getter)
79
- walker.index= 23
80
- expect(walker.to_s).to include(Walker.to_s)
81
- expect(walker.to_s).to include("##{walker.index}")
82
- expect(walker.to_s).to include("@#{walker.uri}")
83
- end
84
- it "has a full genealogy" do
85
- link = double("Link")
86
- link.stub_chain(:get, :uri).and_return(@page.uri + "/viandes")
87
- child_walker = Walker.new(link)
88
- child_walker.index = 12
89
- child_walker.father = @walker
90
-
91
- genealogy = child_walker.genealogy.split("\n")
92
-
93
- expect(genealogy).to eq [@walker.to_s, child_walker.to_s]
94
- end
95
- end
96
- end
97
- end