proto 0.0.5 → 0.0.6

data/README.md CHANGED
@@ -6,9 +6,9 @@ It is the evolution of [another project](https://github.com/kcurtin/scrape_sourc

  Proto is meant to be lightweight and flexible, the objects you get back inherit from OpenStruct. New methods can be dynamically added to the objects, you won't ever get method_missing errors, and you can access the data in a bunch of different ways. Check out the documentation for more info: [OpenStruct](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html)

- ## Usage
+ ## Usage

- Create a new Scraper object with the URL you want to scrape data from
+ ####Scraping a single page

  ```ruby
  proto = Proto::Scraper.new('http://twitter.com/kcurtin')
@@ -20,7 +20,7 @@ proto.inspect
  #=> #<Proto::Scraper:0x007fc6fb852860 @doc=#<Nokogiri::HTML::Document:0x3fe37d0b1634...>
  ```

- Currently, the API is strict. There is a single public method you can call. This method accepts a constant name and a hash as arguments:
+ ```.fetch``` method accepts a constant name and a hash as arguments:
  ```ruby
  tweets = proto.fetch('Tweet', {:name => 'strong.fullname',
  :content => 'p.js-tweet-text',
@@ -36,6 +36,21 @@ tweets.inspect
  #=> [#<Proto::Tweet name="Kevin Curtin", content="@cawebs06 just a tad over my head... You guys are smart :)", created_at="11h">,
  #<Proto::Tweet name="Kevin Curtin", content="@garybernhardt awesome, thanks. any plans to be in nyc soon? @FlatironSchool would love to have you stop by. we love DAS", created_at="12h">...]
  ```
+ ####Scraping multiple pages using an index page
+
+ ```ruby
+ #index page url
+ obj = Proto::Scraper.new('http://jobs.rubynow.com/')
+ #selector for the a tags with the links you want to visit
+ obj.collect_urls('ul.jobs li h2 a:first')
+ #attributes and selectors you want
+ jobs = obj.fetch( { :title => 'h2#headline',
+ :company => 'h2#headline a',
+ :location => 'h3#location',
+ :type => 'strong:last',
+ :description => 'div#info' }
+ )
+ ```

  OpenStruct features:

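The README above says the objects returned by `fetch` inherit from OpenStruct and are named after the constant you pass in (e.g. `Proto::Tweet`). The sketch below illustrates those OpenStruct behaviors using only the standard library. It assumes the attribute values have already been scraped into hashes; `Demo` and `build_records` are hypothetical names, and this diff does not show how Proto itself defines its classes.

```ruby
require 'ostruct'

# Hypothetical namespace standing in for Proto; this is not the gem's code.
module Demo
  # Turn a constant name plus already-scraped attribute hashes into
  # OpenStruct-derived objects, defining the class on first use.
  def self.build_records(const_name, rows)
    klass = if const_defined?(const_name, false)
              const_get(const_name, false)
            else
              const_set(const_name, Class.new(OpenStruct))
            end
    rows.map { |attrs| klass.new(attrs) }
  end
end

tweets = Demo.build_records('Tweet', [
  { name: 'Kevin Curtin', content: 'hello', created_at: '11h' }
])
tweet = tweets.first

tweet.class.name    #=> "Demo::Tweet"
tweet.name          #=> "Kevin Curtin"  (dot access)
tweet[:content]     #=> "hello"         (hash-style access)
tweet.retweets      #=> nil             (unknown attribute, no NoMethodError)
tweet.retweets = 3  # new attributes can be added at runtime
tweet.to_h          #=> { name: 'Kevin Curtin', content: 'hello', created_at: '11h', retweets: 3 }
```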
@@ -7,9 +7,9 @@ module Proto
  @doc = Nokogiri::HTML(open(url))
  end

- def collect_urls(selector)
+ def collect_urls(base_url=self.url, selector)
  @url_collection = doc.css(selector).map do |link|
- "#{url}#{link['href']}"
+ "#{base_url}#{link['href']}"
  end
  end

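In 0.0.6 `collect_urls` gains an optional leading `base_url` that defaults to the scraper's own `url`. Ruby allows optional parameters before a required one, so a single argument binds to the selector and two arguments bind in order. The sketch below reproduces just that signature and the string interpolation, with Nokogiri swapped out for a plain array of href strings. `LinkCollector` and its `hrefs` argument are hypothetical stand-ins, and the sketch assumes `url` is a public reader (the diff calls `self.url` but does not show how it is defined).

```ruby
# Hypothetical stand-in for Proto::Scraper#collect_urls: takes href strings
# directly instead of running a CSS selector against a Nokogiri document.
class LinkCollector
  attr_reader :url, :url_collection

  def initialize(url)
    @url = url
  end

  # Optional base_url before the required argument, as in the 0.0.6 diff.
  # With one argument, it binds to hrefs and base_url falls back to url.
  def collect_urls(base_url = self.url, hrefs)
    @url_collection = hrefs.map { |href| "#{base_url}#{href}" }
  end
end

index = LinkCollector.new('http://jobs.rubynow.com')
index.collect_urls(['/jobs/1', '/jobs/2'])
#=> ["http://jobs.rubynow.com/jobs/1", "http://jobs.rubynow.com/jobs/2"]
index.collect_urls('http://mirror.example.com', ['/jobs/1'])
#=> ["http://mirror.example.com/jobs/1"]
```

Because the URL is built by plain interpolation, slashes are not normalized: a base ending in `/` (like the README's `http://jobs.rubynow.com/`) combined with an href that begins with `/` produces a double slash. If that matters, the standard library's `URI.join` resolves relative hrefs instead; `URI.join('http://jobs.rubynow.com/', '/jobs/1').to_s` returns `"http://jobs.rubynow.com/jobs/1"`.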
@@ -27,7 +27,7 @@ module Proto
  private

  def scrape_multiple_pages(attributes)
- url_collection.each_with_object([]).map do |url, hash_array|
+ url_collection.map do |url|
  gather_data(url, attributes)
  end
  end
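The `scrape_multiple_pages` change reads as a simplification rather than a behavior change: `each_with_object([])` without a block returns an Enumerator that yields each URL together with a memo array the block never uses, and `map` on that Enumerator still collects the block's return values. A quick check of that equivalence, with a hypothetical `gather_data` stub in place of the gem's real per-page scrape:

```ruby
# Hypothetical stub for the gem's private gather_data(url, attributes);
# it just echoes its inputs so the two forms can be compared.
def gather_data(url, attributes)
  { url: url, fields: attributes.keys }
end

url_collection = ['http://example.com/1', 'http://example.com/2']
attributes = { title: 'h2#headline', location: 'h3#location' }

# 0.0.5 form: the Enumerator yields (url, memo) pairs; memo is never used.
old_result = url_collection.each_with_object([]).map do |url, hash_array|
  gather_data(url, attributes)
end

# 0.0.6 form: a plain map over the collected URLs.
new_result = url_collection.map do |url|
  gather_data(url, attributes)
end

old_result == new_result #=> true
```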
@@ -1,3 +1,3 @@
  module Proto
- VERSION = "0.0.5"
+ VERSION = "0.0.6"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: proto
  version: !ruby/object:Gem::Version
- version: 0.0.5
+ version: 0.0.6
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-11-29 00:00:00.000000000 Z
+ date: 2012-12-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec