proto 0.0.5 → 0.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -6,9 +6,9 @@ It is the evolution of [another project](https://github.com/kcurtin/scrape_sourc
 
  Proto is meant to be lightweight and flexible; the objects you get back inherit from OpenStruct. New methods can be dynamically added to the objects, you won't ever get method_missing errors, and you can access the data in a bunch of different ways. Check out the documentation for more info: [OpenStruct](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html)
 
 - ## Usage
 + ## Usage
 
 - Create a new Scraper object with the URL you want to scrape data from
 + #### Scraping a single page
 
  ```ruby
  proto = Proto::Scraper.new('http://twitter.com/kcurtin')
@@ -20,7 +20,7 @@ proto.inspect
  #=> #<Proto::Scraper:0x007fc6fb852860 @doc=#<Nokogiri::HTML::Document:0x3fe37d0b1634...>
  ```
 
 - Currently, the API is strict. There is a single public method you can call. This method accepts a constant name and a hash as arguments:
 + The `.fetch` method accepts a constant name and a hash as arguments:
  ```ruby
  tweets = proto.fetch('Tweet', {:name => 'strong.fullname',
  :content => 'p.js-tweet-text',
@@ -36,6 +36,21 @@ tweets.inspect
  #=> [#<Proto::Tweet name="Kevin Curtin", content="@cawebs06 just a tad over my head... You guys are smart :)", created_at="11h">,
  #<Proto::Tweet name="Kevin Curtin", content="@garybernhardt awesome, thanks. any plans to be in nyc soon? @FlatironSchool would love to have you stop by. we love DAS", created_at="12h">...]
  ```
 + #### Scraping multiple pages using an index page
 +
 + ```ruby
 + # index page URL
 + obj = Proto::Scraper.new('http://jobs.rubynow.com/')
 + # selector for the a tags with the links you want to visit
 + obj.collect_urls('ul.jobs li h2 a:first')
 + # attributes and the selectors you want
 + jobs = obj.fetch( { :title => 'h2#headline',
 + :company => 'h2#headline a',
 + :location => 'h3#location',
 + :type => 'strong:last',
 + :description => 'div#info' }
 + )
 + ```
 
  OpenStruct features:
 
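The OpenStruct behavior the README leans on can be sketched in plain Ruby; the attribute names below are hypothetical, not taken from the gem:

```ruby
require 'ostruct'

# Illustrative sketch of the OpenStruct features described above.
tweet = OpenStruct.new(:name => 'Kevin Curtin', :content => 'hello')
tweet.name           # attributes readable as methods
tweet[:content]      # or via hash-style lookup (Ruby 2.0+)
tweet.retweets = 3   # new attributes can be added dynamically
tweet.missing        # unknown attributes return nil, not a NoMethodError
```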
@@ -7,9 +7,9 @@ module Proto
  @doc = Nokogiri::HTML(open(url))
  end
 
 - def collect_urls(selector)
 + def collect_urls(base_url=self.url, selector)
  @url_collection = doc.css(selector).map do |link|
 - "#{url}#{link['href']}"
 + "#{base_url}#{link['href']}"
  end
  end
 
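The new `collect_urls(base_url=self.url, selector)` signature uses Ruby's optional-leading-parameter rule: the optional argument is only filled when two arguments are supplied, so a single argument still binds to `selector`. A standalone sketch with hypothetical values:

```ruby
# Sketch of the optional-leading-argument pattern (not the gem's actual code).
def collect_urls(base_url = 'http://jobs.rubynow.com', selector)
  "#{base_url} #{selector}"
end

collect_urls('ul.jobs a')                        # one arg: selector only, default base_url
collect_urls('http://example.com/', 'ul.jobs a') # two args: explicit base_url
```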
@@ -27,7 +27,7 @@ module Proto
  private
 
  def scrape_multiple_pages(attributes)
 - url_collection.each_with_object([]).map do |url, hash_array|
 + url_collection.map do |url|
  gather_data(url, attributes)
  end
  end
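The `scrape_multiple_pages` refactor is behavior-preserving: `each_with_object([])` yields `[element, memo]` pairs, and since the block never used the memo, mapping over it is equivalent to a plain `map`. A minimal illustration with made-up URLs:

```ruby
# each_with_object([]).map ignored its accumulator, so plain map is the same.
urls = ['http://a.example', 'http://b.example']
old_style = urls.each_with_object([]).map { |url, _memo| url.length }
new_style = urls.map { |url| url.length }
old_style == new_style  # identical results
```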
@@ -1,3 +1,3 @@
  module Proto
 - VERSION = "0.0.5"
 + VERSION = "0.0.6"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: proto
  version: !ruby/object:Gem::Version
 - version: 0.0.5
 + version: 0.0.6
  prerelease:
  platform: ruby
  authors:
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
 - date: 2012-11-29 00:00:00.000000000 Z
 + date: 2012-12-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rspec