RubyGems - proto - Versions diffs - 0.0.5 → 0.0.6 - Mend

proto 0.0.5 → 0.0.6

Files changed (4)

data/README.md CHANGED

@@ -6,9 +6,9 @@ It is the evolution of [another project](https://github.com/kcurtin/scrape_sourc
 Proto is meant to be lightweight and flexible, the objects you get back inherit from OpenStruct.  New methods can be dynamically added to the objects, you won't ever get method_missing errors, and you can access the data in a bunch of different ways. Check out the documentation for more info: [OpenStruct](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html)
-## Usage
+## Usage
-Create a new Scraper object with the URL you want to scrape data from
+####Scraping a single page
 ```ruby
 proto = Proto::Scraper.new('http://twitter.com/kcurtin')
@@ -20,7 +20,7 @@ proto.inspect
 #=> #<Proto::Scraper:0x007fc6fb852860 @doc=#<Nokogiri::HTML::Document:0x3fe37d0b1634...>
 ```
-Currently, the API is strict. There is a single public method you can call. This method accepts a constant name and a hash as arguments:
+```.fetch``` method accepts a constant name and a hash as arguments:
 ```ruby
 tweets = proto.fetch('Tweet', {:name => 'strong.fullname',
                                :content => 'p.js-tweet-text',
@@ -36,6 +36,21 @@ tweets.inspect
 #=> [#<Proto::Tweet name="Kevin Curtin", content="@cawebs06 just a tad over my head... You guys are smart :)", created_at="11h">,
      #<Proto::Tweet name="Kevin Curtin", content="@garybernhardt awesome, thanks. any plans to be in nyc soon? @FlatironSchool would love to have you stop by. we love DAS", created_at="12h">...]
 ```
+####Scraping multiple pages using an index page
+```ruby
+#index page url
+obj = Proto::Scraper.new('http://jobs.rubynow.com/')
+#selector for the a tags with the links you want to visit
+obj.collect_urls('ul.jobs li h2 a:first')
+#attributes and selectors you want
+jobs = obj.fetch( { :title => 'h2#headline',
+                    :company => 'h2#headline a',
+                    :location => 'h3#location',
+                    :type => 'strong:last',
+                    :description => 'div#info' }
+                )
+```
 OpenStruct features:

data/lib/proto/scraper.rb CHANGED

@@ -7,9 +7,9 @@ module Proto
       @doc = Nokogiri::HTML(open(url))
     end
-    def collect_urls(selector)
+    def collect_urls(base_url=self.url, selector)
       @url_collection = doc.css(selector).map do |link|
-        "#{url}#{link['href']}"
+        "#{base_url}#{link['href']}"
       end
     end
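
Note on the hunk above: the new signature puts an optional parameter before a required one, which Ruby 1.9 and later accept. With one argument, the value binds to `selector` and `base_url` falls back to `self.url`; with two, the first is `base_url`. The sketch below illustrates only that binding. `SketchScraper`, its `hrefs` argument, and the example URLs are hypothetical stand-ins and not part of the gem; the real method reads hrefs from `doc.css(selector)`.

```ruby
# Hypothetical stand-in for Proto::Scraper: no Nokogiri, no network access.
class SketchScraper
  attr_reader :url

  def initialize(url, hrefs)
    @url = url
    @hrefs = hrefs # stands in for the href values doc.css(selector) would yield
  end

  # Same parameter shape as the 0.0.6 signature: optional first, required last.
  def collect_urls(base_url = self.url, selector)
    @hrefs.map { |href| "#{base_url}#{href}" }
  end
end

scraper = SketchScraper.new('http://example.com', ['/jobs/1', '/jobs/2'])

# One argument: it is taken as the selector, base_url defaults to scraper.url.
one_arg = scraper.collect_urls('ul.jobs li h2 a')

# Two arguments: the first overrides the base URL.
two_args = scraper.collect_urls('http://other.example', 'ul.jobs li h2 a')
```

Existing one-argument calls such as the README's `obj.collect_urls('ul.jobs li h2 a:first')` should therefore keep working, assuming `Proto::Scraper` exposes a `url` reader as the default expression implies.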
@@ -27,7 +27,7 @@ module Proto
   private
     def scrape_multiple_pages(attributes)
-      url_collection.each_with_object([]).map do |url, hash_array|
+      url_collection.map do |url|
          gather_data(url, attributes)
       end
     end
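
Note on the hunk above: in 0.0.5 the block never used `hash_array`, so the `each_with_object([])` step only passed an unused accumulator into `map`. As far as the diff shows, the change is a simplification and not a behavior change. A minimal sketch of the equivalence follows, where `gather` is a hypothetical stand-in for the private `gather_data(url, attributes)` helper and the URLs are made up:

```ruby
urls = ['http://example.com/a', 'http://example.com/b']

# Hypothetical stand-in for gather_data; returns a small hash per URL.
gather = ->(url) { { :source => url } }

# 0.0.5 shape: map over an each_with_object enumerator, ignoring the accumulator.
old_style = urls.each_with_object([]).map { |url, _unused| gather.call(url) }

# 0.0.6 shape: plain map.
new_style = urls.map { |url| gather.call(url) }
```

Both expressions build one result per URL, in order, from the block's return value.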

data/lib/proto/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Proto
-  VERSION = "0.0.5"
+  VERSION = "0.0.6"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: proto
 version: !ruby/object:Gem::Version
-  version: 0.0.5
+  version: 0.0.6
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-11-29 00:00:00.000000000 Z
+date: 2012-12-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec