amazon-search 1.4.2 → 1.4.4 (RubyGems)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

checksums.yaml +4 -4
data/{Readme.rdoc → README.md} +13 -10
data/amazon-search.gemspec +2 -2
data/lib/amazon-search.rb +157 -16
metadata +4 -15
data/lib/amazon-search/form.rb +0 -26
data/lib/amazon-search/products.rb +0 -87
data/lib/amazon-search/scan.rb +0 -55
data/test/lib/amazon-search.rb +0 -24
data/test/lib/amazon-search/form.rb +0 -26
data/test/lib/amazon-search/products.rb +0 -87
data/test/lib/amazon-search/scan.rb +0 -57

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 906821671967d660123351f7ab6b0fb00a33fdf6
-  data.tar.gz: 0c737cf462e32ea69d9bfbd4b5f914db3b7b6487
+  metadata.gz: 9e25e25215f49b4726b1e06db9901e738f866993
+  data.tar.gz: fffdba88068487fb177d04b1a3fa66a67a435ea1
 SHA512:
-  metadata.gz: 03955719426cdda4f0cbf0a406dd6933ccdd4caba6d724c6c42182f531841dbfbfc96a71a252f6b27cc1db6e8b0f792d6dfafa3e822d101c0e90e02807d6fc7b
-  data.tar.gz: acb67b2d654f4f4b8e4fbd67e2ce75ad9593d2afc07f309e045fbf83d2780efb62ddd1816fab842dafcfa25305e5274c0c3daa0979ad744b81798555e240ab0a
+  metadata.gz: 61d0e4ce208691c9cc62929bd3bd91dc62ca74bee9b0523a580d80b8ad874bbebb4cf9ec11de7407d6929fbc8173f0943dcfe914dfd295c783a20f30915d2374
+  data.tar.gz: 9bc77c89ddc0e9843e128649925538c069acf75e3f951a7628c1be12ffe5e6401d7a24a079b8c44413bcc608e6e8cefeed58210525e55e434e4da448c1e7b94f
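
The hunk above records SHA1 and SHA512 digests for `metadata.gz` and `data.tar.gz`. As a rough sketch of how such values can be recomputed with Ruby's standard Digest library (the helper name `archive_digests` is hypothetical and is not part of the gem):

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: computes the two digest kinds that checksums.yaml
# lists for one archive member, given that member's raw bytes.
def archive_digests(bytes)
  {
    'SHA1'   => Digest::SHA1.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Placeholder input; a real check would pass the bytes of metadata.gz
# or data.tar.gz taken from the unpacked .gem archive.
digests = archive_digests('example bytes')
puts "SHA1:   #{digests['SHA1']}"
puts "SHA512: #{digests['SHA512']}"
```

To compare against the values in the diff, the bytes would come from something like `File.binread` on the extracted archive member; the result should equal the corresponding `+` line if the package is unmodified.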

data/{Readme.rdoc → README.md} RENAMED

@@ -1,12 +1,12 @@
-== amazon-search
+# amazon-search
 Amazon Search is a simple Ruby tool to search for Amazon products.
 This tool screenscrapes an Amazon search and returns a hash of the product results. Configuration of Amazon's API is not needed.
-The functionality is centered around mechanize pagination for the screen scraping of nokogiri elements.  XML and CSS selectors are currently being used.  In the event that Amazon updates their site, the selectors will need to be updated.
+The functionality is centered around mechanize pagination for the screen scraping of nokogiri elements.  XPath and CSS selectors are currently being used.  In the event that Amazon updates their site, the selectors will need to be updated.
-== DATA COLLECTED
+## DATA COLLECTED
 * title
 * price
 * stars
@@ -16,13 +16,15 @@ The functionality is centered around mechanize pagination for the screen scrapin
 * seller
-== INSTALLATION
+## INSTALLATION
+```
   $ gem install amazon-search
+```
+## EXAMPLE
-== EXAMPLE
+```ruby
     require 'amazon-search'
     # search for products by string
@@ -37,12 +39,12 @@ The functionality is centered around mechanize pagination for the screen scrapin
     # reference any product by the order it appeared in search results
     $products[0] # => references the first product found in search
     $products[30] # => references the 29th product found in search
-    # reference any product by the order it appeared in search results
-    # and display attributes of that product
+    # display attributes of specific product
     # all available attributes are:
     $products[0][:title] # => the first product's title
@@ -71,10 +73,11 @@ The functionality is centered around mechanize pagination for the screen scrapin
     	puts product[:title]
     	puts product[:stars]
     	# etc ...
+    end
+```
-== MIT LICENSE
+## MIT LICENSE
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
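
The README hunks above describe the search result as a hash of products referenced by the order in which they appeared. The sketch below builds made-up sample data in that shape (integer keys starting at 0, each mapping to a hash with the symbol keys stored by `lib/amazon-search.rb`) to show how the indexing works. The values are placeholders, not real search output.

```ruby
# Made-up sample data shaped like the hash the gem builds:
# integer keys in result order, each value a hash of symbol keys.
products = {}
31.times do |i|
  products[i] = {
    :title      => "Sample product #{i + 1}",
    :price      => '$0.00',
    :stars      => 'n/a',
    :reviews    => 'Unknown',
    :image_href => 'placeholder',
    :url        => 'placeholder',
    :seller     => 'Unknown'
  }
end

# Keys are zero-based: 0 is the first result, 30 is the thirty-first.
puts products[0][:title]
puts products[30][:title]

# Iterating over the values visits products in insertion order.
products.each_value do |product|
  puts product[:title] if product[:title].end_with?(' 1')
end
```

Because the keys are zero-based, index `n` refers to result number `n + 1`.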

data/amazon-search.gemspec CHANGED

@@ -2,7 +2,7 @@
 Gem::Specification.new do |gem|
   gem.name        = %q{amazon-search}
-  gem.version     = '1.4.2'
+  gem.version     = '1.4.4'
   gem.date        = '2015-09-19'
   gem.platform = Gem::Platform::RUBY
   gem.required_ruby_version = '>= 1.8'
@@ -14,7 +14,7 @@ Gem::Specification.new do |gem|
   gem.description = "Simple screenscraper to search Amazon and return product titles, urls, image href, etc."
   gem.authors     = ["John Mason"]
   gem.email       = 'mace2345@gmail.com'
-  gem.homepage    = 'https://github.com/m8ss/amazon-search'
+  gem.homepage    = 'https://github.com/m8ss/amazon-search'
   gem.license       = 'MIT'
   gem.add_runtime_dependency('mechanize',    '~> 2.7')

data/lib/amazon-search.rb CHANGED

@@ -1,25 +1,166 @@
-#!/usr/bin/env ruby
 require 'mechanize'
-require 'amazon-search/form'
-require 'amazon-search/scan'
-require 'amazon-search/products'
+# actions of Amazon search
 module Amazon
-    class << self
-        # main method: process Amazon search
-        def search(keywords)
-            $keywords = keywords
+  class << self
+    attr_accessor :products, :title, :price, :stars, :reviews, :seller,
+                  :image_url, :product_url, :product_num
-            set_agent
-            find_form
-            submit_form
-            scan
+    # main method: process Amazon search
+    def search(keywords)
+      @keywords = keywords
+      set_initial_values
+      set_agent
+      find_form
+      submit_form
+      scan
+      $products
+    end
-            $products
-        end
+    def set_initial_values
+      $products = {}
+      @product_num = 0
     end
-end
+    # prepares Mechanize
+    def set_agent
+      @agent = Mechanize.new { |a| a.user_agent_alias = 'Mac Safari' }
+    end
+    # finds Amazon search box
+    def find_form
+      @main_page = @agent.get('http://amazon.com')
+      @search_form = @main_page.form_with :name => 'site-search'
+    end
+    # submits Amazon search box
+    def submit_form
+      @search_form.field_with(:name => 'field-keywords').value = @keywords
+      @current_page = @agent.submit @search_form # submits form
+    end
+    # examine current_pagenum
+    def examine_current_pagenum
+      @current_pagenum =
+        @current_page.search '//*[contains(concat( " ", @class, " " ),
+          concat( " ", "pagnCur", " " ))]'
+      @current_pagenum = @current_pagenum.text.to_i # need integer for checks
+    end
+    # find last page number
+    def find_last_pagenum
+      @last_pagenum =
+       @current_page.search '//*[contains(concat( " ", @class, " " ),
+         concat( " ", "pagnDisabled", " " ))]'
+      @last_pagenum = @last_pagenum.text.to_i # need integer for checks
+    end
+    # load next page
+    def load_next_page
+      examine_current_pagenum # does this need to be here?
+      # find next page link
+      @next_page_link = @current_page.link_with :text => /Next Page/
+      @next_page = @next_page_link.click unless @current_pagenum == @last_pagenum
+      @current_page = @agent.get(@next_page.uri)
+    end
+    # cycle through search result pages and store product html
+    def scan
+      @pages = {}
+      find_last_pagenum
+      @last_pagenum.times do # paginate until on last page.
+        examine_current_pagenum
+        @current_divs = @current_page.search('//li[starts-with(@id, "result")]')
+        @pages[@page_num] = @current_divs # store page results
+        extract_product_data
+        load_next_page
+      end
+      puts "\n(scan complete.)"
+    end
+    # used for checking strings
+    def numeric?(s)
+      !!Float(s) rescue false
+    end
+    # puts product details to console
+    def display_product
+      STDOUT.puts '--' * 50
+      STDOUT.puts "title: \t\t#{@title}"
+      STDOUT.puts "seller: \t#{@seller}"
+      STDOUT.puts "price: \t\t#{@price}"
+      STDOUT.puts "stars: \t\t#{@stars}"
+      STDOUT.puts "reviews: \t#{@reviews}"
+      STDOUT.puts "image url: \t#{@image_href}"
+      STDOUT.puts "product url: \t#{@url}"
+    end
+    # extract product data
+    def extract_product_data
+      # TODO: fix this global variable...
+      # nokogiri syntax is needed when iterating...not mechanize!
+      # extract useful stuff from product html
+      @current_divs.each do |html|
+        # first select raw html
+        title = html.at_css('.s-access-title')
+        seller = html.at_css('.a-row > .a-spacing-none')
+        price = html.at_css('.s-price')
+        stars = html.at_css('.a-icon-star')
+        reviews = html.at_css('span+ .a-text-normal')
+        image_href = html.at_css('.s-access-image')
+        url = html.at_css('.a-row > a')
+        break if title.nil? == true # if it's nil it's prob an ad
+        break if price.nil? == true # no price? prob not worthy item
+        break if stars.nil? == true # no stars? not worth it
+        # extract text and set variables for puts
+        @title = title.text
+        @price = price.text
+        @stars = stars.text
+        @image_href = image_href['src']
+        @url = url['href']
+        # movies sometimes have text in review class
+        if numeric?(reviews.text)
+          @reviews = reviews.text
+        else
+          @reviews = 'Unknown'
+        end
+        if seller.nil? == true # sometimes seller is nil on movies, etc.
+          @seller = 'Unknown'
+        else
+          @seller = seller.text
+        end
+        # don't overload the server
+        sleep(0.05)
+        display_product
+        # store extracted text in products hash
+        # key is product count
+        $products[@product_num] = {
+          :title => @title,
+          :price => @price,
+          :stars => @stars,
+          :reviews => @reviews,
+          :image_href => @image_href,
+          :url => @url,
+          :seller => @seller
+        }
+        @product_num += 1 # ready for next product
+      end
+    end
+  end
+end
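
In `extract_product_data` above, `break` leaves the `each` loop at the first result that lacks a title, price, or stars, so any later results on that page are not stored. The source does not say whether that is intended. The sketch below uses stand-in records (plain hashes, not Nokogiri nodes) to contrast `break` with `next`, which would skip only the incomplete result.

```ruby
# Stand-in records; a nil title plays the role of a result (such as an ad)
# for which no title node was found.
RESULTS = [
  { :title => 'First' },
  { :title => nil },
  { :title => 'Third' }
].freeze

# Stops at the first incomplete record, as `break` does in the diff.
def collect_with_break(results)
  kept = []
  results.each do |result|
    break if result[:title].nil?
    kept << result[:title]
  end
  kept
end

# Skips only the incomplete record and keeps going.
def collect_with_next(results)
  kept = []
  results.each do |result|
    next if result[:title].nil?
    kept << result[:title]
  end
  kept
end

p collect_with_break(RESULTS)
p collect_with_next(RESULTS)
```

With the sample data, the `break` variant keeps only the first record, while the `next` variant keeps the first and third.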

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: amazon-search
 version: !ruby/object:Gem::Version
-  version: 1.4.2
+  version: 1.4.4
 platform: ruby
 authors:
 - John Mason
@@ -31,16 +31,9 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
-- Readme.rdoc
+- README.md
 - amazon-search.gemspec
 - lib/amazon-search.rb
-- lib/amazon-search/form.rb
-- lib/amazon-search/products.rb
-- lib/amazon-search/scan.rb
-- test/lib/amazon-search.rb
-- test/lib/amazon-search/form.rb
-- test/lib/amazon-search/products.rb
-- test/lib/amazon-search/scan.rb
 homepage: https://github.com/m8ss/amazon-search
 licenses:
 - MIT
@@ -61,12 +54,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.6
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: A simple screenscraper to search Amazon
-test_files:
-- test/lib/amazon-search.rb
-- test/lib/amazon-search/form.rb
-- test/lib/amazon-search/products.rb
-- test/lib/amazon-search/scan.rb
+test_files: []

data/lib/amazon-search/form.rb DELETED

@@ -1,26 +0,0 @@
-require 'mechanize'
-require_relative './scan'
-require_relative './products'
-module Amazon
-    class << self
-        # prepares Mechanize
-        def set_agent
-            $agent = Mechanize.new{ |a|  a.user_agent_alias = "Mac Safari"}
-        end
-        # finds Amazon search box
-        def find_form
-            $main_page = $agent.get("http://amazon.com")
-            $search_form = $main_page.form_with :name => "site-search"
-        end
-        # submits Amazon search box
-        def submit_form
-            $search_form.field_with(:name => "field-keywords").value = $keywords # sets value of search box
-            $current_page =  $agent.submit $search_form # submits form
-        end
-    end
-end

data/lib/amazon-search/products.rb DELETED

@@ -1,87 +0,0 @@
-require 'mechanize'
-require_relative './scan'
-require_relative './form'
-module Amazon
-    class << self
-        $products = {}
-        $product_num = 0
-        # used for checking strings
-        def is_numeric?(s)
-         !!Float(s) rescue false
-        end
-        # puts product details to console
-        def display_product
-            STDOUT.puts "--"*50
-            STDOUT.puts "title: \t\t#{$title}"
-            STDOUT.puts "seller: \t#{$seller}"
-            STDOUT.puts "price: \t\t#{$price}"
-            STDOUT.puts "stars: \t\t#{$stars}"
-            STDOUT.puts "reviews: \t#{$reviews}"
-            STDOUT.puts "image url: \t#{$image_href}"
-            STDOUT.puts "product url: \t#{$url}"
-        end
-        # extract product data
-        def extract_product_data
-            # nokogiri syntax is needed when iterating...not mechanize!
-            # extract useful stuff from product html
-            $current_divs.each do |html|
-                # first select raw html
-                title = html.at_css(".s-access-title")
-                seller = html.at_css(".a-row > .a-spacing-none")
-                price = html.at_css(".s-price")
-                stars = html.at_css(".a-icon-star")
-                reviews = html.at_css("span+ .a-text-normal")
-                image_href = html.at_css(".s-access-image")
-                url = html.at_css(".a-row > a")
-                break if title == nil # if it's nil it's prob an ad
-                break if price == nil # no price? prob not worthy item
-                break if stars == nil # no stars? not worth it
-                # extract text and set variables for puts
-                $title = title.text
-                $price = price.text
-                $stars = stars.text
-                $image_href = image_href['src']
-                $url = url['href']
-                # movies sometimes have text in review class
-                if is_numeric?(reviews.text)
-                    $reviews = reviews.text
-                else
-                    $reviews = "Unknown"
-                end
-                if seller == nil # sometimes seller is nil on movies, etc.
-                    $seller = "Unknown"
-                else
-                    $seller = seller.text
-                end
-                # don't overload the server
-                sleep(0.05)
-                display_product
-                # store extracted text in products hash
-                # key is product count
-                $products[$product_num] = {
-                    title: $title,
-                    price: $price,
-                    stars: $stars,
-                    reviews: $reviews,
-                    image_href: $image_href,
-                    url: $url,
-                    seller: $seller,
-                }
-                $product_num +=1 # ready for next product
-            end
-        end
-    end
-end
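
The `is_numeric?` helper removed here (and kept as `numeric?` in the consolidated file) relies on `Float(s)` raising for non-numeric input. A minimal equivalent sketch with an explicit `rescue`, shown only to illustrate which strings pass the check:

```ruby
# Returns true when Kernel#Float can parse the value, false otherwise.
# Float raises ArgumentError for malformed strings and TypeError for nil.
def numeric?(value)
  Float(value)
  true
rescue ArgumentError, TypeError
  false
end

['123', '4.5', '1,234', 'abc'].each do |text|
  puts "#{text.inspect}: #{numeric?(text)}"
end
```

Note that a count written with a thousands separator, such as `'1,234'`, is rejected by `Float`, so under this check it would be recorded as `'Unknown'`.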

data/lib/amazon-search/scan.rb DELETED

@@ -1,55 +0,0 @@
-require 'mechanize'
-require_relative './products'
-require_relative './form'
-module Amazon
-    class << self
-        # examine current_pagenum
-        def examine_current_pagenum
-            $current_pagenum = $current_page.search '//*[contains(concat( " ", @class, " " ), concat( " ", "pagnCur", " " ))]'
-            $current_pagenum = $current_pagenum.text.to_i # need integer for checks
-        end
-        # find last page number
-        def find_last_pagenum
-            $last_pagenum = $current_page.search '//*[contains(concat( " ", @class, " " ), concat( " ", "pagnDisabled", " " ))]'
-            $last_pagenum = $last_pagenum.text.to_i # need integer for checks
-        end
-        # load next page
-        def load_next_page
-            examine_current_pagenum # does this need to be here?
-            $next_page_link = $current_page.link_with text: /Next Page/ # find next page link
-            $next_page = $next_page_link.click unless $current_pagenum == $last_pagenum # click to next page unless on last page
-            $current_page = $agent.get($next_page.uri)
-        end
-        # cycle through search result pages and store product html
-        def scan
-            $pages = {}
-            find_last_pagenum
-            $last_pagenum.times do # paginate until on last page.
-                examine_current_pagenum
-                $current_divs = $current_page.search('//li[starts-with(@id, "result")]')
-                $pages[$page_num] = $current_divs # store page results
-                extract_product_data
-                load_next_page
-            end
-            puts "\n(scan complete.)"
-        end
-    end
-end
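
In `load_next_page`, the click is skipped on the last page, but the following line still calls `.uri` on the next-page variable, which appears to be unset when the search has only one page of results. The sketch below shows one way to guard a pagination loop. It uses a hypothetical `FakePage` struct instead of Mechanize pages, so it illustrates the control flow only.

```ruby
# Hypothetical stand-in for a fetched results page: it knows its page
# number and its successor (nil when it is the last page).
FakePage = Struct.new(:number, :next_page)

# Visits pages until there is no successor and returns the page numbers seen.
def each_page_number(first_page)
  visited = []
  page = first_page
  while page
    visited << page.number
    page = page.next_page # nil on the last page, which ends the loop
  end
  visited
end

page3 = FakePage.new(3, nil)
page2 = FakePage.new(2, page3)
page1 = FakePage.new(1, page2)

p each_page_number(page1)
p each_page_number(FakePage.new(1, nil))
```

Driving the loop from the presence of a next page, rather than from a page count read once up front, avoids dereferencing a missing page at the end of the results.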

data/test/lib/amazon-search.rb DELETED

@@ -1,24 +0,0 @@
-#!/usr/bin/env ruby
-require 'mechanize'
-require './amazon-search/form'
-require './amazon-search/scan'
-require './amazon-search/products'
-module Amazon
-    class << self
-        def search(keywords)
-            $keywords = keywords
-            set_agent
-            find_form
-            submit_form
-            scan
-            $products
-        end
-    end
-end

data/test/lib/amazon-search/form.rb DELETED

@@ -1,26 +0,0 @@
-require 'mechanize'
-require_relative './scan'
-require_relative './products'
-module Amazon
-    class << self
-        # prepares Mechanize
-        def set_agent
-            $agent = Mechanize.new{ |a|  a.user_agent_alias = "Mac Safari"}
-        end
-        # finds Amazon search box
-        def find_form
-            $main_page = $agent.get("http://amazon.com")
-            $search_form = $main_page.form_with :name => "site-search"
-        end
-        # submits Amazon search box
-        def submit_form
-            $search_form.field_with(:name => "field-keywords").value = $keywords # sets value of search box
-            $current_page =  $agent.submit $search_form # submits form
-        end
-    end
-end

data/test/lib/amazon-search/products.rb DELETED

@@ -1,87 +0,0 @@
-require 'mechanize'
-require_relative './scan'
-require_relative './form'
-module Amazon
-    class << self
-        $products = {}
-        $product_num = 0
-        # used for checking strings
-        def is_numeric?(s)
-         !!Float(s) rescue false
-        end
-        # currently not being used and needs adjusting
-        def display_product
-            STDOUT.puts "--"*50
-            STDOUT.puts "title: \t\t#{$title}"
-            STDOUT.puts "seller: \t#{$seller}"
-            STDOUT.puts "price: \t\t#{$price}"
-            STDOUT.puts "stars: \t\t#{$stars}"
-            STDOUT.puts "reviews: \t#{$reviews}"
-            STDOUT.puts "image url: \t#{$image_href}"
-            STDOUT.puts "product url: \t#{$url}"
-        end
-        # extract product data
-        def extract_product_data
-            # nokogiri syntax is needed when iterating...not mechanize!
-            # extract useful stuff from product html
-            $current_divs.each do |html|
-                # first select raw html
-                title = html.at_css(".s-access-title")
-                seller = html.at_css(".a-row > .a-spacing-none")
-                price = html.at_css(".s-price")
-                stars = html.at_css(".a-icon-star")
-                reviews = html.at_css("span+ .a-text-normal")
-                image_href = html.at_css(".s-access-image")
-                url = html.at_css(".a-row > a")
-                break if title == nil # if it's nil it's prob an ad
-                break if price == nil # no price? prob not worthy item
-                break if stars == nil # no stars? not worth it
-                # extract text and set variables for puts
-                $title = title.text
-                $price = price.text
-                $stars = stars.text
-                $image_href = image_href['src']
-                $url = url['href']
-                # movies sometimes have text in review class
-                if is_numeric?(reviews.text)
-                    $reviews = reviews.text
-                else
-                    $reviews = "Unknown"
-                end
-                if seller == nil # sometimes seller is nil on movies, etc.
-                    $seller = "Unknown"
-                else
-                    $seller = seller.text
-                end
-                # don't overload the server
-                sleep(0.05)
-                display_product
-                # store extracted text in products hash
-                # key is product count
-                $products[$product_num] = {
-                    title: $title,
-                    price: $price,
-                    stars: $stars,
-                    reviews: $reviews,
-                    image_href: $image_href,
-                    url: $url,
-                    seller: $seller,
-                }
-                $product_num +=1 # ready for next product
-            end
-        end
-    end
-end

data/test/lib/amazon-search/scan.rb DELETED

@@ -1,57 +0,0 @@
-require 'mechanize'
-require_relative './products'
-require_relative './form'
-module Amazon
-    class << self
-        # examine current_pagenum
-        def examine_current_pagenum
-            $current_pagenum = $current_page.search '//*[contains(concat( " ", @class, " " ), concat( " ", "pagnCur", " " ))]'
-            $current_pagenum = $current_pagenum.text.to_i # need integer for checks
-        end
-        # find last page number
-        def find_last_pagenum
-            $last_pagenum = $current_page.search '//*[contains(concat( " ", @class, " " ), concat( " ", "pagnDisabled", " " ))]'
-            $last_pagenum = $last_pagenum.text.to_i # need integer for checks
-        end
-        # load next page
-        def load_next_page
-            examine_current_pagenum # does this need to be here?
-            $next_page_link = $current_page.link_with text: /Next Page/ # find next page link
-            $next_page = $next_page_link.click unless $current_pagenum == $last_pagenum # click to next page unless on last page
-            $current_page = $agent.get($next_page.uri)
-        end
-        # cycle through search result pages and store product html
-        def scan
-            $pages = {}
-            find_last_pagenum
-            $last_pagenum.times do # paginate until on last page.
-                examine_current_pagenum
-                puts "\nscanning page #{$current_pagenum} of #{$last_pagenum} @ #{$main_page.uri+$current_page.uri}"
-                $current_divs = $current_page.search('//li[starts-with(@id, "result")]')
-                $pages[$page_num] = $current_divs # store page results
-                extract_product_data
-                load_next_page
-            end
-            puts "\n(scan complete.)"
-        end
-    end
-end