craigslist_price_it_right 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +10 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +21 -0
- data/README.md +68 -0
- data/Rakefile +2 -0
- data/bin/console +15 -0
- data/bin/craigslist_price_it_right +6 -0
- data/bin/setup +8 -0
- data/config/environment.rb +14 -0
- data/craigslist_price_it_right.gemspec +38 -0
- data/lib/concerns/concerns.rb +114 -0
- data/lib/craigslist_price_it_right.rb +20 -0
- data/lib/craigslist_price_it_right/version.rb +3 -0
- data/lib/craigslist_scraper.rb +97 -0
- data/lib/item.rb +21 -0
- data/lib/price_manager.rb +158 -0
- data/spec.md +18 -0
- metadata +106 -0
checksums.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
---
|
|
2
|
+
SHA1:
|
|
3
|
+
metadata.gz: 5f92207699472dfd8f70a68969a5e2dd07392fbd
|
|
4
|
+
data.tar.gz: eb06b28a7c66da9e94806661a0a7322cda731439
|
|
5
|
+
SHA512:
|
|
6
|
+
metadata.gz: cd0e215f2ee523078a6f2ae87d6e2e40f66f3575ff5422125400ce73ccf344826705fcd81e014bf2cc07df0523a7821641b396bca217da3f913f43bba895b612
|
|
7
|
+
data.tar.gz: 3844028748e9bb932a4134788aa2352a489bfc068a0e0f7bcff572793a9a4aec07765c89ac156407731be90c8725714bb3dbec2c5a19051e1a039cf8ece71857
|
data/.gitignore
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
The MIT License (MIT)
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2017 zoebisch
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in
|
|
13
|
+
all copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
21
|
+
THE SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
# CraigslistPriceItRight
|
|
2
|
+
|
|
3
|
+
The Price is Right
|
|
4
|
+
|
|
5
|
+
A CLI to scrape similar sale items from Craigslist to help form a suggested price.
|
|
6
|
+
|
|
7
|
+
***Use this at your own risk! Although I have done my best to shield the user from an IP ban from Craigslist, there is no guarantee this will not happen as CL is rather strict on people scraping information from their sites. If this happens, you can appeal to CL (assuming you are not doing anything commercial with this code) and request a removal of the ban. However, I have been successfully using this tool for some time now and have received no further bans***
|
|
8
|
+
|
|
9
|
+
Future ideas:
|
|
10
|
+
near-term: Identify outliers OR weighted average.
|
|
11
|
+
Advanced analysis and price suggestion.
|
|
12
|
+
Extend search to ALL categories, not just for sale
|
|
13
|
+
wishlist: Classify $1 price (these are sometimes spam or oddball posts)
|
|
14
|
+
Smart search (experiment with advanced search methods)
|
|
15
|
+
crazy wishlist: Make an autopost function!
|
|
16
|
+
|
|
17
|
+
|
|
18
|
+
## Installation
|
|
19
|
+
|
|
20
|
+
Add this line to your application's Gemfile:
|
|
21
|
+
|
|
22
|
+
```ruby
|
|
23
|
+
gem 'craigslist_price_it_right'
|
|
24
|
+
```
|
|
25
|
+
|
|
26
|
+
And then execute:
|
|
27
|
+
|
|
28
|
+
$ bundle
|
|
29
|
+
|
|
30
|
+
Or install it yourself as:
|
|
31
|
+
|
|
32
|
+
$ gem install craigslist_price_it_right
|
|
33
|
+
|
|
34
|
+
## Usage
|
|
35
|
+
|
|
36
|
+
Program Flow:
|
|
37
|
+
User selects which Craigslist site to use (copy/paste, type. Default is set in code to https://seattle.craigslist.org)
|
|
38
|
+
User selects Craigslist category from scraped categories
|
|
39
|
+
Program scrapes all potential instances from site
|
|
40
|
+
User enters search item:
|
|
41
|
+
1) This does not have a smart search, it only looks for a match to item string the user selects
|
|
42
|
+
2) Items with no price listing will not have price key
|
|
43
|
+
|
|
44
|
+
Program attempts to form novel price analysis:
|
|
45
|
+
1) Determine low and high pricing.
|
|
46
|
+
2) Determine novel statistics (min, max, mean)
|
|
47
|
+
3) Allow user to specify range and return condensed novel statistics
|
|
48
|
+
|
|
49
|
+
main menu is self-explanatory and interactive
|
|
50
|
+
|
|
51
|
+
## Limitations
|
|
52
|
+
|
|
53
|
+
It should be noted that CL on larger sites appears to pull listings from what is immediately visible at the browser level. I stumbled upon this when one of my listings did not appear in furniture nor in the for sale category. My guess is that stale (or reposts) are only accessible through a search string query and cannot be brought up any other way. In other words a scraping method will always be limited to more current listing. While this does not change the intent or dynamic of the program intent, it does add another future layer that will have to do direct queries for a search string in order to truly pull in all items of that kind. Go ahead and prove this to yourself, that is if you have an old listing. It won't appear in the browse.
|
|
54
|
+
|
|
55
|
+
## Development
|
|
56
|
+
|
|
57
|
+
After checking out the repo, run `bin/setup` to install dependencies. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
|
|
58
|
+
|
|
59
|
+
To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
|
|
60
|
+
|
|
61
|
+
## Contributing
|
|
62
|
+
|
|
63
|
+
Bug reports and pull requests are welcome on GitHub at https://github.com/zoebisch/craigslist_price_it_right.
|
|
64
|
+
|
|
65
|
+
|
|
66
|
+
## License
|
|
67
|
+
|
|
68
|
+
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
|
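The Usage notes in the README describe a plain substring match on the item string, listings that may lack a price key, and a user-supplied price range. A minimal sketch of that flow, assuming plain hashes instead of the gem's `Item` objects; `LISTINGS`, `matching`, and `priced` are hypothetical names used only for illustration:

```ruby
# Hypothetical listing data; the real gem builds Item objects from scraped pages.
LISTINGS = [
  { title: "oak dining table", price: 150 },
  { title: "dining table with chairs", price: 300 },
  { title: "free dining table" },             # no price key, as the README notes
  { title: "mountain bike", price: 400 }
].freeze

# Plain substring match on the title (no "smart search").
def matching(listings, term)
  listings.select { |l| l[:title].include?(term.downcase) }
end

# Keep only listings that carry a price, optionally inside a range.
def priced(listings, range = nil)
  with_price = listings.select { |l| l.key?(:price) }
  range ? with_price.select { |l| range.cover?(l[:price]) } : with_price
end

tables = matching(LISTINGS, "dining table")
puts priced(tables).map { |l| l[:price] }.inspect
puts priced(tables, 100..200).map { |l| l[:title] }.inspect
```

Dropping unpriced listings before the range check keeps `nil` prices out of the comparison, which is the same order the gem's `items_with_price` and `items_in_price_range` use.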
data/Rakefile
ADDED
data/bin/console
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
#!/usr/bin/env ruby
|
|
2
|
+
|
|
3
|
+
require "bundler/setup"
|
|
4
|
+
require_relative '../config/environment'
|
|
5
|
+
require './lib/craigslist_price_it_right'
|
|
6
|
+
|
|
7
|
+
# You can add fixtures and/or initialization code here to make experimenting
|
|
8
|
+
# with your gem easier. You can also use a different console, if you like.
|
|
9
|
+
|
|
10
|
+
# (If you use this, don't forget to add pry to your Gemfile!)
|
|
11
|
+
# require "pry"
|
|
12
|
+
# Pry.start
|
|
13
|
+
|
|
14
|
+
require "irb"
|
|
15
|
+
IRB.start(__FILE__)
|
data/bin/setup
ADDED
data/config/environment.rb
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
1
|
+
require 'nokogiri'
|
|
2
|
+
require 'open-uri'
|
|
3
|
+
#require 'pry' #uncomment this to enable debugging session
|
|
4
|
+
|
|
5
|
+
require_relative '../lib/concerns/concerns.rb'
|
|
6
|
+
require_relative '../lib/craigslist_price_it_right/version'
|
|
7
|
+
require_relative '../lib/craigslist_price_it_right'
|
|
8
|
+
require_relative '../lib/craigslist_scraper.rb'
|
|
9
|
+
require_relative '../lib/price_manager.rb'
|
|
10
|
+
require_relative '../lib/item.rb'
|
|
11
|
+
require_relative '../lib/concerns/concerns.rb'
|
|
12
|
+
|
|
13
|
+
|
|
14
|
+
puts 'Environment Sucessfully Loaded!'
|
data/craigslist_price_it_right.gemspec
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
1
|
+
# coding: utf-8
|
|
2
|
+
lib = File.expand_path('../lib', __FILE__)
|
|
3
|
+
$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
|
|
4
|
+
require 'craigslist_price_it_right/version'
|
|
5
|
+
|
|
6
|
+
Gem::Specification.new do |spec|
|
|
7
|
+
spec.name = "craigslist_price_it_right"
|
|
8
|
+
spec.version = CraigslistPriceItRight::VERSION
|
|
9
|
+
spec.authors = ["zoebisch"]
|
|
10
|
+
spec.email = ["zoebisch@gmail.com"]
|
|
11
|
+
|
|
12
|
+
spec.summary = "Craigslist scraper program for helping to price an item for sale"
|
|
13
|
+
spec.description = "Program sets up menu from scraped for-sale items, allows user to select a category, drills into second level data and self constructs sub-menu, allows user to search for a for-sale item, returns list of items, can sort by price, sort by price range, third level scraping and auto-object-merge based on CL pid"
|
|
14
|
+
spec.homepage = "https://github.com/zoebisch/craigslist-price-it-right"
|
|
15
|
+
spec.license = "MIT"
|
|
16
|
+
|
|
17
|
+
# Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
|
|
18
|
+
# to allow pushing to a single host or delete this section to allow pushing to any host.
|
|
19
|
+
# if spec.respond_to?(:metadata)
|
|
20
|
+
# spec.metadata['allowed_push_host'] = "TODO: add url for private host"
|
|
21
|
+
# else
|
|
22
|
+
# raise "RubyGems 2.0 or newer is required to protect against " \
|
|
23
|
+
# "public gem pushes."
|
|
24
|
+
# end
|
|
25
|
+
|
|
26
|
+
spec.files = `git ls-files -z`.split("\x0").reject do |f|
|
|
27
|
+
f.match(%r{^(test|spec|features)/})
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
spec.bindir = "exe"
|
|
31
|
+
spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
|
|
32
|
+
spec.require_paths = ["lib"]
|
|
33
|
+
|
|
34
|
+
spec.add_development_dependency "bundler", "~> 1.14"
|
|
35
|
+
spec.add_development_dependency "rake", "~> 10.0"
|
|
36
|
+
spec.add_development_dependency "nokogiri", "~> 1.8"
|
|
37
|
+
|
|
38
|
+
end
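In this release the gemspec sets `bindir` to `exe` while the launcher lives in `bin/`, and the packaged metadata accordingly lists `executables: []`. Nokogiri is declared as a development dependency although `config/environment.rb` requires it at load time. A hedged sketch of how those two points could be declared instead; this is a hypothetical fragment, not the published gemspec:

```ruby
require "rubygems"

# Hypothetical fragment for illustration only.
spec = Gem::Specification.new do |s|
  s.name    = "craigslist_price_it_right"
  s.version = "0.1.0"
  s.summary = "Craigslist scraper program for helping to price an item for sale"
  s.authors = ["zoebisch"]

  # Point bindir at the directory that actually holds the launcher script.
  s.bindir      = "bin"
  s.executables = ["craigslist_price_it_right"]

  # Runtime dependency: the scraper requires nokogiri when it loads.
  s.add_dependency "nokogiri", "~> 1.8"
  s.add_development_dependency "rake", "~> 10.0"
end

puts spec.executables.inspect
puts spec.runtime_dependencies.map(&:name).inspect
```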
|
data/lib/concerns/concerns.rb
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
module Concerns
|
|
2
|
+
|
|
3
|
+
module Searchable
|
|
4
|
+
|
|
5
|
+
def search_items
|
|
6
|
+
Item.all.select{|item| yield(item)}
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
def search_by_type
|
|
10
|
+
search_items{|item| item if item.title.include?(self.item)}
|
|
11
|
+
end
|
|
12
|
+
|
|
13
|
+
def search_by_category
|
|
14
|
+
search_items{|item| item if item.category == @category}
|
|
15
|
+
end
|
|
16
|
+
|
|
17
|
+
def search_by_pid
|
|
18
|
+
search_items{|item| item.pid == @pid}.first
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def items_with_price
|
|
22
|
+
search_by_type.select{|item| item if item.price}
|
|
23
|
+
end
|
|
24
|
+
|
|
25
|
+
def items_in_price_range
|
|
26
|
+
items_with_price.select{|item| item if item.price.between?(@min,@max)}
|
|
27
|
+
end
|
|
28
|
+
|
|
29
|
+
def get_link_from_key
|
|
30
|
+
@site.menu_hash.fetch(self.category)
|
|
31
|
+
end
|
|
32
|
+
|
|
33
|
+
def get_subcategory_info
|
|
34
|
+
@site.submenu_hash.fetch(@subcategory)
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
module Sortable
|
|
40
|
+
|
|
41
|
+
def sort_by_price
|
|
42
|
+
items_with_price.sort{|a,b| a.price <=> b.price}
|
|
43
|
+
end
|
|
44
|
+
|
|
45
|
+
def sort_by_price_in_range
|
|
46
|
+
items_in_price_range.sort{|a,b| a.price <=> b.price}
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
def sort_by_location
|
|
50
|
+
search_by_type.sort{|a,b| a.location <=> b.location}
|
|
51
|
+
end
|
|
52
|
+
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
module Printable
|
|
56
|
+
|
|
57
|
+
def print_items_in_category
|
|
58
|
+
search_by_category.each{|item| puts "pid: #{item.pid} :#{item.title} $#{item.price}"}
|
|
59
|
+
puts "There are a total of #{search_by_category.length} items in #{@category}"
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
def print_items_by_price
|
|
63
|
+
sort_by_price.each{|item| puts "pid: #{item.pid} :#{item.title} $#{item.price}"}
|
|
64
|
+
end
|
|
65
|
+
|
|
66
|
+
def print_item_by_pid
|
|
67
|
+
item = search_by_pid
|
|
68
|
+
item.instance_variables.each{|var| puts "#{var.to_s.gsub(/@/,"")}: #{item.instance_variable_get(var)}"} #We cannot know ahead of time which attributes will be populated!
|
|
69
|
+
end
|
|
70
|
+
|
|
71
|
+
def print_basic_stats
|
|
72
|
+
basic_stats{items_with_price}.each_pair{|key,val| puts "#{key} is #{val}"}
|
|
73
|
+
end
|
|
74
|
+
|
|
75
|
+
def print_items_in_range
|
|
76
|
+
sort_by_price_in_range.each{|item| puts "pid: #{item.pid} :#{item.title} $#{item.price}"}
|
|
77
|
+
puts "#{items_in_price_range.length} #{@item} found in #{@category} between $#{@min} and $#{@max}"
|
|
78
|
+
basic_stats{items_in_price_range}.each_pair{|key,val| puts "#{key} is #{val}"}
|
|
79
|
+
end
|
|
80
|
+
|
|
81
|
+
end
|
|
82
|
+
|
|
83
|
+
module Mergable
|
|
84
|
+
|
|
85
|
+
def merge_price_manager_attr
|
|
86
|
+
Item.all.each do |item|
|
|
87
|
+
item.category = @category if item.category == nil
|
|
88
|
+
item.url = @url if item.url == nil
|
|
89
|
+
end
|
|
90
|
+
end
|
|
91
|
+
|
|
92
|
+
def merge_item(pid, item_details)
|
|
93
|
+
item = search_items{|item| item.pid == pid.to_s}.first
|
|
94
|
+
item_details.each_pair{|key,value| item.send("#{key}=", value)}
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
end
|
|
98
|
+
|
|
99
|
+
module Statistical
|
|
100
|
+
|
|
101
|
+
def basic_stats
|
|
102
|
+
values = yield.collect{|item| item.price}
|
|
103
|
+
if values != []
|
|
104
|
+
@stats[:volume] = values.count
|
|
105
|
+
@stats[:mean] = values.reduce(:+)/@stats[:volume]
|
|
106
|
+
@stats[:min] = values.min
|
|
107
|
+
@stats[:max] = values.max
|
|
108
|
+
end
|
|
109
|
+
@stats
|
|
110
|
+
end
|
|
111
|
+
|
|
112
|
+
end
|
|
113
|
+
|
|
114
|
+
end
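`basic_stats` in the hunk above divides an Integer sum by an Integer count, so with the integer prices the scraper produces, the reported mean is truncated. A minimal sketch of the same summary with a floating-point mean; `price_stats` is a hypothetical name and not part of the gem:

```ruby
# Hypothetical helper; returns the same keys that Concerns::Statistical fills in.
def price_stats(prices)
  return {} if prices.empty?

  {
    volume: prices.count,
    mean:   (prices.sum.to_f / prices.count).round(2),
    min:    prices.min,
    max:    prices.max
  }
end

prices = [10, 15, 20, 30]
puts prices.reduce(:+) / prices.count   # Integer division, as in the hunk above
puts price_stats(prices)[:mean]         # Float mean of the same values
```

Returning a fresh hash, rather than updating a shared `@stats`, also avoids showing values left over from an earlier search when the current one has no priced items.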
|
data/lib/craigslist_price_it_right.rb
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
1
|
+
require_relative "./craigslist_price_it_right/version"
|
|
2
|
+
|
|
3
|
+
class CraigslistPriceItRight::CLI
|
|
4
|
+
def main_menu
|
|
5
|
+
puts "----------------------------------------------"
|
|
6
|
+
puts " !Welcome to Price it Right!"
|
|
7
|
+
puts " A Friendly Price Scraper for CL"
|
|
8
|
+
puts " To Begin, Let's Set the Default Page."
|
|
9
|
+
puts "Please copy and paste the main CraigsList URL"
|
|
10
|
+
puts " (e.g. https://seattle.craigslist.org)"
|
|
11
|
+
puts "----------------------------------------------"
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
def call
|
|
15
|
+
main_menu
|
|
16
|
+
url = gets.chomp.downcase
|
|
17
|
+
url == "" ? url = "https://seattle.craigslist.org" : url
|
|
18
|
+
PriceManager.new(url).call
|
|
19
|
+
end
|
|
20
|
+
end
|
data/lib/craigslist_scraper.rb
ADDED
|
@@ -0,0 +1,97 @@
|
|
|
1
|
+
class CL_Scraper
|
|
2
|
+
USER_AGENT = ["Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36",
|
|
3
|
+
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36",
|
|
4
|
+
"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1"]
|
|
5
|
+
PER_PAGE = 120 #CL is broken into 120 items per page. #USER_AGENT pool to pretend we are different browsers on a subnet.
|
|
6
|
+
|
|
7
|
+
attr_accessor :menu_hash, :submenu_hash, :all
|
|
8
|
+
|
|
9
|
+
def initialize(url)
|
|
10
|
+
@all = []
|
|
11
|
+
@url = url if url
|
|
12
|
+
@menu_hash = {}
|
|
13
|
+
@submenu_hash = {}
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
def scrape_for_sale_categories
|
|
17
|
+
sss = noko_page.search("#center #sss a")
|
|
18
|
+
sss.each{|category| @menu_hash[category.children.text] = @url + category.attribute("href").text}
|
|
19
|
+
@menu_hash
|
|
20
|
+
end
|
|
21
|
+
|
|
22
|
+
def scrape_second_level_menus(main_category)
|
|
23
|
+
@submenu_hash.clear
|
|
24
|
+
sub_page = noko_page(main_category)
|
|
25
|
+
sub_lists = sub_page.search(".ul")
|
|
26
|
+
sub_headers = sub_page.search("h3")
|
|
27
|
+
sub_headers.each do |header|
|
|
28
|
+
@submenu_hash[header.text.downcase] = {}
|
|
29
|
+
sub_lists.each do |item|
|
|
30
|
+
info = item.search("a")
|
|
31
|
+
@submenu_hash[header.text.downcase][info[0].text] = info.attribute("href").text
|
|
32
|
+
end
|
|
33
|
+
end
|
|
34
|
+
@submenu_hash
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
def scrape_category(category)
|
|
38
|
+
listings = noko_page(category)
|
|
39
|
+
num_listings = listings.search(".totalcount").first.text.to_i
|
|
40
|
+
page_count = 1
|
|
41
|
+
while page_count <= (num_listings/PER_PAGE).floor
|
|
42
|
+
page_url = category + "?s=" + "#{page_count*PER_PAGE}"
|
|
43
|
+
scrape_page(page_url)
|
|
44
|
+
sleep rand(5..8) #Sleep to help avoid CL API from banning IP!
|
|
45
|
+
page_count += 1
|
|
46
|
+
end
|
|
47
|
+
end
|
|
48
|
+
|
|
49
|
+
def scrape_page(page_url)
|
|
50
|
+
puts "Scraping #{page_url}"
|
|
51
|
+
listings = noko_page(page_url)
|
|
52
|
+
item_list = listings.search(".rows .result-row")
|
|
53
|
+
item_list.each do |item|
|
|
54
|
+
item_info = {}
|
|
55
|
+
item_info[:pid] = item.attribute("data-pid").text
|
|
56
|
+
item_info[:link] = item.search("a")[1].attribute("href").text
|
|
57
|
+
item_info[:price] = item.search(".result-price").first.text.gsub(/\$/, "").to_i if item.search(".result-price").first != nil
|
|
58
|
+
item_info[:title] = item.search(".result-title").text.downcase
|
|
59
|
+
item_info[:location] = item.search(".result-info .result-meta .result-hood").text if item.search(".result-info .result-meta .result-hood").text != ""
|
|
60
|
+
@all << item_info
|
|
61
|
+
end
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def scrape_by_pid(pid_link)
|
|
65
|
+
puts "Scraping #{pid_link}"
|
|
66
|
+
listing = noko_page(pid_link)
|
|
67
|
+
listing.search(".rows .result-row")
|
|
68
|
+
item_info = {}
|
|
69
|
+
item_info[:postingbody] = listing.search("#postingbody").text
|
|
70
|
+
attrgroup = listing.search(".attrgroup span")
|
|
71
|
+
attrgroup.each do |attribute|
|
|
72
|
+
if attribute.children[1] == nil
|
|
73
|
+
item_info[:year] = attribute.children[0].text #special case, has no associated attrgroup identifier
|
|
74
|
+
else
|
|
75
|
+
if attribute.children[1].text == "\nmore ads by this user "
|
|
76
|
+
item_info[:other_ads] = attrgroup.search("a").attribute("href").text
|
|
77
|
+
elsif attribute.children[0].text == "\n "
|
|
78
|
+
item_info[:venue_date] = attribute.children[1].text
|
|
79
|
+
else
|
|
80
|
+
item_info[info_to_sym(attribute)] = attribute.children[1].text
|
|
81
|
+
end
|
|
82
|
+
end
|
|
83
|
+
end
|
|
84
|
+
item_info[:timeago] = listing.search(".timeago").first.text
|
|
85
|
+
item_info
|
|
86
|
+
end
|
|
87
|
+
|
|
88
|
+
def info_to_sym(attribute)
|
|
89
|
+
base = attribute.children[0].text.split(" ")[0]
|
|
90
|
+
base.include?(":") ? base.gsub(/:/, "").to_sym : base.to_sym
|
|
91
|
+
end
|
|
92
|
+
|
|
93
|
+
def noko_page(page=@url)
|
|
94
|
+
Nokogiri::HTML(open(page, 'User-Agent' => USER_AGENT[rand(0..USER_AGENT.length-1)]))
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
end
|
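`scrape_category` above fetches the category page once to read `.totalcount`, then starts its loop at `page_count = 1`, so the first page (offset 0) does not appear to pass through `scrape_page`. The sketch below only computes offsets and does no network access; `loop_offsets` and `all_offsets` are hypothetical helper names:

```ruby
PER_PAGE = 120 # page size used by the scraper above

# Offsets visited by the while loop in scrape_category, as written.
def loop_offsets(num_listings)
  (1..(num_listings / PER_PAGE)).map { |page| page * PER_PAGE }
end

# Hypothetical variant that also covers the first page (offset 0)
# and stops before an offset that would point past the last listing.
def all_offsets(num_listings)
  (0...num_listings).step(PER_PAGE).to_a
end

puts loop_offsets(300).inspect
puts all_offsets(300).inspect
```

Each offset would be appended to the category URL as `?s=<offset>`, with the randomized sleep between requests kept as in the original loop.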
data/lib/item.rb
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
class Item
|
|
2
|
+
attr_accessor :category, :url, :link, :pid, :title, :price, :condition, :location, :postingbody, :make, :model, :size, :timeago,
|
|
3
|
+
:other_ads, :VIN, :fuel, :paint, :title, :transmission, :drive, :year, :number, :cylinders, :odometer, :venue, :venue_date, :type
|
|
4
|
+
extend Concerns::Searchable
|
|
5
|
+
extend Concerns::Mergable
|
|
6
|
+
@@all = []
|
|
7
|
+
|
|
8
|
+
def initialize(item_hash)
|
|
9
|
+
item_hash.each{|key,value| self.send("#{key}=", value)}
|
|
10
|
+
@@all << self if @@all.none?{|item| item.pid == self.pid}
|
|
11
|
+
end
|
|
12
|
+
|
|
13
|
+
def self.create_from_collection(site_hash)
|
|
14
|
+
site_hash.each{|hash| Item.new(hash)}
|
|
15
|
+
end
|
|
16
|
+
|
|
17
|
+
def self.all
|
|
18
|
+
@@all
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
end
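`Item#initialize` and `merge_item` assign attributes with `send("#{key}=", value)`, and the scraper's `info_to_sym` derives keys from whatever labels a listing page shows. A label with no matching accessor would raise `NoMethodError`. A minimal sketch of a guarded assignment, assuming a hypothetical stand-in class `Listing` with a reduced attribute list:

```ruby
# Hypothetical stand-in for Item with only a few attributes.
class Listing
  attr_accessor :pid, :title, :price, :condition

  # Assign every key that has a setter; return the keys that were skipped.
  def assign(details)
    skipped = []
    details.each do |key, value|
      setter = "#{key}="
      if respond_to?(setter)
        public_send(setter, value)
      else
        skipped << key
      end
    end
    skipped
  end
end

listing = Listing.new
skipped = listing.assign(pid: "123", condition: "good", delivery: "yes")
puts listing.condition
puts skipped.inspect
```

Collecting the skipped keys instead of raising lets the caller decide whether an unknown label is worth reporting.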
|
data/lib/price_manager.rb
ADDED
|
@@ -0,0 +1,158 @@
|
|
|
1
|
+
class PriceManager
|
|
2
|
+
attr_accessor :category, :subcategory, :item, :pid, :min, :max, :stats
|
|
3
|
+
attr_reader :url, :menu, :site
|
|
4
|
+
include Concerns::Searchable
|
|
5
|
+
include Concerns::Sortable
|
|
6
|
+
include Concerns::Printable
|
|
7
|
+
include Concerns::Statistical
|
|
8
|
+
include Concerns::Mergable
|
|
9
|
+
MENU = ["\n",
|
|
10
|
+
"Available Actions:",
|
|
11
|
+
"----------------------------------------------",
|
|
12
|
+
"category -> View and Select Category",
|
|
13
|
+
"view -> View Items in Category",
|
|
14
|
+
"item -> Enter Search Item",
|
|
15
|
+
"price -> View Price Information",
|
|
16
|
+
"range -> View Items in Range",
|
|
17
|
+
"pid -> View Item Advanced Info",
|
|
18
|
+
"q -> Quit",
|
|
19
|
+
"----------------------------------------------",
|
|
20
|
+
"Please type in your selection",
|
|
21
|
+
"----------------------------------------------"]
|
|
22
|
+
|
|
23
|
+
def initialize(url)
|
|
24
|
+
@url = url
|
|
25
|
+
puts "OK, we are working with #{@url}"
|
|
26
|
+
sleep 1
|
|
27
|
+
@site = CL_Scraper.new(@url)
|
|
28
|
+
@site.scrape_for_sale_categories
|
|
29
|
+
@stats = {}
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
def call
|
|
33
|
+
process_category
|
|
34
|
+
run = true
|
|
35
|
+
while run
|
|
36
|
+
case actions_menu
|
|
37
|
+
when "category"
|
|
38
|
+
process_category
|
|
39
|
+
when "view"
|
|
40
|
+
print_items_in_category
|
|
41
|
+
when "item"
|
|
42
|
+
process_item
|
|
43
|
+
when "price"
|
|
44
|
+
process_price
|
|
45
|
+
when "range"
|
|
46
|
+
process_range
|
|
47
|
+
when "pid"
|
|
48
|
+
process_pid
|
|
49
|
+
when "debug"
|
|
50
|
+
#binding.pry #uncomment this line and require 'pry' to allow debug session
|
|
51
|
+
when "q"
|
|
52
|
+
run = false
|
|
53
|
+
end
|
|
54
|
+
end
|
|
55
|
+
|
|
56
|
+
end
|
|
57
|
+
|
|
58
|
+
def actions_menu
|
|
59
|
+
MENU.each{|message| puts "#{message}"}
|
|
60
|
+
gets.chomp
|
|
61
|
+
end
|
|
62
|
+
|
|
63
|
+
def process_category
|
|
64
|
+
@site.scrape_category(category_menu)
|
|
65
|
+
#@site.scrape_page(category_menu) #Better for testing.
|
|
66
|
+
Item.create_from_collection(@site.all)
|
|
67
|
+
merge_price_manager_attr #Set item category and url if they are not set.
|
|
68
|
+
print_items_in_category
|
|
69
|
+
process_item
|
|
70
|
+
end
|
|
71
|
+
|
|
72
|
+
def process_item
|
|
73
|
+
puts "Please Enter your sale item:"
|
|
74
|
+
@item = gets.chomp.downcase
|
|
75
|
+
search_by_type
|
|
76
|
+
process_price
|
|
77
|
+
end
|
|
78
|
+
|
|
79
|
+
def process_pid
|
|
80
|
+
puts "Please Enter the PID:"
|
|
81
|
+
@pid = gets.chomp
|
|
82
|
+
if search_by_pid != []
|
|
83
|
+
Item.merge_item(@pid, @site.scrape_by_pid(@url+search_by_pid.link))
|
|
84
|
+
print_item_by_pid
|
|
85
|
+
else
|
|
86
|
+
puts "PID: #{@pid} unavailable in #{@category}"
|
|
87
|
+
print_item_by_pid
|
|
88
|
+
end
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
def process_price
|
|
92
|
+
print_items_by_price
|
|
93
|
+
print_basic_stats
|
|
94
|
+
end
|
|
95
|
+
|
|
96
|
+
def process_range
|
|
97
|
+
puts "Enter a minimum price"
|
|
98
|
+
@min = gets.chomp.to_i
|
|
99
|
+
puts "Enter a maximum price"
|
|
100
|
+
@max = gets.chomp.to_i
|
|
101
|
+
print_items_in_range
|
|
102
|
+
end
|
|
103
|
+
|
|
104
|
+
def category_menu
|
|
105
|
+
puts "\n"
|
|
106
|
+
puts "Available 'for sale' categories are:"
|
|
107
|
+
puts "----------------------------------------------"
|
|
108
|
+
@site.menu_hash.each_key{|key| puts key}
|
|
109
|
+
puts "----------------------------------------------"
|
|
110
|
+
puts "Enter the category you want to browse"
|
|
111
|
+
puts "----------------------------------------------"
|
|
112
|
+
@category = gets.strip.downcase
|
|
113
|
+
unless @site.menu_hash.has_key?(@category)
|
|
114
|
+
puts "Category: #{@category} not found! Please check the spelling."
|
|
115
|
+
sleep 1
|
|
116
|
+
category_menu
|
|
117
|
+
end
|
|
118
|
+
check_subcategory_menu
|
|
119
|
+
end
|
|
120
|
+
|
|
121
|
+
def check_subcategory_menu
|
|
122
|
+
list = ["auto parts", "bikes", "boats", "cars+trucks", "computers", "motorcycles"]
|
|
123
|
+
if list.include?(@category)
|
|
124
|
+
@site.scrape_second_level_menus(get_link_from_key)
|
|
125
|
+
subcategory_menu
|
|
126
|
+
else
|
|
127
|
+
get_link_from_key
|
|
128
|
+
end
|
|
129
|
+
end
|
|
130
|
+
|
|
131
|
+
def subcategory_menu
|
|
132
|
+
puts "----------------------------------------------"
|
|
133
|
+
puts "Available categories in #{@category} are:"
|
|
134
|
+
puts "----------------------------------------------"
|
|
135
|
+
@site.submenu_hash.each_key{|key| puts key}
|
|
136
|
+
puts "----------------------------------------------"
|
|
137
|
+
puts "\n"
|
|
138
|
+
puts "Enter the subcategory you want to browse"
|
|
139
|
+
puts "----------------------------------------------"
|
|
140
|
+
@subcategory = gets.strip
|
|
141
|
+
unless @site.submenu_hash.has_key?(@subcategory)
|
|
142
|
+
puts "******************************************************************"
|
|
143
|
+
puts "Subcategory: #{@subcategory} not found! Please check the spelling."
|
|
144
|
+
puts "******************************************************************"
|
|
145
|
+
sleep 1 #pause so user can see warning
|
|
146
|
+
subcategory_menu
|
|
147
|
+
else
|
|
148
|
+
puts "\n"
|
|
149
|
+
puts "Please type in a selection from the following:"
|
|
150
|
+
puts "----------------------------------------------"
|
|
151
|
+
get_subcategory_info.each_key{|key| puts key.downcase}
|
|
152
|
+
puts "----------------------------------------------"
|
|
153
|
+
choice = gets.strip.upcase
|
|
154
|
+
@url + get_subcategory_info.fetch(choice)
|
|
155
|
+
end
|
|
156
|
+
end
|
|
157
|
+
|
|
158
|
+
end
|
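`category_menu` above retries by calling itself and then falls through to `check_subcategory_menu`, so after a misspelled entry the outer call appears to run the subcategory step a second time. A loop-based prompt is one way to avoid that. The sketch below is an assumption-laden illustration: `prompt_for` is a hypothetical helper, and its input and output are injectable so it can run without a terminal:

```ruby
require "stringio"

# Hypothetical helper: keep asking until the answer is one of +choices+.
def prompt_for(choices, input: $stdin, output: $stdout)
  loop do
    output.puts "Enter the category you want to browse"
    line = input.gets
    return nil if line.nil? # input closed; give up instead of looping forever

    answer = line.strip.downcase
    return answer if choices.include?(answer)

    output.puts "Category: #{answer} not found! Please check the spelling."
  end
end

# Simulated session: one misspelling, then a valid category.
fake_input = StringIO.new("bykes\nbikes\n")
puts prompt_for(%w[bikes boats furniture], input: fake_input)
```

Because the helper returns exactly once, whatever follows it (such as the subcategory check) runs exactly once per successful selection.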
data/spec.md
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
1
|
+
# Specifications for the CLI Assessment
|
|
2
|
+
|
|
3
|
+
Specs:
|
|
4
|
+
- [x] Have a CLI for interfacing with the application.
|
|
5
|
+
1)The program begins by prompting the user for the main Craigslist (CL) page and the user selects the category.
|
|
6
|
+
2)The user is then guided to look for a search item. This can be any string. All results from the category are returned. The user can then perform the following:
|
|
7
|
+
A)view items as-is
|
|
8
|
+
B)view items narrowed by search string (price ascending, I find this the most helpful fashion).
|
|
9
|
+
C)view items narrowed within a price range established by the user.
|
|
10
|
+
D)view advanced item info by unique identifier (pid)
|
|
11
|
+
- [X] Pull data from an external source
|
|
12
|
+
1)The program scrapes all categories (within the "for sale" categories on the CL page). Only if the user selects one of the unique categories ["auto parts", "bikes", "boats", "cars+trucks", "computers", "motorcycles"] does the program drill down into the second page level for these categories as well, allowing the user to further select those.
|
|
13
|
+
2)The program scrapes all items from a category, loading each page and scraping them for a collection of all items within a category. These items are then turned into objects. All items are objects and are manipulated as objects.
|
|
14
|
+
3)The program also will drill down to the third level, grabbing all information for a selected unique id (pid). This data is merged with the original second level scraped data automatically.
|
|
15
|
+
- [X] Implement both list and detail views
|
|
16
|
+
1) Provides list of items within a category.
|
|
17
|
+
2) Provides list of items with price sorted by ascending price.
|
|
18
|
+
3) Provides a list of detail by scraping a third level page by pid.
|
metadata
ADDED
|
@@ -0,0 +1,106 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: craigslist_price_it_right
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
platform: ruby
|
|
6
|
+
authors:
|
|
7
|
+
- zoebisch
|
|
8
|
+
autorequire:
|
|
9
|
+
bindir: exe
|
|
10
|
+
cert_chain: []
|
|
11
|
+
date: 2017-07-28 00:00:00.000000000 Z
|
|
12
|
+
dependencies:
|
|
13
|
+
- !ruby/object:Gem::Dependency
|
|
14
|
+
name: bundler
|
|
15
|
+
requirement: !ruby/object:Gem::Requirement
|
|
16
|
+
requirements:
|
|
17
|
+
- - "~>"
|
|
18
|
+
- !ruby/object:Gem::Version
|
|
19
|
+
version: '1.14'
|
|
20
|
+
type: :development
|
|
21
|
+
prerelease: false
|
|
22
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
23
|
+
requirements:
|
|
24
|
+
- - "~>"
|
|
25
|
+
- !ruby/object:Gem::Version
|
|
26
|
+
version: '1.14'
|
|
27
|
+
- !ruby/object:Gem::Dependency
|
|
28
|
+
name: rake
|
|
29
|
+
requirement: !ruby/object:Gem::Requirement
|
|
30
|
+
requirements:
|
|
31
|
+
- - "~>"
|
|
32
|
+
- !ruby/object:Gem::Version
|
|
33
|
+
version: '10.0'
|
|
34
|
+
type: :development
|
|
35
|
+
prerelease: false
|
|
36
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
37
|
+
requirements:
|
|
38
|
+
- - "~>"
|
|
39
|
+
- !ruby/object:Gem::Version
|
|
40
|
+
version: '10.0'
|
|
41
|
+
- !ruby/object:Gem::Dependency
|
|
42
|
+
name: nokogiri
|
|
43
|
+
requirement: !ruby/object:Gem::Requirement
|
|
44
|
+
requirements:
|
|
45
|
+
- - "~>"
|
|
46
|
+
- !ruby/object:Gem::Version
|
|
47
|
+
version: '1.8'
|
|
48
|
+
type: :development
|
|
49
|
+
prerelease: false
|
|
50
|
+
version_requirements: !ruby/object:Gem::Requirement
|
|
51
|
+
requirements:
|
|
52
|
+
- - "~>"
|
|
53
|
+
- !ruby/object:Gem::Version
|
|
54
|
+
version: '1.8'
|
|
55
|
+
description: Program sets up menu from scraped for-sale items, allows user to select
|
|
56
|
+
a category, drills into second level data and self constructs sub-menu, allows user
|
|
57
|
+
to search for a for-sale item, returns list of items, can sort by price, sort by
|
|
58
|
+
price range, third level scraping and auto-object-merge based on CL pid
|
|
59
|
+
email:
|
|
60
|
+
- zoebisch@gmail.com
|
|
61
|
+
executables: []
|
|
62
|
+
extensions: []
|
|
63
|
+
extra_rdoc_files: []
|
|
64
|
+
files:
|
|
65
|
+
- ".gitignore"
|
|
66
|
+
- Gemfile
|
|
67
|
+
- LICENSE.txt
|
|
68
|
+
- README.md
|
|
69
|
+
- Rakefile
|
|
70
|
+
- bin/console
|
|
71
|
+
- bin/craigslist_price_it_right
|
|
72
|
+
- bin/setup
|
|
73
|
+
- config/environment.rb
|
|
74
|
+
- craigslist_price_it_right.gemspec
|
|
75
|
+
- lib/concerns/concerns.rb
|
|
76
|
+
- lib/craigslist_price_it_right.rb
|
|
77
|
+
- lib/craigslist_price_it_right/version.rb
|
|
78
|
+
- lib/craigslist_scraper.rb
|
|
79
|
+
- lib/item.rb
|
|
80
|
+
- lib/price_manager.rb
|
|
81
|
+
- spec.md
|
|
82
|
+
homepage: https://github.com/zoebisch/craigslist-price-it-right
|
|
83
|
+
licenses:
|
|
84
|
+
- MIT
|
|
85
|
+
metadata: {}
|
|
86
|
+
post_install_message:
|
|
87
|
+
rdoc_options: []
|
|
88
|
+
require_paths:
|
|
89
|
+
- lib
|
|
90
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
91
|
+
requirements:
|
|
92
|
+
- - ">="
|
|
93
|
+
- !ruby/object:Gem::Version
|
|
94
|
+
version: '0'
|
|
95
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
96
|
+
requirements:
|
|
97
|
+
- - ">="
|
|
98
|
+
- !ruby/object:Gem::Version
|
|
99
|
+
version: '0'
|
|
100
|
+
requirements: []
|
|
101
|
+
rubyforge_project:
|
|
102
|
+
rubygems_version: 2.6.12
|
|
103
|
+
signing_key:
|
|
104
|
+
specification_version: 4
|
|
105
|
+
summary: Craigslist scraper program for helping to price an item for sale
|
|
106
|
+
test_files: []
|