amazoned 0.1.4 → 0.1.5
- checksums.yaml +4 -4
- data/README.md +36 -3
- data/lib/amazoned/parser.rb +10 -6
- data/lib/amazoned/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3ba7bfa8d1ab695dd9354a1a05830ac7bb03950a
+  data.tar.gz: 58a0bc4368b26e1e0dfe34bdb7e45f05664b480c
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c0c1505564a7145a43bd02363959eec227c50714684c943615718842cbbb35f832c44812e73cac0f3697666d27c907b5b2f435bad086b197bbd8c3787513a283
+  data.tar.gz: 58151e3decb3a18e00078e828bee389e4928c826d8834c045941bb59c6e3a98ff141cbf884b3108f036adb833b34d9c490d7c0982a8f0b38ee34dc73254ad61b
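The checksums.yaml file packaged inside a .gem records SHA1 and SHA512 digests of the gem's metadata.gz and data.tar.gz entries, which is why all four values change together in this release. The sketch below shows one way to check such digests with Ruby's standard digest and yaml libraries. The method name `checksums_match?` is hypothetical (it is not a RubyGems API), and it assumes the entry bytes and the un-gzipped YAML text have already been extracted from the .gem archive.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not part of RubyGems' public API): returns true when
# every digest listed under SHA1 and SHA512 matches the given entry bytes.
# `files` maps entry names such as "metadata.gz" to their raw contents.
def checksums_match?(checksums_yaml, files)
  algorithms = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }
  sums = YAML.safe_load(checksums_yaml) || {}

  algorithms.all? do |algo, digest_class|
    table = sums[algo]
    next false unless table.is_a?(Hash) # a missing section counts as a failure

    table.all? do |entry, expected|
      data = files[entry]
      !data.nil? && digest_class.hexdigest(data) == expected.to_s
    end
  end
end

# Usage with made-up entry contents:
files = { "metadata.gz" => "fake metadata", "data.tar.gz" => "fake data" }
yaml = {
  "SHA1"   => files.transform_values { |bytes| Digest::SHA1.hexdigest(bytes) },
  "SHA512" => files.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
}.to_yaml
puts checksums_match?(yaml, files)                                    # prints true
puts checksums_match?(yaml, files.merge("data.tar.gz" => "tampered")) # prints false
```

Checking both algorithms, rather than stopping at the first, mirrors the file itself, which lists every entry under each digest.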
data/README.md
CHANGED
@@ -1,8 +1,7 @@
 # Amazoned
 
-
+Amazoned is a ruby HTTP scraper for retrieving product best seller and category rankings from Amazon. Designed for those who don't have the time to register for Amazon's official product API.
 
-TODO: Delete this and the text above, and describe your gem
 
 ## Installation
 
@@ -22,7 +21,41 @@ Or install it yourself as:
 
 ## Usage
 
-
+Use Amazoned to scrape data about a product using the ASIN of the Amazon product. E.g. for product with an ASIN of B078SX6STW:
+
+```ruby
+Amazoned::Client.new("B078SX6STW").call
+```
+
+will return back a Ruby hash of:
+
+```ruby
+{
+:best_sellers_rank=>[
+{:rank=>45,
+:ladder=>"Baby > Baby Care > Pacifiers, Teethers & Teething Relief > Teethers"}
+],
+:rank=>1602,
+:category=>"Baby",
+:package_dimensions=>"6.8 x 6.3 x 1.9 inches"
+}
+```
+
+Amazoned will raise the error `Amazoned::ProductNotFoundError` if the product ASIN does not exist.
+
+Amazoned will raise the error `Amazoned::BotDeniedAccessError` if the scraper is unable to get past a CAPTCHA wall after trying multiple times for the same ASIN.
+
+To avoid anti-scraper detection, the bot spoofs a new User Agent every request and uses timing jitter to vary how long it sleeps in-between each request.
+
+
+## Configuring Automatic Retries
+The library can be configured to automatically retry requests that fail due to the scraper bot hitting a CAPTCHA page:
+
+```ruby
+Amazoned.max_network_retries = 2
+```
+
+By default, `max_network_retries` is set to `3`.
 
 ## Development
 
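The README describes three anti-CAPTCHA behaviors: a fresh User-Agent per request, a randomized (jittered) sleep between requests, and a configurable retry limit (`max_network_retries`, default `3`) after which `Amazoned::BotDeniedAccessError` is raised. The following is a minimal plain-Ruby sketch of that pattern, not Amazoned's actual implementation. `CaptchaHitError`, `OutOfRetriesError`, `USER_AGENTS`, `fetch_with_retries`, and the delay values are all hypothetical, and the sketch assumes the limit counts retries after the first attempt. The `sleeper` parameter exists only so the delays can be observed without actually sleeping.

```ruby
# Hypothetical sketch: none of these names come from Amazoned.
class CaptchaHitError < StandardError; end   # raised by a fetch that lands on a CAPTCHA page
class OutOfRetriesError < StandardError; end # raised once the retry budget is spent

# Small illustrative pool; a real scraper would use complete UA strings.
USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
  "Mozilla/5.0 (X11; Linux x86_64)"
].freeze

# Yields a randomly chosen User-Agent to the block. On CaptchaHitError it
# waits base_delay plus a random jitter in [0, jitter) seconds and retries,
# giving up after max_retries retries (max_retries + 1 attempts in total).
def fetch_with_retries(max_retries: 3, base_delay: 1.0, jitter: 2.0,
                       sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield USER_AGENTS.sample
  rescue CaptchaHitError
    raise OutOfRetriesError, "still blocked after #{attempts} attempts" if attempts > max_retries

    sleeper.call(base_delay + rand * jitter)
    retry
  end
end

# Usage: a fake fetch that hits a CAPTCHA twice, then succeeds.
calls = 0
page = fetch_with_retries(max_retries: 2, sleeper: ->(s) { puts format("sleeping %.2fs", s) }) do |ua|
  calls += 1
  raise CaptchaHitError if calls < 3
  "fetched on attempt #{calls} as #{ua}"
end
puts page
```

Adding a random offset to a fixed base delay means consecutive waits almost never share the same length, which is what the README means by using timing jitter to vary how long the bot sleeps between requests.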
data/lib/amazoned/parser.rb
CHANGED
@@ -63,12 +63,16 @@ class Amazoned::Parser
 product_hash[:rank] = parsed_parent_category.first.delete(',').to_i
 product_hash[:category] = parsed_parent_category.last
 
-
-
-
-
-
-
+parsed_categories = str.partition(")").last.split("#").map(&:strip)
+parsed_categories.each do |pc|
+next if pc.blank?
+parsed_category = pc.partition("in").map(&:strip).map{|i| i.gsub("#", "")} - ["in"]
+
+hsh = {}
+hsh[:rank] = parsed_category.first.delete(',').to_i
+hsh[:ladder] = parsed_category.last
+product_hash[:best_sellers_rank] << hsh
+end
 end
 end
 
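The lines added to `Amazoned::Parser` take the sales-rank text after the first `)`, split it on `#`, and split each piece at the first `in` to get a numeric rank and a category ladder, producing the `:best_sellers_rank` array shown in the README. The standalone sketch below reproduces that logic so it can be tried outside the gem. It rests on assumptions: the input format in the comment is inferred from the README example rather than taken from the source, the parent-category parsing is invented because the diff does not show how `parsed_parent_category` is built, and `parse_sales_rank` is a hypothetical name. It uses `strip` plus `empty?` in place of `blank?`, which comes from ActiveSupport rather than core Ruby, and drops the `gsub("#", "")` step because `split("#")` has already removed every `#`.

```ruby
# Hypothetical standalone version of the 0.1.5 best-sellers-rank parsing.
# ASSUMPTION: `str` looks like
#   "#1,602 in Baby (See Top 100 in Baby) #45 in Baby > Baby Care > ..."
# i.e. the parent rank precedes the first "(" and each sub-category rank
# follows the first ")" and starts with "#".
def parse_sales_rank(str)
  product_hash = { best_sellers_rank: [] }

  # Parent rank and category (assumed logic; the diff does not show how
  # parsed_parent_category is built).
  parent = str.partition("(").first.delete("#").strip.partition(" in ")
  product_hash[:rank] = parent.first.delete(",").to_i
  product_hash[:category] = parent.last.strip

  # Sub-category ladders, following the lines added in 0.1.5.
  str.partition(")").last.split("#").map(&:strip).each do |pc|
    next if pc.empty? # core-Ruby stand-in for ActiveSupport's blank?

    parsed = pc.partition("in").map(&:strip) - ["in"]
    product_hash[:best_sellers_rank] << {
      rank: parsed.first.delete(",").to_i,
      ladder: parsed.last
    }
  end

  product_hash
end

example = "#1,602 in Baby (See Top 100 in Baby) " \
          "#45 in Baby > Baby Care > Pacifiers, Teethers & Teething Relief > Teethers"
p parse_sales_rank(example)
```

With the assumed input, the result matches the README example apart from `:package_dimensions`, which this hunk does not parse. Because `partition("in")` matches the first occurrence of the letters `in`, the approach relies on the rank number always preceding the category name.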
data/lib/amazoned/version.rb
CHANGED