amzpy 0.1.1__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
amzpy-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,221 @@
Metadata-Version: 2.4
Name: amzpy
Version: 0.2.0
Summary: A lightweight Amazon scraper library.
Home-page: https://github.com/theonlyanil/amzpy
Author: Anil Sardiwal
Author-email: theonlyanil@gmail.com
Keywords: amazon,scraper,web-scraping,product-data,e-commerce,curl_cffi,anti-bot
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: curl_cffi>=0.5.7
Requires-Dist: beautifulsoup4>=4.11.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: fake_useragent>=1.1.1
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AmzPy - Amazon Product Scraper [![PyPI](https://img.shields.io/pypi/v/amzpy)](https://pypi.org/project/amzpy/)

![AmzPy Logo](https://i.imgur.com/QxrE60H.png)

<a href="https://www.producthunt.com/posts/amzpy?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-amzpy" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=812920&theme=neutral&t=1737654254074" alt="AmzPy - A lightweight Amazon product scraper library. | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>

AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and handles anti-bot measures automatically via curl_cffi's browser impersonation.

## Features

- Easy-to-use API for scraping Amazon product data
- Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
- Enhanced anti-bot protection using curl_cffi with browser impersonation
- Automatic retries on detection, with intelligent delay management
- Proxy support for distributing requests
- Dynamic configuration options
- Extraction of color variants, discounts, delivery information, and more
- Clean, typed Python interface

## Installation

Install using pip:

```bash
pip install amzpy
```

## Basic Usage

### Fetching Product Details

```python
from amzpy import AmazonScraper

# Create a scraper with default settings (amazon.com)
scraper = AmazonScraper()

# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)

if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
```

### Searching for Products

```python
from amzpy import AmazonScraper

# Create a scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")

# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)

# Display the first five results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
```
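
Search results are plain dictionaries, so they can be persisted with the standard library alone. A minimal sketch, assuming the key names shown above (the sample data here is hypothetical, not real scraper output):

```python
import csv

# Hypothetical results in the shape returned by search_products()
products = [
    {"title": "Earbuds A", "asin": "B000000001", "price": 1999, "currency": "INR "},
    {"title": "Earbuds B", "asin": "B000000002", "price": 2499, "currency": "INR "},
]

fields = ["title", "asin", "price", "currency"]
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        # Missing keys become empty cells rather than raising KeyError
        writer.writerow({k: product.get(k, "") for k in fields})
```

Using `product.get()` mirrors the README's own access pattern: not every result carries every field.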

## Advanced Usage

### Configuration Options

AmzPy offers flexible configuration that can be set in several ways:

```python
# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
    proxies={"http": "http://user:pass@proxy.example.com:8080"}
)

# Method 2: String-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')

# Method 3: Keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
```

### Advanced Search Features

The search functionality can extract rich product data:

```python
# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)

# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)

# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")

    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")

    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")

    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f" - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")

    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
```

### Working with Proxies

To distribute requests and avoid IP blocks, you can route traffic through proxies:

```python
# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# SOCKS5 proxies
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080"
}

scraper = AmazonScraper(proxies=proxies)
```
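
Since the scraper takes a single `proxies` mapping, rotating between several exit IPs amounts to handing it a different mapping per batch. A hypothetical sketch with stdlib round-robin (the proxy URLs are placeholders):

```python
import itertools

# Placeholder proxy pool; replace with real endpoints
proxy_pool = [
    {"http": "http://proxy1.example.com:8080", "https": "http://proxy1.example.com:8080"},
    {"http": "http://proxy2.example.com:8080", "https": "http://proxy2.example.com:8080"},
]
rotation = itertools.cycle(proxy_pool)

def next_proxies():
    """Return the next proxies mapping in round-robin order."""
    return next(rotation)

# Each batch of requests can then use a different exit IP:
# scraper = AmazonScraper(proxies=next_proxies())
```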

### Browser Impersonation

AmzPy uses curl_cffi's browser impersonation to mimic real browser requests, significantly improving success rates when scraping Amazon:

```python
# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119")  # Chrome 119
scraper = AmazonScraper(impersonate="safari15")   # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115
```
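
To vary the fingerprint across scraper instances, one option is a simple random pick among known profiles. A sketch using only the three profiles listed above (whether other curl_cffi profile names are accepted depends on the installed version):

```python
import random

# Profiles taken from the examples above
profiles = ["chrome119", "safari15", "firefox115"]

def pick_profile(rng=random):
    """Pick one impersonation profile at random."""
    return rng.choice(profiles)

# scraper = AmazonScraper(impersonate=pick_profile())
```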

## Configuration Reference

The following parameters can be adjusted via `scraper.config()`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| MAX_RETRIES | 3 | Maximum number of retry attempts for failed requests |
| REQUEST_TIMEOUT | 25 | Request timeout in seconds |
| DELAY_BETWEEN_REQUESTS | (2, 5) | Random delay range between requests (min, max) in seconds |
| DEFAULT_IMPERSONATE | 'chrome119' | Default browser profile to impersonate |

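MAX_RETRIES and DELAY_BETWEEN_REQUESTS interact in the obvious way: each attempt is separated by a random pause drawn from the configured range. A minimal sketch of that retry pattern — illustrative only, not the library's actual internals (`fetch` is a stand-in for any request function):

```python
import random
import time

MAX_RETRIES = 3                   # defaults from the table above
DELAY_BETWEEN_REQUESTS = (2, 5)   # (min, max) seconds

def fetch_with_retries(fetch, url, sleep=time.sleep):
    """Call fetch(url) up to MAX_RETRIES times, pausing a random
    2-5 s between attempts; re-raise the last error if all fail."""
    last_error = None
    for attempt in range(MAX_RETRIES):
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            if attempt < MAX_RETRIES - 1:
                sleep(random.uniform(*DELAY_BETWEEN_REQUESTS))
    raise last_error
```
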
## Requirements

- Python 3.6+
- curl_cffi (for enhanced anti-bot protection)
- beautifulsoup4
- lxml (for faster HTML parsing)
- fake_useragent

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to contribute to this project.
amzpy-0.2.0/README.md ADDED
@@ -0,0 +1,184 @@
# AmzPy - Amazon Product Scraper [![PyPI](https://img.shields.io/pypi/v/amzpy)](https://pypi.org/project/amzpy/)

![AmzPy Logo](https://i.imgur.com/QxrE60H.png)

<a href="https://www.producthunt.com/posts/amzpy?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-amzpy" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=812920&theme=neutral&t=1737654254074" alt="AmzPy - A lightweight Amazon product scraper library. | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>

AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and handles anti-bot measures automatically via curl_cffi's browser impersonation.

## Features

- Easy-to-use API for scraping Amazon product data
- Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
- Enhanced anti-bot protection using curl_cffi with browser impersonation
- Automatic retries on detection, with intelligent delay management
- Proxy support for distributing requests
- Dynamic configuration options
- Extraction of color variants, discounts, delivery information, and more
- Clean, typed Python interface

## Installation

Install using pip:

```bash
pip install amzpy
```

## Basic Usage

### Fetching Product Details

```python
from amzpy import AmazonScraper

# Create a scraper with default settings (amazon.com)
scraper = AmazonScraper()

# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)

if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
```

### Searching for Products

```python
from amzpy import AmazonScraper

# Create a scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")

# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)

# Display the first five results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
```

## Advanced Usage

### Configuration Options

AmzPy offers flexible configuration that can be set in several ways:

```python
# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
    proxies={"http": "http://user:pass@proxy.example.com:8080"}
)

# Method 2: String-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')

# Method 3: Keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
```

### Advanced Search Features

The search functionality can extract rich product data:

```python
# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)

# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)

# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")

    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")

    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")

    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f" - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")

    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
```

### Working with Proxies

To distribute requests and avoid IP blocks, you can route traffic through proxies:

```python
# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# SOCKS5 proxies
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080"
}

scraper = AmazonScraper(proxies=proxies)
```

### Browser Impersonation

AmzPy uses curl_cffi's browser impersonation to mimic real browser requests, significantly improving success rates when scraping Amazon:

```python
# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119")  # Chrome 119
scraper = AmazonScraper(impersonate="safari15")   # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115
```

## Configuration Reference

The following parameters can be adjusted via `scraper.config()`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| MAX_RETRIES | 3 | Maximum number of retry attempts for failed requests |
| REQUEST_TIMEOUT | 25 | Request timeout in seconds |
| DELAY_BETWEEN_REQUESTS | (2, 5) | Random delay range between requests (min, max) in seconds |
| DEFAULT_IMPERSONATE | 'chrome119' | Default browser profile to impersonate |

## Requirements

- Python 3.6+
- curl_cffi (for enhanced anti-bot protection)
- beautifulsoup4
- lxml (for faster HTML parsing)
- fake_useragent

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to contribute to this project.
@@ -3,6 +3,7 @@ AmzPy - Amazon Product Scraper
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 A lightweight Python library for scraping product information from Amazon.
+Now using curl_cffi for better anti-bot protection.
 
 Basic usage:
 >>> from amzpy import AmazonScraper
@@ -16,6 +17,6 @@ Basic usage:
 
 from .scraper import AmazonScraper
 
-__version__ = "0.1.0"
+__version__ = "0.2.0"
 __author__ = "Anil Sardiwal"
 __license__ = "MIT"