amzpy 0.1.1__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
amzpy-0.2.0/PKG-INFO ADDED
@@ -0,0 +1,221 @@
Metadata-Version: 2.4
Name: amzpy
Version: 0.2.0
Summary: A lightweight Amazon scraper library.
Home-page: https://github.com/theonlyanil/amzpy
Author: Anil Sardiwal
Author-email: theonlyanil@gmail.com
Keywords: amazon,scraper,web-scraping,product-data,e-commerce,curl_cffi,anti-bot
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: curl_cffi>=0.5.7
Requires-Dist: beautifulsoup4>=4.11.0
Requires-Dist: lxml>=4.9.0
Requires-Dist: fake_useragent>=1.1.1
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# AmzPy - Amazon Product Scraper [![PyPI](https://img.shields.io/pypi/v/amzpy)](https://pypi.org/project/amzpy/)

![AmzPy Logo](https://i.imgur.com/QxrE60H.png)

<a href="https://www.producthunt.com/posts/amzpy?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-amzpy" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=812920&theme=neutral&t=1737654254074" alt="AmzPy - A lightweight Amazon product scraper library. | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>

AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and handles anti-bot measures automatically via curl_cffi's browser impersonation.

## Features

- Easy-to-use API for scraping Amazon product data
- Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
- Enhanced anti-bot protection using curl_cffi with browser impersonation
- Automatic retries on detection, with intelligent delay management
- Proxy support for distributing requests
- Dynamic configuration options
- Extraction of color variants, discounts, delivery information, and more
- Clean, typed Python interface

## Installation

Install using pip:

```bash
pip install amzpy
```

## Basic Usage

### Fetching Product Details

```python
from amzpy import AmazonScraper

# Create a scraper with default settings (amazon.com)
scraper = AmazonScraper()

# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)

if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
```

### Searching for Products

```python
from amzpy import AmazonScraper

# Create a scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")

# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)

# Display the first five results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
```
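
Search results are plain dictionaries, so they can be persisted with the standard library alone. A minimal sketch, assuming the key names shown above (the sample data here is hypothetical, not real scraper output):

```python
import csv

# Hypothetical results in the shape returned by search_products()
products = [
    {"title": "Earbuds A", "asin": "B000000001", "price": 1999, "currency": "INR "},
    {"title": "Earbuds B", "asin": "B000000002", "price": 2499, "currency": "INR "},
]

fields = ["title", "asin", "price", "currency"]
with open("results.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for product in products:
        # Missing keys become empty cells rather than raising KeyError
        writer.writerow({k: product.get(k, "") for k in fields})
```

Using `product.get()` mirrors the README's own access pattern: not every result carries every field.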

## Advanced Usage

### Configuration Options

AmzPy offers flexible configuration that can be set in several ways:

```python
# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
    proxies={"http": "http://user:pass@proxy.example.com:8080"}
)

# Method 2: String-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')

# Method 3: Keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
```

### Advanced Search Features

The search functionality can extract rich product data:

```python
# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)

# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)

# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")

    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")

    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")

    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f" - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")

    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
```

### Working with Proxies

To distribute requests and avoid IP blocks, you can route traffic through proxies:

```python
# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# SOCKS5 proxies
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080"
}

scraper = AmazonScraper(proxies=proxies)
```
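
Since the scraper takes a single `proxies` mapping, rotating between several exit IPs amounts to handing it a different mapping per batch. A hypothetical sketch with stdlib round-robin (the proxy URLs are placeholders):

```python
import itertools

# Placeholder proxy pool; replace with real endpoints
proxy_pool = [
    {"http": "http://proxy1.example.com:8080", "https": "http://proxy1.example.com:8080"},
    {"http": "http://proxy2.example.com:8080", "https": "http://proxy2.example.com:8080"},
]
rotation = itertools.cycle(proxy_pool)

def next_proxies():
    """Return the next proxies mapping in round-robin order."""
    return next(rotation)

# Each batch of requests can then use a different exit IP:
# scraper = AmazonScraper(proxies=next_proxies())
```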

### Browser Impersonation

AmzPy uses curl_cffi's browser impersonation to mimic real browser requests, significantly improving success rates when scraping Amazon:

```python
# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119")  # Chrome 119
scraper = AmazonScraper(impersonate="safari15")   # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115
```
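
To vary the fingerprint across scraper instances, one option is a simple random pick among known profiles. A sketch using only the three profiles listed above (whether other curl_cffi profile names are accepted depends on the installed version):

```python
import random

# Profiles taken from the examples above
profiles = ["chrome119", "safari15", "firefox115"]

def pick_profile(rng=random):
    """Pick one impersonation profile at random."""
    return rng.choice(profiles)

# scraper = AmazonScraper(impersonate=pick_profile())
```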

## Configuration Reference

The following parameters can be adjusted via `scraper.config()`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| MAX_RETRIES | 3 | Maximum number of retry attempts for failed requests |
| REQUEST_TIMEOUT | 25 | Request timeout in seconds |
| DELAY_BETWEEN_REQUESTS | (2, 5) | Random delay range between requests (min, max) in seconds |
| DEFAULT_IMPERSONATE | 'chrome119' | Default browser profile to impersonate |

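MAX_RETRIES and DELAY_BETWEEN_REQUESTS interact in the obvious way: each attempt is separated by a random pause drawn from the configured range. A minimal sketch of that retry pattern — illustrative only, not the library's actual internals (`fetch` is a stand-in for any request function):

```python
import random
import time

MAX_RETRIES = 3                   # defaults from the table above
DELAY_BETWEEN_REQUESTS = (2, 5)   # (min, max) seconds

def fetch_with_retries(fetch, url, sleep=time.sleep):
    """Call fetch(url) up to MAX_RETRIES times, pausing a random
    2-5 s between attempts; re-raise the last error if all fail."""
    last_error = None
    for attempt in range(MAX_RETRIES):
        try:
            return fetch(url)
        except Exception as e:
            last_error = e
            if attempt < MAX_RETRIES - 1:
                sleep(random.uniform(*DELAY_BETWEEN_REQUESTS))
    raise last_error
```
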
## Requirements

- Python 3.6+
- curl_cffi (for enhanced anti-bot protection)
- beautifulsoup4
- lxml (for faster HTML parsing)
- fake_useragent

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to contribute to this project.
amzpy-0.2.0/README.md ADDED
@@ -0,0 +1,184 @@
# AmzPy - Amazon Product Scraper [![PyPI](https://img.shields.io/pypi/v/amzpy)](https://pypi.org/project/amzpy/)

![AmzPy Logo](https://i.imgur.com/QxrE60H.png)

<a href="https://www.producthunt.com/posts/amzpy?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-amzpy" target="_blank"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=812920&theme=neutral&t=1737654254074" alt="AmzPy - A lightweight Amazon product scraper library. | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" /></a>

AmzPy is a lightweight Python library for scraping product information from Amazon. It provides a simple interface for fetching product details such as title, price, currency, and image URL, and handles anti-bot measures automatically via curl_cffi's browser impersonation.

## Features

- Easy-to-use API for scraping Amazon product data
- Supports multiple Amazon domains (.com, .in, .co.uk, etc.)
- Enhanced anti-bot protection using curl_cffi with browser impersonation
- Automatic retries on detection, with intelligent delay management
- Proxy support for distributing requests
- Dynamic configuration options
- Extraction of color variants, discounts, delivery information, and more
- Clean, typed Python interface

## Installation

Install using pip:

```bash
pip install amzpy
```

## Basic Usage

### Fetching Product Details

```python
from amzpy import AmazonScraper

# Create a scraper with default settings (amazon.com)
scraper = AmazonScraper()

# Fetch product details
url = "https://www.amazon.com/dp/B0D4J2QDVY"
product = scraper.get_product_details(url)

if product:
    print(f"Title: {product['title']}")
    print(f"Price: {product['currency']}{product['price']}")
    print(f"Brand: {product['brand']}")
    print(f"Rating: {product['rating']}")
    print(f"Image URL: {product['img_url']}")
```

### Searching for Products

```python
from amzpy import AmazonScraper

# Create a scraper for a specific Amazon domain
scraper = AmazonScraper(country_code="in")

# Search by query - get up to 2 pages of results
products = scraper.search_products(query="wireless earbuds", max_pages=2)

# Display the first five results
for i, product in enumerate(products[:5], 1):
    print(f"{i}. {product['title']} - {product['currency']}{product['price']}")
```

## Advanced Usage

### Configuration Options

AmzPy offers flexible configuration that can be set in several ways:

```python
# Method 1: Set at initialization
scraper = AmazonScraper(
    country_code="in",
    impersonate="chrome119",
    proxies={"http": "http://user:pass@proxy.example.com:8080"}
)

# Method 2: String-based configuration
scraper.config('MAX_RETRIES = 5, REQUEST_TIMEOUT = 30, DELAY_BETWEEN_REQUESTS = (3, 8)')

# Method 3: Keyword arguments
scraper.config(MAX_RETRIES=4, DEFAULT_IMPERSONATE="safari15")
```

### Advanced Search Features

The search functionality can extract rich product data:

```python
# Search for products with 5 pages of results
products = scraper.search_products(query="men sneakers size 9", max_pages=5)

# Or search using a pre-constructed URL (e.g., filtered searches)
url = "https://www.amazon.in/s?i=shoes&rh=n%3A1983518031&s=popularity-rank"
products = scraper.search_products(search_url=url, max_pages=3)

# Access comprehensive product data
for product in products:
    # Basic information
    print(f"Title: {product.get('title')}")
    print(f"ASIN: {product.get('asin')}")
    print(f"URL: https://www.amazon.{scraper.country_code}/dp/{product.get('asin')}")
    print(f"Brand: {product.get('brand')}")
    print(f"Price: {product.get('currency')}{product.get('price')}")

    # Discount information
    if 'original_price' in product:
        print(f"Original Price: {product.get('currency')}{product.get('original_price')}")
        print(f"Discount: {product.get('discount_percent')}% off")

    # Ratings and reviews
    print(f"Rating: {product.get('rating')} / 5.0 ({product.get('reviews_count')} reviews)")

    # Color variants
    if 'color_variants' in product:
        print(f"Available in {len(product['color_variants'])} colors")
        for variant in product['color_variants']:
            print(f" - {variant['name']}: https://www.amazon.{scraper.country_code}/dp/{variant['asin']}")

    # Additional information
    print(f"Prime Eligible: {'Yes' if product.get('prime') else 'No'}")
    if 'delivery_info' in product:
        print(f"Delivery: {product.get('delivery_info')}")
    if 'badge' in product:
        print(f"Badge: {product.get('badge')}")
```

### Working with Proxies

To distribute requests and avoid IP blocks, you can route traffic through proxies:

```python
# HTTP/HTTPS proxies
proxies = {
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080"
}

# SOCKS5 proxies
proxies = {
    "http": "socks5://user:pass@proxy.example.com:1080",
    "https": "socks5://user:pass@proxy.example.com:1080"
}

scraper = AmazonScraper(proxies=proxies)
```

### Browser Impersonation

AmzPy uses curl_cffi's browser impersonation to mimic real browser requests, significantly improving success rates when scraping Amazon:

```python
# Specify a particular browser to impersonate
scraper = AmazonScraper(impersonate="chrome119")  # Chrome 119
scraper = AmazonScraper(impersonate="safari15")   # Safari 15
scraper = AmazonScraper(impersonate="firefox115") # Firefox 115
```

## Configuration Reference

The following parameters can be adjusted via `scraper.config()`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| MAX_RETRIES | 3 | Maximum number of retry attempts for failed requests |
| REQUEST_TIMEOUT | 25 | Request timeout in seconds |
| DELAY_BETWEEN_REQUESTS | (2, 5) | Random delay range between requests (min, max) in seconds |
| DEFAULT_IMPERSONATE | 'chrome119' | Default browser profile to impersonate |

## Requirements

- Python 3.6+
- curl_cffi (for enhanced anti-bot protection)
- beautifulsoup4
- lxml (for faster HTML parsing)
- fake_useragent

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contributing

Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to contribute to this project.
@@ -3,6 +3,7 @@ AmzPy - Amazon Product Scraper
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 A lightweight Python library for scraping product information from Amazon.
+Now using curl_cffi for better anti-bot protection.
 
 Basic usage:
 >>> from amzpy import AmazonScraper
@@ -16,6 +17,6 @@ Basic usage:
 
 from .scraper import AmazonScraper
 
-__version__ = "0.1.0"
+__version__ = "0.2.0"
 __author__ = "Anil Sardiwal"
 __license__ = "MIT"