pingrabber-0.1.7.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1 @@
include requirements.txt
@@ -0,0 +1,269 @@
Metadata-Version: 2.4
Name: pingrabber
Version: 0.1.7
Summary: A lightweight Python library that scrapes high-quality images from any public Pinterest board using the official RSS feed
Home-page: https://github.com/VVui-blip/pin_grabber
Author: VVui-blip
Author-email: VVui-blip <vuv54581@gmail.com>
Maintainer-email: VVui-blip <vuv54581@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/VVui-blip/image-data-scraping-resource-pack-from-Pinterest-
Project-URL: Repository, https://github.com/VVui-blip/image-data-scraping-resource-pack-from-Pinterest-.git
Project-URL: Issues, https://github.com/VVui-blip/image-data-scraping-resource-pack-from-Pinterest-/issues
Project-URL: Documentation, https://github.com/VVui-blip/image-data-scraping-resource-pack-from-Pinterest-/blob/main/README.md
Keywords: pinterest,scraper,image-downloader,rss,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Multimedia :: Graphics :: Capture :: Screen Capture
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: requests>=2.25.0
Requires-Dist: beautifulsoup4>=4.9.0
Requires-Dist: lxml>=4.6.0
Provides-Extra: search
Requires-Dist: ddgs>=1.0.0; extra == "search"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# PinGrabber

[![PyPI version](https://img.shields.io/badge/pypi-v0.1.7-blue)](https://pypi.org/project/pingrabber/)
[![Python 3.7+](https://img.shields.io/badge/python-3.7+-green.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**PinGrabber** is a lightweight Python library that scrapes high-quality images from public Pinterest boards (and single pins) using the RSS feeds Pinterest provides. It extracts the original, full-resolution images and downloads them to your local machine with minimal effort.

![Pinterest Banner](https://www.logo.wine/a/logo/Pinterest/Pinterest-Icon-White-Dark-Background-Logo.wine.svg)

> **Important** – Use this tool only with public boards and for personal or educational purposes. Always respect Pinterest's [Terms of Service](https://www.pinterest.com/terms/) and the copyright of image authors.

---

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Advanced Usage](#advanced-usage)
- [How It Works](#how-it-works)
- [Project Structure](#project-structure)
- [Dependencies](#dependencies)
- [License](#license)

---

## Features

- Simple shortcut function – download an entire board or a single pin with one line of code.
- Automatic conversion of thumbnail URLs to the original high-resolution images.
- Customizable output directory.
- Both a high-level wrapper function and a low-level class for fine-grained control.
- Keyword search that returns raw image URLs without downloading them.
- Built with `requests`, `BeautifulSoup`, and `lxml` for fast and reliable parsing.

---

## Installation

You can install PinGrabber directly from the GitHub repository:

```bash
pip install git+https://github.com/VVui-blip/image-data-scraping-resource-pack-from-Pinterest-.git
```

Alternatively, if you have the source code locally:

```bash
pip install -r requirements.txt
pip install .
```

The required dependencies (`requests`, `beautifulsoup4`, `lxml`) are installed automatically.

### Optional dependency for better keyword search

For more robust keyword search, we recommend installing the `ddgs` package (community-maintained, formerly `duckduckgo-search`):

```bash
pip install ddgs
```

Or install it together with PinGrabber (the quotes stop shells such as zsh from expanding the brackets):

```bash
pip install ".[search]"
```

`ddgs` queries search engines (DuckDuckGo, Bing, Google, etc.) more reliably and lets you configure a proxy in code. Without `ddgs`, `search()` falls back to calling the search engines directly with `requests`, and those calls are blocked more often.

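Which of the two code paths applies depends only on whether `ddgs` can be imported. You can check this quickly with the standard library. The helper name `ddgs_available` is hypothetical and not part of PinGrabber:

```python
import importlib.util


def ddgs_available() -> bool:
    """Return True if the optional `ddgs` package is installed (hypothetical helper)."""
    # find_spec() locates the module without actually importing it
    return importlib.util.find_spec("ddgs") is not None


if __name__ == "__main__":
    if ddgs_available():
        print("ddgs is installed")
    else:
        print("ddgs is missing: search() will use the requests-based fallback")
```

This only confirms that the package is installed. It assumes, as described above, that `search()` uses `ddgs` whenever `ddgs` is importable.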

---

## Quick Start

`pingrabber.download(url)` automatically detects whether the given URL points to a board or a single pin and handles it accordingly, so you don't need to tell them apart yourself.

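This section does not say how the board/pin detection works. One plausible approach is to look at the URL path. Pin URLs start with `/pin/<id>/`, while board URLs have exactly two path segments (`/username/boardname/`). The sketch below shows this heuristic. The function `classify_pinterest_url` is hypothetical, and this is not PinGrabber's actual implementation:

```python
from urllib.parse import urlparse


def classify_pinterest_url(url: str) -> str:
    """Heuristically classify a Pinterest URL as "pin", "board", or "unknown".

    Hypothetical helper for illustration; not part of PinGrabber.
    """
    parts = urlparse(url)
    host = parts.netloc.lower()
    # Accept pinterest.com as well as country domains such as pinterest.co.uk
    if "pinterest." not in host:
        return "unknown"
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "pin":
        return "pin"  # e.g. /pin/119134352618387326/
    if len(segments) == 2:
        return "board"  # e.g. /username/boardname/
    return "unknown"


if __name__ == "__main__":
    print(classify_pinterest_url("https://www.pinterest.com/pin/119134352618387326/"))
    print(classify_pinterest_url("https://www.pinterest.com/username/boardname/"))
```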

### Download all images from a board

```python
import pingrabber

pingrabber.download("https://www.pinterest.com/username/boardname/")
```

### Download a single pin

```python
import pingrabber

pingrabber.download("https://www.pinterest.com/pin/119134352618387326/")
```

To save images to a custom directory:

```python
import pingrabber

pingrabber.download(
    "https://www.pinterest.com/username/boardname/",
    output_dir="my_pinterest_images",
)
```

### Search images by keyword (returns raw links, no download)

```python
import pingrabber

links = pingrabber.search("nature")
for url in links:
    print(url)
```

`search()` does not download any images. It only returns a list of high-quality raw image URLs, which you can preview, filter, or download yourself with `requests` if needed.

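If you want to save the URLs returned by `search()`, a short loop is enough. This is a minimal sketch, not PinGrabber's own downloader. It uses the standard library's `urllib.request` instead of `requests` so it has no extra dependency. The names `save_images` and `fetch_bytes` and the `0000_name.jpg` naming scheme are all hypothetical. The `fetch` parameter lets you swap in another downloader, for example one built on `requests`:

```python
import os
import urllib.request
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw body (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def save_images(
    urls: Iterable[str],
    output_dir: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[str]:
    """Save each URL into output_dir and return the paths that were written."""
    os.makedirs(output_dir, exist_ok=True)
    saved: List[str] = []
    for index, url in enumerate(urls):
        # Prefix the original file name with a counter so names stay unique
        name = os.path.basename(urlparse(url).path) or "image.jpg"
        path = os.path.join(output_dir, f"{index:04d}_{name}")
        try:
            data = fetch(url)
        except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
            print(f"Skipping {url}: {exc}")
            continue
        with open(path, "wb") as handle:
            handle.write(data)
        saved.append(path)
    return saved


# Example (needs network access and an installed pingrabber):
#   import pingrabber
#   links = pingrabber.search("nature")
#   save_images(links, "nature_images")
```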

How it works: `search()` queries several search engines in turn (DuckDuckGo HTML, DuckDuckGo Lite, Bing) with the query `site:pinterest.com <keyword>`. To reduce the chance of being blocked, it rotates user agents, retries failed requests, and waits between attempts. If an engine fails with HTTP 403 or 429, `search()` switches to the next one. It does not scrape Pinterest's own search page, because that page is rendered with JavaScript, which `requests` cannot execute.

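The following sketch illustrates the retry-and-fallback pattern described above. It is not PinGrabber's code, and it rests on several assumptions:

- Each engine is a plain function that takes the query and a user agent and returns result URLs.
- An engine signals a 403/429 by raising the hypothetical `BlockedError`.
- The user-agent strings are placeholders.
- `sleep` is injectable so the logic can be tested without waiting.

```python
import itertools
import time
from typing import Callable, List, Sequence

# Placeholder user-agent strings; a real client would send complete, current ones
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


class BlockedError(Exception):
    """Raised by an engine function when it receives HTTP 403 or 429."""


# An engine takes (query, user_agent) and returns a list of result URLs
Engine = Callable[[str, str], List[str]]


def search_with_fallback(
    keyword: str,
    engines: Sequence[Engine],
    max_retries: int = 3,
    delay_seconds: float = 2.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Try each engine in order, retrying blocked requests with a growing delay."""
    query = f"site:pinterest.com {keyword}"
    user_agents = itertools.cycle(USER_AGENTS)  # rotate on every attempt
    for engine in engines:
        for attempt in range(max_retries):
            try:
                results = engine(query, next(user_agents))
            except BlockedError:
                if attempt < max_retries - 1:
                    sleep(delay_seconds * (attempt + 1))  # linear backoff
                continue
            if results:
                return results
            break  # engine answered but found nothing: move to the next engine
    return []
```

Making the delay grow with the attempt number is one common choice. A fixed delay or exponential backoff would also fit the description above.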

Customise the number of boards to scan, the number of images per board, the retry count, and the delay between attempts:

```python
import pingrabber

links = pingrabber.search(
    "nature",
    max_boards=5,
    max_images_per_board=10,
    max_retries=3,
    delay_seconds=2.5,
)
```

If `search()` always returns an empty list, check the logged WARNING/ERROR messages. If every fallback engine reports 403/429 errors, your network or IP is probably being rate-limited by the search engines. This is common on cloud servers, on VPNs, and on IPs that have sent many requests. In that case:

- Install `ddgs` if you haven't already (`pip install ddgs`). This makes the biggest difference.
- If `ddgs` is installed but still returns nothing, try a proxy directly with that package:

  ```python
  from ddgs import DDGS

  with DDGS(proxy="socks5://127.0.0.1:9050", timeout=15) as ddgs:
      results = ddgs.text("site:pinterest.com nature", max_results=5)
  print(results)
  ```

  If that returns results, you can initialise PinGrabber with a similarly proxied session, or simply pass the board URLs you found to `download()`.
- Increase `max_retries` and `delay_seconds` (these only affect the fallback method).
- Try a different network or VPN.
- Alternatively, use the most reliable approach: find a board manually in your browser and call `pingrabber.download(board_url)` directly. This does not depend on search engines at all, so search-engine rate limits cannot affect it.

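To pass `ddgs` results to `download()`, you first need to pull the board URLs out of the result list. In current `ddgs` releases, `DDGS.text()` returns a list of dicts whose `href` key holds the result URL. The filter below keeps only links that look like boards. It is a hypothetical helper built on an assumed path heuristic: exactly two path segments, with `pin`, `search`, and `ideas` excluded as first segments. It can therefore miss some boards and let through other two-segment pages:

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

# First path segments assumed not to be usernames (illustrative, not exhaustive)
NON_BOARD_PREFIXES = {"pin", "search", "ideas"}


def board_urls_from_results(results: Iterable[Dict[str, str]]) -> List[str]:
    """Return de-duplicated, board-shaped Pinterest URLs from search results."""
    boards: List[str] = []
    for result in results:
        parts = urlparse(result.get("href", ""))
        if "pinterest." not in parts.netloc.lower():
            continue  # not a Pinterest link
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) == 2 and segments[0] not in NON_BOARD_PREFIXES:
            # Normalise to https://host/username/boardname/
            url = f"https://{parts.netloc}/{segments[0]}/{segments[1]}/"
            if url not in boards:
                boards.append(url)
    return boards


# Example (needs network access, ddgs, and pingrabber):
#   import pingrabber
#   from ddgs import DDGS
#   with DDGS(proxy="socks5://127.0.0.1:9050", timeout=15) as ddgs:
#       results = ddgs.text("site:pinterest.com nature", max_results=5)
#   for board_url in board_urls_from_results(results):
#       pingrabber.download(board_url)
```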

---

## Advanced Usage

For more control, use the `PinGrabber` class:

```python
from pingrabber import PinGrabber

grabber = PinGrabber(timeout=30)

# Download all original images from the board
saved_files = grabber.download(
    "https://www.pinterest.com/username/boardname/",
    output_dir="high_res_pins",
)

print(f"Downloaded {len(saved_files)} images")
```

If you only need the image URLs (without downloading):

```python
from pingrabber import PinGrabber

grabber = PinGrabber()
rss_url = grabber.build_rss_url("https://www.pinterest.com/username/boardname/")
rss_content = grabber.fetch_rss(rss_url)
image_urls = grabber.extract_image_urls(rss_content)

for url in image_urls:
    print(url)
```

---

## How It Works

1. Board/Pin URL to RSS Feed – The URL you pass is converted to an RSS feed URL (by appending `.rss` for boards, or by using the pin's RSS endpoint).
2. Fetch RSS – The RSS content is retrieved with a `requests` GET request.
3. Parse and Extract – BeautifulSoup with the `lxml` parser extracts all `<img>` tags inside the RSS items.
4. Upgrade to Original – Thumbnail URLs (e.g. `236x`) are rewritten as `originals` URLs to fetch the highest available quality.
5. Download – Each image is downloaded and saved to the output directory under a unique filename.

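Steps 1, 3, and 4 can be sketched with the standard library alone. This is a minimal illustration, not PinGrabber's code:

- It uses `xml.etree.ElementTree` and `html.parser` in place of BeautifulSoup/lxml.
- It covers board URLs only; the pin RSS endpoint is not shown.
- It assumes thumbnail URLs look like `https://i.pinimg.com/236x/...` and that replacing the size segment with `originals` yields the full-size file. This is an undocumented convention, and an original can occasionally have a different file extension than its thumbnail.

All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse, urlunparse


def build_board_rss_url(board_url: str) -> str:
    """Step 1: https://host/username/boardname/ -> https://host/username/boardname.rss"""
    parts = urlparse(board_url)
    path = parts.path.rstrip("/") + ".rss"
    return urlunparse((parts.scheme or "https", parts.netloc, path, "", "", ""))


class _ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.srcs.extend(value for name, value in attrs if name == "src" and value)


def extract_image_urls(rss_xml: str) -> List[str]:
    """Step 3: pull <img src> values out of each item's HTML description."""
    urls: List[str] = []
    for item in ET.fromstring(rss_xml).iter("item"):
        collector = _ImgSrcCollector()
        # ElementTree has already unescaped &lt;img ...&gt; into real HTML text
        collector.feed(item.findtext("description") or "")
        collector.close()
        urls.extend(collector.srcs)
    return urls


# Matches the first path segment (the size, e.g. "236x") of an i.pinimg.com URL
_PINIMG_SIZE = re.compile(r"^(https?://i\.pinimg\.com)/[^/]+/")


def upgrade_to_original(url: str) -> str:
    """Step 4: https://i.pinimg.com/236x/ab/... -> https://i.pinimg.com/originals/ab/..."""
    return _PINIMG_SIZE.sub(r"\1/originals/", url, count=1)


if __name__ == "__main__":
    sample = (
        "<rss version='2.0'><channel><item><description>"
        "&lt;img src=\"https://i.pinimg.com/236x/ab/cd/ef/abc.jpg\"&gt;"
        "</description></item></channel></rss>"
    )
    print(build_board_rss_url("https://www.pinterest.com/username/boardname/"))
    print([upgrade_to_original(u) for u in extract_image_urls(sample)])
```

URLs that do not match the `i.pinimg.com` pattern pass through `upgrade_to_original` unchanged, so the upgrade step is safe to apply to every extracted URL.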

---

## Project Structure

```
pin_grabber/
├── pingrabber/
│   ├── __init__.py      # Package entry point
│   └── core.py          # Main logic (PinGrabber class + helper functions)
├── requirements.txt     # Python dependencies
├── README.md            # This file
├── setup.py             # Packaging configuration
└── LICENSE              # MIT License
```

---

## Dependencies

- Python 3.7+
- `requests` – HTTP requests.
- `beautifulsoup4` – HTML/XML parsing.
- `lxml` – fast XML/HTML parser.
- Optional: `ddgs` – more reliable keyword search (`pip install ".[search]"`).

All required dependencies are listed in `requirements.txt` and are installed by `pip install .` or by the Git install command.

---

## License

This project is released under the MIT License. See the LICENSE file for details.

Disclaimer: This tool is provided "as is". You are solely responsible for ensuring that your usage complies with Pinterest's Terms of Service and applicable copyright laws.

---

Built with care for developers who need a quick, clean Pinterest image scraper.
@@ -0,0 +1,24 @@
"""
pingrabber
==========

A simple library for scraping high-quality images from a public
Pinterest board via Pinterest's RSS feed.

Quick usage:

    import pingrabber

    # Download images from a board or a single pin (the URL type is detected automatically)
    pingrabber.download("https://www.pinterest.com/username/boardname/")

    # Search by keyword and get raw image links (nothing is downloaded)
    links = pingrabber.search("nature")

"""

from .core import PinGrabber, download, search

__all__ = ["PinGrabber", "download", "search"]

__version__ = "0.1.7"