imperium-crawl 2.5.1 → 2.5.2
- package/README.md +50 -8
- package/package.json +1 -1
package/README.md
CHANGED
@@ -17,13 +17,27 @@
 
 ---
 
-## What's new in 2.5.
+## What's new in 2.5.1
 
-
+**Browser-based image extraction overhaul** — 100% coverage on any website:
 
--
--
--
+- **Full browser rendering (L3)** for image discovery — JavaScript, lazy-load, shadow DOM, same-origin iframes
+- **7 image sources**: `<img>`, `<picture>`, CSS `background-image`, shadow DOM, JSON-LD, inline scripts, iframes
+- **Precise targeting**: `--selector`, `--index`, `--alt-match`, `--min-width`, `--max-width`
+- **Auto-click** "Load more" / "Gallery" buttons with multilingual keyword matching
+- **Referer injection** fixes 403 errors on image CDN anti-hotlink protection
+- **New `auto_click` action** in `interact` tool for standalone browser automation
+
+```bash
+# Download ALL images from any page (100% coverage)
+imperium-crawl download <url> --images --output ./slike
+
+# Target exactly the 3rd image
+imperium-crawl download <url> --images --index 3
+
+# Auto-click "Prikaži više" + scan iframes
+imperium-crawl download <url> --images --auto-click --iframe-scan
+```
 
 See [CHANGELOG.md](./CHANGELOG.md) for the full release notes.
 
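The hunk above only names the targeting flags (`--index`, `--alt-match`, `--min-width`, `--max-width`); the diff does not show how they combine. The sketch below is one plausible reading, not the package's implementation. It assumes the alt-text and width filters run first and that `--index` then picks a 1-based position from whatever remains, which matches the README's "3rd image" wording. The `ImageCandidate` and `TargetOptions` types and the `selectImages` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical shape of one discovered image; not taken from imperium-crawl.
interface ImageCandidate {
  url: string;
  alt: string;
  width: number; // width in pixels
}

// Hypothetical mirror of the README's targeting flags.
interface TargetOptions {
  altMatch?: string; // assumed: case-insensitive substring match on alt text
  minWidth?: number;
  maxWidth?: number;
  index?: number; // assumed: 1-based, applied after the filters
}

function selectImages(
  candidates: ImageCandidate[],
  opts: TargetOptions
): ImageCandidate[] {
  const needle = opts.altMatch?.toLowerCase();

  // Apply the alt-text and width filters first.
  const filtered = candidates.filter((img) => {
    if (needle !== undefined && !img.alt.toLowerCase().includes(needle)) return false;
    if (opts.minWidth !== undefined && img.width < opts.minWidth) return false;
    if (opts.maxWidth !== undefined && img.width > opts.maxWidth) return false;
    return true;
  });

  // Without an index, every surviving candidate is returned.
  if (opts.index === undefined) return filtered;

  // With an index, return at most one image (empty if out of range).
  const picked = filtered[opts.index - 1];
  return picked === undefined ? [] : [picked];
}
```

Under these assumptions, combining `altMatch` with `index: 2` returns the second image whose alt text matches, not the second image on the page. The real CLI may order these steps differently.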
@@ -48,7 +62,7 @@ npm install -g imperium-crawl
 **Install from a local tarball** (e.g. pre-release testing):
 
 ```bash
-npm install -g ./imperium-crawl-2.5.
+npm install -g ./imperium-crawl-2.5.2.tgz
 ```
 
 > That's it. 33 of 39 tools work with zero API keys. Add optional keys later to unlock search, AI extraction, and CAPTCHA solving.
@@ -105,6 +119,34 @@ imperium-crawl ai-extract --url https://amazon.com/dp/B0D1XD1ZV3 \
 }
 ```
 
+### Extract ALL images from any page (100% coverage)
+
+```bash
+imperium-crawl download https://www.njuskalo.hr/nekretnine/stan-Zagreb --images --output ./slike
+```
+
+```
+Discovered 23 unique images
+✅ njuskalo.hr-001.jpg — 142KB
+✅ njuskalo.hr-002.jpg — 89KB
+✅ njuskalo.hr-003.jpg — 256KB
+→ 23/23 downloaded. Total: 4.2MB
+```
+
+**Target a specific image:**
+
+```bash
+imperium-crawl download https://olx.ba/artikal/12345 \
+--images --selector "img.gallery-main" --output ./oglas.jpg
+```
+
+**Auto-click "Load more" + iframe scan:**
+
+```bash
+imperium-crawl download https://www.leboncoin.fr/ad/12345 \
+--images --auto-click --iframe-scan --limit 50
+```
+
 ### Batch scrape with resume
 
 ```bash
@@ -292,7 +334,7 @@ Second visit to cloudflare.com:
 
 | Tool | What It Does |
 |------|-------------|
-| **interact** | Browser automation with
+| **interact** | Browser automation with 20 action types (click, type, scroll, wait, screenshot, evaluate, select, hover, press, navigate, drag, upload, storage, cookies, pdf, auth_login, refresh, **auto_click**). Ref targeting via ARIA snapshot, session encryption, action policy, domain filter, network interception, device emulation. **auto_click** finds and clicks "load more" / "gallery" buttons with multilingual keyword matching. |
 | **snapshot** | ARIA-based page snapshot with interactive element refs. Use refs in interact for precise targeting. Annotated screenshots. |
 
 ### 📱 Social Media (no API key needed)
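The `interact` row above describes `auto_click` as "multilingual keyword matching", but the diff does not include the matcher or its keyword list. The sketch below shows one way such matching could work, under stated assumptions. Button labels are normalized (diacritics stripped, lowercased, whitespace collapsed) and compared against a keyword list. Only "load more", "gallery", and "Prikaži više" appear in the README; the other keywords and the names `AUTO_CLICK_KEYWORDS`, `normalizeLabel`, and `matchesAutoClick` are hypothetical.

```typescript
// Hypothetical keyword list. Only "load more", "gallery" and "prikaži više"
// come from the README; the remaining entries are illustrative guesses.
const AUTO_CLICK_KEYWORDS: string[] = [
  "load more",
  "show more",
  "gallery",
  "prikaži više",
  "voir plus",
  "mehr anzeigen",
];

// Strip diacritics, lowercase, and collapse whitespace so that
// "Prikaži više" and "PRIKAZI  VISE" compare equal.
function normalizeLabel(text: string): string {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining marks left by NFD
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// True when the button label contains any keyword after normalization.
function matchesAutoClick(
  label: string,
  keywords: string[] = AUTO_CLICK_KEYWORDS
): boolean {
  const normalized = normalizeLabel(label);
  return keywords.some((keyword) => normalized.includes(normalizeLabel(keyword)));
}
```

Substring matching is deliberately loose, so a label such as "Prikaži više (12)" still matches. It would also match unrelated text that happens to contain a keyword, so a real implementation would likely restrict candidates to clickable elements first.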
@@ -307,7 +349,7 @@ Second visit to cloudflare.com:
 
 | Tool | What It Does |
 |------|-------------|
-| **download** | Download media files from any URL — images, video, YouTube, TikTok, bulk. Auto-
+| **download** | Download media files from any URL — images, video, YouTube, TikTok, bulk. **v2.5.1**: Browser-based image extraction with 100% coverage (lazy-load, shadow DOM, iframes, JSON-LD, CSS backgrounds). Target specific images via `--selector`, `--index`, `--alt-match`. Auto-click "load more" buttons. Referer injection fixes 403 on CDNs. |
 | **rss** | Fetch and parse RSS/Atom feeds. Filter by date, output as JSON or Markdown. |
 
 ### 📦 Batch Processing (no API key needed)
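The `download` row lists CSS backgrounds among the image sources, and the sample output earlier in the README reports "unique images", which suggests some form of de-duplication across sources. Neither step is visible in this diff. As a minimal sketch of what they might look like, the hypothetical helpers below pull `url(...)` references out of a computed `background-image` value and drop repeated URLs while keeping first-seen order. Skipping `data:` URIs is an assumption made here, not documented behavior.

```typescript
// Extract every url(...) reference from a CSS background-image value.
// Handles double-quoted, single-quoted, and unquoted forms.
function extractCssBackgroundUrls(cssValue: string): string[] {
  const pattern = /url\(\s*(["']?)(.*?)\1\s*\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(cssValue)) !== null) {
    const url = match[2].trim();
    // Assumption: inline data: URIs are not downloadable images of interest.
    if (url !== "" && !url.startsWith("data:")) {
      urls.push(url);
    }
  }
  return urls;
}

// Remove duplicate URLs while preserving first-seen order.
function uniqueUrls(urls: string[]): string[] {
  return Array.from(new Set(urls));
}
```

A value such as `url("hero.jpg"), linear-gradient(red, blue), url(hero.jpg)` yields two raw matches that collapse to a single `hero.jpg` after `uniqueUrls`. Comparing raw strings treats relative and absolute forms of the same address as different, so a fuller version would resolve each URL against the page address before de-duplicating.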
package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
 "name": "imperium-crawl",
-"version": "2.5.
+"version": "2.5.2",
 "description": "39-tool open-source CLI for web scraping, PDF extraction, content monitoring, reusable browser flows, RSS aggregation, and custom skills. Zero API keys for core tools.",
 "type": "module",
 "bin": {