imperium-crawl 1.1.8 → 1.1.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -79,6 +79,30 @@ npm i rebrowser-playwright
  npx playwright install chromium
  ```
 
+ ### AI Agent Guide (SKILL.md)
+
+ imperium-crawl ships with [`SKILL.md`](./SKILL.md) — a structured guide that teaches AI agents (Claude, GPT, etc.) how to use all 16 tools effectively. It includes 6 proven workflows, decision trees, error recovery strategies, and advanced patterns like manual skill refinement.
+
+ **Without SKILL.md**, agents can call tools but won't know which tool to try first, when to fallback, or how to chain tools together optimally.
+
+ **With SKILL.md**, agents follow battle-tested workflows — readability → scrape → extract fallback chains, auto-detect → manual refinement for skills, search → select → deep-scrape for research, and more.
+
+ **Two ways to connect SKILL.md to any agent:**
+
+ | Method | Setup | Works with |
+ |--------|-------|-----------|
+ | **MCP + SKILL.md** | Add imperium-crawl as MCP server + SKILL.md in agent context | Claude Code, Cursor, Windsurf, any MCP client |
+ | **CLI + SKILL.md** | `npm i -g imperium-crawl` + SKILL.md in agent context | **Any agent with bash access** — OpenClaw, ChatGPT, GPT agents, custom agents, anything |
+
+ The CLI approach is universal — any agent that can run shell commands can use all 16 tools. No MCP required.
+
+ | AI Agent | How to add SKILL.md |
+ |----------|-------------------|
+ | **Claude Code** | Copy `SKILL.md` to your project root — Claude Code reads it automatically |
+ | **Cursor / Windsurf** | Add `SKILL.md` to project rules or include in system prompt |
+ | **OpenClaw / custom agents** | Include SKILL.md content in your system prompt or context window |
+ | **ChatGPT / GPT agents** | Paste SKILL.md content into custom instructions |
+
  ---
 
  ## CLI Mode
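The "CLI + SKILL.md" method added above involves two steps for a custom agent. First, put the guide into the model's context. Second, route the model's tool calls through the CLI. The following is a minimal TypeScript sketch of such a harness, assuming Node.js. The names `buildSystemPrompt`, `loadSystemPrompt`, and `runCli`, along with the `<imperium-crawl-skill>` wrapper tag, are hypothetical and not part of imperium-crawl. `runCli` deliberately passes CLI arguments through verbatim rather than guessing at subcommands or flags.

```typescript
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

// Hypothetical wrapper tag; any delimiter the agent's model handles well works.
const SKILL_TAG = "imperium-crawl-skill";

// Prepends the SKILL.md guide to an existing system prompt.
// An empty or whitespace-only guide leaves the base prompt unchanged.
function buildSystemPrompt(skillMd: string, basePrompt: string): string {
  const guide = skillMd.trim();
  if (guide.length === 0) {
    return basePrompt;
  }
  return `<${SKILL_TAG}>\n${guide}\n</${SKILL_TAG}>\n\n${basePrompt}`;
}

// Reads SKILL.md from disk; the path is wherever you copied the file.
function loadSystemPrompt(skillPath: string, basePrompt: string): string {
  return buildSystemPrompt(readFileSync(skillPath, "utf8"), basePrompt);
}

// Runs one CLI invocation and returns its stdout. Arguments are passed
// through verbatim: this helper does not know imperium-crawl's subcommands
// or flags. execFileSync bypasses the shell, so arguments containing spaces
// or quotes arrive unchanged. It throws if the process exits non-zero or
// times out, which lets the agent loop treat failures as tool errors.
function runCli(
  args: string[],
  command: string = "imperium-crawl",
  timeoutMs: number = 60_000, // arbitrary ceiling for a single tool call
): string {
  return execFileSync(command, args, {
    encoding: "utf8",
    timeout: timeoutMs,
    maxBuffer: 64 * 1024 * 1024, // scraped pages can produce large output
  });
}
```

In a custom agent, `loadSystemPrompt("./SKILL.md", basePrompt)` would seed the model's instructions. Each tool call the model requests would then go through `runCli` with the arguments documented in the CLI Mode section. Using `execFileSync` instead of `execSync` keeps model-generated arguments from being interpreted by a shell.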
@@ -332,27 +356,22 @@ Every tool tested against production websites with real anti-bot defenses:
 
  | Tool | Target | Result |
  |------|--------|--------|
- | 🕷️ **extract** | Amazon (AirPods Pro 2) | Product title, 45,297 reviews, brand extracted |
- | 🔓 **discover_apis** | Spotify | **8 hidden APIs** — access token exposed, client ID, dealer servers, analytics |
- | 🕷️ **extract** | Stack Overflow | **15 top questions** — #1 with 27,520 votes |
- | 📡 **monitor_websocket** | Binance BTC/USDT | **3 WebSocket connections, 23 live messages** — real-time price $69,390 |
- | 🔓 **discover_apis** | Airbnb Paris | **34 hidden APIs** — DataDome anti-bot, Google Maps key exposed, internal search/polygon/viewport APIs |
- | 🕷️ **extract** | Hacker News | **30 front-page posts** — titles + URLs extracted |
- | 🔓 **discover_apis** | Netflix | **5 APIs** — OneTrust consent, geolocation (detected country: Serbia 🇷🇸) |
  | 📄 **scrape** | BBC News | Full markdown content, stealth level 3 auto-escalation |
  | 🕸️ **crawl** | Cloudflare Blog | **213K characters** crawled with depth control |
  | 🗺️ **map** | BBC | Full URL discovery via sitemap + page link extraction |
+ | 🕷️ **extract** | Amazon (AirPods Pro 2) | Product title, 45,297 reviews, brand extracted |
  | 📖 **readability** | Medium article | Clean extraction — title, author, content, publish date |
  | 📸 **screenshot** | ProductHunt | Captured Cloudflare Turnstile challenge page |
- | 🔓 **discover_apis** | weather.com | **11 hidden APIs** — main weather API with exposed key |
- | ⚡ **query_api** | jsonplaceholder | Direct JSON API call with stealth headers |
  | 🔍 **search** | Brave Web Search | Web results with snippets and URLs |
  | 📰 **news_search** | Brave News Search | News results with freshness ranking |
  | 🖼️ **image_search** | Brave Image Search | Images with thumbnails and source URLs |
  | 🎬 **video_search** | Brave Video Search | Video results across platforms |
- | 🛠️ **create_skill** | Any page | Auto-detects repeating patterns, generates CSS selectors |
+ | 🛠️ **create_skill** | Hacker News | Auto-detected 30 repeating stories with CSS selectors |
  | ▶️ **run_skill** | Saved skill | Fresh structured data from saved extraction config |
  | 📋 **list_skills** | — | Lists all saved skills with configurations |
+ | 🔓 **discover_apis** | Airbnb Paris | **34 hidden APIs** — DataDome anti-bot, Google Maps key, internal APIs |
+ | ⚡ **query_api** | jsonplaceholder | Direct JSON API call with stealth headers |
+ | 📡 **monitor_websocket** | Binance BTC/USDT | **3 WebSocket connections, 23 live messages** — BTC price live |
 
  > 🏆 **16/16 tools working. 58 hidden APIs discovered. Live crypto feed captured. Zero API keys needed for scraping.**
 
@@ -1,5 +1,5 @@
  export declare const PACKAGE_NAME = "imperium-crawl";
- export declare const PACKAGE_VERSION = "1.1.8";
+ export declare const PACKAGE_VERSION = "1.1.9";
  export declare const DEFAULT_TIMEOUT_MS = 30000;
  export declare const DEFAULT_MAX_PAGES = 10;
  export declare const DEFAULT_MAX_DEPTH = 2;
package/dist/constants.js CHANGED
@@ -1,5 +1,5 @@
  export const PACKAGE_NAME = "imperium-crawl";
- export const PACKAGE_VERSION = "1.1.8";
+ export const PACKAGE_VERSION = "1.1.9";
  export const DEFAULT_TIMEOUT_MS = 30_000;
  export const DEFAULT_MAX_PAGES = 10;
  export const DEFAULT_MAX_DEPTH = 2;
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "imperium-crawl",
- "version": "1.1.8",
+ "version": "1.1.9",
  "description": "Open-source MCP server with Firecrawl-like scraping, crawling, search, and custom skills",
  "type": "module",
  "bin": {