webclaw-hybrid-engine-ln 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md ADDED
@@ -0,0 +1,385 @@
# WebClaw Hybrid Engine

**Official repository:** [github.com/ngoclinh15994/webclaw-gateway](https://github.com/ngoclinh15994/webclaw-gateway)

**Save up to 90% on LLM scraping costs with a hybrid stealth pipeline.**
A [Crawlee](https://crawlee.dev/) **Cheerio** fast path for static HTML, plus **Playwright** for SPAs and bot-heavy pages. 100% Node.js, Windows-friendly.

---

## Why WebClaw Hybrid Engine?

Most scraping stacks force a bad tradeoff:
- fast but easily blocked,
- or robust but expensive and slow.

This project uses a **smart hybrid architecture** to get both:

1. **Fast Path (Crawlee Cheerio)**
   HTTP fetch + Cheerio parsing for static pages (≈30s timeout per request).
2. **Stealth Path (Crawlee Playwright)**
   Headless Chromium when the HTML is too small, blocked, or SPA-like (Crawlee browser pool + your synced cookies).
3. **Optional extension bridge**
   Set `WEBCLAW_EXTENSION_WS` to point at a WebSocket that returns `{ "html": "..." }` when Playwright fails.
4. **Token-First Output**
   HTML is purified to Markdown (Readability + Turndown for `article` mode; full-body Turndown for `ecommerce`), then measured with `tiktoken`.

Bottom line: you stop paying to send useless HTML noise into GPT/Claude.
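The tiered routing above can be sketched as a simple fallback chain. This is an illustration only: the real logic lives in `services/gateway/src/orchestrator.js` and drives Crawlee crawlers, whereas `cheerioFetch`, `playwrightFetch`, and `extensionFetch` here are hypothetical stand-ins, and the "too small" heuristic is an assumed threshold.

```javascript
// Heuristic stand-in: tiny documents are usually SPA shells or block pages.
function looksLikeSpaShell(html) {
  return !html || html.length < 1024;
}

// Sketch of the auto-routing: fast path first, then browser, then extension.
async function scrapeAuto(url, { cheerioFetch, playwrightFetch, extensionFetch }) {
  // Tier 1: fast HTTP + Cheerio path.
  const fastHtml = await cheerioFetch(url).catch(() => null);
  if (fastHtml && !looksLikeSpaShell(fastHtml)) {
    return { engine_used: "crawlee_cheerio", html: fastHtml };
  }
  // Tier 2: headless Chromium via Playwright.
  const browserHtml = await playwrightFetch(url).catch(() => null);
  if (browserHtml) {
    return { engine_used: "crawlee_playwright", html: browserHtml };
  }
  // Tier 3: optional extension WebSocket bridge (WEBCLAW_EXTENSION_WS).
  if (extensionFetch) {
    return { engine_used: "extension_ws", html: await extensionFetch(url) };
  }
  throw new Error("All scraping tiers failed");
}
```

Each tier only runs when the previous one fails or returns something that looks unusable, which is what keeps the common case on the cheap path.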

---

## Key Features

- **Hybrid extractor brain (Crawlee)**
  - Auto-routing between **CheerioCrawler** and **PlaywrightCrawler** (≈30s cap per tier).
- **Primary KPI: token reduction**
  - Returns `raw_tokens`, `cleaned_tokens`, `tokens_saved`, `reduction_percentage`.
- **Markdown purification pipeline**
  - `jsdom` + `@mozilla/readability` + `turndown`.
- **Anti-bot fallback support**
  - Handles SPA shells and Cloudflare/captcha-style failures via the browser render path.
- **Cookie sync-ready**
  - Cookie API and `data/cookies.json` support for 1-click sync workflows (including Chrome extension integration).
- **Zero-Docker Node.js setup**
  - `npm install`, `npm run setup` (`npx playwright install chromium`), `npm start` from the repo root.
- **Built-in dashboard (API monitoring)**
  - UI on `http://localhost:8822`: aggregate stats, paginated scrape history, cookie manager, exclude-URL list, OpenClaw skill installer, bilingual toggle (EN/VI).
- **SQLite history at scale**
  - Scrape history is persisted in `data/webclaw.db` (no flat `history.json` bottleneck).
- **User settings (exclude URLs)**
  - `data/settings.json` holds `exclude_urls`; blocked URLs return `EXCLUDED_BY_USER` before any crawl (Cheerio or Playwright).
- **OpenClaw agent integration**
  - One-click auto-install into `~/.openclaw/skills/webclaw_scraper/SKILL.md` (a Markdown skill, not a TS tool).

---

## Architecture

Native Node.js hybrid runtime (no Docker required):

- `services/gateway/src/orchestrator.js`
  - **CheerioCrawler** → **PlaywrightCrawler** → optional **`extensionFallback.js`** (`WEBCLAW_EXTENSION_WS`)
- `services/gateway/src/extensionFallback.js`
  - Optional WebSocket bridge for a Chrome extension (request `{ "type": "scrape", "url": "..." }`, response `{ "html": "..." }`)
- `services/gateway/src/tokenMetrics.js`
  - KPI calculation using `tiktoken`
- `services/gateway/src/db.js`
  - SQLite initialization + migrations + history/stats queries
- `services/gateway/src/settings.js`
  - Reads/writes `data/settings.json`; the scrape path checks the exclude list before orchestration
- `services/gateway/src/templates/openclaw-skill.md`
  - OpenClaw `SKILL.md` template (curl to the local API, `EXCLUDED_BY_USER` fallback rule)

---

## Quick Start

### Requirements

- **Node.js 20+** and **npm**
- **Git** (recommended) — clone from [github.com/ngoclinh15994/webclaw-gateway](https://github.com/ngoclinh15994/webclaw-gateway)

### Install and run (all platforms)

**From npm (no clone):**

```bash
npx webclaw-hybrid-engine-ln
```

Wait until the terminal prints **Ready on port 8822**, then open **http://localhost:8822**.

**From source:** clone the official repo (review the code on GitHub first if you are security-conscious):

```bash
git clone https://github.com/ngoclinh15994/webclaw-gateway.git
cd webclaw-gateway
```

From the **repository root**:

```bash
npm install
npm run setup
npm start
```

- **`npm run setup`** runs `npx playwright install chromium` (required for the Playwright crawler tier).

Dashboard and API: **http://localhost:8822**

### Windows (convenience)

```bat
Start_WebClaw.bat
```

Runs `npm install`, `npm run setup`, then `npm start`.

### macOS / Linux (convenience)

```bash
chmod +x Start_WebClaw.sh
./Start_WebClaw.sh
```

---

## API Reference

### POST `/api/v1/scrape`

Endpoint:

```text
http://localhost:8822/api/v1/scrape
```

Request body:

```json
{
  "url": "https://example.com/article",
  "mode": "auto",
  "extract_mode": "article"
}
```

Optional **`extract_mode`**: `"article"` (default, Readability) or `"ecommerce"` (full body, no Readability). Omit it, or use `"article"`, for most pages.

Modes:
- `auto`: Cheerio first, then Playwright (then the optional extension WS if configured)
- `fast_only`: Cheerio only (errors if the HTML is too small / SPA-like)
- `playwright_only`: Playwright only (then the extension WS if Playwright fails)

If the URL matches any string in `exclude_urls` (substring match, case-insensitive), the API returns immediately with HTTP `400`:

```json
{
  "status": "error",
  "message": "EXCLUDED_BY_USER: This URL is blacklisted by user settings. Please use your default browser tool."
}
```
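The exclusion check described above is a plain case-insensitive substring match. Roughly equivalent sketch (the real implementation lives in `services/gateway/src/settings.js`):

```javascript
// Case-insensitive substring match of the URL against the exclude list.
function isExcludedUrl(url, excludeUrls = []) {
  const haystack = String(url).toLowerCase();
  return excludeUrls.some(
    (pattern) => pattern && haystack.includes(String(pattern).toLowerCase())
  );
}
```

So an entry like `youtube.com` blocks `https://www.YouTube.com/watch?v=...` as well, since no scheme or host parsing is involved.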

Example:

```bash
curl -X POST "http://localhost:8822/api/v1/scrape" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/article","mode":"auto"}'
```

Success response shape:

```json
{
  "status": "success",
  "extract_mode": "article",
  "engine_used": "crawlee_cheerio",
  "engine_label": "Crawlee Hybrid (Cheerio + Playwright)",
  "data": {
    "title": "Page title",
    "markdown": "# Clean markdown"
  },
  "metrics": {
    "raw_tokens": 12000,
    "cleaned_tokens": 900,
    "tokens_saved": 11100,
    "reduction_percentage": 92.5
  }
}
```
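The `metrics` block is simple arithmetic over the two token counts: `tokens_saved = raw_tokens - cleaned_tokens`, and the reduction is that saving as a percentage of `raw_tokens`. A sketch of the calculation (the real `services/gateway/src/tokenMetrics.js` counts the tokens with `tiktoken` first; the one-decimal rounding is an assumption based on the example payload):

```javascript
// Derive the KPI fields from raw vs. cleaned token counts.
function computeMetrics(rawTokens, cleanedTokens) {
  const tokensSaved = Math.max(rawTokens - cleanedTokens, 0);
  const reduction = rawTokens > 0 ? (tokensSaved / rawTokens) * 100 : 0;
  return {
    raw_tokens: rawTokens,
    cleaned_tokens: cleanedTokens,
    tokens_saved: tokensSaved,
    // Round to one decimal place, as in the example above.
    reduction_percentage: Math.round(reduction * 10) / 10
  };
}
```

With the example numbers, 12000 raw and 900 cleaned tokens give 11100 saved and a 92.5% reduction, matching the response shown above.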

### GET `/api/v1/history`

Returns recent history from SQLite, newest first.

Query params:
- `limit` (default `50`, max `200`)
- `offset` (default `0`)

Example:

```bash
curl "http://localhost:8822/api/v1/history?limit=50&offset=0"
```
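Out-of-range values are clamped rather than rejected: `limit` is forced into `[1, 200]` (falling back to `50` when non-numeric) and `offset` is forced to be non-negative. A sketch of that normalization, mirroring the gateway's handler:

```javascript
// Clamp history pagination params: limit to [1, 200] (default 50), offset >= 0.
function normalizePagination(query = {}) {
  const rawLimit = Number(query.limit ?? 50);
  const rawOffset = Number(query.offset ?? 0);
  const limit = Math.min(Math.max(Number.isFinite(rawLimit) ? rawLimit : 50, 1), 200);
  const offset = Math.max(Number.isFinite(rawOffset) ? rawOffset : 0, 0);
  return { limit, offset };
}
```

So `?limit=999&offset=-5` behaves like `?limit=200&offset=0` instead of returning an error.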

### GET `/api/v1/stats`

Returns aggregate stats from SQLite.

Example:

```bash
curl "http://localhost:8822/api/v1/stats"
```

Response:

```json
{
  "status": "success",
  "stats": {
    "total_requests": 1234,
    "total_tokens_saved": 9876543,
    "overall_reduction_percentage": 85.2
  }
}
```

### GET `/api/v1/settings`

Returns user settings (the `exclude_urls` list).

```bash
curl "http://localhost:8822/api/v1/settings"
```

Response:

```json
{
  "status": "success",
  "settings": {
    "exclude_urls": ["youtube.com", "localhost"]
  }
}
```

### POST `/api/v1/settings`

Updates settings. The body must include `exclude_urls` (an array of strings).

```bash
curl -X POST "http://localhost:8822/api/v1/settings" \
  -H "Content-Type: application/json" \
  -d '{"exclude_urls":["youtube.com"]}'
```

### POST `/api/v1/integrate/openclaw`

Automatically installs the OpenClaw skill file into the local user profile.

Behavior:
- detects `~/.openclaw`
- creates `~/.openclaw/skills/webclaw_scraper/` if needed
- writes `SKILL.md` from the OpenClaw skill template

Success:

```json
{
  "status": "success",
  "message": "WebClaw Skill successfully installed into OpenClaw!"
}
```

If OpenClaw is not installed (`~/.openclaw` missing), the endpoint returns `404` with an error message.

---

## Cookie Manager API

### GET `/api/v1/cookies`
Returns cookie entries from `data/cookies.json`.

### POST `/api/v1/cookies`
Stores cookie entries used for the Playwright fallback.

Request example:

```json
{
  "cookies": [
    {
      "domain": "portal.example.com",
      "cookie_string": "session=abc123; cf_clearance=xyz"
    }
  ]
}
```

This format is designed for quick automation and browser-extension sync.
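Before the Playwright tier can replay one of these entries, the `cookie_string` has to be split back into individual cookies. A minimal sketch of that conversion (a hypothetical helper; the package's own normalization in `normalizeCookiesForStorage` may differ, and `path: "/"` is an assumed default):

```javascript
// Split a "name=value; name2=value2" cookie string into Playwright-style
// cookie objects for the entry's domain. Illustrative helper only.
function cookieStringToPlaywright(entry) {
  return entry.cookie_string
    .split(";")
    .map((part) => part.trim())
    .filter(Boolean)
    .map((pair) => {
      const eq = pair.indexOf("=");
      return {
        name: pair.slice(0, eq),
        value: pair.slice(eq + 1),
        domain: entry.domain,
        path: "/"
      };
    });
}
```

The resulting objects have the `{ name, value, domain, path }` shape that Playwright's `context.addCookies()` accepts.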

---

## Optional Docker (deprecated)

Legacy `Dockerfile` / `docker-compose` samples live under **`.deprecated/docker/`** for reference only. The supported workflow is **native Node** (above).

Health check:

```text
http://localhost:8822/health
```

---

## For n8n / Automation Builders

Use `POST /api/v1/scrape` as a standard HTTP node:
- Input: URL + mode
- Output: clean Markdown + the token reduction KPI
- Branch on `engine_used` if you want analytics by path

This is optimized for agents and workflows where token waste directly hits your cloud bill.
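For that per-path analytics branch, tallying responses by `engine_used` is a small reduce; a sketch over collected scrape results:

```javascript
// Tally scrape results per engine, e.g. to chart Cheerio vs. Playwright usage.
function countByEngine(results) {
  return results.reduce((acc, r) => {
    const key = r.engine_used || "unknown";
    acc[key] = (acc[key] || 0) + 1;
    return acc;
  }, {});
}
```

A rising `crawlee_playwright` share is a useful signal that your targets are getting harder to scrape on the cheap path.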

---

## OpenClaw Integration

This repository ships a ready-to-use OpenClaw **skill** (a Markdown `SKILL.md`, installed under `~/.openclaw/skills/webclaw_scraper/`):

- Template source: `services/gateway/src/templates/openclaw-skill.md`
- Skill name: `webclaw-hybrid-engine-ln`
- Auto-install endpoint: `POST /api/v1/integrate/openclaw`
- The skill uses `curl` against `http://localhost:8822/api/v1/scrape` and documents the fallback for when the API returns `EXCLUDED_BY_USER`.

Quick flow:

1. Clone and run the engine from the [official repository](https://github.com/ngoclinh15994/webclaw-gateway) (see **Quick Start** above).
2. Start WebClaw Hybrid Engine (`Start_WebClaw.bat` or `Start_WebClaw.sh`).
3. Click `⚡ Install OpenClaw Skill` in the dashboard (or call the API).
4. Restart OpenClaw so it reloads skills.

Detailed notes:
- `openclaw-skill/README.md`

---

## Project Structure

```text
.
├─ data/
│  ├─ cookies.json
│  ├─ settings.json
│  └─ webclaw.db
├─ scripts/
│  └─ setup.js              # npx playwright install chromium
├─ openclaw-skill/
│  └─ README.md
├─ services/
│  └─ gateway/
│     ├─ src/
│     │  ├─ app.js
│     │  ├─ extensionFallback.js
│     │  ├─ orchestrator.js
│     │  ├─ db.js
│     │  ├─ settings.js
│     │  └─ templates/openclaw-skill.md
│     ├─ ui/
│     └─ package.json
├─ package.json              # workspace root (npm start / setup)
├─ Start_WebClaw.bat
└─ Start_WebClaw.sh
```

---

## Credits

- Crawlee: [Apify Crawlee](https://crawlee.dev/)
- Markdown stack: Mozilla Readability, Turndown, tiktoken

WebClaw Hybrid Engine is a **Node.js** orchestration layer around Crawlee and your local Playwright install.
package/bin/cli.js ADDED
@@ -0,0 +1,17 @@
#!/usr/bin/env node
const { execSync } = require("child_process");
const path = require("path");

console.log("🚀 Booting WebClaw Hybrid Engine...");

try {
  execSync("npx playwright install chromium", { stdio: "ignore" });
} catch {
  console.warn(
    "⚠️ Note: Could not auto-install Playwright browser. If scraping fails, manually run: npx playwright install chromium"
  );
}

console.log("✅ Engine dependencies verified. Starting Gateway...");

require(path.join(__dirname, "../services/gateway/src/app.js"));
package/package.json ADDED
@@ -0,0 +1,45 @@
{
  "name": "webclaw-hybrid-engine-ln",
  "version": "1.0.2",
  "description": "WebClaw Hybrid Engine — privacy-first local bridge for OpenClaw (Crawlee + Playwright).",
  "repository": {
    "type": "git",
    "url": "https://github.com/ngoclinh15994/webclaw-gateway.git"
  },
  "bin": {
    "webclaw-hybrid-engine-ln": "./bin/cli.js"
  },
  "files": [
    "bin",
    "services/gateway/src",
    "services/gateway/ui",
    "services/gateway/package.json",
    "scripts/setup.js"
  ],
  "workspaces": [
    "services/gateway"
  ],
  "scripts": {
    "setup": "node scripts/setup.js",
    "start": "node services/gateway/src/app.js",
    "dev": "nodemon services/gateway/src/app.js",
    "webclaw": "node bin/cli.js"
  },
  "engines": {
    "node": ">=20.0.0"
  },
  "dependencies": {
    "@mozilla/readability": "^0.6.0",
    "better-sqlite3": "^12.8.0",
    "crawlee": "^3.15.3",
    "express": "^4.21.2",
    "jsdom": "^26.1.0",
    "playwright": "^1.53.2",
    "tiktoken": "^1.0.22",
    "turndown": "^7.2.0",
    "ws": "^8.18.0"
  },
  "devDependencies": {
    "nodemon": "^3.1.9"
  }
}
package/scripts/setup.js ADDED
@@ -0,0 +1,4 @@
#!/usr/bin/env node
const { execSync } = require("child_process");

execSync("npx playwright install chromium", { stdio: "inherit" });
package/services/gateway/package.json ADDED
@@ -0,0 +1,25 @@
{
  "name": "webclaw-gateway",
  "version": "0.1.0",
  "description": "WebClaw Hybrid Engine service — stealth web scraping and LLM token reduction (Crawlee + Playwright).",
  "main": "src/app.js",
  "type": "commonjs",
  "scripts": {
    "start": "node src/app.js",
    "dev": "node --watch src/app.js"
  },
  "engines": {
    "node": ">=20.0.0"
  },
  "dependencies": {
    "@mozilla/readability": "^0.6.0",
    "better-sqlite3": "^12.8.0",
    "crawlee": "^3.15.3",
    "express": "^4.21.2",
    "jsdom": "^26.1.0",
    "playwright": "^1.53.2",
    "tiktoken": "^1.0.22",
    "turndown": "^7.2.0",
    "ws": "^8.18.0"
  }
}
package/services/gateway/src/app.js ADDED
@@ -0,0 +1,199 @@
process.env.CRAWLEE_LOG_LEVEL = process.env.CRAWLEE_LOG_LEVEL || "WARNING";

const express = require("express");
const path = require("path");
const os = require("os");
const fs = require("fs/promises");
const fsSync = require("fs");
const { PORT } = require("./config");
const { runOrchestrator } = require("./orchestrator");
const { ensureCookiesFile, writeCookies, readCookies, normalizeCookiesForStorage } = require("./cookies");
const { ensureSettingsFile, readSettings, writeSettings, isExcludedUrl } = require("./settings");
const {
  runMigrations,
  insertScrapeHistory,
  listScrapeHistory,
  countScrapeHistory,
  getAggregateStats
} = require("./db");

const app = express();
app.use(express.json({ limit: "1mb" }));
app.use(express.static(path.resolve(__dirname, "../ui")));

app.get("/health", (_, res) => {
  res.json({ status: "ok" });
});

app.post("/api/v1/scrape", async (req, res) => {
  try {
    const { url, mode = "auto", extract_mode = "article" } = req.body || {};
    if (!url) {
      return res.status(400).json({ status: "error", message: "url is required" });
    }

    const settings = await readSettings();
    if (isExcludedUrl(url, settings.exclude_urls)) {
      return res.status(400).json({
        status: "error",
        message: "EXCLUDED_BY_USER: This URL is blacklisted by user settings. Please use your default browser tool."
      });
    }

    if (!["auto", "fast_only", "playwright_only"].includes(mode)) {
      return res.status(400).json({ status: "error", message: "invalid mode" });
    }
    if (!["article", "ecommerce"].includes(extract_mode)) {
      return res.status(400).json({ status: "error", message: "invalid extract_mode" });
    }
    const result = await runOrchestrator({ url, mode, extract_mode });
    if (result.status === "success" && result.metrics) {
      insertScrapeHistory({
        url,
        engine_used: result.engine_used || "",
        raw_tokens: Number(result.metrics.raw_tokens || 0),
        cleaned_tokens: Number(result.metrics.cleaned_tokens || 0),
        tokens_saved: Number(result.metrics.tokens_saved || 0),
        reduction_percentage: Number(result.metrics.reduction_percentage || 0)
      });
    }
    return res.json(result);
  } catch (error) {
    return res.status(500).json({
      status: "error",
      message: error.message || "Unexpected error"
    });
  }
});

app.get("/api/v1/history", (req, res) => {
  try {
    const requestedLimit = Number(req.query.limit || 50);
    const requestedOffset = Number(req.query.offset || 0);
    const limit = Math.min(Math.max(Number.isFinite(requestedLimit) ? requestedLimit : 50, 1), 200);
    const offset = Math.max(Number.isFinite(requestedOffset) ? requestedOffset : 0, 0);

    const items = listScrapeHistory(limit, offset);
    const total = countScrapeHistory();

    return res.json({
      status: "success",
      items,
      pagination: { limit, offset, total }
    });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.get("/api/v1/stats", (_, res) => {
  try {
    const stats = getAggregateStats();
    return res.json({ status: "success", stats });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.get("/api/v1/integrate/openclaw/status", (_, res) => {
  try {
    const homeDir = os.homedir();
    const openclawRoot = path.join(homeDir, ".openclaw");
    const skillPath = path.join(openclawRoot, "skills", "webclaw_scraper", "SKILL.md");
    const installed = fsSync.existsSync(skillPath);
    const openclawRootExists = fsSync.existsSync(openclawRoot);
    return res.json({ status: "success", installed, openclawRootExists });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.post("/api/v1/integrate/openclaw", async (_, res) => {
  try {
    const homeDir = os.homedir();
    const openclawRoot = path.join(homeDir, ".openclaw");
    const targetDirectory = path.join(openclawRoot, "skills", "webclaw_scraper");
    const targetFile = path.join(targetDirectory, "SKILL.md");
    const templatePath = path.resolve(__dirname, "./templates/openclaw-skill.md");

    try {
      await fs.access(openclawRoot);
    } catch {
      return res.status(404).json({
        status: "error",
        message: "OpenClaw is not installed on this machine."
      });
    }

    await fs.mkdir(targetDirectory, { recursive: true });
    const templateContent = await fs.readFile(templatePath, "utf8");
    await fs.writeFile(targetFile, templateContent, "utf8");

    return res.json({
      status: "success",
      message: "WebClaw Skill successfully installed into OpenClaw!"
    });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.get("/api/v1/settings", async (_, res) => {
  try {
    const settings = await readSettings();
    return res.json({ status: "success", settings });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.post("/api/v1/settings", async (req, res) => {
  try {
    const payload = req.body || {};
    if (!Array.isArray(payload.exclude_urls)) {
      return res.status(400).json({ status: "error", message: "exclude_urls must be an array" });
    }
    const settings = await writeSettings(payload);
    return res.json({ status: "success", settings });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.get("/api/v1/cookies", async (_, res) => {
  try {
    const cookies = await readCookies();
    return res.json({ status: "success", cookies });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

app.post("/api/v1/cookies", async (req, res) => {
  try {
    const { cookies } = req.body || {};
    if (!Array.isArray(cookies)) {
      return res.status(400).json({ status: "error", message: "cookies must be an array" });
    }
    const normalized = normalizeCookiesForStorage(cookies);
    await writeCookies(normalized);
    return res.json({ status: "success", count: normalized.length });
  } catch (error) {
    return res.status(500).json({ status: "error", message: error.message });
  }
});

async function start() {
  await ensureCookiesFile();
  await ensureSettingsFile();
  runMigrations();
  app.listen(PORT, () => {
    console.log(`webclaw-hybrid-engine-ln listening on :${PORT}`);
    console.log(`Ready on port ${PORT}`);
  });
}

start().catch((err) => {
  console.error("Failed to start gateway:", err);
  process.exit(1);
});