@apmantza/greedysearch-pi 1.7.6 → 1.8.0

package/CHANGELOG.md CHANGED
@@ -1,143 +1,161 @@
1
- # Changelog
2
-
3
- ## v1.7.6 (2026-04-11)
4
-
5
- ### Fixes
6
- - **Close Gemini synthesis tab** — after synthesis completes, the Gemini tab is now closed instead of merely activated, preventing stale tabs from accumulating across searches.
7
-
8
- ## v1.7.5 (2026-04-10)
9
-
10
- ### Plugin
11
- - **Claude Code plugin** — added `.claude-plugin/plugin.json` and `marketplace.json` so GreedySearch can be installed directly as a Claude Code plugin via `claude plugin install`.
12
- - **Auto-mirror GH Action** — every push to `GreedySearch-pi/master` automatically syncs to `GreedySearch-claude/main`, keeping the Claude plugin up to date.
13
- - **Tightened `skill.md`** — removed verbose guidance sections; kept parameters, depth table, and coding_task reference. -72 lines.
14
-
15
- ## v1.7.4 (2026-04-10)
16
-
17
- ### Refactor
18
- - **Shared `waitForCopyButton()`** — consolidated duplicate copy-button polling loops from `bing-copilot`, `gemini`, and `coding-task` into a single `waitForCopyButton(tab, selector, { timeout, onPoll })` in `common.mjs`. Gemini's scroll-to-bottom logic passed as `onPoll` callback.
19
- - **Shared `TIMING` constants** — replaced 30+ scattered `setTimeout` magic numbers with named constants (`postNav`, `postNavSlow`, `postClick`, `postType`, `inputPoll`, `copyPoll`, `afterVerify`) in `common.mjs`.
20
- - **`waitForStreamComplete` improvements** — added `minLength` option and graceful last-value fallback; `google-ai` now uses the shared implementation instead of its own copy.
21
- - **Removed dead code** — deleted unused `_getOrReuseBlankTab` and `_getOrOpenEngineTab` from `bin/search.mjs`; removed unused `STREAM_POLL_INTERVAL` and `STREAM_STABLE_ROUNDS` from `coding-task`.
22
-
23
- ### Fixes
24
- - **Synthesis tab regression** — `getOrOpenEngineTab("gemini")` call during synthesis was broken by the dead-code removal; replaced with `openNewTab()`.
25
-
26
- ## v1.7.3 (2026-04-10)
27
-
28
- ### Fixes
29
- - **Force English in Google AI results** — Added `hl=en` query parameter to Google AI Mode search URL so responses are always returned in English, regardless of the user's IP-based region (fixes #1).
30
-
31
- ## v1.7.2 (2026-04-08)
32
-
33
- ### Release
34
- - **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).
35
-
36
- ## v1.7.1 (2026-04-08)
37
-
38
- ### Performance
39
- - **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep-research fast.
40
-
41
- ### Project structure
42
- - **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
43
- - **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.
44
-
45
- ### Packaging & docs
46
- - **Package file list updated** — npm package now includes `bin/` directly instead of root script entries.
47
- - **README simplified** — rewritten into a shorter, concise format with quick install, usage, and layout guidance.
48
-
49
- ## v1.6.5 (2026-04-04)
50
-
51
- ### Security
52
- - **Private URL blocking** — Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
53
-
54
- ### Features
55
- - **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
56
- - **GitHub repo cloning** — Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
57
- - **Head+tail content trimming** — Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
58
- - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
59
-
60
- ### Fixes
61
- - **Perplexity clipboard retry** — Added single retry with 2s delay when clipboard extraction fails, improving reliability.
62
-
63
- ## v1.6.4 (2026-04-02)
64
-
65
- ### Fixes
66
- - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
67
- - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
68
-
69
- ## v1.6.3 (2026-04-02)
70
-
71
- ### Fixes
72
- - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
73
-
74
- ## v1.6.2 (2026-04-01)
75
-
76
- ### Fixes
77
- - **Anti-bot detection evasion** — Gemini synthesis now performs gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
78
-
79
- ## v1.6.1 (2026-03-31)
80
-
81
- ### Features
82
- - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
83
-
84
- ### Code Quality
85
- - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
86
- - `src/formatters/coding.ts` — coding task formatting
87
- - `src/formatters/results.ts` — search and deep research formatting
88
- - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
89
- - `src/formatters/synthesis.ts` — synthesis rendering
90
- - `src/utils/helpers.ts` — shared formatting utilities
91
- - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
92
- - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
93
-
94
- ### Documentation
95
- - Clarified `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from tool description (still in skill documentation)
96
-
97
- ## v1.6.0 (2026-03-29)
98
-
99
- ### Breaking Changes (Backward Compatible)
100
- - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
101
- - `fast`: single engine (~15-30s)
102
- - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
103
- - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
104
- - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
105
- - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
106
- - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
107
-
108
- ### Documentation
109
- - Updated README with new `depth` parameter and examples
110
- - Updated skill documentation (SKILL.md) to reflect simplified API
111
-
112
- ## v1.5.1 (2026-03-29)
113
-
114
- - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
115
-
116
- ## v1.5.0 (2026-03-29)
117
-
118
- ### Features
119
- - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
120
- - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
121
- - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
122
- - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
123
-
124
- ### Notable
125
- - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
126
-
127
- ## v1.4.2 (2026-03-25)
128
-
129
- - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
130
- - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
131
- - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
132
-
133
- ## v1.4.1
134
-
135
- - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
136
-
137
- ## v1.4.0
138
-
139
- - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
140
- - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
141
- - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
142
- - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
143
- - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
1
+ # Changelog
2
+
3
+ ## v1.8.0 (2026-04-16)
4
+
5
+ ### Fixes
6
+ - **`cdpAvailable()` missing `baseDir` argument** — two callsites in `index.ts` (session_start handler and coding_task handler) were calling `cdpAvailable()` without the required `baseDir` parameter, producing an incorrect path (`join(undefined, "bin", "cdp.mjs")`). Both now pass `__dir` so the CDP check resolves against the correct package directory.
7
+ - **Duplicated `ENGINES` map removed** — `ENGINES` was defined identically in both `src/search/constants.mjs` and `src/search/engines.mjs`. Now `engines.mjs` imports and re-exports from `constants.mjs`, keeping a single canonical source and eliminating sync drift risk.
8
+ - **`ALL_ENGINES` sync comment** — added a `// Keep in sync with src/search/constants.mjs` comment on the `ALL_ENGINES` tuple in `shared.ts` so future maintainers know where the canonical definition lives.
9
+
10
+ ## v1.7.7 (2026-04-14)
11
+
12
+ ### Fixes
13
+ - **`--deep` flag leaking into queries** — `depth: "deep"` was passing `--deep` as a bare flag to `search.mjs`, which didn't recognize it and appended it to the query string. Fixed by passing `--depth deep` instead; also added `--deep` as a recognized flag in `search.mjs` for backward compatibility with the legacy `deep_research` tool.
14
+ - **GitHub fetch always failing** — `git clone` was being `await`-ed on a non-Promise `ChildProcess` object (Node `execFile` is callback-based), so the clone never actually completed and content was always empty. Replaced git clone entirely with GitHub REST API calls: repo info + README + file tree fetched via parallel HTTP requests (~2-5s vs 30-60s, no git dependency). Non-existent repos now correctly return `ok: false`.
15
+ - **`--inline` test false negative** — smoke test was interpolating multiline JSON stdout into a `node -e` string, always producing `PARSE_ERROR`. Fixed to write stdout to a temp file and parse from file.
16
+
17
+ ### Features
18
+ - **Rich source metadata** — HTTP-fetched sources now include `publishedTime`, `lastModified`, `byline`, `siteName`, and `lang`. `publishedTime` is extracted from Readability's parser plus a fallback chain of 8 `<meta>` selectors (Open Graph, schema.org, Dublin Core). All fields flow through to the Gemini synthesis prompt. Gemini is instructed to flag sources older than 2 years as potentially stale in caveats.
19
+ - **GitHub Fetch Tests** — smoke/edge/quick test modes now include 4 GitHub-specific tests: root repo API fetch (README + tree), blob file via raw URL, blob via HTTP fetcher pipeline, and graceful failure on non-existent repo.
20
+
21
+ ## v1.7.6 (2026-04-11)
22
+
23
+ ### Fixes
24
+ - **Close Gemini synthesis tab** — after synthesis completes, the Gemini tab is now closed instead of merely activated, preventing stale tabs from accumulating across searches.
25
+
26
+ ## v1.7.5 (2026-04-10)
27
+
28
+ ### Plugin
29
+ - **Claude Code plugin** — added `.claude-plugin/plugin.json` and `marketplace.json` so GreedySearch can be installed directly as a Claude Code plugin via `claude plugin install`.
30
+ - **Auto-mirror GH Action** — every push to `GreedySearch-pi/master` automatically syncs to `GreedySearch-claude/main`, keeping the Claude plugin up to date.
31
+ - **Tightened `skill.md`** — removed verbose guidance sections; kept parameters, depth table, and coding_task reference. -72 lines.
32
+
33
+ ## v1.7.4 (2026-04-10)
34
+
35
+ ### Refactor
36
+ - **Shared `waitForCopyButton()`** — consolidated duplicate copy-button polling loops from `bing-copilot`, `gemini`, and `coding-task` into a single `waitForCopyButton(tab, selector, { timeout, onPoll })` in `common.mjs`. Gemini's scroll-to-bottom logic passed as `onPoll` callback.
37
+ - **Shared `TIMING` constants** — replaced 30+ scattered `setTimeout` magic numbers with named constants (`postNav`, `postNavSlow`, `postClick`, `postType`, `inputPoll`, `copyPoll`, `afterVerify`) in `common.mjs`.
38
+ - **`waitForStreamComplete` improvements** — added `minLength` option and graceful last-value fallback; `google-ai` now uses the shared implementation instead of its own copy.
39
+ - **Removed dead code** — deleted unused `_getOrReuseBlankTab` and `_getOrOpenEngineTab` from `bin/search.mjs`; removed unused `STREAM_POLL_INTERVAL` and `STREAM_STABLE_ROUNDS` from `coding-task`.
40
+
41
+ ### Fixes
42
+ - **Synthesis tab regression** — `getOrOpenEngineTab("gemini")` call during synthesis was broken by the dead-code removal; replaced with `openNewTab()`.
43
+
44
+ ## v1.7.3 (2026-04-10)
45
+
46
+ ### Fixes
47
+ - **Force English in Google AI results** — Added `hl=en` query parameter to Google AI Mode search URL so responses are always returned in English, regardless of the user's IP-based region (fixes #1).
48
+
49
+ ## v1.7.2 (2026-04-08)
50
+
51
+ ### Release
52
+ - **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).
53
+
54
+ ## v1.7.1 (2026-04-08)
55
+
56
+ ### Performance
57
+ - **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep-research fast.
58
+
59
+ ### Project structure
60
+ - **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
61
+ - **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.
62
+
63
+ ### Packaging & docs
64
+ - **Package file list updated** — npm package now includes `bin/` directly instead of root script entries.
65
+ - **README simplified** — rewritten into a shorter, concise format with quick install, usage, and layout guidance.
66
+
67
+ ## v1.6.5 (2026-04-04)
68
+
69
+ ### Security
70
+ - **Private URL blocking** — Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
71
+
72
+ ### Features
73
+ - **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
74
+ - **GitHub repo cloning** — Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
75
+ - **Head+tail content trimming** — Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
76
+ - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
77
+
78
+ ### Fixes
79
+ - **Perplexity clipboard retry** — Added single retry with 2s delay when clipboard extraction fails, improving reliability.
80
+
81
+ ## v1.6.4 (2026-04-02)
82
+
83
+ ### Fixes
84
+ - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
85
+ - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
86
+
87
+ ## v1.6.3 (2026-04-02)
88
+
89
+ ### Fixes
90
+ - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
91
+
92
+ ## v1.6.2 (2026-04-01)
93
+
94
+ ### Fixes
95
+ - **Anti-bot detection evasion** — Gemini synthesis now performs gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
96
+
97
+ ## v1.6.1 (2026-03-31)
98
+
99
+ ### Features
100
+ - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
101
+
102
+ ### Code Quality
103
+ - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
104
+ - `src/formatters/coding.ts` — coding task formatting
105
+ - `src/formatters/results.ts` — search and deep research formatting
106
+ - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
107
+ - `src/formatters/synthesis.ts` — synthesis rendering
108
+ - `src/utils/helpers.ts` — shared formatting utilities
109
+ - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
110
+ - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
111
+
112
+ ### Documentation
113
+ - Clarified `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from tool description (still in skill documentation)
114
+
115
+ ## v1.6.0 (2026-03-29)
116
+
117
+ ### Breaking Changes (Backward Compatible)
118
+ - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
119
+ - `fast`: single engine (~15-30s)
120
+ - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
121
+ - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
122
+ - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
123
+ - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
124
+ - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
125
+
126
+ ### Documentation
127
+ - Updated README with new `depth` parameter and examples
128
+ - Updated skill documentation (SKILL.md) to reflect simplified API
129
+
130
+ ## v1.5.1 (2026-03-29)
131
+
132
+ - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
133
+
134
+ ## v1.5.0 (2026-03-29)
135
+
136
+ ### Features
137
+ - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
138
+ - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
139
+ - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
140
+ - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
141
+
142
+ ### Notable
143
+ - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
144
+
145
+ ## v1.4.2 (2026-03-25)
146
+
147
+ - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
148
+ - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
149
+ - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
150
+
151
+ ## v1.4.1
152
+
153
+ - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
154
+
155
+ ## v1.4.0
156
+
157
+ - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
158
+ - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
159
+ - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
160
+ - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
161
+ - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
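
The head+tail trimming described in the v1.6.5 entry can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name `trimHeadTail`, the character-based budget, and the newlines around the marker are assumptions; only the 75/25 split and the `[...content trimmed...]` marker text come from the changelog.

```javascript
// Hypothetical helper (not taken from the package). Keeps roughly 75% of the
// budget from the start of the text and 25% from the end, joined by the
// marker named in the v1.6.5 changelog entry.
const TRIM_MARKER = "\n[...content trimmed...]\n";

function trimHeadTail(text, maxChars) {
  // Short documents are returned untouched.
  if (text.length <= maxChars) return text;

  // Reserve room for the marker so the result does not exceed maxChars
  // (as long as maxChars is at least the marker length).
  const budget = Math.max(0, maxChars - TRIM_MARKER.length);
  const headLen = Math.floor(budget * 0.75);
  const tailLen = budget - headLen;

  const head = text.slice(0, headLen);
  // Guard against slice(text.length - 0), which would return an empty tail
  // anyway, and against slice(-0), which would return the whole string.
  const tail = tailLen > 0 ? text.slice(text.length - tailLen) : "";
  return head + TRIM_MARKER + tail;
}

const sample = "intro ".repeat(100) + "conclusion ".repeat(20);
const trimmed = trimHeadTail(sample, 200);
console.log(trimmed.length); // 200
```

Measuring the budget in characters keeps the sketch simple; a token-based budget would follow the same shape with a different length function.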
@@ -9,12 +9,16 @@
  // Output (stdout): JSON { engine, task, code: [{language, code}], explanation, raw }
  // Errors go to stderr only.

- import { existsSync, readFileSync, writeFileSync } from "node:fs";
+ import { existsSync, readFileSync, statSync, writeFileSync } from "node:fs";
  import { tmpdir } from "node:os";
+ import { dirname, isAbsolute, join, relative } from "node:path";
  import { fileURLToPath } from "node:url";
  import { cdp, injectClipboardInterceptor, waitForCopyButton } from "../extractors/common.mjs";
  import { dismissConsent, handleVerification } from "../extractors/consent.mjs";

+ const MAX_FILE_SIZE = 50 * 1024; // 50KB per file
+ const MAX_FILES = 5;
+
  const __dir = fileURLToPath(new URL(".", import.meta.url));
  const PAGES_CACHE = `${tmpdir().replace(/\\/g, "/")}/cdp-pages.json`;

@@ -309,6 +313,28 @@ async function main() {
        filePaths.push(args[i + 1]);
      }
    }
+
+   // Validate file paths: limit count, check readability, enforce size
+   if (filePaths.length > MAX_FILES) {
+     process.stderr.write(`Error: too many --file arguments (max ${MAX_FILES})\n`);
+     process.exit(1);
+   }
+   for (const p of filePaths) {
+     if (!existsSync(p)) {
+       process.stderr.write(`Error: file not found: ${p}\n`);
+       process.exit(1);
+     }
+     if (isAbsolute(p) && !p.startsWith(process.cwd())) {
+       process.stderr.write(`Error: file must be within project directory: ${p}\n`);
+       process.exit(1);
+     }
+     const stat = statSync(p);
+     if (stat.size > MAX_FILE_SIZE) {
+       process.stderr.write(`Error: file too large (${Math.round(stat.size / 1024)}KB, max ${MAX_FILE_SIZE / 1024}KB): ${p}\n`);
+       process.exit(1);
+     }
+   }
+
    const fileContext =
      filePaths.length > 0
        ? filePaths
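
The directory check in this hunk only applies to absolute paths and compares them by string prefix, so a relative path such as `../notes.txt`, or a sibling directory whose name shares the prefix, would pass it. A stricter containment test could look like the sketch below. The helper `isInsideDir` is hypothetical and not part of the published package; it uses only `resolve`, `relative`, `isAbsolute`, and `sep` from `node:path`.

```javascript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical helper (not in the package): returns true when `candidate`
// resolves to a location strictly inside `baseDir`.
function isInsideDir(candidate, baseDir) {
  const base = resolve(baseDir);
  // Relative candidates are resolved against the base directory;
  // absolute candidates are taken as-is.
  const target = resolve(base, candidate);
  const rel = relative(base, target);

  // "" means the base directory itself; an absolute result means the target
  // lives on a different root (possible on Windows).
  if (rel === "" || isAbsolute(rel)) return false;

  // A leading ".." segment means the target escapes the base directory.
  // Comparing whole segments avoids rejecting names such as "..hidden".
  return rel !== ".." && !rel.startsWith(`..${sep}`);
}

console.log(isInsideDir("src/index.ts", "/work/project")); // true
console.log(isInsideDir("../secrets.txt", "/work/project")); // false
console.log(isInsideDir("/work/project-old/a.txt", "/work/project")); // false
```

The sketch does not resolve symbolic links; a caller that needs that guarantee would have to canonicalize both paths first, for example with `realpathSync` from `node:fs`.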