@apmantza/greedysearch-pi 1.8.4 → 1.8.6

package/CHANGELOG.md CHANGED
@@ -1,188 +1,347 @@
1
- # Changelog
2
-
3
- ## v1.8.4 (2026-04-27)
4
-
5
- ### Fixes
6
- - **Double-escaped enum params (issue #2)** — `pi-coding-agent` v0.70.2 wraps string enum values in extra quotes (e.g. `"all"` → `"\"all\""`) before validation, causing `greedy_search`, `deep_research`, and `coding_task` to reject every call with a validation error. Fixed by switching `engine`, `depth`, and `mode` parameters from strict `Type.Union([Type.Literal(...)])` to `Type.String()` (so the call passes validation), then stripping the extra quotes in each handler via a shared `stripQuotes()` utility.
7
-
8
- ### Tests
9
- - **Unit tests added** — `node test.mjs unit` runs 13 fast, Chrome-free tests covering `stripQuotes` and param normalization for all affected tools. Included in `quick` and `smoke` modes.
10
- - **CI now runs unit tests** — GitHub Actions workflow runs `node test.mjs unit` after install on all three OS targets (ubuntu, windows, macos).
11
-
12
- ## v1.8.3 (2026-04-24)
13
-
14
- ### Fixes
15
- - **Perplexity extraction fixed** The copy button selector was returning the first matching button ("Copy question") instead of the answer copy button. Changed `.find()` to `.filter().pop()` to get the last matching button, which correctly copies the answer text. Fixes `--full` flag returning only the query text instead of the full answer.
16
-
17
- ### Features
18
- - **Reddit JSON API support** Reddit post URLs now use Reddit's public `.json` API instead of HTML scraping. Gets structured post data + top comments with nesting. Falls back to HTTP fetch if API fails.
19
-
20
- ## v1.8.2 (2026-04-20)
21
-
22
- ### Cross-Platform Testing
23
- - **Node.js test runner (`test.mjs`)** — Added cross-platform test runner that works on Windows, macOS, and Linux without requiring bash. Runs smoke tests, quick tests, and edge case tests.
24
- - **Updated npm scripts** — `npm test` now runs the Node.js test runner (was bash-only). Original bash tests available via `npm run test:bash`.
25
-
26
- ### Project Metadata
27
- - **Added `engines` field** — Package now specifies `node: ">=20.11.0"` requirement for `import.meta.dirname` support.
28
- - **Updated README** — Added Testing section documenting both Node.js and bash test runners, clarified Node.js 20.11.0+ requirement.
29
-
30
- ## v1.8.0 (2026-04-16)
31
-
32
- ### Fixes
33
- - **`cdpAvailable()` missing `baseDir` argument** — two callsites in `index.ts` (session_start handler and coding_task handler) were calling `cdpAvailable()` without the required `baseDir` parameter, producing an incorrect path (`join(undefined, "bin", "cdp.mjs")`). Both now pass `__dir` so the CDP check resolves against the correct package directory.
34
- - **Duplicated `ENGINES` map removed** — `ENGINES` was defined identically in both `src/search/constants.mjs` and `src/search/engines.mjs`. Now `engines.mjs` imports and re-exports from `constants.mjs`, keeping a single canonical source and eliminating sync drift risk.
35
- - **`ALL_ENGINES` sync comment** — added a `// Keep in sync with src/search/constants.mjs` comment on the `ALL_ENGINES` tuple in `shared.ts` so future maintainers know where the canonical definition lives.
36
-
37
- ## v1.7.7 (2026-04-14)
38
-
39
- ### Fixes
40
- - **`--deep` flag leaking into queries** — `depth: "deep"` was passing `--deep` as a bare flag to `search.mjs`, which didn't recognize it and appended it to the query string. Fixed by passing `--depth deep` instead; also added `--deep` as a recognized flag in `search.mjs` for backward compatibility with the legacy `deep_research` tool.
41
- - **GitHub fetch always failing** — `git clone` was being `await`-ed on a non-Promise `ChildProcess` object (Node `execFile` is callback-based), so the clone never actually completed and content was always empty. Replaced git clone entirely with GitHub REST API calls: repo info + README + file tree fetched via parallel HTTP requests (~2-5s vs 30-60s, no git dependency). Non-existent repos now correctly return `ok: false`.
42
- - **`--inline` test false negative** — smoke test was interpolating multiline JSON stdout into a `node -e` string, always producing `PARSE_ERROR`. Fixed to write stdout to a temp file and parse from file.
43
-
44
- ### Features
45
- - **Rich source metadata** — HTTP-fetched sources now include `publishedTime`, `lastModified`, `byline`, `siteName`, and `lang`. `publishedTime` is extracted from Readability's parser plus a fallback chain of 8 `<meta>` selectors (Open Graph, schema.org, Dublin Core). All fields flow through to the Gemini synthesis prompt. Gemini is instructed to flag sources older than 2 years as potentially stale in caveats.
46
- - **GitHub Fetch Tests** smoke/edge/quick test modes now include 4 GitHub-specific tests: root repo API fetch (README + tree), blob file via raw URL, blob via HTTP fetcher pipeline, and graceful failure on non-existent repo.
47
-
48
- ## v1.7.6 (2026-04-11)
49
-
50
- ### Fixes
51
- - **Close Gemini synthesis tab** — after synthesis completes, the Gemini tab is now closed instead of merely activated, preventing stale tabs from accumulating across searches.
52
-
53
- ## v1.7.5 (2026-04-10)
54
-
55
- ### Plugin
56
- - **Claude Code plugin** added `.claude-plugin/plugin.json` and `marketplace.json` so GreedySearch can be installed directly as a Claude Code plugin via `claude plugin install`.
57
- - **Auto-mirror GH Action** — every push to `GreedySearch-pi/master` automatically syncs to `GreedySearch-claude/main`, keeping the Claude plugin up to date.
58
- - **Tightened `skill.md`** — removed verbose guidance sections; kept parameters, depth table, and coding_task reference. -72 lines.
59
-
60
- ## v1.7.4 (2026-04-10)
61
-
62
- ### Refactor
63
- - **Shared `waitForCopyButton()`** — consolidated duplicate copy-button polling loops from `bing-copilot`, `gemini`, and `coding-task` into a single `waitForCopyButton(tab, selector, { timeout, onPoll })` in `common.mjs`. Gemini's scroll-to-bottom logic passed as `onPoll` callback.
64
- - **Shared `TIMING` constants** — replaced 30+ scattered `setTimeout` magic numbers with named constants (`postNav`, `postNavSlow`, `postClick`, `postType`, `inputPoll`, `copyPoll`, `afterVerify`) in `common.mjs`.
65
- - **`waitForStreamComplete` improvements** — added `minLength` option and graceful last-value fallback; `google-ai` now uses the shared implementation instead of its own copy.
66
- - **Removed dead code** — deleted unused `_getOrReuseBlankTab` and `_getOrOpenEngineTab` from `bin/search.mjs`; removed unused `STREAM_POLL_INTERVAL` and `STREAM_STABLE_ROUNDS` from `coding-task`.
67
-
68
- ### Fixes
69
- - **Synthesis tab regression** — `getOrOpenEngineTab("gemini")` call during synthesis was broken by the dead-code removal; replaced with `openNewTab()`.
70
-
71
- ## v1.7.3 (2026-04-10)
72
-
73
- ### Fixes
74
- - **Force English in Google AI results** — Added `hl=en` query parameter to Google AI Mode search URL so responses are always returned in English, regardless of the user's IP-based region (fixes #1).
75
-
76
- ## v1.7.2 (2026-04-08)
77
-
78
- ### Release
79
- - **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).
80
-
81
- ## v1.7.1 (2026-04-08)
82
-
83
- ### Performance
84
- - **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep-research fast.
85
-
86
- ### Project structure
87
- - **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
88
- - **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.
89
-
90
- ### Packaging & docs
91
- - **Package file list updated** — npm package now includes `bin/` directly instead of root script entries.
92
- - **README simplified** — rewritten into a shorter, concise format with quick install, usage, and layout guidance.
93
-
94
- ## v1.6.5 (2026-04-04)
95
-
96
- ### Security
97
- - **Private URL blocking** Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
98
-
99
- ### Features
100
- - **GitHub URL rewriting** GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
101
- - **GitHub repo cloning** Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
102
- - **Head+tail content trimming** Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
103
- - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
104
-
105
- ### Fixes
106
- - **Perplexity clipboard retry** Added single retry with 2s delay when clipboard extraction fails, improving reliability.
107
-
108
- ## v1.6.4 (2026-04-02)
109
-
110
- ### Fixes
111
- - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
112
- - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
113
-
114
- ## v1.6.3 (2026-04-02)
115
-
116
- ### Fixes
117
- - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
118
-
119
- ## v1.6.2 (2026-04-01)
120
-
121
- ### Fixes
122
- - **Anti-bot detection evasion** — Gemini synthesis now performs gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
123
-
124
- ## v1.6.1 (2026-03-31)
125
-
126
- ### Features
127
- - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
128
-
129
- ### Code Quality
130
- - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
131
- - `src/formatters/coding.ts` — coding task formatting
132
- - `src/formatters/results.ts` — search and deep research formatting
133
- - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
134
- - `src/formatters/synthesis.ts` — synthesis rendering
135
- - `src/utils/helpers.ts` shared formatting utilities
136
- - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
137
- - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
138
-
139
- ### Documentation
140
- - Clarified `greedy_search` is WEB SEARCH ONLY removed "NOT for codebase search" from tool description (still in skill documentation)
141
-
142
- ## v1.6.0 (2026-03-29)
143
-
144
- ### Breaking Changes (Backward Compatible)
145
- - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
146
- - `fast`: single engine (~15-30s)
147
- - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
148
- - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
149
- - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
150
- - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
151
- - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
152
-
153
- ### Documentation
154
- - Updated README with new `depth` parameter and examples
155
- - Updated skill documentation (SKILL.md) to reflect simplified API
156
-
157
- ## v1.5.1 (2026-03-29)
158
-
159
- - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
160
-
161
- ## v1.5.0 (2026-03-29)
162
-
163
- ### Features
164
- - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
165
- - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
166
- - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
167
- - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
168
-
169
- ### Notable
170
- - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
171
-
172
- ## v1.4.2 (2026-03-25)
173
-
174
- - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
175
- - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
176
- - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
177
-
178
- ## v1.4.1
179
-
180
- - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
181
-
182
- ## v1.4.0
183
-
184
- - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
185
- - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
186
- - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
187
- - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
188
- - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
1
+ # Changelog
2
+
3
+ ## v1.8.6 (2026-05-04)
4
+
5
+ ### Bing Copilot: Headless Cloudflare Recovery
6
+
7
+ - **Auto-retry triggers on all Bing failures** — Error pattern expanded from `input not found|verification` to include `clipboard` failures, so any extraction failure triggers the visible Chrome recovery.
8
+ - **Clipboard retry** — `bing-copilot.mjs` now retries clipboard extraction once with a 2s delay, matching the Perplexity extractor pattern.
9
+ - **Cloudflare detection** — If the clipboard is empty and the AI copy button is hidden, the extractor checks the accessibility tree for Cloudflare challenge text and logs it explicitly for faster diagnosis.
10
+ - **DOM extraction fallback** — If clipboard fails and the copy button is missing (headless anti-bot behavior), attempts direct text extraction from the `copilot.fun` blob: iframe chain via CDP targets. Falls through to the visible auto-retry if Cloudflare blocks the iframe.
11
+ - **Investigation confirmed** — In headless mode, Copilot renders the AI response inside a `copilot.fun` → blob: iframe sandbox with a Cloudflare Turnstile challenge. The `copy-ai-message-button` (`data-testid`) is hidden. Content is unreachable from both the main frame JS (cross-origin) and CDP iframe traversal (Cloudflare blocks load). The only viable path is visible Chrome recovery — once cookies are cached in the profile, subsequent headless searches pass transparently.
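
The expanded retry trigger from the first bullet in this section can be sketched as a single pattern test. This is a minimal illustration under stated assumptions, not the code in `search.mjs`: the helper name `needsVisibleRetry` and the case-insensitive flag are hypothetical, and only the three alternatives come from the entry itself.

```javascript
// Hypothetical helper: decides whether a Bing extractor error should
// trigger the visible-Chrome retry. The three alternatives mirror the
// changelog entry; the "i" flag is an assumption.
const BING_RETRY_PATTERN = /input not found|verification|clipboard/i;

function needsVisibleRetry(errorMessage) {
  // Coerce to string so null/undefined errors never match by accident.
  return BING_RETRY_PATTERN.test(String(errorMessage ?? ""));
}
```

A call such as `needsVisibleRetry(err.message)` would then gate the visible relaunch, while unrelated failures (for example a plain timeout) would not.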
12
+
13
+ ### Visible Chrome Recovery
14
+
15
+ - **Mode-aware `ensureChrome()`** — `src/search/chrome.mjs` now reads a mode marker file (`greedysearch-chrome-mode`) written by `launch.mjs`. When `GREEDY_SEARCH_VISIBLE=1` and Chrome is running headless, it kills and relaunches in visible mode with a forced relaunch guard (always relaunches after kill, even if port wasn't freed).
16
+ - **`launch.mjs` mode check on reuse** — When Chrome is already running and visible is requested (`GREEDY_SEARCH_VISIBLE=1`), checks the mode file. If headless, kills the running instance and launches visible instead of reusing.
17
+ - **Mode file cleanup** — Mode marker file cleaned on `--kill`, ghost cleanup, and idle timeout kill.
18
+ - **`bin/launch-visible.mjs`** — Standalone visible Chrome launcher. Nukes any process on port 9222 (by PID file + port scan), launches Chrome without `--headless`, and writes `"visible"` to the mode file. No ghost cleanup complexity, no mode switching — fire-and-forget visible Chrome.
19
+ - **`bin/visible.mjs`** — Convenience wrapper: kills headless, then launches visible (delegates to `launch-visible.mjs`).
20
+ - **Progress notification** — When the auto-retry launches visible Chrome for manual Cloudflare verification, a `PROGRESS:bing:needs-human` line is emitted to stderr. The progress tracker renders `🔓 bing needs manual verification` in the Pi UI.
21
+ - **Idle cleanup preserves mode** — Headless idle timeout cleanup now also removes the mode marker file.
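
The mode check described in this section boils down to one decision: relaunch only when visible mode is requested and the running instance is not already visible. A rough sketch follows, assuming a pure helper named `shouldRelaunchVisible` (hypothetical) and assuming that a missing or non-`"visible"` marker should be treated as headless; the real logic in `chrome.mjs` and `launch.mjs` may differ.

```javascript
// Hypothetical decision helper (not the package's actual function).
//   env             - environment-like object, e.g. process.env
//   modeFileContent - text read from the mode marker file, or null if absent
//   chromeRunning   - whether a Chrome instance is already up on the debug port
function shouldRelaunchVisible(env, modeFileContent, chromeRunning) {
  const wantsVisible = env.GREEDY_SEARCH_VISIBLE === "1";
  if (!wantsVisible || !chromeRunning) return false;
  // Assumption: anything other than "visible" (including a missing marker)
  // means the running instance is headless and must be replaced.
  return (modeFileContent ?? "").trim() !== "visible";
}
```

Keeping the decision free of file and process access makes it easy to test separately from the kill-and-relaunch side effects.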
22
+
23
+ ### Security & Robustness
24
+
25
+ - **Chrome process cleanup hardening** — `launch-visible.mjs` uses `taskkill /F /PID X /T` (process tree kill) on Windows to prevent orphan renderer processes. Repeated up to 5s until port 9222 is confirmed free.
26
+ - **Zombie Chrome prevention** — `launch.mjs` and `chrome.mjs` now clean up the mode marker and PID file consistently across all kill paths (--kill, ghost cleanup, idle timeout).
27
+
28
+ ### Added
29
+
30
+ - **`google-search` engine** — plain Google search extractor (locale-agnostic, `textarea[name="q"]`). Returns title/URL/snippet for traditional 10-blue-link results. Aliases: `gs`, `googlesearch`.
31
+
32
+ ### Headless Mode (default)
33
+
34
+ - **Chrome now runs headless by default** — no window, no GUI, purely background. Set `GREEDY_SEARCH_VISIBLE=1` to show the browser window.
35
+ - **Anti-detection stealth** — Patches injected via `Page.addScriptToEvaluateOnNewDocument` (runs before any page JS):
36
+ - `Runtime.enable` / CDP marker deletion (`__REBROWSER_*`, `__nightmare`, `__phantom`, etc.)
37
+ - `navigator.webdriver` → `false`, `navigator.plugins` → realistic list, `navigator.languages` → `['en-US', 'en']`
38
+ - `window.chrome` shim, WebGL vendor → Intel Iris, `hardwareConcurrency` → 8, `deviceMemory` → 8
39
+ - `TrustedTypes` policy, `requestAnimationFrame` keep-alive (prevents headless stall detection)
40
+ - `--disable-blink-features=AutomationControlled`, realistic `--user-agent`, `--window-size=1920,1080`
41
+ - **Human click simulation** — All verification/clicks now use CDP `Input.dispatchMouseEvent` with multi-event `mouseMoved→pressed→released`, ±3px coordinate jitter, and random delays (80–180ms hover, 30–90ms hold). Detection scripts return element selectors instead of clicking in-page; `handleVerification` performs human clicks via `humanClickElement()`/`humanClickXY()`. Applies to Turnstile iframes, reCAPTCHA, Cloudflare challenges, Microsoft auth, Copilot modals, and all generic verify/continue buttons.
42
+ - **Idle auto-cleanup** — Headless Chrome auto-killed after `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` (default 5 min) of inactivity. Kills only the PID-tracked instance on port 9222 and never touches the main Chrome session. Activity timestamp written at search start and end.
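
The human click simulation above can be illustrated by building the three parameter objects that would be passed to CDP `Input.dispatchMouseEvent`. This is only a sketch: `buildHumanClick` and its return shape are hypothetical, and the random source is injected as `randInt(min, max)` (half-open range, like Node's `crypto.randomInt`) so the example does not assume how the package wires its RNG.

```javascript
// Hypothetical sketch: plans a jittered click as CDP Input.dispatchMouseEvent
// parameter objects (mouseMoved -> mousePressed -> mouseReleased).
// randInt(min, max) must return an integer n with min <= n < max.
function buildHumanClick(x, y, randInt) {
  const jitter = () => randInt(0, 7) - 3; // integer in -3..3 px
  const cx = x + jitter();
  const cy = y + jitter();
  return {
    hoverDelayMs: randInt(80, 181), // 80-180 ms between move and press
    holdDelayMs: randInt(30, 91),   // 30-90 ms between press and release
    events: [
      { type: "mouseMoved", x: cx, y: cy },
      { type: "mousePressed", x: cx, y: cy, button: "left", clickCount: 1 },
      { type: "mouseReleased", x: cx, y: cy, button: "left", clickCount: 1 },
    ],
  };
}
```

A caller would send each entry of `events` in order, sleeping for `hoverDelayMs` after the move and `holdDelayMs` after the press.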
43
+
44
+ ### Performance
45
+
46
+ - **Timeouts cut ~40–50%** across all extractors (typical search ~60–90s → ~30–45s):
47
+ - `TIMING`: postNav 1500→800ms, postNavSlow 2000→1000ms, postClick 400→250ms, postType 400→250ms, inputPoll 400→300ms, copyPoll 600→400ms, afterVerify 3000→2000ms
48
+ - Defaults: waitForCopyButton 60s→30s, waitForStreamComplete 30s→20s, handleVerification 60s→30s
49
+ - Per-extractor: Google stream 45s→30s, Gemini copyButton 120s→60s + inputDeadline 10s→8s, Perplexity inputDeadline 8s→5s + stream 30s→20s, Bing verification 90s→30s + copyButton 60s→30s
50
+ - Engine process timeout: 90s→60s (180s→120s Gemini)
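
To make the `TIMING` changes concrete, the before/after values listed above can be tabulated and the per-constant reduction computed. The values are copied from this entry; the object shape, the frozen `TIMING` constant, and the `reductionPct` helper are assumptions for illustration and not the contents of `common.mjs`.

```javascript
// Before/after values in milliseconds, copied from the TIMING entry above.
const TIMING_CHANGES = {
  postNav: [1500, 800],
  postNavSlow: [2000, 1000],
  postClick: [400, 250],
  postType: [400, 250],
  inputPoll: [400, 300],
  copyPoll: [600, 400],
  afterVerify: [3000, 2000],
};

// New constants only; the frozen-object shape is an assumption.
const TIMING = Object.freeze(
  Object.fromEntries(
    Object.entries(TIMING_CHANGES).map(([name, [, after]]) => [name, after])
  )
);

// Percentage reduction, rounded to one decimal place.
function reductionPct(before, after) {
  return Math.round(((before - after) / before) * 1000) / 10;
}

for (const [name, [before, after]] of Object.entries(TIMING_CHANGES)) {
  console.log(`${name}: ${before}ms -> ${after}ms (-${reductionPct(before, after)}%)`);
}
```

The individual constants shrink by different amounts, so the headline figure is best read as describing end-to-end search time rather than every single delay.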
51
+
52
+ ### Security
53
+
54
+ - **SonarCloud security hotspots fixed** — Two open hotspots resolved:
55
+ - _Weak cryptography (S2245)_ in `extractors/consent.mjs`: replaced `Math.random()` with `crypto.randomInt()` for the mouse-jitter RNG. Not actually security-sensitive (used only for ±3px jitter and timing delays), but compliant now.
56
+ - _PATH injection (S4036)_ in `src/search/chrome.mjs`: `spawn("node", ...)` replaced with `spawn(process.execPath, ...)` so the launcher doesn't rely on the `PATH` environment variable.
57
+ - **Query/prompt leakage prevention** — Queries and synthesis prompts no longer appear in OS process tables. All `spawn()` calls now pipe query/prompt through stdin via `--stdin` flag instead of command-line arguments. Affects `runSearch`, `runExtractor`, `synthesizeWithGemini`, and all 5 extractors (`perplexity`, `bing-copilot`, `google-ai`, `google-search`, `gemini`).
58
+
59
+ ### Visual
60
+
61
+ - **Redesigned banner** — Cleaner SVG layout with pi logo icon, no text, no lens graphic. Gemini Synthesizer pill badge integrated. Three design iterations landed on a minimal icon-only look (`docs/banner.svg`).
62
+
63
+ ### Fixed
64
+
65
+ - **Gemini & Bing copy button race condition** — Both extractors were capturing the user's query instead of the AI's answer. Root cause: `document.querySelector()` returns the first copy button in DOM order, which is the user's echoed message (above the assistant's response). For short queries this triggers instantly. Fixed by: (1) replacing `waitForCopyButton` with `waitForStreamComplete` to ensure the response finishes streaming before copying, and (2) clicking the **last** copy button (`querySelectorAll` + `[length-1]`) instead of the first, matching Perplexity's proven pattern. Also added periodic scroll-to-bottom alongside stream wait for Gemini to trigger lazy-loaded content.
66
+ - **Progress tracker shows false ✅ for errors** — `makeProgressTracker` in `shared.ts` completely ignored the `status` parameter, always showing `✅ done` for every engine. Now correctly tracks per-engine status and shows `❌ failed` when an engine errors.
67
+ - **Synthesis echoes engine JSON when engines fail** — When Perplexity/Bing fail, Gemini was echoing the engine summary JSON back as its "answer". `synthesis-runner.mjs` now detects this pattern (engine keys without synthesis fields) and treats it as a parse failure, falling back to individual engine results.
68
+ - **`headless=false` parameter ignored** — The `--headless` flag was never checked by `search.mjs` or `launch.mjs`; they only read `GREEDY_SEARCH_VISIBLE`. `shared.ts` now propagates the visibility preference via the env var when `headless=false` is passed.
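
The echo detection mentioned for `synthesis-runner.mjs` ("engine keys without synthesis fields") might look roughly like the check below. Everything named here is hypothetical: the changelog does not list the actual engine keys or synthesis field names, so `ENGINE_KEYS`, `SYNTHESIS_FIELDS`, and `looksLikeEngineEcho` are placeholders chosen for the sketch.

```javascript
// Assumed key names; the real runner may use different ones.
const ENGINE_KEYS = ["perplexity", "bing", "google"];
const SYNTHESIS_FIELDS = ["answer", "caveats"];

// Returns true when a parsed Gemini reply looks like the engine summary
// echoed back: it carries engine keys but no synthesis fields.
function looksLikeEngineEcho(parsed) {
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return false;
  }
  const keys = Object.keys(parsed);
  const hasEngineKey = keys.some((k) => ENGINE_KEYS.includes(k));
  const hasSynthesisField = keys.some((k) => SYNTHESIS_FIELDS.includes(k));
  return hasEngineKey && !hasSynthesisField;
}
```

When the check returns `true`, the caller would treat the reply as a parse failure and fall back to the individual engine results, as the entry describes.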
69
+
70
+ ### Cloudflare / Verification Recovery
71
+
72
+ - **Auto-recovery from Cloudflare blocks** — When Perplexity (`#ask-input` not found) or Bing (`input not found` / `verification required`) fail in headless mode, `search.mjs` now:
73
+ 1. Detects the Cloudflare/verification error pattern
74
+ 2. Kills headless Chrome, relaunches in visible mode
75
+ 3. Retries the blocked engines — Cloudflare bypasses, cookies stored in Chrome profile
76
+ 4. Kills visible Chrome, relaunches headless
77
+ 5. Continues remaining pipeline (source fetch, synthesis)
78
+ 6. Cookies persist — subsequent headless searches pass transparently
79
+
80
+ ### Removed
81
+
82
+ - **`coding_task` tool removed** — `bin/coding-task.mjs`, `src/formatters/coding.ts`, registration deleted (644 lines).
83
+ - **`deep_research` tool removed** — handler, test, and `formatDeepResearch` + helpers deleted (521 lines). Use `greedy_search` with `depth: "deep"`.
84
+ - **Minimize debug logs** — Removed 9 verbose `[minimize]` console.log statements from launch.mjs.
85
+
86
+ ### Fixes
87
+
88
+ - **Code scanning alerts resolved (5 alerts)** — (1) Added `permissions: contents: read` to `sync-to-webaio.yml` workflow (#14). (2) Fixed backslash escaping in `consent.mjs`'s `humanClickElement` selector injection (#10) — selectors containing backslashes (e.g., `\"`) weren't properly escaped before DOM injection. (3) Fixed same backslash escaping in `google-search.mjs`'s `SEARCH_BOX` selector in 3 locations (#11-13).
89
+ - **`cdp.mjs` `getPages()` filter** — Allows `chrome://newtab/` (headless Chrome default initial tab). Prevents "No Chrome tabs found" on cold start.
90
+
91
+ ### Security
92
+
93
+ - **SonarCloud: Log injection vulnerability (1 alert)** — `bin/launch.mjs` no longer logs the raw WebSocket debugger URL (user-controlled data). Replaced with a static "WebSocket URL received" message to prevent query/URL content from leaking into logs.
94
+
95
+ ### Code Quality
96
+
97
+ - **SonarCloud batch fixes (~52 issues resolved)** across 16 source files:
98
+ - `S7781` — Replaced 18 `String#replace()` calls with `String#replaceAll()` for global replacements (regex → literal where applicable).
99
+ - `S1128` — Removed 15 unused imports (`dirname`, `join`, `relative`, `spawn`, `tmpdir`, `existsSync`, `shouldUseBrowser`, `closeTabs`, `cdp`, `openNewTab`, `closeTab`, `activateTab`, `trimText`).
100
+ - `S7773` — Migrated 11 `parseInt`/`parseFloat` calls to `Number.parseInt`/`Number.parseFloat`.
101
+ - `S7780` — Wrapped 8 CDP eval templates containing backslash sequences in `String.raw()` to eliminate double-escaping.
102
+ - `S7735` — Eliminated 13 negated-condition ternaries by inverting the conditional logic (`!== -1 ? ... : null` → `=== -1 ? null : ...`).
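
For readers unfamiliar with these rule IDs, the rewrites listed above follow a few mechanical patterns. The snippet below is a generic illustration with made-up function names (`parsePort`, `tailFrom`, `flatten`, `DIGITS`); it is not code taken from the package.

```javascript
// Flagged styles, shown as comments for comparison:
//   const n = parseInt(raw, 10);
//   const tail = idx !== -1 ? text.slice(idx) : null;
//   const flat = text.replace(/\n/g, " ");
//   const digits = "\\d+";

// S7773: use the Number.* namespaced parser.
function parsePort(raw) {
  return Number.parseInt(raw, 10);
}

// S7735: test the positive condition first instead of negating it.
function tailFrom(text, marker) {
  const idx = text.indexOf(marker);
  return idx === -1 ? null : text.slice(idx);
}

// S7781: replaceAll with a literal instead of replace with a global regex.
function flatten(text) {
  return text.replaceAll("\n", " ");
}

// S7780: String.raw keeps backslashes as written, so no double-escaping.
const DIGITS = String.raw`\d+`;
```

Each rewrite keeps behavior unchanged; the gain is readability and fewer escaping mistakes in strings that are later evaluated elsewhere.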
103
+
104
+ ### Security Hotspot Review
105
+
106
+ - **SonarCloud: 20 security hotspots reviewed and marked Safe** — All outstanding hotspots were assessed and resolved in SonarCloud:
107
+ - `S4721` OS Command Injection (×2) — Inputs are hardcoded (`port=9222`) or parsed from system output and validated via `Number.parseInt`. Not user-controlled.
108
+ - `S5852` Regex ReDoS (×10) — Regexes operate on bounded input with negated char classes or short fixed patterns. No practical denial-of-service risk.
109
+ - `S4036` PATH environment variable (×8) — Local CLI extension spawning package-internal Node scripts. PATH is host-controlled; no untrusted input reaches the command.
110
+
111
+ ### Tooling
112
+
113
+ - **SonarCloud configuration** — Added `sonar-project.properties` with exclusions for `test/**`, `test.mjs`, `test.sh`, `test_unit.mjs`, and `scripts/**` so test-only code does not skew source quality metrics.
114
+
115
+ ## v1.8.5 (2026-04-29)
116
+
117
+ ### Security
118
+
119
+ - **CodeQL: Incomplete URL substring sanitization (6 alerts)** — Replaced loose `includes()` / `endsWith()` checks on raw URL strings with proper hostname parsing in `src/github.mjs`, `src/reddit.mjs`, `src/fetcher.mjs`, and `extractors/bing-copilot.mjs`. Prevents bypasses where arbitrary subdomains could spoof trusted domains (e.g. `evilgithub.com`, `reddit.com.evil.com`).
120
+ - **CodeQL: Resource exhaustion (1 alert)** — `cdp loadall` now bounds `intervalMs` to 100–30,000ms to prevent unbounded `setTimeout` durations from untrusted CLI input.
121
+ - **CodeQL: Missing workflow permissions (2 alerts)** — Added explicit `permissions: contents: read` blocks to `.github/workflows/ci.yml` and `.github/workflows/mirror-to-claude.yml`, limiting `GITHUB_TOKEN` scope to the minimum required.
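
The URL sanitization fix in the first bullet of this section replaces substring checks with hostname parsing. A minimal sketch of that idea follows, assuming a hypothetical helper `isHostOrSubdomain`; the actual checks in `src/github.mjs`, `src/reddit.mjs`, and the other files may be structured differently.

```javascript
// Hypothetical helper: true only if rawUrl's hostname is exactly trustedHost
// or a real subdomain of it. Uses the built-in URL parser rather than
// includes()/endsWith() on the raw string.
function isHostOrSubdomain(rawUrl, trustedHost) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }
  return hostname === trustedHost || hostname.endsWith("." + trustedHost);
}
```

The leading dot in the `endsWith` test is what rejects look-alikes such as `evilgithub.com`, and comparing against the parsed hostname rejects `reddit.com.evil.com`.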
122
+
123
+ ### Dependencies
124
+
125
+ - **Dependabot security updates** — Bumped `basic-ftp`, `yaml`, `brace-expansion`, `protobufjs`, `fast-xml-parser`, and `@mozilla/readability` to latest patched versions.
126
+
127
+ ### Tests
128
+
129
+ - **GitHub fetch test fixes** — Corrected ES module import paths and added `'all'` mode to test block conditions so cross-platform test runs pass cleanly.
130
+
131
+ ## v1.8.4 (2026-04-27)
132
+
133
+ ### Fixes
134
+
135
+ - **Double-escaped enum params (issue #2)** — `pi-coding-agent` v0.70.2 wraps string enum values in extra quotes (e.g. `"all"` → `"\"all\""`) before validation, causing `greedy_search`, `deep_research`, and `coding_task` to reject every call with a validation error. Fixed by switching `engine`, `depth`, and `mode` parameters from strict `Type.Union([Type.Literal(...)])` to `Type.String()` (so the call passes validation), then stripping the extra quotes in each handler via a shared `stripQuotes()` utility.
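
Because `Type.String()` no longer rejects unknown values, the handler has to normalize and validate the parameter itself. The sketch below shows one way that could work; it is an assumption-laden illustration, not the package's `stripQuotes()`: the loop that peels repeated quote layers and the `parseDepth` helper are hypothetical, while the `fast`/`standard`/`deep` values come from the v1.6.0 entry.

```javascript
// Hypothetical sketch of a quote-stripping helper.
function stripQuotes(value) {
  if (typeof value !== "string") return value;
  let out = value.trim();
  // Peel matching outer double quotes, possibly more than one layer.
  while (out.length >= 2 && out.startsWith('"') && out.endsWith('"')) {
    out = out.slice(1, -1);
  }
  return out;
}

// Hypothetical handler-side validation replacing the schema-level union.
const DEPTHS = new Set(["fast", "standard", "deep"]);

function parseDepth(raw) {
  const depth = stripQuotes(raw);
  if (!DEPTHS.has(depth)) {
    throw new RangeError(`Unknown depth: ${String(raw)}`);
  }
  return depth;
}
```

With this shape, both `parseDepth("deep")` and the double-escaped form `parseDepth("\"deep\"")` resolve to the same value, and anything else fails loudly.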
136
+
137
+ ### Tests
138
+
139
+ - **Unit tests added** — `node test.mjs unit` runs 13 fast, Chrome-free tests covering `stripQuotes` and param normalization for all affected tools. Included in `quick` and `smoke` modes.
140
+ - **CI now runs unit tests** — GitHub Actions workflow runs `node test.mjs unit` after install on all three OS targets (ubuntu, windows, macos).
141
+
142
+ ## v1.8.3 (2026-04-24)
143
+
144
+ ### Fixes
145
+
146
+ - **Perplexity extraction fixed** — The copy button selector was returning the first matching button ("Copy question") instead of the answer copy button. Changed `.find()` to `.filter().pop()` to get the last matching button, which correctly copies the answer text. Fixes `--full` flag returning only the query text instead of the full answer.
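
The difference between `.find()` and `.filter().pop()` is easy to see on a small array. The example uses plain objects as stand-ins for DOM buttons; the `"Copy"` and `"Share"` labels and the `isCopy` matcher are assumptions, and only `"Copy question"` comes from the entry above.

```javascript
// Stand-ins for the buttons found on the page, in DOM order.
const buttons = [
  { label: "Copy question" }, // belongs to the echoed user query
  { label: "Share" },
  { label: "Copy" },          // assumed label of the answer's copy button
];

// Assumed matcher: any button whose label starts with "Copy".
const isCopy = (button) => button.label.startsWith("Copy");

const firstMatch = buttons.find(isCopy);         // old behavior: first match
const lastMatch = buttons.filter(isCopy).pop();  // new behavior: last match

console.log(firstMatch.label, "|", lastMatch.label);
```

Note that `.filter().pop()` yields `undefined` when nothing matches, so callers still need a guard before clicking.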
147
+
148
+ ### Features
149
+
150
+ - **Reddit JSON API support** — Reddit post URLs now use Reddit's public `.json` API instead of HTML scraping. Gets structured post data + top comments with nesting. Falls back to HTTP fetch if API fails.
151
+
152
+ ## v1.8.2 (2026-04-20)
153
+
154
+ ### Cross-Platform Testing
155
+
156
+ - **Node.js test runner (`test.mjs`)** — Added cross-platform test runner that works on Windows, macOS, and Linux without requiring bash. Runs smoke tests, quick tests, and edge case tests.
157
+ - **Updated npm scripts** — `npm test` now runs the Node.js test runner (was bash-only). Original bash tests available via `npm run test:bash`.
158
+
159
+ ### Project Metadata
160
+
161
+ - **Added `engines` field** — Package now specifies `node: ">=20.11.0"` requirement for `import.meta.dirname` support.
162
+ - **Updated README** — Added Testing section documenting both Node.js and bash test runners, clarified Node.js 20.11.0+ requirement.
163
+
164
+ ## v1.8.0 (2026-04-16)
165
+
166
+ ### Fixes
167
+
168
+ - **`cdpAvailable()` missing `baseDir` argument** — two callsites in `index.ts` (session_start handler and coding_task handler) were calling `cdpAvailable()` without the required `baseDir` parameter, producing an incorrect path (`join(undefined, "bin", "cdp.mjs")`). Both now pass `__dir` so the CDP check resolves against the correct package directory.
169
+ - **Duplicated `ENGINES` map removed** — `ENGINES` was defined identically in both `src/search/constants.mjs` and `src/search/engines.mjs`. Now `engines.mjs` imports and re-exports from `constants.mjs`, keeping a single canonical source and eliminating sync drift risk.
170
+ - **`ALL_ENGINES` sync comment** — added a `// Keep in sync with src/search/constants.mjs` comment on the `ALL_ENGINES` tuple in `shared.ts` so future maintainers know where the canonical definition lives.
171
+
172
+ ## v1.7.7 (2026-04-14)
173
+
174
+ ### Fixes
175
+
176
+ - **`--deep` flag leaking into queries** — `depth: "deep"` was passing `--deep` as a bare flag to `search.mjs`, which didn't recognize it and appended it to the query string. Fixed by passing `--depth deep` instead; also added `--deep` as a recognized flag in `search.mjs` for backward compatibility with the legacy `deep_research` tool.
177
+ - **GitHub fetch always failing** — `git clone` was being `await`-ed on a non-Promise `ChildProcess` object (Node `execFile` is callback-based), so the clone never actually completed and content was always empty. Replaced git clone entirely with GitHub REST API calls: repo info + README + file tree fetched via parallel HTTP requests (~2-5s vs 30-60s, no git dependency). Non-existent repos now correctly return `ok: false`.
178
+ - **`--inline` test false negative** — smoke test was interpolating multiline JSON stdout into a `node -e` string, always producing `PARSE_ERROR`. Fixed to write stdout to a temp file and parse from file.
179
+
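The `await` pitfall behind the GitHub fetch failure can be reproduced without `git` at all. The sketch below uses a hypothetical callback-style function, `runCallbackStyle`, as a stand-in for a callback-based API such as Node's `execFile`; it shows the failure mode and a generic Promise wrapper, not the fix the package actually shipped (which replaced cloning with REST API calls).

```javascript
// Hypothetical callback-style API: returns a handle immediately and reports
// completion later through the callback, like a callback-based child process call.
function runCallbackStyle(callback) {
  const handle = { kind: "handle" };
  setTimeout(() => callback(null, "done"), 10);
  return handle;
}

// Bug pattern: awaiting a non-thenable value resolves almost immediately,
// so the code continues before the callback has fired.
async function broken() {
  let output = "";
  await runCallbackStyle((err, out) => {
    output = out;
  });
  return output; // still empty at this point
}

// Fix pattern: wrap the callback in a Promise so await waits for completion.
function runAsPromise() {
  return new Promise((resolve, reject) => {
    runCallbackStyle((err, out) => (err ? reject(err) : resolve(out)));
  });
}

async function fixed() {
  return await runAsPromise();
}
```

For the real `execFile`, Node's `util.promisify` produces a Promise-returning version, which is the usual way to make it awaitable.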
180
+ ### Features
181
+
182
+ - **Rich source metadata** — HTTP-fetched sources now include `publishedTime`, `lastModified`, `byline`, `siteName`, and `lang`. `publishedTime` is extracted from Readability's parser plus a fallback chain of 8 `<meta>` selectors (Open Graph, schema.org, Dublin Core). All fields flow through to the Gemini synthesis prompt. Gemini is instructed to flag sources older than 2 years as potentially stale in caveats.
183
+ - **GitHub Fetch Tests** — smoke/edge/quick test modes now include 4 GitHub-specific tests: root repo API fetch (README + tree), blob file via raw URL, blob via HTTP fetcher pipeline, and graceful failure on non-existent repo.
184
+
185
+ ## v1.7.6 (2026-04-11)
186
+
187
+ ### Fixes
188
+
189
+ - **Close Gemini synthesis tab** — after synthesis completes, the Gemini tab is now closed instead of merely activated, preventing stale tabs from accumulating across searches.
190
+
191
+ ## v1.7.5 (2026-04-10)
192
+
193
+ ### Plugin
194
+
195
+ - **Claude Code plugin** — added `.claude-plugin/plugin.json` and `marketplace.json` so GreedySearch can be installed directly as a Claude Code plugin via `claude plugin install`.
196
+ - **Auto-mirror GH Action** — every push to `GreedySearch-pi/master` automatically syncs to `GreedySearch-claude/main`, keeping the Claude plugin up to date.
197
+ - **Tightened `skill.md`** — removed verbose guidance sections; kept parameters, depth table, and coding_task reference. -72 lines.
198
+
199
+ ## v1.7.4 (2026-04-10)
200
+
201
+ ### Refactor
202
+
203
+ - **Shared `waitForCopyButton()`** — consolidated duplicate copy-button polling loops from `bing-copilot`, `gemini`, and `coding-task` into a single `waitForCopyButton(tab, selector, { timeout, onPoll })` in `common.mjs`. Gemini's scroll-to-bottom logic passed as `onPoll` callback.
204
+ - **Shared `TIMING` constants** — replaced 30+ scattered `setTimeout` magic numbers with named constants (`postNav`, `postNavSlow`, `postClick`, `postType`, `inputPoll`, `copyPoll`, `afterVerify`) in `common.mjs`.
205
+ - **`waitForStreamComplete` improvements** — added `minLength` option and graceful last-value fallback; `google-ai` now uses the shared implementation instead of its own copy.
206
+ - **Removed dead code** — deleted unused `_getOrReuseBlankTab` and `_getOrOpenEngineTab` from `bin/search.mjs`; removed unused `STREAM_POLL_INTERVAL` and `STREAM_STABLE_ROUNDS` from `coding-task`.
207
+
208
+ ### Fixes
209
+
210
+ - **Synthesis tab regression** — `getOrOpenEngineTab("gemini")` call during synthesis was broken by the dead-code removal; replaced with `openNewTab()`.
211
+
212
+ ## v1.7.3 (2026-04-10)
213
+
214
+ ### Fixes
215
+
216
+ - **Force English in Google AI results** — Added `hl=en` query parameter to Google AI Mode search URL so responses are always returned in English, regardless of the user's IP-based region (fixes #1).
217
+
218
+ ## v1.7.2 (2026-04-08)
219
+
220
+ ### Release
221
+
222
+ - **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).
223
+
224
+ ## v1.7.1 (2026-04-08)
225
+
226
+ ### Performance
227
+
228
+ - **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep-research fast.
229
+
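A bounded worker pool of this kind can be sketched in a few lines. The function names `mapWithConcurrency` and `fetchConcurrency` are hypothetical, and the package's real implementation may differ; only the default of `2` and the `GREEDY_FETCH_CONCURRENCY` variable come from the entry above.

```javascript
// Hypothetical sketch: run `worker` over `items` with at most `limit` in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until the list is exhausted.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => run()));
  return results;
}

// Read the limit from the environment, falling back to 2 when unset or invalid.
function fetchConcurrency(env = process.env) {
  const parsed = Number.parseInt(env.GREEDY_FETCH_CONCURRENCY ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 2;
}
```

Results keep the input order regardless of which fetch finishes first, because each runner writes to the index it claimed.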
230
+ ### Project structure
231
+
232
+ - **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
233
+ - **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.
234
+
235
+ ### Packaging & docs
236
+
237
+ - **Package file list updated** — npm package now includes `bin/` directly instead of root script entries.
238
+ - **README simplified** — rewritten into a shorter, concise format with quick install, usage, and layout guidance.
239
+
240
+ ## v1.6.5 (2026-04-04)
241
+
242
+ ### Security
243
+
244
+ - **Private URL blocking** — Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
245
+
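A check along these lines can be sketched with the built-in `URL` class. The function name `isPrivateUrl` is hypothetical, and the loopback and `172.16.0.0/12` rules are assumptions added for completeness (the latter is the third RFC 1918 range); the entry above only names localhost, 10.x, 192.168.x, and `.local`/`.internal`.

```javascript
// Hypothetical sketch: true when a URL points at a local or private host.
function isPrivateUrl(rawUrl) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return true; // assumption: treat unparseable URLs as blocked
  }

  if (hostname === "localhost" || hostname === "[::1]") return true;
  if (hostname.endsWith(".local") || hostname.endsWith(".internal")) return true;

  // Only apply range checks to literal IPv4 addresses, not names like 10.example.com.
  const ipv4 = hostname.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (ipv4) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    if (a === 127 || a === 10) return true; // loopback (assumed) and 10.0.0.0/8
    if (a === 192 && b === 168) return true; // 192.168.0.0/16
    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12 (assumed)
  }
  return false;
}
```

A hostname check like this does not cover public names that resolve to private addresses; guarding against that would require checking the resolved IP as well.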
246
+ ### Features
247
+
248
+ - **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
249
+ - **GitHub repo cloning** — Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
250
+ - **Head+tail content trimming** — Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
251
+ - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
252
+
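The head+tail trimming described above can be sketched as follows. The function name `trimHeadTail` and the character-based budget are assumptions for illustration; the 75%/25% split and the `[...content trimmed...]` marker come from the entry.

```javascript
// Hypothetical sketch: keep 75% of the budget from the start of the text and
// 25% from the end, joined by a marker. Short texts are returned unchanged.
function trimHeadTail(text, maxChars, marker = "\n[...content trimmed...]\n") {
  if (text.length <= maxChars) return text;
  const headLen = Math.floor(maxChars * 0.75);
  const tailLen = maxChars - headLen;
  const head = text.slice(0, headLen);
  const tail = text.slice(text.length - tailLen);
  return head + marker + tail;
}

const sample = "a".repeat(100) + "b".repeat(100);
console.log(trimHeadTail(sample, 40));
```

In this sketch the marker is added on top of the budget, so the result is slightly longer than `maxChars`.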
253
+ ### Fixes
254
+
255
+ - **Perplexity clipboard retry** — Added single retry with 2s delay when clipboard extraction fails, improving reliability.
256
+
257
+ ## v1.6.4 (2026-04-02)
258
+
259
+ ### Fixes
260
+
261
+ - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
262
+ - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
263
+
264
+ ## v1.6.3 (2026-04-02)
265
+
266
+ ### Fixes
267
+
268
+ - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
269
+
270
+ ## v1.6.2 (2026-04-01)
271
+
272
+ ### Fixes
273
+
274
+ - **Anti-bot detection evasion** — Gemini synthesis now performs a gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
275
+
276
+ ## v1.6.1 (2026-03-31)
277
+
278
+ ### Features
279
+
280
+ - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
281
+
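The default described above amounts to a small decision rule. The sketch below is an assumption about how it could be expressed; `resolveFullAnswer` is a hypothetical name, and treating a missing `engine` as `"all"` is also an assumption.

```javascript
// Hypothetical sketch of the fullAnswer default.
function resolveFullAnswer({ engine = "all", fullAnswer } = {}) {
  // An explicit true/false always wins.
  if (typeof fullAnswer === "boolean") return fullAnswer;
  // Single engine: full answer. Multi-engine ("all"): truncated preview.
  return engine !== "all";
}

console.log(resolveFullAnswer({ engine: "perplexity" }));
console.log(resolveFullAnswer({ engine: "all" }));
console.log(resolveFullAnswer({ engine: "all", fullAnswer: true }));
```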
282
+ ### Code Quality
283
+
284
+ - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
285
+ - `src/formatters/coding.ts` — coding task formatting
286
+ - `src/formatters/results.ts` — search and deep research formatting
287
+ - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
288
+ - `src/formatters/synthesis.ts` — synthesis rendering
289
+ - `src/utils/helpers.ts` — shared formatting utilities
290
+ - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
291
+ - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
292
+
293
+ ### Documentation
294
+
295
+ - Clarified `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from tool description (still in skill documentation)
296
+
297
+ ## v1.6.0 (2026-03-29)
298
+
299
+ ### Breaking Changes (Backward Compatible)
300
+
301
+ - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
302
+ - `fast`: single engine (~15-30s)
303
+ - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
304
+ - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
305
+ - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
306
+ - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
307
+ - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
308
+
309
+ ### Documentation
310
+
311
+ - Updated README with new `depth` parameter and examples
312
+ - Updated skill documentation (SKILL.md) to reflect simplified API
313
+
314
+ ## v1.5.1 (2026-03-29)
315
+
316
+ - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
317
+
318
+ ## v1.5.0 (2026-03-29)
319
+
320
+ ### Features
321
+
322
+ - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
323
+ - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
324
+ - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
325
+ - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
326
+
327
+ ### Notable
328
+
329
+ - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
330
+
331
+ ## v1.4.2 (2026-03-25)
332
+
333
+ - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
334
+ - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
335
+ - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
336
+
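Parsing sources from Markdown links, as described above, can be sketched with a single regular expression. The function name `extractSources`, the exact pattern, and the de-duplication step are assumptions; the extractors' real patterns may differ.

```javascript
// Hypothetical sketch: collect [title](url) links from clipboard Markdown.
function extractSources(markdown) {
  // Title: anything except "]". URL: http(s) up to whitespace or ")".
  const linkPattern = /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g;
  const seen = new Set();
  const sources = [];
  for (const match of markdown.matchAll(linkPattern)) {
    const [, title, url] = match;
    if (seen.has(url)) continue; // skip repeated citations of the same URL
    seen.add(url);
    sources.push({ title, url });
  }
  return sources;
}

const clipboard =
  "See [Node docs](https://nodejs.org/api/) and [MDN](https://developer.mozilla.org/), " +
  "then [Node docs](https://nodejs.org/api/) again.";
console.log(extractSources(clipboard));
```

This simple pattern stops at the first `)` in a URL, so links whose URLs contain parentheses would be cut short.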
337
+ ## v1.4.1
338
+
339
+ - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
340
+
341
+ ## v1.4.0
342
+
343
+ - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
344
+ - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
345
+ - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
346
+ - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
347
+ - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts