@apmantza/greedysearch-pi 1.8.5 → 1.8.7

Files changed (45)
  1. package/CHANGELOG.md +175 -1
  2. package/README.md +173 -98
  3. package/bin/cdp-greedy.mjs +62 -0
  4. package/bin/cdp-headless.mjs +16 -0
  5. package/bin/cdp-visible.mjs +16 -0
  6. package/bin/cdp.mjs +1095 -1005
  7. package/bin/gschrome.mjs +252 -0
  8. package/bin/kill-visible.mjs +15 -0
  9. package/bin/launch-visible.mjs +233 -0
  10. package/bin/launch.mjs +97 -46
  11. package/bin/search.mjs +252 -17
  12. package/bin/visible.mjs +39 -0
  13. package/docs/banner.png +0 -0
  14. package/extractors/bing-copilot.mjs +317 -162
  15. package/extractors/common.mjs +529 -291
  16. package/extractors/consent.mjs +294 -273
  17. package/extractors/gemini.mjs +28 -24
  18. package/extractors/google-ai.mjs +14 -11
  19. package/extractors/google-search.mjs +234 -0
  20. package/extractors/perplexity.mjs +41 -20
  21. package/extractors/selectors.mjs +54 -54
  22. package/index.ts +50 -129
  23. package/package.json +6 -4
  24. package/skills/greedy-search/skill.md +38 -29
  25. package/src/fetcher.mjs +13 -7
  26. package/src/formatters/results.ts +36 -115
  27. package/src/github.mjs +254 -254
  28. package/src/reddit.mjs +221 -221
  29. package/src/search/chrome.mjs +245 -19
  30. package/src/search/constants.mjs +9 -4
  31. package/src/search/defaults.mjs +14 -14
  32. package/src/search/engines.mjs +66 -62
  33. package/src/search/fetch-source.mjs +208 -19
  34. package/src/search/output.mjs +60 -59
  35. package/src/search/recovery.mjs +26 -0
  36. package/src/search/sources.mjs +446 -446
  37. package/src/search/synthesis-runner.mjs +41 -9
  38. package/src/search/synthesis.mjs +223 -223
  39. package/src/tools/greedy-search-handler.ts +124 -54
  40. package/src/tools/shared.ts +186 -136
  41. package/src/types.ts +103 -103
  42. package/test.mjs +534 -435
  43. package/bin/coding-task.mjs +0 -396
  44. package/src/formatters/coding.ts +0 -68
  45. package/src/tools/deep-research-handler.ts +0 -37
package/CHANGELOG.md CHANGED
@@ -1,6 +1,180 @@
1
1
  # Changelog
2
2
 
3
- ## Unreleased
3
+ ## [Unreleased]
4
+
5
+ ### Fixed
6
+
7
+ - **SonarCloud minor vulnerability false positives** — Confirmed both remaining issues are false positives (internal diagnostic logging in `bin/gschrome.mjs` and test debug output in `test/fetcher-cli.mjs`). Verified via full smoke test suite: all 33 unit tests pass, all 4 engines (Perplexity, Bing, Google, Gemini) return results at all depths (fast/standard/deep), CDP safety wrappers correctly enforce mode boundaries.
8
+
9
+ - **SonarCloud security hotspots** (re-verified) — All previously fixed hotspots remain resolved: replaced `spawn("node", ...)` with `spawn(process.execPath, ...)`, replaced `Math.random()` with `crypto.randomInt()`, 19 remaining hotspots confirmed as false positives (hardcoded `execSync` commands, simple regex patterns).
10
+
11
+ ### Fixed
12
+
13
+ - **Headless→visible mode switching** (`src/search/chrome.mjs`) — `ensureChrome()` only handled the case where visible was requested but headless Chrome was running. When headless was requested (the default) but visible Chrome was running, it silently kept visible mode — causing env var mismatches that broke extractors like Perplexity. Now properly detects both directions and kills/relaunches in the correct mode.
14
+
15
+ - **SonarCloud security hotspots** — Replaced `spawn("node", ...)` with `spawn(process.execPath, ...)` in cdp wrapper, `runExtractor`, `synthesizeWithGemini`, and test helper to prevent PATH-based binary substitution. Replaced `Math.random()` with `crypto.randomInt()` in `jitter()` for non-security-sensitive timing variance. Remaining 19 hotspots are verified false positives (hardcoded `execSync` commands, simple regex patterns).
16
+ - **Bing stealth not active on page load** (`src/search/chrome.mjs`) — `injectHeadlessStealth` was fire-and-forget (`.catch(() => {})`). The CDP `Page.addScriptToEvaluateOnNewDocument` command is async — extractors often navigated to Copilot before stealth registered. Cloudflare saw headless fingerprints and blocked the page. Fixed by awaiting stealth for Bing tabs. Perplexity/Google kept fire-and-forget since Perplexity's anti-bot detects the awaited patches.
17
+ - **Bing copy button handler not hydrated** (`extractors/bing-copilot.mjs`) — Copilot's React copy button exists in the DOM before its click handler is bound. `clickCopyAndPollClipboard` clicked too early → clipboard interceptor empty → 13s wasted polling + DOM fallback. Added 800ms hydration delay after `waitForCopyButton`. Solo Bing went from 37-73s → 16s.
18
+ - **Manual verification blocked synthesis** (`bin/search.mjs`) — When Bing/Perplexity needed manual verification after visible recovery, `search.mjs` returned early with `synthesize: false`, discarding all engine results. Now synthesis continues with whichever engines succeeded. Visible Chrome stays open for the user.
19
+ - **Source-fetch crash after visible→headless recovery** (`src/search/fetch-source.mjs`) — After recovery killed/restarted Chrome, stale CDP tab references in parallel source-fetch workers caused "No target matching prefix" crashes. Workers now catch `fetchSourceContent` errors; `fetchSourceContentBrowser` returns error objects instead of throwing.
20
+ - **Progress tracker "🔄 synthesizing" hang** (`src/tools/shared.ts`) — When synthesis was skipped (manual verification), the progress tracker showed "🔄 synthesizing" forever because no `PROGRESS:synthesis:done` was ever emitted. Now handles `done`/`error`/`skipped` synthesis states.
21
+ - **Gemini synthesis eval timeout** (`bin/cdp.mjs`) — CDP daemon `TIMEOUT` was 30s, but `waitForStreamComplete` uses a single `Runtime.evaluate` call that can run 60-90s for long synthesis prompts. Increased to 90s.
22
+
23
+ ### Performance
24
+
25
+ - **Reduced timeouts across all extractors** — Navigation: 35s→20s, verification retry: 30s→10s (Bing/Perplexity), 60s→10s (Gemini/Google), post-nav settle: 1200ms→600ms (Bing), 1200ms→600ms (Gemini). Turnstile never clears in headless, so 30s of retry loops were pure waste.
26
+ - **Hard per-engine timeouts raised** (`bin/search.mjs`) — Fast: 22s→30s, Standard/Deep: 35s→55s. CDP contention from 3 parallel extractors adds overhead that the old budgets didn't account for.
27
+ - **Tab creation split: Bing gets blank+stealth, others pre-seeded** (`src/search/chrome.mjs`) — `Target.createTarget` navigation is less detectable than CDP `Page.navigate` for Perplexity/Google. Bing needs blank tab + awaited stealth to hide headless fingerprints from Copilot's Cloudflare.
28
+
29
+ ### Performance
30
+
31
+ - **Hard per-engine timeouts** (`bin/search.mjs`) — Fast mode: 22s per engine. Standard/deep: 35s per engine. Slow engines are skipped instead of stalling the whole batch. Previously a single slow engine could push `all` searches to 60–90s.
32
+ - **Parallel tab creation** (`bin/search.mjs`, `src/search/chrome.mjs`) — All engine tabs open simultaneously instead of sequential 300ms staggered delays. Tabs are pre-seeded to each engine's homepage so extractors skip redundant initial navigation.
33
+ - **Reduced settle delays** (`extractors/common.mjs`) — `postNav` 1500→800ms, `postNavSlow` 2000→1200ms, `postClick`/`postType` 400→300ms, `afterVerify` 3000→1500ms. Safe because tabs now load the target domain before the extractor even starts.
34
+ - **Higher source-fetch concurrency** (`src/search/constants.mjs`) — Default `GREEDY_FETCH_CONCURRENCY` raised from 2 → 4.
35
+ - **Faster HTTP timeouts** (`src/search/fetch-source.mjs`) — HTTP fetch timeout 15s → 10s, browser fallback settle 1500ms → 800ms.
36
+ - **Non-blocking cleanup** (`bin/search.mjs`) — Removed the 1500ms hard sleep at process exit; `minimizeChrome` now fire-and-forget.
37
+ - **Domain-aware navigation skip** (`extractors/bing-copilot.mjs`, `extractors/perplexity.mjs`, `extractors/google-ai.mjs`) — When a tab is already on the engine's domain (pre-seeded by orchestrator), skip the redundant `cdp nav` call and settle delay.
38
+ - **Fast mode keeps short engine budgets** (`bin/search.mjs`) — Fast mode still uses 22s per-engine extraction timeouts and skips source fetch/synthesis work. Verification recovery can now run in fast mode when Bing/Perplexity are blocked, because returning no result is worse than the retry cost.
39
+
40
+ ### Anti-Bot Detection Hardening (Anti-CDP Evasion)
41
+
42
+ - **Runtime.enable evasion** (`bin/cdp.mjs`) — The primary CDP detection vector (Cloudflare/DataDome watch for `Runtime.consoleAPICalled` timing) has been eliminated. All `Runtime.evaluate` calls now use an explicit `contextId` captured via brief `Runtime.enable` → `Runtime.disable` at daemon startup (~100ms window). No persistent Runtime domain enable for the session. See: rebrowser.net / DataDome research.
43
+ - **Stale PID / ghost Chrome cleanup** (`src/search/chrome.mjs`) — `killChrome()` now uses port-based process detection via `netstat`/`lsof` instead of relying solely on the PID file. Handles ghost processes that hold port 9222 after the tracked PID dies. Old `killHeadlessChrome` kept as backward-compat alias.
44
+ - **Idle cleanup for both modes** (`src/search/chrome.mjs`) — `checkAndKillIdle()` no longer gates on `GREEDY_SEARCH_HEADLESS=1`. Both headless and visible Chrome auto-kill after idle timeout. Disable with `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES=0`.
45
+ - **`--disable-blink-features=AutomationControlled` for visible mode** (`bin/launch.mjs`, `bin/gschrome.mjs`) — Previously headless-only. The flag and `--window-size` now apply to both modes, suppressing `navigator.webdriver` in visible Chrome too.
46
+ - **Stealth injection for visible mode** (`src/search/chrome.mjs`, `extractors/common.mjs`) — Canvas noise, plugin spoofing, `window.chrome.runtime`, and console safening now inject on both headless and visible tabs.
47
+ - **Client Hints consistency** (`src/fetcher.mjs`) — Added `Sec-CH-UA`, `Sec-CH-UA-Mobile`, `Sec-CH-UA-Platform` headers to `DEFAULT_HEADERS`, matching the Chrome 122 user-agent. Inconsistency between UA and Client Hints is a strong bot signal.
48
+ - **Perplexity Cloudflare verification** (`extractors/perplexity.mjs`) — Added `handleVerification` call after navigation. Perplexity was the only engine missing Cloudflare challenge handling (Bing, Gemini, Google AI already had it).
49
+ - **Chrome TLS fetch fallback** (`src/search/fetch-source.mjs`) — New `fetchSourceViaChrome()` uses `Network.loadNetworkResource` (Chrome 124+) to fetch with authentic Chrome TLS/JA3+HTTP/2 fingerprints when Node.js HTTP fails. Zero navigation overhead.
50
+
51
+ ### Added
52
+
53
+ - **`visible` / `alwaysVisible` search options** (`src/tools/greedy-search-handler.ts`, `src/tools/shared.ts`, `bin/search.mjs`) — Agents can now force visible Chrome per call with `visible: true`, `alwaysVisible: true`, or `headless: false`. CLI aliases: `--visible`, `--always-visible`. Global env: `GREEDY_SEARCH_ALWAYS_VISIBLE=1`.
54
+ - **GreedySearch Chrome commands for Pi** (`index.ts`) — Added `/greedy-visible`, `/greedy-status`, and `/greedy-kill` so users do not need to know package install paths to manage the dedicated Chrome instance.
55
+ - **Safe CDP wrappers** (`bin/cdp-greedy.mjs`, `bin/cdp-visible.mjs`, `bin/cdp-headless.mjs`) — Agents can inspect only the dedicated GreedySearch Chrome profile. The wrappers always set `CDP_PROFILE_DIR` and mode-specific wrappers refuse to attach to the wrong mode, preventing accidental main-Chrome pollution.
56
+ - **`bin/kill-visible.mjs`** — Strong visible/port cleanup helper backed by `launch-visible.mjs`'s PID + port nuke path.
57
+ - **`bin/gschrome.mjs`** — Standalone Chrome lifecycle manager: `launch-headless`, `launch-visible`, `kill`, `status`. Port-based PID detection, forces mode switches, writes `DevToolsActivePort` for CDP.
58
+
59
+ ### Fixed
60
+
61
+ - **Single-engine visible recovery** (`bin/search.mjs`) — `engine: "bing"` and `engine: "perplexity"` now perform the same headless → visible retry as `engine: "all"` when blocked by Cloudflare, captcha, timeout, missing input, or clipboard failures.
62
+ - **Bing visible clipboard race** (`extractors/bing-copilot.mjs`) — Waits for the assistant copy button, polls clipboard interception after click, retries copy/poll, then falls back to visible DOM text. Fixes cases where Copilot visibly answered but the extractor returned `Clipboard interceptor returned empty text`.
63
+ - **Manual verification flow** (`bin/search.mjs`, `src/formatters/results.ts`) — If visible retry reaches a human verification challenge, GreedySearch leaves visible Chrome open and returns a clear “solve verification, then rerun” result instead of killing the browser and returning no results.
64
+ - **Visible/headless process cleanup** (`bin/launch.mjs`, `bin/visible.mjs`) — Fixed Windows `taskkill` arguments, added port fallback cleanup for `--kill`, and made `visible.mjs --kill` delegate to the stronger `launch-visible.mjs` cleanup path.
65
+ - **README install paths and skill guidance** (`README.md`, `skills/greedy-search/skill.md`) — Corrected Pi git/npm package paths, documented visible mode and safe CDP wrappers, and removed stale `coding_task` guidance from the agent skill.
66
+
67
+ ## [1.8.6] — 2026-05-04
68
+
69
+ ### Bing Copilot: Headless Cloudflare Recovery
70
+
71
+ - **Auto-retry triggers on all Bing failures** — Error pattern expanded from `input not found|verification` to include `clipboard` failures, so any extraction failure triggers the visible Chrome recovery.
72
+ - **Clipboard retry** — `bing-copilot.mjs` now retries clipboard extraction once with a 2s delay, matching the Perplexity extractor pattern.
73
+ - **Cloudflare detection** — If the clipboard is empty and the AI copy button is hidden, the extractor checks the accessibility tree for Cloudflare challenge text and logs it explicitly for faster diagnosis.
74
+ - **DOM extraction fallback** — If clipboard fails and the copy button is missing (headless anti-bot behavior), attempts direct text extraction from the `copilot.fun` → blob: iframe chain via CDP targets. Falls through to the visible auto-retry if Cloudflare blocks the iframe.
75
+ - **Investigation confirmed** — In headless mode, Copilot renders the AI response inside a `copilot.fun` → blob: iframe sandbox with a Cloudflare Turnstile challenge. The `copy-ai-message-button` (`data-testid`) is hidden. Content is unreachable from both the main frame JS (cross-origin) and CDP iframe traversal (Cloudflare blocks load). The only viable path is visible Chrome recovery — once cookies are cached in the profile, subsequent headless searches pass transparently.
76
+
77
+ ### Visible Chrome Recovery
78
+
79
+ - **Mode-aware `ensureChrome()`** — `src/search/chrome.mjs` now reads a mode marker file (`greedysearch-chrome-mode`) written by `launch.mjs`. When `GREEDY_SEARCH_VISIBLE=1` and Chrome is running headless, it kills and relaunches in visible mode with a forced relaunch guard (always relaunches after kill, even if port wasn't freed).
80
+ - **`launch.mjs` mode check on reuse** — When Chrome is already running and visible is requested (`GREEDY_SEARCH_VISIBLE=1`), checks the mode file. If headless, kills the running instance and launches visible instead of reusing.
81
+ - **Mode file cleanup** — Mode marker file cleaned on `--kill`, ghost cleanup, and idle timeout kill.
82
+ - **`bin/launch-visible.mjs`** — Standalone visible Chrome launcher. Nukes any process on port 9222 (by PID file + port scan), launches Chrome without `--headless`, and writes `"visible"` to the mode file. No ghost cleanup complexity, no mode switching — fire-and-forget visible Chrome.
83
+ - **`bin/visible.mjs`** — Convenience wrapper: kills headless, then launches visible (delegates to `launch-visible.mjs`).
84
+ - **Progress notification** — When the auto-retry launches visible Chrome for manual Cloudflare verification, a `PROGRESS:bing:needs-human` line is emitted to stderr. The progress tracker renders `🔓 bing needs manual verification` in the Pi UI.
85
+ - **Idle cleanup preserves mode** — Headless idle timeout cleanup now also removes the mode marker file.
86
+
87
+ ### Security & Robustness
88
+
89
+ - **Chrome process cleanup hardening** — `launch-visible.mjs` uses `taskkill /F /PID X /T` (process tree kill) on Windows to prevent orphan renderer processes. Repeated up to 5s until port 9222 is confirmed free.
90
+ - **Zombie Chrome prevention** — `launch.mjs` and `chrome.mjs` now clean up the mode marker and PID file consistently across all kill paths (--kill, ghost cleanup, idle timeout).
91
+
92
+ ### Added
93
+
94
+ - **`google-search` engine** — plain Google search extractor (locale-agnostic, `textarea[name="q"]`). Returns title/URL/snippet for traditional 10-blue-link results. Aliases: `gs`, `googlesearch`.
95
+
96
+ ### Headless Mode (default)
97
+
98
+ - **Chrome now runs headless by default** — no window, no GUI, purely background. Set `GREEDY_SEARCH_VISIBLE=1` to show the browser window.
99
+ - **Anti-detection stealth** — Patches injected via `Page.addScriptToEvaluateOnNewDocument` (runs before any page JS):
100
+ - `Runtime.enable` / CDP marker deletion (`__REBROWSER_*`, `__nightmare`, `__phantom`, etc.)
101
+ - `navigator.webdriver` → `false`, `navigator.plugins` → realistic list, `navigator.languages` → `['en-US', 'en']`
102
+ - `window.chrome` shim, WebGL vendor → Intel Iris, `hardwareConcurrency` → 8, `deviceMemory` → 8
103
+ - `TrustedTypes` policy, `requestAnimationFrame` keep-alive (prevents headless stall detection)
104
+ - `--disable-blink-features=AutomationControlled`, realistic `--user-agent`, `--window-size=1920,1080`
105
+ - **Human click simulation** — All verification/clicks now use CDP `Input.dispatchMouseEvent` with multi-event `mouseMoved→pressed→released`, ±3px coordinate jitter, and random delays (80–180ms hover, 30–90ms hold). Detection scripts return element selectors instead of clicking in-page; `handleVerification` performs human clicks via `humanClickElement()`/`humanClickXY()`. Applies to Turnstile iframes, reCAPTCHA, Cloudflare challenges, Microsoft auth, Copilot modals, and all generic verify/continue buttons.
106
+ - **Idle auto-cleanup** — Headless Chrome auto-killed after `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` (default 5 min) of inactivity. Kills only the PID-tracked instance on port 9222 — never touches the main Chrome session. Activity timestamp written at search start and end.
107
+
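As an illustration of the human click simulation entry above (not part of the package diff), here is a minimal sketch of how one click could be planned with the stated ±3px jitter and the 80–180ms hover / 30–90ms hold delays. `planHumanClick`, `randBetween`, and the `delayAfterMs` field are hypothetical names; only the event types and the `x`/`y`/`button`/`clickCount` fields come from CDP's `Input.dispatchMouseEvent`. The package's actual `humanClickXY()` may be structured differently.

```javascript
const { randomInt } = require("node:crypto");

// crypto.randomInt(min, max) excludes max, so add 1 for an inclusive range.
function randBetween(min, max) {
  return randomInt(min, max + 1);
}

// Hypothetical planner: returns the three mouse events for one click,
// each with the pause to wait before sending the next event.
function planHumanClick(x, y) {
  // One jittered target point shared by all three events.
  const px = x + randBetween(-3, 3);
  const py = y + randBetween(-3, 3);
  return [
    { type: "mouseMoved", x: px, y: py, delayAfterMs: randBetween(80, 180) },
    { type: "mousePressed", x: px, y: py, button: "left", clickCount: 1, delayAfterMs: randBetween(30, 90) },
    { type: "mouseReleased", x: px, y: py, button: "left", clickCount: 1, delayAfterMs: 0 },
  ];
}

console.log(planHumanClick(400, 300));
```

Separating planning from dispatch keeps the randomness testable: a caller would send each step through its CDP client and sleep for `delayAfterMs` between steps.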
108
+ ### Performance
109
+
110
+ - **Timeouts cut ~40–50%** across all extractors — typical search ~60–90s → ~30–45s:
111
+ - `TIMING`: postNav 1500→800ms, postNavSlow 2000→1000ms, postClick 400→250ms, postType 400→250ms, inputPoll 400→300ms, copyPoll 600→400ms, afterVerify 3000→2000ms
112
+ - Defaults: waitForCopyButton 60s→30s, waitForStreamComplete 30s→20s, handleVerification 60s→30s
113
+ - Per-extractor: Google stream 45s→30s, Gemini copyButton 120s→60s + inputDeadline 10s→8s, Perplexity inputDeadline 8s→5s + stream 30s→20s, Bing verification 90s→30s + copyButton 60s→30s
114
+ - Engine process timeout: 90s→60s (180s→120s Gemini)
115
+
116
+ ### Security
117
+
118
+ - **SonarCloud security hotspots fixed** — Two open hotspots resolved:
119
+ - _Weak cryptography (S2245)_ in `extractors/consent.mjs`: replaced `Math.random()` with `crypto.randomInt()` for the mouse-jitter RNG. Not actually security-sensitive (used only for ±3px jitter and timing delays), but compliant now.
120
+ - _PATH injection (S4036)_ in `src/search/chrome.mjs`: `spawn("node", ...)` replaced with `spawn(process.execPath, ...)` so the launcher doesn't rely on the `PATH` environment variable.
121
+ - **Query/prompt leakage prevention** — Queries and synthesis prompts no longer appear in OS process tables. All `spawn()` calls now pipe query/prompt through stdin via `--stdin` flag instead of command-line arguments. Affects `runSearch`, `runExtractor`, `synthesizeWithGemini`, and all 5 extractors (`perplexity`, `bing-copilot`, `google-ai`, `google-search`, `gemini`).
122
+
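As an illustration of the two security entries above (not part of the package diff), the sketch below spawns a child Node process through `process.execPath` and passes the query on stdin rather than in `argv`. `runWithStdinQuery` and the inline child script are hypothetical stand-ins; the package's own `runExtractor` and its `--stdin` handling are not shown in this diff excerpt and may differ.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical child script: reads everything from stdin and echoes it back.
// A real extractor would run the search here instead.
const CHILD_SCRIPT = `
let data = "";
process.stdin.setEncoding("utf8");
process.stdin.on("data", (chunk) => { data += chunk; });
process.stdin.on("end", () => { process.stdout.write(data); });
`;

// Hypothetical helper, not the package's API.
function runWithStdinQuery(query) {
  // process.execPath is the absolute path of the running Node binary,
  // so no PATH lookup for "node" takes place.
  const result = spawnSync(process.execPath, ["-e", CHILD_SCRIPT], {
    input: query, // written to the child's stdin, never placed in argv
    encoding: "utf8",
  });
  if (result.status !== 0) {
    throw new Error(`child exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

console.log(runWithStdinQuery("Prisma vs Drizzle"));
```

Because the query never appears in the argument list, tools that list process command lines would see only the Node binary and the script, not the search text.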
123
+ ### Visual
124
+
125
+ - **Redesigned banner** — Cleaner SVG layout with pi logo icon, no text, no lens graphic. Gemini Synthesizer pill badge integrated. Three design iterations landed on a minimal icon-only look (`docs/banner.svg`).
126
+
127
+ ### Fixed
128
+
129
+ - **Gemini & Bing copy button race condition** — Both extractors were capturing the user's query instead of the AI's answer. Root cause: `document.querySelector()` returns the first copy button in DOM order, which is the user's echoed message (above the assistant's response). For short queries this triggers instantly. Fixed by: (1) replacing `waitForCopyButton` with `waitForStreamComplete` to ensure the response finishes streaming before copying, and (2) clicking the **last** copy button (`querySelectorAll` + `[length-1]`) instead of the first — matching Perplexity's proven pattern. Also added periodic scroll-to-bottom alongside stream wait for Gemini to trigger lazy-loaded content.
130
+ - **Progress tracker shows false ✅ for errors** — `makeProgressTracker` in `shared.ts` completely ignored the `status` parameter, always showing `✅ done` for every engine. Now correctly tracks per-engine status and shows `❌ failed` when an engine errors.
131
+ - **Synthesis echoes engine JSON when engines fail** — When Perplexity/Bing fail, Gemini was echoing the engine summary JSON back as its "answer". `synthesis-runner.mjs` now detects this pattern (engine keys without synthesis fields) and treats it as a parse failure, falling back to individual engine results.
132
+ - **`headless=false` parameter ignored** — The `--headless` flag was never checked by `search.mjs` or `launch.mjs`; they only read `GREEDY_SEARCH_VISIBLE`. `shared.ts` now propagates the visibility preference via the env var when `headless=false` is passed.
133
+
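As an illustration of the copy-button fix above (not part of the package diff), this sketch selects the last matching button instead of the first, which is the `querySelectorAll` + `[length-1]` idea the entry describes. `findAssistantCopyButton` and the selector in the usage lines are hypothetical; the real extractors use their own selectors.

```javascript
// Hypothetical helper. `root` is anything exposing querySelectorAll,
// such as `document` when evaluated inside the page.
function findAssistantCopyButton(root, selector) {
  const buttons = root.querySelectorAll(selector);
  // The user's echoed message comes first in DOM order, so the
  // assistant's button is the last match. Return null when none exist.
  return buttons.length > 0 ? buttons[buttons.length - 1] : null;
}

// Stub standing in for a page with two copy buttons (user echo, then assistant).
const fakeRoot = {
  querySelectorAll: () => [{ owner: "user" }, { owner: "assistant" }],
};
console.log(findAssistantCopyButton(fakeRoot, "button.copy"));
```

In a real page this only picks the right element once the response has finished streaming, which is why the entry pairs it with `waitForStreamComplete`.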
134
+ ### Cloudflare / Verification Recovery
135
+
136
+ - **Auto-recovery from Cloudflare blocks** — When Perplexity (`#ask-input` not found) or Bing (`input not found` / `verification required`) fail in headless mode, `search.mjs` now:
137
+ 1. Detects the Cloudflare/verification error pattern
138
+ 2. Kills headless Chrome, relaunches in visible mode
139
+ 3. Retries the blocked engines — Cloudflare bypasses, cookies stored in Chrome profile
140
+ 4. Kills visible Chrome, relaunches headless
141
+ 5. Continues remaining pipeline (source fetch, synthesis)
142
+ 6. Cookies persist — subsequent headless searches pass transparently
143
+
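As an illustration of step 1 in the list above (not part of the package diff), here is a minimal sketch of detecting which engines failed with a recoverable error. The regular expression is assembled from the error strings quoted in this changelog (`input not found`, `verification`, `clipboard`); the actual pattern and result shape in `search.mjs` are assumptions, and `enginesNeedingVisibleRetry` is a hypothetical name.

```javascript
// Assumed pattern, built from the error strings quoted in the changelog.
const RECOVERABLE_ERROR = /input not found|verification|clipboard/i;

// Hypothetical helper: given per-engine results, list the engines
// whose error message suggests a headless-only block.
function enginesNeedingVisibleRetry(results) {
  return Object.entries(results)
    .filter(([, r]) => typeof r.error === "string" && RECOVERABLE_ERROR.test(r.error))
    .map(([engine]) => engine);
}

const sample = {
  perplexity: { error: "#ask-input not found" },
  bing: { error: "Clipboard interceptor returned empty text" },
  google: { answer: "ok" },
};
console.log(enginesNeedingVisibleRetry(sample));
```

Only the engines returned here would be retried in visible Chrome; engines that already succeeded keep their results for the later source-fetch and synthesis steps.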
144
+ ### Removed
145
+
146
+ - **`coding_task` tool removed** — `bin/coding-task.mjs`, `src/formatters/coding.ts`, registration deleted (644 lines).
147
+ - **`deep_research` tool removed** — handler, test, and `formatDeepResearch` + helpers deleted (521 lines). Use `greedy_search` with `depth: "deep"`.
148
+ - **Minimize debug logs** — Removed 9 verbose `[minimize]` console.log statements from launch.mjs.
149
+
150
+ ### Fixes
151
+
152
+ - **Code scanning alerts resolved (5 alerts)** — (1) Added `permissions: contents: read` to `sync-to-webaio.yml` workflow (#14). (2) Fixed backslash escaping in `consent.mjs`'s `humanClickElement` selector injection (#10) — selectors containing backslashes (e.g., `\"`) weren't properly escaped before DOM injection. (3) Fixed same backslash escaping in `google-search.mjs`'s `SEARCH_BOX` selector in 3 locations (#11-13).
153
+ - **`cdp.mjs` `getPages()` filter** — Allows `chrome://newtab/` (headless Chrome default initial tab). Prevents "No Chrome tabs found" on cold start.
154
+
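As an illustration of the selector-escaping fix above (not part of the package diff), the sketch below embeds a selector into an expression string using `JSON.stringify`, which escapes backslashes and quotes in one step. This is one common approach and an assumption on my part; the changelog does not say how `consent.mjs` or `google-search.mjs` perform the escaping. `buildQueryExpression` is a hypothetical name.

```javascript
// Hypothetical builder for an expression string that is later
// evaluated inside the page.
function buildQueryExpression(selector) {
  // JSON.stringify produces a quoted string literal with backslashes
  // and double quotes escaped, so the selector stays inside the literal.
  return `document.querySelector(${JSON.stringify(selector)})`;
}

console.log(buildQueryExpression('textarea[name="q"]'));
console.log(buildQueryExpression('a[title="back\\slash"]'));
```

Hand-rolled escaping that handles quotes but not backslashes lets a trailing backslash swallow the closing quote, which is the class of bug the entry describes.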
155
+ ### Security
156
+
157
+ - **SonarCloud: Log injection vulnerability (1 alert)** — `bin/launch.mjs` no longer logs the raw WebSocket debugger URL (user-controlled data). Replaced with a static "WebSocket URL received" message to prevent query/URL content from leaking into logs.
158
+
159
+ ### Code Quality
160
+
161
+ - **SonarCloud batch fixes (~52 issues resolved)** across 16 source files:
162
+ - `S7781` — Replaced 18 `String#replace()` calls with `String#replaceAll()` for global replacements (regex → literal where applicable).
163
+ - `S1128` — Removed 15 unused imports (`dirname`, `join`, `relative`, `spawn`, `tmpdir`, `existsSync`, `shouldUseBrowser`, `closeTabs`, `cdp`, `openNewTab`, `closeTab`, `activateTab`, `trimText`).
164
+ - `S7773` — Migrated 11 `parseInt`/`parseFloat` calls to `Number.parseInt`/`Number.parseFloat`.
165
+ - `S7780` — Wrapped 8 CDP eval templates containing backslash sequences in `String.raw()` to eliminate double-escaping.
166
+ - `S7735` — Eliminated 13 negated-condition ternaries by inverting the conditional logic (`!== -1 ? ... : null` → `=== -1 ? null : ...`).
167
+
168
+ ### Security Hotspot Review
169
+
170
+ - **SonarCloud: 20 security hotspots reviewed and marked Safe** — All outstanding hotspots were assessed and resolved in SonarCloud:
171
+ - `S4721` OS Command Injection (×2) — Inputs are hardcoded (`port=9222`) or parsed from system output and validated via `Number.parseInt`. Not user-controlled.
172
+ - `S5852` Regex ReDoS (×10) — Regexes operate on bounded input with negated char classes or short fixed patterns. No practical denial-of-service risk.
173
+ - `S4036` PATH environment variable (×8) — Local CLI extension spawning package-internal Node scripts. PATH is host-controlled; no untrusted input reaches the command.
174
+
175
+ ### Tooling
176
+
177
+ - **SonarCloud configuration** — Added `sonar-project.properties` with exclusions for `test/**`, `test.mjs`, `test.sh`, `test_unit.mjs`, and `scripts/**` so test-only code does not skew source quality metrics.
4
178
 
5
179
  ## v1.8.5 (2026-04-29)
6
180
 
package/README.md CHANGED
@@ -1,98 +1,173 @@
1
- # GreedySearch for Pi
2
-
3
- Multi-engine AI web search for Pi via browser automation.
4
-
5
- - No API keys
6
- - Real browser results (Perplexity, Bing Copilot, Google AI)
7
- - Optional Gemini synthesis with source grounding
8
-
9
- ## Install
10
-
11
- ```bash
12
- pi install npm:@apmantza/greedysearch-pi
13
- ```
14
-
15
- Or from git:
16
-
17
- ```bash
18
- pi install git:github.com/apmantza/GreedySearch-pi
19
- ```
20
-
21
- ## Tools
22
-
23
- - `greedy_search` - fast or grounded multi-engine search
24
- - `coding_task` - browser-routed Gemini/Copilot coding assistance
25
-
26
- ## Quick usage
27
-
28
- ```js
29
- greedy_search({ query: "React 19 changes" })
30
- greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" })
31
- greedy_search({ query: "Best auth architecture 2026", engine: "all", depth: "deep" })
32
- ```
33
-
34
- ## Parameters (`greedy_search`)
35
-
36
- - `query` (required)
37
- - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
38
- - `depth`: `standard` (default), `fast`, `deep`
39
- - `fullAnswer`: return full single-engine output instead of preview
40
-
41
- ## Depth modes
42
-
43
- - `fast` - quickest, no synthesis/source fetching
44
- - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
45
- - `deep` - strongest grounding and confidence metadata
46
-
47
- ## Runtime commands
48
-
49
- ```bash
50
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs
51
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --status
52
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --kill
53
- ```
54
-
55
- ## Requirements
56
-
57
- - Chrome
58
- - Node.js 20.11.0+ (22+ recommended)
59
-
60
- ## Source fetching
61
-
62
- When using `depth: "standard"` or `depth: "deep"`, source content is fetched and synthesized:
63
-
64
- - **Reddit** — Uses Reddit's public `.json` API for posts and comments (no scraping)
65
- - **GitHub** Uses GitHub REST API for repos, READMEs, and file trees
66
- - **General web** Mozilla Readability extraction with browser fallback for bot-blocked pages
67
- - **Metadata** title, author/byline, site name, publish date, language, excerpt
68
-
69
- ## Project layout
70
-
71
- - `bin/` - runtime CLIs (`search.mjs`, `launch.mjs`, `cdp.mjs`, `coding-task.mjs`)
72
- - `extractors/` - engine-specific automation
73
- - `src/` - ranking/fetching/formatting internals (includes `reddit.mjs`, `github.mjs`, `fetcher.mjs`)
74
- - `skills/` - Pi skill metadata
75
-
76
- ## Testing
77
-
78
- Cross-platform test runner (Windows + Unix):
79
- ```bash
80
- npm test # run all tests
81
- npm run test:quick # skip slow tests
82
- npm run test:smoke # basic health check
83
- ```
84
-
85
- Full bash test suite (Unix only):
86
- ```bash
87
- npm run test:bash # comprehensive tests
88
- ./test.sh parallel # race condition tests
89
- ./test.sh flags # flag/option tests
90
- ```
91
-
92
- ## Changelog
93
-
94
- See `CHANGELOG.md`.
95
-
96
- ## License
97
-
98
- MIT
1
+ # GreedySearch for Pi
2
+
3
+ ![GreedySearch](docs/banner.svg)
4
+
5
+ Multi-engine AI web search for Pi via browser automation.
6
+
7
+ - No API keys
8
+ - Real browser results (Perplexity, Bing Copilot, Google AI)
9
+ - Optional Gemini synthesis with source grounding
10
+ - Chrome runs headless by default — no window, purely background
11
+
12
+ ## Install
13
+
14
+ ```bash
15
+ pi install npm:@apmantza/greedysearch-pi
16
+ ```
17
+
18
+ Or from git:
19
+
20
+ ```bash
21
+ pi install git:github.com/apmantza/GreedySearch-pi
22
+ ```
23
+
24
+ ## Tools
25
+
26
+ - `greedy_search` — multi-engine AI web search
27
+ - `websearch` — lightweight DuckDuckGo/Brave search (via pi-webaio)
28
+ - `webfetch` / `webpull` — page fetching and site crawling (via pi-webaio)
29
+
30
+ ## Quick usage
31
+
32
+ ```js
33
+ greedy_search({ query: "React 19 changes" });
34
+ greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" });
35
+ greedy_search({
36
+ query: "Best auth architecture 2026",
37
+ engine: "all",
38
+ depth: "deep",
39
+ });
40
+ // Headless is the default — no window. To force visible Chrome:
41
+ greedy_search({ query: "Bing captcha setup", engine: "bing", visible: true });
42
+ ```
43
+
44
+ ## Parameters (`greedy_search`)
45
+
46
+ - `query` (required)
47
+ - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
48
+ - `depth`: `standard` (default), `fast`, `deep`
49
+ - `fullAnswer`: return full single-engine output instead of preview
50
+ - `headless`: set to `false` to show Chrome window (default: `true`)
51
+ - `visible` / `alwaysVisible`: set to `true` to always use visible Chrome for this search
52
+
53
+ ## Environment variables
54
+
55
+ | Variable | Default | Description |
56
+ | ------------------------------------ | ------------- | ------------------------------------------------------------- |
57
+ | `GREEDY_SEARCH_VISIBLE` | (unset) | Set to `1` to show Chrome window instead of headless |
58
+ | `GREEDY_SEARCH_ALWAYS_VISIBLE` | (unset) | Set to `1` to force visible Chrome for all GreedySearch runs |
59
+ | `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` | `5` | Minutes of inactivity before auto-killing GreedySearch Chrome |
60
+ | `GREEDY_SEARCH_LOCALE` | `en` | Default result language (en, de, fr, es, ja, etc.) |
61
+ | `CHROME_PATH` | auto-detected | Path to Chrome/Chromium executable |
62
+
63
+ ## Depth modes
64
+
65
+ - `fast` - quickest, no synthesis/source fetching
66
+ - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
67
+ - `deep` - strongest grounding and confidence metadata
68
+
69
+ ## Runtime commands
70
+
71
+ Inside Pi, prefer the extension commands (no package path needed):
72
+
73
+ ```text
74
+ /greedy-visible # launch visible Chrome for captcha/login/cookie setup
75
+ /greedy-status # show GreedySearch Chrome status
76
+ /greedy-kill # stop GreedySearch Chrome
77
+ ```
78
+
79
+ Git install path:
80
+
81
+ ```bash
82
+ GS=~/.pi/agent/git/github.com/apmantza/GreedySearch-pi
83
+ node "$GS/bin/launch.mjs" --status
84
+ node "$GS/bin/visible.mjs" # visible mode
85
+ node "$GS/bin/visible.mjs" --kill # strong visible/port cleanup
86
+ node "$GS/bin/kill-visible.mjs" # same as visible.mjs --kill
87
+ node "$GS/bin/cdp-visible.mjs" list # safe CDP: GreedySearch visible Chrome only
88
+ node "$GS/bin/cdp-headless.mjs" list # safe CDP: GreedySearch headless Chrome only
89
+ node "$GS/bin/cdp-greedy.mjs" list # safe CDP: any GreedySearch Chrome mode
90
+ ```
91
+
92
+ npm global install path:
93
+
94
+ ```bash
95
+ GS="$(npm root -g)/@apmantza/greedysearch-pi"
96
+ node "$GS/bin/launch.mjs" --status
97
+ node "$GS/bin/visible.mjs"
98
+ node "$GS/bin/visible.mjs" --kill
99
+ node "$GS/bin/kill-visible.mjs"
100
+ node "$GS/bin/cdp-visible.mjs" list
101
+ node "$GS/bin/cdp-headless.mjs" list
102
+ node "$GS/bin/cdp-greedy.mjs" list
103
+ ```
104
+
105
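+ GreedySearch Chrome is stopped automatically after 5 minutes of inactivity. Change the limit with `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` (for example `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES=10`), or set it to `0` to disable auto-cleanup.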
106
+
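The idle-timeout rule (default 5 minutes, `0` disables) could be parsed along these lines. The helper name `idleTimeoutMs` and the fallback to the default on unparseable input are assumptions made for this sketch.

```javascript
// Hypothetical parser for GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES.
// Returns milliseconds, or null when auto-cleanup is disabled.
function idleTimeoutMs(raw, defaultMinutes = 5) {
  const fallback = defaultMinutes * 60_000;
  if (raw === undefined || raw === "") return fallback;

  const minutes = Number(raw);
  // Assumption: invalid or negative values fall back to the default.
  if (!Number.isFinite(minutes) || minutes < 0) return fallback;

  return minutes === 0 ? null : minutes * 60_000;
}
```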
107
+ **CDP safety:** use `cdp-visible.mjs`, `cdp-headless.mjs`, or `cdp-greedy.mjs` for debugging. They always set `CDP_PROFILE_DIR` to the dedicated GreedySearch Chrome profile and never fall back to your main Chrome session. Avoid calling raw `bin/cdp.mjs` manually unless you explicitly set `CDP_PROFILE_DIR`.
108
+
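For context on what the wrappers protect: Chrome writes a `DevToolsActivePort` file into its profile directory, with the debugging port on the first line and the browser WebSocket path on the second. A minimal sketch of reading that format follows; the function name `parseDevToolsActivePort` is hypothetical and this is not necessarily how `bin/cdp.mjs` does it.

```javascript
// Hypothetical parser for the contents of a DevToolsActivePort file.
function parseDevToolsActivePort(text, host = "127.0.0.1") {
  const [portLine = "", pathLine = ""] = text
    .split(/\r?\n/)
    .map((line) => line.trim());

  const port = Number.parseInt(portLine, 10);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error("DevToolsActivePort: invalid port line");
  }
  // The second line is a path such as /devtools/browser/<id>.
  return {
    port,
    wsUrl: pathLine ? `ws://${host}:${port}${pathLine}` : null,
  };
}
```

Pointing `CDP_PROFILE_DIR` at the dedicated profile means this file is read from the GreedySearch profile rather than from your main Chrome profile.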
109
+ ## Requirements
110
+
111
+ - Chrome
112
+ - Node.js 22+
113
+
114
+ ## Known engine quirks
115
+
116
+ ### Bing Copilot
117
+
118
+ Bing Copilot detects headless Chrome and sandboxes all AI responses inside nested iframes (`copilot.microsoft.com` → `copilot.fun` → `blob:`). In this mode the copy button is hidden and the Cloudflare Turnstile challenge blocks content delivery, so clipboard-based extraction cannot work.
119
+
120
+ **Auto-recovery:** When Bing or Perplexity fails with a headless-only extraction error (clipboard, verification, timeout, Cloudflare), GreedySearch automatically switches to **visible Chrome** and retries, even in `fast` mode. If manual verification is required, the visible browser is left open and the tool returns instructions to solve the challenge and rerun the same search.
121
+
122
+ If you prefer to skip the auto-recovery delay, launch visible Chrome ahead of time with `/greedy-visible`, set `GREEDY_SEARCH_ALWAYS_VISIBLE=1`, or pass `visible: true` to `greedy_search`.
123
+
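The auto-recovery behaviour described above amounts to "try headless, and on a recognizable headless-only failure retry once in visible Chrome". The sketch below illustrates that shape; `shouldRetryVisible`, `searchWithRecovery`, and the `run` callback are hypothetical names, and the keyword match is an assumption based on the error categories listed above.

```javascript
// Hypothetical sketch of the retry decision; not the package's actual code.
const HEADLESS_ONLY_ERRORS = /clipboard|verification|timeout|cloudflare/i;
const RECOVERABLE_ENGINES = new Set(["bing", "perplexity"]);

function shouldRetryVisible({ engine, error, alreadyVisible }) {
  if (alreadyVisible) return false; // nothing left to escalate to
  if (!RECOVERABLE_ENGINES.has(engine)) return false;
  return HEADLESS_ONLY_ERRORS.test(String(error?.message ?? error));
}

// `run` is a caller-supplied async function that performs one search attempt.
async function searchWithRecovery(run, engine) {
  try {
    return await run({ visible: false });
  } catch (error) {
    if (!shouldRetryVisible({ engine, error, alreadyVisible: false })) {
      throw error;
    }
    return run({ visible: true }); // single retry in visible Chrome
  }
}
```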
124
+ ## Anti-detection
125
+
126
+ Headless Chrome auto-injects stealth patches before any page JavaScript runs:
127
+
128
+ - `navigator.webdriver` hidden, plugins/languages faked, `window.chrome` shimmed
129
+ - WebGL vendor spoofed (Intel Iris), realistic hardware concurrency / memory
130
+ - CDP automation markers deleted, `requestAnimationFrame` kept alive
131
+ - Human-like click simulation with coordinate jitter and variable delays
132
+
133
+ This bypasses casual bot detection (basic `navigator.webdriver` checks) but does not defeat commercial anti-bot services (DataDome, PerimeterX, Kasada). **Bing Copilot specifically detects headless and sandboxes responses behind Cloudflare Turnstile** — see [Known engine quirks](#known-engine-quirks) for the auto-recovery mechanism.
134
+
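As an illustration of "coordinate jitter and variable delays", the sketch below picks a click point inside an element's bounding box and a randomized delay. The function name `jitteredClick`, the 20%/60% margins, and the 40 to 120 ms delay range are all assumptions for this example, not values taken from the package.

```javascript
// Hypothetical jitter helper; the numeric ranges are illustrative only.
// `rect` is an element bounding box: { x, y, width, height }.
// `rng` returns a number in [0, 1) and is injectable for deterministic tests.
function jitteredClick(rect, rng = Math.random) {
  // Stay within the central 60% of the element on each axis.
  const x = rect.x + rect.width * (0.2 + 0.6 * rng());
  const y = rect.y + rect.height * (0.2 + 0.6 * rng());
  // Delay before the click, between 40 and 120 ms.
  const delayMs = 40 + Math.round(80 * rng());
  return { x, y, delayMs };
}
```

The resulting coordinates would then be passed to whatever dispatches the click, for example the CDP method `Input.dispatchMouseEvent`.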
135
+ **Source fetching:** with `depth: "standard"` or `depth: "deep"`, source content is fetched and synthesized as follows:
136
+
137
+ - **Reddit** — Uses Reddit's public `.json` API for posts and comments (no scraping)
138
+ - **GitHub** — Uses GitHub REST API for repos, READMEs, and file trees
139
+ - **General web** — Mozilla Readability extraction with browser fallback for bot-blocked pages
140
+ - **Metadata** — title, author/byline, site name, publish date, language, excerpt
141
+
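The per-site routing above can be sketched as a URL classifier. The function name `classifySource` and the returned shape are hypothetical; the sketch relies only on two public conventions: appending `.json` to a Reddit URL, and GitHub's `GET /repos/{owner}/{repo}/readme` REST endpoint.

```javascript
// Hypothetical router from a source URL to the endpoint that would be fetched.
function classifySource(rawUrl) {
  const url = new URL(rawUrl);
  const host = url.hostname.replace(/^www\./, "");

  if (host === "reddit.com" || host.endsWith(".reddit.com")) {
    const path = url.pathname.replace(/\/$/, ""); // drop trailing slash
    return { kind: "reddit", fetchUrl: `https://www.reddit.com${path}.json` };
  }

  if (host === "github.com") {
    const [owner, repo] = url.pathname.split("/").filter(Boolean);
    if (owner && repo) {
      return {
        kind: "github",
        fetchUrl: `https://api.github.com/repos/${owner}/${repo}/readme`,
      };
    }
  }

  // Everything else goes through general web extraction.
  return { kind: "web", fetchUrl: url.href };
}
```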
142
+ ## Project layout
143
+
144
+ - `bin/` — runtime CLIs (`search.mjs`, `launch.mjs`, `launch-visible.mjs`, `visible.mjs`, `kill-visible.mjs`, safe CDP wrappers, `cdp.mjs`)
145
+ - `extractors/` — engine-specific automation + stealth/consent handling
146
+ - `src/` — search pipeline, chrome management, source fetching, formatting
147
+ - `skills/` — Pi skill metadata
148
+
149
+ ## Testing
150
+
151
+ Cross-platform test runner (Windows + Unix):
152
+
153
+ ```bash
154
+ npm test # run all tests
155
+ npm run test:quick # skip slow tests
156
+ npm run test:smoke # basic health check
157
+ ```
158
+
159
+ Full bash test suite (Unix only):
160
+
161
+ ```bash
162
+ npm run test:bash # comprehensive tests
163
+ ./test.sh parallel # race condition tests
164
+ ./test.sh flags # flag/option tests
165
+ ```
166
+
167
+ ## Changelog
168
+
169
+ See `CHANGELOG.md`.
170
+
171
+ ## License
172
+
173
+ MIT
@@ -0,0 +1,62 @@
1
+ #!/usr/bin/env node
2
+ // cdp-greedy.mjs — safe CDP wrapper for the dedicated GreedySearch Chrome.
3
+ //
4
+ // This ALWAYS sets CDP_PROFILE_DIR to the GreedySearch profile so it never
5
+ // falls back to the user's main Chrome DevToolsActivePort.
6
+ //
7
+ // Usage:
8
+ // node bin/cdp-greedy.mjs list
9
+ // node bin/cdp-greedy.mjs --mode visible list
10
+ // node bin/cdp-greedy.mjs --mode headless snap <tab>
11
+
12
+ import { spawn } from "node:child_process";
13
+ import { existsSync, readFileSync } from "node:fs";
14
+ import { tmpdir } from "node:os";
15
+
16
+ const tmp = tmpdir().replaceAll("\\", "/");
17
+ const PROFILE_DIR = `${tmp}/greedysearch-chrome-profile`;
18
+ const ACTIVE_PORT = `${PROFILE_DIR}/DevToolsActivePort`;
19
+ const MODE_FILE = `${tmp}/greedysearch-chrome-mode`;
20
+
21
+ const args = process.argv.slice(2);
22
+ let desiredMode = null;
23
+ const modeIdx = args.indexOf("--mode");
24
+ if (modeIdx !== -1) {
25
+ desiredMode = args[modeIdx + 1] || null;
26
+ args.splice(modeIdx, 2);
27
+ }
28
+
29
+ if (desiredMode && !["visible", "headless"].includes(desiredMode)) {
30
+ console.error(`Invalid --mode ${desiredMode}. Use visible or headless.`);
31
+ process.exit(2);
32
+ }
33
+
34
+ if (!existsSync(ACTIVE_PORT)) {
35
+ console.error(
36
+ `GreedySearch Chrome is not running (missing ${ACTIVE_PORT}). Launch with bin/visible.mjs or bin/launch.mjs.`,
37
+ );
38
+ process.exit(1);
39
+ }
40
+
41
+ if (desiredMode) {
42
+ const actualMode = existsSync(MODE_FILE)
43
+ ? readFileSync(MODE_FILE, "utf8").trim()
44
+ : "unknown";
45
+ if (actualMode !== desiredMode) {
46
+ console.error(
47
+ `GreedySearch Chrome is ${actualMode}, not ${desiredMode}. Refusing to attach.`,
48
+ );
49
+ process.exit(1);
50
+ }
51
+ }
52
+
53
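+ // Decode percent-escapes (e.g. %20) so install paths with spaces still resolve.
54
+ const cdpBin = decodeURIComponent(
55
+ new URL("./cdp.mjs", import.meta.url).pathname,
56
+ ).replace(/^\/([A-Z]:)/, "$1");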
57
+
58
+ const proc = spawn(process.execPath, [cdpBin, ...args], {
59
+ stdio: "inherit",
60
+ env: { ...process.env, CDP_PROFILE_DIR: PROFILE_DIR },
61
+ });
62
+ proc.on("close", (code) => process.exit(code ?? 0));
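The `--mode` handling in `cdp-greedy.mjs` can be isolated as a pure function, which makes it easier to test. This is a sketch under a hypothetical name, `extractMode`. Unlike the script above, which ignores a trailing `--mode` with no value, the sketch rejects it.

```javascript
// Hypothetical refactoring of the --mode parsing into a testable function.
function extractMode(argv) {
  const rest = [...argv]; // do not mutate the caller's array
  const idx = rest.indexOf("--mode");
  if (idx === -1) return { mode: null, rest };

  // Remove "--mode" and its value; value is undefined if --mode was last.
  const [, value] = rest.splice(idx, 2);
  if (value !== "visible" && value !== "headless") {
    throw new RangeError(
      `Invalid --mode ${value ?? "(missing)"}. Use visible or headless.`,
    );
  }
  return { mode: value, rest };
}
```

With this shape, `extractMode(process.argv.slice(2))` yields the mode to verify and the remaining arguments to forward to `cdp.mjs`.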
@@ -0,0 +1,16 @@
1
+ #!/usr/bin/env node
2
+ // cdp-headless.mjs — safe CDP wrapper for headless GreedySearch Chrome only.
3
+
4
+ import { spawn } from "node:child_process";
5
+
6
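+ // Decode percent-escapes (e.g. %20) so install paths with spaces still resolve.
7
+ const cdpGreedyBin = decodeURIComponent(
8
+ new URL("./cdp-greedy.mjs", import.meta.url).pathname,
9
+ ).replace(/^\/([A-Z]:)/, "$1");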
10
+
11
+ const proc = spawn(
12
+ process.execPath,
13
+ [cdpGreedyBin, "--mode", "headless", ...process.argv.slice(2)],
14
+ { stdio: "inherit" },
15
+ );
16
+ proc.on("close", (code) => process.exit(code ?? 0));
@@ -0,0 +1,16 @@
1
+ #!/usr/bin/env node
2
+ // cdp-visible.mjs — safe CDP wrapper for visible GreedySearch Chrome only.
3
+
4
+ import { spawn } from "node:child_process";
5
+
6
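+ // Decode percent-escapes (e.g. %20) so install paths with spaces still resolve.
7
+ const cdpGreedyBin = decodeURIComponent(
8
+ new URL("./cdp-greedy.mjs", import.meta.url).pathname,
9
+ ).replace(/^\/([A-Z]:)/, "$1");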
10
+
11
+ const proc = spawn(
12
+ process.execPath,
13
+ [cdpGreedyBin, "--mode", "visible", ...process.argv.slice(2)],
14
+ { stdio: "inherit" },
15
+ );
16
+ proc.on("close", (code) => process.exit(code ?? 0));