@apmantza/greedysearch-pi 1.8.7 → 1.8.9
- package/CHANGELOG.md +35 -0
- package/README.md +5 -4
- package/bin/search.mjs +668 -623
- package/extractors/bing-aria.mjs +539 -0
- package/extractors/bing-copilot.mjs +1 -1
- package/extractors/common.mjs +561 -529
- package/extractors/gemini.mjs +150 -150
- package/extractors/selectors.mjs +54 -54
- package/package.json +1 -1
- package/skills/greedy-search/skill.md +26 -53
- package/src/fetcher.mjs +652 -652
- package/src/search/browser-lifecycle.mjs +615 -0
- package/src/search/chrome.mjs +529 -449
- package/src/search/constants.mjs +44 -43
- package/src/search/engines.mjs +3 -2
- package/src/search/sources.mjs +5 -1
- package/src/search/synthesis.mjs +235 -223
- package/src/utils/content.mjs +5 -1
- package/src/utils/system-cmds.mjs +101 -0
package/CHANGELOG.md
CHANGED
@@ -2,8 +2,43 @@
 
 ## [Unreleased]
 
+## [1.8.9] — 2026-05-11
+
+### Changed
+
+- **Halved Gemini synthesis timeout** (`extractors/gemini.mjs`) — `waitForStreamComplete` timeout reduced from 90s to 45s. Gemini synthesis prompts are ~8-10k chars and typically respond in 15-30s. The extra 45s was pure dead time.
+- **Aligned Gemini extractor hard timeout** (`src/search/engines.mjs`) — reduced from 120s to 70s, matching the new 45s stream wait + ~25s nav/settle overhead.
+
+### Fixed
+
+- **Perplexity/Bing visible recovery now actually stores cookies** (`bin/search.mjs`) — Two issues fixed:
+  1. **Second visible retry**: The first visible retry resolves Cloudflare/Turnstile (navigating through the challenge which breaks the CDP session with "Inspected target navigated or closed"), but the search never ran. A second retry on the same tab now reuses the freshly-cached Turnstile cookies and executes the actual search.
+  2. **Keep Chrome alive on recovery success**: Previously Chrome was killed with `taskkill /F` after recovery, losing any pending cookie database writes. Now visible Chrome stays running when recovery succeeds (or needs human intervention), keeping the cookie session alive.
+- **Visible Chrome window minimized after recovery** (`bin/search.mjs`) — When visible Chrome is left open after recovery (for cookie persistence or user verification), the window is automatically minimized so it doesn't clutter the desktop.
+
+## [1.8.8] — 2026-05-09
+
+### Added
+
+- **`/set-greedy-locale` Pi command** (`index.ts`) — Set default locale for search results (e.g., `/set-greedy-locale de`, `/set-greedy-locale --clear`, `/set-greedy-locale --show`). Saves to `~/.config/greedysearch/config.json`.
+- **Browser lifecycle defense patterns** (`src/search/browser-lifecycle.mjs`, new) — Centralized lifecycle management adopted from open-websearch's robust cross-process browser patterns:
+  - **Structured JSON metadata** (`greedysearch-chrome-metadata.json`) replaces three scattered text files (PID, mode, activity) with a single file tracking `browserPid`, `debugPort`, `tempDir`, `clientPids[]`, `sessionMode`, `lastActivity`, `launchedAt`. Backward-compatible — legacy files still written.
+  - **Process command-line verification** — `verifyBrowserProcess()` checks not just PID alive but that the process command line contains the profile dir and debug port. Prevents PID collision false-positives where a different process reuses the same PID.
+  - **Cross-process launch lock** — `acquireLaunchLock()` uses exclusive-create (`wx` flag) to prevent concurrent `ensureChrome()` calls from racing to launch Chrome. Stale lock recovery after 15s.
+  - **Stale session cleanup** — `cleanupStaleSessions()` runs once per process on first `ensureChrome()`. Scans metadata for dead PIDs, verifies survivors via command line, force-kills orphans, reclaims ghost processes on port 9222.
+  - **Client PID tracking** — `registerClient`/`unregisterClient` track which processes share the Chrome instance.
+- **Mode-specific idle timeouts** (`src/search/chrome.mjs`) — Headless Chrome keeps the aggressive 5-minute idle timeout (`GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES`) since it's cheap to restart. Visible Chrome (explicitly launched for captcha/cookie setup) gets a 60-minute grace period (`GREEDY_SEARCH_VISIBLE_IDLE_TIMEOUT_MINUTES`) to avoid wasting the user's captcha investment. Set either to 0 to disable for that mode.
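
Reviewer note: the mode-specific idle timeout entry could be resolved roughly as in the sketch below. It is not taken from `src/search/chrome.mjs`. The helper name `idleTimeoutMs`, the injectable `env` parameter, and the fallback for missing or invalid values are assumptions; only the two environment variable names, the 5- and 60-minute defaults, and the "0 disables" rule come from the changelog entry.

```javascript
// Hypothetical helper: only the env var names, the defaults, and the
// "0 disables" rule are taken from the changelog entry.
const DEFAULT_MINUTES = { headless: 5, visible: 60 };
const ENV_NAMES = {
  headless: "GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES",
  visible: "GREEDY_SEARCH_VISIBLE_IDLE_TIMEOUT_MINUTES",
};

// Returns the idle timeout in milliseconds, or null when the timeout is
// disabled for that mode (value 0). `mode` is "headless" or "visible".
function idleTimeoutMs(mode, env = process.env) {
  const raw = env[ENV_NAMES[mode]];
  const parsed = raw === undefined || raw === "" ? NaN : Number(raw);
  // Assumption: missing, negative, or non-numeric values use the default.
  const minutes =
    Number.isFinite(parsed) && parsed >= 0 ? parsed : DEFAULT_MINUTES[mode];
  return minutes === 0 ? null : minutes * 60_000;
}
```

Returning `null` for the disabled case keeps "no timeout" distinct from a zero-millisecond timer, which would otherwise fire immediately.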
+
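
Reviewer note: the cross-process launch lock in the 1.8.8 "Added" list relies on exclusive file creation. The sketch below shows one way that pattern can work with Node's `fs` module. It is not the code from `browser-lifecycle.mjs`: the function names `tryAcquireLaunchLock` and `releaseLaunchLock`, the lock file location, and the mtime-based staleness check are assumptions. Only the `wx` flag and the 15-second stale window come from the changelog.

```javascript
import { closeSync, openSync, statSync, unlinkSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const STALE_LOCK_MS = 15_000; // stale-lock window quoted in the changelog

// Hypothetical sketch: returns true if this process now owns the lock file.
function tryAcquireLaunchLock(lockPath, now = Date.now()) {
  for (let attempt = 0; attempt < 2; attempt++) {
    try {
      // "wx" creates the file exclusively and throws EEXIST if it exists.
      const fd = openSync(lockPath, "wx");
      writeSync(fd, String(process.pid));
      closeSync(fd);
      return true;
    } catch (err) {
      if (err.code !== "EEXIST") throw err;
      let ageMs;
      try {
        ageMs = now - statSync(lockPath).mtimeMs;
      } catch {
        continue; // lock vanished between open and stat; retry once
      }
      if (ageMs < STALE_LOCK_MS) return false; // another process is launching
      try {
        unlinkSync(lockPath); // assumed stale: remove it and retry
      } catch {
        /* another process already removed it */
      }
    }
  }
  return false;
}

function releaseLaunchLock(lockPath) {
  try {
    unlinkSync(lockPath);
  } catch {
    /* already gone */
  }
}

// Usage (the lock path is illustrative, not the package's real location):
const lockPath = join(tmpdir(), `launch-lock-demo-${process.pid}.lock`);
if (tryAcquireLaunchLock(lockPath)) {
  try {
    /* launch Chrome here */
  } finally {
    releaseLaunchLock(lockPath);
  }
}
```

Stale-lock removal in this sketch is not fully race-free: two processes that both judge the lock stale can each unlink it, and the second unlink may delete the first process's fresh lock. A production version would need an extra ownership check, such as comparing the PID stored in the file.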
+### Added
+
+- **System command path resolution** (`src/utils/system-cmds.mjs`, new) — `resolveSystemCmd()` resolves `powershell`, `netstat`, `taskkill`, `ps`, `lsof`, `ss`, `grep` to absolute paths for secure execution. `isPathSafe()` validates PATH environment variable composition. Satisfies SonarCloud security hotspot requirements for `execFileSync`/`execSync` PATH safety.
+
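
Reviewer note: a lookup-table resolver along the lines of the `resolveSystemCmd()` entry might look like the sketch below. It is a guess at the shape, not the contents of `src/utils/system-cmds.mjs`. The Unix paths are the ones the changelog lists further down. The Windows directories (`System32` and `System32\WindowsPowerShell\v1.0` under `SystemRoot`), the function name `resolveSystemCmdSketch`, and the choice to throw on unknown names are all assumptions.

```javascript
// Hypothetical re-creation of a resolveSystemCmd()-style lookup.
// Unix paths are quoted from the changelog; Windows directories are assumed.
const UNIX_CMDS = {
  ps: "/usr/bin/ps",
  lsof: "/usr/bin/lsof",
  ss: "/usr/sbin/ss",
  grep: "/usr/bin/grep",
};

function windowsCmds(systemRoot) {
  const sys32 = `${systemRoot}\\System32`;
  return {
    powershell: `${sys32}\\WindowsPowerShell\\v1.0\\powershell.exe`,
    netstat: `${sys32}\\netstat.exe`,
    taskkill: `${sys32}\\taskkill.exe`,
  };
}

// Maps a bare command name to an absolute path for the given platform.
function resolveSystemCmdSketch(name, platform = process.platform, env = process.env) {
  const table =
    platform === "win32"
      ? windowsCmds(env.SystemRoot || "C:\\Windows")
      : UNIX_CMDS;
  // Refuse unknown names instead of falling back to a PATH lookup.
  if (!Object.hasOwn(table, name)) {
    throw new Error(`No absolute path known for command: ${name}`);
  }
  return table[name];
}
```

The returned path would then be passed to `execFileSync` in place of the bare command name, so the call no longer depends on the caller's `PATH`.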
 ### Fixed
 
+- **SonarCloud security hotspots — 15 resolved** — Addressed all flagged items:
+  - **11 ReDoS-prone regex patterns**: Replaced greedy `.{0,50}` in fetcher's content quality check with lazy quantifier `.{0,50}?`; replaced alternation-heavy split regex in bing-copilot with `[^\S\n]*` horizontal whitespace; replaced `[\s\S]*` JSON extraction patterns in synthesis.mjs with `indexOf`/`lastIndexOf` brace matching; replaced `.+?\.` in selectors with `[^.]+`; replaced `\s+\S*$` trim patterns in sources.mjs, common.mjs, and content.mjs with `lastIndexOf` word-boundary detection; replaced markdown link regex in common.mjs with O(n) indexOf-based parser.
+  - **4 PATH-injection hotspots in browser-lifecycle.mjs and chrome.mjs**: Created `resolveSystemCmd()` utility returning absolute paths for `powershell.exe`, `netstat.exe`, `taskkill.exe` (Windows) and `/usr/bin/ps`, `/usr/bin/lsof`, `/usr/sbin/ss`, `/usr/bin/grep` (Unix). Replaced all bare command names in `execFileSync`/`execSync` calls.
+
 - **SonarCloud minor vulnerability false positives** — Confirmed both remaining issues are false positives (internal diagnostic logging in `bin/gschrome.mjs` and test debug output in `test/fetcher-cli.mjs`). Verified via full smoke test suite: all 33 unit tests pass, all 4 engines (Perplexity, Bing, Google, Gemini) return results at all depths (fast/standard/deep), CDP safety wrappers correctly enforce mode boundaries.
 
 - **SonarCloud security hotspots** (re-verified) — All previously fixed hotspots remain resolved: replaced `spawn("node", ...)` with `spawn(process.execPath, ...)`, replaced `Math.random()` with `crypto.randomInt()`, 19 remaining hotspots confirmed as false positives (hardcoded `execSync` commands, simple regex patterns).
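
Reviewer note: two of the ReDoS replacements listed under "Fixed" swap a regex for index arithmetic: the `\s+\S*$` trim and the `[\s\S]*` JSON extraction. The sketch below illustrates both ideas in isolation. The helper names `truncateAtWord` and `extractJsonObject` are hypothetical, and the actual code in `sources.mjs`, `common.mjs`, `content.mjs`, and `synthesis.mjs` may differ in details such as which whitespace characters count as a boundary.

```javascript
// Truncate to at most maxLen characters, cutting at the last space rather
// than running a /\s+\S*$/ replace. Assumption: only " " counts as a
// word boundary here.
function truncateAtWord(text, maxLen) {
  if (text.length <= maxLen) return text;
  const slice = text.slice(0, maxLen);
  const lastSpace = slice.lastIndexOf(" ");
  // Fall back to a hard cut when the slice contains no usable space.
  return lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
}

// Pull the outermost {...} span out of surrounding text without a
// /\{[\s\S]*\}/ match, then parse it. Returns null if nothing parses.
function extractJsonObject(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // the span was not valid JSON
  }
}
```

Both helpers do a fixed number of linear scans over the input, so their cost grows linearly with input length regardless of how the text is shaped.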
package/README.md
CHANGED
@@ -71,9 +71,10 @@ greedy_search({ query: "Bing captcha setup", engine: "bing", visible: true });
 Inside Pi, prefer the extension commands (no package path needed):
 
 ```text
-/greedy-visible
-/greedy-status
-/greedy-kill
+/greedy-visible # launch visible Chrome for captcha/login/cookie setup
+/greedy-status # show GreedySearch Chrome status
+/greedy-kill # stop GreedySearch Chrome
+/set-greedy-locale # set default result language (de, fr, es, ja, etc.)
 ```
 
 Git install path:
@@ -109,7 +110,7 @@ Chrome is auto-cleaned after 5 min idle. Override with `GREEDY_SEARCH_IDLE_TIMEO
 ## Requirements
 
 - Chrome
-- Node.js
+- Node.js 20.11.0+
 
 ## Known engine quirks
 