@apmantza/greedysearch-pi 1.9.1 → 1.9.2
- package/CHANGELOG.md +30 -13
- package/README.md +11 -1
- package/bin/launch.mjs +2 -0
- package/bin/search.mjs +757 -674
- package/extractors/bing-copilot.mjs +490 -374
- package/extractors/common.mjs +703 -645
- package/extractors/consent.mjs +421 -388
- package/index.ts +2 -1
- package/package.json +8 -4
- package/skills/greedy-search/skill.md +5 -14
- package/src/search/research.mjs +1581 -0
- package/src/search/sources.mjs +26 -4
- package/src/search/synthesis-runner.mjs +52 -46
- package/src/tools/greedy-search-handler.ts +85 -13
- package/test.mjs +971 -534
package/CHANGELOG.md
CHANGED
@@ -2,6 +2,36 @@
 
 ## [Unreleased]
 
+### Added
+
+### Fixed
+
+### Changed
+
+### Removed
+
+## [1.9.2] — 2026-05-25
+
+### Added
+
+- **Iterative research mode** (`bin/search.mjs`, `src/search/research.mjs`) — Added `--research` / `--depth research` and `greedy_search({ depth: "research" })`. The new mode plans focused follow-up queries, runs fast multi-engine searches, fetches and deduplicates sources, extracts compact learnings/gaps with Gemini, and writes a final cited report. Optional knobs: `breadth` (1-5), `iterations` (1-3), and `maxSources` (3-12). Research mode now fills under-planned breadth with deterministic fallback query angles so `breadth: 3` actually fans out even when Gemini is conservative.
+
+### Fixed
+
+- **Pi update dependency install is leaner** (`package.json`, `package-lock.json`) — Moved the direct `@sinclair/typebox` import into runtime dependencies and marked the Pi host peer as optional so npm does not auto-install a full nested `@earendil-works/pi-coding-agent` tree during git-package updates. This keeps `pi update` focused on GreedySearch runtime deps (`jsdom`, `@mozilla/readability`, `turndown`) and avoids partial installs that leave `jsdom/package.json` missing.
+
+- **Pi TUI peer import no longer required at load time** (`src/tools/greedy-search-handler.ts`) — Replaced the direct `@earendil-works/pi-tui` runtime import with a tiny local `Text` component implementation so Pi/jiti extension import works even when optional TUI peer packages are not installed locally.
+
+- **Research unit tests no longer require fetcher dependencies at import time** (`src/search/research.mjs`) — Research mode now lazy-loads source fetching/file-output helpers only during live research execution, keeping pure planning/normalization unit tests runnable in CI's tarball install simulation without local `node_modules`.
+
+- **Research query sanitizer avoids ReDoS hotspot** (`src/search/research.mjs`) — Replaced markdown-link cleanup regexes with bounded string scanning and manual whitespace collapse, resolving the SonarCloud super-linear regex hotspot while preserving `site:[label](url)` query cleanup.
+
+- **Research source quality cleanup** (`src/search/sources.mjs`, `src/search/research.mjs`) — Social/login-wall domains (`facebook.com`, `linkedin.com`, `x.com`, etc.) now receive a strong ranking penalty unless the query explicitly targets that platform. Research source dedupe now uses the same composite score as normal source ranking, per-round learning extraction errors are recorded in `_research.rounds[].learningError`, child-search stderr forwarding is filtered so noisy page CSS/HTML cannot flood research logs, and markdown links in Gemini-generated follow-up queries are sanitized before search.
+
+- **Bing headless stealth hardening** (`extractors/common.mjs`, `bin/launch.mjs`) — Adopted low-risk ideas from Obscura's stealth model: `navigator.webdriver` now resolves to `undefined` instead of `false`, navigator plugins/mimeTypes/mediaDevices/connection/pdfViewer/platform/vendor are made more Chrome-like, patched functions stringify as `[native code]`, canvas noise is stable per page instead of random on each call, and Chrome launches with `--lang=en-US` plus `--force-color-profile=srgb`. Live Bing headless smoke passed after the change without visible recovery.
+
+- **Research/Bing false recovery fixed** (`bin/search.mjs`, `extractors/bing-copilot.mjs`, `extractors/consent.mjs`) — Research child searches no longer mark Bing/Perplexity failed before visible recovery has a final status, Bing fast-mode keeps a bounded 40s parent budget, and Bing's short-mode stream wait caps at 25s so research can extract rendered partial answers before timing out. Bing verification detection now reuses the DOM-based `handleVerification` detector instead of scanning accessibility text for generic words like “Cloudflare” or “challenge”, preventing false visible-recovery trips when the user query/answer is about anti-bot systems. Added locale-agnostic DOM/accessibility fallback extraction that picks the assistant article without relying solely on English “Copilot said” labels.
+
 ## [1.9.1] — 2026-05-23
 
 ### Fixed
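The 1.9.2 "Research query sanitizer" entry describes replacing markdown-link regexes with bounded string scanning and a manual whitespace collapse. The sketch below shows one way such a scanner could look. It is not the code in `src/search/research.mjs`: the function names, the 500-character cap, and the choice to keep the link label while dropping the URL are all assumptions made for illustration.

```javascript
// Hypothetical helpers, not the package's implementation.

// Collapse runs of whitespace to single spaces and trim both ends, no regex.
function collapseWhitespace(text) {
  let out = "";
  let pendingSpace = false;
  for (const ch of text) {
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") {
      pendingSpace = out.length > 0; // leading whitespace is dropped
    } else {
      if (pendingSpace) out += " ";
      out += ch;
      pendingSpace = false;
    }
  }
  return out; // a trailing pending space is never emitted
}

// Replace "[label](url)" with "label" using index-based scanning.
// maxLength (assumed value) bounds the input before any scanning happens.
function sanitizeQuery(input, maxLength = 500) {
  const text = String(input).slice(0, maxLength);
  let out = "";
  let i = 0;
  let linksPossible = true;
  while (i < text.length) {
    if (linksPossible && text[i] === "[") {
      const closeLabel = text.indexOf("](", i + 1);
      const closeUrl = closeLabel === -1 ? -1 : text.indexOf(")", closeLabel + 2);
      if (closeUrl === -1) {
        // No complete link from here on, so stop searching: keeps the scan linear.
        linksPossible = false;
      } else {
        out += text.slice(i + 1, closeLabel); // keep the label, drop the URL
        i = closeUrl + 1;
        continue;
      }
    }
    out += text[i];
    i += 1;
  }
  return collapseWhitespace(out);
}

console.log(sanitizeQuery("site:[example.com](https://example.com)   pricing"));
```

Because every successful match jumps past the text it scanned and the first failed search disables further searching, each character is visited a bounded number of times, which is the property a backtracking regex cannot guarantee.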
@@ -18,19 +48,6 @@
 
 - **Gemini tab no longer steals focus during synthesis** (`bin/search.mjs`) — Removed the `activateTab` call on the pre-navigated Gemini tab. `Target.activateTarget` was restoring the minimized Chrome window mid-search; CDP synthesis operates on the target ID directly and has no need for the tab to be Chrome's active tab.
 
-### Changed
-
-- **Result file auto-purge** (`src/search/output.mjs`) — On each search run, files older than 7 days are deleted from the results directory. The 10 most recent files are always kept regardless of age. Runs inside `resultsDir()` so it's transparent and zero-overhead.
-
-- **`greedy_search` tool: collapsed rendering** (`src/tools/greedy-search-handler.ts`) — Added `renderCall` and `renderResult` hooks. The call line shows the query (truncated to 60 chars) and engine. The result collapses to a one-line summary: synthesis path shows source count + consensus label; single-engine path shows source count; human-verification path shows a warning. Full output is available via expand (Ctrl+O). Also migrated peer deps from `@mariozechner/pi-coding-agent` to `@earendil-works/pi-coding-agent` and added `@earendil-works/pi-tui` for the `Text` primitive.
-
-- **Headless stealth hardening** (`bin/launch.mjs`, `extractors/common.mjs`) — Four fingerprinting gaps closed:
-  - **UA version auto-detected** — `getChromeVersion` reads the versioned sub-directory inside the Chrome Application folder (e.g. `148.0.7778.168/`) to extract the real major version, then injects it into the `--user-agent` flag. Eliminates the TLS/UA mismatch that was caused by the hardcoded `Chrome/131` string (actual binary was `Chrome/148`).
-  - **`navigator.userAgentData`** — Spoofed to match the detected UA version and remove any `HeadlessChrome` brand entry. `getHighEntropyValues()` returns consistent architecture, platform, and full version list.
-  - **`window.outerWidth/Height`** — Patched from `0` (headless default) to mirror `innerWidth/Height`. A zero outer dimension is a well-known one-signal bot detector.
-  - **`screen.colorDepth/pixelDepth`** — Ensured to report `24` when unset.
-- **GPU rendering re-enabled in headless** — Removed `--disable-gpu` and `--disable-software-rasterizer`. With `--headless=new`, Chrome uses hardware GPU acceleration (ANGLE/Direct3D on Windows), producing canvas and WebGL output identical to visible mode. Cloudflare Turnstile passes automatically on Perplexity without triggering visible-mode retry.
-
 ## [1.9.0] — 2026-05-22
 
 ### Added
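The "Iterative research mode" entry in this changelog says research mode fills under-planned breadth with deterministic fallback query angles, so that `breadth: 3` fans out even when the planner returns fewer queries. A minimal sketch of that idea follows. The angle list, the function name `fillBreadth`, and the case-insensitive dedupe rule are hypothetical; the diff does not show the package's actual angles or ordering.

```javascript
// Hypothetical fallback angles; the real list is not shown in this diff.
const FALLBACK_ANGLES = [
  "overview",
  "comparison",
  "limitations",
  "best practices",
  "recent changes",
];

// Pad the planner's queries up to `breadth` with fixed, ordered fallbacks.
function fillBreadth(topic, plannedQueries, breadth) {
  const seen = new Set();
  const queries = [];
  const add = (query) => {
    const trimmed = query.trim();
    const key = trimmed.toLowerCase(); // assumed dedupe rule: case-insensitive
    if (key && !seen.has(key) && queries.length < breadth) {
      seen.add(key);
      queries.push(trimmed);
    }
  };
  plannedQueries.forEach(add); // planner output takes priority
  for (const angle of FALLBACK_ANGLES) {
    if (queries.length >= breadth) break;
    add(`${topic} ${angle}`);
  }
  return queries;
}

console.log(fillBreadth("browser automation for AI agents", ["browser automation for AI agents benchmarks"], 3));
```

Using a fixed, ordered list rather than random choices means the same topic and planner output always produce the same queries, which keeps this step easy to unit-test.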
package/README.md
CHANGED
@@ -37,6 +37,12 @@ greedy_search({
   engine: "all",
   depth: "deep",
 });
+greedy_search({
+  query: "Evaluate browser automation options for AI agents",
+  depth: "research",
+  breadth: 3,
+  iterations: 2,
+});
 // Headless is the default — no window. To force visible Chrome:
 greedy_search({ query: "Bing captcha setup", engine: "bing", visible: true });
 ```
@@ -45,7 +51,10 @@ greedy_search({ query: "Bing captcha setup", engine: "bing", visible: true });
 
 - `query` (required)
 - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
-- `depth`: `standard` (default), `fast`, `deep`
+- `depth`: `standard` (default), `fast`, `deep`, `research`
+- `breadth`: research mode query breadth, 1-5 (default 3)
+- `iterations`: research mode rounds, 1-3 (default 2)
+- `maxSources`: research mode fetched source cap, 3-12
 - `fullAnswer`: return full single-engine output instead of preview
 - `headless`: set to `false` to show Chrome window (default: `true`)
 - `visible` / `alwaysVisible`: set to `true` to always use visible Chrome for this search
@@ -65,6 +74,7 @@ greedy_search({ query: "Bing captcha setup", engine: "bing", visible: true });
 - `fast` - quickest, no synthesis/source fetching
 - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
 - `deep` - strongest grounding and confidence metadata
+- `research` - slowest; iterative query planning, fast multi-engine searches, source fetching, learning extraction, and a final cited report
 
 ## Runtime commands
 
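The README changes above document three research-mode knobs with fixed ranges: `breadth` 1-5 (default 3), `iterations` 1-3 (default 2), and `maxSources` 3-12 (no default stated). The sketch below shows one way a caller might normalize those options before invoking the tool. It is an assumption-laden illustration: `normalizeResearchOptions` and `clampInt` are hypothetical names, and whether the package clamps, rejects, or rounds out-of-range values is not shown in this diff.

```javascript
// Hypothetical helpers illustrating the documented ranges only.

// Coerce to an integer within [min, max]; use `fallback` for missing or non-numeric input.
function clampInt(value, min, max, fallback) {
  if (value === undefined || value === null) return fallback;
  const n = Number(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, Math.round(n)));
}

function normalizeResearchOptions(opts = {}) {
  return {
    breadth: clampInt(opts.breadth, 1, 5, 3),       // README: 1-5, default 3
    iterations: clampInt(opts.iterations, 1, 3, 2), // README: 1-3, default 2
    // README gives a 3-12 range but no default, so none is invented here.
    maxSources: clampInt(opts.maxSources, 3, 12, undefined),
  };
}

console.log(normalizeResearchOptions({ breadth: 9, iterations: 0 }));
```

Clamping is chosen here only because it keeps the example short; a stricter caller could throw on out-of-range input instead.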
package/bin/launch.mjs
CHANGED