@apmantza/greedysearch-pi 1.7.0 → 1.7.2

package/CHANGELOG.md CHANGED
@@ -1,97 +1,115 @@
1
1
  # Changelog
2
2
 
3
- ## v1.6.5 (2026-04-04)
4
-
5
- ### Security
6
- - **Private URL blocking** — Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
7
-
8
- ### Features
9
- - **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
10
- - **GitHub repo cloning** — Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
11
- - **Head+tail content trimming** — Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
12
- - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
13
-
14
- ### Fixes
15
- - **Perplexity clipboard retry** — Added single retry with 2s delay when clipboard extraction fails, improving reliability.
16
-
17
- ## v1.6.4 (2026-04-02)
18
-
19
- ### Fixes
20
- - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
21
- - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
22
-
23
- ## v1.6.3 (2026-04-02)
24
-
25
- ### Fixes
26
- - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
27
-
28
- ## v1.6.2 (2026-04-01)
29
-
30
- ### Fixes
31
- - **Anti-bot detection evasion** — Gemini synthesis now performs gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
32
-
33
- ## v1.6.1 (2026-03-31)
34
-
35
- ### Features
36
- - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
3
+ ## v1.7.2 (2026-04-08)
37
4
 
38
- ### Code Quality
39
- - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
40
- - `src/formatters/coding.ts` — coding task formatting
41
- - `src/formatters/results.ts` — search and deep research formatting
42
- - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
43
- - `src/formatters/synthesis.ts` — synthesis rendering
44
- - `src/utils/helpers.ts` — shared formatting utilities
45
- - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
46
- - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
5
+ ### Release
6
+ - **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).
47
7
 
48
- ### Documentation
49
- - Clarified `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from tool description (still in skill documentation)
8
+ ## v1.7.1 (2026-04-08)
50
9
 
51
- ## v1.6.0 (2026-03-29)
10
+ ### Performance
11
+ - **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep-research fast.
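The bounded worker pool described in the entry above could look roughly like the sketch below. It is a minimal illustration, not the package's code: `mapWithConcurrency` and `fetchConcurrency` are hypothetical names, and only the default of `2` and the `GREEDY_FETCH_CONCURRENCY` variable come from the changelog.

```javascript
// Hypothetical helper: run `task` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index until no items remain.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the order of `items`
}

// Read the limit from the environment, falling back to the documented default of 2.
// How the real package parses this variable is an assumption.
function fetchConcurrency(env = process.env) {
  const parsed = Number.parseInt(env.GREEDY_FETCH_CONCURRENCY ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 2;
}
```

A caller would pass the list of source URLs as `items` and a fetch function as `task`. A small fixed pool keeps request bursts low, and a slow source only occupies one worker while the others continue.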
52
12
 
53
- ### Breaking Changes (Backward Compatible)
54
- - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
55
- - `fast`: single engine (~15-30s)
56
- - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
57
- - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
58
- - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
59
- - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
60
- - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
13
+ ### Project structure
14
+ - **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
15
+ - **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.
61
16
 
62
- ### Documentation
63
- - Updated README with new `depth` parameter and examples
64
- - Updated skill documentation (SKILL.md) to reflect simplified API
17
+ ### Packaging & docs
18
+ - **Package file list updated** — npm package now includes `bin/` directly instead of root script entries.
19
+ - **README simplified** — rewritten into a shorter, more concise format with quick install, usage, and layout guidance.
65
20
 
66
- ## v1.5.1 (2026-03-29)
67
-
68
- - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
69
-
70
- ## v1.5.0 (2026-03-29)
71
-
72
- ### Features
73
- - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
74
- - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
75
- - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
76
- - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
77
-
78
- ### Notable
79
- - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
80
-
81
- ## v1.4.2 (2026-03-25)
82
-
83
- - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
84
- - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
85
- - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
86
-
87
- ## v1.4.1
88
-
89
- - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
90
-
91
- ## v1.4.0
92
-
93
- - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
94
- - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
95
- - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
96
- - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
97
- - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
21
+ ## v1.6.5 (2026-04-04)
22
+
23
+ ### Security
24
+ - **Private URL blocking** — Added validation to block requests to localhost, RFC1918 private addresses (10.x, 192.168.x), and .local/.internal domains. Prevents accidental exposure of internal services.
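As a rough illustration of the private URL check described above, the sketch below rejects localhost, RFC1918 ranges, and `.local`/`.internal` hosts. The function name `isPrivateUrl` is hypothetical and the package's real validation may differ. The `172.16.0.0/12` range is included because RFC1918 defines it, even though the changelog lists only `10.x` and `192.168.x`. IPv6 addresses are not handled here.

```javascript
// Hypothetical check; not the package's actual implementation.
function isPrivateUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return true; // treat unparseable input as unsafe
  }
  const host = url.hostname.toLowerCase();

  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.endsWith(".local") || host.endsWith(".internal")) return true;

  // Dotted-quad IPv4 literal: compare the first two octets against private ranges.
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (m) {
    const a = Number(m[1]);
    const b = Number(m[2]);
    if (a === 127) return true; // loopback
    if (a === 10) return true; // 10.0.0.0/8
    if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
    if (a === 192 && b === 168) return true; // 192.168.0.0/16
  }
  return false;
}
```

A hostname check like this does not cover a public name that resolves to a private address. Catching that case would require validating the resolved IP as well.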
25
+
26
+ ### Features
27
+ - **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
28
+ - **GitHub repo cloning** — Root and tree URLs now trigger `git clone --depth 1` for complete repo access. Agent can explore files locally instead of parsing rendered HTML. Includes README preview and directory tree listing.
29
+ - **Head+tail content trimming** — Large documents now use smart truncation: keeps 75% from the beginning (introduction) + 25% from the end (conclusions/examples) with `[...content trimmed...]` marker, instead of simple truncation.
30
+ - **Anubis bot detection** — Added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).
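The head+tail trimming entry above can be sketched as follows. This is an assumption-laden illustration: `trimHeadTail`, `TRIM_MARKER`, and the character-based budget are hypothetical. Only the 75%/25% split and the `[...content trimmed...]` marker come from the changelog.

```javascript
const TRIM_MARKER = "\n\n[...content trimmed...]\n\n";

// Hypothetical helper: keep ~75% of the budget from the start and ~25% from the end.
function trimHeadTail(text, maxChars, headRatio = 0.75) {
  if (text.length <= maxChars) return text; // short documents pass through untouched

  // Reserve room for the marker so the result stays within maxChars.
  const budget = Math.max(0, maxChars - TRIM_MARKER.length);
  const headLen = Math.floor(budget * headRatio);
  const tailLen = budget - headLen;

  const head = text.slice(0, headLen);
  const tail = text.slice(text.length - tailLen);
  return head + TRIM_MARKER + tail;
}
```

Compared with cutting at a fixed offset, this keeps the closing part of a page, which is often where examples and conclusions sit. If `maxChars` is smaller than the marker itself, this sketch returns only the marker.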
31
+
32
+ ### Fixes
33
+ - **Perplexity clipboard retry** — Added single retry with 2s delay when clipboard extraction fails, improving reliability.
34
+
35
+ ## v1.6.4 (2026-04-02)
36
+
37
+ ### Fixes
38
+ - **Gemini scroll-to-bottom** — Changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button. This ensures lazy-loaded content is triggered and the full answer is captured.
39
+ - **Restored missing files** — `.mjs` source files (extractors, search.mjs, launch.mjs, etc.) were incorrectly removed in v1.6.2 cleanup; now properly tracked again.
40
+
41
+ ## v1.6.3 (2026-04-02)
42
+
43
+ ### Fixes
44
+ - **Debug output removed** — Cleaned up stderr passthrough that was causing CDP connection issues in some environments.
45
+
46
+ ## v1.6.2 (2026-04-01)
47
+
48
+ ### Fixes
49
+ - **Anti-bot detection evasion** — Gemini synthesis now performs gentle scroll every ~6 seconds while waiting for the copy button. This prevents the button from hanging due to anti-bot "human activity" checks.
50
+
51
+ ## v1.6.1 (2026-03-31)
52
+
53
+ ### Features
54
+ - **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of truncated previews. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. Explicit `fullAnswer: true/false` always overrides.
55
+
56
+ ### Code Quality
57
+ - **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
58
+ - `src/formatters/coding.ts` — coding task formatting
59
+ - `src/formatters/results.ts` — search and deep research formatting
60
+ - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
61
+ - `src/formatters/synthesis.ts` — synthesis rendering
62
+ - `src/utils/helpers.ts` — shared formatting utilities
63
+ - **Complexity reduced** — cognitive complexity dropped from 360 to ~60, maintainability index improved from 11.2 to ~40+
64
+ - **Eliminated code duplication** — removed 6 duplicate blocks, consolidated 4+ single-use helper functions
65
+
66
+ ### Documentation
67
+ - Clarified `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from tool description (still in skill documentation)
68
+
69
+ ## v1.6.0 (2026-03-29)
70
+
71
+ ### Breaking Changes (Backward Compatible)
72
+ - **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
73
+ - `fast`: single engine (~15-30s)
74
+ - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
75
+ - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
76
+ - **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
77
+ - **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`, `--deep-research` maps to `depth: "deep"`
78
+ - **deep_research tool aliased** — still works, calls `greedy_search` with `depth: "deep"`
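One way to picture the backward-compatible mapping listed above is a small resolver like the sketch below. `resolveDepth` and its option names are hypothetical. The flag-to-depth mapping follows the changelog, while the precedence order and the single-engine fallback to `fast` are assumptions.

```javascript
// Hypothetical resolver; only the flag-to-depth mapping comes from the changelog.
function resolveDepth({ depth, engine = "all", synthesize = false, deepResearch = false } = {}) {
  if (depth) return depth; // assumption: an explicit depth wins over deprecated flags
  if (deepResearch) return "deep"; // --deep-research maps to depth: "deep"
  if (synthesize) return "standard"; // --synthesize maps to depth: "standard"
  // "standard" is the documented default for engine "all";
  // falling back to "fast" for a single engine is an assumption.
  return engine === "all" ? "standard" : "fast";
}
```

Under this sketch, the aliased `deep_research` tool would simply call `greedy_search` with `depth: "deep"` and leave every other parameter unchanged.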
79
+
80
+ ### Documentation
81
+ - Updated README with new `depth` parameter and examples
82
+ - Updated skill documentation (SKILL.md) to reflect simplified API
83
+
84
+ ## v1.5.1 (2026-03-29)
85
+
86
+ - **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size
87
+
88
+ ## v1.5.0 (2026-03-29)
89
+
90
+ ### Features
91
+ - **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
92
+ - **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
93
+ - **Shared utilities** — extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
94
+ - **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info
95
+
96
+ ### Notable
97
+ - **NO API KEYS** — updated messaging to emphasize this works via browser automation, no API keys needed
98
+
99
+ ## v1.4.2 (2026-03-25)
100
+
101
+ - **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
102
+ - **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
103
+ - **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
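The regex-based citation extraction mentioned in the list above might be approximated as below. The function name `extractMarkdownLinks`, the exact pattern, and the de-duplication step are assumptions for illustration. The changelog only states that sources are parsed from `[title](url)` links in the clipboard Markdown.

```javascript
// Hypothetical extractor for [title](url) links in clipboard Markdown.
function extractMarkdownLinks(markdown) {
  // Title: anything except "]"; URL: http(s) up to the first whitespace or ")".
  const linkPattern = /\[([^\]]+)\]\((https?:\/\/[^\s)]+)\)/g;
  const seen = new Set();
  const sources = [];

  for (const match of markdown.matchAll(linkPattern)) {
    const [, title, url] = match;
    if (seen.has(url)) continue; // keep the first title seen for each URL
    seen.add(url);
    sources.push({ title: title.trim(), url });
  }
  return sources;
}
```

Parsing the copied text avoids depending on DOM selectors, which is the motivation given in the changelog. This simple pattern also matches image links and cuts URLs that contain a closing parenthesis, so a production version would need more care.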
104
+
105
+ ## v1.4.1
106
+
107
+ - **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
108
+
109
+ ## v1.4.0
110
+
111
+ - **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
112
+ - **Real deep research** — top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
113
+ - **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
114
+ - **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
115
+ - **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
package/LICENSE CHANGED
@@ -1,21 +1,21 @@
1
- MIT License
2
-
3
- Copyright (c) 2026
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
1
+ MIT License
2
+
3
+ Copyright (c) 2026
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md CHANGED
@@ -1,262 +1,73 @@
1
- # GreedySearch for Pi
2
-
3
- Pi extension that adds `greedy_search`, `deep_research`, and `coding_task` tools -- multi-engine AI search via browser automation. **NO API KEYS needed.**
4
-
5
- Fans out queries to Perplexity, Bing Copilot, and Google AI simultaneously. Returns AI-synthesized answers with fetched source content. Streams progress as each engine completes.
6
-
7
- **New in v2.0:** HTTP-first source fetching with Mozilla Readability extraction (~3x faster), smart query-aware source ranking.
8
-
9
- Forked from [GreedySearch-claude](https://github.com/apmantza/GreedySearch-claude).
10
-
11
- ## Install
12
-
13
- ```bash
14
- pi install npm:@apmantza/greedysearch-pi
15
- ```
16
-
17
- Or directly from git:
18
-
19
- ```bash
20
- pi install git:github.com/apmantza/GreedySearch-pi
21
- ```
22
-
23
- ## Quick Start
24
-
25
- Once installed, Pi gains a `greedy_search` tool with two modes.
26
-
27
- ```javascript
28
- // Default: multi-engine + source fetch + synthesis
29
- greedy_search({ query: "What's new in React 19?" })
30
-
31
- // Fast: single engine, no synthesis
32
- greedy_search({ query: "What's new in React 19?", depth: "fast", engine: "perplexity" })
33
- ```
34
-
35
- ## Parameters
36
-
37
- | Parameter | Type | Default | Description |
38
- |-----------|------|---------|-------------|
39
- | `query` | string | required | The search question |
40
- | `engine` | string | `"all"` | `all`, `perplexity`, `bing`, `google`, `gemini` |
41
- | `depth` | string | `"standard"` | `fast` (1 engine, no fetch), `standard` (3 engines + fetch + synthesis) |
42
- | `fullAnswer` | boolean | `false` | Return complete answer (~3000+ chars) vs truncated preview (~300 chars) |
43
-
44
- ## Depth Levels
45
-
46
- | Depth | Engines | Synthesis | Source Fetch | Time | Best For |
47
- |-------|---------|-----------|--------------|------|----------|
48
- | `fast` | 1 | no | no | 10-30s | Quick lookup, single perspective |
49
- | `standard` | 3 | yes | yes (top 5) | 15-30s | **Default** -- balanced, grounded answers |
50
-
51
- **Standard mode** (default for `engine: "all"`): Queries 3 engines, fetches content from top 5 sources via HTTP (with Readability extraction), synthesizes grounded answer with citations.
52
-
53
- **Fast mode**: Single engine, no source fetching or synthesis. Good for quick checks.
54
-
55
- ## Engines
56
-
57
- | Engine | Alias | Best for |
58
- |--------|-------|----------|
59
- | `all` | - | **Default** -- all 3 engines with synthesis + source fetch |
60
- | `perplexity` | `p` | Technical Q&A, code explanations, documentation |
61
- | `bing` | `b` | Recent news, Microsoft ecosystem |
62
- | `google` | `g` | Broad coverage, multiple perspectives |
63
- | `gemini` | `gem` | Google's AI with different training data |
64
-
65
- ## Streaming Progress
66
-
67
- When using `engine: "all"`, the tool streams progress as each engine completes:
68
-
69
- ```
70
- **Searching...** pending: perplexity, bing, google
71
- **Searching...** done: perplexity, pending: bing, google
72
- **Searching...** done: perplexity, done: bing, pending: google
73
- **Searching...** done: perplexity, done: bing, done: google
74
- **Synthesizing...** with Gemini
75
- ```
76
-
77
- ## Source Fetching (HTTP-First)
78
-
79
- GreedySearch now uses **HTTP-first source fetching** with Mozilla Readability for content extraction:
80
-
81
- - **HTTP**: Fast (~200-800ms), parallel, structured markdown output
82
- - **Browser fallback**: Only when HTTP fails (bot protection, JS-heavy sites)
83
- - **Typical success rate**: 90%+ of documentation sites work via HTTP
84
- - **Speed improvement**: ~3x faster than browser-only fetching (15-30s vs 60-180s)
85
-
86
- The old regex-based HTML stripping has been replaced with professional-grade content extraction that preserves document structure, code blocks, and headings.
87
-
88
- ## Smart Source Ranking
89
-
90
- Sources are now ranked using query-aware domain boosting:
91
-
92
- - **Query keywords** boost official docs (e.g., "react" → react.dev +10 points)
93
- - **Consensus**: Sources found by multiple engines rank higher
94
- - **Source type**: Official docs > repos > blogs > community
95
- - **URL patterns**: `/docs/`, `/api/`, `/reference/` get extra boost
96
-
97
- 40+ tech stacks have preferred domain mappings including React, Node.js, Python, Rust, Go, Prisma, Supabase, and more.
98
-
99
- ## GitHub Content Extraction
100
-
101
- GreedySearch handles GitHub URLs intelligently:
102
-
103
- - **Blob URLs** (`/blob/`) — Automatically rewritten to `raw.githubusercontent.com` for instant raw file access
104
- - **Tree/Root URLs** — Clones repo locally with `git clone --depth 1`, returns README preview + file tree for agent exploration
105
- - **Benefits**: Real file contents (not rendered HTML), accurate line numbers, works with private repos via `gh` CLI auth
106
-
107
- ## Security
108
-
109
- - **Private URL blocking** — Requests to localhost, RFC1918 addresses (10.x, 192.168.x), and .local/.internal domains are automatically blocked
110
- - **Cross-host redirect detection** — Detects redirects to authentication/login pages and falls back to browser extraction
111
- - **File protocol blocking** — `file://` URLs are rejected
112
-
113
- ## Examples
114
-
115
- **Default research (multi-engine + sources + synthesis):**
116
-
117
- ```javascript
118
- greedy_search({ query: "Best practices for monorepo structure" })
119
- ```
120
-
121
- **Quick lookup (fast):**
122
-
123
- ```javascript
124
- greedy_search({ query: "How to use async await in Python", depth: "fast", engine: "perplexity" })
125
- ```
126
-
127
- **Compare tools:**
128
-
129
- ```javascript
130
- greedy_search({ query: "Prisma vs Drizzle in 2026" })
131
- ```
132
-
133
- **Debug an error:**
134
-
135
- ```javascript
136
- greedy_search({ query: "Error: Cannot find module 'react-dom/client' Next.js 15" })
137
- ```
138
-
139
- ## Full vs Short Answers
140
-
141
- Default mode returns ~300 char summaries to save tokens. Use `fullAnswer: true` for complete responses:
142
-
143
- ```javascript
144
- greedy_search({ query: "explain the React compiler", engine: "perplexity", fullAnswer: true })
145
- ```
146
-
147
- ## Requirements
148
-
149
- - **Chrome** -- must be installed. The extension auto-launches a dedicated Chrome instance on port 9222 with its own isolated profile and DevTools port file, separate from your main browser session.
150
- - **Node.js 22+** -- for built-in `fetch` and WebSocket support.
151
-
152
- ## Setup (first time)
153
-
154
- To pre-launch the dedicated GreedySearch Chrome instance:
155
-
156
- ```bash
157
- node ~/.pi/agent/git/GreedySearch-pi/launch.mjs
158
- ```
159
-
160
- Stop it when done:
161
-
162
- ```bash
163
- node ~/.pi/agent/git/GreedySearch-pi/launch.mjs --kill
164
- ```
165
-
166
- Check status:
167
-
168
- ```bash
169
- node ~/.pi/agent/git/GreedySearch-pi/launch.mjs --status
170
- ```
171
-
172
- ## Troubleshooting
173
-
174
- ### "Chrome not found"
175
-
176
- Set the path explicitly:
177
-
178
- ```bash
179
- export CHROME_PATH="/path/to/chrome"
180
- ```
181
-
182
- ### "CDP timeout" or "Chrome may have crashed"
183
-
184
- Restart GreedySearch Chrome:
185
-
186
- ```bash
187
- node ~/.pi/agent/git/GreedySearch-pi/launch.mjs --kill
188
- node ~/.pi/agent/git/GreedySearch-pi/launch.mjs
189
- ```
190
-
191
- ### Google / Bing "verify you're human"
192
-
193
- The extension auto-clicks verification buttons and Cloudflare Turnstile challenges using broad keyword matching -- resilient to variations like "Verify you are human" or localised button text. For hard CAPTCHAs (image puzzles), solve manually in the Chrome window that opens.
194
-
195
- ### Parallel searches failing
196
-
197
- Each search creates a fresh isolated browser tab that is closed after completion, allowing safe parallel execution without tab state conflicts.
198
-
199
- ### Search hangs
200
-
201
- Chrome may be unresponsive. Restart it with `launch.mjs --kill` then `launch.mjs`.
202
-
203
- ### Sources are empty or junk links
204
-
205
- Sources are now extracted by regex-parsing Markdown links (`[title](url)`) from the clipboard text captured after each engine responds -- not from DOM selectors that break when the engine's UI updates. If sources are empty, the engine's clipboard copy didn't include formatted links (Bing Copilot currently falls into this category).
206
-
207
- ## How It Works
208
-
209
- - `index.ts` -- Pi extension, registers `greedy_search` tool with streaming progress
210
- - `search.mjs` -- CLI runner, spawns extractors in parallel, emits `PROGRESS:` events to stderr
211
- - `launch.mjs` -- launches dedicated Chrome on port 9222 with isolated profile
212
- - `extractors/` -- per-engine CDP scrapers (Perplexity, Bing Copilot, Google AI, Gemini)
213
- - `cdp.mjs` -- Chrome DevTools Protocol CLI for browser automation
214
- - `skills/greedy-search/SKILL.md` -- skill file that guides the model on when/how to use greedy_search
215
-
216
- ## Changelog
217
-
218
- ### v1.6.1 (2026-03-31)
219
- - **Single-engine full answers by default** -- `engine: "google"` (or any single engine) now returns complete answers instead of truncated previews. Multi-engine (`all`) still truncates to save tokens during synthesis.
220
- - **Codebase refactored** -- extracted 438 lines from `index.ts` into modular formatters (`src/formatters/`), reducing cognitive complexity from 360 to ~60 and raising the maintainability index from 11.2 to ~40+
221
- - **Removed codebase search confusion** -- clarified that `greedy_search` is WEB SEARCH ONLY (not for searching local code)
222
-
223
- ### v1.6.0 (2026-03-29)
224
- - **Merged deep_research into greedy_search** -- new `depth` parameter: `fast` (1 engine), `standard` (3 engines + synthesis), `deep` (3 engines + fetch + synthesis + confidence)
225
- - **Simpler API** -- one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags
226
- - **Backward compatible** -- `deep_research` still works as alias, `--synthesize` and `--deep-research` flags still function
227
- - **Updated documentation** -- README and skill docs now use `depth` parameter throughout
228
-
229
- ### v1.5.1 (2026-03-29)
230
- - Fixed npm package -- added `.pi-lens/` and test files to `.npmignore`
231
-
232
- ### v1.5.0 (2026-03-29)
233
-
234
- - **Code extraction fixed** -- `coding_task` now uses clipboard interception to preserve markdown code blocks (was losing them via DOM scraping)
235
- - **Chrome targeting hardened** -- all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to user's main Chrome session
236
- - **Shared utilities** -- extracted ~220 lines of duplicate code from extractors into `common.mjs` (cdp wrapper, tab management, clipboard interception)
237
- - **Documentation leaner** -- skill documentation reduced 61% (180 -> 70 lines) while preserving all decision-making info
238
- - **NO API KEYS** -- updated messaging to emphasize this works via browser automation, no API keys needed
239
-
240
- ### v1.4.2 (2026-03-25)
241
-
242
- - **Fresh isolated tabs** -- each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries
243
- - **Regex-based citation extraction** -- all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates
244
- - **Relaxed verification detection** -- `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals
245
-
246
- ---
247
-
248
- ### v1.4.1
249
-
250
- - **Fixed parallel synthesis** -- multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that gets cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.
251
-
252
- ### v1.4.0
253
-
254
- - **Grounded synthesis** -- Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims
255
- - **Real deep research** -- top sources are fetched before synthesis so deep research answers are grounded in fetched evidence, not just engine summaries
256
- - **Richer source metadata** -- source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata
257
- - **Cleaner tab lifecycle** -- temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab
258
- - **Isolated Chrome targeting** -- GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts
259
-
260
- ## License
261
-
262
- MIT
1
+ # GreedySearch for Pi
2
+
3
+ Multi-engine AI web search for Pi via browser automation.
4
+
5
+ - No API keys
6
+ - Real browser results (Perplexity, Bing Copilot, Google AI)
7
+ - Optional Gemini synthesis with source grounding
8
+
9
+ ## Install
10
+
11
+ ```bash
12
+ pi install npm:@apmantza/greedysearch-pi
13
+ ```
14
+
15
+ Or from git:
16
+
17
+ ```bash
18
+ pi install git:github.com/apmantza/GreedySearch-pi
19
+ ```
20
+
21
+ ## Tools
22
+
23
+ - `greedy_search` - fast or grounded multi-engine search
24
+ - `coding_task` - browser-routed Gemini/Copilot coding assistance
25
+
26
+ ## Quick usage
27
+
28
+ ```js
29
+ greedy_search({ query: "React 19 changes" })
30
+ greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" })
31
+ greedy_search({ query: "Best auth architecture 2026", engine: "all", depth: "deep" })
32
+ ```
33
+
34
+ ## Parameters (`greedy_search`)
35
+
36
+ - `query` (required)
37
+ - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
38
+ - `depth`: `standard` (default), `fast`, `deep`
39
+ - `fullAnswer`: return full single-engine output instead of preview
40
+
41
+ ## Depth modes
42
+
43
+ - `fast` - quickest, no synthesis/source fetching
44
+ - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
45
+ - `deep` - strongest grounding and confidence metadata
46
+
47
+ ## Runtime commands
48
+
49
+ ```bash
50
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs
51
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --status
52
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --kill
53
+ ```
54
+
55
+ ## Requirements
56
+
57
+ - Chrome
58
+ - Node.js 22+
59
+
60
+ ## Project layout
61
+
62
+ - `bin/` - runtime CLIs (`search.mjs`, `launch.mjs`, `cdp.mjs`, `coding-task.mjs`)
63
+ - `extractors/` - engine-specific automation
64
+ - `src/` - ranking/fetching/formatting internals
65
+ - `skills/` - Pi skill metadata
66
+
67
+ ## Changelog
68
+
69
+ See `CHANGELOG.md`.
70
+
71
+ ## License
72
+
73
+ MIT