@apmantza/greedysearch-pi 1.8.4 → 1.8.6

package/README.md CHANGED
@@ -1,98 +1,150 @@
- # GreedySearch for Pi
-
- Multi-engine AI web search for Pi via browser automation.
-
- - No API keys
- - Real browser results (Perplexity, Bing Copilot, Google AI)
- - Optional Gemini synthesis with source grounding
-
- ## Install
-
- ```bash
- pi install npm:@apmantza/greedysearch-pi
- ```
-
- Or from git:
-
- ```bash
- pi install git:github.com/apmantza/GreedySearch-pi
- ```
-
- ## Tools
-
- - `greedy_search` - fast or grounded multi-engine search
- - `coding_task` - browser-routed Gemini/Copilot coding assistance
-
- ## Quick usage
-
- ```js
- greedy_search({ query: "React 19 changes" })
- greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" })
- greedy_search({ query: "Best auth architecture 2026", engine: "all", depth: "deep" })
- ```
-
- ## Parameters (`greedy_search`)
-
- - `query` (required)
- - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
- - `depth`: `standard` (default), `fast`, `deep`
- - `fullAnswer`: return full single-engine output instead of preview
-
- ## Depth modes
-
- - `fast` - quickest, no synthesis/source fetching
- - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
- - `deep` - strongest grounding and confidence metadata
-
- ## Runtime commands
-
- ```bash
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --status
- node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --kill
- ```
-
- ## Requirements
-
- - Chrome
- - Node.js 20.11.0+ (22+ recommended)
-
- ## Source fetching
-
- When using `depth: "standard"` or `depth: "deep"`, source content is fetched and synthesized:
-
- - **Reddit** Uses Reddit's public `.json` API for posts and comments (no scraping)
- - **GitHub** Uses GitHub REST API for repos, READMEs, and file trees
- - **General web** — Mozilla Readability extraction with browser fallback for bot-blocked pages
- - **Metadata** — title, author/byline, site name, publish date, language, excerpt
-
- ## Project layout
-
- - `bin/` - runtime CLIs (`search.mjs`, `launch.mjs`, `cdp.mjs`, `coding-task.mjs`)
- - `extractors/` - engine-specific automation
- - `src/` - ranking/fetching/formatting internals (includes `reddit.mjs`, `github.mjs`, `fetcher.mjs`)
- - `skills/` - Pi skill metadata
-
- ## Testing
-
- Cross-platform test runner (Windows + Unix):
- ```bash
- npm test # run all tests
- npm run test:quick # skip slow tests
- npm run test:smoke # basic health check
- ```
-
- Full bash test suite (Unix only):
- ```bash
- npm run test:bash # comprehensive tests
- ./test.sh parallel # race condition tests
- ./test.sh flags # flag/option tests
- ```
-
- ## Changelog
-
- See `CHANGELOG.md`.
-
- ## License
-
- MIT
+ # GreedySearch for Pi
+
+ ![GreedySearch](docs/banner.svg)
+
+ Multi-engine AI web search for Pi via browser automation.
+
+ - No API keys
+ - Real browser results (Perplexity, Bing Copilot, Google AI)
+ - Optional Gemini synthesis with source grounding
+ - Chrome runs headless by default — no window, purely background
+
+ ## Install
+
+ ```bash
+ pi install npm:@apmantza/greedysearch-pi
+ ```
+
+ Or from git:
+
+ ```bash
+ pi install git:github.com/apmantza/GreedySearch-pi
+ ```
+
+ ## Tools
+
+ - `greedy_search` — multi-engine AI web search
+ - `websearch` — lightweight DuckDuckGo/Brave search (via pi-webaio)
+ - `webfetch` / `webpull` — page fetching and site crawling (via pi-webaio)
+
+ ## Quick usage
+
+ ```js
+ greedy_search({ query: "React 19 changes" });
+ greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" });
+ greedy_search({
+   query: "Best auth architecture 2026",
+   engine: "all",
+   depth: "deep",
+ });
+ // Headless is the default — no window. To see the browser:
+ // Set GREEDY_SEARCH_VISIBLE=1 before launching Pi
+ ```
+
+ ## Parameters (`greedy_search`)
+
+ - `query` (required)
+ - `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
+ - `depth`: `standard` (default), `fast`, `deep`
+ - `fullAnswer`: return full single-engine output instead of preview
+ - `headless`: set to `false` to show Chrome window (default: `true`)
+
+ ## Environment variables
+
+ | Variable | Default | Description |
+ | ------------------------------------ | ------------- | --------------------------------------------------------- |
+ | `GREEDY_SEARCH_VISIBLE` | (unset) | Set to `1` to show Chrome window instead of headless |
+ | `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` | `5` | Minutes of inactivity before auto-killing headless Chrome |
+ | `GREEDY_SEARCH_LOCALE` | `en` | Default result language (en, de, fr, es, ja, etc.) |
+ | `CHROME_PATH` | auto-detected | Path to Chrome/Chromium executable |
+
+ ## Depth modes
+
+ - `fast` - quickest, no synthesis/source fetching
+ - `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
+ - `deep` - strongest grounding and confidence metadata
+
+ ## Runtime commands
+
+ ````bash
+ # Headless (default, no GUI)
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --status
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --kill
+
+ # Visible (show browser window — useful for one-time Cloudflare clearance)
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs --kill
+
+ # Chrome auto-cleaned after 5 min idle (prevents OOM)
+ # Override: GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES=10
+
+ ## Requirements
+
+ - Chrome
+ - Node.js 20.11.0+ (22+ recommended)
+
+ ## Known engine quirks
+
+ ### Bing Copilot
+
+ Bing Copilot detects headless Chrome and sandboxes all AI responses inside nested iframes (`copilot.microsoft.com` → `copilot.fun` → `blob:`). In this mode the copy button is hidden and the Cloudflare Turnstile challenge blocks content delivery. The clipboard-based extraction cannot work.
+
+ **Auto-recovery:** When Bing fails with any extraction error (clipboard, verification, Cloudflare), GreedySearch automatically switches to **visible Chrome**, retries the search, and caches Cloudflare clearance cookies in the Chrome profile. You may need to solve the Cloudflare challenge **once** manually when the visible Chrome window appears. After that, all subsequent headless searches bypass the challenge — the cookies persist in the profile.
+
+ If you prefer to skip the auto-recovery delay, launch visible Chrome ahead of time:
+
+ ```bash
+ node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs
+ ````
+
+ ## Anti-detection
+
+ Headless Chrome auto-injects stealth patches before any page JavaScript runs:
+
+ - `navigator.webdriver` hidden, plugins/languages faked, `window.chrome` shimmed
+ - WebGL vendor spoofed (Intel Iris), realistic hardware concurrency / memory
+ - CDP automation markers deleted, `requestAnimationFrame` kept alive
+ - Human-like click simulation with coordinate jitter and variable delays
+
+ This bypasses casual bot detection (basic `navigator.webdriver` checks) but does not defeat commercial anti-bot services (DataDome, PerimeterX, Kasada). **Bing Copilot specifically detects headless and sandboxes responses behind Cloudflare Turnstile** — see [Known engine quirks](#known-engine-quirks) for the auto-recovery mechanism.
+
+ When using `depth: "standard"` or `depth: "deep"`, source content is fetched and synthesized:
+
+ - **Reddit** — Uses Reddit's public `.json` API for posts and comments (no scraping)
+ - **GitHub** — Uses GitHub REST API for repos, READMEs, and file trees
+ - **General web** — Mozilla Readability extraction with browser fallback for bot-blocked pages
+ - **Metadata** — title, author/byline, site name, publish date, language, excerpt
+
+ ## Project layout
+
+ - `bin/` — runtime CLIs (`search.mjs`, `launch.mjs`, `launch-visible.mjs`, `visible.mjs`, `cdp.mjs`)
+ - `extractors/` — engine-specific automation + stealth/consent handling
+ - `src/` — search pipeline, chrome management, source fetching, formatting
+ - `skills/` — Pi skill metadata
+
+ ## Testing
+
+ Cross-platform test runner (Windows + Unix):
+
+ ```bash
+ npm test # run all tests
+ npm run test:quick # skip slow tests
+ npm run test:smoke # basic health check
+ ```
+
+ Full bash test suite (Unix only):
+
+ ```bash
+ npm run test:bash # comprehensive tests
+ ./test.sh parallel # race condition tests
+ ./test.sh flags # flag/option tests
+ ```
+
+ ## Changelog
+
+ See `CHANGELOG.md`.
+
+ ## License
+
+ MIT
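
Note on the new "Environment variables" table in 1.8.6: the four variables can be read as a small configuration contract. The sketch below is not taken from the package. It only illustrates how the documented defaults might be resolved in Node.js. The function name `resolveLaunchConfig`, the returned field names, and the fallback for an invalid timeout value are all hypothetical, since the README does not specify them.

```js
// Hypothetical helper, NOT part of @apmantza/greedysearch-pi.
// Maps the environment variables documented in the 1.8.6 README to a plain object.
function resolveLaunchConfig(env = process.env) {
  // README: GREEDY_SEARCH_VISIBLE=1 shows the Chrome window; unset means headless.
  const headless = env.GREEDY_SEARCH_VISIBLE !== "1";

  // README default is 5 minutes. Falling back to 5 for a missing or
  // non-positive value is an assumption made for this sketch.
  const parsed = Number(env.GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES);
  const idleTimeoutMinutes = Number.isFinite(parsed) && parsed > 0 ? parsed : 5;

  // README default result language is "en".
  const locale = env.GREEDY_SEARCH_LOCALE || "en";

  // null stands in for "auto-detected"; the detection logic is not shown in the diff.
  const chromePath = env.CHROME_PATH || null;

  return { headless, idleTimeoutMinutes, locale, chromePath };
}

// Example: visible Chrome with a 10-minute idle timeout.
console.log(
  resolveLaunchConfig({
    GREEDY_SEARCH_VISIBLE: "1",
    GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES: "10",
  })
);
```

Passing the environment as an argument keeps the sketch easy to exercise without mutating `process.env`. How the package itself combines these variables with the `headless` parameter is not stated in the diff.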