@apmantza/greedysearch-pi 1.8.4 → 1.8.6
- package/CHANGELOG.md +347 -188
- package/README.md +150 -98
- package/bin/cdp.mjs +1010 -1004
- package/bin/launch-visible.mjs +233 -0
- package/bin/launch.mjs +410 -366
- package/bin/search.mjs +501 -388
- package/bin/visible.mjs +50 -0
- package/docs/banner.png +0 -0
- package/extractors/bing-copilot.mjs +264 -155
- package/extractors/common.mjs +387 -291
- package/extractors/consent.mjs +294 -273
- package/extractors/gemini.mjs +157 -146
- package/extractors/google-ai.mjs +126 -125
- package/extractors/google-search.mjs +234 -0
- package/extractors/perplexity.mjs +148 -147
- package/extractors/selectors.mjs +54 -54
- package/index.ts +123 -256
- package/package.json +55 -53
- package/src/fetcher.mjs +20 -11
- package/src/formatters/results.ts +0 -113
- package/src/github.mjs +254 -237
- package/src/reddit.mjs +17 -6
- package/src/search/chrome.mjs +165 -11
- package/src/search/constants.mjs +9 -4
- package/src/search/defaults.mjs +14 -14
- package/src/search/engines.mjs +66 -62
- package/src/search/fetch-source.mjs +10 -10
- package/src/search/output.mjs +60 -59
- package/src/search/sources.mjs +446 -446
- package/src/search/synthesis-runner.mjs +40 -8
- package/src/search/synthesis.mjs +223 -223
- package/src/tools/greedy-search-handler.ts +64 -14
- package/src/tools/shared.ts +34 -13
- package/src/types.ts +103 -103
- package/test.mjs +496 -423
- package/bin/coding-task.mjs +0 -396
- package/src/formatters/coding.ts +0 -68
- package/src/tools/deep-research-handler.ts +0 -37
package/README.md
CHANGED
@@ -1,98 +1,150 @@
-# GreedySearch for Pi
-[old lines 2-98: text not captured in this rendering; the only legible fragments are a code-fence delimiter at old line 32 and the list item "- Chrome" at old line 57]
+# GreedySearch for Pi
+
+
+
+Multi-engine AI web search for Pi via browser automation.
+
+- No API keys
+- Real browser results (Perplexity, Bing Copilot, Google AI)
+- Optional Gemini synthesis with source grounding
+- Chrome runs headless by default — no window, purely background
+
+## Install
+
+```bash
+pi install npm:@apmantza/greedysearch-pi
+```
+
+Or from git:
+
+```bash
+pi install git:github.com/apmantza/GreedySearch-pi
+```
+
+## Tools
+
+- `greedy_search` — multi-engine AI web search
+- `websearch` — lightweight DuckDuckGo/Brave search (via pi-webaio)
+- `webfetch` / `webpull` — page fetching and site crawling (via pi-webaio)
+
+## Quick usage
+
+```js
+greedy_search({ query: "React 19 changes" });
+greedy_search({ query: "Prisma vs Drizzle", engine: "all", depth: "fast" });
+greedy_search({
+  query: "Best auth architecture 2026",
+  engine: "all",
+  depth: "deep",
+});
+// Headless is the default — no window. To see the browser:
+// Set GREEDY_SEARCH_VISIBLE=1 before launching Pi
+```
+
+## Parameters (`greedy_search`)
+
+- `query` (required)
+- `engine`: `all` (default), `perplexity`, `bing`, `google`, `gemini`
+- `depth`: `standard` (default), `fast`, `deep`
+- `fullAnswer`: return full single-engine output instead of preview
+- `headless`: set to `false` to show Chrome window (default: `true`)
+
+## Environment variables
+
+| Variable                             | Default       | Description                                               |
+| ------------------------------------ | ------------- | --------------------------------------------------------- |
+| `GREEDY_SEARCH_VISIBLE`              | (unset)       | Set to `1` to show Chrome window instead of headless      |
+| `GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES` | `5`           | Minutes of inactivity before auto-killing headless Chrome |
+| `GREEDY_SEARCH_LOCALE`               | `en`          | Default result language (en, de, fr, es, ja, etc.)        |
+| `CHROME_PATH`                        | auto-detected | Path to Chrome/Chromium executable                        |
+
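The environment variables in the table above could be collected into one settings object roughly as follows. This is a minimal sketch, not the package's code: `readGreedySearchEnv` and the returned field names are hypothetical, and the fallback to the default on an invalid timeout is an assumption. Only the variable names and their defaults come from the README.

```js
// Hypothetical helper, not part of the published package API.
// Variable names and defaults follow the README table; everything else is assumed.
function readGreedySearchEnv(env = process.env) {
  const minutes = Number.parseFloat(env.GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES ?? "");
  // Assumption: a missing, non-numeric, or non-positive value falls back to 5 minutes.
  const idleMinutes = Number.isFinite(minutes) && minutes > 0 ? minutes : 5;

  return {
    visible: env.GREEDY_SEARCH_VISIBLE === "1", // unset means headless
    idleTimeoutMs: idleMinutes * 60_000,
    locale: env.GREEDY_SEARCH_LOCALE || "en",
    chromePath: env.CHROME_PATH || null, // null stands for "auto-detect"
  };
}

// Usage: pass an explicit object instead of process.env to try it out.
console.log(readGreedySearchEnv({ GREEDY_SEARCH_VISIBLE: "1" }));
```

Taking the environment as a parameter keeps the helper easy to test without mutating `process.env`.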
+## Depth modes
+
+- `fast` - quickest, no synthesis/source fetching
+- `standard` - balanced default for `engine: "all"` (synthesis + fetched sources)
+- `deep` - strongest grounding and confidence metadata
+
+## Runtime commands
+
+````bash
+# Headless (default, no GUI)
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --status
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch.mjs --kill
+
+# Visible (show browser window — useful for one-time Cloudflare clearance)
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs --kill
+
+# Chrome auto-cleaned after 5 min idle (prevents OOM)
+# Override: GREEDY_SEARCH_IDLE_TIMEOUT_MINUTES=10
+
+## Requirements
+
+- Chrome
+- Node.js 20.11.0+ (22+ recommended)
+
+## Known engine quirks
+
+### Bing Copilot
+
+Bing Copilot detects headless Chrome and sandboxes all AI responses inside nested iframes (`copilot.microsoft.com` → `copilot.fun` → `blob:`). In this mode the copy button is hidden and the Cloudflare Turnstile challenge blocks content delivery. The clipboard-based extraction cannot work.
+
+**Auto-recovery:** When Bing fails with any extraction error (clipboard, verification, Cloudflare), GreedySearch automatically switches to **visible Chrome**, retries the search, and caches Cloudflare clearance cookies in the Chrome profile. You may need to solve the Cloudflare challenge **once** manually when the visible Chrome window appears. After that, all subsequent headless searches bypass the challenge — the cookies persist in the profile.
+
+If you prefer to skip the auto-recovery delay, launch visible Chrome ahead of time:
+
+```bash
+node ~/.pi/agent/git/GreedySearch-pi/bin/launch-visible.mjs
+````
+
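The auto-recovery behaviour described for Bing Copilot (try headless, then retry once in a visible window on any extraction error) can be sketched as a small wrapper. This is an illustration under stated assumptions, not the package's implementation: `searchWithVisibleFallback` and the `runSearch(query, { headless })` callback are hypothetical names, and the cookie caching mentioned in the README is left to the browser profile and is not modelled here.

```js
// Hypothetical sketch: `runSearch` stands in for whatever performs one search attempt.
// It is assumed to accept (query, { headless }) and to reject on extraction failure.
async function searchWithVisibleFallback(runSearch, query) {
  try {
    // First attempt: headless, the documented default.
    return await runSearch(query, { headless: true });
  } catch (headlessError) {
    try {
      // Any headless failure triggers exactly one retry with a visible window.
      return await runSearch(query, { headless: false });
    } catch (visibleError) {
      // Surface the visible-mode failure and keep the headless error as the cause.
      throw new Error(`Search failed in visible mode: ${String(visibleError)}`, {
        cause: headlessError,
      });
    }
  }
}

// Usage with a stand-in search function that only succeeds when visible.
const fakeSearch = async (query, { headless }) => {
  if (headless) throw new Error("clipboard extraction failed");
  return `result for ${query}`;
};
searchWithVisibleFallback(fakeSearch, "React 19 changes").then(console.log);
```

Limiting the fallback to a single retry keeps a persistent failure from opening visible windows in a loop.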
+## Anti-detection
+
+Headless Chrome auto-injects stealth patches before any page JavaScript runs:
+
+- `navigator.webdriver` hidden, plugins/languages faked, `window.chrome` shimmed
+- WebGL vendor spoofed (Intel Iris), realistic hardware concurrency / memory
+- CDP automation markers deleted, `requestAnimationFrame` kept alive
+- Human-like click simulation with coordinate jitter and variable delays
+
+This bypasses casual bot detection (basic `navigator.webdriver` checks) but does not defeat commercial anti-bot services (DataDome, PerimeterX, Kasada). **Bing Copilot specifically detects headless and sandboxes responses behind Cloudflare Turnstile** — see [Known engine quirks](#known-engine-quirks) for the auto-recovery mechanism.
+
+When using `depth: "standard"` or `depth: "deep"`, source content is fetched and synthesized:
+
+- **Reddit** — Uses Reddit's public `.json` API for posts and comments (no scraping)
+- **GitHub** — Uses GitHub REST API for repos, READMEs, and file trees
+- **General web** — Mozilla Readability extraction with browser fallback for bot-blocked pages
+- **Metadata** — title, author/byline, site name, publish date, language, excerpt
+
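For the Reddit entry in the list above, the public `.json` view of a thread is reached by appending `.json` to the thread's path. A minimal sketch of that URL rewrite follows; `toRedditJsonUrl` is a hypothetical helper name, and the changes to `src/reddit.mjs` are not shown in this diff, so the package may build the URL differently.

```js
// Hypothetical helper: turns a Reddit thread URL into its public .json variant.
// Uses only the standard URL class, which is a global in Node.js.
function toRedditJsonUrl(postUrl) {
  const url = new URL(postUrl);
  if (!url.pathname.endsWith(".json")) {
    // Drop trailing slashes, then append ".json"; the query string is preserved.
    url.pathname = url.pathname.replace(/\/+$/, "") + ".json";
  }
  return url.toString();
}

// Usage: the example thread URL is made up for illustration.
console.log(toRedditJsonUrl("https://www.reddit.com/r/javascript/comments/abc123/some_title/"));
```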
+## Project layout
+
+- `bin/` — runtime CLIs (`search.mjs`, `launch.mjs`, `launch-visible.mjs`, `visible.mjs`, `cdp.mjs`)
+- `extractors/` — engine-specific automation + stealth/consent handling
+- `src/` — search pipeline, chrome management, source fetching, formatting
+- `skills/` — Pi skill metadata
+
+## Testing
+
+Cross-platform test runner (Windows + Unix):
+
+```bash
+npm test # run all tests
+npm run test:quick # skip slow tests
+npm run test:smoke # basic health check
+```
+
+Full bash test suite (Unix only):
+
+```bash
+npm run test:bash # comprehensive tests
+./test.sh parallel # race condition tests
+./test.sh flags # flag/option tests
+```
+
+## Changelog
+
+See `CHANGELOG.md`.
+
+## License
+
+MIT