@apmantza/greedysearch-pi 1.7.6 → 1.7.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +143 -143
- package/bin/search.mjs +23 -8
- package/index.ts +1 -1
- package/package.json +46 -46
- package/skills/greedy-search/skill.md +44 -44
- package/src/fetcher.mjs +48 -0
- package/src/github.mjs +232 -323
package/CHANGELOG.md
CHANGED
@@ -1,143 +1,143 @@

# Changelog

## v1.7.6 (2026-04-11)

### Fixes

- **Close Gemini synthesis tab** — after synthesis completes, the Gemini tab is now closed instead of merely activated, preventing stale tabs from accumulating across searches.

## v1.7.5 (2026-04-10)

### Plugin

- **Claude Code plugin** — added `.claude-plugin/plugin.json` and `marketplace.json` so GreedySearch can be installed directly as a Claude Code plugin via `claude plugin install`.
- **Auto-mirror GH Action** — every push to `GreedySearch-pi/master` automatically syncs to `GreedySearch-claude/main`, keeping the Claude plugin up to date.
- **Tightened `skill.md`** — removed verbose guidance sections; kept parameters, the depth table, and the coding_task reference. -72 lines.

## v1.7.4 (2026-04-10)

### Refactor

- **Shared `waitForCopyButton()`** — consolidated duplicate copy-button polling loops from `bing-copilot`, `gemini`, and `coding-task` into a single `waitForCopyButton(tab, selector, { timeout, onPoll })` in `common.mjs`. Gemini's scroll-to-bottom logic is passed as the `onPoll` callback.
- **Shared `TIMING` constants** — replaced 30+ scattered `setTimeout` magic numbers with named constants (`postNav`, `postNavSlow`, `postClick`, `postType`, `inputPoll`, `copyPoll`, `afterVerify`) in `common.mjs`.
- **`waitForStreamComplete` improvements** — added a `minLength` option and a graceful last-value fallback; `google-ai` now uses the shared implementation instead of its own copy.
- **Removed dead code** — deleted unused `_getOrReuseBlankTab` and `_getOrOpenEngineTab` from `bin/search.mjs`; removed unused `STREAM_POLL_INTERVAL` and `STREAM_STABLE_ROUNDS` from `coding-task`.

### Fixes

- **Synthesis tab regression** — the `getOrOpenEngineTab("gemini")` call during synthesis was broken by the dead-code removal; replaced with `openNewTab()`.

## v1.7.3 (2026-04-10)

### Fixes

- **Force English in Google AI results** — added the `hl=en` query parameter to the Google AI Mode search URL so responses are always returned in English, regardless of the user's IP-based region (fixes #1).

## v1.7.2 (2026-04-08)

### Release

- **Patch release** — version bump and npm package verification for the `bin/` runtime layout (`bin/search.mjs`, `bin/launch.mjs`, `bin/cdp.mjs`, `bin/coding-task.mjs`).

## v1.7.1 (2026-04-08)

### Performance

- **Bounded source-fetch concurrency** — source fetching now uses a small worker pool (default `2`, configurable via `GREEDY_FETCH_CONCURRENCY`) to reduce burstiness while keeping deep research fast.

### Project structure

- **Runtime scripts moved to `bin/`** — `search.mjs`, `launch.mjs`, `cdp.mjs`, and `coding-task.mjs` now live under `bin/` for a cleaner repository root.
- **Path references updated** — extension runtime, tests, extractor shared utilities, and docs now point to `bin/*` paths.

### Packaging & docs

- **Package file list updated** — the npm package now includes `bin/` directly instead of root script entries.
- **README simplified** — rewritten into a shorter, more concise format with quick install, usage, and layout guidance.

## v1.6.5 (2026-04-04)

### Security

- **Private URL blocking** — added validation to block requests to localhost, RFC 1918 private addresses (10.x, 192.168.x), and `.local`/`.internal` domains, preventing accidental exposure of internal services.

### Features

- **GitHub URL rewriting** — GitHub blob URLs (`github.com/owner/repo/blob/...`) are automatically rewritten to `raw.githubusercontent.com` for faster, cleaner raw file access.
- **GitHub repo cloning** — root and tree URLs now trigger `git clone --depth 1` for complete repo access, so the agent can explore files locally instead of parsing rendered HTML. Includes a README preview and a directory tree listing.
- **Head+tail content trimming** — large documents now use smart truncation: 75% from the beginning (introduction) plus 25% from the end (conclusions/examples), joined by a `[...content trimmed...]` marker, instead of simple truncation.
- **Anubis bot detection** — added detection for the new Anubis proof-of-work anti-bot system (`protected by anubis`, `anubis uses a proof-of-work`).

### Fixes

- **Perplexity clipboard retry** — added a single retry with a 2s delay when clipboard extraction fails, improving reliability.

## v1.6.4 (2026-04-02)

### Fixes

- **Gemini scroll-to-bottom** — changed from small random jitter scrolls to actual bottom-of-page scrolls every ~6 seconds while waiting for the copy button, ensuring lazy-loaded content is triggered and the full answer is captured.
- **Restored missing files** — `.mjs` source files (extractors, `search.mjs`, `launch.mjs`, etc.) were incorrectly removed in the v1.6.2 cleanup; now properly tracked again.

## v1.6.3 (2026-04-02)

### Fixes

- **Debug output removed** — cleaned up a stderr passthrough that was causing CDP connection issues in some environments.

## v1.6.2 (2026-04-01)

### Fixes

- **Anti-bot detection evasion** — Gemini synthesis now performs a gentle scroll every ~6 seconds while waiting for the copy button, preventing the button from hanging due to anti-bot "human activity" checks.

## v1.6.1 (2026-03-31)

### Features

- **Single-engine full answers by default** — when using `engine: "perplexity"`, `engine: "bing"`, `engine: "google"`, or `engine: "gemini"`, the full answer is now returned by default instead of a truncated preview. Multi-engine (`engine: "all"`) still uses truncated previews (~300 chars) to save tokens during synthesis. An explicit `fullAnswer: true/false` always overrides.

### Code Quality

- **Major refactoring** — extracted 438 lines from `index.ts` (856 → 418 lines) into modular formatters:
  - `src/formatters/coding.ts` — coding task formatting
  - `src/formatters/results.ts` — search and deep research formatting
  - `src/formatters/sources.ts` — source utilities (URL, label, consensus, formatting)
  - `src/formatters/synthesis.ts` — synthesis rendering
  - `src/utils/helpers.ts` — shared formatting utilities
- **Complexity reduced** — cognitive complexity dropped from 360 to ~60; maintainability index improved from 11.2 to ~40+.
- **Eliminated code duplication** — removed 6 duplicate blocks and consolidated 4+ single-use helper functions.

### Documentation

- Clarified that `greedy_search` is WEB SEARCH ONLY — removed "NOT for codebase search" from the tool description (still present in the skill documentation).

## v1.6.0 (2026-03-29)

### Breaking Changes (Backward Compatible)

- **Merged deep_research into greedy_search** — new `depth` parameter with three levels:
  - `fast`: single engine (~15-30s)
  - `standard`: 3 engines + synthesis (~30-90s, default for `engine: "all"`)
  - `deep`: 3 engines + source fetching + synthesis + confidence (~60-180s)
- **Simpler mental model** — one tool with clear speed/quality tradeoffs instead of separate tools with overlapping flags.
- **Deprecated flags still work** — `--synthesize` maps to `depth: "standard"`; `--deep-research` maps to `depth: "deep"`.
- **deep_research tool aliased** — still works; calls `greedy_search` with `depth: "deep"`.

### Documentation

- Updated the README with the new `depth` parameter and examples.
- Updated the skill documentation (SKILL.md) to reflect the simplified API.

## v1.5.1 (2026-03-29)

- **Fixed npm package** — added `.pi-lens/` and test files to `.npmignore` to reduce package size.

## v1.5.0 (2026-03-29)

### Features

- **Code extraction fixed** — `coding_task` now uses clipboard interception to preserve markdown code blocks (they were being lost via DOM scraping).
- **Chrome targeting hardened** — all tools now consistently target the dedicated GreedySearch Chrome via `CDP_PROFILE_DIR`, preventing fallback to the user's main Chrome session.
- **Shared utilities** — extracted ~220 lines of duplicate code from the extractors into `common.mjs` (CDP wrapper, tab management, clipboard interception).
- **Documentation leaner** — skill documentation reduced 61% (180 → 70 lines) while preserving all decision-making info.

### Notable

- **NO API KEYS** — updated messaging to emphasize that this works via browser automation; no API keys needed.

## v1.4.2 (2026-03-25)

- **Fresh isolated tabs** — each search now always creates a new `about:blank` tab via `Target.createTarget` and refreshes the CDP page cache immediately after, preventing SPA navigation failures and stale DOM state from prior queries.
- **Regex-based citation extraction** — all extractors (Perplexity, Bing, Gemini) now parse sources from clipboard Markdown links (`[title](url)`) instead of DOM selectors that break on UI updates.
- **Relaxed verification detection** — `consent.mjs` now uses broad keyword matching (`includes('verify')`, `includes('human')`) instead of anchored regexes, correctly catching button-text variants like "Verify you are human" across Cloudflare, Microsoft, and generic modals.

## v1.4.1

- **Fixed parallel synthesis** — multiple `greedy_search` calls with `synthesize: true` now run safely in parallel. Each search creates a fresh Gemini tab that is cleaned up after synthesis, preventing tab conflicts and "Uncaught" errors.

## v1.4.0

- **Grounded synthesis** — Gemini now receives a normalized source registry with stable source IDs, agreement summaries, caveats, and cited claims.
- **Real deep research** — top sources are fetched before synthesis, so deep-research answers are grounded in fetched evidence, not just engine summaries.
- **Richer source metadata** — source output now includes canonical URLs, domains, source types, per-engine attribution, and confidence metadata.
- **Cleaner tab lifecycle** — temporary Perplexity, Bing, and Google tabs are closed after each fan-out search, and synthesis finishes on the Gemini tab.
- **Isolated Chrome targeting** — GreedySearch now refuses to fall back to your normal Chrome session, preventing stray remote-debugging prompts.
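The head+tail trimming described in the v1.6.5 entry can be sketched as below. `trimHeadTail` and `TRIM_MARKER` are hypothetical names chosen for illustration, not the package's actual API.

```javascript
// Sketch of head+tail truncation: keep 75% of the character budget from the
// start of the document (introduction) and 25% from the end
// (conclusions/examples), joined by a visible marker.
const TRIM_MARKER = "\n[...content trimmed...]\n";

function trimHeadTail(text, maxChars) {
  if (text.length <= maxChars) return text; // small enough, nothing to trim
  const budget = maxChars - TRIM_MARKER.length;
  const headLen = Math.floor(budget * 0.75); // head: introduction
  const tailLen = budget - headLen;          // tail: conclusions/examples
  return text.slice(0, headLen) + TRIM_MARKER + text.slice(text.length - tailLen);
}
```

The result is always exactly `maxChars` characters for oversized inputs, so downstream token budgeting stays predictable.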
(The added `+` side of this diff renders identically to the removed `-` side above: all 143 lines are marked removed and re-added with no visible textual change.)
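The bounded source-fetch concurrency from v1.7.1 amounts to a small worker pool. A minimal sketch, assuming a hypothetical `fetchAllBounded` helper (the changelog only documents the `GREEDY_FETCH_CONCURRENCY` env var and its default of 2):

```javascript
// Fetch all URLs with at most `concurrency` requests in flight at once.
// Each worker repeatedly claims the next un-fetched index until the queue
// is drained; results keep the input order.
async function fetchAllBounded(
  urls,
  fetchOne,
  concurrency = Number(process.env.GREEDY_FETCH_CONCURRENCY) || 2,
) {
  const results = new Array(urls.length);
  let next = 0; // next un-claimed index (safe: claims are synchronous)

  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await fetchOne(urls[i]).catch((err) => ({ error: String(err) }));
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(concurrency, urls.length) }, worker),
  );
  return results;
}
```

Per-URL failures are captured as `{ error }` entries instead of rejecting the whole batch, which matches the "reduce burstiness while keeping deep-research fast" intent.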
package/bin/search.mjs
CHANGED

@@ -501,6 +501,10 @@ function mergeFetchDataIntoSources(sources, fetchedSources) {
         finalUrl: fetched.finalUrl || fetched.url || source.canonicalUrl,
         contentType: fetched.contentType || "",
         lastModified: fetched.lastModified || "",
+        publishedTime: fetched.publishedTime || "",
+        byline: fetched.byline || "",
+        siteName: fetched.siteName || "",
+        lang: fetched.lang || "",
         title: fetched.title || "",
         snippet: fetched.snippet || "",
         contentChars: fetched.contentChars || 0,

@@ -634,12 +638,15 @@ function buildSynthesisPrompt(
         engineCount: source.engineCount,
         perEngine: source.perEngine,
         fetch:
-          …
+          source.fetch?.attempted
             ? {
                 ok: source.fetch.ok,
                 status: source.fetch.status,
-                …
-                …
+                publishedTime: source.fetch.publishedTime || "",
+                lastModified: source.fetch.lastModified || "",
+                byline: source.fetch.byline || "",
+                siteName: source.fetch.siteName || "",
+                ...(grounded ? { snippet: trimText(source.fetch.snippet || "", 700) } : {}),
               }
             : undefined,
       }));

@@ -650,6 +657,7 @@
           ? "Use the fetched source snippets as the strongest evidence. Use engine answers for perspective and conflict detection."
           : "Use the engine answers for perspective. Use the source registry for provenance and citations.",
         "Prefer official docs, release notes, repositories, and maintainer-authored sources when available.",
+        "When publishedTime or lastModified is available, flag sources older than 2 years as potentially stale in caveats.",
         "If the engines disagree, say so explicitly.",
         "Do not invent sources. Only reference source IDs from the source registry.",
         "Return valid JSON only. No markdown fences, no prose outside the JSON object.",

@@ -907,15 +915,14 @@ async function fetchSourceContent(url, maxChars = 8000) {
         snippet: content.slice(0, 320),
         content,
         contentChars: content.length,
-        source: "github-…
-        localPath: ghResult.localPath,
+        source: "github-api",
         ...(ghResult.tree && { tree: ghResult.tree }),
         duration: Date.now() - start,
       };
     }
     // If GitHub clone failed, fall through to HTTP (which will use raw for blobs)
     process.stderr.write(
-      `[greedysearch] GitHub …
+      `[greedysearch] GitHub API fetch failed, trying HTTP: ${ghResult.error}\n`,
     );
   }
 }

@@ -930,7 +937,11 @@
       finalUrl: httpResult.finalUrl,
       status: httpResult.status,
       contentType: "text/markdown",
-      lastModified: "",
+      lastModified: httpResult.lastModified || "",
+      publishedTime: httpResult.publishedTime || "",
+      byline: httpResult.byline || "",
+      siteName: httpResult.siteName || "",
+      lang: httpResult.lang || "",
       title: httpResult.title,
       snippet: httpResult.excerpt,
       content,

@@ -1366,10 +1377,13 @@ async function main() {
     depth = "fast";
   }

-  // --deep-research
+  // --deep-research / --deep flags map to deep mode (backward compat)
   if (args.includes("--deep-research")) {
     depth = "standard";
   }
+  if (args.includes("--deep")) {
+    depth = "deep";
+  }

   // For "all" engine with no explicit flags, standard is already default

@@ -1387,6 +1401,7 @@
       a !== "--fetch-top-source" &&
       a !== "--synthesize" &&
       a !== "--deep-research" &&
+      a !== "--deep" &&
       a !== "--inline" &&
       a !== "--depth" &&
       a !== "--out" &&
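The synthesis-prompt rule added above asks the model to flag sources older than two years as potentially stale. An equivalent programmatic check might look like the sketch below; `isPotentiallyStale` is a hypothetical helper, and it assumes `publishedTime`/`lastModified` are parseable date strings (e.g. ISO 8601).

```javascript
// Hypothetical helper mirroring the prompt rule: a source is potentially
// stale when its best available date is more than two years in the past.
// Prefers publishedTime, falls back to lastModified.
function isPotentiallyStale(meta, now = new Date()) {
  const raw = meta.publishedTime || meta.lastModified;
  if (!raw) return false; // no date metadata: cannot judge staleness
  const date = new Date(raw);
  if (Number.isNaN(date.getTime())) return false; // unparseable date
  const twoYearsMs = 2 * 365.25 * 24 * 60 * 60 * 1000;
  return now - date > twoYearsMs;
}
```

Returning `false` for missing or unparseable dates keeps the check conservative: only sources with known-old dates get a staleness caveat.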
package/index.ts
CHANGED

@@ -163,7 +163,7 @@ export default function greedySearchExtension(pi: ExtensionAPI) {
     // For multi-engine, default to truncated to save tokens during synthesis
     const fullAnswer = fullAnswerParam ?? (engine !== "all");
     if (fullAnswer) flags.push("--full");
-    if (depth === "deep") flags.push("--deep");
+    if (depth === "deep") flags.push("--depth", "deep");
     else if (depth === "standard" && engine === "all") flags.push("--synthesize");

     const completed = new Set<string>();
package/package.json
CHANGED

@@ -1,46 +1,46 @@

The diff rewrites every line of the file, but the only textual change is the version bump (`"version": "1.7.6"` → `"version": "1.7.7"`). New file content:

{
  "name": "@apmantza/greedysearch-pi",
  "version": "1.7.7",
  "description": "Pi extension: multi-engine AI search (Perplexity, Bing Copilot, Google AI) via browser automation -- NO API KEYS needed. Extracts answers with sources, optional Gemini synthesis. Grounded AI answers from real browser interactions.",
  "type": "module",
  "keywords": [
    "pi-package"
  ],
  "repository": {
    "type": "git",
    "url": "git+https://github.com/apmantza/GreedySearch-pi.git"
  },
  "author": "Apostolos Mantzaris",
  "license": "MIT",
  "scripts": {
    "test": "./test.sh",
    "test:quick": "./test.sh quick",
    "test:smoke": "./test.sh smoke"
  },
  "files": [
    "index.ts",
    "bin/",
    "src/",
    "skills/",
    "extractors/",
    "CHANGELOG.md",
    "README.md"
  ],
  "pi": {
    "extensions": [
      "./index.ts"
    ],
    "skills": [
      "./skills"
    ]
  },
  "dependencies": {
    "jsdom": "^24.0.0",
    "@mozilla/readability": "^0.5.0",
    "turndown": "^7.1.2"
  },
  "peerDependencies": {
    "@mariozechner/pi-coding-agent": "*",
    "@sinclair/typebox": "*"
  }
}
package/skills/greedy-search/skill.md
CHANGED

@@ -1,44 +1,44 @@
---
name: greedy-search
description: Live web search via Perplexity, Bing, and Google AI in parallel. Use for library docs, recent framework changes, error messages, dependency selection, or anything where training data may be stale. NOT for codebase search.
---

# GreedySearch — Live Web Search

Runs Perplexity, Bing Copilot, and Google AI in parallel. Gemini synthesizes results.

## greedy_search

```
greedy_search({ query: "React 19 changes", depth: "standard" })
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | required | Search question |
| `engine` | string | `"all"` | `all`, `perplexity`, `bing`, `google`, `gemini` |
| `depth` | string | `"standard"` | `fast`, `standard`, `deep` |
| `fullAnswer` | boolean | `false` | Full answer vs ~300 char summary |

| Depth | Engines | Synthesis | Source Fetch | Time |
|-------|---------|-----------|--------------|------|
| `fast` | 1 | — | — | 15-30s |
| `standard` | 3 | Gemini | — | 30-90s |
| `deep` | 3 | Gemini | top 5 | 60-180s |

**When engines agree** → high confidence. **When they diverge** → note both perspectives.

## coding_task

Second opinion from Gemini/Copilot on hard problems.

```
coding_task({ task: "debug race condition", mode: "debug", engine: "gemini" })
```

| Parameter | Type | Default | Options |
|-----------|------|---------|---------|
| `task` | string | required | — |
| `engine` | string | `"gemini"` | `gemini`, `copilot`, `all` |
| `mode` | string | `"code"` | `debug`, `plan`, `review`, `test`, `code` |
| `context` | string | — | Code snippet |
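The parameters in the tables above compose freely. As a hedged illustration, the stub below stands in for the real `greedy_search` tool (which only runs inside the pi agent) so the argument shape and the defaults from the parameter table can be exercised on their own:

```javascript
// Stand-in for the real greedy_search tool; defaults mirror the parameter
// table above (engine "all", depth "standard", fullAnswer false).
function greedy_search({ query, engine = "all", depth = "standard", fullAnswer = false }) {
  if (!query) throw new Error("query is required");
  return { query, engine, depth, fullAnswer };
}

// A deep search returning the full synthesized answer rather than a summary.
const call = greedy_search({ query: "React 19 changes", depth: "deep", fullAnswer: true });
console.log(call.engine, call.depth, call.fullAnswer); // all deep true
```

Per the depth table, such a call fans out to all three engines, has Gemini synthesize, and fetches the top 5 sources.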
package/src/fetcher.mjs
CHANGED

@@ -178,6 +178,7 @@ export async function fetchSourceHttp(url, options = {}) {

   const contentType = response.headers.get("content-type") || "";
   const finalUrl = response.url;
+  const lastModified = response.headers.get("last-modified") || "";

   // Handle raw text/plain from GitHub (raw file content)
   if (
@@ -191,6 +192,11 @@ export async function fetchSourceHttp(url, options = {}) {
       finalUrl,
       status: response.status,
       title: finalUrl.split("/").pop() || "GitHub File",
+      byline: "",
+      siteName: "GitHub",
+      lang: "",
+      publishedTime: lastModified,
+      lastModified,
       markdown: text,
       contentLength: text.length,
       excerpt: text.slice(0, 300).replace(/\n/g, " "),
@@ -250,6 +256,11 @@ export async function fetchSourceHttp(url, options = {}) {
     finalUrl,
     status: response.status,
     title: extracted.title,
+    byline: extracted.byline,
+    siteName: extracted.siteName,
+    lang: extracted.lang,
+    publishedTime: extracted.publishedTime || lastModified,
+    lastModified,
     markdown: extracted.markdown,
     excerpt: extracted.excerpt,
     contentLength: extracted.markdown.length,
@@ -437,6 +448,29 @@ function isNetworkErrorRetryableWithBrowser(error) {
   );
 }

+/**
+ * Extract a date string from <meta> tags (Open Graph, schema.org, standard)
+ * Returns ISO string or empty string.
+ */
+function extractMetaDate(document) {
+  const selectors = [
+    'meta[property="article:published_time"]',
+    'meta[name="article:published_time"]',
+    'meta[property="og:published_time"]',
+    'meta[name="publication_date"]',
+    'meta[name="date"]',
+    'meta[itemprop="datePublished"]',
+    'time[itemprop="datePublished"]',
+    'meta[name="DC.date"]',
+  ];
+  for (const sel of selectors) {
+    const el = document.querySelector(sel);
+    const val = el?.getAttribute("content") || el?.getAttribute("datetime") || "";
+    if (val) return val;
+  }
+  return "";
+}
+
 /**
  * Extract readable content using Mozilla Readability + Turndown
  */
@@ -452,8 +486,14 @@ function extractContent(html, url) {
   const markdown = turndown.turndown(article.content);
   const cleanMarkdown = markdown.replace(/\n{3,}/g, "\n\n").trim();

+  const publishedTime = article.publishedTime || extractMetaDate(document) || "";
+
   return {
     title: article.title || document.title || url,
+    byline: article.byline || "",
+    siteName: article.siteName || "",
+    lang: article.lang || "",
+    publishedTime,
     markdown: cleanMarkdown,
     excerpt: cleanMarkdown.slice(0, 300).replace(/\n/g, " "),
   };
@@ -472,6 +512,10 @@ function extractContent(html, url) {

   return {
     title: document.title || url,
+    byline: "",
+    siteName: "",
+    lang: "",
+    publishedTime: extractMetaDate(document),
     markdown: cleanText,
     excerpt: cleanText.slice(0, 300),
   };
@@ -480,6 +524,10 @@ function extractContent(html, url) {
   // Last resort
   return {
     title: url,
+    byline: "",
+    siteName: "",
+    lang: "",
+    publishedTime: "",
     markdown: "",
     excerpt: "",
   };
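The `extractMetaDate` helper added in this file is a priority scan over CSS selectors: the first selector that yields a `content` or `datetime` attribute wins. A minimal sketch of that scan, run against a hand-rolled stand-in for `document` (the real code receives a jsdom `Document`; `fakeDocument` and `firstMetaDate` here are illustrative names, not the package's API):

```javascript
// Selector priority mirrors the diff above: article/Open Graph meta first,
// generic meta dates later. The stand-in resolves only one selector.
const selectors = [
  'meta[property="article:published_time"]',
  'meta[name="date"]',
  'time[itemprop="datePublished"]',
];

const fakeDocument = {
  querySelector(sel) {
    if (sel === 'meta[name="date"]') {
      return { getAttribute: (attr) => (attr === "content" ? "2026-04-11" : null) };
    }
    return null; // every other selector misses
  },
};

function firstMetaDate(document) {
  for (const sel of selectors) {
    const el = document.querySelector(sel);
    const val = el?.getAttribute("content") || el?.getAttribute("datetime") || "";
    if (val) return val;
  }
  return "";
}

console.log(firstMetaDate(fakeDocument)); // 2026-04-11
```

Because the scan short-circuits on the first hit, ordering the selectors from most specific to most generic is what makes `article:published_time` beat a bare `meta[name="date"]` when both are present.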
package/src/github.mjs
CHANGED

@@ -1,323 +1,232 @@
-// src/github.mjs - GitHub
-
-    return { owner, repo, type
-}
- *
-}
-}
-      const
-          walk(join(dir, entry.name), entryRelPath);
-        } else if (entry.isFile()) {
-          const stats = statSync(join(dir, entry.name));
-          results.push({ path: entryRelPath, type: "file", size: stats.size });
-        }
-      }
-    } catch {
-      // Ignore permission errors
-    }
-  }
-
-  walk(targetPath, subPath);
-  return results;
-}
-
-/**
- * Fetch GitHub content by cloning repo
- * @param {string} url - GitHub URL (blob, tree, or root)
- * @returns {Promise<{ok: boolean, content?: string, title?: string, error?: string, localPath?: string, tree?: Array}>}
- */
-export async function fetchGitHubContent(url) {
-  const parsed = parseGitHubUrl(url);
-  if (!parsed) {
-    return { ok: false, error: "Not a valid GitHub URL" };
-  }
-
-  const { owner, repo, type, ref, path } = parsed;
-
-  // Clone repo
-  const cloneResult = await cloneGitHubRepo(owner, repo, ref);
-  if (cloneResult.error) {
-    return { ok: false, error: `Clone failed: ${cloneResult.error}` };
-  }
-
-  const repoPath = cloneResult.path;
-
-  // Handle different URL types
-  if (type === "root" || (type === "tree" && !path)) {
-    // Return README + tree
-    const tree = getRepoTree(repoPath, "", 50);
-
-    // Try to find README
-    const readmeNames = ["README.md", "Readme.md", "readme.md", "README.MD"];
-    let readmeContent = "";
-    for (const name of readmeNames) {
-      const readme = readRepoFile(repoPath, name);
-      if (readme) {
-        readmeContent = readme.content.slice(0, 5000); // First 5KB of README
-        break;
-      }
-    }
-
-    return {
-      ok: true,
-      title: `${owner}/${repo}`,
-      content: readmeContent || `[Repository: ${owner}/${repo}]`,
-      localPath: repoPath,
-      tree: tree.slice(0, 30),
-    };
-  }
-
-  if (type === "blob" && path) {
-    // Return specific file
-    const file = readRepoFile(repoPath, path);
-    if (!file) {
-      return { ok: false, error: `File not found: ${path}` };
-    }
-
-    return {
-      ok: true,
-      title: `${owner}/${repo}: ${path}`,
-      content: file.content,
-      localPath: join(repoPath, path),
-    };
-  }
-
-  if (type === "tree" && path) {
-    // Return directory listing
-    const tree = getRepoTree(repoPath, path, 50);
-
-    return {
-      ok: true,
-      title: `${owner}/${repo}/${path}`,
-      content: `[Directory: ${path}]\n\nFiles:\n${tree.map((t) => ` ${t.type === "dir" ? "📁" : "📄"} ${t.path}`).join("\n")}`,
-      localPath: join(repoPath, path),
-      tree,
-    };
-  }
-
-  return { ok: false, error: "Unsupported GitHub URL type" };
-}
+// src/github.mjs - GitHub content fetching via REST API
+
+const GITHUB_API = "https://api.github.com";
+const DEFAULT_HEADERS = {
+  "user-agent": "GreedySearch/1.0",
+  accept: "application/vnd.github+json",
+  "x-github-api-version": "2022-11-28",
+};
+
+/**
+ * Parse a GitHub URL into components
+ * @param {string} url
+ * @returns {{owner: string, repo: string, type: 'blob'|'tree'|'root', ref?: string, path?: string} | null}
+ */
+export function parseGitHubUrl(url) {
+  try {
+    const parsed = new URL(url);
+    if (!parsed.hostname.endsWith("github.com")) {
+      return null;
+    }
+
+    const parts = parsed.pathname.split("/").filter(Boolean);
+    if (parts.length < 2) {
+      return null;
+    }
+
+    const [owner, repo] = parts;
+
+    // Root: github.com/owner/repo
+    if (parts.length === 2) {
+      return { owner, repo, type: "root" };
+    }
+
+    // With type: github.com/owner/repo/blob|tree/ref/path
+    if (parts.length >= 4 && (parts[2] === "blob" || parts[2] === "tree")) {
+      const type = parts[2];
+      const ref = parts[3];
+      const path = parts.slice(4).join("/");
+      return { owner, repo, type, ref, path };
+    }
+
+    return null;
+  } catch {
+    return null;
+  }
+}
+
+/**
+ * Fetch JSON from GitHub API with timeout
+ */
+async function apiGet(path, timeoutMs = 10000) {
+  const controller = new AbortController();
+  const tid = setTimeout(() => controller.abort(), timeoutMs);
+  try {
+    const res = await fetch(`${GITHUB_API}${path}`, {
+      headers: DEFAULT_HEADERS,
+      signal: controller.signal,
+    });
+    clearTimeout(tid);
+    if (!res.ok) {
+      throw new Error(`GitHub API ${res.status}: ${path}`);
+    }
+    return await res.json();
+  } catch (err) {
+    clearTimeout(tid);
+    throw err;
+  }
+}
+
+/**
+ * Fetch the default branch README as plain text
+ */
+async function fetchReadme(owner, repo) {
+  try {
+    const data = await apiGet(`/repos/${owner}/${repo}/readme`);
+    if (data.content && data.encoding === "base64") {
+      return Buffer.from(data.content, "base64").toString("utf8");
+    }
+    return "";
+  } catch {
+    return "";
+  }
+}
+
+/**
+ * Fetch top-level file tree (non-recursive)
+ */
+async function fetchTree(owner, repo, ref = "HEAD", subPath = "") {
+  try {
+    // Resolve ref to a tree SHA first when using HEAD or a branch name
+    const refData = await apiGet(`/repos/${owner}/${repo}/git/ref/heads/${ref === "HEAD" ? "main" : ref}`).catch(() =>
+      apiGet(`/repos/${owner}/${repo}/git/ref/heads/master`).catch(() => null)
+    );
+
+    let treeSha;
+    if (refData?.object?.sha) {
+      // Get commit to get tree SHA
+      const commit = await apiGet(`/repos/${owner}/${repo}/git/commits/${refData.object.sha}`);
+      treeSha = commit.tree.sha;
+    } else {
+      // Fall back to repo default branch info
+      const repoInfo = await apiGet(`/repos/${owner}/${repo}`);
+      const branch = await apiGet(`/repos/${owner}/${repo}/branches/${repoInfo.default_branch}`);
+      treeSha = branch.commit.commit.tree.sha;
+    }
+
+    const treeData = await apiGet(`/repos/${owner}/${repo}/git/trees/${treeSha}`);
+    let items = treeData.tree || [];
+
+    // Filter to subPath if requested
+    if (subPath) {
+      items = items.filter((item) => item.path.startsWith(subPath));
+    }
+
+    return items.slice(0, 50).map((item) => ({
+      path: item.path,
+      type: item.type === "tree" ? "dir" : "file",
+      size: item.size,
+    }));
+  } catch {
+    return [];
+  }
+}
+
+/**
+ * Fetch a specific file via raw.githubusercontent.com
+ */
+async function fetchRawFile(owner, repo, ref, filePath, timeoutMs = 10000) {
+  const ref_ = ref && ref !== "HEAD" ? ref : "main";
+  const urls = [
+    `https://raw.githubusercontent.com/${owner}/${repo}/${ref_}/${filePath}`,
+    `https://raw.githubusercontent.com/${owner}/${repo}/master/${filePath}`,
+  ];
+
+  for (const url of urls) {
+    const controller = new AbortController();
+    const tid = setTimeout(() => controller.abort(), timeoutMs);
+    try {
+      const res = await fetch(url, {
+        headers: { "user-agent": DEFAULT_HEADERS["user-agent"] },
+        signal: controller.signal,
+      });
+      clearTimeout(tid);
+      if (res.ok) {
+        return await res.text();
+      }
+    } catch {
+      clearTimeout(tid);
+    }
+  }
+  return null;
+}
+
+/**
+ * Fetch GitHub content via API
+ * @param {string} url - GitHub URL (blob, tree, or root)
+ * @returns {Promise<{ok: boolean, content?: string, title?: string, error?: string, tree?: Array}>}
+ */
+export async function fetchGitHubContent(url) {
+  const parsed = parseGitHubUrl(url);
+  if (!parsed) {
+    return { ok: false, error: "Not a valid GitHub URL" };
+  }
+
+  const { owner, repo, type, ref, path } = parsed;
+
+  try {
+    if (type === "root" || (type === "tree" && !path)) {
+      // Fetch repo info + README + top-level tree in parallel
+      const [repoInfo, readme, tree] = await Promise.allSettled([
+        apiGet(`/repos/${owner}/${repo}`),
+        fetchReadme(owner, repo),
+        fetchTree(owner, repo, ref || "HEAD"),
+      ]);
+
+      const info = repoInfo.status === "fulfilled" ? repoInfo.value : null;
+      const readmeText = readme.status === "fulfilled" ? readme.value : "";
+      const treeItems = tree.status === "fulfilled" ? tree.value : [];
+
+      const description = info?.description ? `\n\n> ${info.description}` : "";
+      const stars = info?.stargazers_count != null ? ` ⭐ ${info.stargazers_count}` : "";
+      const language = info?.language ? ` · ${info.language}` : "";
+
+      let content = `# ${owner}/${repo}${stars}${language}${description}\n\n`;
+
+      if (readmeText) {
+        content += readmeText.slice(0, 6000);
+      } else {
+        content += `[No README found]\n\nFiles:\n${treeItems.map((t) => ` ${t.type === "dir" ? "📁" : "📄"} ${t.path}`).join("\n")}`;
+      }
+
+      return {
+        ok: true,
+        title: `${owner}/${repo}`,
+        content,
+        tree: treeItems.slice(0, 30),
+      };
+    }
+
+    if (type === "blob" && path) {
+      // Fetch specific file via raw URL
+      const content = await fetchRawFile(owner, repo, ref, path);
+      if (content === null) {
+        return { ok: false, error: `File not found: ${path}` };
+      }
+      return {
+        ok: true,
+        title: `${owner}/${repo}: ${path}`,
+        content,
+      };
+    }
+
+    if (type === "tree" && path) {
+      // Directory listing via API tree
+      const treeItems = await fetchTree(owner, repo, ref || "HEAD", path);
+      const listing = treeItems
+        .map((t) => ` ${t.type === "dir" ? "📁" : "📄"} ${t.path}`)
+        .join("\n");
+
+      return {
+        ok: true,
+        title: `${owner}/${repo}/${path}`,
+        content: `[Directory: ${path}]\n\nFiles:\n${listing}`,
+        tree: treeItems,
+      };
+    }
+
+    return { ok: false, error: "Unsupported GitHub URL type" };
+  } catch (err) {
+    return { ok: false, error: err.message };
+  }
+}
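The URL parser in the new github.mjs is pure (no network, only the built-in `URL`), so it is easy to exercise in isolation. Copied out of the diff above, it classifies root, blob, and tree URLs:

```javascript
// Copied from the new src/github.mjs: classify a GitHub URL into
// { owner, repo, type, ref?, path? } or null for anything else.
function parseGitHubUrl(url) {
  try {
    const parsed = new URL(url);
    if (!parsed.hostname.endsWith("github.com")) return null;
    const parts = parsed.pathname.split("/").filter(Boolean);
    if (parts.length < 2) return null;
    const [owner, repo] = parts;
    if (parts.length === 2) return { owner, repo, type: "root" };
    if (parts.length >= 4 && (parts[2] === "blob" || parts[2] === "tree")) {
      return { owner, repo, type: parts[2], ref: parts[3], path: parts.slice(4).join("/") };
    }
    return null;
  } catch {
    return null;
  }
}

console.log(parseGitHubUrl("https://github.com/apmantza/GreedySearch-pi").type);
// root
console.log(parseGitHubUrl("https://github.com/apmantza/GreedySearch-pi/blob/master/src/github.mjs").path);
// src/github.mjs
```

Note that the `ref` segment here is taken positionally, so branch names containing `/` (e.g. `feature/x`) would be split across `ref` and `path`; the raw-URL fallback in `fetchRawFile` partially papers over that by retrying against `master`.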