@gkoreli/ghx 0.1.3 → 0.2.0

Files changed (4)
  1. package/README.md +46 -26
  2. package/SKILL.md +305 -0
  3. package/ghx +160 -7
  4. package/package.json +3 -3
package/README.md CHANGED
@@ -2,37 +2,47 @@
2
2
 
3
3
  GitHub code exploration for agents and humans. One command does what takes 3-5 API calls with any other tool.
4
4
 
5
+ - **Repos** — search repos with README preview in 1 GraphQL call
5
6
  - **Explore** a repo (tree + README) in 1 API call
6
7
  - **Read** 1-10 files in 1 API call via GraphQL batching
7
8
  - **Map** code structure with ~92% token reduction
8
- - **Search** code with full GitHub search syntax
9
+ - **Search** code with AND matching, matching context, token protection
9
10
 
10
- ~135 lines of bash. One dependency: `gh` CLI.
11
+ Bash script. One dependency: `gh` CLI. Cross-platform (macOS, Linux, Windows via Git Bash/WSL).
11
12
 
12
13
  ## Install
13
14
 
14
15
  ```bash
15
- # Homebrew (macOS/Linux)
16
- brew install gkoreli/tap/ghx
17
-
18
- # gh extension (coming soon — requires separate gh-ghx repo)
19
- gh extension install gkoreli/gh-ghx
16
+ # npm (recommended)
17
+ npm install -g @gkoreli/ghx
20
18
 
21
19
  # npx (zero install)
22
20
  npx @gkoreli/ghx --help
23
21
 
24
22
  # curl
25
23
  curl -sf https://raw.githubusercontent.com/gkoreli/ghx/main/install.sh | sh
26
-
27
- # Manual — just copy the script
28
- curl -sf https://raw.githubusercontent.com/gkoreli/ghx/main/ghx -o ~/.local/bin/ghx && chmod +x ~/.local/bin/ghx
29
24
  ```
30
25
 
31
26
  Requires [`gh` CLI](https://cli.github.com/) authenticated (`gh auth login`).
32
27
 
28
+ [![npm](https://img.shields.io/npm/v/@gkoreli/ghx)](https://www.npmjs.com/package/@gkoreli/ghx)
29
+
30
+ ### Platform Support
31
+
32
+ | Platform | Status | Notes |
33
+ |----------|--------|-------|
34
+ | macOS | ✅ Native | bash + readlink -f (12.3+) |
35
+ | Linux | ✅ Native | bash + GNU coreutils |
36
+ | Windows | ✅ Git Bash / WSL | Ships with Git for Windows. Raw cmd.exe/PowerShell not supported |
37
+
38
+ If you have `gh` CLI working, ghx works too — same prerequisites.
39
+
33
40
  ## Usage
34
41
 
35
42
  ```bash
43
+ # Search repos — name, stars, language, README preview in 1 call
44
+ ghx repos "react state management"
45
+
36
46
  # Explore a repo — branch, file tree, and README in 1 API call
37
47
  ghx explore plausible/analytics
38
48
 
@@ -48,7 +58,7 @@ ghx read plausible/analytics --grep "defmodule" lib/plausible/stats/query.ex
48
58
  # Read specific line range
49
59
  ghx read plausible/analytics --lines 42-80 lib/plausible/stats/query.ex
50
60
 
51
- # Search code (full GitHub search syntax)
61
+ # Search code (AND matching, shows matching lines, token-protected)
52
62
  ghx search "useState repo:facebook/react"
53
63
  ghx search "path:llms.txt extension:txt"
54
64
 
@@ -58,24 +68,36 @@ ghx tree plausible/analytics assets/js
58
68
 
59
69
  ## Why
60
70
 
61
- AI agents exploring GitHub repos face a tooling gap:
71
+ AI agents exploring GitHub face a reliability gap: *"Did I find nothing because nothing exists, or because I used the tool wrong?"* ghx eliminates this with smart defaults — AND matching instead of exact phrase, README previews instead of bare names, matching context instead of bare paths. The right behavior is the default behavior.
62
72
 
63
- | Tool | Files per API call | Context overhead | Dependencies |
64
- |------|-------------------|-----------------|-------------|
65
- | GitHub MCP | 1 | ~10K tokens (50+ tool schemas) | Go binary |
66
- | Octocode MCP | 1 (parallel) | ~10K tokens | npm + Docker |
67
- | Raw `gh` CLI | 1 | 0 | `gh` |
68
- | Gitingest | N (clones first) | 0 | pip + tiktoken |
69
- | **ghx** | **1-10 (GraphQL batch)** | **0** | **`gh`** |
73
+ | Tool | Files per call | Matching context | Smart defaults | Dependencies |
74
+ |------|---------------|-----------------|---------------|-------------|
75
+ | GitHub MCP | 1 | No | No (~10K token schemas) | Go binary |
76
+ | `gh` CLI | 1 | No | No (exact phrase, base64, no README) | `gh` |
77
+ | **ghx** | **1-10 (batch)** | **Yes** | **Yes** | **`gh`** |
70
78
 
71
- `ghx` reads 10 files in 1 API call. No other tool does this.
79
+ ## Agent Skill Integration
80
+
81
+ `ghx skill` outputs the full [`SKILL.md`](./SKILL.md) to stdout — designed for eager context injection via spawn hooks:
82
+
83
+ ```json
84
+ {
85
+ "hooks": {
86
+ "agentSpawn": [
87
+ {"command": "ghx skill"}
88
+ ]
89
+ }
90
+ }
91
+ ```
92
+
93
+ This loads the skill eagerly into every session — not on-demand like a typical skill file. Use this for agent identities where GitHub exploration is a core capability (not occasional). The agent always has the latest ghx knowledge (commands, gotchas, search strategy) without needing to load it mid-conversation.
94
+
95
+ SKILL.md is included in the npm package and resolved via symlink, so this works with all installation methods.
72
96
 
73
97
  ## Code Map (`--map`)
74
98
 
75
99
  The `--map` flag extracts only structural declarations — imports, exports, function signatures, class definitions, type declarations. Implementation bodies are stripped.
76
100
 
77
- Tested on 6 real files across TypeScript, Python, and Go:
78
-
79
101
  | File | Full | Map | Reduction |
80
102
  |------|------|-----|-----------|
81
103
  | repomix/parseFile.ts | 5,599 | 812 | 86% |
@@ -84,13 +106,11 @@ Tested on 6 real files across TypeScript, Python, and Go:
84
106
 
85
107
  Average: **92% reduction**. An agent can map 16 files in the space of reading 1 file fully.
86
108
 
87
- Supported: TypeScript/JavaScript, Python, Go, Rust, Java/Kotlin, Ruby. Falls back to generic pattern for unknown extensions.
109
+ Supported: TypeScript/JavaScript, Python, Go, Rust, Java/Kotlin, Ruby. Generic fallback for unknown extensions.
88
110
 
89
111
  ## How It Works
90
112
 
91
- `ghx` wraps `gh` CLI with GraphQL batching. The `explore` command fetches tree + README in 1 GraphQL call. The `read` command uses GraphQL aliases (`f0:`, `f1:`, ...) to fetch up to 10 files in 1 call. The `search` command hits the REST `/search/code` endpoint directly (GraphQL has no code search).
92
-
93
- The `--map` flag applies per-language regex patterns to extract structural declarations from the fetched content. No Tree-sitter, no AST parsing — just regex on the first line of each declaration. This works because structural declarations in most languages start at the beginning of a line with a keyword (`function`, `class`, `def`, `func`, `export`, `import`, etc.).
113
+ `ghx` wraps `gh` CLI with GraphQL batching. `repos` and `explore` use GraphQL to batch search + metadata + README into 1 call. `read` uses GraphQL aliases to fetch up to 10 files in 1 call. `search` hits the REST `/search/code` endpoint with `text_matches` for matching context and 200-char token protection.
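The alias batching mentioned above can be sketched as a small query builder. This is an illustration only: the `read` implementation is not part of this diff, and the function name `build_read_query` and its explicit `branch` argument are assumptions. It emits one aliased `object(expression: ...)` field per file, the same field shape the `repos` command uses for `HEAD:README.md`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not ghx's actual code): build one GraphQL document that
# fetches several files by giving each object(...) field its own alias.
# Assumes paths contain no double quotes or backslashes.
build_read_query() {
  local owner="$1" repo="$2" branch="$3"
  shift 3
  local i=0 fields="" path
  for path in "$@"; do
    # Aliases (f0, f1, ...) let the same field appear several times in one request.
    fields+="    f${i}: object(expression: \"${branch}:${path}\") { ... on Blob { text } }"$'\n'
    i=$((i + 1))
  done
  printf 'query {\n  repository(owner: "%s", name: "%s") {\n%s  }\n}\n' \
    "$owner" "$repo" "$fields"
}

# Print the document for two files on a branch assumed to be called "main".
build_read_query owner repo main README.md src/index.ts
```

The printed document could then be sent with `gh api graphql -f query="$(build_read_query ...)"`, so N files cost one request instead of N.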
94
114
 
95
115
  ## License
96
116
 
package/SKILL.md ADDED
@@ -0,0 +1,305 @@
1
+ ---
2
+ name: ghx
3
+ description: GitHub code exploration for AI agents. Use for repo exploration, reading remote files, code search, code maps. Wraps gh CLI with GraphQL batching — one command does what takes 3-5 API calls.
4
+ ---
5
+
6
+ # ghx — GitHub Code Exploration for AI Agents
7
+
8
+ Use `ghx` via `execute_bash` for anything on GitHub — repos, files, code search. Authenticated via `gh` CLI, structured output, zero context overhead.
9
+
10
+ ## Why This Exists
11
+
12
+ Agents exploring GitHub face a reliability gap: *"Did I find nothing because nothing exists, or because I used the tool wrong?"* Raw `gh` commands have silent failure modes — `gh search code` wraps in quotes without telling you, `gh api contents/` returns base64, README requires a separate call. The agent can't distinguish "no results" from "wrong flags."
13
+
14
+ ghx eliminates this by encoding the right defaults into every command. One call returns enough context to decide the next action. You opt into the ghx skill and stop worrying about whether you searched correctly — the right behavior is the default behavior.
15
+
16
+ ## Commands
17
+
18
+ ```bash
19
+ ghx explore <owner/repo> # Branch + tree + README in 1 API call
20
+ ghx explore <owner/repo> <path> # Subdirectory listing
21
+ ghx read <owner/repo> <f1> [f2] [f3] # Read 1-10 files in 1 API call (GraphQL batching)
22
+ ghx read <owner/repo> --map <f1> [f2] # Structural map: signatures, imports, types (~92% token reduction)
23
+ ghx read <owner/repo> --grep "pat" <f> # Read file, show only matching lines (2 lines context)
24
+ ghx read <owner/repo> --lines 42-80 <f> # Read specific line range
25
+ ghx repos "<query>" # Search repos with README preview in 1 GraphQL call
26
+ ghx search "<query>" # Code search (REST API, AND matching, shows matching lines)
27
+ ghx search --full "<query>" # Code search without line truncation (for minified files)
28
+ ghx tree <owner/repo> [path] # Full recursive tree listing
29
+ ```
30
+
31
+ ## Chain of Thought: Progressive Disclosure
32
+
33
+ **Always start surgical, escalate only when needed.** This mirrors how developers work: scan structure → identify interesting files → read specific sections.
34
+
35
+ ```
36
+ 1. ghx explore owner/repo → What's in this repo? (structure + README)
37
+ 2. ghx read owner/repo --map *.ts → What do these files define? (signatures only, 92% fewer tokens)
38
+ 3. ghx read owner/repo --grep "X" f → Where exactly is X in this file? (targeted lines)
39
+ 4. ghx read owner/repo f → Show me the full file (only when needed)
40
+ ```
41
+
42
+ **Why this order matters:** At ~92% reduction a map costs roughly 8% of the full file, so about 12 mapped files fit in the space of one full read. The agent can understand an entire module's structure before committing context to any single file. Aider's docs confirm: *"The LLM can see classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks."*
43
+
44
+ **When to escalate beyond ghx:**
45
+ - "Understand this entire module" → `gitingest https://github.com/owner/repo/tree/branch/path -i "*.ts" -o - 2>/dev/null`
46
+ - "Compressed view of a codebase" → `npx repomix --remote owner/repo --compress --include "src/**" --stdout`
47
+
48
+ ## Search Query Syntax
49
+
50
+ `ghx search` uses the legacy GitHub REST code search API. Multi-word queries use AND matching: every term must appear in the file, but the terms need not be adjacent. This differs from `gh search code`, which silently wraps the query in quotes (exact phrase).
51
+
52
+ **Output format:**
53
+ ```
54
+ 201472 results (showing 30) ← stderr (total + page count)
55
+ jquery/jquery src/attributes/classes.js: addClass: function( value ) { ← stdout (repo path: matching line)
56
+ ```
57
+
58
+ Agents get: result count (stderr) + one line per result with matching context (stdout).
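An agent that wants structured fields from the stdout format above can split each line itself. The sketch below assumes the `repo path: matching line` layout shown in the sample; `parse_result` is a hypothetical helper, and paths that contain spaces or `: ` would be mis-split.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split one ghx search stdout line into
# repo, path, and matching context, printed tab-separated.
parse_result() {
  local line="$1"
  local repo="${line%% *}"   # text before the first space
  local rest="${line#* }"    # everything after the first space
  local path context=""
  if [[ "$rest" == *": "* ]]; then
    path="${rest%%: *}"      # up to the first ": "
    context="${rest#*: }"    # after the first ": "
  else
    path="$rest"             # result had no matching context
  fi
  printf '%s\t%s\t%s\n' "$repo" "$path" "$context"
}

parse_result 'jquery/jquery src/attributes/classes.js: addClass: function( value ) {'
```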
59
+
60
+ ```bash
61
+ ghx search "addClass repo:jquery/jquery" # Scoped to repo
62
+ ghx search "useState language:typescript" # Language filter
63
+ ghx search "filename:package.json repo:owner/repo" # Find specific filename
64
+ ghx search "form path:cgi-bin extension:py" # Path + extension filter
65
+ ghx search '"progress_bar" repo:plausible/analytics' # Exact phrase (shell quotes around double quotes)
66
+ ghx search "path:llms.txt" # Find files by name
67
+ ```
68
+
69
+ **Valid REST API qualifiers:** `repo:`, `org:`, `user:`, `path:`, `filename:`, `extension:`, `language:`, `in:file`, `in:path`, `size:`, `fork:true`
70
+
71
+ **Web-only (DO NOT USE — silently treated as literal text):** `OR`, `NOT`, `symbol:`, `content:`, `is:`, regex (`/pattern/`), `enterprise:`, glob in `path:`. ghx warns on stderr if you use these.
72
+
73
+ **Rate limit:** 10 req/min for code search, the strictest endpoint (budget 9 to leave headroom). Authentication required: run `gh auth login` first.
74
+
75
+ **Special characters:** Dots act as word separators, not wildcards. `console.log` matches files with both `console` and `log` — it does NOT match `consolelog`.
76
+
77
+ ## Search Strategy for Agents
78
+
79
+ **Search is the entry point.** Agents search first, then read. Bad search = wasted follow-up reads = token explosion. ghx search is designed to give you enough context to decide your next action in one call.
80
+
81
+ ### Reading search output
82
+
83
+ ```
84
+ 90 results (showing 30) ← stderr: is this too broad?
85
+ ⚠ Lines truncated to 200 chars (use --full for complete fragments) ← stderr: token protection kicked in
86
+ ⚠ Query too broad — add repo:, language:, or path: to narrow ← stderr: >1000 results
87
+ jquery/jquery src/attributes/classes.js: addClass: function( value ) { ← stdout: repo path: matching line
88
+ ```
89
+
90
+ **Decision tree after seeing results:**
91
+ - `0 results` → query too specific, broaden (remove qualifiers, try synonyms)
92
+ - `1-30 results` → good. Scan matching lines, `ghx read` the relevant files
93
+ - `31-1000 results` → workable but noisy. Add `repo:`, `language:`, or `path:` to narrow
94
+ - `>1000 results` → too broad. MUST add qualifiers before trusting results
95
+ - `⚠ incomplete` → query timed out, results are partial. Narrow the scope
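The count-based branches of the decision tree can be written as a tiny helper. This is a sketch: `next_action` and its return labels are hypothetical, and treating exactly 30 results as "good" is an assumption about where the boundary falls.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the total result count (the first number on stderr)
# to the suggested next step.
next_action() {
  local total="$1"
  if   (( total == 0 ));    then echo "broaden"      # query too specific
  elif (( total <= 30 ));   then echo "read"         # scan lines, then ghx read
  elif (( total <= 1000 )); then echo "narrow"       # workable but noisy
  else                           echo "must-narrow"  # add qualifiers first
  fi
}

next_action 12      # prints: read
next_action 201472  # prints: must-narrow
```

The `⚠ incomplete` case is signalled by a separate stderr line rather than the count, so a caller would check for it independently.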
96
+
97
+ ### Token protection (safe by default)
98
+
99
+ ghx truncates each matching line to 200 chars. This prevents minified JS files (10,000+ char lines) from exploding your context window. One untruncated minified result can consume more tokens than the other 29 results combined.
100
+
101
+ - **Default**: 200 char truncation. You see `⚠ Lines truncated` on stderr only when it triggers.
102
+ - **`--full`**: Disables truncation. Use when you specifically need the complete matching line.
103
+ - **When to use `--full`**: Almost never. The truncated line is enough to decide "relevant" or "skip." Use `ghx read` to get the full file context after you've identified the right file.
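The truncation rule can be sketched in plain bash. The script in this release does the windowing inside `jq`, starting up to 80 characters before the first match; `window_line` below is a hypothetical re-statement of that rule, not the shipped code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the 200-char window: start up to 80 chars before the
# match offset, keep 200 chars, and mark cut edges with an ellipsis.
window_line() {
  local flat="$1" start="$2"
  local from=$(( start > 80 ? start - 80 : 0 ))
  local to=$(( from + 200 ))
  local prefix="" suffix=""
  if (( from > 0 )); then prefix="…"; fi
  if (( to < ${#flat} )); then suffix="…"; fi
  local slice="${flat:from:200}"
  # Trim leading and trailing whitespace left over from the cut.
  slice="${slice#"${slice%%[![:space:]]*}"}"
  slice="${slice%"${slice##*[![:space:]]}"}"
  printf '%s%s%s\n' "$prefix" "$slice" "$suffix"
}

# A short line passes through unchanged.
window_line 'addClass: function( value ) {' 0
```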
104
+
105
+ ### Search refinement chain of thought
106
+
107
+ ```
108
+ 1. ghx search "useState" → 201K results. Too broad.
109
+ 2. ghx search "useState language:typescript" → 50K results. Still broad.
110
+ 3. ghx search "useState repo:vercel/next.js" → 89 results. Workable.
111
+ 4. ghx search "useState path:packages/next extension:tsx repo:vercel/next.js" → 12 results. Surgical.
112
+ ```
113
+
114
+ **Refine, don't paginate.** At 9 req/min, pagination burns rate limit on the same broad query. Adding one qualifier is always better than fetching page 2.
115
+
116
+ ### Two search systems (why some things don't work)
117
+
118
+ GitHub has two code search engines. The REST API (what ghx uses) is the legacy one. The web UI uses Blackbird (new). No programmatic tool — ghx, `gh` CLI, GitHub MCP, Octocode — can access Blackbird. This is a platform limitation, not a ghx limitation.
119
+
120
+ **What this means for agents:**
121
+ - `OR`, `NOT`, `symbol:`, regex, `content:`, `is:` → web-only, don't use
122
+ - `repo:`, `path:`, `filename:`, `language:`, `extension:`, `in:`, `size:`, `fork:` → work in REST API
123
+ - ghx warns on stderr if you use web-only qualifiers, but the results will be wrong
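An agent can also check a query for web-only features before spending a rate-limited request. The sketch below is an assumption-laden variant of the warning loop in the script: `web_only_terms` is a hypothetical name, and the regex test (a whitespace-delimited token wrapped in slashes) is a heuristic rather than ghx's own check.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: print each web-only (Blackbird) feature
# found in a query, one per line. Prints nothing for a clean query.
web_only_terms() {
  local query="$1" pat
  local found=()
  for pat in 'symbol:' 'content:' 'is:' 'enterprise:' ' OR ' ' NOT '; do
    # Quoting "$pat" forces a literal substring match rather than a glob.
    if [[ "$query" == *"$pat"* ]]; then
      found+=("${pat// /}")
    fi
  done
  # Heuristic: a token such as /ab+c/ is probably meant as a regex.
  local re='(^|[[:space:]])/[^[:space:]]+/([[:space:]]|$)'
  if [[ "$query" =~ $re ]]; then
    found+=("regex")
  fi
  if (( ${#found[@]} > 0 )); then
    printf '%s\n' "${found[@]}"
  fi
}

web_only_terms 'foo OR bar repo:owner/repo'   # prints: OR
```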
124
+
125
+ ### ghx search vs `gh search code`
126
+
127
+ | Behavior | `ghx search` | `gh search code` |
128
+ |---|---|---|
129
+ | Multi-word matching | AND (both words anywhere) | Exact phrase (words must be adjacent) |
130
+ | Matching context | Shows matching line per result | No matching context |
131
+ | Result count | stderr: "90 results (showing 30)" | Not shown |
132
+ | Token protection | 200 char truncation, `--full` opt-out | None |
133
+ | Web-only warnings | Warns on stderr | Silent |
134
+ | Rate limit | Same (9 req/min) | Same |
135
+
136
+ AND matching is almost always what agents want. `gh search code "useState fetchData"` returns zero results if the words aren't adjacent — with no error. `ghx search "useState fetchData"` finds files containing both terms.
137
+
138
+ ## Gotchas
139
+
140
+ 1. **Web-only qualifiers silently degrade.** `symbol:`, `OR`, `NOT`, `content:`, `is:`, regex — these only work in GitHub's new web code search (Blackbird). The REST API treats them as literal text. `symbol:foo` searches for the TEXT "symbol:foo" inside files. ghx warns on stderr, but the results will be wrong. No programmatic tool can use these features — it's a GitHub platform limitation.
141
+
142
+ 2. **`filename:` vs `path:` — both valid, different systems.** `filename:package.json` works in the REST API (legacy) for exact filename match. `path:` also works and is more flexible (matches directories too). In the NEW web code search, only `path:` works — `filename:` is not recognized. Since ghx uses the REST API, both work.
143
+
144
+ 3. **`language:markdown` won't find `.txt` files.** GitHub's linguist detection doesn't classify .txt as markdown. Use `extension:txt` instead. `language:` = linguist detection, `extension:` = literal file extension.
145
+
146
+ 4. **`gh search code` silently wraps queries in quotes.** `gh search code "foo bar"` sends `q="foo bar"` (exact phrase), not `q=foo bar` (AND). If the words aren't adjacent in the file, you get zero results with no error. `ghx search` sends AND queries — both words must appear but in any order. This is almost always what you want. ghx also shows result count on stderr and matching line context — `gh search code` shows neither.
147
+
148
+ 5. **GraphQL returns null for missing paths.** `object(expression: "branch:path")` returns null silently if the path doesn't exist. No error. `ghx` handles this, but if using `gh api graphql` directly, check for null.
149
+
150
+ 6. **Flag ordering in `read` command.** `ghx read owner/repo file --map` works. `ghx read --map owner/repo file` does NOT — repo must be the first positional arg.
151
+
152
+ 7. **Not all repos use `main`.** cli/cli uses `trunk`, others use `master`. `ghx` handles this automatically. For raw `gh api` calls, query the default branch first: `gh repo view owner/repo --json defaultBranchRef --jq '.defaultBranchRef.name'`
153
+
154
+ 8. **`gh` field names are inconsistent.** `stargazersCount` (search) vs `stargazerCount` (repo view). Always check with `--json` (no fields) to see available fields for any command.
155
+
156
+ 9. **`gh api repos/.../contents/` returns base64 by default.** Without `-H "Accept: application/vnd.github.raw+json"`, you get a JSON blob with base64-encoded content. `ghx read` returns plain text via GraphQL — no decoding needed.
157
+
158
+ 10. **`gh search repos` and `gh search code` use different rate limit pools.** Repo search: 30/min (generous). Code search: 10/min (restrictive). Don't assume one rate limit applies to both.
159
+
160
+ ## Anti-Patterns
161
+
162
+ - ❌ `web_fetch`/`web_search` on github.com — returns HTML noise, wastes thousands of tokens for zero useful information
163
+ - ❌ `gh api repos/.../contents/<path>` WITHOUT `-H "Accept: application/vnd.github.raw+json"` — returns base64-encoded JSON blob instead of readable text
164
+ - ❌ Reading entire large files when you need 10 lines — use `--grep "pattern"` or `--lines N-M`
165
+ - ❌ Multiple sequential `gh api` calls for explore workflows — use `ghx explore` (1 GraphQL call) or `ghx read` (batch files)
166
+ - ❌ Using web-only qualifiers (`OR`, `NOT`, `symbol:`, regex) in `ghx search` — silently treated as literal text, returns wrong results. ghx warns but can't prevent it
167
+ - ❌ Firing multiple code search requests in parallel — 9 req/min rate limit, you'll get 403s
168
+ - ❌ Dumping entire repos into context for a specific question — use targeted `ghx` commands. Reserve `gitingest`/`repomix` for "understand this whole module" tasks
169
+ - ❌ Relying on `gh search code` for multi-word queries — silently wraps in quotes (exact phrase), returns nothing when words aren't adjacent. Use `ghx search` (AND matching + matching context)
170
+ - ❌ Using `ghx search` to find repos — ghx search is for code. Use `ghx repos "query"` for repo discovery
171
+ - ❌ Using `gh` for batch file reads — 1 API call per file, base64 encoded. Use `ghx read repo f1 f2 f3` (1 GraphQL call, plain text)
172
+ - ❌ Using `gh repo view` to explore a repo — gets metadata but not tree listing or README content in one call. Use `ghx explore` (1 call for all three)
173
+
174
+ ## Best Practices
175
+
176
+ - **Batch file reads.** `ghx read owner/repo f1 f2 f3` = 1 API call. Three separate reads = 3 calls.
177
+ - **Map before reading.** `ghx read --map` first to understand structure, then `--grep` or `--lines` for specifics.
178
+ - **Refine search, don't paginate.** If `ghx search` shows "201472 results (showing 30)", add qualifiers (`repo:`, `language:`, `path:`) — don't try to page through. 9 req/min rate limit makes pagination expensive.
179
+ - **Use `gh api --cache 1h`** for repeated lookups when using raw `gh` commands.
180
+ - **Use `--json fields --jq 'expr'`** on `gh` commands to get structured output and reduce noise.
181
+ - **Piped output is machine-formatted.** Tab-delimited, no truncation, no color codes — agents always get clean output.
182
+
183
+ ## The `--map` Flag: Why It Matters
184
+
185
+ `--map` extracts only structural declarations (imports, exports, function/class/type signatures) via per-language regex patterns. Tested on 6 real files across TypeScript, Python, Go:
186
+
187
+ | Metric | Result |
188
+ |--------|--------|
189
+ | Average token reduction | 92% |
190
+ | Files scannable per context window | ~12x more than full reads (at 92% reduction) |
191
+ | Implementation | ~15 lines of bash, zero dependencies |
192
+
193
+ Output includes line numbers and token stats:
194
+ ```
195
+ === src/core/parseFile.ts (5544 bytes) ===
196
+ 21:import type { RepomixConfigMerged } from '../../config/configSchema.js';
197
+ 35:export const CHUNK_SEPARATOR = '⋮----';
198
+ 38:export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) =>
199
+ 107:const getLanguageParserSingleton = async () =>
200
+ # map: 812/5544 chars (~1386 tokens full, ~203 tokens map)
201
+ ```
202
+
203
+ Supported: TypeScript/JavaScript, Python, Go, Rust, Java/Kotlin, Ruby. Generic fallback for unknown extensions.
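The regex approach can be illustrated with a single `grep`. This is a minimal sketch: the keyword list is illustrative and is not ghx's real per-language pattern set, and `map_source` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep only lines that begin with a structural keyword,
# prefixed with their line number (the N:line format shown in the sample above).
map_source() {
  # "|| true" keeps the exit status clean when nothing matches.
  grep -nE '^(import|export|function|class|interface|type|def|func|const) ' || true
}

printf '%s\n' \
  'import fs from "fs";' \
  '' \
  'export function run(x) {' \
  '  return x + 1;' \
  '}' | map_source
```

Indented body lines never match because the pattern is anchored at column 0, which is where most of the reduction comes from.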
204
+
205
+ ## Examples
206
+
207
+ ### Simple: Explore a repo and read a file
208
+
209
+ ```bash
210
+ # What's in this repo?
211
+ ghx explore plausible/analytics
212
+
213
+ # Read the main config
214
+ ghx read plausible/analytics config/runtime.exs
215
+ ```
216
+
217
+ ### Advanced: Research a codebase you've never seen
218
+
219
+ ```bash
220
+ # 1. Explore structure
221
+ ghx explore yamadashy/repomix
222
+
223
+ # 2. Map the core module — understand what exists (92% fewer tokens)
224
+ ghx read yamadashy/repomix --map src/core/output/outputGenerate.ts src/core/file/fileProcess.ts src/core/treeSitter/parseFile.ts
225
+
226
+ # 3. Found interesting function in map output — grep for usage details
227
+ ghx read yamadashy/repomix --grep "processFiles" src/core/file/fileProcess.ts
228
+
229
+ # 4. Search across the whole repo for a pattern
230
+ ghx search "CHUNK_SEPARATOR repo:yamadashy/repomix"
231
+ # → stderr: "3 results (showing 3)"
232
+ # → stdout: yamadashy/repomix src/core/output/outputGenerate.ts: const CHUNK_SEPARATOR = '⋮----';
233
+
234
+ # 5. Read specific lines of a file you've narrowed down
235
+ ghx read yamadashy/repomix --lines 38-65 src/core/treeSitter/parseFile.ts
236
+
237
+ # 6. If you need the full picture of a subdirectory, escalate:
238
+ # gitingest https://github.com/yamadashy/repomix/tree/main/src/core -i "*.ts" -o - 2>/dev/null
239
+ ```
240
+
241
+ ## Complementary Tools
242
+
243
+ | Goal | Tool | Why |
244
+ |------|------|-----|
245
+ | Surgical exploration | `ghx` | Batched API calls, zero overhead, targeted extraction |
246
+ | Holistic understanding | `gitingest` / `repomix --compress` | Dump entire module for broad reasoning |
247
+ | PRs, issues, CI | `gh pr view`, `gh issue view`, `gh pr checks` | Purpose-built commands |
248
+
249
+ ## ghx vs gh: When to Use What
250
+
251
+ **ghx is a complement to gh, not a replacement.** Use ghx for code exploration. Use gh for everything else.
252
+
253
+ ### Use ghx (code exploration)
254
+
255
+ | Task | Command | Why ghx wins |
256
+ |------|---------|-------------|
257
+ | Code search | `ghx search "query"` | AND matching (gh uses exact phrase), matching context, 37x token reduction on minified files, result count + warnings on stderr |
258
+ | Repo search | `ghx repos "query"` | 1 GraphQL call gets name + stars + language + README preview. gh needs 1+N calls for same info, returns worse ranking, no README |
259
+ | Repo overview | `ghx explore owner/repo` | 1 GraphQL call gets description + tree + README (gh needs 3 calls) |
260
+ | Read multiple files | `ghx read owner/repo f1 f2 f3` | 1 GraphQL call for N files (gh needs N calls, returns base64) |
261
+ | Targeted extraction | `ghx read owner/repo --grep "pat" f` | Built-in grep with context lines, no shell piping |
262
+ | Code map | `ghx read owner/repo --map f1 f2` | ~92% token reduction, no gh equivalent |
263
+
264
+ ### Use gh (everything else)
265
+
266
+ | Task | Command | Why gh wins |
267
+ |------|---------|-------------|
268
+ | Issues | `gh issue list/view -R owner/repo` | ghx doesn't touch issues |
269
+ | Pull requests | `gh pr list/view/diff/checks -R owner/repo` | ghx doesn't touch PRs |
270
+ | Releases | `gh release list -R owner/repo` | ghx doesn't touch releases |
271
+ | Repo metadata | `gh repo view owner/repo --json stargazerCount,forkCount` | Detailed stats beyond what ghx repos shows |
272
+ | Auth | `gh auth login/status` | ghx depends on gh for auth |
273
+ | Create/update | `gh issue create`, `gh pr create` | ghx is read-only |
274
+
275
+ ### Rate limits (from GitHub docs)
276
+
277
+ | Endpoint | Limit | Used by |
278
+ |---|---|---|
279
+ | Core REST | 5,000/hour | gh commands, ghx tree |
280
+ | GraphQL | 5,000/hour | ghx explore, ghx read |
281
+ | Search (repos, issues) | 30/min | `gh search repos/issues` |
282
+ | Code search | 10/min (budget 9) | `ghx search`, `gh search code` |
283
+
284
+ Code search is roughly 8x more restricted than core REST (10/min is 600/hour, versus 5,000/hour). This is why "refine don't paginate" matters for search but not for explore/read.
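When several searches have to run in sequence, spacing them out keeps an agent under the per-minute budget. A minimal sketch, where `min_interval_secs` is a hypothetical helper and the budget of 9 requests per minute is taken from the table above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: smallest whole number of seconds to wait between
# requests so that at most N requests are sent per minute (ceiling of 60/N).
min_interval_secs() {
  local per_minute="$1"
  echo $(( (60 + per_minute - 1) / per_minute ))
}

min_interval_secs 9   # prints: 7
```

A loop over queries could then call `sleep "$(min_interval_secs 9)"` between `ghx search` invocations.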
285
+
286
+ ## `gh` CLI Quick Reference
287
+
288
+ ```bash
289
+ # Repos
290
+ gh search repos "<query>" -L 10 --json fullName,description,stargazersCount
291
+ gh repo view owner/repo --json defaultBranchRef --jq '.defaultBranchRef.name'
292
+
293
+ # PRs
294
+ gh pr view 123 -R owner/repo # Title, body, status
295
+ gh pr diff 123 -R owner/repo # Full diff
296
+ gh pr checks 123 -R owner/repo # CI status
297
+
298
+ # Issues
299
+ gh issue view 456 -R owner/repo
300
+ gh issue list -R owner/repo -S "query" -L 20
301
+
302
+ # Raw API (always use the raw header for files)
303
+ gh api repos/owner/repo/contents/path -H "Accept: application/vnd.github.raw+json"
304
+ gh api repos/owner/repo/git/trees/main --jq '.tree[].path' # List structure
305
+ ```
package/ghx CHANGED
@@ -132,16 +132,165 @@ read)
132
132
  done
133
133
  ;;
134
134
 
135
- # ─── search: code search via REST API (supports full query syntax) ───
135
+ # ─── search: code search via REST API ───
136
136
  search)
137
+ # Parse flags
138
+ full_mode=""
139
+ args=()
140
+ for arg in "$@"; do
141
+ case "$arg" in
142
+ --full) full_mode=1 ;;
143
+ *) args+=("$arg") ;;
144
+ esac
145
+ done
146
+ query="${args[*]}"
147
+ if [[ -z "$query" ]]; then
148
+ echo "Usage: ghx search <query> [--full]" >&2
149
+ echo "Examples: 'useState repo:owner/repo' | 'path:src/ extension:tsx language:typescript'" >&2
150
+ exit 1
151
+ fi
152
+
153
+ # Prerequisite checks
154
+ if ! command -v gh &>/dev/null; then
155
+ echo "ghx requires the GitHub CLI (gh). Install: https://cli.github.com" >&2
156
+ exit 1
157
+ fi
158
+ if ! gh auth status &>/dev/null; then
159
+ echo "GitHub code search requires authentication. Run: gh auth login" >&2
160
+ exit 1
161
+ fi
162
+
163
+ # Warn on web-only qualifiers (they silently become literal text in REST API)
164
+ for pat in 'symbol:' 'content:' 'is:' ' OR ' ' NOT ' '/.*/' 'enterprise:'; do
165
+ if [[ "$query" == *$pat* ]]; then
166
+ echo "⚠ '$pat' is web-only (Blackbird) — REST API treats it as literal text" >&2
167
+ fi
168
+ done
169
+
170
+ # Search with text_matches for matching context
171
+ response=$(gh api /search/code --method GET \
172
+ -H "Accept: application/vnd.github.text-match+json" \
173
+ -f q="$query" 2>&1) || {
174
+ echo "$response" >&2
175
+ exit 1
176
+ }
177
+
178
+ # Result count + incomplete warning to stderr
179
+ total=$(echo "$response" | jq -r '.total_count')
180
+ incomplete=$(echo "$response" | jq -r '.incomplete_results')
181
+ count=$(echo "$response" | jq '.items | length')
182
+ echo "$total results (showing $count)" >&2
183
+ [[ "$incomplete" == "true" ]] && echo "⚠ Results may be incomplete (query timed out)" >&2
184
+ [[ "$total" -gt 1000 ]] 2>/dev/null && echo "⚠ Query too broad — add repo:, language:, or path: to narrow" >&2
185
+
186
+ # Output: repo path: matching context from fragment
187
+ # Default: 200 char window centered on the match (prevents minified JS token explosion)
188
+ # Uses matches[0].indices to find the actual match position in the fragment
189
+ if [[ -n "$full_mode" ]]; then
190
+ echo "$response" | jq -r '
191
+ .items[] |
192
+ .repository.full_name as $repo |
193
+ .path as $path |
194
+ (.text_matches // []) as $tm |
195
+ if ($tm | length) > 0 then
196
+ ($tm[0].fragment | gsub("\\n"; " ") | gsub("\\s+"; " ") |
197
+ gsub("^\\s+"; "") | gsub("\\s+$"; "")) as $line |
198
+ "\($repo) \($path): \($line)"
199
+ else
200
+ "\($repo) \($path)"
201
+ end'
202
+ else
203
+ truncated=$(echo "$response" | jq '[.items[] |
204
+ (.text_matches // []) as $tm |
205
+ if ($tm | length) > 0 then
206
+ ($tm[0].fragment | length) > 200
207
+ else false end] | any')
208
+ [[ "$truncated" == "true" ]] && echo "⚠ Lines truncated to 200 chars (use --full for complete fragments)" >&2
209
+ echo "$response" | jq -r '
210
+ .items[] |
211
+ .repository.full_name as $repo |
212
+ .path as $path |
213
+ (.text_matches // []) as $tm |
214
+ if ($tm | length) > 0 then
215
+ $tm[0] as $m |
216
+ ($m.fragment | gsub("\\n"; " ") | gsub("\\s+"; " ")) as $flat |
217
+ (if ($m.matches | length) > 0 then
218
+ $m.matches[0].indices[0] as $start |
219
+ ([$start - 80, 0] | max) as $from |
220
+ ($from + 200) as $to |
221
+ (if $from > 0 then "…" else "" end) as $prefix |
222
+ (if $to < ($flat | length) then "…" else "" end) as $suffix |
223
+ $prefix + ($flat[$from:$to] | gsub("^\\s+"; "") | gsub("\\s+$"; "")) + $suffix
224
+ else
225
+ $flat[:200]
226
+ end) as $line |
227
+ "\($repo) \($path): \($line)"
228
+ else
229
+ "\($repo) \($path)"
230
+ end'
231
+ fi
232
+ ;;
233
+
234
+ # ─── repos: search repos with README preview in 1 GraphQL call ───
235
+ repos)
137
236
  query="$*"
138
237
  if [[ -z "$query" ]]; then
139
- echo "Usage: ghx search <query>" >&2
140
- echo "Query syntax: 'term1 term2 repo:owner/repo' (AND), 'term1 OR term2 repo:owner/repo'" >&2
141
- echo "Filters: path:src/ extension:tsx language:typescript" >&2
238
+ echo "Usage: ghx repos <query>" >&2
239
+ exit 1
240
+ fi
241
+
242
+ response=$(gh api graphql -f query='
243
+ {
244
+ search(query: "'"$query"'", type: REPOSITORY, first: 5) {
245
+ repositoryCount
246
+ nodes {
247
+ ... on Repository {
248
+ nameWithOwner
249
+ description
250
+ stargazerCount
251
+ primaryLanguage { name }
252
+ object(expression: "HEAD:README.md") {
253
+ ... on Blob { text }
254
+ }
255
+ }
256
+ }
257
+ }
258
+ }')
259
+
260
+ echo "$response" | jq -r '.data.search.repositoryCount | "\(.) repos found"' >&2
261
+ echo "$response" | jq -r '
262
+ .data.search.nodes[] |
263
+ .nameWithOwner as $name |
264
+ (.stargazerCount | tostring) as $stars |
265
+ ((.primaryLanguage.name // "?")) as $lang |
266
+ (.description // "") as $desc |
267
+ ((.object.text // "") | gsub("\\n"; " ") | gsub("\\s+"; " ") | gsub("^\\s+"; "") |
268
+ gsub("\\[!\\[[^]]*\\]\\([^)]*\\)\\]\\([^)]*\\)"; "") |
269
+ gsub("!\\[[^]]*\\]\\([^)]*\\)"; "") |
270
+ gsub("\\[!\\[[^]]*\\]\\([^)]*\\)\\]"; "") |
271
+ gsub("\\[![^]]*\\]\\([^)]*\\)"; "") |
272
+ gsub("<img[^>]*>"; "") |
273
+ gsub("<div[^>]*>"; "") | gsub("</div>"; "") |
274
+ gsub("<br[^>]*/?>"; "") |
275
+ gsub("<p[^>]*>"; "") | gsub("</p>"; "") |
276
+ gsub("<a[^>]*>"; "") | gsub("</a>"; "") |
277
+ gsub("\\s+"; " ") | gsub("^\\s+"; "") |
278
+ .[:300]) as $readme |
279
+ "\($name) (\($stars)★ \($lang)) \($desc)" +
280
+ (if ($readme | length) > 0 then "\n " + $readme + (if ($readme | length) >= 300 then "…" else "" end) else "" end)
281
+ '
282
+ ;;
283
+
284
+ # ─── skill: output SKILL.md for agent context injection ───
285
+ skill)
286
+ script_dir="$(dirname "$(readlink -f "$0")")"
287
+ skill_file="$script_dir/SKILL.md"
288
+ if [[ -f "$skill_file" ]]; then
289
+ cat "$skill_file"
290
+ else
291
+ echo "SKILL.md not found at $skill_file" >&2
142
292
  exit 1
143
293
  fi
144
- gh api /search/code --method GET -f q="$query" --jq '.items[] | "\(.repository.full_name) \(.path):\(.name)"'
145
294
  ;;
146
295
 
147
296
  # ─── tree: recursive tree listing via REST API ───
@@ -171,7 +320,10 @@ ghx — GitHub code exploration for agents and humans
171
320
  Commands:
172
321
  ghx explore <owner/repo> [path] Branch + tree + README in 1 API call
173
322
  ghx read <owner/repo> <f1> [f2..] Read 1-10 files in 1 API call
174
- ghx search <query> Code search (full syntax: AND, OR, path:, extension:, language:)
323
+ ghx repos <query> Search repos with README preview in 1 GraphQL call
324
+ ghx search "<query>" Code search (AND matching, qualifiers: repo: path: language: extension: filename:)
325
+ ghx search --full "<query>" Code search without line truncation
326
+ ghx skill Output SKILL.md (for agent context injection)
175
327
  ghx tree <owner/repo> [path] Full recursive tree listing
176
328
 
177
329
  Read flags:
@@ -183,8 +335,9 @@ Examples:
183
335
  ghx read plausible/analytics mix.exs assets/js/dashboard/stats/bar.js
184
336
  ghx read plausible/analytics src/app.ts --grep "useState"
185
337
  ghx read plausible/analytics src/app.ts --lines 42-80
338
+ ghx repos "react state management"
186
339
  ghx search "useState repo:plausible/analytics"
187
- ghx search "bar OR percentage repo:plausible/analytics"
340
+ ghx search "filename:package.json repo:plausible/analytics"
188
341
  ghx tree plausible/analytics assets/js
189
342
  EOF
190
343
  ;;
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@gkoreli/ghx",
3
- "version": "0.1.3",
3
+ "version": "0.2.0",
4
4
  "description": "GitHub code exploration for agents and humans. Batch file reads, code maps, search — all via gh CLI.",
5
5
  "bin": {
6
6
  "ghx": "./ghx"
@@ -11,8 +11,8 @@
11
11
  "type": "git",
12
12
  "url": "https://github.com/gkoreli/ghx"
13
13
  },
14
- "files": ["ghx", "README.md", "LICENSE"],
15
- "os": ["darwin", "linux"],
14
+ "files": ["ghx", "SKILL.md", "README.md", "LICENSE"],
15
+ "os": ["darwin", "linux", "win32"],
16
16
  "engines": {
17
17
  "node": ">=16"
18
18
  }