@gkoreli/ghx 0.2.1 → 2.0.2

package/README.md CHANGED
@@ -1,122 +1,117 @@
1
- # ghx
1
+ # ghx — GitHub Code Exploration for AI Agents
2
2
 
3
- GitHub code exploration for agents and humans. One command does what takes 3-5 API calls with any other tool.
3
+ One command does what takes 3-5 API calls. Batch file reads, code maps, search — all via `gh` CLI. Write JS programs that compose operations in one round-trip.
4
4
 
5
- - **Repos** — search repos with README preview in 1 GraphQL call
6
- - **Explore** a repo (tree + README) in 1 API call
7
- - **Read** 1-10 files in 1 API call via GraphQL batching
8
- - **Map** code structure with ~92% token reduction
9
- - **Search** code with AND matching, matching context, token protection
5
+ ## Why
6
+
7
+ AI agents exploring GitHub face a reliability gap: *"Did I find nothing because nothing exists, or because I used the tool wrong?"* Raw `gh` commands have silent failure modes — `gh search code` wraps in quotes without telling you, `gh api contents/` returns base64, README requires a separate call. The agent can't distinguish "no results" from "wrong flags."
10
8
 
11
- Bash script. One dependency: `gh` CLI. Cross-platform (macOS, Linux, Windows via Git Bash/WSL).
9
+ ghx eliminates this by encoding the right defaults into every command. One call returns enough context to decide the next action.
10
+
11
+ | Tool | Files per call | Matching context | Programmable | Dependencies |
12
+ |------|---------------|-----------------|-------------|-------------|
13
+ | GitHub MCP | 1 | No | No (~10K token schemas) | Go binary |
14
+ | `gh` CLI | 1 | No | No (exact phrase, base64, no README) | `gh` |
15
+ | **ghx** | **1-10 (batch)** | **Yes** | **Yes (codemode)** | **`gh`** |
12
16
 
13
17
  ## Install
14
18
 
15
19
  ```bash
16
- # npm (recommended)
17
- npm install -g @gkoreli/ghx
20
+ # Build from source
21
+ cd v2 && go build -o ghx .
18
22
 
19
- # npx (zero install)
20
- npx @gkoreli/ghx --help
23
+ # Homebrew
24
+ brew install gkoreli/tap/ghx
21
25
 
22
- # curl
23
- curl -sf https://raw.githubusercontent.com/gkoreli/ghx/main/install.sh | sh
26
+ # One-liner
27
+ curl -sfL https://raw.githubusercontent.com/gkoreli/ghx/mainline/install.sh | bash
24
28
  ```
25
29
 
26
- Requires [`gh` CLI](https://cli.github.com/) authenticated (`gh auth login`).
27
-
28
- [![npm](https://img.shields.io/npm/v/@gkoreli/ghx)](https://www.npmjs.com/package/@gkoreli/ghx)
29
-
30
- ### Platform Support
31
-
32
- | Platform | Status | Notes |
33
- |----------|--------|-------|
34
- | macOS | ✅ Native | bash + readlink -f (12.3+) |
35
- | Linux | ✅ Native | bash + GNU coreutils |
36
- | Windows | ✅ Git Bash / WSL | Ships with Git for Windows. Raw cmd.exe/PowerShell not supported |
30
+ Requires: [gh CLI](https://cli.github.com/) authenticated (`gh auth login`).
37
31
 
38
- If you have `gh` CLI working, ghx works too — same prerequisites.
39
-
40
- ## Usage
32
+ ## Commands
41
33
 
42
34
  ```bash
43
- # Search repos: name, stars, language, README preview in 1 call
44
- ghx repos "react state management"
45
- ghx repos "playwright mcp" --limit 5
35
+ ghx explore <owner/repo> # Branch + tree + README in 1 API call
36
+ ghx explore <owner/repo> <path> # Subdirectory listing
37
+ ghx read <owner/repo> <f1> [f2] [f3] # Read 1-10 files (GraphQL batching)
38
+ ghx search "<query>" # Code search with matching lines
39
+ ghx repos "<query>" # Repo search with README preview
40
+ ghx tree <owner/repo> [path] # Full recursive tree
41
+ ```
46
42
 
47
- # Explore a repo — branch, file tree, and README in 1 API call
48
- ghx explore plausible/analytics
43
+ ## Codemode
49
44
 
50
- # Read multiple files in 1 API call
51
- ghx read plausible/analytics mix.exs assets/js/dashboard/stats/bar.js
45
+ Write JS programs that compose multiple operations in one round-trip. All `codemode.*` calls are synchronous — no `await`. Full TypeScript type stubs with return types are injected into the sandbox.
52
46
 
53
- # Code map — signatures, imports, types only (~92% token reduction)
54
- ghx read plausible/analytics --map lib/plausible/stats/query.ex
47
+ ```bash
48
+ # What branch is this repo on?
49
+ ghx code 'var r = codemode.explore({repo: "vercel/next.js"}); return r.branch;'
55
50
 
56
- # Grep within a remote file (2 lines context)
57
- ghx read plausible/analytics --grep "defmodule" lib/plausible/stats/query.ex
51
+ # Search + read composition
52
+ ghx code 'var hits = codemode.search({query: "useState repo:vercel/next.js", limit: 3});
53
+ var first = codemode.read({repo: "vercel/next.js", files: [hits.matches[0].path]});
54
+ return {file: hits.matches[0].path, lines: first[0].content.split("\n").length};'
58
55
 
59
- # Read specific line range
60
- ghx read plausible/analytics --lines 42-80 lib/plausible/stats/query.ex
56
+ # See what tools and types are available
57
+ ghx code --list
58
+ ```
61
59
 
62
- # Search code (AND matching, shows matching lines, token-protected)
63
- ghx search "useState repo:facebook/react"
64
- ghx search "path:llms.txt extension:txt" --limit 10
60
+ Type stubs tell the LLM exactly what fields exist — no guessing:
65
61
 
66
- # Full recursive tree
67
- ghx tree plausible/analytics assets/js
62
+ ```typescript
63
+ declare const codemode: {
64
+ explore: (input: ExploreInput) => { description: string; branch: string; files: { name: string; type: string }[]; readme: string };
65
+ search: (input: SearchInput) => { total: number; incomplete: boolean; matches: { repo: string; path: string; fragment: string }[] };
66
+ tree: (input: TreeInput) => string[];
67
+ // ...
68
+ }
68
69
  ```
69
70
 
70
- ## Why
71
-
72
- AI agents exploring GitHub face a reliability gap: *"Did I find nothing because nothing exists, or because I used the tool wrong?"* ghx eliminates this with smart defaults — AND matching instead of exact phrase, README previews instead of bare names, matching context instead of bare paths. Unknown flags are rejected loudly (not silently absorbed into queries). The right behavior is the default behavior, and the wrong behavior is impossible.
71
+ ## MCP Server
73
72
 
74
- | Tool | Files per call | Matching context | Smart defaults | Dependencies |
75
- |------|---------------|-----------------|---------------|-------------|
76
- | GitHub MCP | 1 | No | No (~10K token schemas) | Go binary |
77
- | `gh` CLI | 1 | No | No (exact phrase, base64, no README) | `gh` |
78
- | **ghx** | **1-10 (batch)** | **Yes** | **Yes** | **`gh`** |
73
+ ```bash
74
+ ghx serve # stdio (for Claude, Cursor, etc.)
75
+ ghx serve --http :8080 # HTTP transport
76
+ ```
79
77
 
80
- ## Agent Skill Integration
78
+ 7 tools: `explore`, `read`, `search`, `repos`, `tree`, `code` (meta-tool), `search_tools`.
81
79
 
82
- `ghx skill` outputs the full [`SKILL.md`](./SKILL.md) to stdout — designed for eager context injection via spawn hooks:
80
+ ## Agent Integration
83
81
 
84
- ```json
85
- {
86
- "hooks": {
87
- "agentSpawn": [
88
- {"command": "ghx skill"}
89
- ]
90
- }
91
- }
82
+ ```bash
83
+ ghx skill # CLI skill (for SKILL.md injection)
84
+ ghx skill --mcp # MCP skill
92
85
  ```
93
86
 
94
- This loads the skill eagerly into every session, not on-demand like a typical skill file. Use this for agent identities where GitHub exploration is a core capability (not occasional). The agent always has the latest ghx knowledge (commands, gotchas, search strategy) without needing to load it mid-conversation.
87
+ Designed for eager context injection via spawn hooks: the agent always has the latest ghx knowledge without loading it mid-conversation.
95
88
 
96
- SKILL.md is included in the npm package and resolved via symlink, so this works with all installation methods.
89
+ ## How It Works
97
90
 
98
- ## Code Map (`--map`)
91
+ Wraps `gh` CLI with GraphQL batching. `repos` and `explore` batch search + metadata + README into 1 call. `read` uses GraphQL aliases to fetch up to 10 files in 1 call. `search` hits REST `/search/code` with `text_matches` for matching context and 200-char token protection.
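
The alias-based batching described above can be sketched as follows. This is a minimal illustration, not ghx's actual implementation: the helper name `buildBatchReadQuery`, the `f0`, `f1`, ... alias scheme, and the `HEAD` default ref are assumptions. It relies only on the GitHub GraphQL `repository.object(expression:)` field and the `Blob.text` field.

```javascript
// Hypothetical helper (not part of ghx): builds one GraphQL query that reads
// several files by giving each `object(expression:)` lookup its own alias.
function buildBatchReadQuery(owner, name, paths, ref = "HEAD") {
  if (paths.length < 1 || paths.length > 10) {
    throw new RangeError("expected 1-10 paths");
  }
  // JSON.stringify is used as a simple way to emit a double-quoted string literal.
  const fields = paths
    .map((p, i) => `    f${i}: object(expression: ${JSON.stringify(`${ref}:${p}`)}) { ... on Blob { text } }`)
    .join("\n");
  return [
    "query {",
    `  repository(owner: ${JSON.stringify(owner)}, name: ${JSON.stringify(name)}) {`,
    fields,
    "  }",
    "}",
  ].join("\n");
}

console.log(buildBatchReadQuery("vercel", "next.js", ["README.md", "package.json"]));
```

A query built this way could be sent with `gh api graphql -f query='...'`, so ten files cost one request instead of ten.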
99
92
 
100
- The `--map` flag extracts only structural declarations: imports, exports, function signatures, class definitions, type declarations. Implementation bodies are stripped.
93
+ Codemode runs JS in a [goja](https://github.com/dop251/goja) sandbox with esbuild TypeScript transpilation. Tools are injected as synchronous functions on a `codemode` global object. Max 20 tool calls per execution, 64KB code size limit.
101
94
 
102
- | File | Full | Map | Reduction |
103
- |------|------|-----|-----------|
104
- | repomix/parseFile.ts | 5,599 | 812 | 86% |
105
- | github-mcp/repositories.go | 68,862 | 1,551 | 97.7% |
106
- | aider/repomap.py | 27,346 | 1,496 | 94.5% |
95
+ ## Architecture
107
96
 
108
- Average: **92% reduction**. An agent can map 16 files in the space of reading 1 file fully.
97
+ ```
98
+ v2/
99
+ ├── pkg/ghx/ — core library (Explore, Read, Search, Repos, Tree)
100
+ ├── pkg/codemode/ — JS executor (goja sandbox, TS transpilation, type generation)
101
+ └── cmd/ — CLI frontend (cobra) + MCP server (mcp-go)
102
+ ```
109
103
 
110
- Supported: TypeScript/JavaScript, Python, Go, Rust, Java/Kotlin, Ruby. Generic fallback for unknown extensions.
104
+ See [docs/adr/](docs/adr/) for architectural decisions.
111
105
 
112
- ## How It Works
106
+ ## v1 (bash)
113
107
 
114
- `ghx` wraps `gh` CLI with GraphQL batching. `repos` and `explore` use GraphQL to batch search + metadata + README into 1 call. `read` uses GraphQL aliases to fetch up to 10 files in 1 call. `search` hits the REST `/search/code` endpoint with `text_matches` for matching context and 200-char token protection.
108
+ The original bash implementation is in [`v1/`](v1/). Zero dependencies beyond `gh` and `jq`, which is useful if you just want a shell script you can drop anywhere without compiling Go.
109
+
110
+ ```bash
111
+ npm install -g @gkoreli/ghx # npm
112
+ cp v1/ghx /usr/local/bin/ghx # manual
113
+ ```
115
114
 
116
115
  ## License
117
116
 
118
117
  MIT
119
-
120
- ## For AI Agents
121
-
122
- See [`SKILL.md`](./SKILL.md) for agent-optimized instructions — chain of thought, gotchas, anti-patterns, and examples.
package/package.json CHANGED
@@ -1,19 +1,40 @@
1
1
  {
2
2
  "name": "@gkoreli/ghx",
3
- "version": "0.2.1",
4
- "description": "GitHub code exploration for agents and humans. Batch file reads, code maps, search, all via gh CLI.",
3
+ "version": "2.0.2",
4
+ "description": "Agent-first GitHub code exploration. GraphQL batching, code maps, codemode TypeScript sandbox, MCP server.",
5
5
  "bin": {
6
6
  "ghx": "./ghx"
7
7
  },
8
- "keywords": ["github", "cli", "code-exploration", "agent", "graphql", "code-map"],
8
+ "keywords": [
9
+ "github",
10
+ "cli",
11
+ "code-exploration",
12
+ "agent",
13
+ "graphql",
14
+ "code-map"
15
+ ],
9
16
  "license": "MIT",
10
17
  "repository": {
11
18
  "type": "git",
12
19
  "url": "https://github.com/gkoreli/ghx"
13
20
  },
14
- "files": ["ghx", "SKILL.md", "README.md", "LICENSE"],
15
- "os": ["darwin", "linux", "win32"],
21
+ "files": [
22
+ "ghx",
23
+ "postinstall.js",
24
+ "v2/SKILL.md",
25
+ "v2/MCP-SKILL.md",
26
+ "README.md",
27
+ "LICENSE"
28
+ ],
29
+ "os": [
30
+ "darwin",
31
+ "linux",
32
+ "win32"
33
+ ],
16
34
  "engines": {
17
35
  "node": ">=16"
36
+ },
37
+ "scripts": {
38
+ "postinstall": "node postinstall.js"
18
39
  }
19
40
  }
package/postinstall.js ADDED
@@ -0,0 +1,59 @@
1
+ #!/usr/bin/env node
2
+ // Downloads the ghx Go binary from GitHub releases on npm install
3
+ const { execSync } = require("child_process");
4
+ const fs = require("fs");
5
+ const path = require("path");
6
+ const https = require("https");
7
+
8
+ const REPO = "gkoreli/ghx";
9
+ const BIN = path.join(__dirname, "ghx");
10
+
11
+ const PLATFORM_MAP = { darwin: "darwin", linux: "linux", win32: "windows" };
12
+ const ARCH_MAP = { x64: "amd64", arm64: "arm64" };
13
+
14
+ const os = PLATFORM_MAP[process.platform];
15
+ const arch = ARCH_MAP[process.arch];
16
+ if (!os || !arch) {
17
+ console.error(`Unsupported platform: ${process.platform}/${process.arch}`);
18
+ process.exit(1);
19
+ }
20
+
21
+ const ext = os === "windows" ? "zip" : "tar.gz";
22
+ const url = `https://github.com/${REPO}/releases/latest/download/ghx_${os}_${arch}.${ext}`;
23
+
24
+ function download(url, dest) {
25
+ return new Promise((resolve, reject) => {
26
+ https.get(url, (res) => {
27
+ if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
28
+ return download(res.headers.location, dest).then(resolve, reject);
29
+ }
30
+ if (res.statusCode !== 200) return reject(new Error(`HTTP ${res.statusCode} for ${url}`));
31
+ const file = fs.createWriteStream(dest);
32
+ res.pipe(file);
33
+ file.on("finish", () => file.close(resolve));
34
+ }).on("error", reject);
35
+ });
36
+ }
37
+
38
+ async function main() {
39
+ const tmp = path.join(__dirname, `ghx-download.${ext}`);
40
+ try {
41
+ console.log(`Downloading ghx (${os}/${arch})...`);
42
+ await download(url, tmp);
43
+ if (ext === "tar.gz") {
44
+ execSync(`tar xzf "${tmp}" -C "${__dirname}" ghx`, { stdio: "pipe" });
45
+ } else {
46
+ execSync(`unzip -o "${tmp}" ghx.exe -d "${__dirname}"`, { stdio: "pipe" });
47
+ }
48
+ fs.chmodSync(BIN, 0o755);
49
+ console.log("ghx installed successfully");
50
+ } finally {
51
+ try { fs.unlinkSync(tmp); } catch {}
52
+ }
53
+ }
54
+
55
+ main().catch((e) => {
56
+ console.error(`Failed to install ghx: ${e.message}`);
57
+ console.error("Install manually: https://github.com/gkoreli/ghx#install");
58
+ process.exit(1);
59
+ });
package/v2/MCP-SKILL.md ADDED
@@ -0,0 +1,175 @@
1
+ ---
2
+ name: ghx-mcp
3
+ description: GitHub code exploration via MCP. 7 tools — 5 direct + code meta-tool + search_tools. The code tool lets you write JS programs that compose operations in one round-trip.
4
+ ---
5
+
6
+ # ghx MCP — GitHub Code Exploration via MCP
7
+
8
+ 7 tools for GitHub exploration. 5 direct tools for simple queries. 1 `code` meta-tool for complex multi-step operations. 1 `search_tools` for discovery.
9
+
10
+ ## Tools
11
+
12
+ ### Direct Tools (simple one-shot queries)
13
+
14
+ | Tool | Input | What it does |
15
+ |------|-------|-------------|
16
+ | `explore` | `repo` (required), `path` | Branch, file tree, README in 1 API call |
17
+ | `read` | `repo` + `paths` (required), `grep`, `lines`, `map` | Read 1-10 files in 1 API call. `map` = signatures only (~92% reduction) |
18
+ | `search` | `query` (required), `limit`, `full` | Code search with AND matching + matching lines |
19
+ | `repos` | `query` (required), `limit` | Search repos with README preview |
20
+ | `tree` | `repo` (required), `path` | Full recursive file tree |
21
+
22
+ ### Meta-Tools (compose operations)
23
+
24
+ | Tool | Input | What it does |
25
+ |------|-------|-------------|
26
+ | `code` | `code` (required) | Execute JS that calls any combination of the 5 tools above |
27
+ | `search_tools` | `query` (optional) | List available tools with TypeScript type stubs |
28
+
29
+ ## When to Use `code` vs Direct Tools
30
+
31
+ **Direct tools** — simple, single-operation queries:
32
+ ```
33
+ explore({ repo: "vercel/next.js" }) → one call, one result
34
+ read({ repo: "vercel/next.js", paths: "README.md" }) → one call, one result
35
+ ```
36
+
37
+ **`code` tool** — multi-step, filtering, conditional logic:
38
+ ```javascript
39
+ // Explore → filter → read in ONE round-trip (not three)
40
+ var repo = codemode.explore({ repo: "vercel/next.js" });
41
+ var tsFiles = repo.files.filter(f => f.name.endsWith(".ts")).slice(0, 5);
42
+ var contents = codemode.read({ repo: "vercel/next.js", files: tsFiles.map(f => f.name), map: true });
43
+ return contents;
44
+ ```
45
+
46
+ **Rule of thumb:** If you need the result of tool A to decide what to call for tool B → use `code`. Otherwise use direct tools.
47
+
48
+ ## The `code` Tool
49
+
50
+ ### Available API
51
+
52
+ ```typescript
53
+ type ExploreInput = { repo: string; path?: string }
54
+ type ReadInput = { repo: string; files: string[]; grep?: string; map?: boolean }
55
+ type ReposInput = { query: string; limit?: number }
56
+ type SearchInput = { query: string; limit?: number; fullMode?: boolean }
57
+ type TreeInput = { repo: string; path?: string }
58
+
59
+ declare const codemode: {
60
+ explore: (input: ExploreInput) => any;
61
+ read: (input: ReadInput) => any;
62
+ repos: (input: ReposInput) => any;
63
+ search: (input: SearchInput) => any;
64
+ tree: (input: TreeInput) => any;
65
+ }
66
+ ```
67
+
68
+ Use `search_tools` to get the latest type stubs at runtime.
69
+
70
+ ### Writing Code
71
+
72
+ ```javascript
73
+ // ✅ Correct: plain JavaScript, synchronous calls, return a value
74
+ var repo = codemode.explore({ repo: "dop251/goja" });
75
+ return { branch: repo.branch, fileCount: repo.files.length };
76
+
77
+ // ❌ Wrong: await (tools are synchronous, await causes transpile error)
78
+ const r = await codemode.explore(...) // top-level await not supported
79
+
80
+ // ❌ Wrong: TypeScript syntax
81
+ const r: ExploreResult = codemode.explore(...) // no type annotations
82
+
83
+ // ❌ Wrong: no return value
84
+ codemode.explore({ repo: "dop251/goja" }); // result is lost
85
+ ```
86
+
87
+ ### Rules
88
+
89
+ - Write plain JavaScript, not TypeScript (no type annotations, interfaces, generics)
90
+ - `codemode.*` calls are synchronous — do NOT use `await`
91
+ - Must `return` a value — the return value is what you see in the response
92
+ - `console.log()` output appears in the response under "Console:" (useful for debugging)
93
+ - Max 20 tool calls per execution
94
+ - Max 64KB code size
95
+ - Response truncated at 24K chars (use `map: true` or `grep` to reduce output)
96
+
97
+ ### Patterns
98
+
99
+ **Explore and filter:**
100
+ ```javascript
101
+ var repo = codemode.explore({ repo: "owner/repo" });
102
+ var goFiles = repo.files.filter(f => f.name.endsWith(".go"));
103
+ return goFiles.map(f => f.name);
104
+ ```
105
+
106
+ **Map multiple files (understand structure before reading):**
107
+ ```javascript
108
+ var repo = codemode.explore({ repo: "owner/repo" });
109
+ var srcFiles = repo.files.filter(f => f.name.startsWith("src/") && f.type === "blob").slice(0, 8);
110
+ return codemode.read({ repo: "owner/repo", files: srcFiles.map(f => f.name), map: true });
111
+ ```
112
+
113
+ **Search then read matches:**
114
+ ```javascript
115
+ var hits = codemode.search({ query: "GraphQL repo:owner/repo", limit: 5 });
116
+ var files = hits.matches.map(m => m.path);
117
+ return codemode.read({ repo: "owner/repo", files: files });
118
+ ```
119
+
120
+ **Conditional logic:**
121
+ ```javascript
122
+ var repo = codemode.explore({ repo: "owner/repo" });
123
+ var hasGo = repo.files.some(f => f.name === "go.mod");
124
+ var hasPython = repo.files.some(f => f.name === "requirements.txt");
125
+ if (hasGo) {
126
+ return { lang: "go", mod: codemode.read({ repo: "owner/repo", files: ["go.mod"] }) };
127
+ } else if (hasPython) {
128
+ return { lang: "python", reqs: codemode.read({ repo: "owner/repo", files: ["requirements.txt"] }) };
129
+ }
130
+ return { lang: "unknown", files: repo.files.slice(0, 10) };
131
+ ```
132
+
133
+ ## Chain of Thought
134
+
135
+ **Always start surgical, escalate only when needed.**
136
+
137
+ ```
138
+ 1. explore({ repo: "owner/repo" }) → What's in this repo?
139
+ 2. read({ repo: "...", paths: "f1,f2", map: true }) → What do these files define? (92% fewer tokens)
140
+ 3. read({ repo: "...", paths: "f1", grep: "pattern" }) → Where exactly is X?
141
+ 4. read({ repo: "...", paths: "f1" }) → Full file (only when needed)
142
+ ```
143
+
144
+ **When to escalate to `code`:**
145
+ - Step 1 result determines what to do in step 2 → use `code` (one round-trip)
146
+ - Need to filter/transform results before returning → use `code`
147
+ - Simple single query → use direct tool
148
+
149
+ ## Search Query Syntax
150
+
151
+ Same as GitHub REST code search API. Multi-word = AND matching.
152
+
153
+ **Valid qualifiers:** `repo:`, `org:`, `path:`, `filename:`, `extension:`, `language:`, `in:file`, `in:path`
154
+
155
+ **DO NOT USE (web-only, silently wrong):** `OR`, `NOT`, `symbol:`, `content:`, `is:`, regex
156
+
157
+ **Rate limit:** 9 req/min for code search. Refine queries, don't paginate.
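
One way a caller might stay inside the 9 requests per minute budget is a sliding-window check before each search. The sketch below is an assumption about client-side usage, not something ghx is documented to do; the function name `msUntilAllowed` and its parameters are hypothetical.

```javascript
// Hypothetical sliding-window limiter for a 9 requests/minute budget.
// `sent` holds the timestamps (in ms) of earlier requests, oldest first.
function msUntilAllowed(sent, now, limit = 9, windowMs = 60000) {
  // Only requests still inside the window count against the budget.
  const recent = sent.filter((t) => now - t < windowMs);
  if (recent.length < limit) return 0;
  // A slot frees up when the oldest of the last `limit` requests leaves the window.
  const oldest = recent[recent.length - limit];
  return oldest + windowMs - now;
}
```

A return value of `0` means the next search can go out immediately; any other value is the number of milliseconds to wait first.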
158
+
159
+ ## Gotchas
160
+
161
+ 1. **`read` paths are comma-separated.** `paths: "f1.go,f2.go"` not `paths: ["f1.go", "f2.go"]`.
162
+ 2. **Response truncation at 24K chars.** Use `map: true` or `grep` to keep results small. The `code` tool is especially prone to this when reading multiple full files.
163
+ 3. **`search` uses AND matching.** `"foo bar"` finds files with both words anywhere. For exact phrase, wrap in escaped quotes: `"\"foo bar\""`.
164
+ 4. **Web-only qualifiers silently degrade.** `symbol:`, `OR`, `NOT` are treated as literal text in the REST API.
165
+ 5. **`code` tool: no TypeScript.** Type stubs are for your reference. Write plain JS.
166
+ 6. **`code` tool: return is required.** Bare expressions don't auto-return (except simple identifiers). Always use `return`.
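
Gotchas 1 and 3 above can be captured in two small helpers. These are illustrative only: `toPathsArg` and `exactPhrase` are hypothetical names, not part of ghx, and the sketch assumes file paths contain no commas.

```javascript
// Hypothetical helpers for building direct-tool inputs.

// Direct `read` takes one comma-separated string, not an array.
function toPathsArg(files) {
  return files.join(",");
}

// Multi-word queries are AND-matched; wrapping the phrase in double quotes
// requests an exact-phrase match instead.
function exactPhrase(phrase, qualifiers = "") {
  const quoted = `"${phrase}"`;
  return qualifiers ? `${quoted} ${qualifiers}` : quoted;
}

console.log(toPathsArg(["f1.go", "f2.go"]));              // prints f1.go,f2.go
console.log(exactPhrase("foo bar", "repo:owner/repo"));   // prints "foo bar" repo:owner/repo
```

Inside the `code` tool the array form (`files: [...]`) is used instead, so `toPathsArg` only applies to the direct `read` tool.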
167
+
168
+ ## Anti-Patterns
169
+
170
+ - ❌ Three sequential tool calls when `code` can do it in one
171
+ - ❌ Reading full files when you need 10 lines — use `grep` or `map: true`
172
+ - ❌ Paginating broad searches — refine with `repo:`, `language:`, `path:`
173
+ - ❌ Writing TypeScript in the `code` tool — stripped by transpiler but may cause subtle issues
174
+ - ❌ Forgetting `return` in `code` — you get `undefined` back
175
+ - ❌ Reading 10 full files in `code` — hits 24K truncation. Use `map: true` first
package/v2/SKILL.md ADDED
@@ -0,0 +1,129 @@
1
+ ---
2
+ name: ghx
3
+ description: GitHub code exploration for AI agents. CLI + codemode. One command does what takes 3-5 API calls. Write JS programs that compose operations in one round-trip.
4
+ ---
5
+
6
+ # ghx — GitHub Code Exploration for AI Agents
7
+
8
+ Use `ghx` via `execute_bash` for anything on GitHub — repos, files, code search, codemode. Authenticated via `gh` CLI, structured output, zero context overhead.
9
+
10
+ ## Commands
11
+
12
+ ```bash
13
+ ghx explore <owner/repo> # Branch + tree + README in 1 API call
14
+ ghx explore <owner/repo> <path> # Subdirectory listing
15
+ ghx read <owner/repo> <f1> [f2] [f3] # Read 1-10 files in 1 API call (GraphQL batching)
16
+ ghx read <owner/repo> --map <f1> [f2] # Structural map: signatures, imports, types (~92% token reduction)
17
+ ghx read <owner/repo> --grep "pat" <f> # Read file, show only matching lines (2 lines context)
18
+ ghx read <owner/repo> --lines 42-80 <f> # Read specific line range
19
+ ghx repos "<query>" # Search repos with README preview in 1 GraphQL call
20
+ ghx search "<query>" # Code search (AND matching, shows matching lines)
21
+ ghx search --full "<query>" # Code search without line truncation
22
+ ghx tree <owner/repo> [path] # Full recursive tree listing
23
+ ghx code "<js>" # Execute JS with access to all ghx tools
24
+ ghx code - # Read code from stdin
25
+ ghx code --list # List available tools with type stubs
26
+ ```
27
+
28
+ **Exit codes:** 0 = success, 1 = no results, 2 = usage error.
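
An agent wrapper might branch on these exit codes roughly as follows. This is a sketch under assumptions: the function name `interpretExit` and the shape of the returned object are hypothetical, and the suggested actions simply restate the guidance in this skill file.

```javascript
// Hypothetical mapping from a ghx exit code to the next step in an agent loop.
function interpretExit(code) {
  switch (code) {
    case 0:
      return { ok: true, action: "use results" };
    case 1:
      return { ok: false, action: "broaden query" }; // query was valid, nothing matched
    case 2:
      return { ok: false, action: "fix command" };   // bad flags or arguments
    default:
      return { ok: false, action: "unexpected exit code" };
  }
}

console.log(interpretExit(1).action); // prints broaden query
```

With Node's `child_process.spawnSync`, the code to pass in is the `status` field of the returned object.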
29
+
30
+ ## Chain of Thought: Progressive Disclosure
31
+
32
+ **Start surgical, escalate only when needed.**
33
+
34
+ ```
35
+ 1. ghx explore owner/repo → What's in this repo? (structure + README)
36
+ 2. ghx read owner/repo --map *.ts → What do these files define? (signatures only, 92% fewer tokens)
37
+ 3. ghx read owner/repo --grep "X" f → Where exactly is X in this file? (targeted lines)
38
+ 4. ghx read owner/repo f → Show me the full file (only when needed)
39
+ ```
40
+
41
+ At 92% reduction, a mapped file costs about 8% of its full size, so `--map` lets you scan roughly 12 files in the space of reading 1 full file.
42
+
43
+ ## When to Use `ghx code`
44
+
45
+ Use direct commands for simple one-shot queries. Use `ghx code` when you need to:
46
+ - **Chain operations** — explore → filter → read in one shot
47
+ - **Filter results** — JS logic runs locally, not in the LLM
48
+ - **Conditional logic** — read different files based on what you find
49
+
50
+ ```bash
51
+ # Simple: use direct command
52
+ ghx explore vercel/next.js
53
+
54
+ # Complex: use ghx code (one round-trip instead of three)
55
+ ghx code "
56
+ var repo = codemode.explore({ repo: 'vercel/next.js' });
57
+ var goFiles = repo.files.filter(f => f.name.endsWith('.go'));
58
+ var contents = codemode.read({ repo: 'vercel/next.js', files: goFiles.map(f => f.name) });
59
+ return contents;
60
+ "
61
+ ```
62
+
63
+ ### Codemode API
64
+
65
+ ```bash
66
+ ghx code --list # See all available tools with type stubs
67
+ ```
68
+
69
+ ```typescript
70
+ declare const codemode: {
71
+ explore: (input: { repo: string; path?: string }) => any;
72
+ read: (input: { repo: string; files: string[]; grep?: string; map?: boolean }) => any;
73
+ repos: (input: { query: string; limit?: number }) => any;
74
+ search: (input: { query: string; limit?: number; fullMode?: boolean }) => any;
75
+ tree: (input: { repo: string; path?: string }) => any;
76
+ }
77
+ ```
78
+
79
+ ### Codemode Rules
80
+
81
+ - Write plain JavaScript, not TypeScript (no type annotations)
82
+ - `codemode.*` calls are synchronous — no `await` needed
83
+ - Must `return` a value — bare expressions don't auto-return (except simple identifiers)
85
+ - Console output goes to stderr, return value goes to stdout
86
+ - Max 20 tool calls per execution, 64KB code size limit
87
+
88
+ ## Search Query Syntax
89
+
90
+ `ghx search` uses GitHub REST code search API. Multi-word = AND matching (both words anywhere in file).
91
+
92
+ ```bash
93
+ ghx search "addClass repo:jquery/jquery" # Scoped to repo
94
+ ghx search "useState language:typescript" # Language filter
95
+ ghx search "filename:package.json repo:owner/repo" # Find specific filename
96
+ ghx search '"exact phrase" repo:plausible/analytics' # Exact phrase (shell quotes)
97
+ ghx search "path:llms.txt" # Find files by name
98
+ ```
99
+
100
+ **Valid qualifiers:** `repo:`, `org:`, `path:`, `filename:`, `extension:`, `language:`, `in:file`, `in:path`, `size:`, `fork:true`
101
+
102
+ **DO NOT USE (web-only, silently wrong):** `OR`, `NOT`, `symbol:`, `content:`, `is:`, regex
103
+
104
+ **Rate limit:** 9 req/min for code search. Refine queries, don't paginate.
105
+
106
+ ## Gotchas
107
+
108
+ 1. **Web-only qualifiers silently degrade.** `symbol:`, `OR`, `NOT` are treated as literal text. ghx warns on stderr.
109
+ 2. **`gh search code` wraps in quotes.** `gh search code "foo bar"` = exact phrase. `ghx search "foo bar"` = AND. Use ghx.
110
+ 3. **Flag ordering in `read`.** `ghx read owner/repo file --map` works. `ghx read --map owner/repo file` does NOT.
111
+ 4. **Not all repos use `main`.** ghx handles this automatically.
112
+ 5. **Unknown flags are rejected.** Exit 2 with clear error. Intentional — prevents silent query corruption.
113
+
114
+ ## Anti-Patterns
115
+
116
+ - ❌ `web_fetch` on github.com — HTML noise, zero useful info
117
+ - ❌ Reading entire large files when you need 10 lines — use `--grep` or `--lines`
118
+ - ❌ Multiple `gh api` calls for explore — use `ghx explore` (1 call)
119
+ - ❌ Using web-only qualifiers in search — silently wrong results
120
+ - ❌ Paginating broad searches — refine with `repo:`, `language:`, `path:` instead
121
+ - ❌ `gh search code` for multi-word queries — silently wraps in quotes, returns nothing
122
+
123
+ ## Best Practices
124
+
125
+ - **Batch reads.** `ghx read repo f1 f2 f3` = 1 API call. Three separate reads = 3 calls.
126
+ - **Map before reading.** `--map` first, then `--grep` or `--lines` for specifics.
127
+ - **Refine search, don't paginate.** Add qualifiers instead of fetching page 2.
128
+ - **Use `ghx code` for multi-step workflows.** One round-trip beats three sequential commands.
129
+ - **Check exit codes.** 0 = results, 1 = no results (broaden query), 2 = usage error (fix command).
package/SKILL.md DELETED
@@ -1,314 +0,0 @@
1
- ---
2
- name: ghx
3
- description: GitHub code exploration for AI agents. Use for repo exploration, reading remote files, code search, code maps. Wraps gh CLI with GraphQL batching — one command does what takes 3-5 API calls.
4
- ---
5
-
6
- # ghx — GitHub Code Exploration for AI Agents
7
-
8
- Use `ghx` via `execute_bash` for anything on GitHub — repos, files, code search. Authenticated via `gh` CLI, structured output, zero context overhead.
9
-
10
- ## Why This Exists
11
-
12
- Agents exploring GitHub face a reliability gap: *"Did I find nothing because nothing exists, or because I used the tool wrong?"* Raw `gh` commands have silent failure modes — `gh search code` wraps in quotes without telling you, `gh api contents/` returns base64, README requires a separate call. The agent can't distinguish "no results" from "wrong flags."
13
-
14
- ghx eliminates this by encoding the right defaults into every command. One call returns enough context to decide the next action. You opt into the ghx skill and stop worrying about whether you searched correctly — the right behavior is the default behavior.
15
-
16
- ## Commands
17
-
18
- ```bash
19
- ghx explore <owner/repo> # Branch + tree + README in 1 API call
20
- ghx explore <owner/repo> <path> # Subdirectory listing
21
- ghx read <owner/repo> <f1> [f2] [f3] # Read 1-10 files in 1 API call (GraphQL batching)
22
- ghx read <owner/repo> --map <f1> [f2] # Structural map: signatures, imports, types (~92% token reduction)
23
- ghx read <owner/repo> --grep "pat" <f> # Read file, show only matching lines (2 lines context)
24
- ghx read <owner/repo> --lines 42-80 <f> # Read specific line range
25
- ghx repos "<query>" # Search repos with README preview (default: 10 results)
26
- ghx repos "<query>" --limit 5 # Limit repo results (max: 20)
27
- ghx search "<query>" # Code search (AND matching, default: 30 results)
28
- ghx search "<query>" --limit 10 # Limit code search results (max: 100)
29
- ghx search --full "<query>" # Code search without line truncation (for minified files)
30
- ghx tree <owner/repo> [path] # Full recursive tree listing
31
- ```
32
-
33
- **Exit codes:** 0 = results returned, 1 = no results (query valid), 2 = usage error (bad flags/args).
34
- **Flag safety:** Unknown flags always error (exit 2). Never silently absorbed into queries.
35
-
36
- ## Chain of Thought: Progressive Disclosure
37
-
38
- **Always start surgical, escalate only when needed.** This mirrors how developers work: scan structure → identify interesting files → read specific sections.
39
-
40
- ```
41
- 1. ghx explore owner/repo → What's in this repo? (structure + README)
42
- 2. ghx read owner/repo --map *.ts → What do these files define? (signatures only, 92% fewer tokens)
43
- 3. ghx read owner/repo --grep "X" f → Where exactly is X in this file? (targeted lines)
44
- 4. ghx read owner/repo f → Show me the full file (only when needed)
45
- ```
46
-
47
- **Why this order matters:** At 92% reduction, `--map` lets you scan 7 files in the space of reading 1 full file. The agent can understand an entire module's structure before committing context to any single file. Aider's docs confirm: *"The LLM can see classes, methods and function signatures from everywhere in the repo. This alone may give it enough context to solve many tasks."*
48
-
49
- **When to escalate beyond ghx:**
50
- - "Understand this entire module" → `gitingest https://github.com/owner/repo/tree/branch/path -i "*.ts" -o - 2>/dev/null`
51
- - "Compressed view of a codebase" → `npx repomix --remote owner/repo --compress --include "src/**" --stdout`
52
-
53
- ## Search Query Syntax
54
-
55
- `ghx search` uses the GitHub REST code search API (legacy). Multi-word queries use AND matching: every word must appear in the file, but the words need not be adjacent. This differs from `gh search code`, which silently wraps the query in quotes (exact phrase).
56
-
57
- **Output format:**
58
- ```
59
- 201472 results (showing 30) ← stderr (total + page count)
60
- jquery/jquery src/attributes/classes.js: addClass: function( value ) { ← stdout (repo path: matching line)
61
- ```
62
-
63
- Agents get: result count (stderr) + one line per result with matching context (stdout).
64
-
65
- ```bash
66
- ghx search "addClass repo:jquery/jquery" # Scoped to repo
67
- ghx search "useState language:typescript" # Language filter
68
- ghx search "filename:package.json repo:owner/repo" # Find specific filename
69
- ghx search "form path:cgi-bin extension:py" # Path + extension filter
70
- ghx search '"progress_bar" repo:plausible/analytics' # Exact phrase (shell quotes around double quotes)
71
- ghx search "path:llms.txt" # Find files by name
72
- ```
73
-
74
- **Valid REST API qualifiers:** `repo:`, `org:`, `user:`, `path:`, `filename:`, `extension:`, `language:`, `in:file`, `in:path`, `size:`, `fork:true`
75
-
76
- **Web-only (DO NOT USE — silently treated as literal text):** `OR`, `NOT`, `symbol:`, `content:`, `is:`, regex (`/pattern/`), `enterprise:`, glob in `path:`. ghx warns on stderr if you use these.
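
An agent can screen a query for web-only qualifiers before spending a rate-limited request. The sketch below is a plain substring check over the textual qualifiers listed above. The function name `has_web_only_qualifier` is hypothetical, and regex and glob detection are deliberately left out.

```bash
# Hypothetical pre-flight check: succeeds (status 0) if the query contains a
# qualifier the README lists as web-only. Plain substring match, so a word that
# happens to end in "is:" would also trigger it.
has_web_only_qualifier() {
  local query="$1" token
  for token in 'symbol:' 'content:' 'is:' ' OR ' ' NOT ' 'enterprise:'; do
    if [[ "$query" == *"$token"* ]]; then
      return 0
    fi
  done
  return 1
}

# Example: warn before sending the query.
if has_web_only_qualifier "symbol:parseFile repo:yamadashy/repomix"; then
  echo "query uses a web-only qualifier; rewrite it first" >&2
fi
```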
77
-
78
- **Rate limit:** GitHub allows 10 req/min for code search (the strictest endpoint); budget for 9 req/min to leave headroom. Authentication is required, so run `gh auth login` first.
79
-
80
- **Special characters:** Dots act as word separators, not wildcards. `console.log` matches files with both `console` and `log` — it does NOT match `consolelog`.
81
-
82
- ## Search Strategy for Agents
83
-
84
- **Search is the entry point.** Agents search first, then read. Bad search = wasted follow-up reads = token explosion. ghx search is designed to give you enough context to decide your next action in one call.
85
-
86
- ### Reading search output
87
-
88
- ```
89
- 90 results (showing 30) ← stderr: is this too broad?
90
- ⚠ Lines truncated to 200 chars (use --full for complete fragments) ← stderr: token protection kicked in
91
- ⚠ Query too broad — add repo:, language:, or path: to narrow ← stderr: >1000 results
92
- jquery/jquery src/attributes/classes.js: addClass: function( value ) { ← stdout: repo path: matching line
93
- ```
94
-
95
- **Decision tree after seeing results:**
96
- - `0 results` → query too specific, broaden (remove qualifiers, try synonyms)
97
- - `1-30 results` → good. Scan matching lines, `ghx read` the relevant files
98
- - `30-1000 results` → workable but noisy. Add `repo:`, `language:`, or `path:` to narrow
99
- - `>1000 results` → too broad. MUST add qualifiers before trusting results
100
- - `⚠ incomplete` → query timed out, results are partial. Narrow the scope
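
The decision tree above maps onto a simple threshold function. This is a sketch under assumptions: the function name `search_next_action` and its labels are hypothetical, and because the README lists 30 in both the "good" and "noisy" ranges, the sketch treats exactly 30 as "good".

```bash
# Hypothetical helper: choose the next action from the total result count
# that ghx prints on stderr. Thresholds follow the README's decision tree.
search_next_action() {
  local total="$1"
  if (( total == 0 )); then
    echo "broaden"       # query too specific
  elif (( total <= 30 )); then
    echo "read"          # scan matching lines, then ghx read the relevant files
  elif (( total <= 1000 )); then
    echo "narrow"        # workable but noisy
  else
    echo "must-narrow"   # too broad to trust
  fi
}
```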
101
-
102
- ### Token protection (safe by default)
103
-
104
- ghx truncates each matching line to 200 chars. This prevents minified JS files (10,000+ char lines) from exploding your context window. One untruncated minified result can consume more tokens than the other 29 results combined.
105
-
106
- - **Default**: 200 char truncation. You see `⚠ Lines truncated` on stderr only when it triggers.
107
- - **`--full`**: Disables truncation. Use when you specifically need the complete matching line.
108
- - **When to use `--full`**: Almost never. The truncated line is enough to decide "relevant" or "skip." Use `ghx read` to get the full file context after you've identified the right file.
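
The deleted script implements this truncation as a 200-character window that starts up to 80 characters before the match. The same arithmetic can be sketched in plain bash. The function name `match_window` is hypothetical, and the ellipsis markers the script adds are omitted here.

```bash
# Hypothetical sketch of the truncation window: given flattened text and the
# character offset of the match, print at most 200 characters starting up to
# 80 characters before the match.
match_window() {
  local text="$1" start="$2" from
  from=$(( start > 80 ? start - 80 : 0 ))
  printf '%s\n' "${text:from:200}"
}

# A short line passes through unchanged:
match_window "addClass: function( value ) {" 0
```

Starting before the match rather than at column 0 keeps the matched term visible even when it sits deep inside a minified line.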
109
-
110
- ### Search refinement chain of thought
111
-
112
- ```
113
- 1. ghx search "useState" → 201K results. Too broad.
114
- 2. ghx search "useState language:typescript" → 50K results. Still broad.
115
- 3. ghx search "useState repo:vercel/next.js" → 89 results. Workable.
116
- 4. ghx search "useState path:packages/next extension:tsx repo:vercel/next.js" → 12 results. Surgical.
117
- ```
118
-
119
- **Refine, don't paginate.** At 9 req/min, pagination burns rate limit on the same broad query. Adding one qualifier is always better than fetching page 2.
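
To see what the budget means in practice, the minimum spacing between code-search calls can be computed with ceiling division. This is a sketch only: the function name `min_interval` is hypothetical, and ghx itself is not shown to enforce any pacing.

```bash
# Hypothetical helper: whole seconds to wait between calls so that no more
# than the given number of requests are sent per minute.
min_interval() {
  local per_minute="$1"
  echo $(( (60 + per_minute - 1) / per_minute ))  # ceiling of 60 / per_minute
}

# With a budget of 9 req/min, each call should be at least 7 seconds apart:
#   for q in "$@"; do ghx search "$q"; sleep "$(min_interval 9)"; done
```

At that spacing, fetching a second page of a broad query costs as much waiting as one well-refined query.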
120
-
121
- ### Two search systems (why some things don't work)
122
-
123
- GitHub has two code search engines. The REST API (what ghx uses) is the legacy one. The web UI uses Blackbird (new). No programmatic tool — ghx, `gh` CLI, GitHub MCP, Octocode — can access Blackbird. This is a platform limitation, not a ghx limitation.
124
-
125
- **What this means for agents:**
126
- - `OR`, `NOT`, `symbol:`, regex, `content:`, `is:` → web-only, don't use
127
- - `repo:`, `path:`, `filename:`, `language:`, `extension:`, `in:`, `size:`, `fork:` → work in REST API
128
- - ghx warns on stderr if you use web-only qualifiers, but the results will be wrong
129
-
130
- ### ghx search vs `gh search code`
131
-
132
- | Behavior | `ghx search` | `gh search code` |
133
- |---|---|---|
134
- | Multi-word matching | AND (both words anywhere) | Exact phrase (words must be adjacent) |
135
- | Matching context | Shows matching line per result | No matching context |
136
- | Result count | stderr: "90 results (showing 30)" | Not shown |
137
- | Token protection | 200 char truncation, `--full` opt-out | None |
138
- | Web-only warnings | Warns on stderr | Silent |
139
- | Rate limit | Same (9 req/min) | Same |
140
-
141
- AND matching is almost always what agents want. `gh search code "useState fetchData"` returns zero results if the words aren't adjacent — with no error. `ghx search "useState fetchData"` finds files containing both terms.
142
-
143
- ## Gotchas
144
-
145
- 1. **Web-only qualifiers silently degrade.** `symbol:`, `OR`, `NOT`, `content:`, `is:`, regex — these only work in GitHub's new web code search (Blackbird). The REST API treats them as literal text. `symbol:foo` searches for the TEXT "symbol:foo" inside files. ghx warns on stderr, but the results will be wrong. No programmatic tool can use these features — it's a GitHub platform limitation.
146
-
147
- 2. **`filename:` vs `path:` — both valid, different systems.** `filename:package.json` works in the REST API (legacy) for exact filename match. `path:` also works and is more flexible (matches directories too). In the NEW web code search, only `path:` works — `filename:` is not recognized. Since ghx uses the REST API, both work.
148
-
149
- 3. **`language:markdown` won't find `.txt` files.** GitHub's linguist detection doesn't classify .txt as markdown. Use `extension:txt` instead. `language:` = linguist detection, `extension:` = literal file extension.
150
-
151
- 4. **`gh search code` silently wraps queries in quotes.** `gh search code "foo bar"` sends `q="foo bar"` (exact phrase), not `q=foo bar` (AND). If the words aren't adjacent in the file, you get zero results with no error. `ghx search` sends AND queries — both words must appear but in any order. This is almost always what you want. ghx also shows result count on stderr and matching line context — `gh search code` shows neither.
152
-
153
- 5. **GraphQL returns null for missing paths.** `object(expression: "branch:path")` returns null silently if the path doesn't exist. No error. `ghx` handles this, but if using `gh api graphql` directly, check for null.
154
-
155
- 6. **Flag ordering in `read` command.** `ghx read owner/repo file --map` works. `ghx read --map owner/repo file` does NOT — repo must be the first positional arg.
156
-
157
- 7. **Not all repos use `main`.** cli/cli uses `trunk`, others use `master`. `ghx` handles this automatically. For raw `gh api` calls, query the default branch first: `gh repo view owner/repo --json defaultBranchRef --jq '.defaultBranchRef.name'`
158
-
159
- 8. **`gh` field names are inconsistent.** `stargazersCount` (search) vs `stargazerCount` (repo view). Always check with `--json` (no fields) to see available fields for any command.
160
-
161
- 9. **`gh api repos/.../contents/` returns base64 by default.** Without `-H "Accept: application/vnd.github.raw+json"`, you get a JSON blob with base64-encoded content. `ghx read` returns plain text via GraphQL — no decoding needed.
162
-
163
- 10. **`gh search repos` and `gh search code` use different rate limit pools.** Repo search: 30/min (generous). Code search: 10/min (restrictive). Don't assume one rate limit applies to both.
164
-
165
- 11. **Unknown flags are rejected, not silently absorbed.** `ghx search "query" --json` exits 2 with a clear error. This is intentional — silent flag absorption was the #1 cause of agent failures (flags like `--limit` would get concatenated into the query string, corrupting it). If you get exit 2, check your flags.
166
-
167
- ## Anti-Patterns
168
-
169
- - ❌ `web_fetch`/`web_search` on github.com — returns HTML noise, wastes thousands of tokens for zero useful information
170
- - ❌ `gh api repos/.../contents/<path>` WITHOUT `-H "Accept: application/vnd.github.raw+json"` — returns base64-encoded JSON blob instead of readable text
171
- - ❌ Reading entire large files when you need 10 lines — use `--grep "pattern"` or `--lines N-M`
172
- - ❌ Multiple sequential `gh api` calls for explore workflows — use `ghx explore` (1 GraphQL call) or `ghx read` (batch files)
173
- - ❌ Using web-only qualifiers (`OR`, `NOT`, `symbol:`, regex) in `ghx search` — silently treated as literal text, returns wrong results. ghx warns but can't prevent it
174
- - ❌ Firing multiple code search requests in parallel — 9 req/min rate limit, you'll get 403s
175
- - ❌ Dumping entire repos into context for a specific question — use targeted `ghx` commands. Reserve `gitingest`/`repomix` for "understand this whole module" tasks
176
- - ❌ Relying on `gh search code` for multi-word queries — silently wraps in quotes (exact phrase), returns nothing when words aren't adjacent. Use `ghx search` (AND matching + matching context)
177
- - ❌ Using `ghx search` to find repos — ghx search is for code. Use `ghx repos "query"` for repo discovery
178
- - ❌ Using `gh` for batch file reads — 1 API call per file, base64 encoded. Use `ghx read repo f1 f2 f3` (1 GraphQL call, plain text)
179
- - ❌ Using `gh repo view` to explore a repo — gets metadata but not tree listing or README content in one call. Use `ghx explore` (1 call for all three)
180
-
181
- ## Best Practices
182
-
183
- - **Batch file reads.** `ghx read owner/repo f1 f2 f3` = 1 API call. Three separate reads = 3 calls.
184
- - **Map before reading.** `ghx read --map` first to understand structure, then `--grep` or `--lines` for specifics.
185
- - **Refine search, don't paginate.** If `ghx search` shows "201472 results (showing 30)", add qualifiers (`repo:`, `language:`, `path:`) — don't try to page through. 9 req/min rate limit makes pagination expensive.
186
- - **Use `--limit` to control token budget.** `ghx repos "query" --limit 5` for quick checks, `--limit 15` for thorough discovery. `ghx search "query" --limit 10` when you only need top results.
187
- - **Check exit codes.** 0 = got results, 1 = no results (query was valid, broaden it), 2 = usage error (fix your command).
188
- - **Use `gh api --cache 1h`** for repeated lookups when using raw `gh` commands.
189
- - **Use `--json fields --jq 'expr'`** on `gh` commands to get structured output and reduce noise.
190
- - **Piped output is machine-formatted.** Tab-delimited, no truncation, no color codes — agents always get clean output.
191
-
192
- ## The `--map` Flag: Why It Matters
193
-
194
- `--map` extracts only structural declarations (imports, exports, function/class/type signatures) via per-language regex patterns. Tested on 6 real files across TypeScript, Python, Go:
195
-
196
- | Metric | Result |
197
- |--------|--------|
198
- | Average token reduction | 92% |
199
- | Files scannable per context window | 7x more than full reads |
200
- | Implementation | ~15 lines of bash, zero dependencies |
201
-
202
- Output includes line numbers and token stats:
203
- ```
204
- === src/core/parseFile.ts (5544 bytes) ===
205
- 21:import type { RepomixConfigMerged } from '../../config/configSchema.js';
206
- 35:export const CHUNK_SEPARATOR = '⋮----';
207
- 38:export const parseFile = async (fileContent: string, filePath: string, config: RepomixConfigMerged) =>
208
- 107:const getLanguageParserSingleton = async () =>
209
- # map: 812/5544 chars (~1386 tokens full, ~203 tokens map)
210
- ```
211
-
212
- Supported: TypeScript/JavaScript, Python, Go, Rust, Java/Kotlin, Ruby. Generic fallback for unknown extensions.
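
The mechanism behind `--map` can be reproduced locally with one `grep`. The sketch below reuses the TypeScript/JavaScript pattern from the deleted script. The function name `map_ts` and the sample source are made up for illustration.

```bash
# Hypothetical local equivalent of --map for TypeScript/JavaScript:
# keep only lines that start with a structural keyword, with line numbers.
map_ts() {
  grep -nE '^(import |export |const |let |var |function |class |interface |type |enum )' \
    || echo "(no signatures detected)"
}

# Made-up sample source, for illustration only.
sample='import fs from "node:fs";

export const LIMIT = 10;
function helper() {
  return fs.existsSync(".");
}'

printf '%s\n' "$sample" | map_ts
# 1:import fs from "node:fs";
# 3:export const LIMIT = 10;
# 4:function helper() {
```

Because the pattern is anchored at the start of the line, indented declarations such as class methods are skipped. That keeps the map short but can hide nested structure.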
213
-
214
- ## Examples
215
-
216
- ### Simple: Explore a repo and read a file
217
-
218
- ```bash
219
- # What's in this repo?
220
- ghx explore plausible/analytics
221
-
222
- # Read the main config
223
- ghx read plausible/analytics config/runtime.exs
224
- ```
225
-
226
- ### Advanced: Research a codebase you've never seen
227
-
228
- ```bash
229
- # 1. Explore structure
230
- ghx explore yamadashy/repomix
231
-
232
- # 2. Map the core module — understand what exists (92% fewer tokens)
233
- ghx read yamadashy/repomix --map src/core/output/outputGenerate.ts src/core/file/fileProcess.ts src/core/treeSitter/parseFile.ts
234
-
235
- # 3. Found interesting function in map output — grep for usage details
236
- ghx read yamadashy/repomix --grep "processFiles" src/core/file/fileProcess.ts
237
-
238
- # 4. Search across the whole repo for a pattern
239
- ghx search "CHUNK_SEPARATOR repo:yamadashy/repomix"
240
- # → stderr: "3 results (showing 3)"
241
- # → stdout: yamadashy/repomix src/core/output/outputGenerate.ts: const CHUNK_SEPARATOR = '⋮----';
242
-
243
- # 5. Read specific lines of a file you've narrowed down
244
- ghx read yamadashy/repomix --lines 38-65 src/core/treeSitter/parseFile.ts
245
-
246
- # 6. If you need the full picture of a subdirectory, escalate:
247
- # gitingest https://github.com/yamadashy/repomix/tree/main/src/core -i "*.ts" -o - 2>/dev/null
248
- ```
249
-
250
- ## Complementary Tools
251
-
252
- | Goal | Tool | Why |
253
- |------|------|-----|
254
- | Surgical exploration | `ghx` | Batched API calls, zero overhead, targeted extraction |
255
- | Holistic understanding | `gitingest` / `repomix --compress` | Dump entire module for broad reasoning |
256
- | PRs, issues, CI | `gh pr view`, `gh issue view`, `gh pr checks` | Purpose-built commands |
257
-
258
- ## ghx vs gh: When to Use What
259
-
260
- **ghx is a complement to gh, not a replacement.** Use ghx for code exploration. Use gh for everything else.
261
-
262
- ### Use ghx (code exploration)
263
-
264
- | Task | Command | Why ghx wins |
265
- |------|---------|-------------|
266
- | Code search | `ghx search "query"` | AND matching (gh uses exact phrase), matching context, 37x token reduction on minified files, result count + warnings on stderr |
267
- | Repo search | `ghx repos "query"` | 1 GraphQL call gets name + stars + language + README preview. gh needs 1+N calls for same info, returns worse ranking, no README |
268
- | Repo overview | `ghx explore owner/repo` | 1 GraphQL call gets description + tree + README (gh needs 3 calls) |
269
- | Read multiple files | `ghx read owner/repo f1 f2 f3` | 1 GraphQL call for N files (gh needs N calls, returns base64) |
270
- | Targeted extraction | `ghx read --grep "pat" f` | Built-in grep with context lines — no shell piping |
271
- | Code map | `ghx read --map f1 f2` | ~92% token reduction, no gh equivalent |
272
-
273
- ### Use gh (everything else)
274
-
275
- | Task | Command | Why gh wins |
276
- |------|---------|-------------|
277
- | Issues | `gh issue list/view -R owner/repo` | ghx doesn't touch issues |
278
- | Pull requests | `gh pr list/view/diff/checks -R owner/repo` | ghx doesn't touch PRs |
279
- | Releases | `gh release list -R owner/repo` | ghx doesn't touch releases |
280
- | Repo metadata | `gh repo view owner/repo --json stargazerCount,forkCount` | Detailed stats beyond what ghx repos shows |
281
- | Auth | `gh auth login/status` | ghx depends on gh for auth |
282
- | Create/update | `gh issue create`, `gh pr create` | ghx is read-only |
283
-
284
- ### Rate limits (from GitHub docs)
285
-
286
- | Endpoint | Limit | Used by |
287
- |---|---|---|
288
- | Core REST | 5,000/hour | gh commands, ghx tree |
289
- | GraphQL | 5,000/hour | ghx explore, ghx read |
290
- | Search (repos, issues) | 30/min | `gh search repos/issues` |
291
- | Code search | 10/min (budget 9) | `ghx search`, `gh search code` |
292
-
293
- Code search allows about 600 requests/hour (10/min), roughly 8x fewer than the 5,000/hour core REST limit. This is why "refine don't paginate" matters for search but not for explore/read.
294
-
295
- ## `gh` CLI Quick Reference
296
-
297
- ```bash
298
- # Repos
299
- gh search repos "<query>" -L 10 --json fullName,description,stargazersCount
300
- gh repo view owner/repo --json defaultBranchRef --jq '.defaultBranchRef.name'
301
-
302
- # PRs
303
- gh pr view 123 -R owner/repo # Title, body, status
304
- gh pr diff 123 -R owner/repo # Full diff
305
- gh pr checks 123 -R owner/repo # CI status
306
-
307
- # Issues
308
- gh issue view 456 -R owner/repo
309
- gh issue list -R owner/repo -S "query" -L 20
310
-
311
- # Raw API (always use the raw header for files)
312
- gh api repos/owner/repo/contents/path -H "Accept: application/vnd.github.raw+json"
313
- gh api repos/owner/repo/git/trees/main --jq '.tree[].path' # List structure
314
- ```
package/ghx DELETED
@@ -1,385 +0,0 @@
1
- #!/usr/bin/env bash
2
- # ghx — GitHub code exploration for agents and humans
3
- # Efficient GitHub repo exploration via GraphQL batching, code maps, and targeted extraction.
4
- # One command does what takes 3-5 API calls with any other tool.
5
-
6
- set -euo pipefail
7
-
8
- VERSION="0.2.1"
9
-
10
- cmd="${1:-help}"
11
- shift || true
12
-
13
- case "$cmd" in
14
-
15
- # ─── explore: branch + tree + README in one GraphQL call ───
16
- explore)
17
- repo="$1"; shift || true
18
- path="${1:-}"
19
- owner="${repo%%/*}"
20
- name="${repo##*/}"
21
-
22
- # First get default branch
23
- branch=$(gh repo view "$repo" --json defaultBranchRef --jq '.defaultBranchRef.name')
24
-
25
- if [[ -z "$path" ]]; then
26
- # Root exploration: tree + README
27
- gh api graphql -f query="
28
- {
29
- repository(owner: \"$owner\", name: \"$name\") {
30
- description
31
- tree: object(expression: \"$branch:\") {
32
- ... on Tree { entries { name type } }
33
- }
34
- readme: object(expression: \"$branch:README.md\") {
35
- ... on Blob { text }
36
- }
37
- }
38
- }" --jq '{
39
- description: .data.repository.description,
40
- branch: "'"$branch"'",
41
- files: [.data.repository.tree.entries[] | "\(if .type == "tree" then .name + "/" else .name end)"],
42
- readme: (.data.repository.readme.text // "(no README.md)")
43
- }'
44
- else
45
- # Subdir exploration: tree listing
46
- gh api graphql -f query="
47
- {
48
- repository(owner: \"$owner\", name: \"$name\") {
49
- tree: object(expression: \"$branch:$path\") {
50
- ... on Tree { entries { name type } }
51
- }
52
- }
53
- }" --jq '{
54
- path: "'"$path"'",
55
- entries: [.data.repository.tree.entries[] | "\(if .type == "tree" then .name + "/" else .name end)"]
56
- }'
57
- fi
58
- ;;
59
-
60
- # ─── read: read 1-10 files in one GraphQL call ───
61
- # Supports --grep "pattern" to filter output to matching lines (with 2 lines context)
62
- # Supports --lines N-M to extract specific line ranges
63
- read)
64
- repo="$1"; shift
65
- owner="${repo%%/*}"
66
- name="${repo##*/}"
67
-
68
- # Parse flags (accepted anywhere after the repo argument; any other argument is a file path)
69
- grep_pattern=""
70
- line_range=""
71
- map_mode=""
72
- files=()
73
- while [[ $# -gt 0 ]]; do
74
- case "$1" in
75
- --grep) grep_pattern="$2"; shift 2 ;;
76
- --lines) line_range="$2"; shift 2 ;;
77
- --map) map_mode=1; shift ;;
78
- --*) echo "ghx read: unknown flag '$1'" >&2; exit 2 ;;
79
- *) files+=("$1"); shift ;;
80
- esac
81
- done
82
-
83
- if [[ ${#files[@]} -eq 0 ]]; then
84
- echo "Usage: ghx read <owner/repo> <path1> [path2] [--grep pattern] [--lines N-M] [--map]" >&2
85
- exit 2
86
- fi
87
-
88
- # Get default branch
89
- branch=$(gh repo view "$repo" --json defaultBranchRef --jq '.defaultBranchRef.name')
90
-
91
- # Build GraphQL query with aliases
92
- query="{ repository(owner: \"$owner\", name: \"$name\") {"
93
- for i in "${!files[@]}"; do
94
- query+=" f$i: object(expression: \"$branch:${files[$i]}\") { ... on Blob { text byteSize } }"
95
- done
96
- query+=" } }"
97
-
98
- # Execute and format output
99
- result=$(gh api graphql -f query="$query")
100
-
101
- for i in "${!files[@]}"; do
102
- text=$(echo "$result" | jq -r ".data.repository.f$i.text // empty")
103
- size=$(echo "$result" | jq -r ".data.repository.f$i.byteSize // empty")
104
- if [[ -n "$text" ]]; then
105
- echo "=== ${files[$i]} ($size bytes) ==="
106
- if [[ -n "$grep_pattern" ]]; then
107
- echo "$text" | grep -n -i -C2 "$grep_pattern" || echo "(no matches for '$grep_pattern')"
108
- elif [[ -n "$line_range" ]]; then
109
- start="${line_range%-*}"
110
- end="${line_range#*-}"
111
- echo "$text" | sed -n "${start},${end}p"
112
- elif [[ -n "$map_mode" ]]; then
113
- ext="${files[$i]##*.}"
114
- case "$ext" in
115
- ts|tsx|js|jsx|mjs) pat='^(import |export |const |let |var |function |class |interface |type |enum )' ;;
116
- py) pat='^(import |from |class |def | def | def |@)' ;;
117
- go) pat='^(package |import |func |type |var |const )' ;;
118
- rs) pat='^(use |pub |fn |struct |enum |trait |impl |type |mod |const )' ;;
119
- java|kt) pat='^(import |public |private |protected |class |interface |enum |@)' ;;
120
- rb) pat='^(require |class |module |def | def | def )' ;;
121
- *) pat='^(import |export |function |class |def |func |type |const |pub )' ;;
122
- esac
123
- echo "$text" | grep -nE "$pat" || echo "(no signatures detected)"
124
- chars=$(echo "$text" | wc -c | tr -d ' ')
125
- map_chars=$(echo "$text" | grep -E "$pat" | wc -c | tr -d ' ')
126
- echo "# map: ${map_chars}/${chars} chars (~$(( chars / 4 )) tokens full, ~$(( map_chars / 4 )) tokens map)"
127
- else
128
- echo "$text"
129
- fi
130
- echo ""
131
- else
132
- echo "=== ${files[$i]} (not found) ==="
133
- echo ""
134
- fi
135
- done
136
- ;;
137
-
138
- # ─── search: code search via REST API ───
139
- search)
140
- # Parse flags
141
- full_mode=""
142
- limit=""
143
- args=()
144
- while [[ $# -gt 0 ]]; do
145
- case "$1" in
146
- --full) full_mode=1; shift ;;
147
- --limit) [[ $# -lt 2 ]] && { echo "ghx search: --limit requires a value" >&2; exit 2; }
148
- limit="$2"; shift 2 ;;
149
- --*) echo "ghx search: unknown flag '$1'" >&2; exit 2 ;;
150
- *) args+=("$1"); shift ;;
151
- esac
152
- done
153
- query="${args[*]}"
154
- if [[ -z "$query" ]]; then
155
- echo "Usage: ghx search <query> [--full] [--limit N]" >&2
156
- echo "Examples: 'useState repo:owner/repo' | 'path:src/ extension:tsx language:typescript'" >&2
157
- exit 2
158
- fi
159
- # Validate and clamp limit (API max: 100)
160
- if [[ -n "$limit" ]]; then
161
- [[ "$limit" =~ ^[0-9]+$ ]] || { echo "ghx search: --limit must be a number, got '$limit'" >&2; exit 2; }
162
- (( limit > 100 )) && { echo "⚠ --limit clamped to 100 (API max)" >&2; limit=100; }
163
- fi
164
-
165
- # Prerequisite checks
166
- if ! command -v gh &>/dev/null; then
167
- echo "ghx requires the GitHub CLI (gh). Install: https://cli.github.com" >&2
168
- exit 2
169
- fi
170
- if ! gh auth status &>/dev/null; then
171
- echo "GitHub code search requires authentication. Run: gh auth login" >&2
172
- exit 2
173
- fi
174
-
175
- # Warn on web-only qualifiers (they silently become literal text in REST API)
176
- for pat in 'symbol:' 'content:' 'is:' ' OR ' ' NOT ' '/.*/' 'enterprise:'; do
177
- if [[ "$query" == *$pat* ]]; then
178
- echo "⚠ '$pat' is web-only (Blackbird) — REST API treats it as literal text" >&2
179
- fi
180
- done
181
-
182
- # Search with text_matches for matching context
183
- per_page="${limit:-30}"
184
- response=$(gh api /search/code --method GET \
185
- -H "Accept: application/vnd.github.text-match+json" \
186
- -f q="$query" -f per_page="$per_page" 2>&1) || {
187
- echo "$response" >&2
188
- exit 1
189
- }
190
-
191
- # Result count + incomplete warning to stderr
192
- total=$(echo "$response" | jq -r '.total_count')
193
- incomplete=$(echo "$response" | jq -r '.incomplete_results')
194
- count=$(echo "$response" | jq '.items | length')
195
- echo "$total results (showing $count)" >&2
196
- [[ "$incomplete" == "true" ]] && echo "⚠ Results may be incomplete (query timed out)" >&2
197
- [[ "$total" -gt 1000 ]] 2>/dev/null && echo "⚠ Query too broad — add repo:, language:, or path: to narrow" >&2
198
- [[ "$count" -eq 0 ]] && exit 1
199
-
200
- # Output: repo path: matching context from fragment
201
- # Default: 200 char window centered on the match (prevents minified JS token explosion)
202
- # Uses matches[0].indices to find the actual match position in the fragment
203
- if [[ -n "$full_mode" ]]; then
204
- echo "$response" | jq -r '
205
- .items[] |
206
- .repository.full_name as $repo |
207
- .path as $path |
208
- (.text_matches // []) as $tm |
209
- if ($tm | length) > 0 then
210
- ($tm[0].fragment | gsub("\\n"; " ") | gsub("\\s+"; " ") |
211
- gsub("^\\s+"; "") | gsub("\\s+$"; "")) as $line |
212
- "\($repo) \($path): \($line)"
213
- else
214
- "\($repo) \($path)"
215
- end'
216
- else
217
- truncated=$(echo "$response" | jq '[.items[] |
218
- (.text_matches // []) as $tm |
219
- if ($tm | length) > 0 then
220
- ($tm[0].fragment | length) > 200
221
- else false end] | any')
222
- [[ "$truncated" == "true" ]] && echo "⚠ Lines truncated to 200 chars (use --full for complete fragments)" >&2
223
- echo "$response" | jq -r '
224
- .items[] |
225
- .repository.full_name as $repo |
226
- .path as $path |
227
- (.text_matches // []) as $tm |
228
- if ($tm | length) > 0 then
229
- $tm[0] as $m |
230
- ($m.fragment | gsub("\\n"; " ") | gsub("\\s+"; " ")) as $flat |
231
- (if ($m.matches | length) > 0 then
232
- $m.matches[0].indices[0] as $start |
233
- ([$start - 80, 0] | max) as $from |
234
- ($from + 200) as $to |
235
- (if $from > 0 then "…" else "" end) as $prefix |
236
- (if $to < ($flat | length) then "…" else "" end) as $suffix |
237
- $prefix + ($flat[$from:$to] | gsub("^\\s+"; "") | gsub("\\s+$"; "")) + $suffix
238
- else
239
- $flat[:200]
240
- end) as $line |
241
- "\($repo) \($path): \($line)"
242
- else
243
- "\($repo) \($path)"
244
- end'
245
- fi
246
- ;;
247
-
248
- # ─── repos: search repos with README preview in 1 GraphQL call ───
249
- repos)
250
- limit=""
251
- args=()
252
- while [[ $# -gt 0 ]]; do
253
- case "$1" in
254
- --limit) [[ $# -lt 2 ]] && { echo "ghx repos: --limit requires a value" >&2; exit 2; }
255
- limit="$2"; shift 2 ;;
256
- --*) echo "ghx repos: unknown flag '$1'" >&2; exit 2 ;;
257
- *) args+=("$1"); shift ;;
258
- esac
259
- done
260
- query="${args[*]}"
261
- if [[ -z "$query" ]]; then
262
- echo "Usage: ghx repos <query> [--limit N]" >&2
263
- exit 2
264
- fi
265
- # Validate and clamp limit (GraphQL max: 100, but README fetching makes >20 expensive)
266
- first="${limit:-10}"
267
- [[ "$first" =~ ^[0-9]+$ ]] || { echo "ghx repos: --limit must be a number, got '$first'" >&2; exit 2; }
268
- (( first > 20 )) && { echo "⚠ --limit clamped to 20 (README fetching makes larger values slow)" >&2; first=20; }
269
-
270
- response=$(gh api graphql -f query='
271
- {
272
- search(query: "'"$query"'", type: REPOSITORY, first: '"$first"') {
273
- repositoryCount
274
- nodes {
275
- ... on Repository {
276
- nameWithOwner
277
- description
278
- stargazerCount
279
- primaryLanguage { name }
280
- object(expression: "HEAD:README.md") {
281
- ... on Blob { text }
282
- }
283
- }
284
- }
285
- }
286
- }')
287
-
288
- echo "$response" | jq -r '.data.search.repositoryCount | "\(.) repos found"' >&2
289
- count=$(echo "$response" | jq '.data.search.nodes | length')
290
- [[ "$count" -eq 0 ]] && exit 1
291
- echo "$response" | jq -r '
292
- .data.search.nodes[] |
293
- .nameWithOwner as $name |
294
- (.stargazerCount | tostring) as $stars |
295
- ((.primaryLanguage.name // "?")) as $lang |
296
- (.description // "") as $desc |
297
- ((.object.text // "") | gsub("\\n"; " ") | gsub("\\s+"; " ") | gsub("^\\s+"; "") |
298
- gsub("\\[!\\[[^]]*\\]\\([^)]*\\)\\]\\([^)]*\\)"; "") |
299
- gsub("!\\[[^]]*\\]\\([^)]*\\)"; "") |
300
- gsub("\\[!\\[[^]]*\\]\\([^)]*\\)\\]"; "") |
301
- gsub("\\[![^]]*\\]\\([^)]*\\)"; "") |
302
- gsub("<img[^>]*>"; "") |
303
- gsub("<div[^>]*>"; "") | gsub("</div>"; "") |
304
- gsub("<br[^>]*/?>"; "") |
305
- gsub("<p[^>]*>"; "") | gsub("</p>"; "") |
306
- gsub("<a[^>]*>"; "") | gsub("</a>"; "") |
307
- gsub("\\s+"; " ") | gsub("^\\s+"; "") |
308
- .[:300]) as $readme |
309
- "\($name) (\($stars)★ \($lang)) \($desc)" +
310
- (if ($readme | length) > 0 then "\n " + $readme + (if ($readme | length) >= 300 then "…" else "" end) else "" end)
311
- '
312
- ;;
313
-
314
- # ─── skill: output SKILL.md for agent context injection ───
315
- skill)
316
- script_dir="$(dirname "$(readlink -f "$0")")"
317
- skill_file="$script_dir/SKILL.md"
318
- if [[ -f "$skill_file" ]]; then
319
- cat "$skill_file"
320
- else
321
- echo "SKILL.md not found at $skill_file" >&2
322
- exit 1
323
- fi
324
- ;;
325
-
326
- # ─── tree: recursive tree listing via REST API ───
327
- tree)
328
- repo="$1"; shift || true
329
- path="${1:-}"
330
- owner="${repo%%/*}"
331
- name="${repo##*/}"
332
-
333
- branch=$(gh repo view "$repo" --json defaultBranchRef --jq '.defaultBranchRef.name')
334
-
335
- gh api "repos/$owner/$name/git/trees/$branch?recursive=1" --jq '
336
- [.tree[] | select(.type == "blob") | .path] |
337
- if "'"$path"'" != "" then
338
- [.[] | select(startswith("'"$path"'/"))] |
339
- map(ltrimstr("'"$path"'/"))
340
- else . end |
341
- .[]
342
- '
343
- ;;
344
-
345
- # ─── help ───
346
- version|-v|--version)
347
- echo "ghx $VERSION"
348
- ;;
349
-
350
- help|*)
351
- cat <<'EOF'
352
- ghx — GitHub code exploration for agents and humans
353
-
354
- Commands:
355
- ghx explore <owner/repo> [path] Branch + tree + README in 1 API call
356
- ghx read <owner/repo> <f1> [f2..] Read 1-10 files in 1 API call
357
- ghx repos <query> [--limit N] Search repos with README preview (default: 10)
358
- ghx search "<query>" [--limit N] Code search (AND matching, default: 30)
359
- ghx search --full "<query>" Code search without line truncation
360
- ghx skill Output SKILL.md (for agent context injection)
361
- ghx tree <owner/repo> [path] Full recursive tree listing
362
-
363
- Read flags:
364
- --grep <pattern> Filter output to matching lines (case-insensitive, 2 lines context)
365
- --lines <N-M> Extract specific line range
366
- --map Structural signatures only (~92% token reduction)
367
-
368
- Exit codes:
369
- 0 Success with results
370
- 1 No results (query valid, nothing matched)
371
- 2 Usage error (bad flags, missing args)
372
-
373
- Examples:
374
- ghx explore plausible/analytics
375
- ghx read plausible/analytics mix.exs assets/js/dashboard/stats/bar.js
376
- ghx read plausible/analytics src/app.ts --grep "useState"
377
- ghx read plausible/analytics src/app.ts --lines 42-80
378
- ghx repos "react state management"
379
- ghx search "useState repo:plausible/analytics"
380
- ghx search "filename:package.json repo:plausible/analytics"
381
- ghx tree plausible/analytics assets/js
382
- EOF
383
- ;;
384
-
385
- esac