codemode-lsp 0.2.1 → 0.3.1

Files changed (3)
  1. package/README.md +70 -56
  2. package/dist/index.js +168900 -31
  3. package/package.json +2 -2
package/README.md CHANGED
@@ -1,12 +1,10 @@
1
1
  # codemode-lsp
2
2
 
3
- An MCP server exposing a **single `execute` tool** backed by LSP. The LLM
4
- writes JavaScript that chains semantic code operations (`lsp.*`), executed in a
5
- sandbox with transactional write semantics.
6
-
7
- LLMs are better at writing code than orchestrating tool calls. Instead of
8
- filtering, looping, and branching in natural language across many round-trips,
9
- the model writes one script:
3
+ Semantic code intelligence and refactoring for TypeScript/JavaScript agents:
4
+ an MCP server exposing a **single `execute` tool** backed by the language
5
+ server. The LLM writes one JavaScript script that chains semantic operations
6
+ (`lsp.*`) — filtering, looping, and aggregating in code instead of across many
7
+ tool-call round trips:
10
8
 
11
9
  ```javascript
12
10
  const refs = await lsp.findReferences("src/api.ts", "handleRequest");
@@ -17,20 +15,26 @@ for (const ref of relevant) {
17
15
  ({ modified: relevant.length });
18
16
  ```
19
17
 
20
- One round-trip. Nothing hits disk unless the whole script succeeds; if it
21
- throws, every buffered write rolls back and the tool returns an operation trace
22
- showing exactly where the script died. Successful writes come back as
23
- reviewable unified diffs.
18
+ It shines on **codebase-wide** work that grep-based agents reconstruct by hand
19
+ and get subtly wrong: impact analysis ("what breaks if I change X"), call
20
+ graphs, usage audits, refactor planning, and atomic multi-file edits. Names
21
+ are *resolved*, not text-matched, and a whole analysis runs in one round trip.
22
+ For a single-file lookup, an agent's plain file reads are cheaper — this is
23
+ the tool to reach for when the answer spans files.
24
+
25
+ Writes are transactional: nothing hits disk unless the whole script succeeds.
26
+ If it throws, every buffered write rolls back and the tool returns an
27
+ operation trace showing exactly where the script died. Successful writes come
28
+ back as reviewable unified diffs.
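
The buffered-write behaviour described above can be sketched as follows. This is a minimal illustration, not codemode-lsp's implementation: `runTransactional`, the `api` object, and the in-memory `disk` map (standing in for the file system) are hypothetical names.

```javascript
// Hypothetical sketch: buffer writes in memory, flush only if the script succeeds.
// `disk` is a Map standing in for the file system; nothing here is the real API.
async function runTransactional(disk, script) {
  const buffer = new Map(); // pending writes, keyed by file path
  const trace = [];         // operation trace, returned when the script throws
  const api = {
    writeFile(file, content) {
      trace.push(`writeFile ${file}`);
      buffer.set(file, content);
    },
    readFile(file) {
      trace.push(`readFile ${file}`);
      // Reads see buffered writes first, then fall back to "disk".
      return buffer.has(file) ? buffer.get(file) : disk.get(file);
    },
  };
  try {
    const result = await script(api);
    // Success: flush every buffered write in one step.
    for (const [file, content] of buffer) disk.set(file, content);
    return { ok: true, result, changed: [...buffer.keys()] };
  } catch (err) {
    // Dropping the buffer is the rollback: "disk" was never touched.
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error, trace };
  }
}
```

A script that writes and then throws leaves `disk` unchanged and gets back the trace of operations it performed before failing.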
24
29
 
25
- v1 targets TypeScript projects via `typescript-language-server`.
30
+ v1 targets TypeScript/JavaScript via `typescript-language-server`.
26
31
 
27
32
  ## Setup
28
33
 
29
- Add the server to your MCP client; no install step needed. The package bundles
30
- its own `typescript-language-server`, so `npx`/`bunx` is all it takes. The
31
- workspace root is the directory the server is spawned from, so for a
32
- project-level `.mcp.json` (Claude Code spawns servers from the project
33
- directory):
34
+ No install step: the package bundles its own `typescript-language-server`,
35
+ so `npx`/`bunx` is all it takes. The workspace root is the directory the
36
+ server is spawned from (Claude Code spawns project-level `.mcp.json` servers
37
+ from the project directory):
34
38
 
35
39
  ```json
36
40
  {
@@ -45,40 +49,47 @@ directory):
45
49
 
46
50
  (`"command": "bunx"` with `"args": ["codemode-lsp"]` works too.)
47
51
 
48
- To try it on a repo it physically cannot write to, add:
52
+ To try it on a repo it physically cannot write to:
49
53
 
50
54
  ```json
51
55
  "env": { "CODEMODE_READONLY": "1" }
52
56
  ```
53
57
 
54
58
  Running from a clone instead: requires [Bun](https://bun.sh) — `bun install`,
55
- then use `"command": "bun"`, `"args": ["run", "/absolute/path/to/codemode-lsp/src/index.ts"]`.
59
+ then `"command": "bun"`, `"args": ["run", "/absolute/path/to/codemode-lsp/src/index.ts"]`.
56
60
 
57
61
  ## The `execute` tool
58
62
 
59
63
  Accepts JavaScript, runs it in a `vm` sandbox where `lsp.*` is available, and
60
64
  returns `{ result, logs, changes }`:
61
65
 
62
- - **result** — what the script's last expression evaluates to (JSON-serialized,
63
- capped at 50k chars).
64
- - **logs** — captured `console.log/warn/error` output.
66
+ - **result** — the script's last expression (JSON-serialized, capped at 50k chars)
67
+ - **logs**: captured `console.log/warn/error` output
65
68
  - **changes** — every file that hit disk, as `{ file, kind, diff }` with a
66
- unified diff against the pre-script content. Empty for read-only scripts.
67
-
68
- The API surface is 18 functions plus `getDiagnostics`: 10 read ops (`readFile`,
69
- `getSymbolBody`, `getSymbols`, `findSymbol`, `findReferences`,
70
- `goToDefinition`, `incomingCalls`, `outgoingCalls`, `searchText`, `listFiles`)
71
- and 7 write ops (`renameSymbol`, `replaceSymbolBody`, `insertBeforeSymbol`,
72
- `insertAfterSymbol`, `deleteSymbol`, `writeFile`, `deleteFile`). The call
73
- hierarchy ops return only true calls attributed to the enclosing function,
74
- resolved across modules — where `findReferences` mixes calls with imports and
75
- re-exports. Symbols are addressed by
76
- slash-separated paths (`MyClass/myMethod`) discovered via `getSymbols`. The
77
- full type definitions are embedded in the tool description, generated straight
78
- from the source (`bun run generate:types`). If a client truncates the
79
- description, the script `await lsp.help()` returns the complete reference —
80
- the description's first lines advertise this, so an agent can always recover
81
- the full API without probing.
69
+ unified diff against the pre-script content; empty for read-only scripts
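
The `{ result, logs, changes }` shape described above can be sketched with Node's built-in `vm` module. This is a simplified, synchronous stand-in and not the package's implementation: `executeSketch` and the stubbed `lsp` object are hypothetical, `changes` is always empty because the sketch performs no writes, and unlike the real tool it does not support `await` inside the script.

```javascript
const vm = require("node:vm");

// Hypothetical sketch: run a script in a fresh context, capture console.log
// output, and return the value of the script's last expression.
function executeSketch(code, lsp) {
  const logs = [];
  const sandbox = {
    lsp,
    console: { log: (...args) => logs.push(args.join(" ")) },
  };
  // runInNewContext returns the completion value of the last statement.
  const value = vm.runInNewContext(code, sandbox, { timeout: 1000 });
  // Round-trip through JSON to mirror "JSON-serialized" results.
  const result = value === undefined ? null : JSON.parse(JSON.stringify(value));
  return { result, logs, changes: [] };
}

// Usage with a stubbed `lsp` (the reference objects are made up for illustration).
const stubLsp = {
  findReferences: () => [{ file: "src/a.ts" }, { file: "src/b.ts" }],
};
const out = executeSketch(
  `const refs = lsp.findReferences("src/api.ts", "handleRequest");
   console.log("found", refs.length);
   ({ count: refs.length });`,
  stubLsp,
);
```

Here `out.result` is `{ count: 2 }` and `out.logs` is `["found 2"]`, because the parenthesized object literal is the script's last expression.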
70
+
71
+ **Read ops (11):**
72
+
73
+ | | |
74
+ | --- | --- |
75
+ | `readFile`, `getSymbols`, `getSymbolBody` | file contents, outline, one symbol's source |
76
+ | `findSymbol`, `findReferences`, `goToDefinition` | workspace symbol search, all references, definition |
77
+ | `incomingCalls`, `outgoingCalls` | call hierarchy — true calls only, attributed to the enclosing function, resolved across modules (`findReferences` mixes calls with imports and re-exports) |
78
+ | `getDependencies` | what a symbol's body needs from outside itself: used imports (module + type-only flag) and same-file helpers — makes moving a symbol a computation, not an eyeballing exercise |
79
+ | `searchText`, `listFiles` | regex search and glob listing (`.gitignore`-aware) |
80
+
81
+ **Write ops (7):** `renameSymbol`, `replaceSymbolBody`, `insertBeforeSymbol`,
82
+ `insertAfterSymbol`, `deleteSymbol`, `writeFile`, `deleteFile`; all buffered,
83
+ flushed atomically, each returning fresh diagnostics for the files it touched.
84
+
85
+ Plus `getDiagnostics` for type errors on session-touched files.
86
+
87
+ Symbols are addressed by slash-separated paths (`MyClass/myMethod`) discovered
88
+ via `getSymbols`. The full type definitions are embedded in the tool
89
+ description, generated straight from the source (`bun run generate:types`).
90
+ If a client truncates the description, the script `await lsp.help()` returns
91
+ the complete reference — the description's first lines advertise this, so an
92
+ agent can always recover the full API without probing.
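
To illustrate slash-separated symbol paths, here is a minimal sketch of resolving `MyClass/myMethod` against a nested outline. The outline shape (`name`, `children`) and the `findByPath` helper are assumptions for illustration; the README does not show what `getSymbols` actually returns.

```javascript
// Hypothetical outline shape: each symbol has a `name` and optional `children`.
function findByPath(symbols, symbolPath) {
  let level = symbols;
  let found = null;
  for (const segment of symbolPath.split("/")) {
    // Descend one level per path segment; give up as soon as a segment is missing.
    found = (level || []).find((s) => s.name === segment) || null;
    if (!found) return null;
    level = found.children;
  }
  return found;
}

const outline = [
  { name: "MyClass", children: [{ name: "myMethod", kind: "method" }] },
  { name: "helper", kind: "function" },
];
const method = findByPath(outline, "MyClass/myMethod");
```

A path whose segments do not all exist, such as `MyClass/missing`, resolves to `null` in this sketch.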
82
93
 
83
94
  See `PRD.md` for the complete spec.
84
95
 
@@ -89,47 +100,50 @@ No config file. Three environment variables:
89
100
  | Variable | Default | Effect |
90
101
  | --- | --- | --- |
91
102
  | `CODEMODE_TIMEOUT_MS` | `30000` | Script timeout |
92
- | `CODEMODE_LSP_BIN` | `typescript-language-server` | Language server command |
103
+ | `CODEMODE_LSP_BIN` | bundled `typescript-language-server` | Language server command |
93
104
  | `CODEMODE_READONLY` | unset | `1`/`true` removes the 7 write ops from the sandbox, the type defs, and the tool description |
94
105
 
95
106
  Workspace root = the server's cwd. Paths resolving outside it are rejected,
96
107
  reads and writes alike.
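
A minimal sketch of how the three variables and the workspace-root check could be handled, assuming Node's `path` module. `readConfig` and `resolveInWorkspace` are hypothetical helpers, and the fallback for an invalid timeout is an assumption, not documented behaviour.

```javascript
const path = require("node:path");

// Hypothetical: read the three documented variables from an env-like object.
function readConfig(env) {
  const timeout = Number(env.CODEMODE_TIMEOUT_MS);
  return {
    // Assumption: fall back to the documented default on missing or invalid input.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 30000,
    // null stands for "use the bundled typescript-language-server".
    lspBin: env.CODEMODE_LSP_BIN || null,
    readonly: env.CODEMODE_READONLY === "1" || env.CODEMODE_READONLY === "true",
  };
}

// Hypothetical: resolve a script-supplied path and reject anything that
// lands outside the workspace root.
function resolveInWorkspace(root, requested) {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, requested);
  const rel = path.relative(absRoot, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Path escapes workspace root: ${requested}`);
  }
  return target;
}
```

Comparing the relative path, rather than checking a string prefix, avoids treating a sibling directory such as `/workspace-other` as if it were inside `/workspace`.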
97
108
 
98
- ## Limitations (v1)
109
+ ## Limitations
99
110
 
100
- - TypeScript only (the architecture is language-agnostic; more servers later).
111
+ - TypeScript/JavaScript only (the architecture is language-agnostic; more
112
+ servers later).
101
113
  - Diagnostics cover files touched in the session, not the whole project —
102
114
  `tsserver` only publishes for opened files.
103
115
  - A synchronous infinite loop (`while (true) {}`) is not interrupted by the
104
116
  script timeout; async work is.
105
117
  - `Reference.isWriteAccess` is always `false` (not exposed over standard LSP).
118
+ - `getDependencies` is syntactic — a local variable shadowing an import can
119
+ produce a false positive.
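
The synchronous-loop limitation is what one would expect if the timeout is a timer on the event loop. That mechanism is an assumption here; the README does not say how the timeout is implemented. The sketch below uses a bounded busy-wait in place of `while (true) {}` so that it terminates; `withTimeout`, `asyncWork`, and `busyWait` are illustrative names.

```javascript
// Sketch: a timer-based timeout cannot interrupt synchronous code, because
// timers only fire once the event loop gets control back.
function withTimeout(fn, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("timeout")), ms);
  });
  // Start the work after the timer is armed.
  const work = Promise.resolve().then(fn);
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Async work yields to the event loop, so the timer can win the race.
const asyncWork = () => new Promise((resolve) => setTimeout(() => resolve("done"), 200));

// A bounded busy-wait blocks the event loop for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {}
  return "done";
}

async function demo() {
  const asyncOutcome = await withTimeout(asyncWork, 20).catch((e) => e.message);
  const started = Date.now();
  // The timeout is 20 ms, but the synchronous loop still runs its full 100 ms.
  const syncOutcome = await withTimeout(() => busyWait(100), 20).catch((e) => e.message);
  return { asyncOutcome, syncOutcome, elapsedMs: Date.now() - started };
}
```

In this sketch the async case is cut off with `"timeout"`, while the busy-wait finishes with `"done"` after roughly 100 ms despite the 20 ms limit. An unbounded loop would never hand control back at all.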
106
120
 
107
121
  ## Eval
108
122
 
109
- `bun run eval` measures the project's core success criterion: can an LLM, given
110
- only the tool description, write correct scripts? It runs 15 benchmark tasks
111
- (exploration, reference-finding, diagnostics, renames, multi-file refactors)
112
- against a throwaway copy of the fixture project. The agent is headless Claude
113
- Code (`claude -p`, billed to your Claude subscription — no API key) with every
114
- built-in tool disabled, so the only way to solve a task is the `execute` tool.
115
- It runs on Sonnet by default (pinned so pass rates are comparable);
116
- `--model opus` etc. overrides.
117
- Grading is deterministic: read tasks are scored on the final answer, write
118
- tasks on the resulting disk state.
123
+ `bun run eval` measures the project's core success criterion: can an LLM,
124
+ given only the tool description, write correct scripts? It runs 16 benchmark
125
+ tasks (exploration, reference-finding, diagnostics, renames, multi-file
126
+ refactors) against a throwaway copy of the fixture project. The agent is
127
+ headless Claude Code (`claude -p`, billed to your Claude subscription — no API
128
+ key) with every built-in tool disabled, so the only way to solve a task is the
129
+ `execute` tool. Sonnet by default (pinned so pass rates are comparable across
130
+ runs); override with `--model`. Grading is deterministic: read tasks are
131
+ scored on the final answer, write tasks on the resulting disk state.
119
132
 
120
133
  Each task ships with a reference solution that runs against the real server as
121
- part of `bun test`, so the benchmark itself can never rot. The eval is run on
134
+ part of `bun test`, so the benchmark itself can never rot. The eval runs on
122
135
  demand, never in CI.
123
136
 
124
- Current pass rate: **15/15 (100%)** — headless Claude Code (Fable 5), 2026-06-10.
137
+ Last measured: **15/15 (100%)** — headless Claude Code (Fable 5), 2026-06-10,
138
+ before the 16th task was added.
125
139
 
126
140
  ## Development
127
141
 
128
142
  ```sh
129
- bun run check # typecheck + lint + all tests — run before declaring done
130
- bun test # all tests (integration tests run the real language server)
143
+ bun run check # typecheck + lint + all tests — run before declaring done
144
+ bun test # all tests (integration tests run the real language server)
131
145
  bun run generate:types # regenerate src/lsp-types.generated.ts after API changes
132
- bun run eval # LLM benchmark (on demand; needs the claude CLI)
146
+ bun run eval # LLM benchmark (on demand; needs the claude CLI)
133
147
  ```
134
148
 
135
149
  The worked examples in the tool description run verbatim as golden tests