codemode-lsp 0.2.1 → 0.3.1
- package/README.md +70 -56
- package/dist/index.js +168900 -31
- package/package.json +2 -2
package/README.md (CHANGED)

````diff
@@ -1,12 +1,10 @@
 # codemode-lsp
 
-
-
-
-
-
-filtering, looping, and branching in natural language across many round-trips,
-the model writes one script:
+Semantic code intelligence and refactoring for TypeScript/JavaScript agents:
+an MCP server exposing a **single `execute` tool** backed by the language
+server. The LLM writes one JavaScript script that chains semantic operations
+(`lsp.*`) — filtering, looping, and aggregating in code instead of across many
+tool-call round trips:
 
 ```javascript
 const refs = await lsp.findReferences("src/api.ts", "handleRequest");
````
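The new intro says that filtering and aggregation happen in code inside one script, not across many tool-call round trips. The sketch below shows what that code can look like, but it is not from the package and rests on several assumptions. Each reference is assumed to carry a `file` field; this shape is hypothetical, and the real `Reference` type is in the generated definitions embedded in the tool description. Hard-coded sample data stands in for the live `lsp.findReferences` call. `groupByFile` is a hypothetical helper name.

```javascript
// Hypothetical reference shape: only a `file` field is assumed here; the
// real Reference type is defined in the tool description's type defs.
function groupByFile(refs) {
  const counts = new Map();
  for (const ref of refs) {
    counts.set(ref.file, (counts.get(ref.file) ?? 0) + 1);
  }
  // Most-referenced files first, shaped for returning as the script's result.
  return [...counts]
    .sort((a, b) => b[1] - a[1])
    .map(([file, count]) => ({ file, count }));
}

// In an execute script the input would come from a live call such as
//   const refs = await lsp.findReferences("src/api.ts", "handleRequest");
// Sample data stands in for it here.
const refs = [
  { file: "src/api.ts" },
  { file: "src/routes.ts" },
  { file: "src/routes.ts" },
];
console.log(groupByFile(refs));
```

The whole per-file tally costs one tool call, because it runs in the same script as the lookup.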
````diff
@@ -17,20 +15,26 @@ for (const ref of relevant) {
 ({ modified: relevant.length });
 ```
 
-
-
-
-
+It shines on **codebase-wide** work that grep-based agents reconstruct by hand
+and get subtly wrong: impact analysis ("what breaks if I change X"), call
+graphs, usage audits, refactor planning, and atomic multi-file edits. Names
+are *resolved*, not text-matched, and a whole analysis runs in one round trip.
+For a single-file lookup, an agent's plain file reads are cheaper — this is
+the tool to reach for when the answer spans files.
+
+Writes are transactional: nothing hits disk unless the whole script succeeds.
+If it throws, every buffered write rolls back and the tool returns an
+operation trace showing exactly where the script died. Successful writes come
+back as reviewable unified diffs.
 
-v1 targets TypeScript
+v1 targets TypeScript/JavaScript via `typescript-language-server`.
 
 ## Setup
 
-
-
-
-
-directory):
+No install step — the package bundles its own `typescript-language-server`,
+so `npx`/`bunx` is all it takes. The workspace root is the directory the
+server is spawned from (Claude Code spawns project-level `.mcp.json` servers
+from the project directory):
 
 ```json
 {
````
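The transactional-write paragraph added in this hunk can be sketched as a write buffer that is flushed only after the script returns. This is a minimal sketch, not the package's implementation. An in-memory `Map` stands in for the disk, and `runTransaction` and the `tx` handle with `read`/`write`/`remove` are hypothetical names. The sketch also does not model a failure during the flush itself, which a disk-backed version would have to handle.

```javascript
// Minimal sketch of all-or-nothing script writes (not the package's code).
// `disk` is an in-memory Map standing in for the filesystem, and `script`
// is a plain function receiving a hypothetical `tx` handle.
function runTransaction(disk, script) {
  const buffer = new Map(); // path -> pending content, or null for a delete
  const trace = []; // operations performed so far, for error reports
  const tx = {
    read(path) {
      // Reads see the script's own pending writes first.
      return buffer.has(path) ? buffer.get(path) : disk.get(path);
    },
    write(path, content) {
      trace.push(`write ${path}`);
      buffer.set(path, content);
    },
    remove(path) {
      trace.push(`delete ${path}`);
      buffer.set(path, null);
    },
  };

  try {
    const result = script(tx);
    // Only a script that finished without throwing gets flushed.
    for (const [path, content] of buffer) {
      if (content === null) disk.delete(path);
      else disk.set(path, content);
    }
    return { ok: true, result, changed: [...buffer.keys()] };
  } catch (err) {
    // Nothing was flushed, so discarding the buffer is the rollback;
    // the trace shows how far the script got before it threw.
    return { ok: false, error: String(err), trace };
  }
}
```

Keeping pending writes in memory makes the rollback trivial. When the script throws, there is nothing on disk to undo, only a buffer to drop.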
````diff
@@ -45,40 +49,47 @@ directory):
 
 (`"command": "bunx"` with `"args": ["codemode-lsp"]` works too.)
 
-To try it on a repo it physically cannot write to
+To try it on a repo it physically cannot write to:
 
 ```json
 "env": { "CODEMODE_READONLY": "1" }
 ```
 
 Running from a clone instead: requires [Bun](https://bun.sh) — `bun install`,
-then
+then `"command": "bun"`, `"args": ["run", "/absolute/path/to/codemode-lsp/src/index.ts"]`.
 
 ## The `execute` tool
 
 Accepts JavaScript, runs it in a `vm` sandbox where `lsp.*` is available, and
 returns `{ result, logs, changes }`:
 
-- **result** —
-
-- **logs** — captured `console.log/warn/error` output.
+- **result** — the script's last expression (JSON-serialized, capped at 50k chars)
+- **logs** — captured `console.log/warn/error` output
 - **changes** — every file that hit disk, as `{ file, kind, diff }` with a
-unified diff against the pre-script content
-
-
-
-
-
-
-
-resolved across modules
-
-
-
-
-
-
-
+unified diff against the pre-script content; empty for read-only scripts
+
+**Read ops (11):**
+
+| | |
+| --- | --- |
+| `readFile`, `getSymbols`, `getSymbolBody` | file contents, outline, one symbol's source |
+| `findSymbol`, `findReferences`, `goToDefinition` | workspace symbol search, all references, definition |
+| `incomingCalls`, `outgoingCalls` | call hierarchy — true calls only, attributed to the enclosing function, resolved across modules (`findReferences` mixes calls with imports and re-exports) |
+| `getDependencies` | what a symbol's body needs from outside itself: used imports (module + type-only flag) and same-file helpers — makes moving a symbol a computation, not an eyeballing exercise |
+| `searchText`, `listFiles` | regex search and glob listing (`.gitignore`-aware) |
+
+**Write ops (7):** `renameSymbol`, `replaceSymbolBody`, `insertBeforeSymbol`,
+`insertAfterSymbol`, `deleteSymbol`, `writeFile`, `deleteFile` — all buffered,
+flushed atomically, each returning fresh diagnostics for the files it touched.
+
+Plus `getDiagnostics` for type errors on session-touched files.
+
+Symbols are addressed by slash-separated paths (`MyClass/myMethod`) discovered
+via `getSymbols`. The full type definitions are embedded in the tool
+description, generated straight from the source (`bun run generate:types`).
+If a client truncates the description, the script `await lsp.help()` returns
+the complete reference — the description's first lines advertise this, so an
+agent can always recover the full API without probing.
 
 See `PRD.md` for the complete spec.
 
````
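The `execute` contract in this hunk returns `{ result, logs, changes }`, where `result` is the JSON-serialized last expression capped at 50k chars. That contract can be approximated with Node's `vm` module, which returns a script's completion value. This minimal sketch makes several assumptions. Scripts here are synchronous, with no top-level `await`, which the real tool supports. `lsp` is a caller-supplied stub. Truncation by plain slicing is a guess at how the cap behaves. `runScript` is a hypothetical name.

```javascript
const vm = require("node:vm");

// Assumed to mirror the README's "capped at 50k chars"; how the real server
// truncates is not stated, so plain slicing is a guess.
const RESULT_CAP = 50_000;

// Minimal sketch: run a synchronous script in a fresh vm context, capture
// console output, and report the last expression's value as JSON.
// `lsp` is a caller-supplied stub; the real server wires in LSP calls,
// supports `await`, and fills `changes` from its write buffer.
function runScript(code, lsp = {}) {
  const logs = [];
  const capture = (level) => (...args) => {
    logs.push({ level, text: args.map(String).join(" ") });
  };
  const context = vm.createContext({
    lsp,
    console: { log: capture("log"), warn: capture("warn"), error: capture("error") },
  });

  // A Script's return value is the completion value of its last statement,
  // which is why a trailing `({ ... });` becomes the result.
  const value = new vm.Script(code).runInContext(context);

  // JSON.stringify returns undefined for undefined values; report "null".
  let result = JSON.stringify(value) ?? "null";
  if (result.length > RESULT_CAP) result = result.slice(0, RESULT_CAP);
  return { result, logs, changes: [] }; // this sketch never writes files
}
```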
````diff
@@ -89,47 +100,50 @@ No config file. Three environment variables:
 | Variable | Default | Effect |
 | --- | --- | --- |
 | `CODEMODE_TIMEOUT_MS` | `30000` | Script timeout |
-| `CODEMODE_LSP_BIN` | `typescript-language-server` | Language server command |
+| `CODEMODE_LSP_BIN` | bundled `typescript-language-server` | Language server command |
 | `CODEMODE_READONLY` | unset | `1`/`true` removes the 7 write ops from the sandbox, the type defs, and the tool description |
 
 Workspace root = the server's cwd. Paths resolving outside it are rejected,
 reads and writes alike.
 
-## Limitations
+## Limitations
 
-- TypeScript only (the architecture is language-agnostic; more
+- TypeScript/JavaScript only (the architecture is language-agnostic; more
+servers later).
 - Diagnostics cover files touched in the session, not the whole project —
 `tsserver` only publishes for opened files.
 - A synchronous infinite loop (`while (true) {}`) is not interrupted by the
 script timeout; async work is.
 - `Reference.isWriteAccess` is always `false` (not exposed over standard LSP).
+- `getDependencies` is syntactic — a local variable shadowing an import can
+produce a false positive.
 
 ## Eval
 
-`bun run eval` measures the project's core success criterion: can an LLM,
-only the tool description, write correct scripts? It runs
-(exploration, reference-finding, diagnostics, renames, multi-file
-against a throwaway copy of the fixture project. The agent is
-Code (`claude -p`, billed to your Claude subscription — no API
-built-in tool disabled, so the only way to solve a task is the
-
-`--model
-
-tasks on the resulting disk state.
+`bun run eval` measures the project's core success criterion: can an LLM,
+given only the tool description, write correct scripts? It runs 16 benchmark
+tasks (exploration, reference-finding, diagnostics, renames, multi-file
+refactors) against a throwaway copy of the fixture project. The agent is
+headless Claude Code (`claude -p`, billed to your Claude subscription — no API
+key) with every built-in tool disabled, so the only way to solve a task is the
+`execute` tool. Sonnet by default (pinned so pass rates are comparable across
+runs); override with `--model`. Grading is deterministic: read tasks are
+scored on the final answer, write tasks on the resulting disk state.
 
 Each task ships with a reference solution that runs against the real server as
-part of `bun test`, so the benchmark itself can never rot. The eval
+part of `bun test`, so the benchmark itself can never rot. The eval runs on
 demand, never in CI.
 
-
+Last measured: **15/15 (100%)** — headless Claude Code (Fable 5), 2026-06-10,
+before the 16th task was added.
 
 ## Development
 
 ```sh
-bun run check
-bun test
+bun run check # typecheck + lint + all tests — run before declaring done
+bun test # all tests (integration tests run the real language server)
 bun run generate:types # regenerate src/lsp-types.generated.ts after API changes
-bun run eval
+bun run eval # LLM benchmark (on demand; needs the claude CLI)
 ```
 
 The worked examples in the tool description run verbatim as golden tests
````
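The configuration section states that paths resolving outside the workspace root are rejected, for reads and writes alike. That rule can be sketched with `node:path`, but this sketch is not the package's code and makes several assumptions. The check is purely lexical, so it does not resolve symlinks. `resolveInWorkspace` is a hypothetical name. The test paths assume POSIX separators.

```javascript
const path = require("node:path");

// Minimal, purely lexical sketch of "paths resolving outside the workspace
// root are rejected". It does not follow symlinks.
function resolveInWorkspace(root, requested) {
  const base = path.resolve(root);
  const abs = path.resolve(base, requested); // relative paths start at the root
  const rel = path.relative(base, abs);
  // Outside the root, the relative path climbs with ".." (or, on Windows,
  // lands on another drive and comes back absolute).
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (escapes) throw new Error(`path outside workspace: ${requested}`);
  return abs;
}
```

Comparing via `path.relative` avoids a pitfall of a plain `startsWith` prefix check. That check would treat a sibling such as `/work/repo-old` as if it were inside `/work/repo`.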