npm - codemode-lsp - Versions diffs - 0.2.1 → 0.3.1 - Mend

codemode-lsp 0.2.1 → 0.3.1


Files changed (3)

package/README.md CHANGED

@@ -1,12 +1,10 @@
 # codemode-lsp
-An MCP server exposing a **single `execute` tool** backed by LSP. The LLM
-writes JavaScript that chains semantic code operations (`lsp.*`), executed in a
-sandbox with transactional write semantics.
-LLMs are better at writing code than orchestrating tool calls. Instead of
-filtering, looping, and branching in natural language across many round-trips,
-the model writes one script:
+Semantic code intelligence and refactoring for TypeScript/JavaScript agents:
+an MCP server exposing a **single `execute` tool** backed by the language
+server. The LLM writes one JavaScript script that chains semantic operations
+(`lsp.*`) — filtering, looping, and aggregating in code instead of across many
+tool-call round trips:
 ```javascript
 const refs = await lsp.findReferences("src/api.ts", "handleRequest");
@@ -17,20 +15,26 @@ for (const ref of relevant) {
 ({ modified: relevant.length });
 ```
-One round-trip. Nothing hits disk unless the whole script succeeds; if it
-throws, every buffered write rolls back and the tool returns an operation trace
-showing exactly where the script died. Successful writes come back as
-reviewable unified diffs.
+It shines on **codebase-wide** work that grep-based agents reconstruct by hand
+and get subtly wrong: impact analysis ("what breaks if I change X"), call
+graphs, usage audits, refactor planning, and atomic multi-file edits. Names
+are *resolved*, not text-matched, and a whole analysis runs in one round trip.
+For a single-file lookup, an agent's plain file reads are cheaper — this is
+the tool to reach for when the answer spans files.
+Writes are transactional: nothing hits disk unless the whole script succeeds.
+If it throws, every buffered write rolls back and the tool returns an
+operation trace showing exactly where the script died. Successful writes come
+back as reviewable unified diffs.
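The transactional write semantics described in the README can be pictured with a small in-memory model. This is a hypothetical sketch, not codemode-lsp's implementation: `runTransaction`, the `disk` map, and the `tx` object are illustrative names. It is synchronous for brevity, whereas real scripts `await` async `lsp.*` calls. It also omits the unified diffs, and flushing to a `Map` says nothing about how a real multi-file disk flush is made atomic.

```javascript
// Hypothetical sketch: runTransaction, disk, and tx are illustrative names,
// not codemode-lsp internals. Synchronous for brevity; the real tool awaits
// async lsp.* calls and also returns a unified diff per change.
function runTransaction(disk, script) {
  const buffer = new Map(); // path -> pending content (null = pending delete)
  const trace = []; // operation log, returned if the script throws

  const tx = {
    readFile(file) {
      trace.push(`readFile ${file}`);
      // The script sees its own buffered writes before the on-disk content.
      if (buffer.has(file)) {
        const pending = buffer.get(file);
        if (pending === null) throw new Error(`deleted in this script: ${file}`);
        return pending;
      }
      if (!disk.has(file)) throw new Error(`no such file: ${file}`);
      return disk.get(file);
    },
    writeFile(file, content) {
      trace.push(`writeFile ${file}`);
      buffer.set(file, content);
    },
    deleteFile(file) {
      trace.push(`deleteFile ${file}`);
      buffer.set(file, null);
    },
  };

  let result;
  try {
    result = script(tx);
  } catch (err) {
    // Nothing has touched `disk`, so rolling back means dropping the buffer.
    return { ok: false, error: err.message, trace };
  }

  // Success: apply every buffered change in one pass.
  const changes = [];
  for (const [file, content] of buffer) {
    const kind = content === null ? "delete" : disk.has(file) ? "modify" : "create";
    if (content === null) disk.delete(file);
    else disk.set(file, content);
    changes.push({ file, kind });
  }
  return { ok: true, result, changes };
}
```

Because writes only reach `disk` after the script returns, the model needs no undo log. A thrown error leaves the disk untouched, and the trace shows how far the script got.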
-v1 targets TypeScript projects via `typescript-language-server`.
+v1 targets TypeScript/JavaScript via `typescript-language-server`.
 ## Setup
-Add the server to your MCP client — no install step needed. The package bundles
-its own `typescript-language-server`, so `npx`/`bunx` is all it takes. The
-workspace root is the directory the server is spawned from, so for a
-project-level `.mcp.json` (Claude Code spawns servers from the project
-directory):
+No install step — the package bundles its own `typescript-language-server`,
+so `npx`/`bunx` is all it takes. The workspace root is the directory the
+server is spawned from (Claude Code spawns project-level `.mcp.json` servers
+from the project directory):
 ```json
 {
@@ -45,40 +49,47 @@ directory):
 (`"command": "bunx"` with `"args": ["codemode-lsp"]` works too.)
-To try it on a repo it physically cannot write to, add:
+To try it on a repo it physically cannot write to:
 ```json
       "env": { "CODEMODE_READONLY": "1" }
 ```
 Running from a clone instead: requires [Bun](https://bun.sh) — `bun install`,
-then use `"command": "bun"`, `"args": ["run", "/absolute/path/to/codemode-lsp/src/index.ts"]`.
+then `"command": "bun"`, `"args": ["run", "/absolute/path/to/codemode-lsp/src/index.ts"]`.
 ## The `execute` tool
 Accepts JavaScript, runs it in a `vm` sandbox where `lsp.*` is available, and
 returns `{ result, logs, changes }`:
-- **result** — what the script's last expression evaluates to (JSON-serialized,
-  capped at 50k chars).
-- **logs** — captured `console.log/warn/error` output.
+- **result** — the script's last expression (JSON-serialized, capped at 50k chars)
+- **logs** — captured `console.log/warn/error` output
 - **changes** — every file that hit disk, as `{ file, kind, diff }` with a
-  unified diff against the pre-script content. Empty for read-only scripts.
-The API surface is 18 functions plus `getDiagnostics`: 10 read ops (`readFile`,
-`getSymbolBody`, `getSymbols`, `findSymbol`, `findReferences`,
-`goToDefinition`, `incomingCalls`, `outgoingCalls`, `searchText`, `listFiles`)
-and 7 write ops (`renameSymbol`, `replaceSymbolBody`, `insertBeforeSymbol`,
-`insertAfterSymbol`, `deleteSymbol`, `writeFile`, `deleteFile`). The call
-hierarchy ops return only true calls — attributed to the enclosing function,
-resolved across modules — where `findReferences` mixes calls with imports and
-re-exports. Symbols are addressed by
-slash-separated paths (`MyClass/myMethod`) discovered via `getSymbols`. The
-full type definitions are embedded in the tool description, generated straight
-from the source (`bun run generate:types`). If a client truncates the
-description, the script `await lsp.help()` returns the complete reference —
-the description's first lines advertise this, so an agent can always recover
-the full API without probing.
+  unified diff against the pre-script content; empty for read-only scripts
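The README says **result** is the script's last expression, JSON-serialized and capped at 50k chars. Node's `vm` module returns a script's completion value, which is the value of its last evaluated expression statement, so a bare trailing expression such as `({ modified: n })` becomes the result. The sketch below assumes a truncate-and-mark cap policy: the README gives only the limit, so the marker text and fallback behaviour are assumptions. `serializeResult` is a hypothetical name. The sketch covers synchronous scripts only and does not show how the tool handles `await`.

```javascript
const vm = require("node:vm");

// Assumed cap policy: truncate and append a marker. The README only says the
// serialized result is "capped at 50k chars".
const RESULT_CAP = 50_000;

function serializeResult(value) {
  let json;
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    json = JSON.stringify(value) ?? "undefined";
  } catch {
    // Circular structures and BigInt values throw; fall back to a plain string.
    json = String(value);
  }
  if (json.length <= RESULT_CAP) return json;
  return `${json.slice(0, RESULT_CAP)}... [truncated ${json.length - RESULT_CAP} chars]`;
}

// A vm script's completion value is its last evaluated expression statement.
const value = vm.runInNewContext("const n = 3; ({ modified: n });");
console.log(serializeResult(value)); // {"modified":3}
```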
+**Read ops (11):**
+| | |
+| --- | --- |
+| `readFile`, `getSymbols`, `getSymbolBody` | file contents, outline, one symbol's source |
+| `findSymbol`, `findReferences`, `goToDefinition` | workspace symbol search, all references, definition |
+| `incomingCalls`, `outgoingCalls` | call hierarchy — true calls only, attributed to the enclosing function, resolved across modules (`findReferences` mixes calls with imports and re-exports) |
+| `getDependencies` | what a symbol's body needs from outside itself: used imports (module + type-only flag) and same-file helpers — makes moving a symbol a computation, not an eyeballing exercise |
+| `searchText`, `listFiles` | regex search and glob listing (`.gitignore`-aware) |
+**Write ops (7):** `renameSymbol`, `replaceSymbolBody`, `insertBeforeSymbol`,
+`insertAfterSymbol`, `deleteSymbol`, `writeFile`, `deleteFile` — all buffered,
+flushed atomically, each returning fresh diagnostics for the files it touched.
+Plus `getDiagnostics` for type errors on session-touched files.
+Symbols are addressed by slash-separated paths (`MyClass/myMethod`) discovered
+via `getSymbols`. The full type definitions are embedded in the tool
+description, generated straight from the source (`bun run generate:types`).
+If a client truncates the description, the script `await lsp.help()` returns
+the complete reference — the description's first lines advertise this, so an
+agent can always recover the full API without probing.
 See `PRD.md` for the complete spec.
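Slash-separated symbol paths such as `MyClass/myMethod` map naturally onto a hierarchical outline. This is a hypothetical sketch, assuming an outline in the LSP `DocumentSymbol` shape (`name`, `kind`, optional `children`). The README does not say what `getSymbols` returns or how codemode-lsp handles overloads and duplicate names, so `resolveSymbolPath` simply walks one path segment per nesting level and takes the first match.

```javascript
// Hypothetical sketch: resolveSymbolPath is an illustrative helper, not
// codemode-lsp's resolver. Nodes follow the LSP DocumentSymbol shape
// ({ name, kind, children? }); duplicate names simply take the first match.
function resolveSymbolPath(symbols, symbolPath) {
  let level = symbols;
  let found;
  for (const segment of symbolPath.split("/")) {
    found = (level ?? []).find((s) => s.name === segment);
    if (!found) return undefined; // nothing with this name at this depth
    level = found.children;
  }
  return found;
}

// An assumed outline for a small file
// (LSP SymbolKind: 5 = Class, 6 = Method, 9 = Constructor, 12 = Function).
const outline = [
  {
    name: "MyClass",
    kind: 5,
    children: [
      { name: "constructor", kind: 9 },
      { name: "myMethod", kind: 6 },
    ],
  },
  { name: "helper", kind: 12 },
];

console.log(resolveSymbolPath(outline, "MyClass/myMethod").kind); // 6
```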
@@ -89,47 +100,50 @@ No config file. Three environment variables:
 | Variable | Default | Effect |
 | --- | --- | --- |
 | `CODEMODE_TIMEOUT_MS` | `30000` | Script timeout |
-| `CODEMODE_LSP_BIN` | `typescript-language-server` | Language server command |
+| `CODEMODE_LSP_BIN` | bundled `typescript-language-server` | Language server command |
 | `CODEMODE_READONLY` | unset | `1`/`true` removes the 7 write ops from the sandbox, the type defs, and the tool description |
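The configuration table can be read as a small parsing step plus a filter over the sandbox's `lsp` object. In this hypothetical sketch, only the variable names, the 30000 ms default, the `1`/`true` rule, and the seven write-op names come from the README. `readConfig`, `sandboxOps`, and the fallback for invalid timeouts are assumptions.

```javascript
// Hypothetical sketch: readConfig and sandboxOps are illustrative names.
// Variable names, the 30000 ms default, the `1`/`true` rule, and the write-op
// list come from the README; invalid-value handling is assumed.
const WRITE_OPS = [
  "renameSymbol", "replaceSymbolBody", "insertBeforeSymbol",
  "insertAfterSymbol", "deleteSymbol", "writeFile", "deleteFile",
];

function readConfig(env) {
  const timeout = Number(env.CODEMODE_TIMEOUT_MS);
  return {
    // Unset, empty, or non-positive values fall back to the documented default.
    timeoutMs: Number.isFinite(timeout) && timeout > 0 ? timeout : 30000,
    // null stands for "use the bundled typescript-language-server".
    lspBin: env.CODEMODE_LSP_BIN ?? null,
    readonly: env.CODEMODE_READONLY === "1" || env.CODEMODE_READONLY === "true",
  };
}

// In read-only mode the write ops are simply absent from the sandbox object.
function sandboxOps(allOps, config) {
  if (!config.readonly) return allOps;
  return Object.fromEntries(
    Object.entries(allOps).filter(([name]) => !WRITE_OPS.includes(name)),
  );
}

console.log(readConfig({}).timeoutMs); // 30000
```

Removing the ops, rather than making them throw, matches the README's point that read-only mode also strips them from the type definitions and tool description.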
 Workspace root = the server's cwd. Paths resolving outside it are rejected,
 reads and writes alike.
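The workspace-root rule means every path is resolved against the server's cwd and rejected if it escapes. A minimal sketch using Node's `path` module follows. `resolveInWorkspace` is a hypothetical helper, and it checks lexical paths only. A symlink inside the root that points outside it would pass this check; catching that case would need `fs.realpathSync`. Whether codemode-lsp handles symlinks is not stated in the README.

```javascript
const path = require("node:path");

// Hypothetical sketch: resolveInWorkspace is an illustrative helper. It checks
// the lexical path only; symlinks that escape the root are not detected.
function resolveInWorkspace(root, requested) {
  const absRoot = path.resolve(root);
  const abs = path.resolve(absRoot, requested);
  const rel = path.relative(absRoot, abs);
  // Outside if the relative path climbs out via "..", or, on Windows, lands
  // on another drive (path.relative then returns an absolute path).
  if (rel === ".." || rel.startsWith(`..${path.sep}`) || path.isAbsolute(rel)) {
    throw new Error(`path outside workspace: ${requested}`);
  }
  return abs;
}
```

Comparing against `..` plus a separator, rather than any leading `..`, keeps legitimately named files such as `..notes` inside the root.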
-## Limitations (v1)
+## Limitations
-- TypeScript only (the architecture is language-agnostic; more servers later).
+- TypeScript/JavaScript only (the architecture is language-agnostic; more
+  servers later).
 - Diagnostics cover files touched in the session, not the whole project —
   `tsserver` only publishes for opened files.
 - A synchronous infinite loop (`while (true) {}`) is not interrupted by the
   script timeout; async work is.
 - `Reference.isWriteAccess` is always `false` (not exposed over standard LSP).
+- `getDependencies` is syntactic — a local variable shadowing an import can
+  produce a false positive.
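The `getDependencies` limitation listed in the README follows from matching names without scope analysis. The sketch below is hypothetical and deliberately cruder than a syntax-tree walk: it tokenizes the body with a regex, so it would also match identifiers in strings, comments, and property names. `findDependencies`, its inputs, and its return shape are all assumptions and are not codemode-lsp's API. The example reproduces the documented false positive, where a local `join` shadows an imported `join`.

```javascript
// Hypothetical sketch: findDependencies and its input/output shapes are
// illustrative, not codemode-lsp's getDependencies API. Token matching with
// no scope analysis, so shadowed imports are reported as used.
function findDependencies(body, imports, sameFileSymbols, selfName) {
  const used = new Set(body.match(/[A-Za-z_$][\w$]*/g) ?? []);
  return {
    imports: imports.filter((imp) => used.has(imp.name)),
    helpers: sameFileSymbols.filter((name) => name !== selfName && used.has(name)),
  };
}

const imports = [
  { name: "readFile", module: "node:fs/promises", typeOnly: false },
  { name: "Config", module: "./config", typeOnly: true },
  { name: "join", module: "node:path", typeOnly: false },
];
const body = `async function load(p: string): Promise<Config> {
  const join = (a: string, b: string) => a + b; // local, shadows the import
  return parse(await readFile(join(p, "x")));
}`;

const deps = findDependencies(body, imports, ["load", "parse", "unused"], "load");
// imports: readFile, Config, join (join is the false positive); helpers: parse
console.log(deps.imports.map((i) => i.name), deps.helpers);
```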
 ## Eval
-`bun run eval` measures the project's core success criterion: can an LLM, given
-only the tool description, write correct scripts? It runs 15 benchmark tasks
-(exploration, reference-finding, diagnostics, renames, multi-file refactors)
-against a throwaway copy of the fixture project. The agent is headless Claude
-Code (`claude -p`, billed to your Claude subscription — no API key) with every
-built-in tool disabled, so the only way to solve a task is the `execute` tool.
-It runs on Sonnet by default (pinned so pass rates are comparable);
-`--model opus` etc. overrides.
-Grading is deterministic: read tasks are scored on the final answer, write
-tasks on the resulting disk state.
+`bun run eval` measures the project's core success criterion: can an LLM,
+given only the tool description, write correct scripts? It runs 16 benchmark
+tasks (exploration, reference-finding, diagnostics, renames, multi-file
+refactors) against a throwaway copy of the fixture project. The agent is
+headless Claude Code (`claude -p`, billed to your Claude subscription — no API
+key) with every built-in tool disabled, so the only way to solve a task is the
+`execute` tool. Sonnet by default (pinned so pass rates are comparable across
+runs); override with `--model`. Grading is deterministic: read tasks are
+scored on the final answer, write tasks on the resulting disk state.
 Each task ships with a reference solution that runs against the real server as
-part of `bun test`, so the benchmark itself can never rot. The eval is run on
+part of `bun test`, so the benchmark itself can never rot. The eval runs on
 demand, never in CI.
-Current pass rate: **15/15 (100%)** — headless Claude Code (Fable 5), 2026-06-10.
+Last measured: **15/15 (100%)** — headless Claude Code (Fable 5), 2026-06-10,
+before the 16th task was added.
 ## Development
 ```sh
-bun run check        # typecheck + lint + all tests — run before declaring done
-bun test             # all tests (integration tests run the real language server)
+bun run check           # typecheck + lint + all tests — run before declaring done
+bun test                # all tests (integration tests run the real language server)
 bun run generate:types  # regenerate src/lsp-types.generated.ts after API changes
-bun run eval         # LLM benchmark (on demand; needs the claude CLI)
+bun run eval            # LLM benchmark (on demand; needs the claude CLI)
 ```
 The worked examples in the tool description run verbatim as golden tests