little-coder 1.5.1 → 1.6.0 (npm)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

package/.pi/extensions/read-guard/index.ts +153 -0
package/.pi/extensions/read-guard/read-guard.test.ts +189 -0
package/CHANGELOG.md +12 -0
package/README.md +5 -3
package/package.json +1 -1

package/.pi/extensions/read-guard/index.ts ADDED

@@ -0,0 +1,153 @@
+import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
+import { harnessIntervention } from "../_shared/intervention.ts";
+// Harness intervention: trim a `read` result that would overflow the context window.
+//
+// little-coder drives SMALL local models with small context windows
+// (`context_limit` is 32768 in .pi/settings.json, and the live window is often
+// less). pi's built-in `read` returns up to ~2000 lines in a single tool result
+// — for a small model that one result can blow past the remaining budget, evict
+// earlier conversation, and wreck the run. That's exactly the class of failure
+// the harness-intervention layer exists to catch (cf. thinking-budget cap,
+// write-guard redirect, turn-cap).
+//
+// When a read result would push context usage past the window, we replace it
+// with only the file's first HEAD_LINES lines plus a message telling the model
+// why it was trimmed and to use those lines to understand the structure, then
+// locate what it needs with grep/find or a targeted read (offset/limit) — rather
+// than re-reading the whole file. The user sees one uniform "harness
+// intervention: …" line, like every other intervention.
+//
+// Why `tool_result`, not `tool_call`: a `tool_call` handler can only `block`
+// with a `reason` string (no file content) or mutate `input.limit` (lines but no
+// message). Delivering BOTH the first 30 lines AND an explanation in one result
+// requires `tool_result`, whose return value replaces the content the model sees
+// (ToolResultEventResult.content). The full file is still read from disk (pi
+// already caps that at ~2000 lines) but the oversized text never reaches the LLM
+// context because we swap it out before it lands.
+export const HEAD_LINES = 30;
+// When current context usage is unknown (e.g. right after compaction, when
+// getContextUsage().tokens is null), fall back to "a single file should never
+// eat more than this fraction of the whole window".
+export const FALLBACK_FRACTION = 0.5;
+// Tokens to keep in reserve below the window before we call a read an overflow.
+// 0 = trim only on literal overflow; raise it to trim slightly earlier and leave
+// the model headroom to act on the 30 lines.
+export const RESERVE = 0;
+/** chars→tokens estimate. Same 3.5 ratio as thinking-budget's charsToTokens /
+ *  local/context_manager.estimate_tokens. */
+export function estimateTokens(chars: number): number {
+  return Math.ceil(chars / 3.5);
+}
+/** First `n` lines of `text`, preserving pi's `cat -n` line-number prefixes so
+ *  the model keeps a real structural view. Safe when text has fewer than n. */
+export function firstLines(text: string, n: number): string {
+  return text.split("\n").slice(0, n).join("\n");
+}
+export function countLines(text: string): number {
+  if (text === "") return 0;
+  return text.split("\n").length;
+}
+/**
+ * Decide whether a read result should be trimmed because keeping it whole would
+ * exceed the context window.
+ *
+ * - Nothing to trim if the result is already <= headN lines, or we have no window.
+ * - With a known current token count: trim when current + est would cross the
+ *   window (less RESERVE) — the literal "will result in exceeding the window".
+ * - With unknown current usage: trim when the result alone exceeds
+ *   FALLBACK_FRACTION of the window.
+ */
+export function shouldTrimRead(a: {
+  contentChars: number;
+  currentTokens: number | null;
+  contextWindow: number;
+  lineCount: number;
+  headN: number;
+}): boolean {
+  if (!a.contextWindow) return false;
+  if (a.lineCount <= a.headN) return false;
+  const est = estimateTokens(a.contentChars);
+  if (a.currentTokens == null) {
+    return est > a.contextWindow * FALLBACK_FRACTION;
+  }
+  return a.currentTokens + est > a.contextWindow - RESERVE;
+}
+/** Message appended below the 30 lines, addressed to the model. Leads with the
+ *  consequence and the directive. */
+export function trimmedReadMessage(a: {
+  shownLines: number;
+  totalLines: number;
+  estTokens: number;
+  contextWindow: number;
+}): string {
+  return (
+    `⚠️ This file is too large to read in full — reading all ${a.totalLines} lines ` +
+    `(~${a.estTokens} tokens) would exceed the remaining context window ` +
+    `(${a.contextWindow} tokens). Only the first ${a.shownLines} lines are shown above.\n` +
+    `\n` +
+    `Use these ${a.shownLines} lines to understand the file's structure, then narrow down ` +
+    `instead of reading the whole thing:\n` +
+    `  • search for what you need with \`grep\` (by content) or \`find\` (by name), then\n` +
+    `  • \`read\` only the relevant range with \`offset\` and \`limit\`.\n` +
+    `\n` +
+    `Do NOT re-read this file in full — it will be trimmed again.`
+  );
+}
+type TextOrImage = { type: string; text?: string };
+export default function (pi: ExtensionAPI) {
+  pi.on("tool_result", async (event, ctx) => {
+    if (String((event as any).toolName ?? "").toLowerCase() !== "read") return;
+    if ((event as any).isError) return;
+    const content = (((event as any).content ?? []) as TextOrImage[]);
+    if (content.length === 0) return;
+    // Text-only: an image read can't be line-trimmed, leave it alone.
+    if (content.some((c) => c.type !== "text")) return;
+    const text = content.map((c) => c.text ?? "").join("");
+    // getContextUsage may be absent on older SDK builds; without a window we
+    // can't judge overflow, so leave the result untouched.
+    const usage =
+      typeof ctx.getContextUsage === "function" ? ctx.getContextUsage() : undefined;
+    if (!usage?.contextWindow) return;
+    const lineCount = countLines(text);
+    if (
+      !shouldTrimRead({
+        contentChars: text.length,
+        currentTokens: usage.tokens,
+        contextWindow: usage.contextWindow,
+        lineCount,
+        headN: HEAD_LINES,
+      })
+    ) {
+      return;
+    }
+    const head = firstLines(text, HEAD_LINES);
+    const msg = trimmedReadMessage({
+      shownLines: HEAD_LINES,
+      totalLines: lineCount,
+      estTokens: estimateTokens(text.length),
+      contextWindow: usage.contextWindow,
+    });
+    harnessIntervention(
+      ctx,
+      "a read would have overflowed the context window — showed only the file's first 30 lines and told the model to search it instead.",
+    );
+    return { content: [{ type: "text" as const, text: head + "\n\n" + msg }] };
+  });
+}
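
The overflow rule in `shouldTrimRead` can be turned around to ask how large a read result may be before it is trimmed. The sketch below is illustrative only: `maxUntrimmedChars` is a hypothetical helper that is not part of read-guard, and the constants simply mirror the values shown in the diff (3.5 chars/token, `FALLBACK_FRACTION = 0.5`, `RESERVE = 0`). It ignores the separate rule that results of `HEAD_LINES` lines or fewer are never trimmed.

```typescript
// Constants mirrored from the diff above; treat them as assumptions if the source changes.
const CHARS_PER_TOKEN = 3.5;
const FALLBACK_FRACTION = 0.5;

// Same estimate as the diff's estimateTokens: characters to tokens, rounded up.
function estimateTokens(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// Hypothetical helper (NOT part of read-guard): the largest character count whose
// token estimate still passes the overflow check untrimmed.
function maxUntrimmedChars(
  currentTokens: number | null,
  contextWindow: number,
  reserve = 0,
): number {
  // Known usage: budget is whatever is left below (window - reserve).
  // Unknown usage: budget is FALLBACK_FRACTION of the whole window.
  const budgetTokens =
    currentTokens == null
      ? Math.floor(contextWindow * FALLBACK_FRACTION)
      : contextWindow - reserve - currentTokens;
  if (budgetTokens <= 0) return 0; // no budget left for a result of any size
  // ceil(chars / 3.5) <= budget holds exactly when chars <= budget * 3.5.
  return Math.floor(budgetTokens * CHARS_PER_TOKEN);
}

const knownUsage = maxUntrimmedChars(20_000, 32_768); // 44688
const unknownUsage = maxUntrimmedChars(null, 32_768); // 57344
const withReserve = maxUntrimmedChars(20_000, 32_768, 2_048); // 37520
console.log({ knownUsage, unknownUsage, withReserve });
```

With 20,000 tokens already used in a 32,768-token window, a result of 44,688 characters estimates to exactly 12,768 tokens and passes, while one more character tips the estimate to 12,769 and triggers the trim. Raising the reserve lowers that ceiling proportionally.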

package/.pi/extensions/read-guard/read-guard.test.ts ADDED

@@ -0,0 +1,189 @@
+import { describe, it, expect } from "vitest";
+import setupReadGuard, {
+  HEAD_LINES,
+  FALLBACK_FRACTION,
+  estimateTokens,
+  firstLines,
+  countLines,
+  shouldTrimRead,
+  trimmedReadMessage,
+} from "./index.ts";
+// ── pure helpers ────────────────────────────────────────────────────────────
+describe("estimateTokens", () => {
+  it("uses the 3.5 chars/token ratio, rounding up", () => {
+    expect(estimateTokens(0)).toBe(0);
+    expect(estimateTokens(1)).toBe(1); // ceil(1/3.5)
+    expect(estimateTokens(35)).toBe(10);
+    expect(estimateTokens(36)).toBe(11); // ceil(36/3.5)
+  });
+});
+describe("firstLines", () => {
+  const sample = Array.from({ length: 100 }, (_, i) => `${i + 1}\tline ${i + 1}`).join("\n");
+  it("returns the first n lines and preserves cat -n prefixes", () => {
+    const out = firstLines(sample, 30);
+    expect(countLines(out)).toBe(30);
+    expect(out.startsWith("1\tline 1")).toBe(true);
+    expect(out.endsWith("30\tline 30")).toBe(true);
+  });
+  it("is safe when the text has fewer than n lines", () => {
+    expect(firstLines("a\nb", 30)).toBe("a\nb");
+    expect(firstLines("", 30)).toBe("");
+  });
+});
+describe("countLines", () => {
+  it("counts newline-separated lines, with empty string as zero", () => {
+    expect(countLines("")).toBe(0);
+    expect(countLines("one")).toBe(1);
+    expect(countLines("one\ntwo\nthree")).toBe(3);
+    expect(countLines("trailing\n")).toBe(2); // trailing newline => empty final line
+  });
+});
+describe("shouldTrimRead", () => {
+  const base = { contextWindow: 32768, headN: HEAD_LINES };
+  it("trims when current tokens + estimate would exceed the window", () => {
+    // 100k chars ≈ 28572 tokens; with 10000 already used that crosses 32768.
+    expect(
+      shouldTrimRead({ ...base, contentChars: 100_000, currentTokens: 10_000, lineCount: 2000 }),
+    ).toBe(true);
+  });
+  it("does not trim when the result comfortably fits", () => {
+    expect(
+      shouldTrimRead({ ...base, contentChars: 4_000, currentTokens: 1_000, lineCount: 200 }),
+    ).toBe(false);
+  });
+  it("never trims when the result is <= headN lines", () => {
+    expect(
+      shouldTrimRead({ ...base, contentChars: 1_000_000, currentTokens: 30_000, lineCount: HEAD_LINES }),
+    ).toBe(false);
+  });
+  it("falls back to a window fraction when current usage is unknown (null)", () => {
+    const window = 10_000;
+    const overChars = Math.ceil(window * FALLBACK_FRACTION * 3.5) + 100; // est just over half
+    const underChars = Math.floor(window * FALLBACK_FRACTION * 3.5) - 100; // est just under half
+    expect(
+      shouldTrimRead({ contextWindow: window, headN: HEAD_LINES, currentTokens: null, contentChars: overChars, lineCount: 2000 }),
+    ).toBe(true);
+    expect(
+      shouldTrimRead({ contextWindow: window, headN: HEAD_LINES, currentTokens: null, contentChars: underChars, lineCount: 2000 }),
+    ).toBe(false);
+  });
+  it("returns false when there is no context window to judge against", () => {
+    expect(
+      shouldTrimRead({ contextWindow: 0, headN: HEAD_LINES, currentTokens: 1, contentChars: 1_000_000, lineCount: 2000 }),
+    ).toBe(false);
+  });
+});
+describe("trimmedReadMessage", () => {
+  it("explains the trim and directs to grep/find + targeted read", () => {
+    const msg = trimmedReadMessage({ shownLines: 30, totalLines: 2000, estTokens: 28572, contextWindow: 32768 });
+    expect(msg).toContain("too large");
+    expect(msg).toContain("first 30 lines");
+    expect(msg).toContain("grep");
+    expect(msg).toContain("find");
+    expect(msg).toContain("offset");
+    expect(msg).toContain("limit");
+    expect(msg).toContain("Do NOT re-read");
+  });
+});
+// ── tool_result handler ─────────────────────────────────────────────────────
+function getToolResultHandler() {
+  let handler: ((event: any, ctx: any) => any) | undefined;
+  const pi = {
+    on(name: string, h: (event: any, ctx: any) => any) {
+      if (name === "tool_result") handler = h;
+    },
+  };
+  setupReadGuard(pi as any);
+  if (!handler) throw new Error("read-guard did not register a tool_result handler");
+  return handler;
+}
+function makeCtx(usage: { tokens: number | null; contextWindow: number } | undefined) {
+  const notifies: string[] = [];
+  return {
+    notifies,
+    ui: { notify: (m: string) => notifies.push(m) },
+    getContextUsage: () => (usage ? { ...usage, percent: null } : undefined),
+  };
+}
+// A read result whose text is `lines` numbered lines, each about `width` characters wide.
+function bigReadEvent(lines: number, width = 80) {
+  const text = Array.from({ length: lines }, (_, i) => `${i + 1}\t${"x".repeat(width)}`).join("\n");
+  return { toolName: "read", isError: false, content: [{ type: "text", text }] };
+}
+describe("read-guard tool_result handler", () => {
+  it("trims an oversized read to 30 lines + a directive and fires one intervention", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx({ tokens: 20_000, contextWindow: 32768 });
+    const result = await handler(bigReadEvent(2000), ctx);
+    expect(result?.content).toHaveLength(1);
+    const out = result.content[0].text as string;
+    // first 30 lines preserved (and only those), then the directive
+    expect(out.startsWith("1\t")).toBe(true);
+    expect(out).not.toContain("\n31\t"); // line 31's content must be gone
+    const [headPart] = out.split("⚠️");
+    expect(countLines(headPart.trimEnd())).toBe(30);
+    expect(out).toContain("grep");
+    expect(ctx.notifies).toHaveLength(1);
+    expect(ctx.notifies[0]).toMatch(/harness intervention:.*first 30 lines/i);
+  });
+  it("leaves a read that fits the window untouched", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx({ tokens: 1_000, contextWindow: 32768 });
+    const result = await handler(bigReadEvent(50), ctx);
+    expect(result).toBeUndefined();
+    expect(ctx.notifies).toHaveLength(0);
+  });
+  it("ignores error results", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx({ tokens: 30_000, contextWindow: 32768 });
+    const ev = { ...bigReadEvent(2000), isError: true };
+    expect(await handler(ev, ctx)).toBeUndefined();
+    expect(ctx.notifies).toHaveLength(0);
+  });
+  it("ignores results that contain an image block (can't line-trim an image)", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx({ tokens: 30_000, contextWindow: 32768 });
+    const ev = {
+      toolName: "read",
+      isError: false,
+      content: [{ type: "image", data: "…", mimeType: "image/png" }],
+    };
+    expect(await handler(ev, ctx)).toBeUndefined();
+  });
+  it("ignores non-read tools", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx({ tokens: 30_000, contextWindow: 32768 });
+    const ev = { ...bigReadEvent(2000), toolName: "bash" };
+    expect(await handler(ev, ctx)).toBeUndefined();
+  });
+  it("does nothing when context usage is unavailable", async () => {
+    const handler = getToolResultHandler();
+    const ctx = makeCtx(undefined);
+    expect(await handler(bigReadEvent(2000), ctx)).toBeUndefined();
+    expect(ctx.notifies).toHaveLength(0);
+  });
+});
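
One consequence of the `countLines` behaviour tested above is worth seeing in isolation: a trailing newline adds an empty final line to the count. The sketch below copies `firstLines` and `countLines` from the diff so that it runs on its own. Whether pi's `read` output actually ends with a trailing newline is an assumption made here for illustration, not something the diff states.

```typescript
// Copied from the diff so this example is self-contained.
function firstLines(text: string, n: number): string {
  return text.split("\n").slice(0, n).join("\n");
}
function countLines(text: string): number {
  if (text === "") return 0;
  return text.split("\n").length;
}

// Thirty numbered lines in the `cat -n` style used by the tests.
const thirty = Array.from({ length: 30 }, (_, i) => `${i + 1}\tline ${i + 1}`).join("\n");
// ASSUMPTION for illustration: the same text with a trailing newline appended.
const withTrailingNewline = thirty + "\n";

const counts = {
  plain: countLines(thirty), // 30: at the head limit, never trimmed
  trailing: countLines(withTrailingNewline), // 31: one over the limit
  head: countLines(firstLines(withTrailingNewline, 30)), // 30: the empty last line is dropped
};
console.log(counts);
```

So a 30-line file with a trailing newline counts as 31 lines and passes the `lineCount <= headN` early exit. It would still be trimmed only if its token estimate also crossed the window, which for 30 lines requires very long lines.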

package/CHANGELOG.md CHANGED

@@ -2,6 +2,18 @@
 All notable changes to little-coder are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and little-coder's public interface (CLI, providers, tools, skills) follows semver starting at `v0.0.1` post-rename.
+## [v1.6.0] — 2026-05-23
+A new harness intervention for small-context models: oversized file reads no longer blow the context window. little-coder targets local models with small windows (`context_limit` is 32768, and the live window is often less), but pi's built-in `read` returns up to ~2000 lines in a single tool result — enough for one read to evict the conversation and derail the run. The harness now catches that read before it lands and replaces it with the file's head plus a "search, don't slurp" directive, surfaced through the same one-voice `harness intervention: …` line as the thinking-budget cap, write-guard redirect, and turn-cap.
+### Added
+- **`read-guard` extension — trims a Read that would overflow the context window.** On the `tool_result` event, when a successful `read`'s content would push context usage past the window (`ctx.getContextUsage().tokens + estimate(result) > contextWindow`, estimated at the same 3.5 chars/token ratio as the thinking-budget cap), the harness replaces the result with **only the file's first 30 lines** followed by a message that explains the trim and directs the model to use those lines to understand the file's structure, then narrow down — locate what it needs with `grep`/`find` or a targeted `read` (`offset`/`limit`) — rather than re-reading the whole file. The full file is still read from disk (pi already caps that at ~2000 lines), but the oversized text never reaches the model's context because the result content is swapped before it lands. `tool_result` (not `tool_call`) is used precisely because it can deliver both the 30 lines and the explanation in one result — a `tool_call` block can only return a `reason` string, and mutating `input.limit` gives lines but no message. When current usage is unknown (e.g. right after compaction, `tokens` is null), it falls back to trimming any single read that alone exceeds half the window. Image reads and error results are left untouched. New extension at `.pi/extensions/read-guard/`, auto-discovered by the launcher.
+### Notes for upgraders
+- No CLI flag, `models.json` shape, `.pi/settings.json`, or per-model-profile schema changes. The new extension auto-loads like every other `.pi/extensions/*/index.ts`, and only changes behaviour when a read would otherwise overflow the context window — normal reads pass through untouched. The threshold reads pi's live `getContextUsage()`, so it scales with whatever context window the active model reports.
+---
 ## [v1.5.1] — 2026-05-22
 A branding release — no behaviour changes. little-coder now wears the v1.0 brand book: the warm **paper / ink / honey** palette (`#F2EBDC` · `#1A1410` · `#E15A1F`), the `lc▌` block-cursor mark, and IBM Plex Mono. The "ready to type" cursor is the punchline — it ties the CLI heritage into the identity without saying so.
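
The v1.6.0 entry above tells the model to search first and then read only a narrow range. The sketch below shows that workflow on an in-memory string. Both helpers are hypothetical stand-ins, not pi's `grep` or `read` tools, and the 1-based `offset` is an assumption made for this example; pi's actual `offset`/`limit` semantics are not described in the diff.

```typescript
// Hypothetical stand-in for a content search: 1-based numbers of matching lines.
function grepLines(text: string, pattern: RegExp): number[] {
  const hits: number[] = [];
  text.split("\n").forEach((line, i) => {
    // String.prototype.search ignores lastIndex, so a /g pattern is safe here.
    if (line.search(pattern) >= 0) hits.push(i + 1);
  });
  return hits;
}

// Hypothetical stand-in for a targeted read.
// ASSUMPTION: `offset` is the 1-based first line and `limit` is a line count.
function readRange(text: string, offset: number, limit: number): string {
  const start = Math.max(0, offset - 1);
  return text.split("\n").slice(start, start + limit).join("\n");
}

// A 2000-line file with one line of interest at line 1234.
const file = Array.from({ length: 2000 }, (_, i) =>
  i + 1 === 1234 ? "export function target() {}" : `// filler ${i + 1}`,
).join("\n");

const hits = grepLines(file, /function target/); // [1234]
const contextLines = 5;
// Read 5 lines before the hit, the hit itself, and 5 lines after: 11 lines total.
const excerpt = readRange(file, hits[0] - contextLines, 2 * contextLines + 1);
console.log(excerpt.split("\n").length); // 11
```

The excerpt here is 11 lines instead of 2000, which is the point of the directive: the search result costs a few tokens and the follow-up read stays well inside a small window.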

package/README.md CHANGED

@@ -1,9 +1,10 @@
+![little-coder — a coding agent for the laptop in front of you](assets/banner.svg)
 # little-coder
 **A coding agent tuned for small local models, built on top of [pi](https://pi.dev).**
-![little-coder — a coding agent for the laptop in front of you](assets/banner.svg)
 The research story behind all this — why scaffold–model fit matters, how a 9.7 B Qwen beat frontier entries on Aider Polyglot, and what the load-bearing mechanisms actually do — is written up on Substack: **[*Honey, I Shrunk the Coding Agent*](https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent)**. Start there if you want the "why"; stay here for the "how".
 ## How it relates to pi
@@ -298,10 +299,11 @@ The benchmarks harness (`benchmarks/`) is dev-only and not shipped with the npm
 little-coder/
 ├── .pi/
 │   ├── settings.json               # per-model profiles + benchmark_overrides (terminal_bench, gaia)
-│   └── extensions/                 # 21 TypeScript extensions, auto-discovered by pi
+│   └── extensions/                 # 23 TypeScript extensions, auto-discovered by pi
 │       ├── branding/               # little-coder startup header + terminal title (replaces pi's built-in)
 │       ├── llama-cpp-provider/     # data-driven provider registration from models.json — ships llamacpp, ollama, lmstudio (+ user override file)
 │       ├── write-guard/            # Write refuses on existing files; rewrites root-bare /foo.md paths to cwd
+│       ├── read-guard/             # trims a Read that would overflow the context window to its first 30 lines + a search-instead directive
 │       ├── extra-tools/            # glob, webfetch, websearch (pi ships grep/find)
 │       ├── skill-inject/           # per-turn tool-skill selection (error > recency > intent)
 │       ├── knowledge-inject/       # algorithm cheat-sheet scoring (word=1.0, bigram=2.0, threshold=2.0)

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "little-coder",
-  "version": "1.5.1",
+  "version": "1.6.0",
   "description": "A pi-based coding agent optimized for small local language models. Reproduces the whitepaper's scaffold-model-fit adaptations as pi extensions.",
   "homepage": "https://github.com/itayinbarr/little-coder",
   "repository": {