npm - @toolkit-cli/toolkode - Versions diffs - 1.13.3 → 1.15.0 - Mend

@toolkit-cli/toolkode 1.13.3 → 1.15.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (2)

package/README.md +99 -2
package/package.json +6 -5

package/README.md CHANGED

@@ -35,6 +35,103 @@ toolkode --prompt "fix the failing tests"
 toolkode serve
 ```
+## What's new in v1.15 "Oracle"
+**200-pattern predictive failure analysis, compiled into Toolkode's own napi-rs binary.** v1.15 ships Toolkode's first real Rust hero feature: a failure-prediction engine that catches auth, data, async, API, state, database, security, and integration bugs *before* you write code. Sub-millisecond, local-only, no network calls, no telemetry.
+**`/foresight` — inline predictions.** Type a requirement and get ranked predictions with severity, confidence, prevention templates, test suggestions, and OWASP 2021 mappings.
+```
+/foresight "add jwt auth with refresh tokens"
+/predict "migrate user table to UUIDs"            # alias
+/oracle "add retry logic to payment webhook"      # alias
+```
+All three commands route to the same handler — the aliases are honored when typed directly into the prompt, not just via the command palette.
+**200 failure patterns across 8 categories:**
+| Category           | Patterns | Examples                                                      |
+| ------------------ | -------- | ------------------------------------------------------------- |
+| Authentication     | 25       | JWT `alg=none`, session fixation, MFA bypass, SAML sigs       |
+| Data Handling      | 30       | XXE, CSV formula injection, prototype pollution, PII leaks    |
+| Async / Concurrency | 25       | Race conditions, TOCTOU, deadlock, unhandled rejections       |
+| API                | 25       | Rate limit bypass, idempotency, CORS wildcard, filter injection |
+| State Management   | 20       | Stale closure, subscriber leak, optimistic drift              |
+| Database           | 25       | SQL injection, N+1, connection pool leak, schema drift        |
+| Security           | 30       | XSS, CSRF, SSRF, path traversal, insecure deserialization     |
+| Integration        | 20       | Webhook replay, schema drift, timeout cascade, missing DLQ    |
+| **Total**          | **200**  |                                                               |
+Every pattern ships as compiled Rust struct literals inside `@toolkit-cli/toolkode-native`: no JSON loading, no runtime fetches, no cloud. The pattern database, matching engine, regex compilation, and confidence scorer are all packaged in a single `.node` file per platform.
+**Mission pre-implementation advisory.** Before Mission spawns an implement worker, Toolkode runs foresight on the task requirement, using the project's tech stack and a context inferred from file paths (auth / database / api / async_concurrency / security). Critical and High predictions surface as advisory log entries tagged `foresight.advisory`. **Non-blocking by design**: this is a nudge, not a gate. Hard gates are planned for v1.18 "Verdict" and `/certainty`.
+**`@toolkit-cli/toolkode-native` umbrella package.** v1.14 scaffolded the napi-rs pipeline with a trivial `ping()`. v1.15 proves it at scale with ~3,700 lines of idiomatic Rust in the `.node` binary. Distribution is wired through a new `@toolkit-cli/toolkode-native` umbrella npm package with per-platform `optionalDependencies`: `@toolkit-cli/toolkode-native-darwin-arm64`, `@toolkit-cli/toolkode-native-darwin-x64`, `@toolkit-cli/toolkode-native-linux-x64`, `@toolkit-cli/toolkode-native-linux-arm64`, `@toolkit-cli/toolkode-native-windows-x64`. `npm i -g @toolkit-cli/toolkode@1.15.0` pulls the umbrella, the umbrella pulls the matching platform binary, and a runtime platform dispatcher in `index.js` resolves the correct `.node` at load time. Graceful fallback if the binary can't load — `/foresight` returns a friendly error instead of crashing.
+**Real tests for the FFI boundary.** v1.14's TS tests passed vacuously because every positive assertion early-returned when `Native.isAvailable()` was false — and it was always false in CI without a built `.node`. v1.15 fixes the harness: CI now runs `napi build --release` before `bun test`, an unskippable gate asserts `isAvailable() === true` in CI, and the positive tests actually exercise the Rust engine through the JSON-at-the-boundary FFI. The publish job depends on the test job — no green, no ship.
+**Hardened Rust release pipeline.** `toolkode-core`'s `.node` binary ships with an explicit release profile: `strip = "symbols"` removes all function and type names (6,933 → 1 exported symbol, the sole napi entry point), `lto = true` eliminates dead code and further shrinks the binary (~28% smaller), and CI sets `RUSTFLAGS="--remap-path-prefix=..."` to scrub absolute build paths and dependency source locations from panic metadata in `.rodata`. The pattern database strings are intentionally preserved as the IP moat; everything else that could leak host info or internal structure is stripped.
+### Sub-agent reliability & observability (lifecycle hardening)
+v1.15 also lands an architectural reliability pass on Toolkode's sub-agent orchestration. Previously, three separate models — background `task` subagents, `team` teammates, resumed sessions via `send_message` — each inferred "agent is doing useful work" from brittle signals like "process exists" or "session is busy." Silent failure was the main enemy: a spawned teammate that never acknowledged its task would appear "active" in the sidebar forever; a background subagent that went idle mid-task would silently stop with no operator-visible failure.
+**Shared lifecycle registry.** A new `AgentLifecycle` module gives every sub-agent an explicit state machine with 16 states (`created`, `queued`, `assigned`, `booting`, `ready`, `acknowledged`, `working`, `progressing`, `blocked`, `retrying`, `stalled`, `timed_out`, `failed`, `completed`, `abandoned`, `cancelled`), per-source budgets (ack / progress / heartbeat / blocked / max-age), and a persistent timeline + evidence record. Each orchestration path (`task.ts`, `team/index.ts`, `send-message.ts`, child `run.ts`) emits lifecycle events for every real signal: tool start, tool finish, meaningful text output, session status change, permission prompt, child exit.
+**Active watchdog.** Every 5 seconds, the watchdog evaluates each live agent against its budgets and produces a structured assessment (`healthy` / `warning` / `stalled` / `timed_out` / `orphaned`) with a recovery action (`nudge` / `rehydrate` / `restart` / `escalate`). Background subagents get one automatic rehydrate attempt; team teammates get one automatic restart for early no-ack boot failures (with proper SIGTERM-then-SIGKILL process hygiene, so a watchdog restart no longer leaves stray child processes behind). Beyond that, operators intervene via the inspector.
+**Background task completion visible to the parent LLM.** When a background task, a resumed subagent, or a team teammate completes, Toolkode now publishes a `TaskNotify` bus event that injects a `<system-reminder><task-notification>...</task-notification></system-reminder>` block into the parent session's next LLM turn. Before this, background results disappeared silently — the parent had to poll. Now the parent model sees the result (or the failure) as part of its context on the next step.
+**Inspector and sidebar show authoritative state.** The TUI agent inspector (`/agents` dialog) and the sidebar agents panel no longer derive status from `SessionStatus === busy`. They read the lifecycle registry directly and surface the 16-state status, watchdog reason, ack / progress / heartbeat ages, retry count, visibility reason, and the last 6 timeline events. Operators can see exactly why an agent is still visible and what state it's stuck in, not just "running" vs "idle".
+**Tests.** Three unit tests cover the lifecycle module (`test/agent/lifecycle.test.ts`: transition recording, no-ack timeout detection, heartbeat-without-progress stall detection), and seven integration tests cover the TaskNotify bus round trip (`test/session/task-notify.test.ts`: notify→pending→drain, XML formatting, error-block semantics, deduplication, per-session scoping, empty-format guard, manual acknowledge).
+### Buddies vs. Oracles
+> **Same-day release coincidence.** Anthropic shipped **Claude Buddies** the same day we shipped **Oracle**. We couldn't have scripted this better if we tried.
+>
+> **Buddies** are AI companions that keep you company while you code. They chat. They encourage. They're friendly. They're supportive. We're sure they're lovely.
+>
+> **Oracle** is 200 Rust-compiled failure patterns that tell you the SQL injection is coming before you type the first backtick. It doesn't chat. It doesn't encourage. It predicts failure, names the CWE class, cites the OWASP entry, and hands you the prevention template, in sub-millisecond time, on your laptop, with zero network calls.
+>
+> Two valid philosophies. One ships comfort; the other ships foresight. Both are helpful. Only one of them catches `alg: none` in your JWT middleware at 2 AM on a Sunday before the pager fires.
+>
+> We respect the craft. We also note that a buddy watching you write `eval(req.body.code)` will probably just say "you've got this!" while an oracle opens with `sec-028: Unsafe eval() / Function() on User Input — Critical — CWE-95 — stop`.
+>
+> Pick whichever fits your workflow. If you want both, they don't conflict. If you want neither, you're probably fine until production.
+>
+> _We'll send Anthropic a muffin basket. We ship Rust._
+### Upgrade notes
+- No breaking changes. No config changes.
+- The native binary is pulled automatically via `@toolkit-cli/toolkode-native`'s `optionalDependencies` on install.
+- Supported platforms: darwin-arm64, darwin-x64, linux-x64, linux-arm64, windows-x64.
+- If the native module fails to load on your platform, `/foresight` returns a graceful error instead of crashing. The rest of the TUI keeps working.
+- Foresight is 100% local. No network calls, no telemetry.
+## What's new in v1.14 "Mission Critical"
+A stability-and-polish release. v1.14 closes the last known OpenTUI streaming crash, finishes the `/compact` UX cleanup started in v1.13.2, restores the `/checkpoint` alias, and adds a 144-test suite around the artifact and compat layers that shipped in v1.13.3.
+**Streaming crash: fully guarded.** v1.13.1 wrapped the SSE listener in `batch()` and switched hot-path array mutations to `reconcile()`, which eliminated most "Child not found in children" crashes — but not all. v1.14 adds a renderer-level belt to the v1.13.1 suspenders: an idempotent monkey-patch on `TextNodeRenderable.prototype.remove` that makes the call a no-op when the child is already gone instead of throwing. Installed once at module load, guarded by a `Symbol.for` flag so repeat imports and HMR don't stack wrappers. No more mid-stream TUI death, period.
+**`/compact` is finally tidy.** The canonical `/compact` handler in the session route now: shows a "Compacting…" toast so you know the work kicked off, awaits the server response (which returns immediately — compaction runs fire-and-forget server-side and streams back over SSE), logs failures through the TUI logger instead of silently swallowing them, clears the dialog on the null-model warning path (which previously left it open), and reports errors via toast. The stub entry in `app.tsx` that v1.13.2 deleted stays deleted — there's exactly one `/compact` in the palette, and it's the right one.
+**`/checkpoint` alias restored.** `/rewind` responds to `/checkpoint` again, matching pre-v1.13.2 behavior. v1.13.2's dedup pass removed it as a side effect of killing a duplicate `/rewind` entry; it's back on the canonical session-route command.
+**144 new tests for the compat layer.** The Claude/Gemini/Codex artifact-compat code that landed in v1.13.3 now has real coverage: 144 tests across 7 files exercising the artifact util, path resolution, Zod schema validation (valid **and** invalid inputs), the artifact registry, and per-agent path derivation. All 144 pass, with 197 `expect()` calls, in under 1.5s on a cold run.
+**README cleanup.** Stale "targeted for v1.14.0" references are corrected — the compat feature actually shipped in v1.13.3 (commit `724c94b`). The `/checkpoint` restoration note is updated to reflect reality.
+### Upgrade notes
+- No breaking changes. No config changes. No new dependencies.
+- If you were working around the streaming crash by restarting the TUI, you can stop.
+- If you had `/checkpoint` muscle memory, it works again.
+- The compat tests run as part of `bun test` — nothing to configure.
 ## What's new in v1.13 "Dreamer" (v1.13.2 stable)
 **Memory that fades gracefully.** Working set, facts, goals, and corrections now decay on category-specific exponential curves — corrections stay weighted for a year, working-set notes for 30 minutes, steers for 2 hours. Stale memories stop crowding the prompt, fresh ones get priority, and nothing drops off a cliff.
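The category-specific decay described in the v1.13 line above can be illustrated with a short sketch. This is not Toolkode's implementation: the names `HALF_LIFE_MS`, `decayedWeight`, and `rankMemories` are hypothetical, and the sketch assumes the quoted durations (a year for corrections, 30 minutes for working-set notes, 2 hours for steers) act as half-lives, which the README does not state.

```typescript
// Hypothetical sketch of category-specific exponential memory decay.
// ASSUMPTION: the README's durations are treated as half-lives.
type MemoryCategory = "correction" | "working_set" | "steer";

const MINUTE_MS = 60_000;
const HOUR_MS = 60 * MINUTE_MS;
const DAY_MS = 24 * HOUR_MS;

const HALF_LIFE_MS: Record<MemoryCategory, number> = {
  correction: 365 * DAY_MS, // "stay weighted for a year"
  working_set: 30 * MINUTE_MS,
  steer: 2 * HOUR_MS,
};

interface MemoryItem {
  id: string;
  category: MemoryCategory;
  weight: number; // weight at creation time
  createdAt: number; // epoch ms
}

// weight(age) = initial * 2^(-age / halfLife): the weight halves once per
// half-life and never drops to zero abruptly.
function decayedWeight(category: MemoryCategory, initialWeight: number, ageMs: number): number {
  const age = Math.max(0, ageMs); // treat future timestamps (clock skew) as age 0
  return initialWeight * Math.pow(2, -age / HALF_LIFE_MS[category]);
}

// Order memories for prompt inclusion: highest current weight first.
function rankMemories(items: MemoryItem[], now: number): MemoryItem[] {
  const score = (m: MemoryItem) => decayedWeight(m.category, m.weight, now - m.createdAt);
  return [...items].sort((a, b) => score(b) - score(a));
}
```

Under these assumptions, a working-set note's weight is halved after 30 minutes while a correction of the same age has barely moved. A real implementation might also drop items below some weight floor; this sketch only ranks them.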
@@ -53,8 +150,8 @@ v1.13.0 shipped with a handful of bugs that a deep audit turned up. v1.13.1 was
 - **Fixed: `/compact` command produced no visible result.** The `/session/:id/summarize` server route was blocking its HTTP response on the full compaction LLM call (30s-120s), which tripped client-side timeouts and surfaced as "Compaction failed" even though the work was actually running. The route now kicks the loop off fire-and-forget (matching the existing `prompt_async` pattern) and returns 200 immediately. Clients observe progress and the summary message via SSE.
 - **Fixed: two `/compact` entries in the command palette.** A stub entry in `app.tsx` was shadowing the canonical session-route version. While we were there, we also removed duplicate `/rename`, `/copy`, `/export`, and `/rewind` entries that had the same shape collision, and killed a `"search"` alias on `/find` that was colliding with the semantic search command.
-- **Fixed: release build failed to resolve `./routes/compat`.** A Claude/Codex/Gemini artifact-compat feature was partially landed on main (import in `server/instance.ts`) without the corresponding `src/compat/` subtree committed. The import is reverted in v1.13.2 so the build is clean; the full compat feature is targeted for v1.14.0.
-- **Known minor regression**: the `/rewind` command's `checkpoint` alias was removed along with the duplicate entry. If you've been typing `/checkpoint`, use `/rewind` instead for now. The alias will be restored in v1.14.0 on the canonical session-route version.
+- **Fixed: release build failed to resolve `./routes/compat`.** A Claude/Codex/Gemini artifact-compat feature was partially landed on main (import in `server/instance.ts`) without the corresponding `src/compat/` subtree committed. The import is reverted in v1.13.2 so the build is clean; the full compat feature shipped in v1.13.3 (724c94b).
+- **Known minor regression**: the `/rewind` command's `checkpoint` alias was removed along with the duplicate entry. Restored in v1.14.0.
 ### v1.13.1 fix set (bundled into v1.13.2)
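The v1.15 lifecycle notes in this README diff describe per-agent budgets that a watchdog checks every 5 seconds to produce an assessment and a recovery action. The sketch below models that as a pure function. Everything in it is hypothetical: `AgentSnapshot`, `Budgets`, `assess`, and the rule ordering are illustrative assumptions, since the README names the budgets, assessments, and actions but not how conditions map to them, and the `blocked` budget is omitted for brevity. The two core rules mirror the README's test cases: no acknowledgement within the ack budget counts as timed out, and heartbeats that continue without progress count as stalled.

```typescript
// Hypothetical sketch of a budget-based watchdog assessment.
// Names, rule order, and the "warning" threshold are assumptions, not Toolkode's API.
type Assessment = "healthy" | "warning" | "stalled" | "timed_out" | "orphaned";
type Recovery = "none" | "nudge" | "rehydrate" | "restart" | "escalate";

interface Budgets {
  ackMs: number;
  progressMs: number;
  heartbeatMs: number;
  maxAgeMs: number;
}

interface AgentSnapshot {
  source: "task" | "team"; // background subagent vs. team teammate
  createdAt: number; // all timestamps are epoch ms
  ackedAt?: number;
  lastProgressAt?: number;
  lastHeartbeatAt?: number;
  parentAlive: boolean;
  autoRecoveriesUsed: number;
}

interface Verdict {
  assessment: Assessment;
  action: Recovery;
  reason: string;
}

// One automatic attempt per agent (rehydrate for background subagents,
// restart for teammates that never acked); after that, hand off to an operator.
function autoAction(a: AgentSnapshot, noAck: boolean): Recovery {
  if (a.autoRecoveriesUsed >= 1) return "escalate";
  if (a.source === "task") return "rehydrate";
  return noAck ? "restart" : "escalate";
}

function assess(a: AgentSnapshot, b: Budgets, now: number): Verdict {
  if (!a.parentAlive) {
    return { assessment: "orphaned", action: "escalate", reason: "parent session is gone" };
  }
  if (now - a.createdAt > b.maxAgeMs) {
    return { assessment: "timed_out", action: "escalate", reason: "exceeded max age" };
  }
  const ackedAt = a.ackedAt;
  if (ackedAt === undefined) {
    return now - a.createdAt > b.ackMs
      ? { assessment: "timed_out", action: autoAction(a, true), reason: "no ack within budget" }
      : { assessment: "healthy", action: "none", reason: "awaiting ack" };
  }
  const lastProgress = a.lastProgressAt ?? ackedAt;
  const lastBeat = a.lastHeartbeatAt ?? lastProgress;
  if (now - lastBeat > b.heartbeatMs) {
    return { assessment: "stalled", action: autoAction(a, false), reason: "no heartbeat" };
  }
  if (now - lastProgress > b.progressMs) {
    return { assessment: "stalled", action: autoAction(a, false), reason: "heartbeat without progress" };
  }
  if (now - lastProgress > b.progressMs / 2) {
    return { assessment: "warning", action: "nudge", reason: "progress is slowing" };
  }
  return { assessment: "healthy", action: "none", reason: "within budgets" };
}
```

A caller would run `assess` over every live agent from a timer with a 5,000 ms period, matching the cadence in the README. Keeping the decision pure makes the no-ack and heartbeat-without-progress cases easy to unit test with fixed timestamps.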

package/package.json CHANGED

@@ -12,12 +12,13 @@
   "scripts": {
     "postinstall": "bun ./postinstall.mjs || node ./postinstall.mjs"
   },
-  "version": "1.13.3",
+  "version": "1.15.0",
   "license": "SEE LICENSE IN LICENSE",
   "optionalDependencies": {
-    "@toolkit-cli/toolkode-linux-arm64": "1.13.3",
-    "@toolkit-cli/toolkode-linux-x64": "1.13.3",
-    "@toolkit-cli/toolkode-darwin-arm64": "1.13.3",
-    "@toolkit-cli/toolkode-windows-x64": "1.13.3"
+    "@toolkit-cli/toolkode-linux-arm64": "1.15.0",
+    "@toolkit-cli/toolkode-linux-x64": "1.15.0",
+    "@toolkit-cli/toolkode-darwin-arm64": "1.15.0",
+    "@toolkit-cli/toolkode-windows-x64": "1.15.0",
+    "@toolkit-cli/toolkode-native": "1.15.0"
   }
 }
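The new `@toolkit-cli/toolkode-native` entry above is the umbrella package that, per the README in this diff, pulls a per-platform binary through its own `optionalDependencies` and picks the right one at load time via a dispatcher in `index.js`, falling back gracefully if loading fails. The following is a minimal sketch of that pattern in TypeScript, not the package's actual `index.js`: `NATIVE_PACKAGES`, `nativePackageFor`, and `loadNative` are hypothetical names, and only the five package names come from the README. Node reports Windows as `win32`, so the sketch maps that onto the `windows-x64` package.

```typescript
// Hypothetical sketch of a per-platform native-binding dispatcher.
// Package names come from the README; the function names are illustrative only.
const NATIVE_PACKAGES: Record<string, string> = {
  "darwin-arm64": "@toolkit-cli/toolkode-native-darwin-arm64",
  "darwin-x64": "@toolkit-cli/toolkode-native-darwin-x64",
  "linux-x64": "@toolkit-cli/toolkode-native-linux-x64",
  "linux-arm64": "@toolkit-cli/toolkode-native-linux-arm64",
  // Node reports Windows as "win32", so map it onto the "windows-x64" package.
  "win32-x64": "@toolkit-cli/toolkode-native-windows-x64",
};

type LoadResult =
  | { ok: true; pkg: string; binding: unknown }
  | { ok: false; error: string };

function nativePackageFor(platform: string, arch: string): string | undefined {
  return NATIVE_PACKAGES[`${platform}-${arch}`];
}

// `requireFn` is injected so the same code works from CommonJS (`require`)
// or ESM (`createRequire(import.meta.url)`), and so it can be faked in tests.
function loadNative(
  requireFn: (id: string) => unknown,
  platform: string,
  arch: string,
): LoadResult {
  const pkg = nativePackageFor(platform, arch);
  if (pkg === undefined) {
    return { ok: false, error: `no native build for ${platform}-${arch}` };
  }
  try {
    return { ok: true, pkg, binding: requireFn(pkg) };
  } catch (err) {
    // Graceful fallback: report the failure instead of crashing the caller.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `could not load ${pkg}: ${message}` };
  }
}
```

In Node or Bun, a caller would pass `require` (or `createRequire(import.meta.url)` from `node:module`) together with `process.platform` and `process.arch`, and a command such as `/foresight` would turn an `ok: false` result into a user-facing error while the rest of the TUI keeps running.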