knit-mcp 0.10.0 → 0.11.4

package/README.md CHANGED
@@ -3,8 +3,8 @@
  <a href="https://github.com/PDgit12/knit/actions/workflows/ci.yml"><img src="https://img.shields.io/github/actions/workflow/status/PDgit12/knit/ci.yml?style=for-the-badge&label=CI&color=10b981" alt="CI" /></a>
  <img src="https://img.shields.io/badge/license-MIT-3b82f6?style=for-the-badge" alt="license" />
  <img src="https://img.shields.io/badge/node-%E2%89%A518-339933?style=for-the-badge&logo=node.js&logoColor=white" alt="node" />
- <img src="https://img.shields.io/badge/tests-492%20passing-22c55e?style=for-the-badge" alt="tests" />
- <img src="https://img.shields.io/badge/MCP%20tools-43-7c3aed?style=for-the-badge" alt="tools" />
+ <img src="https://img.shields.io/badge/tests-665%20passing-22c55e?style=for-the-badge" alt="tests" />
+ <img src="https://img.shields.io/badge/MCP%20tools-53-7c3aed?style=for-the-badge" alt="tools" />
  </p>
 
  <h1 align="center">🧶 knit</h1>
@@ -18,9 +18,10 @@
  <p align="center">
  <a href="#-quick-start">Quick start</a> ·
  <a href="#-what-knit-is">What it is</a> ·
- <a href="#-whats-new-in-v090">v0.9</a> ·
- <a href="#-43-mcp-tools">Tools</a> ·
- <a href="#-how-its-different">Comparison</a>
+ <a href="#-whats-new-in-v0110">v0.11</a> ·
+ <a href="#-52-mcp-tools">Tools</a> ·
+ <a href="#-how-its-different">Comparison</a> ·
+ <a href="#-honest-comparison-vs-memory-libraries">vs mem0/Letta</a>
  </p>
 
  ---
@@ -410,10 +411,52 @@ Compounding
 
  ---
 
+ ## 🧭 Honest comparison vs memory libraries
+
+ The mem0 / Letta / agentmemory comparison deserves a separate section because they're a different category — **memory-as-a-service libraries**, not MCP-native workflow layers. Reading their published benchmarks side-by-side:
+
+ | | mem0 | Letta (MemGPT) | agentmemory | **Knit** |
+ |--|---|---|---|---|
+ | **Published benchmark** | LOCOMO: 67–92% LLM-as-Judge; ~90% token reduction (1.7K vs 26K per conversation) | No head-to-head token-reduction number; "Letta Leaderboard" benchmarks *LLMs* on agentic memory, not Letta | LongMemEval-S: **95.2% R@5** with BM25+RRF+graph; 86.2% BM25-only | **Not yet measured.** Same architecture as agentmemory; no published number. |
+ | **Retrieval architecture** | Vector + graph (Mem0g variant) | OS-inspired tiered memory (core/recall/archival) | BM25 + local vectors + KG fused via RRF (k=60) | BM25 + RRF + graph-traversal (fused via RRF k=60). Per-project + cross-project diversity caps. |
+ | **Install shape** | SDK integration; managed cloud or self-hosted | SDK integration; self-hosted server | Python library | **`npx knit-mcp setup` → MCP server, zero glue.** Works with Claude Code / Cursor / Codex / any MCP host. |
+ | **Workflow primitive** | None — pure memory | Agent-managed memory operations | None — pure retrieval | **4-tier classifier + plan-mode + protocol guard + parallel team worktrees.** |
+ | **Self-calibration** | No | No | No | **Per-project classifier calibration** (v0.11): user FP feedback shifts thresholds; classifier gets less wrong over time. |
+
+ ### What's honest about this
+
+ **Knit's measured retrieval on a 50-question synthetic harness (v0.11.2):**
+
+ | Metric | Knit (v0.11.2 synthetic) | agentmemory (LongMemEval-S, published) |
+ |---|---|---|
+ | Top-1 accuracy | **86.0%** | not published in that form |
+ | Recall@5 | **96.0%** | **95.2%** |
+
+ Run it yourself: `npm run bench`. Source: [`benchmarks/retrieval-synthetic.ts`](./benchmarks/retrieval-synthetic.ts).
+
+ **These numbers are NOT apples-to-apples with agentmemory's.** Their benchmark is 1,500 questions from real long conversations; Knit's is 50 hand-authored questions on a 7KB synthetic corpus. The numbers are close because the architecture is similar (BM25 + RRF), not because we've proven parity at scale. **Real comparison requires running LongMemEval-S on Knit** — on the roadmap for v0.13.
+
+ **Knit isn't trying to be a better mem0.** It's a different product:
+ - **MCP-native + zero-glue install** — mem0/Letta require SDK integration; Knit drops into any MCP host (Claude Code, Cursor, Codex) with one command.
+ - **Workflow primitive** — the 4-tier classifier + plan-mode + protocol guard + team worktrees is what makes Knit a *command layer*, not a memory library.
+ - **Per-project classifier calibration** (v0.11 slice 4) — `knit_record_false_positive` with a direction tag shifts thresholds over time. Nobody else does this; nobody else needs to, because they're memory libraries, not workflow routers.
+ - **Measurable cheapness** — `knit_compounding_metrics` + `knit_get_metrics_history` make the "cheaper over time" claim *chartable per project*. mem0 publishes aggregate dataset numbers; Knit ships per-user instrumentation.
+
+ ### What's deferred
+
+ LongMemEval-S R@5/R@10 + LOCOMO LLM-as-Judge runs are on the roadmap (v0.13+). Until they're published, treat any cross-system token-savings comparison as architectural-claim-only.
+
+ ---
+
  ## 📜 Release history
 
  | Version | Headline |
  |---|---|
+ | **v0.11.4** | Dogfood audit · ran a full audit of Knit's own codebase using its own `knit_spawn_team_worktree` primitive (4 parallel teams: Core Logic, Infrastructure, UI, Quality Assurance). Fixes: HIGH `engram refresh` no longer clobbers user-curated CLAUDE.md (now uses `spliceKnitBlock` like `cache.ts`); `saveSource`/`loadSource` validate `sourceId`; `appendGlobalLearning` propagates write failures; `redactSecrets` applied to `label`/`tags`/`domains` across all persistence boundaries; 100KB response ceiling on `knit_generate_test_cases`; full v0.11 tool surface now documented in `workflow-protocol.ts` generator (was frozen at the v0.4 surface). Plus: 16 key tools reclassified with `[PROTOCOL]`/`[REVIEW]`/`[MEMORY]`/`[GRAPH]` prefixes so the LLM picks the right tool reliably. 53 tools, 687 tests. |
+ | **v0.11.3** | Propagation patch · `update_available` flag now surfaces in `knit_load_session` response (≈100% session reach vs. brain_status' low reach) + startup stderr nag on stale versions. Helps FUTURE upgrades land faster; doesn't retroactively reach v0.10.x users. 53 tools, 665 tests. |
+ | **v0.11.2** | Pre-publish polish · chunk cap (2000) + `errorResponse` envelope across handlers + CLAUDE.md generator surfaces v0.11 tools · new `engram doctor` install health-check CLI · upgrade-path smoke test caught + fixed a data-loss bug in cache.ts (Case B was wiping user permissions on upgrade) · 11 real exploit-payload integration tests prove C1/C2/H1 fixes hold · `npm run bench` ships a synthetic retrieval harness (50 Q&A) measuring 86% top-1 / 96% R@5. 53 tools, 664 tests. |
+ | **v0.11.1** | Audit-driven hardening · 3 CRITICAL (source_id path traversal, post-edit tsc shell injection, live calibration bug) + 10 HIGH fixes from a 5-agent audit, implemented in 3 parallel `knit_spawn_team_worktree` teams. HOOKS_VERSION 11 (auto-upgrades existing users). New `knit_delete_requirements` tool. Honest comparison vs mem0/Letta added. 53 tools, 636 tests. |
+ | **v0.11.0** | Verify Layer + auto-config foundation · mandatory `knit_verify_claim` REVIEW gate · post-edit diff verify + universal `tsc` check · drift detector · self-healing classifier (per-project calibration) · `knit_index_requirements` + `knit_generate_test_cases` (BM25 over long specs) · `knit_get_fingerprint` + `knit_infer_domains` + `knit_compose_template` (zero-config CLAUDE.md). 52 tools, 625 tests. |
  | **v0.10.0** | Token-economics release · risk × scope × change_kind classifier split · `context_budget_remaining` graceful degradation · per-project diversity cap on cross-project search · 11 new compounding-metrics fields + weekly snapshot persistence + `knit_get_metrics_history`. Makes "Knit makes Claude cheaper" a chartable number from day 1. |
  | **v0.9.0** | Hook-level enforcement · citation rule · `knit_verify_claim` · auto-search in classify · `suggested_reads` · `knit_get_learning` · `knit_consolidate_learnings`. |
  | **v0.8.x** | Vectorless RAG (BM25 + RRF) · graph-traversal retriever · per-project instruction tailoring · `knit_compounding_metrics` · integration scanner. |
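
Note on the README hunk above: the comparison table describes retrieval as several rankers "fused via RRF (k=60)". The package's own fusion code is not part of this diff, so the following is only a minimal sketch of standard Reciprocal Rank Fusion with that constant. The function name `reciprocalRankFusion` and the two sample rankings are hypothetical.

```javascript
// Reciprocal Rank Fusion: every ranker contributes 1 / (k + rank) to a
// document's score, where rank is 1-based. k = 60 is the value quoted in the
// README table; everything else here is illustrative.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId, score]) => ({ docId, score }));
}

// Hypothetical ranked lists from a lexical retriever and a graph retriever.
const bm25Ranking = ["doc-a", "doc-b", "doc-c"];
const graphRanking = ["doc-b", "doc-d", "doc-a"];

const fused = reciprocalRankFusion([bm25Ranking, graphRanking]);
console.log(fused.map((entry) => entry.docId)); // [ 'doc-b', 'doc-a', 'doc-d', 'doc-c' ]
```

Documents that appear in both lists rise to the top even when neither ranker put them first, which is the property that makes rank fusion usable without comparing raw scores across retrievers.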
@@ -0,0 +1,20 @@
+ import {
+   detectProjectRoot,
+   getBrain,
+   refreshBrain
+ } from "./chunk-I63UMEBF.js";
+ import "./chunk-HROSQ5MS.js";
+ import "./chunk-GATMQQK5.js";
+ import "./chunk-WKQHCLLO.js";
+ import "./chunk-MOOVNMIN.js";
+ import "./chunk-ST4X7LZT.js";
+ import "./chunk-M3YZOJNW.js";
+ import "./chunk-POXT5OYN.js";
+ import "./chunk-VB2TIR6L.js";
+ import "./chunk-7UFS67HP.js";
+ import "./chunk-27TA2ZQZ.js";
+ export {
+   detectProjectRoot,
+   getBrain,
+   refreshBrain
+ };
@@ -93,6 +93,18 @@ function searchMarkerPath(rootPath) {
  function claimMarkerPath(rootPath) {
    return join2(projectDataDir(rootPath), ".claim-verified-current");
  }
+ function turnEditLogPath(rootPath) {
+   return join2(projectDataDir(rootPath), ".turn-edits.jsonl");
+ }
+ function calibrationPath(rootPath) {
+   return join2(projectDataDir(rootPath), "calibration.json");
+ }
+ function requirementsDir(rootPath) {
+   return join2(projectDataDir(rootPath), "requirements");
+ }
+ function requirementSourcePath(rootPath, sourceId) {
+   return join2(requirementsDir(rootPath), `${sourceId}.json`);
+ }
  function featuresConfigPath(rootPath) {
    return join2(projectDataDir(rootPath), "features.json");
  }
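
Note on the hunk above: `requirementSourcePath` interpolates `sourceId` straight into a file name, and the release notes in this diff mention a `source_id` path-traversal fix plus `sourceId` validation in `saveSource`/`loadSource`. That validation code is not shown here, so the sketch below only illustrates one way such a guard might look. The helper name `safeRequirementSourcePath`, the allow-list pattern, and the 64-character limit are all assumptions.

```javascript
const path = require("node:path");

// Hypothetical allow-list: letters, digits, underscore, hyphen; 1 to 64 chars.
// The package's real rule is not visible in this diff.
const SOURCE_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

function safeRequirementSourcePath(requirementsDir, sourceId) {
  if (typeof sourceId !== "string" || !SOURCE_ID_PATTERN.test(sourceId)) {
    throw new Error(`Invalid sourceId: ${JSON.stringify(sourceId)}`);
  }
  const baseDir = path.resolve(requirementsDir);
  const candidate = path.resolve(baseDir, `${sourceId}.json`);
  // Defense in depth: the resolved file must sit directly inside baseDir.
  if (path.dirname(candidate) !== baseDir) {
    throw new Error(`sourceId escapes requirements directory: ${sourceId}`);
  }
  return candidate;
}

// A well-formed id resolves inside the directory.
console.log(safeRequirementSourcePath("/tmp/knit/requirements", "rfc-42"));

// A traversal attempt is rejected before any file access happens.
try {
  safeRequirementSourcePath("/tmp/knit/requirements", "../../etc/passwd");
} catch (error) {
  console.log(error.message);
}
```

Checking the id against an allow-list and then re-checking the resolved path covers both obvious `../` payloads and any separator the pattern might later be loosened to permit.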
@@ -150,6 +162,10 @@ export {
    sessionMarkerPath,
    searchMarkerPath,
    claimMarkerPath,
+   turnEditLogPath,
+   calibrationPath,
+   requirementsDir,
+   requirementSourcePath,
    featuresConfigPath,
    integrationsConfigPath,
    metricsHistoryPath,
@@ -53,7 +53,15 @@ function generateSessionStartup() {
 
  First action: call \`knit_load_session\`. One MCP call returns last sessions, handoff, learnings, false positives. If \`handoff.md\` exists at the repo root, resume that work first.
 
- Protocol Guard runs in \`warn\` mode by default \u2014 adjust with \`knit_set_protocol_strictness\`.`;
+ Protocol Guard runs in \`warn\` mode by default \u2014 adjust with \`knit_set_protocol_strictness\`.
+
+ ## v0.11 tool surface (in addition to query/search/record)
+
+ - **\`knit_verify_claim\`** \u2014 fact-check one claim against the knowledge graph before LEARN. Stop-hook enforces on standard/complex scope.
+ - **\`knit_index_requirements\` + \`knit_generate_test_cases\` + \`knit_list_requirements\` + \`knit_delete_requirements\`** \u2014 long-form spec / RFC ingestion (200KB doc \u2192 relevant 5\u20137KB chunks per feature query).
+ - **\`knit_get_fingerprint\` + \`knit_infer_domains\` + \`knit_compose_template\`** \u2014 auto-config primitives: detected stack \u2192 ranked domains \u2192 composed CLAUDE.md sections (preview only; you paste to accept).
+ - **\`knit_get_calibration\` + tag your false-positives** (e.g. \`#complex-was-trivial\`) \u2014 the per-project self-healing classifier tunes thresholds after 3 same-direction FPs.
+ - **\`knit_brain_status\`** surfaces calibration / requirements / fingerprint state so you can discover all of the above from one health check.`;
  }
  function generateProjectMap(knowledge) {
    const { summary } = knowledge;
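
Note on the hunk above: the generated startup text says the classifier "tunes thresholds after 3 same-direction FPs" when false positives are tagged (for example `#complex-was-trivial`). The calibration code itself is not in this diff, so the following is a rough sketch of that idea only. The step size, the starting threshold, the function name `calibrateThreshold`, and the second tag `trivial-was-complex` are hypothetical.

```javascript
// From the diff: a shift happens after 3 same-direction false positives.
const FP_TRIGGER_COUNT = 3;
// Assumption: how far one calibration step moves the threshold.
const THRESHOLD_STEP = 0.05;
// Assumption: which way each direction tag pushes the "complex" threshold.
const DIRECTION_SIGN = new Map([
  ["complex-was-trivial", +1], // over-escalated: raise the bar for "complex"
  ["trivial-was-complex", -1], // under-escalated: lower the bar (hypothetical tag)
]);

function calibrateThreshold(baseThreshold, falsePositiveTags) {
  const counts = new Map();
  for (const tag of falsePositiveTags) {
    const direction = tag.replace(/^#/, "");
    if (DIRECTION_SIGN.has(direction)) {
      counts.set(direction, (counts.get(direction) ?? 0) + 1);
    }
  }
  let shift = 0;
  for (const [direction, count] of counts) {
    // Only directions that reached the trigger count move the threshold.
    if (count >= FP_TRIGGER_COUNT) {
      shift += DIRECTION_SIGN.get(direction) * THRESHOLD_STEP;
    }
  }
  return Number((baseThreshold + shift).toFixed(2));
}

const twoReports = ["#complex-was-trivial", "#complex-was-trivial"];
const threeReports = [...twoReports, "#complex-was-trivial"];
console.log(calibrateThreshold(0.6, twoReports));   // 0.6  (below the trigger count)
console.log(calibrateThreshold(0.6, threeReports)); // 0.65 (three same-direction FPs)
```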
@@ -5,13 +5,13 @@ import {
    isBundledCore,
    knownAgents,
    rawAgentUrl
- } from "./chunk-7PPC6IG6.js";
+ } from "./chunk-ST4X7LZT.js";
  import {
    agentsCacheFile,
    projectAgentFile,
    projectAgentsDir,
    sessionsJsonlPath
- } from "./chunk-BW4JUY74.js";
+ } from "./chunk-27TA2ZQZ.js";
 
  // src/engine/install-agents.ts
  import { existsSync as existsSync2, mkdirSync as mkdirSync2, readFileSync as readFileSync2, writeFileSync as writeFileSync2 } from "fs";
@@ -62,7 +62,7 @@ async function fetchAgent(name, opts = {}) {
      throw new AgentFetchError(`Unknown agent: "${name}". Not in engram's registry.`);
    }
    const cachePath = agentsCacheFile(ref, cat, bare);
-   if (existsSync(cachePath)) {
+   if (!opts.refresh && existsSync(cachePath)) {
      return readFileSync(cachePath, "utf-8");
    }
    if (process.env.KNIT_OFFLINE === "1" || process.env.ENGRAM_OFFLINE === "1") {
@@ -250,7 +250,7 @@ async function installAgentsForProject(rootPath, config, knowledge, knowledgeBas
      }
    }
    try {
-     const baseMd = await fetchAgent(name, opts.refresh ? { ref: void 0 } : {});
+     const baseMd = await fetchAgent(name, opts.refresh ? { ref: void 0, refresh: true } : {});
      const relevant = knowledgeBase ? selectRelevantLearnings(knowledgeBase.entries, name) : [];
      const personalized = personalizeAgent(baseMd, {
        config,
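
Note on the two hunks above: together they thread a `refresh` option from `installAgentsForProject` into `fetchAgent`, so a refresh request no longer short-circuits on an existing cache file. The sketch below reproduces that pattern in isolation. It is a synchronous simplification with an in-memory `Map` standing in for the on-disk cache, and `getAgent` and `loadAgent` are hypothetical names.

```javascript
// In-memory stand-in for the on-disk agent cache (assumption for illustration).
const cache = new Map();
let loadCount = 0;

// Hypothetical loader; the real code fetches agent markdown over the network.
function loadAgent(name) {
  loadCount += 1;
  return `# ${name} (load ${loadCount})`;
}

function getAgent(name, opts = {}) {
  // Serve from cache only when the caller did not ask for a refresh.
  if (!opts.refresh && cache.has(name)) {
    return cache.get(name);
  }
  const body = loadAgent(name);
  cache.set(name, body);
  return body;
}

const first = getAgent("reviewer");                     // cache miss, loads
const second = getAgent("reviewer");                    // cache hit
const third = getAgent("reviewer", { refresh: true });  // bypasses the cache

console.log(first === second); // true
console.log(third);            // # reviewer (load 2)
```

Before the change, the equivalent of the third call would have returned the cached body, which matches the pre-0.11 behavior where a refresh could not replace an already cached agent.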
@@ -285,55 +285,6 @@ function agentsNeededByProject(config) {
    return Array.from(names);
  }
 
- // src/mcp/update-check.ts
- var REGISTRY_DIST_TAGS_URL = "https://registry.npmjs.org/-/package/knit-mcp/dist-tags";
- var FETCH_TIMEOUT_MS = 2e3;
- var CACHE_TTL_MS = 60 * 60 * 1e3;
- var cachedLatest = null;
- var lastCheckedAt = 0;
- var inFlight = null;
- function getCachedLatestVersion() {
-   if (Date.now() - lastCheckedAt > CACHE_TTL_MS) {
-     prewarmLatestVersion();
-   }
-   return cachedLatest;
- }
- function prewarmLatestVersion() {
-   if (inFlight) return;
-   if (Date.now() - lastCheckedAt < CACHE_TTL_MS && cachedLatest !== null) return;
-   inFlight = doFetch().finally(() => {
-     inFlight = null;
-   });
- }
- async function doFetch() {
-   const controller = new AbortController();
-   const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
-   try {
-     const res = await fetch(REGISTRY_DIST_TAGS_URL, { signal: controller.signal });
-     if (!res.ok) return;
-     const data = await res.json();
-     if (typeof data.latest === "string" && data.latest.length > 0) {
-       cachedLatest = data.latest;
-       lastCheckedAt = Date.now();
-     }
-   } catch {
-   } finally {
-     clearTimeout(timeout);
-   }
- }
- function isNewerVersion(latest, current) {
-   const parse = (v) => {
-     const stripped = v.replace(/[-+].*$/, "");
-     const parts = stripped.split(".").map((n) => parseInt(n, 10) || 0);
-     return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
-   };
-   const [a1, a2, a3] = parse(latest);
-   const [b1, b2, b3] = parse(current);
-   if (a1 !== b1) return a1 > b1;
-   if (a2 !== b2) return a2 > b2;
-   return a3 > b3;
- }
-
  // src/engine/sessions.ts
  import { existsSync as existsSync3, mkdirSync as mkdirSync3, appendFileSync, readFileSync as readFileSync3, statSync, writeFileSync as writeFileSync3, renameSync } from "fs";
  import { dirname as dirname2 } from "path";
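
Note on the hunk above: the update-check block is removed from this chunk, and where it now lives is not visible in this diff. Its `isNewerVersion` helper has two behaviors worth spelling out: it compares segments numerically rather than as strings, and it drops any pre-release or build suffix before comparing. The snippet below copies the removed function unchanged and exercises it with a few sample inputs chosen for illustration.

```javascript
// Copied from the removed block: compares major.minor.patch numerically and
// ignores anything after the first "-" or "+".
function isNewerVersion(latest, current) {
  const parse = (v) => {
    const stripped = v.replace(/[-+].*$/, "");
    const parts = stripped.split(".").map((n) => parseInt(n, 10) || 0);
    return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
  };
  const [a1, a2, a3] = parse(latest);
  const [b1, b2, b3] = parse(current);
  if (a1 !== b1) return a1 > b1;
  if (a2 !== b2) return a2 > b2;
  return a3 > b3;
}

// Numeric comparison: minor 11 is newer than minor 10.
console.log(isNewerVersion("0.11.4", "0.10.0")); // true
// Numeric, not lexicographic: minor 9 is older than minor 10.
console.log(isNewerVersion("0.9.0", "0.10.0")); // false
// Suffix is stripped, so a pre-release counts as equal to its release.
console.log(isNewerVersion("0.11.4", "0.11.4-beta.1")); // false
// Missing segments default to 0.
console.log(isNewerVersion("1.0", "0.99.99")); // true
```

One consequence of the suffix stripping is that a user on `0.11.4-beta.1` would not be told that `0.11.4` is available, which differs from Semantic Versioning precedence, where a pre-release sorts before its release.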
@@ -466,9 +417,6 @@ function parseLine(line) {
 
  export {
    installAgentsForProject,
-   getCachedLatestVersion,
-   prewarmLatestVersion,
-   isNewerVersion,
    appendSession,
    searchSessions,
    getRecentSessions,