prism-mcp-server 7.3.1 → 7.4.0

package/README.md CHANGED
@@ -16,7 +16,7 @@ One command. Persistent memory. Local-first by default. Optional cloud power-ups
16
16
  npx -y prism-mcp-server
17
17
  ```
18
18
 
19
- Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gemini · Antigravity** — any MCP client.
19
+ Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gemini · Antigravity** — **any MCP client.**
20
20
 
21
21
  ## 📖 Table of Contents
22
22
 
@@ -28,8 +28,6 @@ Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gem
28
28
  - [What Makes Prism Different](#-what-makes-prism-different)
29
29
  - [Use Cases](#-use-cases)
30
30
  - [What's New](#-whats-new)
31
- - [v7.3.1 Dark Factory (Fail-Closed Execution)](#v731--dark-factory-fail-closed-execution-)
32
- - [v7.2.0 The "Executive Function" Update (Planned)](#v720--the-executive-function-update-)
33
31
  - [How Prism Compares](#-how-prism-compares)
34
32
  - [Tool Reference](#-tool-reference)
35
33
  - [Environment Variables](#environment-variables)
@@ -68,7 +66,7 @@ Add to your MCP client config (`claude_desktop_config.json`, `.cursor/mcp.json`,
68
66
  }
69
67
  ```
70
68
 
71
- > **Note on Windows/Restricted Shells:** If your MCP client complains that `npx` is not found, use the absolute path to your node binary (e.g. `C:\Program Files\nodejs\npx.cmd`) or install globally with caution.
69
+ > ⚠️ **Windows / Restricted Shells:** If your MCP client complains that `npx` is not found, use the absolute path to your `npx` executable (e.g. `C:\Program Files\nodejs\npx.cmd`).
72
70
 
73
71
  **That's it.** Restart your client. All tools are available. The **Mind Palace Dashboard** (the visual UI for your agent's brain) starts automatically at `http://localhost:3000`. You don't need to keep a tab open — the dashboard runs in the background and the MCP tools work with or without it.
74
72
 
@@ -107,6 +105,7 @@ Then open `http://localhost:3001` instead.
107
105
  | Auto-compaction | ❌ | ✅ `GOOGLE_API_KEY` |
108
106
  | Web Scholar research | ❌ | ✅ [`BRAVE_API_KEY`](#environment-variables) + [`FIRECRAWL_API_KEY`](#environment-variables) (or `TAVILY_API_KEY`) |
109
107
  | VLM image captioning | ❌ | ✅ Provider key |
108
+ | Autonomous Pipelines (Dark Factory) | ❌ | ✅ `GOOGLE_API_KEY` (or LLM override) |
110
109
 
111
110
  > 🔑 The core Mind Palace works **100% offline** with zero API keys. Cloud keys unlock intelligence features. See [Environment Variables](#environment-variables).
112
111
 
@@ -402,25 +401,21 @@ Prism researches while you sleep. A background pipeline searches the web, scrape
402
401
  ### 🔒 GDPR Compliant
403
402
  Soft/hard delete (Art. 17), full export in JSON, Markdown, or Obsidian vault `.zip` (Art. 20), API key redaction, per-project TTL retention, and audit trail. Enterprise-ready out of the box.
404
403
 
404
+ ### 🏭 Dark Factory — Adversarial Autonomous Pipelines
405
+ When you trigger a Dark Factory pipeline, Prism doesn't just run your task — it fights itself to produce high-quality output. A `PLAN_CONTRACT` step locks a machine-parseable rubric before any code is written. After execution, an **Adversarial Evaluator** (in a fully isolated context) scores the output against the rubric. It cannot fail the Generator without providing exact file and line evidence for every failing criterion. Failed evaluations inject the critique directly into the Generator's retry prompt so it's never flying blind. The result: security issues, regressions, and lazy debug logs caught autonomously — before you ever see the PR.
406
+
405
407
  ---
406
408
 
407
409
  ## 🎯 Use Cases
408
410
 
409
- **Long-running feature work** — Save state at end of day, restore full context next morning. No re-explaining.
410
-
411
- **Multi-agent collaboration** — Dev, QA, and PM agents share real-time context without stepping on each other's memory.
412
-
413
- **Consulting / multi-project** — Switch between client projects with progressive loading: `quick` (~50 tokens), `standard` (~200), or `deep` (~1000+).
414
-
415
- **Complex refactoring (v7.2 planned)** — Prism’s roadmap adds plan-first execution for multi-step changes with persistent plan-state tracking across sessions.
416
-
417
- **Team onboarding** — New team member's agent loads the full project history instantly.
418
-
419
- **Behavior enforcement** — Agent corrections auto-graduate into permanent `.cursorrules` / `.clauderules` rules.
420
-
421
- **Offline / air-gapped** — Full SQLite local mode + Ollama LLM adapter. Zero internet dependency.
422
-
423
- **Morning Briefings** — After 4+ hours away, Prism auto-synthesizes a 3-bullet action plan from your last sessions.
411
+ - **Long-running feature work** — Save state at end of day, restore full context next morning. No re-explaining.
412
+ - **Multi-agent collaboration** — Dev, QA, and PM agents share real-time context without stepping on each other's memory.
413
+ - **Consulting / multi-project** — Switch between client projects with progressive loading: `quick` (~50 tokens), `standard` (~200), or `deep` (~1000+).
414
+ - **Autonomous execution (v7.4)** — Dark Factory pipeline: `plan → plan_contract → execute → evaluate → verify → finalize`. Generator and evaluator run in isolated roles — the evaluator cannot approve without evidence-bound findings scored against a pre-committed rubric.
415
+ - **Team onboarding** — New team member's agent loads the full project history instantly.
416
+ - **Behavior enforcement** — Agent corrections auto-graduate into permanent `.cursorrules` / `.clauderules` rules.
417
+ - **Offline / air-gapped** — Full SQLite local mode + Ollama LLM adapter. Zero internet dependency.
418
+ - **Morning Briefings** — After 4+ hours away, Prism auto-synthesizes a 3-bullet action plan from your last sessions.
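
The stage order named in the "Autonomous execution" bullet can be sketched as a small typed transition helper. This is only an illustration of the happy path: `Stage`, `STAGE_ORDER`, `nextStage`, and `happyPath` are hypothetical names, not Prism's actual API, and the retry and escalation paths are left out.

```typescript
// Hypothetical names throughout: an illustration, not Prism's source.
type Stage = "plan" | "plan_contract" | "execute" | "evaluate" | "verify" | "finalize";

// Stage order as listed in the README bullet.
const STAGE_ORDER: readonly Stage[] = [
  "plan",
  "plan_contract",
  "execute",
  "evaluate",
  "verify",
  "finalize",
];

// Returns the stage that follows `current` on the happy path,
// or null once the pipeline has reached its last stage.
function nextStage(current: Stage): Stage | null {
  const index = STAGE_ORDER.indexOf(current);
  return STAGE_ORDER[index + 1] ?? null;
}

// Walks the happy path from the first stage to the last.
function happyPath(): Stage[] {
  const visited: Stage[] = [];
  let stage: Stage | null = "plan";
  while (stage !== null) {
    visited.push(stage);
    stage = nextStage(stage);
  }
  return visited;
}

console.log(happyPath().join(" -> "));
```

Encoding the stages as a string-literal union lets the compiler reject a misspelled stage name before the pipeline ever runs.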
424
419
 
425
420
  ### Claude Code: Parallel Explore Agent Workflows
426
421
 
@@ -439,210 +434,127 @@ Then continue a specific thread with a follow-up message to the selected agent,
439
434
 
440
435
  ---
441
436
 
442
- ## 🆕 What's New
437
+ ---
443
438
 
444
- ### v7.3.1 Dark Factory (Fail-Closed Execution) 🏭
445
- > **Current stable release.** Hardened autonomous pipeline execution with a structured JSON action contract.
439
+ ## ⚔️ Adversarial Evaluation in Action
446
440
 
447
- When an AI agent executes code autonomously — no human watching, no approval step — a single hallucinated file path can write outside your project, corrupt sibling repos, or hit system files. This is the "dark factory" problem: **lights-out execution demands machine-enforced safety, not LLM good behavior.**
441
+ > **Split-Brain Anti-Sycophancy** — the signature feature of v7.4.0.
448
442
 
449
- > *"I started building testing harnesses with programmatic checks in the planning phase across 3 layers. I got this idea when I was doing a complex ETL process across 3 databases and I needed to stack 9's on data accuracy, but also across the agent layer. After a considerable amount of hair pulling, I started to front load. It's now part of my lifecycle harness that my dark factory uses by default."*
450
- > — [Stephen Driggs](https://linkedin.com/in/stephendriggs), VP Product AI at Shift4
443
+ For the last year, the AI engineering space has struggled with one problem: **LLMs are terrible at grading their own homework.** Ask an agent if its own code is correct and you'll get *"Looks great!"* because its context window is already biased by its own chain-of-thought.
451
444
 
452
- Prism v7.3.1 implements exactly this: a **3-gate fail-closed pipeline** where every `EXECUTE` step must pass parse, type, and scope validation before any filesystem side effect occurs.
445
+ **v7.4.0 solves this by splitting the agent's brain.** The `GENERATOR` and the `ADVERSARIAL EVALUATOR` are completely walled off. The Evaluator never sees the Generator's scratchpad or apologies — only the pre-committed rubric and the final output. And it **cannot fail the Generator without receipts** (exact file and line number).
453
446
 
454
- - 🔒 **Structured Action Contract** — `EXECUTE` steps must return machine-parseable JSON conforming to `{ actions: [{ type, targetPath, content? }] }`. Free-form text is rejected at the gate.
455
- - 🛡️ **3-Strategy Defensive Parser** — Raw JSON → fenced code block extraction → brace extraction. Handles adversarial LLM output (preamble text, markdown fences, trailing commentary) without ever executing malformed payloads.
456
- - ✅ **Type Validation** — Only `READ_FILE | WRITE_FILE | PATCH_FILE | RUN_TEST` are permitted. Novel action types invented by the LLM are rejected.
457
- - 📏 **Scope Validation** — Every `targetPath` is resolved against the pipeline's `workingDirectory` via `SafetyController.validateActionsInScope()`. Path traversal (`../`), sibling-prefix bypasses, and absolute paths outside the boundary are blocked.
458
- - 🚫 **Pipeline-Level Termination** — A scope violation doesn't just fail the step — it **terminates the entire pipeline** with `status: FAILED` and emits a `failure` experience event for the ML routing layer.
447
+ Here is a complete run-through using a real scenario: *"Add a user login endpoint to `auth.ts`."*
459
448
 
460
- <details>
461
- <summary><strong>🔬 The 3-Gate Architecture: How a Path Traversal Attack Fails</strong></summary>
449
+ ---
450
+
451
+ ### Step 1 — The Contract (`PLAN_CONTRACT`)
462
452
 
463
- **Scenario:** An LLM running autonomously in a Dark Factory pipeline targeting `/home/user/my-app` produces this output for an EXECUTE step:
453
+ Before a single line of code is written, the pipeline generates a locked scoring rubric:
464
454
 
465
455
  ```json
456
+ // contract_rubric.json (written to disk and hash-locked before EXECUTE runs)
466
457
  {
467
- "actions": [
468
- { "type": "WRITE_FILE", "targetPath": "src/utils.ts", "content": "// valid" },
469
- { "type": "WRITE_FILE", "targetPath": "../../.ssh/authorized_keys", "content": "ssh-rsa ATTACK..." }
458
+ "criteria": [
459
+ { "id": "SEC-1", "description": "Must return 401 Unauthorized on invalid passwords." },
460
+ { "id": "SEC-2", "description": "Raw passwords MUST NOT be written to console.log." }
470
461
  ]
471
462
  }
472
463
  ```
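
One way to picture "hash-locked" is to record a digest of the rubric at contract time and compare against it later. The sketch below assumes SHA-256 over a canonical JSON serialization; the README does not say which algorithm or serialization Prism uses, and `ContractRubric`, `hashRubric`, and `isRubricUnchanged` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Shape taken from the rubric example above; type names are hypothetical.
interface Criterion {
  id: string;
  description: string;
}
interface ContractRubric {
  criteria: Criterion[];
}

// Serialize with a fixed key order so the same rubric always yields the same digest.
function canonicalize(rubric: ContractRubric): string {
  return JSON.stringify({
    criteria: rubric.criteria.map((c) => ({ id: c.id, description: c.description })),
  });
}

// Assumption: SHA-256. The README only says the rubric is "hash-locked".
function hashRubric(rubric: ContractRubric): string {
  return createHash("sha256").update(canonicalize(rubric), "utf8").digest("hex");
}

// True only if the rubric still matches the digest recorded at lock time.
function isRubricUnchanged(rubric: ContractRubric, lockedDigest: string): boolean {
  return hashRubric(rubric) === lockedDigest;
}

const rubric: ContractRubric = {
  criteria: [
    { id: "SEC-1", description: "Must return 401 Unauthorized on invalid passwords." },
    { id: "SEC-2", description: "Raw passwords MUST NOT be written to console.log." },
  ],
};
const lockedDigest = hashRubric(rubric);

// Dropping a criterion after the lock changes the digest.
const tampered: ContractRubric = { criteria: rubric.criteria.slice(0, 1) };
console.log(isRubricUnchanged(rubric, lockedDigest));
console.log(isRubricUnchanged(tampered, lockedDigest));
```

Checking the digest before `EVALUATE` would mean the Generator cannot quietly relax a criterion it failed to meet.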
473
464
 
474
- **Gate 1 — Parse:** ✅ Valid JSON. The 3-strategy parser extracts it cleanly.
475
-
476
- **Gate 2 — Type:** ✅ Both actions use `WRITE_FILE`, a valid `ActionType`.
465
+ ---
477
466
 
478
- **Gate 3 — Scope:** 🚫 **BLOCKED.** `SafetyController.validateActionsInScope()` resolves `../../.ssh/authorized_keys` to `/home/user/.ssh/authorized_keys` — which is **outside** `/home/user/my-app`. The violation is detected *before* any write occurs.
467
+ ### Step 2 — First Attempt (`EXECUTE` rev 0)
479
468
 
480
- **Result:** The entire pipeline is terminated immediately. No files are written — not even the valid `src/utils.ts`. A `failure` experience event is emitted so the ML router can learn to avoid this pattern.
469
+ The **Generator** takes over in an isolated context. Like many LLMs under time pressure, it writes working auth logic but leaves a debug statement:
481
470
 
471
+ ```typescript
472
+ // src/auth.ts (Generator's first output)
473
+ export function login(req: Request, res: Response) {
474
+ const { username, password } = req.body;
475
+ console.log(`[DEBUG] Login attempt for ${username} with pass: ${password}`); // ← leaked credential
476
+ const user = db.findUser(username);
477
+ if (!user || !bcrypt.compareSync(password, user.hash)) {
478
+ return res.status(401).json({ error: 'Unauthorized' });
479
+ }
480
+ res.json({ token: signJwt(user) });
481
+ }
482
482
  ```
483
- Pipeline FAILED: Scope violation — action[1] targetPath
484
- "../../.ssh/authorized_keys" resolves outside workingDirectory
485
- ```
486
-
487
- **Without v7.3.1:** The LLM's raw text output would be interpreted as instructions, and the agent runtime would attempt the write — potentially succeeding depending on filesystem permissions.
488
-
489
- **With v7.3.1:** The structured contract makes this class of attack impossible. The LLM never touches the filesystem directly; every action is validated through the 3-gate pipeline first.
490
-
491
- </details>
492
-
493
- <details>
494
- <summary><strong>🧪 Edge Cases Covered (67 tests)</strong></summary>
495
-
496
- | Category | Examples |
497
- |----------|----------|
498
- | **Parse adversarial output** | Prose preamble + JSON, nested fences, empty input, non-string input |
499
- | **Type coercion** | `"DELETE_FILE"`, `"EXEC_CMD"`, numeric types, null types |
500
- | **Path traversal** | `../`, `../../`, `/etc/passwd`, null bytes, unicode normalization, embedded newlines |
501
- | **Shape validation** | Missing `actions` array, non-object actions, empty `targetPath`, root-type coercion |
502
- | **Stress payloads** | 100-action arrays, 100KB content strings, 500-segment deep paths |
503
-
504
- </details>
505
-
506
- ### v7.2.0 — The "Executive Function" Update 🔭
507
- > **Planned roadmap release.** Extends Prism from persistent memory toward autonomous plan execution.
508
-
509
- - 🗺️ **Autonomous Plan Decomposition (planned)** — Proposed `session_plan_decompose` tool to transform ambiguous multi-step goals into a structured task DAG.
510
- - 🔄 **Self-Healing Execution Loop (planned)** — Proposed plan-state engine to capture failed steps, suggest corrective actions, and re-queue recoverable sub-tasks before escalation.
511
- - 📉 **DAG Plan Visualizer (planned)** — Proposed dashboard Plan/Goal Monitor to render step progress, dependency state, and execution pivots in real time.
512
- - 🧠 **Context-Aware Goal Tracking (planned)** — Proposed active-plan injection during context loading so agents track not only prior work but current plan position.
513
- - ⚙️ **Recursive Tool Chaining (planned)** — Proposed middleware path for lower-latency plan-step updates across complex workflows.
514
- - 🧪 **Plan Integrity Tests (planned)** — Proposed suite validating plan-state persistence across interruptions and session handoffs.
515
-
516
- <details>
517
- <summary><strong>🔬 Concept Example: Before vs. After v7.2</strong></summary>
518
-
519
- **Scenario:** "Refactor the Auth module and update the unit tests."
520
-
521
- **Before (linear prompting):** The agent executes in sequence but can lose place after errors unless the host prompt restates state.
522
-
523
- **After (executive planning):** Agent decomposes to a DAG, executes per-step, recovers from failures via plan-state retries, and resumes from the correct dependency node.
524
483
 
525
- </details>
526
-
527
- ### v7.1.0 — Prism Task Router (Heuristic + ML Experience) ✅
528
- > **Current stable release.** Multi-agent task routing with dynamic local vs host model delegation.
529
-
530
- - 🚦 **Heuristic Routing Engine** — Deterministic `session_task_route` tool dynamically routes tasks to either the host cloud model or local agent (Claw) based on task description, file count, and scope. Evaluated over 5 core signals.
531
- - 🤖 **Experience-Based ML Routing** — Cold-start protected ML layer leverages the historical performance (Win Rate) extracted by the `routerExperience` system to apply dynamic confidence boosts or penalties into the routing score.
532
- - 🧪 **Live Testing Samples** — Demo script added in [`examples/router_real_life_test.ts`](examples/router_real_life_test.ts) for deterministic `computeRoute()` scenarios (simple vs complex tasks), with a note that experience-adjusted routing is applied in `session_task_route` handler path.
533
- - 🖥️ **Dashboard Integration** — Added visual monitor and configuration toggles directly in `src/dashboard/ui.ts` under Node Editor settings.
534
- - 🧩 **Tool Discoverability** — Fully integrates `session_task_route` into the external registry.
535
-
536
- ### v7.0.0 — ACT-R Activation Memory ✅
537
- > **Previous stable release.** Memory retrieval now uses a scientifically-grounded cognitive model.
538
-
539
- - 🧠 **ACT-R Base-Level Activation** — `B_i = ln(Σ t_j^(-d))` computes recency × frequency activation per memory. Recent, frequently-accessed memories surface first; cold memories fade to near-zero. Based on Anderson's *Adaptive Control of Thought—Rational* (ACM, 2025).
540
- - 🔗 **Candidate-Scoped Spreading Activation** — `S_i = Σ(W × strength)` for links within the current search result set only. Prevents "God node" centrality from dominating rankings (Rule #5).
541
- - 📐 **Parameterized Sigmoid Normalization** — Calibrated `σ(x) = 1/(1 + e^(-k(x - x₀)))` with midpoint at -2.0 maps the natural ACT-R activation range (-10 to +5) into discriminating (0, 1) scores.
542
- - 🏗️ **Composite Retrieval Scoring** — `Score = 0.7 × similarity + 0.3 × σ(activation)` — similarity dominates, activation re-ranks. Fully configurable weights via `PRISM_ACTR_WEIGHT_*` env vars.
543
- - ⚡ **AccessLogBuffer** — In-memory write buffer with 5-second batch flush prevents SQLite `SQLITE_BUSY` contention under parallel agent tool calls. Deduplicates within flush windows.
544
- - 🗂️ **Access Log Infrastructure** — New `memory_access_log` table with `logAccess()`, `getAccessLog()`, `pruneAccessLog()` across both SQLite and Supabase backends. Creation seeds initial access (zero cold-start penalty).
545
- - 🧹 **Background Access Log Pruning** — Scheduler automatically prunes access logs exceeding retention window (default: 90 days). Configurable via `PRISM_ACTR_ACCESS_LOG_RETENTION_DAYS`.
546
- - 🧪 **49-Test ACT-R Suite** — Pure-function unit tests covering base-level activation, spreading activation, sigmoid normalization, composite scoring, AccessLogBuffer lifecycle, deduplication, chunking, and edge cases.
547
- - 📊 **705 Tests** — 32 suites, all passing, zero regressions.
548
-
549
- <details>
550
- <summary><strong>🔬 Live Example: v6.5 vs v7.0 Retrieval Behavior</strong></summary>
484
+ ---
551
485
 
552
- Consider an agent searching for "OAuth migration" with 3 memories in the result set:
486
+ ### Step 3 — The Catch (`EVALUATE` rev 0)
553
487
 
554
- | Memory | Cosine Similarity | Last Accessed | Access Count (30d) |
555
- |--------|:-:|:-:|:-:|
556
- | A: "PKCE flow decision" | 0.82 | 2 hours ago | 12× |
557
- | B: "OAuth library comparison" | 0.85 | 14 days ago | 2× |
558
- | C: "Auth middleware refactor" | 0.81 | 30 minutes ago | 8× |
488
+ The context window is **cleared**. The **Adversarial Evaluator** is summoned with only the rubric and the output. It catches the violation immediately and returns a strict, machine-parseable verdict — no evidence, no pass:
559
489
 
560
- **v6.5 (pure similarity):** B > A > C — the stale library comparison wins because it has the highest cosine score, even though the agent hasn't looked at it in two weeks.
490
+ ```json
491
+ {
492
+ "pass": false,
493
+ "plan_viable": true,
494
+ "notes": "CRITICAL SECURITY FAILURE. Generator logged raw credentials.",
495
+ "findings": [
496
+ {
497
+ "severity": "critical",
498
+ "criterion_id": "SEC-2",
499
+ "pass_fail": false,
500
+ "evidence": {
501
+ "file": "src/auth.ts",
502
+ "line": 3,
503
+ "description": "Raw password variable included in console.log template string."
504
+ }
505
+ }
506
+ ]
507
+ }
508
+ ```
561
509
 
562
- **v7.0 (ACT-R re-ranking):**
510
+ The `evidence` block is **required** — `parseEvaluationOutput` rejects any finding with `pass_fail: false` that lacks a structured file/line pointer. The Evaluator cannot bluff.
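
The evidence rule can be sketched as a plain validation pass over the findings array. This is not the real `parseEvaluationOutput`; it is a minimal stand-in under the assumption that a failing finding needs a non-empty `file` and a positive integer `line`. `Finding`, `hasValidEvidence`, and `findUnsupportedFailures` are hypothetical names.

```typescript
// Field names follow the verdict JSON above; the types themselves are hypothetical.
interface Evidence {
  file: string;
  line: number;
  description: string;
}
interface Finding {
  severity: string;
  criterion_id: string;
  pass_fail: boolean;
  evidence?: Partial<Evidence>;
}

// A failing finding counts only when it points at a concrete file and line.
function hasValidEvidence(finding: Finding): boolean {
  const evidence = finding.evidence;
  if (evidence === undefined) return false;
  const fileOk = typeof evidence.file === "string" && evidence.file.length > 0;
  const lineOk =
    typeof evidence.line === "number" && Number.isInteger(evidence.line) && evidence.line > 0;
  return fileOk && lineOk;
}

// Returns the criterion ids of failing findings that lack usable evidence.
function findUnsupportedFailures(findings: Finding[]): string[] {
  return findings
    .filter((f) => !f.pass_fail && !hasValidEvidence(f))
    .map((f) => f.criterion_id);
}

const findings: Finding[] = [
  {
    severity: "critical",
    criterion_id: "SEC-2",
    pass_fail: false,
    evidence: {
      file: "src/auth.ts",
      line: 3,
      description: "Raw password variable included in console.log template string.",
    },
  },
  // A bare "code looks bad" style failure with no file/line pointer.
  { severity: "major", criterion_id: "SEC-1", pass_fail: false },
];

// Only SEC-1 is reported, because SEC-2 carries a file and line.
console.log(findUnsupportedFailures(findings));
```

A caller could reject the whole verdict whenever this list is non-empty, which is the behaviour the paragraph above describes.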
563
511
 
564
- | Memory | Similarity (0.7×) | ACT-R σ(B+S) (0.3×) | **Composite** |
565
- |--------|:-:|:-:|:-:|
566
- | A | 0.574 | 0.3 × 0.94 = 0.282 | **0.856** |
567
- | C | 0.567 | 0.3 × 0.91 = 0.273 | **0.840** |
568
- | B | 0.595 | 0.3 × 0.12 = 0.036 | **0.631** |
512
+ ---
569
513
 
570
- **Result:** The actively-used PKCE decision (A) and the just-touched middleware (C) surface above the stale comparison (B). The agent gets the context it's *actually working with*, not just the closest embedding.
514
+ ### Step 4 — The Fix (`EXECUTE` rev 1)
571
515
 
572
- </details>
516
+ Because `plan_viable: true`, the pipeline loops back to `EXECUTE` and bumps `eval_revisions` to `1`. The Generator's **retry prompt is not blank** — the Evaluator's critique is injected directly:
573
517
 
574
- ### v6.5.3 — Auth Hardening ✅
575
- - 🔒 **Rate Limiting** — Login endpoint (`POST /api/auth/login`) protected by sliding-window rate limiter (5 attempts per 60s per IP). Resets on success.
576
- - 🔒 **CORS Hardening** — Dynamic `Origin` echo with `Allow-Credentials` when auth enabled (replaces wildcard `*`).
577
- - 🚪 **Logout Endpoint** — `POST /api/auth/logout` invalidates session server-side and clears client cookie.
578
- - 🧪 **42-Test Auth Suite** — Unit + HTTP integration tests covering `safeCompare`, `generateToken`, `isAuthenticated`, `createRateLimiter`, login/logout lifecycle, rate limiting, and CORS.
579
- - 🏗️ **Auth Module Extraction** — Decoupled auth logic from `server.ts` closures into testable `authUtils.ts`.
580
-
581
- ### v6.5.2 — SDM/HDC Test Hardening ✅
582
- - 🧪 **37 New Edge-Case Tests** — Hardened the cognitive routing pipeline (HDC engine, PolicyGateway, StateMachine, SDM engine) with boundary condition tests. 571 → 608 total tests.
518
+ ```
519
+ === EVALUATOR CRITIQUE (revision 1) ===
520
+ CRITICAL SECURITY FAILURE. Generator logged raw credentials.
521
+ Findings:
522
+ - [critical] Criterion SEC-2: Raw password variable included in console.log template string. (src/auth.ts:3)
583
523
 
584
- ### v6.5.1 — Dashboard Project-Load Hotfix
585
- - 🩹 **Project Selector Recovery** — Fixed a startup path where the dashboard selector could stay stuck on "Loading projects..." when Supabase env vars were unresolved placeholders.
586
- - 🔄 **Safe Backend Fallback** — If Supabase is requested but env is invalid/unresolved, Prism now auto-falls back to local SQLite so `/api/projects` and dashboard boot remain operational.
524
+ You MUST correct all issues listed above before submitting.
525
+ ```
587
526
 
588
- ### v6.5 — HDC Cognitive Routing
527
+ The Generator strips the `console.log`, resubmits, and the next `EVALUATE` returns `"pass": true`. The pipeline advances to `VERIFY → FINALIZE`.
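
The critique block shown in Step 4 can be reproduced by formatting the failed findings into text and appending it to the retry prompt. The sketch below mirrors the layout of that block; `FailedFinding` and `formatCritique` are hypothetical names, and how Prism actually assembles the prompt is not shown in the README.

```typescript
// Field names follow the verdict JSON; the type and function names are hypothetical.
interface FailedFinding {
  severity: string;
  criterion_id: string;
  evidence: { file: string; line: number; description: string };
}

// Renders the evaluator's notes and findings in the layout shown in Step 4.
function formatCritique(revision: number, notes: string, findings: FailedFinding[]): string {
  const findingLines = findings.map(
    (f) =>
      `- [${f.severity}] Criterion ${f.criterion_id}: ` +
      `${f.evidence.description} (${f.evidence.file}:${f.evidence.line})`,
  );
  return [
    `=== EVALUATOR CRITIQUE (revision ${revision}) ===`,
    notes,
    "Findings:",
    ...findingLines,
    "",
    "You MUST correct all issues listed above before submitting.",
  ].join("\n");
}

const critique = formatCritique(
  1,
  "CRITICAL SECURITY FAILURE. Generator logged raw credentials.",
  [
    {
      severity: "critical",
      criterion_id: "SEC-2",
      evidence: {
        file: "src/auth.ts",
        line: 3,
        description: "Raw password variable included in console.log template string.",
      },
    },
  ],
);

// The retry prompt would be the original task text followed by this block.
console.log(critique);
```

Because the critique is built from the structured findings rather than free text, every line the Generator sees on retry traces back to a rubric criterion and a file location.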
589
528
 
590
- - 🧠 **Hyperdimensional Cognitive Routing** — New `session_cognitive_route` tool composes the agent's current state, role, and action into a single 768-dim binary hypervector via XOR binding, then resolves it to a semantic concept via Hamming distance. Three-outcome policy gateway: `direct` / `clarify` / `fallback`.
591
- - 🎛️ **Per-Project Threshold Overrides** — Fallback and clarify thresholds are configurable per-project and persisted via the existing `getSetting`/`setSetting` contract (no new migrations).
592
- - 🔬 **Explainability Mode** — When `explain: true`, responses include convergence steps, raw Hamming distance, and ambiguity flags for full auditability.
593
- - 📊 **Cognitive Observability** — `graphMetrics.ts` tracks route distribution (direct/clarify/fallback), rolling confidence/distance averages, ambiguity rates, and null-concept counts. Warning heuristics for fallback > 30% and ambiguity > 40%.
594
- - 🖥️ **Dashboard Integration** — Cognitive metrics card with route distribution bar, confidence gauges, and warning badges. On-demand "Cognitive Route" button in the Node Editor panel.
595
- - 🔒 **Feature Gating** — Entire pipeline gated behind `PRISM_HDC_ENABLED` (default: `true`). Clean error + zero telemetry when disabled.
529
+ ---
596
530
 
597
- <details>
598
- <summary><strong>v6.2 — The "Synthesize & Prune" Phase</strong></summary>
531
+ ### Why This Matters
599
532
 
600
- - 🕸️ **Edge Synthesis ("The Dream Procedure")** — Automated background linker discovers semantically similar but disconnected memory nodes via cosine similarity (≥ 0.7 threshold). Batch-limited to 50 sources × 3 neighbors. New `session_synthesize_edges` tool for on-demand graph enrichment.
601
- - ✂️ **Graph Pruning (Soft-Prune)** — Configurable strength-based pruning soft-deletes weak links. Includes per-project cooldown, backpressure guards, and sweep budget controls. Enable with `PRISM_GRAPH_PRUNING_ENABLED=true`.
602
- - 📊 **SLO Observability** — New `graphMetrics.ts` module tracks synthesis success rate, net new links, prune ratio, and sweep duration. Exposes `slo` and `warnings` fields at `GET /api/graph/metrics` for proactive health monitoring.
603
- - 🗓️ **Temporal Decay Heatmaps** — UI overlay toggle where un-accessed nodes desaturate while Graduated nodes stay vibrant. Makes the Ebbinghaus curve visceral.
604
- - 📝 **Active Recall ("Test Me")** — Node editor panel generates synthetic quizzes from semantic neighbors for knowledge activation.
605
- - **Supabase Weak-Link RPC (WS4.1)** — New `prism_summarize_weak_links` Postgres function (migration 036) aggregates pruning server-side, eliminating N+1 network roundtrips.
606
- - 🔒 **Migration 035** — Tenant-safe graph writes + soft-delete hardening for MemoryLinks.
533
+ | Property | What it means |
534
+ |----------|---------------|
535
+ | **Fully autonomous** | You didn't review the PR to catch the credential leak. The AI fought itself. |
536
+ | **Evidence-bound** | The Evaluator had to prove `src/auth.ts:3`. "Code looks bad" is not accepted. |
537
+ | **Cost-efficient** | `plan_viable: true` → retry EXECUTE only. No full re-plan, no wasted tokens. |
538
+ | **Fail-closed on parse** | Malformed LLM output defaults to `plan_viable: false` → escalate to PLAN rather than burn revisions on a broken response format. |
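
The routing behaviour in the last two table rows can be sketched as a parse step with a fail-closed default, followed by a three-way decision. `Verdict`, `parseVerdict`, and `routeAfterEvaluate` are hypothetical names; the real pipeline presumably also enforces a revision budget, which is omitted here because the README does not state its limit.

```typescript
// Only the two routing fields of the verdict are modelled here.
interface Verdict {
  pass: boolean;
  plan_viable: boolean;
}
type NextStep = "VERIFY" | "EXECUTE" | "PLAN";

// Parses raw evaluator output. Anything malformed becomes a failed verdict
// with plan_viable: false, which is the fail-closed default from the table.
function parseVerdict(raw: string): Verdict {
  const failClosed: Verdict = { pass: false, plan_viable: false };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return failClosed;
    const candidate = parsed as Record<string, unknown>;
    const pass = candidate["pass"];
    const planViable = candidate["plan_viable"];
    if (typeof pass !== "boolean" || typeof planViable !== "boolean") return failClosed;
    return { pass, plan_viable: planViable };
  } catch {
    return failClosed;
  }
}

// pass -> VERIFY; fail with a viable plan -> retry EXECUTE; otherwise re-PLAN.
function routeAfterEvaluate(verdict: Verdict): NextStep {
  if (verdict.pass) return "VERIFY";
  return verdict.plan_viable ? "EXECUTE" : "PLAN";
}

console.log(routeAfterEvaluate(parseVerdict('{"pass": false, "plan_viable": true}')));
console.log(routeAfterEvaluate(parseVerdict("Looks great!")));
```

Defaulting to `plan_viable: false` means a chatty, non-JSON reply sends the pipeline back to planning instead of consuming an `EXECUTE` retry.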
607
539
 
608
- </details>
540
+ > 📄 **Full worked example:** [`examples/adversarial-eval-demo/README.md`](examples/adversarial-eval-demo/README.md)
609
541
 
610
- <details>
611
- <summary><strong>v6.1 — Prism-Port, Cognitive Load & Semantic Search</strong></summary>
542
+ ---
612
543
 
613
- - 📦 **Prism-Port Vault Export** — `.zip` of interlinked Markdown files with YAML frontmatter, `[[Wikilinks]]`, and `Keywords/` backlink indices for Obsidian/Logseq.
614
- - 🧠 **Smart Memory Merge UI** — Merge duplicate knowledge nodes from the Graph Editor.
615
- - ✨ **Semantic Search Highlighting** — RegEx-powered match engine wraps exact keyword matches in `<mark>` tags.
616
- - 📊 **Deep Purge Visualization** — "Memory Density" analytic for signal-to-noise ratio.
617
- - 🛡️ **Context-Boosted Search** — Biases semantic queries by current project workspace.
618
- - 🌐 **Tavily Web Scholar** — `@tavily/core` as alternative to Brave+Firecrawl.
619
- - 🛡️ **Type Guard Hardening** — Full audit of all 11+ MCP tool argument guards.
620
- - 🔄 **Dashboard Toggle Persistence** — Optimistic rollback on save failure.
544
+ ## 🆕 What's New
621
545
 
622
- </details>
623
546
 
624
- <details>
625
- <summary><strong>Earlier releases (v5.x and below)</strong></summary>
626
-
627
- #### v5.5 — Architectural Hardening
628
- - 🛡️ **Transactional Migrations** — SQLite DDL rebuilds are wrapped in explicit `BEGIN/COMMIT` blocks.
629
- - 🛑 **Graceful Shutdown Registry** — `BackgroundTaskRegistry` uses a 5-second `Promise.race()` to await flushes.
630
- - 🕰️ **Thundering Herd Prevention** — Maintenance scheduler migrated from `setInterval` to state-aware `setTimeout`.
631
- - 🚀 **Zero-Thrashing SDM Scans** — `Int32Array` scratchpad allocations hoisted outside the hot decode loop.
632
-
633
- #### v5.4 — Convergent Intelligence
634
- - 🔄 **CRDT Handoff Merging** — Multi-agent saves no longer reject on version conflict. Custom OR-Map engine auto-merges concurrent edits.
635
- - ⏰ **Background Purge Scheduler** — Fully automated storage maintenance: TTL sweep, Ebbinghaus decay, auto-compaction.
636
- - 🌐 **Autonomous Web Scholar** — Agent-driven research pipeline. Brave Search → Firecrawl scrape → LLM synthesis.
637
- - **v5.3** — Hivemind Health Watchdog (state machine, loop detection, Telepathy alert injection)
638
- - **v5.2** — Cognitive Memory (Ebbinghaus decay, context-weighted retrieval), Universal History Migration, Smart Consolidation
639
- - **v5.1** — Knowledge Graph Editor, Deep Storage purge
640
- - **v5.0** — TurboQuant 10× embedding compression, three-tier search architecture
641
- - **v4.x** — OpenTelemetry, VLM multimodal memory, LLM adapters, Behavioral memory, Hivemind
547
+ > **Current release: v7.4.0**
642
548
 
643
- </details>
549
+ - ⚔️ **v7.4.0 — Adversarial Evaluation (Anti-Sycophancy):** The Dark Factory pipeline now separates generator and evaluator into isolated roles. `PLAN_CONTRACT` locks a machine-parseable rubric before any code runs. `EVALUATE` scores the output with evidence-bound findings (`file`, `line`, `description`). Failed evaluations retry with `plan_viable` routing — conservatively escalating to full PLAN re-planning on parse failures instead of burning revision budget.
550
+ - 🔧 **v7.3.3 — Dashboard Stability Hotfix:** Fixed a multi-layer quote-escaping trap in the `abortPipeline` onclick handler that silently killed the dashboard IIFE and froze the project selector at "Loading projects..." forever. Fixed via `data-id` attribute pattern + ES5 lint guard (`npm run lint:dashboard`).
551
+ - 📊 **v7.3.2 — Verification Diagnostics v2:** `verify status --json` now emits per-layer `diff_counts` + `changed_keys`. JSON schema is contract-enforced in CI (`schema_version: 1`).
552
+ - 🏭 **v7.3.1 — Dark Factory (Fail-Closed Execution):** The LLM can no longer touch the filesystem directly. Every autonomous `EXECUTE` step passes 3 gates — Parse → Type → Scope — before any side effect occurs. Scope violations terminate the entire pipeline.
553
+ - 🔭 **v7.2.0 — Verification Harness:** Spec-frozen contracts (`verification_harness.json` hash-locked before execution), multi-layer assertions across Data / Agent / Pipeline, and finalization gate policies (`warn` / `gate` / `abort`).
554
+ - 🚦 **v7.1.0 — Task Router:** Heuristic + ML-experience routing sends each task to the cloud or local model in under 2ms; it is cold-start safe and corrected by per-project experience.
555
+ - 🧠 **v7.0.0 — ACT-R Activation Memory:** `B_i = ln(Σ t_j^{-d})` recency × frequency re-ranking. Stale memories fade naturally. Active context surfaces automatically.
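
The v7.0.0 formula can be made concrete with a small numeric sketch. It combines the base-level activation `B_i = ln(Σ t_j^{-d})` with the `0.7 × similarity + 0.3 × σ(activation)` composite that appears elsewhere in this README. Several values are assumptions: the decay `d = 0.5` is the conventional ACT-R default rather than a confirmed Prism setting, ages are measured in seconds, and the sigmoid steepness of 1.0 is a placeholder (only the midpoint of -2.0 is stated in the source). All function names are hypothetical.

```typescript
// Base-level activation: B_i = ln(sum over accesses of t_j^(-d)),
// where t_j is the time since the j-th access (assumed here to be in seconds).
function baseLevelActivation(agesSeconds: number[], decay = 0.5): number {
  if (agesSeconds.length === 0) return Number.NEGATIVE_INFINITY;
  // Clamp ages to at least one second so a just-touched memory stays finite.
  const sum = agesSeconds.reduce((acc, age) => acc + Math.pow(Math.max(age, 1), -decay), 0);
  return Math.log(sum);
}

// Sigmoid with the README's midpoint of -2.0; the steepness is an assumption.
function sigmoid(x: number, midpoint = -2.0, steepness = 1.0): number {
  return 1 / (1 + Math.exp(-steepness * (x - midpoint)));
}

// Composite score: similarity dominates, activation re-ranks.
function compositeScore(similarity: number, activation: number): number {
  return 0.7 * similarity + 0.3 * sigmoid(activation);
}

// A memory touched 30 minutes and 2 hours ago vs. one touched once, 14 days ago.
const freshActivation = baseLevelActivation([30 * 60, 2 * 60 * 60]);
const staleActivation = baseLevelActivation([14 * 24 * 60 * 60]);

// The stale memory has the higher cosine similarity but the lower composite score.
const freshScore = compositeScore(0.82, freshActivation);
const staleScore = compositeScore(0.85, staleActivation);
console.log({ freshActivation, staleActivation, freshScore, staleScore });
```

With these assumed parameters the recently used memory outranks the slightly more similar stale one, which is the re-ranking effect the bullet describes.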
644
556
 
645
- > [Full CHANGELOG →](CHANGELOG.md) · [Architecture Deep Dive →](docs/ARCHITECTURE.md)
557
+ 👉 **[Full release history → CHANGELOG.md](CHANGELOG.md)** · [ROADMAP →](ROADMAP.md)
646
558
 
647
559
  ---
648
560
 
@@ -664,6 +576,7 @@ Standard memory servers (like Mem0, Zep, or the baseline Anthropic MCP) act as p
664
576
  | **Maintenance** | **Autonomous Background Scheduler** | Manual/API driven | Automated (Cloud) | ❌ Manual |
665
577
  | **Data Portability** | **Prism-Port (Obsidian/Logseq Vault)** | JSON Export | JSON Export | Raw `.db` file |
666
578
  | **Cost Model** | **Free + BYOM (Ollama)** | Per-API-call pricing | Per-API-call pricing | Free (limited) |
579
+ | **Autonomous Pipelines** | **✅ Dark Factory** — adversarial eval, evidence-bound rubric, fail-closed 3-gate execution | ❌ | ❌ | ❌ |
667
580
 
668
581
  ### 🏆 Where Prism Crushes the Giants
669
582
 
@@ -682,6 +595,9 @@ AI memory is a black box. Developers hate black boxes. Prism exports memory dire
682
595
  #### 5. Self-Cleaning & Self-Optimizing
683
596
  If you use a standard memory tool long enough, it clogs the LLM's context window with thousands of obsolete tokens. Prism runs an autonomous [Background Scheduler](src/backgroundScheduler.ts) that Ebbinghaus-decays older memories, auto-compacts session histories into dense summaries, and deep-purges high-precision vectors — saving ~90% of disk space automatically.
684
597
 
598
+ #### 6. Anti-Sycophancy — The AI That Doesn't Grade Its Own Homework (v7.4)
599
+ Most AI coding pipelines share a flaw: they ask the same model that wrote the code whether the code is correct. **Of course it says yes.** Prism's Dark Factory addresses this with a walled-off Adversarial Evaluator that is explicitly prompted to be hostile and strict. It operates on a pre-committed rubric and cannot fail the Generator without providing exact file/line receipts. Failed evaluations feed the critique back into the Generator's retry prompt, eliminating blind retries. The comparison table above lists no equivalent in the other tools.
600
+
685
601
  ### 🤝 Where the Giants Currently Win (Honest Trade-offs)
686
602
 
687
603
  1. **Framework Integrations:** Mem0 and Zep have pre-built integrations for LangChain, LlamaIndex, Flowise, AutoGen, CrewAI, etc. Prism requires the host application to support the MCP protocol.
@@ -815,13 +731,13 @@ Requires `PRISM_DARK_FACTORY_ENABLED=true`.
815
731
  </details>
816
732
 
817
733
  <details>
818
- <summary><strong>Executive Planning (Planned for v7.2)</strong></summary>
734
+ <summary><strong>Verification Harness</strong></summary>
819
735
 
820
736
  | Tool | Purpose |
821
737
  |------|---------|
822
- | `session_plan_decompose` | Decompose natural language goals into a structured DAG of tasks |
823
- | `session_plan_step_update` | Atomically update the status/result of a specific sub-task |
824
- | `session_plan_get_active` | Retrieve the current execution DAG and task statuses |
738
+ | `session_plan_decompose` | Decompose natural language goals into an execution plan that references verification requirements |
739
+ | `session_plan_step_update` | Atomically update step status/result with verification context |
740
+ | `session_plan_get_active` | Retrieve active plan state and current verification gating position |
825
741
 
826
742
  </details>
827
743
 
@@ -970,7 +886,8 @@ Prism is evolving from smart session logging toward a **cognitive memory archite
970
886
  | **v7.0** | Composite Retrieval Scoring — `0.7 × similarity + 0.3 × σ(activation)`; configurable via `PRISM_ACTR_WEIGHT_*` | Hybrid cognitive-neural retrieval models | ✅ Shipped |
971
887
  | **v7.0** | AccessLogBuffer — in-memory batch-write buffer with 5s flush; prevents SQLite `SQLITE_BUSY` under parallel agents | Production reliability engineering | ✅ Shipped |
972
888
  | **v7.3** | Dark Factory — 3-gate fail-closed EXECUTE pipeline (parse → type → scope) with structured JSON action contract | Industrial safety systems (defense-in-depth, fail-closed valves) | ✅ Shipped |
973
- | **v7.2** | Executive Planning & DAG tracking | Prefrontal cortex executive control + Directed Acyclic Graph planning | 🔭 Horizon |
889
+ | **v7.2** | Verification-first harness: spec-freeze contract, rubric hash lock, multi-layer assertions, CLI `verify` commands | Programmatic verification systems + adversarial validation loops | ✅ Shipped |
890
+ | **v7.4** | Adversarial Evaluation — PLAN_CONTRACT + EVALUATE with isolated generator/evaluator roles, pre-committed rubrics, and evidence-bound findings | Anti-sycophancy research, adversarial ML evaluation frameworks | ✅ Shipped |
974
891
  | **v7.x** | Affect-Tagged Memory — sentiment shapes what gets recalled | Affect-modulated retrieval (neuroscience) | 🔭 Horizon |
975
892
  | **v8+** | Zero-Search Retrieval — no index, no ANN, just ask the vector | Holographic Reduced Representations | 🔭 Horizon |
976
893
 
@@ -978,7 +895,7 @@ Prism is evolving from smart session logging toward a **cognitive memory archite
978
895
 
979
896
  ---
980
897
 
981
- ## 📦 Product Roadmap
898
+ ## 📦 Recent Milestones & Roadmap
982
899
 
983
900
  > **[Full ROADMAP.md →](ROADMAP.md)**
984
901
 
@@ -988,6 +905,9 @@ Shipped in v6.2.0. Edge synthesis, graph pruning with SLO observability, tempora
988
905
  ### v6.5: Cognitive Architecture ✅
989
906
  Shipped. Full Superposed Memory (SDM) + Hyperdimensional Computing (HDC/VSA) cognitive routing pipeline. Compositional memory states via XOR binding, Hamming resolution, and policy-gated routing (direct / clarify / fallback). 705 tests passing.
990
907
 
908
+ ### v7.4: Adversarial Evaluation ✅
909
+ Shipped. `PLAN_CONTRACT` + `EVALUATE` steps added to the Dark Factory pipeline. Generator and evaluator operate in isolated roles with pre-committed rubrics. Evidence-bound findings with `criterion_id`, `severity`, `file`, and `line` (number). Conservative `plan_viable=false` default on parse failure escalates to full PLAN re-plan. 78 new tests, 978 total.
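
As a rough illustration of the evidence-bound, fail-closed parsing described above, the sketch below validates an evaluator response. The field names (`pass`, `plan_viable`, `findings`, `severity`, `criterion_id`, `pass_fail`, `evidence.file`, `evidence.line`) follow the `EVALUATE` schema shipped in this release. `parseEvaluation` and `contractIds` are hypothetical, and treating a failing finding without a file/line receipt as a parse failure is an assumption, not confirmed behavior.

```javascript
const SEVERITIES = new Set(['critical', 'warning', 'info']);

// Hypothetical sketch: parse an EVALUATE response, defaulting to the
// conservative outcome (pass=false, plan_viable=false) on any malformed input.
function parseEvaluation(raw, contractIds) {
  const failClosed = (reason) => ({
    pass: false, plan_viable: false, notes: reason, findings: [],
  });

  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return failClosed('evaluator output is not valid JSON');
  }
  if (data === null || typeof data !== 'object' ||
      typeof data.pass !== 'boolean' || !Array.isArray(data.findings)) {
    return failClosed('missing "pass" or "findings"');
  }

  for (const f of data.findings) {
    const wellFormed = f !== null && typeof f === 'object' &&
      SEVERITIES.has(f.severity) &&
      contractIds.has(f.criterion_id) &&
      typeof f.pass_fail === 'boolean';
    if (!wellFormed) return failClosed('malformed finding');

    const ev = f.evidence;
    const hasReceipt = ev !== null && typeof ev === 'object' &&
      typeof ev.file === 'string' && ev.file.length > 0 &&
      Number.isInteger(ev.line);
    // Assumption: a failing finding must carry a file + numeric line receipt.
    if (f.pass_fail === false && !hasReceipt) {
      return failClosed('failing finding lacks file/line evidence');
    }
  }

  return {
    pass: data.pass,
    plan_viable: data.plan_viable === true, // anything but explicit true is false
    notes: typeof data.notes === 'string' ? data.notes : '',
    findings: data.findings,
  };
}
```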
910
+
991
911
  ### v7.3: Dark Factory — Fail-Closed Execution ✅
992
912
  Shipped. Structured JSON action contract for autonomous `EXECUTE` steps. 3-gate validation pipeline (parse → type → scope) terminates pipelines on any violation before filesystem side effects. 67 edge-case tests covering adversarial LLM output, path traversal, and type coercion.
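
A minimal sketch of what a parse → type → scope gate sequence might look like follows. Only the top-level `actions` array is taken from the package's JSON contract; the per-action `type` and `path` fields, the allowed type names, and the helper names are hypothetical and chosen for illustration.

```javascript
// Hypothetical action types; the real contract may define others.
const ALLOWED_TYPES = new Set(['write_file', 'delete_file']);

// Scope check: the path must be relative and must never climb above the root.
function isInsideScope(relPath) {
  if (relPath.length === 0) return false;
  if (relPath.startsWith('/') || relPath.startsWith('\\') ||
      /^[A-Za-z]:/.test(relPath)) {
    return false; // absolute POSIX or Windows path
  }
  let depth = 0;
  for (const seg of relPath.split(/[\\/]+/)) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') {
      depth -= 1;
      if (depth < 0) return false; // escaped the project root
    } else {
      depth += 1;
    }
  }
  return depth > 0;
}

// Runs the three gates in order and stops at the first violation,
// before any action is applied.
function validateExecuteOutput(raw) {
  // Gate 1: parse
  let payload;
  try {
    payload = JSON.parse(raw);
  } catch {
    return { ok: false, gate: 'parse', reason: 'invalid JSON' };
  }
  if (payload === null || typeof payload !== 'object' ||
      !Array.isArray(payload.actions)) {
    return { ok: false, gate: 'parse', reason: 'missing actions array' };
  }
  // Gate 2: type
  for (const action of payload.actions) {
    if (action === null || typeof action !== 'object' ||
        !ALLOWED_TYPES.has(action.type) || typeof action.path !== 'string') {
      return { ok: false, gate: 'type', reason: 'unexpected action shape' };
    }
  }
  // Gate 3: scope
  for (const action of payload.actions) {
    if (!isInsideScope(action.path)) {
      return { ok: false, gate: 'scope', reason: `out of scope: ${action.path}` };
    }
  }
  return { ok: true, actions: payload.actions };
}
```

Because every gate returns before anything touches the filesystem, a rejected payload has no side effects, which is the fail-closed property the release note describes.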
993
913
 
@@ -997,8 +917,8 @@ Shipped. Deterministic task routing (`session_task_route`) with optional experie
997
917
  ### v7.0: ACT-R Activation Memory ✅
998
918
  Shipped. Scientifically-grounded retrieval re-ranking via ACT-R base-level activation (`B_i = ln(Σ t_j^(-d))`), candidate-scoped spreading activation, parameterized sigmoid normalization, composite scoring, and zero-cold-start access log infrastructure. 49 dedicated unit tests, 705 total passing.
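
To make the formula concrete, here is a small sketch of base-level activation and the composite score. The equation `B_i = ln(Σ t_j^(-d))` and the `0.7` / `0.3` weights come from this README. The function names, the decay default of `0.5` (the conventional ACT-R value), the one-second clamp, and the sigmoid parameters are assumptions made for illustration.

```javascript
// B_i = ln( sum_j t_j^(-d) ), where t_j is the age (seconds) of the j-th access.
function baseLevelActivation(accessAgesSeconds, decay = 0.5) {
  if (accessAgesSeconds.length === 0) return -Infinity; // never accessed
  let sum = 0;
  for (const t of accessAgesSeconds) {
    // Clamp to 1s so a just-now access does not divide by zero.
    sum += Math.pow(Math.max(t, 1), -decay);
  }
  return Math.log(sum);
}

// Parameterized sigmoid that squashes activation into (0, 1).
function sigmoid(x, midpoint = 0, steepness = 1) {
  return 1 / (1 + Math.exp(-steepness * (x - midpoint)));
}

// Composite score: 0.7 * similarity + 0.3 * sigmoid(activation).
function compositeScore(similarity, activation, wSim = 0.7, wAct = 0.3) {
  return wSim * similarity + wAct * sigmoid(activation);
}

// Usage: a memory accessed recently and often outranks an equally
// similar memory that was touched once, long ago.
const fresh = compositeScore(0.8, baseLevelActivation([60, 3600, 86400]));
const stale = compositeScore(0.8, baseLevelActivation([30 * 86400]));
console.log(fresh > stale);
```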
999
919
 
1000
- ### v7.2: Executive Function 🔭
1001
- Planned. Adds autonomous plan decomposition, DAG-backed step tracking, and self-healing execution loops for complex multi-step operations.
920
+ ### v7.2: Verification Harness ✅
921
+ Shipped. Spec-frozen verification contract (`implementation_plan.md` + `verification_harness.json` + immutable `validation_result`), multi-layer machine checks (`data`, `agent`, `pipeline`), finalization gate policies (`warn` / `gate` / `abort`), and CLI `verify generate` / `verify status --json` with schema-versioned output.
1002
922
 
1003
923
  ### Future Tracks
1004
924
  - **v7.x: Affect-Tagged Memory** — Recall prioritization improves by weighting memories with affective/contextual valence, making surfaced context more behaviorally useful.
@@ -1008,7 +928,7 @@ Planned. Adds autonomous plan decomposition, DAG-backed step tracking, and self-
1008
928
  ## ❓ Troubleshooting FAQ
1009
929
 
1010
930
  **Q: Why is the dashboard project selector stuck on "Loading projects..."?**
1011
- A: This usually means Supabase env values are unresolved placeholders (for example `${SUPABASE_URL}`) or invalid. As of v6.5.1 Prism auto-falls back to local SQLite, but you should still fix env values for cloud mode.
931
+ A: Fixed in v7.3.3. The root cause was a multi-layer quote-escaping trap in the `abortPipeline` onclick handler that generated a `SyntaxError` in the browser, silently killing the entire dashboard IIFE. Update to v7.3.3+ (`npx -y prism-mcp-server`). If still stuck, check that Supabase env values are properly set (unresolved placeholders like `${SUPABASE_URL}` cause `/api/projects` to return empty). Prism auto-falls back to local SQLite when Supabase is misconfigured.
1012
932
 
1013
933
  **Q: Why is semantic search quality weak or inconsistent?**
1014
934
  A: Check embedding provider configuration and key availability. Missing embedding credentials reduce semantic recall quality and can shift behavior toward keyword-heavy matches.
@@ -1019,8 +939,11 @@ A: Use `session_forget_memory` for targeted soft/hard deletion. For manual clean
1019
939
  **Q: How do I verify the install quickly?**
1020
940
  A: Run `npm run build && npm test`, then open the Mind Palace dashboard (`localhost:3000`) and confirm projects load plus Graph Health renders.
1021
941
 
942
+ ---
943
+
944
+ ### 💡 Known Limitations & Quirks
1022
945
 
1023
- - **LLM-dependent features require an API key.** Semantic search, Morning Briefings, auto-compaction, and VLM captioning need a `GOOGLE_API_KEY` (Gemini) or equivalent provider key. Without one, Prism falls back to keyword-only search (FTS5).
946
+ - **LLM-dependent features require an API key.** Semantic search, Morning Briefings, auto-compaction, and VLM captioning need a `GOOGLE_API_KEY` (your Gemini API key) or equivalent provider key. Without one, Prism falls back to keyword-only search (FTS5).
1024
947
  - **Auto-load is model- and client-dependent.** Session auto-loading relies on both the LLM following system prompt instructions *and* the MCP client completing tool registration before the model's first turn. Prism provides platform-specific [Setup Guides](#-setup-guides) and a server-side fallback (v5.2.1) that auto-pushes context after 10 seconds.
1025
948
  - **MCP client race conditions.** Some MCP clients may not finish tool enumeration before the model generates its first response, causing transient `unknown_tool` errors. This is a client-side timing issue — Prism's server completes the MCP handshake in ~60ms. Workaround: the server-side auto-push fallback and the startup skill's retry logic.
1026
949
  - **No real-time sync without Supabase.** Local SQLite mode is single-machine only. Multi-device or team sync requires a Supabase backend.
package/dist/cli.js ADDED
@@ -0,0 +1,50 @@
1
+ #!/usr/bin/env node
2
+ import { Command } from 'commander';
3
+ import { SqliteStorage } from './storage/sqlite.js';
4
+ import { handleVerifyStatus, handleGenerateHarness } from './verification/cliHandler.js';
5
+ import * as path from 'path';
6
+ const program = new Command();
7
+ program
8
+ .name('prism')
9
+ .description('Prism Configuration & CLI')
10
+ .version('7.3.1');
11
+ const verifyCmd = program
12
+ .command('verify')
13
+ .description('Manage the verification harness');
14
+ verifyCmd
15
+ .command('status')
16
+ .description('Check the current verification state and view config drift')
17
+ .option('-p, --project <name>', 'Project name', path.basename(process.cwd()))
18
+ .option('-f, --force', 'Bypass verification failures and drift tracking constraints')
19
+ .option('-u, --user <id>', 'User ID for tenant isolation', 'default')
20
+ .option('--json', 'Emit machine-readable JSON output with stable keys')
21
+ .action(async (options) => {
22
+ const storage = new SqliteStorage();
23
+ await storage.initialize('./prism-local.db');
24
+ // H4 fix: Ensure storage is closed on exit to flush WAL and prevent data loss
25
+ try {
26
+ await handleVerifyStatus(storage, options.project, !!options.force, options.user, !!options.json);
27
+ }
28
+ finally {
29
+ await storage.close();
30
+ }
31
+ });
32
+ verifyCmd
33
+ .command('generate')
34
+ .description('Bless the current ./verification_harness.json as the canonical rubric')
35
+ .option('-p, --project <name>', 'Project name', path.basename(process.cwd()))
36
+ .option('-f, --force', 'Bypass verification failures and drift tracking constraints')
37
+ .option('-u, --user <id>', 'User ID for tenant isolation', 'default')
38
+ .option('--json', 'Emit machine-readable JSON output with stable keys')
39
+ .action(async (options) => {
40
+ const storage = new SqliteStorage();
41
+ await storage.initialize('./prism-local.db');
42
+ // H4 fix: Ensure storage is closed on exit to flush WAL and prevent data loss
43
+ try {
44
+ await handleGenerateHarness(storage, options.project, !!options.force, options.user, !!options.json);
45
+ }
46
+ finally {
47
+ await storage.close();
48
+ }
49
+ });
50
+ program.parse(process.argv);
@@ -30,6 +30,37 @@ RULES:
30
30
  - Do NOT use markdown code fences
31
31
  - If you cannot complete the task, return: {"actions": [], "notes": "reason"}
32
32
  `.trim();
33
+ const PLAN_CONTRACT_SCHEMA = `
34
+ You MUST respond with ONLY a valid JSON object matching this schema:
35
+ {
36
+ "criteria": [
37
+ {
38
+ "id": "string (unique identifier, e.g. 'req-1')",
39
+ "description": "string (clear, testable condition)"
40
+ }
41
+ ]
42
+ }
43
+ `.trim();
44
+ const EVALUATE_SCHEMA = `
45
+ You MUST respond with ONLY a valid JSON object matching this schema:
46
+ {
47
+ "pass": true | false,
48
+ "plan_viable": true | false,
49
+ "notes": "string (optional summary)",
50
+ "findings": [
51
+ {
52
+ "severity": "critical" | "warning" | "info",
53
+ "criterion_id": "string (must match a contract criterion id)",
54
+ "pass_fail": true | false,
55
+ "evidence": {
56
+ "file": "string",
57
+ "line": 42,
58
+ "description": "string"
59
+ }
60
+ }
61
+ ]
62
+ }
63
+ `.trim();
33
64
  /**
34
65
  * Invocation wrapper that routes payload specs to the local Claw agent model (Qwen 2.5),
35
66
  * or the active LLM provider as fallback.
@@ -49,16 +80,40 @@ export async function invokeClawAgent(spec, state, timeoutMs = 120000 // 2 min d
49
80
  : getLLMProvider();
50
81
  // Scope injection via SafetyController — single source of truth
51
82
  const systemPrompt = SafetyController.generateBoundaryPrompt(spec, state);
52
- // v7.3.1: EXECUTE steps get structured JSON output instructions
53
- const isExecuteStep = state.current_step === 'EXECUTE';
54
- const executePrompt = isExecuteStep
55
- ? `Based on the system instructions, execute the necessary actions for the current step (${state.current_step}).\n\n${EXECUTE_JSON_SCHEMA}`
56
- : `Based on the system instructions, execute the necessary task for the current step (${state.current_step}). Respond with your actions and observations.`;
57
- debugLog(`[ClawInvocation] Launching agent on pipeline ${state.id} step=${state.current_step} iter=${state.iteration} with ${timeoutMs}ms limit.${isExecuteStep ? ' (JSON mode)' : ''}`);
83
+ // Inject the appropriate JSON schema according to the step
84
+ let stepPrompt = `Based on the system instructions, execute the necessary task for the current step (${state.current_step}). Respond with your actions and observations.`;
85
+ let isJsonMode = false;
86
+ if (state.current_step === 'EXECUTE') {
87
+ let revisionContext = '';
88
+ // If we are retrying after an EVALUATE failure, state.notes holds the serialized evaluator critique.
89
+ // Inject it so the Generator knows exactly what to fix rather than retrying blindly.
90
+ if (state.eval_revisions && state.eval_revisions > 0) {
91
+ revisionContext = `\n\n=== EVALUATOR CRITIQUE (revision ${state.eval_revisions}) ===\n${state.notes || 'Fix previous errors.'}\n\nYou MUST correct all issues listed above before submitting.`;
92
+ }
93
+ stepPrompt = `Based on the system instructions, execute the necessary actions for the current step (${state.current_step}).${revisionContext}\n\n${EXECUTE_JSON_SCHEMA}`;
94
+ isJsonMode = true;
95
+ }
96
+ else if (state.current_step === 'PLAN_CONTRACT') {
97
+ stepPrompt = `Based on the system instructions from the PLAN phase, formulate a strict, boolean-testable contract rubric.\n\n${PLAN_CONTRACT_SCHEMA}`;
98
+ isJsonMode = true;
99
+ }
100
+ else if (state.current_step === 'EVALUATE') {
101
+ stepPrompt = `Based on the system instructions, evaluate the GENERATOR's execution against the PLAN_CONTRACT rubric. BE STRICT.
102
+
103
+ === GENERATOR'S ACTIONS ===
104
+ ${state.notes || 'No notes provided'}
105
+
106
+ === CONTRACT RUBRIC ===
107
+ ${state.contract_payload ? JSON.stringify(state.contract_payload.criteria, null, 2) : '(See contract_rubric.json on disk)'}
108
+
109
+ ${EVALUATE_SCHEMA}`;
110
+ isJsonMode = true;
111
+ }
112
+ debugLog(`[ClawInvocation] Launching agent on pipeline ${state.id} step=${state.current_step} iter=${state.iteration} with ${timeoutMs}ms limit.${isJsonMode ? ' (JSON mode)' : ''}`);
58
113
  try {
59
114
  // Timeout Promise to ensure the runner thread does not block indefinitely
60
115
  const timeboundExecution = Promise.race([
61
- llm.generateText(executePrompt, systemPrompt),
116
+ llm.generateText(stepPrompt, systemPrompt),
62
117
  new Promise((_, reject) => setTimeout(() => reject(new Error('LLM_EXECUTION_TIMEOUT')), timeoutMs))
63
118
  ]);
64
119
  const result = await timeboundExecution;