prism-mcp-server 7.3.3 → 7.4.0
- package/README.md +114 -192
- package/dist/darkfactory/clawInvocation.js +62 -7
- package/dist/darkfactory/runner.js +188 -23
- package/dist/darkfactory/safetyController.js +48 -22
- package/dist/darkfactory/schema.js +2 -0
- package/dist/server.js +19 -0
- package/dist/storage/sqlite.js +44 -7
- package/dist/storage/supabase.js +27 -3
- package/package.json +2 -2
package/README.md
CHANGED
|
@@ -16,7 +16,7 @@ One command. Persistent memory. Local-first by default. Optional cloud power-ups
|
|
|
16
16
|
npx -y prism-mcp-server
|
|
17
17
|
```
|
|
18
18
|
|
|
19
|
-
Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gemini · Antigravity** — any MCP client
|
|
19
|
+
Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gemini · Antigravity** — **any MCP client.**
|
|
20
20
|
|
|
21
21
|
## 📖 Table of Contents
|
|
22
22
|
|
|
@@ -28,9 +28,6 @@ Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gem
|
|
|
28
28
|
- [What Makes Prism Different](#-what-makes-prism-different)
|
|
29
29
|
- [Use Cases](#-use-cases)
|
|
30
30
|
- [What's New](#-whats-new)
|
|
31
|
-
- [v7.3.1 Dark Factory (Fail-Closed Execution)](#v731--dark-factory-fail-closed-execution-)
|
|
32
|
-
- [v7.2.0 Verification Harness (Planned)](#v720--verification-harness-front-loaded-testing-)
|
|
33
|
-
- [v7.4.0 Adversarial Dev Harness (Planned)](#v740--adversarial-dev-harness-anti-sycophancy-)
|
|
34
31
|
- [How Prism Compares](#-how-prism-compares)
|
|
35
32
|
- [Tool Reference](#-tool-reference)
|
|
36
33
|
- [Environment Variables](#environment-variables)
|
|
@@ -69,7 +66,7 @@ Add to your MCP client config (`claude_desktop_config.json`, `.cursor/mcp.json`,
|
|
|
69
66
|
}
|
|
70
67
|
```
|
|
71
68
|
|
|
72
|
-
> **
|
|
69
|
+
> ⚠️ **Windows / Restricted Shells:** If your MCP client complains that `npx` is not found, use the absolute path to the `npx` executable (e.g. `C:\Program Files\nodejs\npx.cmd`).
|
|
73
70
|
|
|
74
71
|
**That's it.** Restart your client. All tools are available. The **Mind Palace Dashboard** (the visual UI for your agent's brain) starts automatically at `http://localhost:3000`. You don't need to keep a tab open — the dashboard runs in the background and the MCP tools work with or without it.
|
|
75
72
|
|
|
@@ -108,6 +105,7 @@ Then open `http://localhost:3001` instead.
|
|
|
108
105
|
| Auto-compaction | ❌ | ✅ `GOOGLE_API_KEY` |
|
|
109
106
|
| Web Scholar research | ❌ | ✅ [`BRAVE_API_KEY`](#environment-variables) + [`FIRECRAWL_API_KEY`](#environment-variables) (or `TAVILY_API_KEY`) |
|
|
110
107
|
| VLM image captioning | ❌ | ✅ Provider key |
|
|
108
|
+
| Autonomous Pipelines (Dark Factory) | ❌ | ✅ `GOOGLE_API_KEY` (or LLM override) |
|
|
111
109
|
|
|
112
110
|
> 🔑 The core Mind Palace works **100% offline** with zero API keys. Cloud keys unlock intelligence features. See [Environment Variables](#environment-variables).
|
|
113
111
|
|
|
@@ -403,25 +401,21 @@ Prism researches while you sleep. A background pipeline searches the web, scrape
|
|
|
403
401
|
### 🔒 GDPR Compliant
|
|
404
402
|
Soft/hard delete (Art. 17), full export in JSON, Markdown, or Obsidian vault `.zip` (Art. 20), API key redaction, per-project TTL retention, and audit trail. Enterprise-ready out of the box.
|
|
405
403
|
|
|
404
|
+
### 🏭 Dark Factory — Adversarial Autonomous Pipelines
|
|
405
|
+
When you trigger a Dark Factory pipeline, Prism doesn't just run your task; it pits a Generator against an isolated Evaluator to produce high-quality output. A `PLAN_CONTRACT` step locks a machine-parseable rubric before any code is written. After execution, an **Adversarial Evaluator** (in a fully isolated context) scores the output against the rubric. It cannot fail the Generator without providing exact file and line evidence for every failing criterion. Failed evaluations inject the critique directly into the Generator's retry prompt so it's never flying blind. The result: security issues, regressions, and stray debug logs are caught autonomously, before you ever see the PR.
|
|
406
|
+
|
|
406
407
|
---
|
|
407
408
|
|
|
408
409
|
## 🎯 Use Cases
|
|
409
410
|
|
|
410
|
-
**Long-running feature work** — Save state at end of day, restore full context next morning. No re-explaining.
|
|
411
|
-
|
|
412
|
-
|
|
413
|
-
|
|
414
|
-
**
|
|
415
|
-
|
|
416
|
-
**
|
|
417
|
-
|
|
418
|
-
**Team onboarding** — New team member's agent loads the full project history instantly.
|
|
419
|
-
|
|
420
|
-
**Behavior enforcement** — Agent corrections auto-graduate into permanent `.cursorrules` / `.clauderules` rules.
|
|
421
|
-
|
|
422
|
-
**Offline / air-gapped** — Full SQLite local mode + Ollama LLM adapter. Zero internet dependency.
|
|
423
|
-
|
|
424
|
-
**Morning Briefings** — After 4+ hours away, Prism auto-synthesizes a 3-bullet action plan from your last sessions.
|
|
411
|
+
- **Long-running feature work** — Save state at end of day, restore full context next morning. No re-explaining.
|
|
412
|
+
- **Multi-agent collaboration** — Dev, QA, and PM agents share real-time context without stepping on each other's memory.
|
|
413
|
+
- **Consulting / multi-project** — Switch between client projects with progressive loading: `quick` (~50 tokens), `standard` (~200), or `deep` (~1000+).
|
|
414
|
+
- **Autonomous execution (v7.4)** — Dark Factory pipeline: `plan → plan_contract → execute → evaluate → verify → finalize`. Generator and evaluator run in isolated roles — the evaluator cannot approve without evidence-bound findings scored against a pre-committed rubric.
|
|
415
|
+
- **Team onboarding** — New team member's agent loads the full project history instantly.
|
|
416
|
+
- **Behavior enforcement** — Agent corrections auto-graduate into permanent `.cursorrules` / `.clauderules` rules.
|
|
417
|
+
- **Offline / air-gapped** — Full SQLite local mode + Ollama LLM adapter. Zero internet dependency.
|
|
418
|
+
- **Morning Briefings** — After 4+ hours away, Prism auto-synthesizes a 3-bullet action plan from your last sessions.
|
|
425
419
|
|
|
426
420
|
### Claude Code: Parallel Explore Agent Workflows
|
|
427
421
|
|
|
@@ -440,210 +434,127 @@ Then continue a specific thread with a follow-up message to the selected agent,
|
|
|
440
434
|
|
|
441
435
|
---
|
|
442
436
|
|
|
443
|
-
|
|
437
|
+
---
|
|
444
438
|
|
|
445
|
-
|
|
446
|
-
> **Current stable release.** Hardened autonomous pipeline execution with a structured JSON action contract.
|
|
439
|
+
## ⚔️ Adversarial Evaluation in Action
|
|
447
440
|
|
|
448
|
-
|
|
441
|
+
> **Split-Brain Anti-Sycophancy** — the signature feature of v7.4.0.
|
|
449
442
|
|
|
450
|
-
|
|
451
|
-
> — [Stephen Driggs](https://linkedin.com/in/stephendriggs), VP Product AI at Shift4
|
|
443
|
+
For the last year, the AI engineering space has struggled with one problem: **LLMs are terrible at grading their own homework.** Ask an agent if its own code is correct and you'll get *"Looks great!"* — because its context window is already biased by its own chain-of-thought.
|
|
452
444
|
|
|
453
|
-
|
|
445
|
+
**v7.4.0 solves this by splitting the agent's brain.** The `GENERATOR` and the `ADVERSARIAL EVALUATOR` are completely walled off. The Evaluator never sees the Generator's scratchpad or apologies — only the pre-committed rubric and the final output. And it **cannot fail the Generator without receipts** (exact file and line number).
|
|
454
446
|
|
|
455
|
-
|
|
456
|
-
- 🛡️ **3-Strategy Defensive Parser** — Raw JSON → fenced code block extraction → brace extraction. Handles adversarial LLM output (preamble text, markdown fences, trailing commentary) without ever executing malformed payloads.
|
|
457
|
-
- ✅ **Type Validation** — Only `READ_FILE | WRITE_FILE | PATCH_FILE | RUN_TEST` are permitted. Novel action types invented by the LLM are rejected.
|
|
458
|
-
- 📏 **Scope Validation** — Every `targetPath` is resolved against the pipeline's `workingDirectory` via `SafetyController.validateActionsInScope()`. Path traversal (`../`), sibling-prefix bypasses, and absolute paths outside the boundary are blocked.
|
|
459
|
-
- 🚫 **Pipeline-Level Termination** — A scope violation doesn't just fail the step — it **terminates the entire pipeline** with `status: FAILED` and emits a `failure` experience event for the ML routing layer.
|
|
447
|
+
Here is a complete run-through using a real scenario: *"Add a user login endpoint to `auth.ts`."*
|
|
460
448
|
|
|
461
|
-
|
|
462
|
-
|
|
449
|
+
---
|
|
450
|
+
|
|
451
|
+
### Step 1 — The Contract (`PLAN_CONTRACT`)
|
|
463
452
|
|
|
464
|
-
|
|
453
|
+
Before a single line of code is written, the pipeline generates a locked scoring rubric:
|
|
465
454
|
|
|
466
455
|
```json
|
|
456
|
+
// contract_rubric.json (written to disk and hash-locked before EXECUTE runs)
|
|
467
457
|
{
|
|
468
|
-
"
|
|
469
|
-
{ "
|
|
470
|
-
{ "
|
|
458
|
+
"criteria": [
|
|
459
|
+
{ "id": "SEC-1", "description": "Must return 401 Unauthorized on invalid passwords." },
|
|
460
|
+
{ "id": "SEC-2", "description": "Raw passwords MUST NOT be written to console.log." }
|
|
471
461
|
]
|
|
472
462
|
}
|
|
473
463
|
```
|
|
474
464
|
|
|
475
|
-
|
|
476
|
-
|
|
477
|
-
**Gate 2 — Type:** ✅ Both actions use `WRITE_FILE`, a valid `ActionType`.
|
|
465
|
+
---
|
|
478
466
|
|
|
479
|
-
|
|
467
|
+
### Step 2 — First Attempt (`EXECUTE` rev 0)
|
|
480
468
|
|
|
481
|
-
|
|
469
|
+
The **Generator** takes over in an isolated context. Like many LLMs under time pressure, it writes working auth logic but leaves a debug statement:
|
|
482
470
|
|
|
471
|
+
```typescript
|
|
472
|
+
// src/auth.ts (Generator's first output)
|
|
473
|
+
export function login(req: Request, res: Response) {
|
|
474
|
+
const { username, password } = req.body;
|
|
475
|
+
console.log(`[DEBUG] Login attempt for ${username} with pass: ${password}`); // ← leaked credential
|
|
476
|
+
const user = db.findUser(username);
|
|
477
|
+
if (!user || !bcrypt.compareSync(password, user.hash)) {
|
|
478
|
+
return res.status(401).json({ error: 'Unauthorized' });
|
|
479
|
+
}
|
|
480
|
+
res.json({ token: signJwt(user) });
|
|
481
|
+
}
|
|
483
482
|
```
|
|
484
|
-
Pipeline FAILED: Scope violation — action[1] targetPath
|
|
485
|
-
"../../.ssh/authorized_keys" resolves outside workingDirectory
|
|
486
|
-
```
|
|
487
|
-
|
|
488
|
-
**Without v7.3.1:** The LLM's raw text output would be interpreted as instructions, and the agent runtime would attempt the write — potentially succeeding depending on filesystem permissions.
|
|
489
|
-
|
|
490
|
-
**With v7.3.1:** The structured contract makes this class of attack impossible. The LLM never touches the filesystem directly; every action is validated through the 3-gate pipeline first.
|
|
491
|
-
|
|
492
|
-
</details>
|
|
493
|
-
|
|
494
|
-
<details>
|
|
495
|
-
<summary><strong>🧪 Edge Cases Covered (67 tests)</strong></summary>
|
|
496
|
-
|
|
497
|
-
| Category | Examples |
|
|
498
|
-
|----------|----------|
|
|
499
|
-
| **Parse adversarial output** | Prose preamble + JSON, nested fences, empty input, non-string input |
|
|
500
|
-
| **Type coercion** | `"DELETE_FILE"`, `"EXEC_CMD"`, numeric types, null types |
|
|
501
|
-
| **Path traversal** | `../`, `../../`, `/etc/passwd`, null bytes, unicode normalization, embedded newlines |
|
|
502
|
-
| **Shape validation** | Missing `actions` array, non-object actions, empty `targetPath`, root-type coercion |
|
|
503
|
-
| **Stress payloads** | 100-action arrays, 100KB content strings, 500-segment deep paths |
|
|
504
|
-
|
|
505
|
-
</details>
|
|
506
|
-
|
|
507
|
-
### v7.2.0 — Verification Harness (Front-Loaded Testing) 🔭
|
|
508
|
-
> **Planned roadmap release.** Extends Prism from passive validation to contract-frozen, machine-verifiable execution gates.
|
|
509
|
-
|
|
510
|
-
- 📋 **Spec-Freeze Contract (planned)** — v7.2 formalizes three artifacts with strict responsibilities: `implementation_plan.md` (**how**), `verification_harness.json` (**proof contract**), and `validation_result` (**immutable outcome record**).
|
|
511
|
-
- 🔐 **Rubric Hash Lock (planned)** — `verification_harness.json` is generated before execution and hash-locked (`rubric_hash`) so criteria cannot drift mid-sprint.
|
|
512
|
-
- 🔬 **Multi-Layer Verification (planned)** — Structured checks across **Data Accuracy**, **Agent Behavior**, and **Pipeline Integrity** using machine-parseable assertions.
|
|
513
|
-
- 🤖 **Adversarial Validation Loop (planned)** — A second validation pass evaluates execution outputs against the frozen contract before progression.
|
|
514
|
-
- 🚦 **Finalization Gates (planned)** — Gate policies (`warn` / `gate` / `abort`) evaluate `validation_result` against the frozen rubric before pipeline completion.
|
|
515
|
-
- 🧠 **Routing Feedback Signals (planned)** — Router learning ingests raw verification signals (`pass_rate`, `critical_failures`, `coverage_score`, `rubric_hash`) for downstream confidence adjustment.
|
|
516
|
-
|
|
517
|
-
<details>
|
|
518
|
-
<summary><strong>🔬 Concept Example: Before vs. After v7.2</strong></summary>
|
|
519
|
-
|
|
520
|
-
**Scenario:** "Refactor the Auth module and update the unit tests."
|
|
521
|
-
|
|
522
|
-
**Before:** Criteria emerge during or after coding; verification is inconsistent and hard to audit.
|
|
523
|
-
|
|
524
|
-
**After (verification-first):** Plan emits a frozen verification contract first, execution runs, validator emits immutable `validation_result`, and finalization gates enforce rubric compliance.
|
|
525
483
|
|
|
526
|
-
|
|
527
|
-
|
|
528
|
-
### v7.1.0 — Prism Task Router (Heuristic + ML Experience) ✅
|
|
529
|
-
> **Current stable release.** Multi-agent task routing with dynamic local vs host model delegation.
|
|
530
|
-
|
|
531
|
-
- 🚦 **Heuristic Routing Engine** — Deterministic `session_task_route` tool dynamically routes tasks to either the host cloud model or local agent (Claw) based on task description, file count, and scope. Evaluated over 5 core signals.
|
|
532
|
-
- 🤖 **Experience-Based ML Routing** — Cold-start protected ML layer leverages the historical performance (Win Rate) extracted by the `routerExperience` system to apply dynamic confidence boosts or penalties into the routing score.
|
|
533
|
-
- 🧪 **Live Testing Samples** — Demo script added in [`examples/router_real_life_test.ts`](examples/router_real_life_test.ts) for deterministic `computeRoute()` scenarios (simple vs complex tasks), with a note that experience-adjusted routing is applied in `session_task_route` handler path.
|
|
534
|
-
- 🖥️ **Dashboard Integration** — Added visual monitor and configuration toggles directly in `src/dashboard/ui.ts` under Node Editor settings.
|
|
535
|
-
- 🧩 **Tool Discoverability** — Fully integrates `session_task_route` into the external registry.
|
|
536
|
-
|
|
537
|
-
### v7.0.0 — ACT-R Activation Memory ✅
|
|
538
|
-
> **Previous stable release.** Memory retrieval now uses a scientifically-grounded cognitive model.
|
|
539
|
-
|
|
540
|
-
- 🧠 **ACT-R Base-Level Activation** — `B_i = ln(Σ t_j^(-d))` computes recency × frequency activation per memory. Recent, frequently-accessed memories surface first; cold memories fade to near-zero. Based on Anderson's *Adaptive Control of Thought—Rational* (ACM, 2025).
|
|
541
|
-
- 🔗 **Candidate-Scoped Spreading Activation** — `S_i = Σ(W × strength)` for links within the current search result set only. Prevents "God node" centrality from dominating rankings (Rule #5).
|
|
542
|
-
- 📐 **Parameterized Sigmoid Normalization** — Calibrated `σ(x) = 1/(1 + e^(-k(x - x₀)))` with midpoint at -2.0 maps the natural ACT-R activation range (-10 to +5) into discriminating (0, 1) scores.
|
|
543
|
-
- 🏗️ **Composite Retrieval Scoring** — `Score = 0.7 × similarity + 0.3 × σ(activation)` — similarity dominates, activation re-ranks. Fully configurable weights via `PRISM_ACTR_WEIGHT_*` env vars.
|
|
544
|
-
- ⚡ **AccessLogBuffer** — In-memory write buffer with 5-second batch flush prevents SQLite `SQLITE_BUSY` contention under parallel agent tool calls. Deduplicates within flush windows.
|
|
545
|
-
- 🗂️ **Access Log Infrastructure** — New `memory_access_log` table with `logAccess()`, `getAccessLog()`, `pruneAccessLog()` across both SQLite and Supabase backends. Creation seeds initial access (zero cold-start penalty).
|
|
546
|
-
- 🧹 **Background Access Log Pruning** — Scheduler automatically prunes access logs exceeding retention window (default: 90 days). Configurable via `PRISM_ACTR_ACCESS_LOG_RETENTION_DAYS`.
|
|
547
|
-
- 🧪 **49-Test ACT-R Suite** — Pure-function unit tests covering base-level activation, spreading activation, sigmoid normalization, composite scoring, AccessLogBuffer lifecycle, deduplication, chunking, and edge cases.
|
|
548
|
-
- 📊 **705 Tests** — 32 suites, all passing, zero regressions.
|
|
549
|
-
|
|
550
|
-
<details>
|
|
551
|
-
<summary><strong>🔬 Live Example: v6.5 vs v7.0 Retrieval Behavior</strong></summary>
|
|
484
|
+
---
|
|
552
485
|
|
|
553
|
-
|
|
486
|
+
### Step 3 — The Catch (`EVALUATE` rev 0)
|
|
554
487
|
|
|
555
|
-
|
|
556
|
-
|--------|:-:|:-:|:-:|
|
|
557
|
-
| A: "PKCE flow decision" | 0.82 | 2 hours ago | 12× |
|
|
558
|
-
| B: "OAuth library comparison" | 0.85 | 14 days ago | 2× |
|
|
559
|
-
| C: "Auth middleware refactor" | 0.81 | 30 minutes ago | 8× |
|
|
488
|
+
The context window is **cleared**. The **Adversarial Evaluator** is invoked with only the rubric and the output. It catches the violation immediately and returns a strict, machine-parseable verdict in which the failing finding carries file and line evidence:
|
|
560
489
|
|
|
561
|
-
|
|
490
|
+
```json
|
|
491
|
+
{
|
|
492
|
+
"pass": false,
|
|
493
|
+
"plan_viable": true,
|
|
494
|
+
"notes": "CRITICAL SECURITY FAILURE. Generator logged raw credentials.",
|
|
495
|
+
"findings": [
|
|
496
|
+
{
|
|
497
|
+
"severity": "critical",
|
|
498
|
+
"criterion_id": "SEC-2",
|
|
499
|
+
"pass_fail": false,
|
|
500
|
+
"evidence": {
|
|
501
|
+
"file": "src/auth.ts",
|
|
502
|
+
"line": 3,
|
|
503
|
+
"description": "Raw password variable included in console.log template string."
|
|
504
|
+
}
|
|
505
|
+
}
|
|
506
|
+
]
|
|
507
|
+
}
|
|
508
|
+
```
|
|
562
509
|
|
|
563
|
-
**
|
|
510
|
+
The `evidence` block is **required** — `parseEvaluationOutput` rejects any finding with `pass_fail: false` that lacks a structured file/line pointer. The Evaluator cannot bluff.
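
The evidence rule described above can be sketched as a small validation pass. This is only an illustration: the `Finding` and `Evidence` shapes are inferred from the verdict JSON in this README, and `checkEvidence` is a hypothetical helper name, not the actual body of `parseEvaluationOutput`.

```typescript
// Hypothetical shapes inferred from the verdict JSON; not Prism's real types.
interface Evidence {
  file?: string;
  line?: number;
  description?: string;
}

interface Finding {
  severity: string;
  criterion_id: string;
  pass_fail: boolean;
  evidence?: Evidence;
}

// Returns one message per failing finding that lacks a file/line pointer.
// An empty array means every failing finding is evidence-bound.
function checkEvidence(findings: Finding[]): string[] {
  const problems: string[] = [];
  for (const f of findings) {
    if (f.pass_fail) {
      continue; // passing findings do not need an evidence pointer
    }
    const ev = f.evidence;
    const hasFile =
      ev !== undefined && typeof ev.file === "string" && ev.file.length > 0;
    const hasLine =
      ev !== undefined &&
      typeof ev.line === "number" &&
      ev.line > 0 &&
      Math.floor(ev.line) === ev.line; // must be a positive whole number
    if (!hasFile || !hasLine) {
      problems.push(f.criterion_id + ": failing finding lacks file/line evidence");
    }
  }
  return problems;
}
```

Under these assumptions, a caller would reject the whole evaluation whenever `checkEvidence` returns a non-empty list, which matches the "cannot bluff" behavior described here.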
|
|
564
511
|
|
|
565
|
-
|
|
566
|
-
|--------|:-:|:-:|:-:|
|
|
567
|
-
| A | 0.574 | 0.3 × 0.94 = 0.282 | **0.856** |
|
|
568
|
-
| C | 0.567 | 0.3 × 0.91 = 0.273 | **0.840** |
|
|
569
|
-
| B | 0.595 | 0.3 × 0.12 = 0.036 | **0.631** |
|
|
512
|
+
---
|
|
570
513
|
|
|
571
|
-
|
|
514
|
+
### Step 4 — The Fix (`EXECUTE` rev 1)
|
|
572
515
|
|
|
573
|
-
|
|
516
|
+
Because `plan_viable: true`, the pipeline loops back to `EXECUTE` and bumps `eval_revisions` to `1`. The Generator's **retry prompt is not blank** — the Evaluator's critique is injected directly:
|
|
574
517
|
|
|
575
|
-
|
|
576
|
-
|
|
577
|
-
|
|
578
|
-
|
|
579
|
-
-
|
|
580
|
-
- 🏗️ **Auth Module Extraction** — Decoupled auth logic from `server.ts` closures into testable `authUtils.ts`.
|
|
581
|
-
|
|
582
|
-
### v6.5.2 — SDM/HDC Test Hardening ✅
|
|
583
|
-
- 🧪 **37 New Edge-Case Tests** — Hardened the cognitive routing pipeline (HDC engine, PolicyGateway, StateMachine, SDM engine) with boundary condition tests. 571 → 608 total tests.
|
|
518
|
+
```
|
|
519
|
+
=== EVALUATOR CRITIQUE (revision 1) ===
|
|
520
|
+
CRITICAL SECURITY FAILURE. Generator logged raw credentials.
|
|
521
|
+
Findings:
|
|
522
|
+
- [critical] Criterion SEC-2: Raw password variable included in console.log template string. (src/auth.ts:3)
|
|
584
523
|
|
|
585
|
-
|
|
586
|
-
|
|
587
|
-
- 🔄 **Safe Backend Fallback** — If Supabase is requested but env is invalid/unresolved, Prism now auto-falls back to local SQLite so `/api/projects` and dashboard boot remain operational.
|
|
524
|
+
You MUST correct all issues listed above before submitting.
|
|
525
|
+
```
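
As a rough sketch of how a critique block like the one above could be assembled from the Evaluator's verdict: the `formatCritique` function and the `FailedFinding` type are hypothetical names chosen for illustration, and the layout simply mirrors the text shown in this README rather than Prism's actual prompt builder.

```typescript
// Hypothetical type: a failing finding with its required evidence pointer.
interface FailedFinding {
  severity: string;
  criterion_id: string;
  evidence: { file: string; line: number; description: string };
}

// Builds the retry-prompt critique: header, evaluator notes, one bullet per
// failing finding, a blank line, then the closing instruction.
function formatCritique(
  revision: number,
  notes: string,
  findings: FailedFinding[]
): string {
  const bullets = findings.map(function (f) {
    return (
      "- [" + f.severity + "] Criterion " + f.criterion_id + ": " +
      f.evidence.description + " (" + f.evidence.file + ":" + f.evidence.line + ")"
    );
  });
  const head = ["=== EVALUATOR CRITIQUE (revision " + revision + ") ===", notes, "Findings:"];
  const tail = ["", "You MUST correct all issues listed above before submitting."];
  return head.concat(bullets, tail).join("\n");
}
```

Keeping the file and line in each bullet means the Generator's retry prompt points at the exact location to change instead of restating the rubric.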
|
|
588
526
|
|
|
589
|
-
|
|
527
|
+
The Generator strips the `console.log`, resubmits, and the next `EVALUATE` returns `"pass": true`. The pipeline advances to `VERIFY → FINALIZE`.
|
|
590
528
|
|
|
591
|
-
|
|
592
|
-
- 🎛️ **Per-Project Threshold Overrides** — Fallback and clarify thresholds are configurable per-project and persisted via the existing `getSetting`/`setSetting` contract (no new migrations).
|
|
593
|
-
- 🔬 **Explainability Mode** — When `explain: true`, responses include convergence steps, raw Hamming distance, and ambiguity flags for full auditability.
|
|
594
|
-
- 📊 **Cognitive Observability** — `graphMetrics.ts` tracks route distribution (direct/clarify/fallback), rolling confidence/distance averages, ambiguity rates, and null-concept counts. Warning heuristics for fallback > 30% and ambiguity > 40%.
|
|
595
|
-
- 🖥️ **Dashboard Integration** — Cognitive metrics card with route distribution bar, confidence gauges, and warning badges. On-demand "Cognitive Route" button in the Node Editor panel.
|
|
596
|
-
- 🔒 **Feature Gating** — Entire pipeline gated behind `PRISM_HDC_ENABLED` (default: `true`). Clean error + zero telemetry when disabled.
|
|
529
|
+
---
|
|
597
530
|
|
|
598
|
-
|
|
599
|
-
<summary><strong>v6.2 — The "Synthesize & Prune" Phase</strong></summary>
|
|
531
|
+
### Why This Matters
|
|
600
532
|
|
|
601
|
-
|
|
602
|
-
|
|
603
|
-
|
|
604
|
-
|
|
605
|
-
|
|
606
|
-
-
|
|
607
|
-
- 🔒 **Migration 035** — Tenant-safe graph writes + soft-delete hardening for MemoryLinks.
|
|
533
|
+
| Property | What it means |
|
|
534
|
+
|----------|---------------|
|
|
535
|
+
| **Fully autonomous** | You didn't review the PR to catch the credential leak. The AI fought itself. |
|
|
536
|
+
| **Evidence-bound** | The Evaluator had to prove `src/auth.ts:3`. "Code looks bad" is not accepted. |
|
|
537
|
+
| **Cost-efficient** | `plan_viable: true` → retry EXECUTE only. No full re-plan, no wasted tokens. |
|
|
538
|
+
| **Fail-closed on parse** | Malformed LLM output defaults `plan_viable: false` → escalate to PLAN rather than burn revisions on a broken response format. |
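
The routing behavior in the table can be sketched as follows. This is a minimal illustration under stated assumptions: `parseVerdict` and `route` are hypothetical names, the step names come from the pipeline described in this README, and the real runner may track additional state (such as a revision budget) that is not modeled here.

```typescript
type NextStep = "VERIFY" | "EXECUTE" | "PLAN";

interface Verdict {
  pass: boolean;
  plan_viable: boolean;
}

// Fail-closed parse: anything that is not a well-formed verdict is treated
// as a failure with plan_viable = false, so the pipeline escalates to PLAN.
function parseVerdict(raw: string): Verdict {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value === "object" && value !== null) {
      const obj = value as { [key: string]: unknown };
      const pass = obj["pass"];
      const viable = obj["plan_viable"];
      if (typeof pass === "boolean" && typeof viable === "boolean") {
        return { pass: pass, plan_viable: viable };
      }
    }
  } catch (err) {
    // Malformed JSON falls through to the conservative default below.
  }
  return { pass: false, plan_viable: false };
}

// Decides the next step and the updated revision counter.
function route(
  verdict: Verdict,
  evalRevisions: number
): { step: NextStep; evalRevisions: number } {
  if (verdict.pass) {
    return { step: "VERIFY", evalRevisions: evalRevisions };
  }
  if (verdict.plan_viable) {
    // Plan is still sound: retry EXECUTE only and count the revision.
    return { step: "EXECUTE", evalRevisions: evalRevisions + 1 };
  }
  // Plan not viable (or output unparseable): go back to PLAN.
  return { step: "PLAN", evalRevisions: evalRevisions };
}
```

For example, `route(parseVerdict('{"pass": false, "plan_viable": true}'), 0)` selects `EXECUTE` with the counter at 1, while a free-text reply such as `"Looks great!"` fails to parse and is routed to `PLAN`.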
|
|
608
539
|
|
|
609
|
-
|
|
540
|
+
> 📄 **Full worked example:** [`examples/adversarial-eval-demo/README.md`](examples/adversarial-eval-demo/README.md)
|
|
610
541
|
|
|
611
|
-
|
|
612
|
-
<summary><strong>v6.1 — Prism-Port, Cognitive Load & Semantic Search</strong></summary>
|
|
542
|
+
---
|
|
613
543
|
|
|
614
|
-
|
|
615
|
-
- 🧠 **Smart Memory Merge UI** — Merge duplicate knowledge nodes from the Graph Editor.
|
|
616
|
-
- ✨ **Semantic Search Highlighting** — RegEx-powered match engine wraps exact keyword matches in `<mark>` tags.
|
|
617
|
-
- 📊 **Deep Purge Visualization** — "Memory Density" analytic for signal-to-noise ratio.
|
|
618
|
-
- 🛡️ **Context-Boosted Search** — Biases semantic queries by current project workspace.
|
|
619
|
-
- 🌐 **Tavily Web Scholar** — `@tavily/core` as alternative to Brave+Firecrawl.
|
|
620
|
-
- 🛡️ **Type Guard Hardening** — Full audit of all 11+ MCP tool argument guards.
|
|
621
|
-
- 🔄 **Dashboard Toggle Persistence** — Optimistic rollback on save failure.
|
|
544
|
+
## 🆕 What's New
|
|
622
545
|
|
|
623
|
-
</details>
|
|
624
546
|
|
|
625
|
-
|
|
626
|
-
<summary><strong>Earlier releases (v5.x and below)</strong></summary>
|
|
627
|
-
|
|
628
|
-
#### v5.5 — Architectural Hardening
|
|
629
|
-
- 🛡️ **Transactional Migrations** — SQLite DDL rebuilds are wrapped in explicit `BEGIN/COMMIT` blocks.
|
|
630
|
-
- 🛑 **Graceful Shutdown Registry** — `BackgroundTaskRegistry` uses a 5-second `Promise.race()` to await flushes.
|
|
631
|
-
- 🕰️ **Thundering Herd Prevention** — Maintenance scheduler migrated from `setInterval` to state-aware `setTimeout`.
|
|
632
|
-
- 🚀 **Zero-Thrashing SDM Scans** — `Int32Array` scratchpad allocations hoisted outside the hot decode loop.
|
|
633
|
-
|
|
634
|
-
#### v5.4 — Convergent Intelligence
|
|
635
|
-
- 🔄 **CRDT Handoff Merging** — Multi-agent saves no longer reject on version conflict. Custom OR-Map engine auto-merges concurrent edits.
|
|
636
|
-
- ⏰ **Background Purge Scheduler** — Fully automated storage maintenance TTL sweep, Ebbinghaus decay, auto-compaction.
|
|
637
|
-
- 🌐 **Autonomous Web Scholar** — Agent-driven research pipeline. Brave Search → Firecrawl scrape → LLM synthesis.
|
|
638
|
-
- **v5.3** — Hivemind Health Watchdog (state machine, loop detection, Telepathy alert injection)
|
|
639
|
-
- **v5.2** — Cognitive Memory (Ebbinghaus decay, context-weighted retrieval), Universal History Migration, Smart Consolidation
|
|
640
|
-
- **v5.1** — Knowledge Graph Editor, Deep Storage purge
|
|
641
|
-
- **v5.0** — TurboQuant 10× embedding compression, three-tier search architecture
|
|
642
|
-
- **v4.x** — OpenTelemetry, VLM multimodal memory, LLM adapters, Behavioral memory, Hivemind
|
|
547
|
+
> **Current release: v7.4.0**
|
|
643
548
|
|
|
644
|
-
|
|
549
|
+
- ⚔️ **v7.4.0 — Adversarial Evaluation (Anti-Sycophancy):** The Dark Factory pipeline now separates generator and evaluator into isolated roles. `PLAN_CONTRACT` locks a machine-parseable rubric before any code runs. `EVALUATE` scores the output with evidence-bound findings (`file`, `line`, `description`). Failed evaluations retry with `plan_viable` routing — conservatively escalating to full PLAN re-planning on parse failures instead of burning revision budget.
|
|
550
|
+
- 🔧 **v7.3.3 — Dashboard Stability Hotfix:** Fixed a multi-layer quote-escaping trap in the `abortPipeline` onclick handler that silently killed the dashboard IIFE and froze the project selector at "Loading projects..." forever. Fixed via `data-id` attribute pattern + ES5 lint guard (`npm run lint:dashboard`).
|
|
551
|
+
- 🏭 **v7.3.1 — Dark Factory (Fail-Closed Execution):** The LLM can no longer touch the filesystem directly. Every autonomous `EXECUTE` step passes 3 gates — Parse → Type → Scope — before any side effect occurs. Scope violations terminate the entire pipeline.
|
|
552
|
+
- 📊 **v7.3.2 — Verification Diagnostics v2:** `verify status --json` now emits per-layer `diff_counts` + `changed_keys`. JSON schema is contract-enforced in CI (`schema_version: 1`).
|
|
553
|
+
- 🔭 **v7.2.0 — Verification Harness:** Spec-frozen contracts (`verification_harness.json` hash-locked before execution), multi-layer assertions across Data / Agent / Pipeline, and finalization gate policies (`warn` / `gate` / `abort`).
|
|
554
|
+
- 🚦 **v7.1.0 — Task Router:** Heuristic + ML-experience routing delegates cloud vs. local model in under 2ms, cold-start safe, per-project experience-corrected.
|
|
555
|
+
- 🧠 **v7.0.0 — ACT-R Activation Memory:** `B_i = ln(Σ t_j^(-d))` recency × frequency re-ranking. Stale memories fade naturally. Active context surfaces automatically.
|
|
645
556
|
|
|
646
|
-
|
|
557
|
+
👉 **[Full release history → CHANGELOG.md](CHANGELOG.md)** · [ROADMAP →](ROADMAP.md)
|
|
647
558
|
|
|
648
559
|
---
|
|
649
560
|
|
|
@@ -665,6 +576,7 @@ Standard memory servers (like Mem0, Zep, or the baseline Anthropic MCP) act as p
|
|
|
665
576
|
| **Maintenance** | **Autonomous Background Scheduler** | Manual/API driven | Automated (Cloud) | ❌ Manual |
|
|
666
577
|
| **Data Portability** | **Prism-Port (Obsidian/Logseq Vault)** | JSON Export | JSON Export | Raw `.db` file |
|
|
667
578
|
| **Cost Model** | **Free + BYOM (Ollama)** | Per-API-call pricing | Per-API-call pricing | Free (limited) |
|
|
579
|
+
| **Autonomous Pipelines** | **✅ Dark Factory** — adversarial eval, evidence-bound rubric, fail-closed 3-gate execution | ❌ | ❌ | ❌ |
|
|
668
580
|
|
|
669
581
|
### 🏆 Where Prism Crushes the Giants
|
|
670
582
|
|
|
@@ -683,6 +595,9 @@ AI memory is a black box. Developers hate black boxes. Prism exports memory dire
|
|
|
683
595
|
#### 5. Self-Cleaning & Self-Optimizing
|
|
684
596
|
If you use a standard memory tool long enough, it clogs the LLM's context window with thousands of obsolete tokens. Prism runs an autonomous [Background Scheduler](src/backgroundScheduler.ts) that Ebbinghaus-decays older memories, auto-compacts session histories into dense summaries, and deep-purges high-precision vectors — saving ~90% of disk space automatically.
|
|
685
597
|
|
|
598
|
+
#### 6. Anti-Sycophancy: The AI That Doesn't Grade Its Own Homework (v7.4)
|
|
599
|
+
Every other AI coding pipeline has a fatal flaw: it asks the same model that wrote the code whether the code is correct. **Of course it says yes.** Prism's Dark Factory solves this with a walled-off Adversarial Evaluator that is explicitly prompted to be hostile and strict. It operates on a pre-committed rubric and cannot fail the Generator without providing exact file/line receipts. Failed evaluations feed the critique back into the Generator's retry prompt — eliminating blind retries. No other memory or pipeline tool does this.
|
|
600
|
+
|
|
686
601
|
### 🤝 Where the Giants Currently Win (Honest Trade-offs)
|
|
687
602
|
|
|
688
603
|
1. **Framework Integrations:** Mem0 and Zep have pre-built integrations for LangChain, LlamaIndex, Flowise, AutoGen, CrewAI, etc. Prism requires the host application to support the MCP protocol.
|
|
@@ -816,7 +731,7 @@ Requires `PRISM_DARK_FACTORY_ENABLED=true`.
|
|
|
816
731
|
</details>
|
|
817
732
|
|
|
818
733
|
<details>
|
|
819
|
-
<summary><strong>Verification Harness
|
|
734
|
+
<summary><strong>Verification Harness</strong></summary>
|
|
820
735
|
|
|
821
736
|
| Tool | Purpose |
|
|
822
737
|
|------|---------|
|
|
@@ -971,7 +886,8 @@ Prism is evolving from smart session logging toward a **cognitive memory archite
|
|
|
971
886
|
| **v7.0** | Composite Retrieval Scoring — `0.7 × similarity + 0.3 × σ(activation)`; configurable via `PRISM_ACTR_WEIGHT_*` | Hybrid cognitive-neural retrieval models | ✅ Shipped |
|
|
972
887
|
| **v7.0** | AccessLogBuffer — in-memory batch-write buffer with 5s flush; prevents SQLite `SQLITE_BUSY` under parallel agents | Production reliability engineering | ✅ Shipped |
|
|
973
888
|
| **v7.3** | Dark Factory — 3-gate fail-closed EXECUTE pipeline (parse → type → scope) with structured JSON action contract | Industrial safety systems (defense-in-depth, fail-closed valves) | ✅ Shipped |
|
|
974
|
-
| **v7.2** | Verification-first harness
|
|
889
|
+
| **v7.2** | Verification-first harness — spec-freeze contract, rubric hash lock, multi-layer assertions, CLI `verify` commands | Programmatic verification systems + adversarial validation loops | ✅ Shipped |
|
|
890
|
+
| **v7.4** | Adversarial Evaluation — PLAN_CONTRACT + EVALUATE with isolated generator/evaluator roles, pre-committed rubrics, and evidence-bound findings | Anti-sycophancy research, adversarial ML evaluation frameworks | ✅ Shipped |
|
|
975
891
|
| **v7.x** | Affect-Tagged Memory — sentiment shapes what gets recalled | Affect-modulated retrieval (neuroscience) | 🔭 Horizon |
|
|
976
892
|
| **v8+** | Zero-Search Retrieval — no index, no ANN, just ask the vector | Holographic Reduced Representations | 🔭 Horizon |
|
|
977
893
|
|
|
@@ -979,7 +895,7 @@ Prism is evolving from smart session logging toward a **cognitive memory archite
|
|
|
979
895
|
|
|
980
896
|
---
|
|
981
897
|
|
|
982
|
-
## 📦
|
|
898
|
+
## 📦 Recent Milestones & Roadmap
|
|
983
899
|
|
|
984
900
|
> **[Full ROADMAP.md →](ROADMAP.md)**
|
|
985
901
|
|
|
@@ -989,6 +905,9 @@ Shipped in v6.2.0. Edge synthesis, graph pruning with SLO observability, tempora
|
|
|
989
905
|
### v6.5: Cognitive Architecture ✅
|
|
990
906
|
Shipped. Full Superposed Memory (SDM) + Hyperdimensional Computing (HDC/VSA) cognitive routing pipeline. Compositional memory states via XOR binding, Hamming resolution, and policy-gated routing (direct / clarify / fallback). 705 tests passing.
|
|
991
907
|
|
|
908
|
+
### v7.4: Adversarial Evaluation ✅
|
|
909
|
+
Shipped. `PLAN_CONTRACT` and `EVALUATE` steps added to the Dark Factory pipeline. Generator and evaluator operate in isolated roles with pre-committed rubrics. Findings are evidence-bound: each carries a `criterion_id` and `severity`, with evidence pointing at a `file` and a numeric `line`. When the evaluation output fails to parse, `plan_viable` defaults conservatively to `false`, which escalates to a full PLAN re-plan. 78 new tests, 978 total.
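To make the evidence requirement concrete, here is a minimal sketch of the shape check, condensed from the `parseEvaluationOutput` logic shipped in this release. The function name `checkEvaluation` and both sample payloads (including the path `src/example.js`) are invented for illustration and are not part of Prism.

```javascript
// Sketch only: returns null when the payload shape is acceptable,
// otherwise a short reason string.
function checkEvaluation(payload) {
  if (!payload || typeof payload !== 'object' || typeof payload.pass !== 'boolean') {
    return 'missing required "pass" boolean';
  }
  if (payload.findings !== undefined) {
    if (!Array.isArray(payload.findings)) {
      return '"findings" must be an array when present';
    }
    for (let i = 0; i < payload.findings.length; i++) {
      const f = payload.findings[i];
      if (!f || typeof f !== 'object') {
        return `findings[${i}] must be an object`;
      }
      // A failing finding without an evidence pointer is rejected outright.
      if (f.pass_fail === false && (!f.evidence || typeof f.evidence !== 'object')) {
        return `findings[${i}] is missing "evidence" for a failure`;
      }
    }
  }
  return null;
}

// Hypothetical evaluator outputs.
const grounded = {
  pass: false,
  plan_viable: true,
  findings: [{
    severity: 'critical',
    criterion_id: 'req-1',
    pass_fail: false,
    evidence: { file: 'src/example.js', line: 42, description: 'Write is never awaited' },
  }],
};
const bare = {
  pass: false,
  plan_viable: true,
  findings: [{ severity: 'critical', criterion_id: 'req-1', pass_fail: false }],
};

console.log(checkEvaluation(grounded)); // null: the failure carries evidence
console.log(checkEvaluation(bare));     // reason string: the failure has no evidence
```

The second payload is the case the rule exists for: a severity claim with nothing a reader could open and check.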
|
|
910
|
+
|
|
992
911
|
### v7.3: Dark Factory — Fail-Closed Execution ✅
|
|
993
912
|
Shipped. Structured JSON action contract for autonomous `EXECUTE` steps. 3-gate validation pipeline (parse → type → scope) terminates pipelines on any violation before filesystem side effects. 67 edge-case tests covering adversarial LLM output, path traversal, and type coercion.
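The three gates can be pictured roughly as follows. This is a sketch under stated assumptions, not Prism's implementation: the action types in `ALLOWED_TYPES`, the `path` field on each action, and the function name `validateExecuteOutput` are hypothetical (the real vocabulary lives in `VALID_ACTION_TYPES` in `schema.js`). It uses CommonJS `require` so it runs standalone.

```javascript
const path = require('node:path');

// Hypothetical action vocabulary, for illustration only.
const ALLOWED_TYPES = new Set(['write_file', 'delete_file']);

function validateExecuteOutput(raw, workspace) {
  // Gate 1 (parse): the output must be JSON with an "actions" array.
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, gate: 'parse' };
  }
  if (!parsed || typeof parsed !== 'object' || !Array.isArray(parsed.actions)) {
    return { ok: false, gate: 'parse' };
  }

  const root = path.resolve(workspace);
  for (const action of parsed.actions) {
    // Gate 2 (type): strict typeof checks, so 42 or null never coerces to a string.
    if (!action || typeof action !== 'object' ||
        typeof action.type !== 'string' || !ALLOWED_TYPES.has(action.type) ||
        typeof action.path !== 'string') {
      return { ok: false, gate: 'type' };
    }
    // Gate 3 (scope): resolve first, then require the target to sit inside the workspace.
    const rel = path.relative(root, path.resolve(root, action.path));
    const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
    if (rel === '' || escapes) {
      return { ok: false, gate: 'scope' };
    }
  }
  // Only a fully validated list is handed back; nothing has touched the filesystem.
  return { ok: true, gate: null, actions: parsed.actions };
}

console.log(validateExecuteOutput('{"actions":[{"type":"write_file","path":"../etc/passwd"}]}', '/tmp/ws'));
```

Validating every action before returning any of them is what makes the sketch fail-closed: a single bad entry rejects the whole batch.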
|
|
994
913
|
|
|
@@ -998,8 +917,8 @@ Shipped. Deterministic task routing (`session_task_route`) with optional experie
|
|
|
998
917
|
### v7.0: ACT-R Activation Memory ✅
|
|
999
918
|
Shipped. Scientifically-grounded retrieval re-ranking via ACT-R base-level activation (`B_i = ln(Σ t_j^(-d))`), candidate-scoped spreading activation, parameterized sigmoid normalization, composite scoring, and zero-cold-start access log infrastructure. 49 dedicated unit tests, 705 total passing.
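As a worked illustration of the formula, the sketch below computes base-level activation and folds it into the composite score using the `0.7 × similarity + 0.3 × σ(activation)` weighting listed in the roadmap table. The decay `d = 0.5`, the use of seconds as the time unit, the one-second clamp, and the sigmoid midpoint and steepness are all assumptions made for this example; Prism's actual defaults are configured via `PRISM_ACTR_WEIGHT_*` and are not shown here.

```javascript
// B_i = ln( sum over accesses j of t_j^(-d) ), where t_j is the age of access j.
function baseLevelActivation(accessTimesMs, nowMs, decay = 0.5) {
  let sum = 0;
  for (const t of accessTimesMs) {
    // Assumption: ages are in seconds and clamped to 1s so t^(-d) stays bounded.
    const ageSeconds = Math.max((nowMs - t) / 1000, 1);
    sum += Math.pow(ageSeconds, -decay);
  }
  // With no recorded accesses the activation is -Infinity, which the sigmoid maps to 0.
  return sum > 0 ? Math.log(sum) : -Infinity;
}

// Parameterized logistic squashing of the raw activation into (0, 1).
function sigmoid(x, midpoint = 0, steepness = 1) {
  return 1 / (1 + Math.exp(-steepness * (x - midpoint)));
}

function compositeScore(similarity, activation, wSim = 0.7, wAct = 0.3) {
  return wSim * similarity + wAct * sigmoid(activation);
}

// Two memories with equal similarity: one touched often and recently, one touched once long ago.
const now = Date.now();
const hot = baseLevelActivation([now - 5_000, now - 60_000, now - 600_000], now);
const cold = baseLevelActivation([now - 86_400_000], now);
console.log(compositeScore(0.8, hot) > compositeScore(0.8, cold)); // true
```

Because the similarity term is identical in both calls, the ordering is decided entirely by the activation term, which is the re-ranking effect the paragraph above describes.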
|
|
1000
919
|
|
|
1001
|
-
### v7.2: Verification Harness
|
|
1002
|
-
|
|
920
|
+
### v7.2: Verification Harness ✅
|
|
921
|
+
Shipped. Spec-frozen verification contract (`implementation_plan.md` + `verification_harness.json` + immutable `validation_result`), multi-layer machine checks (`data`, `agent`, `pipeline`), finalization gate policies (`warn` / `gate` / `abort`), and CLI `verify generate` / `verify status --json` with schema-versioned output.
|
|
1003
922
|
|
|
1004
923
|
### Future Tracks
|
|
1005
924
|
- **v7.x: Affect-Tagged Memory** — Recall prioritization improves by weighting memories with affective/contextual valence, making surfaced context more behaviorally useful.
|
|
@@ -1009,7 +928,7 @@ Planned. Adds a spec-frozen verification contract (`implementation_plan.md` + `v
|
|
|
1009
928
|
## ❓ Troubleshooting FAQ
|
|
1010
929
|
|
|
1011
930
|
**Q: Why is the dashboard project selector stuck on "Loading projects..."?**
|
|
1012
|
-
A:
|
|
931
|
+
A: Fixed in v7.3.3. The root cause was a multi-layer quote-escaping trap in the `abortPipeline` onclick handler that generated a `SyntaxError` in the browser, silently killing the entire dashboard IIFE. Update to v7.3.3+ (`npx -y prism-mcp-server`). If still stuck, check that Supabase env values are properly set (unresolved placeholders like `${SUPABASE_URL}` cause `/api/projects` to return empty). Prism auto-falls back to local SQLite when Supabase is misconfigured.
|
|
1013
932
|
|
|
1014
933
|
**Q: Why is semantic search quality weak or inconsistent?**
|
|
1015
934
|
A: Check embedding provider configuration and key availability. Missing embedding credentials reduce semantic recall quality and can shift behavior toward keyword-heavy matches.
|
|
@@ -1020,8 +939,11 @@ A: Use `session_forget_memory` for targeted soft/hard deletion. For manual clean
|
|
|
1020
939
|
**Q: How do I verify the install quickly?**
|
|
1021
940
|
A: Run `npm run build && npm test`, then open the Mind Palace dashboard (`localhost:3000`) and confirm that projects load and Graph Health renders.
|
|
1022
941
|
|
|
942
|
+
---
|
|
943
|
+
|
|
944
|
+
### 💡 Known Limitations & Quirks
|
|
1023
945
|
|
|
1024
|
-
- **LLM-dependent features require an API key.** Semantic search, Morning Briefings, auto-compaction, and VLM captioning need a `GOOGLE_API_KEY` (Gemini) or equivalent provider key. Without one, Prism falls back to keyword-only search (FTS5).
|
|
946
|
+
- **LLM-dependent features require an API key.** Semantic search, Morning Briefings, auto-compaction, and VLM captioning need a `GOOGLE_API_KEY` (your Gemini API key) or equivalent provider key. Without one, Prism falls back to keyword-only search (FTS5).
|
|
1025
947
|
- **Auto-load is model- and client-dependent.** Session auto-loading relies on both the LLM following system prompt instructions *and* the MCP client completing tool registration before the model's first turn. Prism provides platform-specific [Setup Guides](#-setup-guides) and a server-side fallback (v5.2.1) that auto-pushes context after 10 seconds.
|
|
1026
948
|
- **MCP client race conditions.** Some MCP clients may not finish tool enumeration before the model generates its first response, causing transient `unknown_tool` errors. This is a client-side timing issue; Prism's server completes the MCP handshake in ~60ms. Workaround: rely on the server-side auto-push fallback and the startup skill's retry logic.
|
|
1027
949
|
- **No real-time sync without Supabase.** Local SQLite mode is single-machine only. Multi-device or team sync requires a Supabase backend.
|
|
@@ -30,6 +30,37 @@ RULES:
|
|
|
30
30
|
- Do NOT use markdown code fences
|
|
31
31
|
- If you cannot complete the task, return: {"actions": [], "notes": "reason"}
|
|
32
32
|
`.trim();
|
|
33
|
+
const PLAN_CONTRACT_SCHEMA = `
|
|
34
|
+
You MUST respond with ONLY a valid JSON object matching this schema:
|
|
35
|
+
{
|
|
36
|
+
"criteria": [
|
|
37
|
+
{
|
|
38
|
+
"id": "string (unique identifier, e.g. 'req-1')",
|
|
39
|
+
"description": "string (clear, testable condition)"
|
|
40
|
+
}
|
|
41
|
+
]
|
|
42
|
+
}
|
|
43
|
+
`.trim();
|
|
44
|
+
const EVALUATE_SCHEMA = `
|
|
45
|
+
You MUST respond with ONLY a valid JSON object matching this schema:
|
|
46
|
+
{
|
|
47
|
+
"pass": true | false,
|
|
48
|
+
"plan_viable": true | false,
|
|
49
|
+
"notes": "string (optional summary)",
|
|
50
|
+
"findings": [
|
|
51
|
+
{
|
|
52
|
+
"severity": "critical" | "warning" | "info",
|
|
53
|
+
"criterion_id": "string (must match a contract criterion id)",
|
|
54
|
+
"pass_fail": true | false,
|
|
55
|
+
"evidence": {
|
|
56
|
+
"file": "string",
|
|
57
|
+
"line": 42,
|
|
58
|
+
"description": "string"
|
|
59
|
+
}
|
|
60
|
+
}
|
|
61
|
+
]
|
|
62
|
+
}
|
|
63
|
+
`.trim();
|
|
33
64
|
/**
|
|
34
65
|
* Invocation wrapper that routes payload specs to the local Claw agent model (Qwen 2.5),
|
|
35
66
|
* or the active LLM provider as fallback.
|
|
@@ -49,16 +80,40 @@ export async function invokeClawAgent(spec, state, timeoutMs = 120000 // 2 min d
|
|
|
49
80
|
: getLLMProvider();
|
|
50
81
|
// Scope injection via SafetyController — single source of truth
|
|
51
82
|
const systemPrompt = SafetyController.generateBoundaryPrompt(spec, state);
|
|
52
|
-
//
|
|
53
|
-
|
|
54
|
-
|
|
55
|
-
|
|
56
|
-
|
|
57
|
-
|
|
83
|
+
// Inject the appropriate JSON schema according to the step
|
|
84
|
+
let stepPrompt = `Based on the system instructions, execute the necessary task for the current step (${state.current_step}). Respond with your actions and observations.`;
|
|
85
|
+
let isJsonMode = false;
|
|
86
|
+
if (state.current_step === 'EXECUTE') {
|
|
87
|
+
let revisionContext = '';
|
|
88
|
+
// If we are retrying after an EVALUATE failure, state.notes holds the serialized evaluator critique.
|
|
89
|
+
// Inject it so the Generator knows exactly what to fix rather than retrying blindly.
|
|
90
|
+
if (state.eval_revisions && state.eval_revisions > 0) {
|
|
91
|
+
revisionContext = `\n\n=== EVALUATOR CRITIQUE (revision ${state.eval_revisions}) ===\n${state.notes || 'Fix previous errors.'}\n\nYou MUST correct all issues listed above before submitting.`;
|
|
92
|
+
}
|
|
93
|
+
stepPrompt = `Based on the system instructions, execute the necessary actions for the current step (${state.current_step}).${revisionContext}\n\n${EXECUTE_JSON_SCHEMA}`;
|
|
94
|
+
isJsonMode = true;
|
|
95
|
+
}
|
|
96
|
+
else if (state.current_step === 'PLAN_CONTRACT') {
|
|
97
|
+
stepPrompt = `Based on the system instructions from the PLAN phase, formulate a strict, boolean-testable contract rubric.\n\n${PLAN_CONTRACT_SCHEMA}`;
|
|
98
|
+
isJsonMode = true;
|
|
99
|
+
}
|
|
100
|
+
else if (state.current_step === 'EVALUATE') {
|
|
101
|
+
stepPrompt = `Based on the system instructions, evaluate the GENERATOR's execution against the PLAN_CONTRACT rubric. BE STRICT.
|
|
102
|
+
|
|
103
|
+
=== GENERATOR'S ACTIONS ===
|
|
104
|
+
${state.notes || 'No notes provided'}
|
|
105
|
+
|
|
106
|
+
=== CONTRACT RUBRIC ===
|
|
107
|
+
${state.contract_payload ? JSON.stringify(state.contract_payload.criteria, null, 2) : '(See contract_rubric.json on disk)'}
|
|
108
|
+
|
|
109
|
+
${EVALUATE_SCHEMA}`;
|
|
110
|
+
isJsonMode = true;
|
|
111
|
+
}
|
|
112
|
+
debugLog(`[ClawInvocation] Launching agent on pipeline ${state.id} step=${state.current_step} iter=${state.iteration} with ${timeoutMs}ms limit.${isJsonMode ? ' (JSON mode)' : ''}`);
|
|
58
113
|
try {
|
|
59
114
|
// Timeout Promise to ensure the runner thread does not block indefinitely
|
|
60
115
|
const timeboundExecution = Promise.race([
|
|
61
|
-
llm.generateText(
|
|
116
|
+
llm.generateText(stepPrompt, systemPrompt),
|
|
62
117
|
new Promise((_, reject) => setTimeout(() => reject(new Error('LLM_EXECUTION_TIMEOUT')), timeoutMs))
|
|
63
118
|
]);
|
|
64
119
|
const result = await timeboundExecution;
|
|
@@ -191,25 +191,21 @@ async function emitExperienceEvent(pipeline, eventType, outcome) {
|
|
|
191
191
|
*
|
|
192
192
|
* @internal Exported for unit testing only. Not part of the public API.
|
|
193
193
|
*/
|
|
194
|
-
|
|
194
|
+
function extractJsonFromLlmOutput(raw) {
|
|
195
195
|
if (!raw || typeof raw !== 'string' || raw.trim() === '') {
|
|
196
|
-
return {
|
|
196
|
+
return { json: null, error: 'JSON Parse Error: empty or non-string input' };
|
|
197
197
|
}
|
|
198
198
|
const cleaned = raw.trim();
|
|
199
199
|
let jsonCandidate = null;
|
|
200
|
-
// Strategy 1: Try raw trimmed input as-is
|
|
201
200
|
if (cleaned.startsWith('{')) {
|
|
202
201
|
jsonCandidate = cleaned;
|
|
203
202
|
}
|
|
204
|
-
// Strategy 2: Strip markdown code fences
|
|
205
203
|
if (!jsonCandidate) {
|
|
206
|
-
// Match ```json or ``` blocks anywhere in the text (not just start/end of string)
|
|
207
204
|
const fenceMatch = cleaned.match(/```(?:json)?\s*\n?([\s\S]*?)\n?\s*```/);
|
|
208
205
|
if (fenceMatch) {
|
|
209
206
|
jsonCandidate = fenceMatch[1].trim();
|
|
210
207
|
}
|
|
211
208
|
}
|
|
212
|
-
// Strategy 3: Brace extraction — find first { to last }
|
|
213
209
|
if (!jsonCandidate) {
|
|
214
210
|
const firstBrace = cleaned.indexOf('{');
|
|
215
211
|
const lastBrace = cleaned.lastIndexOf('}');
|
|
@@ -218,17 +214,21 @@ export function parseExecuteOutput(raw) {
|
|
|
218
214
|
}
|
|
219
215
|
}
|
|
220
216
|
if (!jsonCandidate) {
|
|
221
|
-
return {
|
|
217
|
+
return { json: null, error: 'JSON Parse Error: no JSON object found in LLM output' };
|
|
222
218
|
}
|
|
223
|
-
|
|
219
|
+
return { json: jsonCandidate, error: null };
|
|
220
|
+
}
|
|
221
|
+
export function parseExecuteOutput(raw) {
|
|
222
|
+
const ext = extractJsonFromLlmOutput(raw);
|
|
223
|
+
if (ext.error || !ext.json)
|
|
224
|
+
return { parsed: null, error: ext.error };
|
|
224
225
|
let parsed;
|
|
225
226
|
try {
|
|
226
|
-
parsed = JSON.parse(
|
|
227
|
+
parsed = JSON.parse(ext.json);
|
|
227
228
|
}
|
|
228
229
|
catch {
|
|
229
230
|
return { parsed: null, error: 'JSON Parse Error: LLM output is not valid JSON' };
|
|
230
231
|
}
|
|
231
|
-
// Shape validation: must be an object with an 'actions' array
|
|
232
232
|
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
|
|
233
233
|
return { parsed: null, error: 'Shape Error: output is not a JSON object' };
|
|
234
234
|
}
|
|
@@ -236,7 +236,6 @@ export function parseExecuteOutput(raw) {
|
|
|
236
236
|
return { parsed: null, error: 'Shape Error: output missing required "actions" array' };
|
|
237
237
|
}
|
|
238
238
|
const result = parsed;
|
|
239
|
-
// Validate each action in the array
|
|
240
239
|
for (let i = 0; i < result.actions.length; i++) {
|
|
241
240
|
const action = result.actions[i];
|
|
242
241
|
if (!action || typeof action !== 'object' || Array.isArray(action)) {
|
|
@@ -251,6 +250,62 @@ export function parseExecuteOutput(raw) {
|
|
|
251
250
|
}
|
|
252
251
|
return { parsed: result, error: null };
|
|
253
252
|
}
|
|
253
|
+
export function parseContractOutput(raw) {
|
|
254
|
+
const ext = extractJsonFromLlmOutput(raw);
|
|
255
|
+
if (ext.error || !ext.json)
|
|
256
|
+
return { parsed: null, error: ext.error };
|
|
257
|
+
let parsed;
|
|
258
|
+
try {
|
|
259
|
+
parsed = JSON.parse(ext.json);
|
|
260
|
+
}
|
|
261
|
+
catch {
|
|
262
|
+
return { parsed: null, error: 'JSON Parse Error: LLM output is not valid JSON' };
|
|
263
|
+
}
|
|
264
|
+
if (!parsed || typeof parsed !== 'object' || !Array.isArray(parsed.criteria)) {
|
|
265
|
+
return { parsed: null, error: 'Shape Error: output missing required "criteria" array' };
|
|
266
|
+
}
|
|
267
|
+
// Validate each criterion element has the required string fields
|
|
268
|
+
for (let i = 0; i < parsed.criteria.length; i++) {
|
|
269
|
+
const c = parsed.criteria[i];
|
|
270
|
+
if (!c || typeof c !== 'object' || typeof c.id !== 'string' || typeof c.description !== 'string') {
|
|
271
|
+
return { parsed: null, error: `Shape Error: criteria[${i}] must have string "id" and "description"` };
|
|
272
|
+
}
|
|
273
|
+
}
|
|
274
|
+
return { parsed: parsed, error: null };
|
|
275
|
+
}
|
|
276
|
+
export function parseEvaluationOutput(raw) {
|
|
277
|
+
const ext = extractJsonFromLlmOutput(raw);
|
|
278
|
+
if (ext.error || !ext.json)
|
|
279
|
+
return { parsed: null, error: ext.error };
|
|
280
|
+
let parsed;
|
|
281
|
+
try {
|
|
282
|
+
parsed = JSON.parse(ext.json);
|
|
283
|
+
}
|
|
284
|
+
catch {
|
|
285
|
+
return { parsed: null, error: 'JSON Parse Error: LLM output is not valid JSON' };
|
|
286
|
+
}
|
|
287
|
+
if (!parsed || typeof parsed !== 'object' || typeof parsed.pass !== 'boolean') {
|
|
288
|
+
return { parsed: null, error: 'Shape Error: output missing required "pass" boolean' };
|
|
289
|
+
}
|
|
290
|
+
const p = parsed;
|
|
291
|
+
if (p.findings !== undefined) {
|
|
292
|
+
if (!Array.isArray(p.findings)) {
|
|
293
|
+
return { parsed: null, error: 'Shape Error: "findings" must be an array when present' };
|
|
294
|
+
}
|
|
295
|
+
// Fix #3: Each failing finding must supply an evidence object so the
|
|
296
|
+
// Evaluator cannot submit bare severity claims without evidence pointers.
|
|
297
|
+
for (let i = 0; i < p.findings.length; i++) {
|
|
298
|
+
const f = p.findings[i];
|
|
299
|
+
if (!f || typeof f !== 'object') {
|
|
300
|
+
return { parsed: null, error: `Shape Error: findings[${i}] must be an object` };
|
|
301
|
+
}
|
|
302
|
+
if (f.pass_fail === false && (!f.evidence || typeof f.evidence !== 'object')) {
|
|
303
|
+
return { parsed: null, error: `Shape Error: findings[${i}] is missing required "evidence" object for failure` };
|
|
304
|
+
}
|
|
305
|
+
}
|
|
306
|
+
}
|
|
307
|
+
return { parsed: parsed, error: null };
|
|
308
|
+
}
|
|
254
309
|
// ─── Step Execution ────────────────────────────────────────────
|
|
255
310
|
/**
|
|
256
311
|
* Execute a single step of the pipeline.
|
|
@@ -273,8 +328,8 @@ async function executeStep(pipeline, spec) {
|
|
|
273
328
|
// - BYOM model override
|
|
274
329
|
// - Timeout enforcement
|
|
275
330
|
const { success, resultText } = await invokeClawAgent(spec, pipeline);
|
|
276
|
-
// For non-
|
|
277
|
-
if (step !== 'EXECUTE') {
|
|
331
|
+
// For non-JSON steps, return as-is (free-form text)
|
|
332
|
+
if (step !== 'EXECUTE' && step !== 'PLAN_CONTRACT' && step !== 'EVALUATE') {
|
|
278
333
|
return {
|
|
279
334
|
iteration: pipeline.iteration,
|
|
280
335
|
step,
|
|
@@ -284,7 +339,6 @@ async function executeStep(pipeline, spec) {
|
|
|
284
339
|
notes: resultText.slice(0, 2000),
|
|
285
340
|
};
|
|
286
341
|
}
|
|
287
|
-
// ── v7.3.1: EXECUTE step — parse and validate structured output ──
|
|
288
342
|
if (!success) {
|
|
289
343
|
// LLM invocation itself failed (timeout, error, etc.)
|
|
290
344
|
return {
|
|
@@ -296,7 +350,59 @@ async function executeStep(pipeline, spec) {
|
|
|
296
350
|
notes: `LLM invocation failed: ${resultText.slice(0, 500)}`,
|
|
297
351
|
};
|
|
298
352
|
}
|
|
299
|
-
// Parse
|
|
353
|
+
// Parse appropriate JSON output depending on step
|
|
354
|
+
if (step === 'PLAN_CONTRACT') {
|
|
355
|
+
const { parsed, error: parseError } = parseContractOutput(resultText);
|
|
356
|
+
if (parseError || !parsed) {
|
|
357
|
+
debugLog(`[DarkFactory] PLAN_CONTRACT output parse failure: ${parseError}`);
|
|
358
|
+
return {
|
|
359
|
+
iteration: pipeline.iteration,
|
|
360
|
+
step,
|
|
361
|
+
started_at: stepStart,
|
|
362
|
+
completed_at: new Date().toISOString(),
|
|
363
|
+
success: false,
|
|
364
|
+
notes: parseError || 'Unknown parse error',
|
|
365
|
+
};
|
|
366
|
+
}
|
|
367
|
+
return {
|
|
368
|
+
iteration: pipeline.iteration,
|
|
369
|
+
step,
|
|
370
|
+
started_at: stepStart,
|
|
371
|
+
completed_at: new Date().toISOString(),
|
|
372
|
+
success: true,
|
|
373
|
+
notes: `Contract accepted with ${parsed.criteria.length} criteria.`,
|
|
374
|
+
contractPayload: parsed, // Passthrough for runner to write to disk
|
|
375
|
+
};
|
|
376
|
+
}
|
|
377
|
+
if (step === 'EVALUATE') {
|
|
378
|
+
const { parsed, error: parseError } = parseEvaluationOutput(resultText);
|
|
379
|
+
if (parseError || !parsed) {
|
|
380
|
+
debugLog(`[DarkFactory] EVALUATE output parse failure: ${parseError}`);
|
|
381
|
+
return {
|
|
382
|
+
iteration: pipeline.iteration,
|
|
383
|
+
step,
|
|
384
|
+
started_at: stepStart,
|
|
385
|
+
completed_at: new Date().toISOString(),
|
|
386
|
+
success: false,
|
|
387
|
+
notes: parseError || 'Unknown parse error',
|
|
388
|
+
};
|
|
389
|
+
}
|
|
390
|
+
// Fix #2: Serialize findings array into notes so the Generator's retry
|
|
391
|
+
// prompt receives the full line-by-line critique, not just a summary string.
|
|
392
|
+
const findingsText = parsed.findings && parsed.findings.length > 0
|
|
393
|
+
? '\nFindings:\n' + parsed.findings.map((f) => `- [${f.severity}] Criterion ${f.criterion_id}: ${f.evidence?.description || 'Failed'} (${f.evidence?.file || 'unknown'}:${f.evidence?.line ?? '?'})`).join('\n')
|
|
394
|
+
: '';
|
|
395
|
+
return {
|
|
396
|
+
iteration: pipeline.iteration,
|
|
397
|
+
step,
|
|
398
|
+
started_at: stepStart,
|
|
399
|
+
completed_at: new Date().toISOString(),
|
|
400
|
+
success: parsed.pass,
|
|
401
|
+
notes: (parsed.notes || `Evaluation complete: ${parsed.pass ? 'PASS' : 'FAIL'}`) + findingsText,
|
|
402
|
+
evaluationPayload: parsed, // Passthrough for orchestrator logic
|
|
403
|
+
};
|
|
404
|
+
}
|
|
405
|
+
// EXECUTE
|
|
300
406
|
const { parsed, error: parseError } = parseExecuteOutput(resultText);
|
|
301
407
|
if (parseError || !parsed) {
|
|
302
408
|
debugLog(`[DarkFactory] EXECUTE output parse failure: ${parseError}`);
|
|
@@ -582,10 +688,57 @@ async function runnerTick() {
|
|
|
582
688
|
}
|
|
583
689
|
}
|
|
584
690
|
}
|
|
585
|
-
|
|
586
|
-
|
|
587
|
-
|
|
588
|
-
|
|
691
|
+
if (currentStep === 'PLAN_CONTRACT' && spec.workingDirectory && result.success && result.contractPayload) {
|
|
692
|
+
const contractPath = path.join(path.resolve(spec.workingDirectory), 'contract_rubric.json');
|
|
693
|
+
try {
|
|
694
|
+
fs.writeFileSync(contractPath, JSON.stringify(result.contractPayload, null, 2), 'utf8');
|
|
695
|
+
debugLog(`[DarkFactory] contract_rubric.json written to ${contractPath}`);
|
|
696
|
+
}
|
|
697
|
+
catch (writeErr) {
|
|
698
|
+
// Disk/permissions error — fail the pipeline immediately so it doesn't
|
|
699
|
+
// loop on PLAN_CONTRACT forever (each tick would re-attempt the write).
|
|
700
|
+
debugLog(`[DarkFactory] Failed to write contract_rubric.json: ${writeErr.message}`);
|
|
701
|
+
try {
|
|
702
|
+
await storage.savePipeline({
|
|
703
|
+
...pipeline,
|
|
704
|
+
status: 'FAILED',
|
|
705
|
+
error: `PLAN_CONTRACT failed: could not write contract_rubric.json — ${writeErr.message}`,
|
|
706
|
+
});
|
|
707
|
+
}
|
|
708
|
+
catch { /* status guard */ }
|
|
709
|
+
await emitExperienceEvent(pipeline, 'failure', `contract_rubric.json write failed: ${writeErr.message}`);
|
|
710
|
+
return;
|
|
711
|
+
}
|
|
712
|
+
}
|
|
713
|
+
if (currentStep === 'EVALUATE' && result.evaluationPayload) {
|
|
714
|
+
// Emit ML learning event for evaluation outcome.
|
|
715
|
+
// Using 'learning' (valid LedgerEntry event type) rather than
|
|
716
|
+
// a non-existent 'evaluation_result' to avoid runtime cast issues.
|
|
717
|
+
try {
|
|
718
|
+
await storage.saveLedger({
|
|
719
|
+
project: pipeline.project,
|
|
720
|
+
conversation_id: `dark-factory-${pipeline.id}`,
|
|
721
|
+
user_id: pipeline.user_id,
|
|
722
|
+
event_type: 'learning',
|
|
723
|
+
summary: `[EVALUATE] ${result.success ? 'PASS' : 'FAIL'} on iter ${pipeline.iteration} rev ${pipeline.eval_revisions ?? 0}`,
|
|
724
|
+
keywords: ['dark-factory', 'evaluation', pipeline.project],
|
|
725
|
+
importance: result.success ? 3 : 1,
|
|
726
|
+
confidence_score: result.success ? 90 : 50,
|
|
727
|
+
});
|
|
728
|
+
}
|
|
729
|
+
catch { /* advisory — never block execution */ }
|
|
730
|
+
}
|
|
731
|
+
// ─── Determine plan_viable from evaluation payload ───
|
|
732
|
+
// Default to false (conservative): a parse failure or missing payload means
|
|
733
|
+
// we don't know if the plan is viable, so escalate to PLAN re-planning
|
|
734
|
+
// rather than burning eval_revisions on more EXECUTE retries.
|
|
735
|
+
let evalPlanViable = false;
|
|
736
|
+
if (currentStep === 'EVALUATE' && result.evaluationPayload) {
|
|
737
|
+
// plan_viable defaults false if null/missing (same conservative principle)
|
|
738
|
+
evalPlanViable = result.evaluationPayload.plan_viable ?? false;
|
|
739
|
+
}
|
|
740
|
+
const nextStepInfo = SafetyController.getNextStep(pipeline, spec, result.success, evalPlanViable);
|
|
741
|
+
if (nextStepInfo === null || currentStep === 'FINALIZE') {
|
|
589
742
|
// Pipeline complete — determine final status
|
|
590
743
|
const finalStatus = result.success ? 'COMPLETED' : 'FAILED';
|
|
591
744
|
const finalError = result.success ? null : `Pipeline ended at step=${currentStep}: ${result.notes?.slice(0, 500)}`;
|
|
@@ -613,13 +766,25 @@ async function runnerTick() {
|
|
|
613
766
|
debugLog(`[DarkFactory] Pipeline ${pipeline.id} finished: ${finalStatus}`);
|
|
614
767
|
}
|
|
615
768
|
else {
|
|
616
|
-
// Advance to next step
|
|
617
769
|
try {
|
|
770
|
+
const updatedPayload = currentStep === 'PLAN_CONTRACT' && result.contractPayload
|
|
771
|
+
? result.contractPayload
|
|
772
|
+
: pipeline.contract_payload;
|
|
773
|
+
// Forward the most informative notes available:
|
|
774
|
+
// EXECUTE notes = what the generator did
|
|
775
|
+
// EVALUATE notes = what the evaluator found
|
|
776
|
+
// Other steps: preserve existing notes
|
|
777
|
+
const updatedNotes = (currentStep === 'EXECUTE' || currentStep === 'EVALUATE') && result.notes
|
|
778
|
+
? result.notes
|
|
779
|
+
: pipeline.notes;
|
|
618
780
|
await storage.savePipeline({
|
|
619
781
|
...pipeline,
|
|
620
|
-
current_step:
|
|
621
|
-
iteration:
|
|
782
|
+
current_step: nextStepInfo.step,
|
|
783
|
+
iteration: nextStepInfo.iteration,
|
|
784
|
+
eval_revisions: nextStepInfo.eval_revisions,
|
|
622
785
|
last_heartbeat: new Date().toISOString(),
|
|
786
|
+
contract_payload: updatedPayload,
|
|
787
|
+
notes: updatedNotes,
|
|
623
788
|
});
|
|
624
789
|
}
|
|
625
790
|
catch (err) {
|
|
@@ -630,7 +795,7 @@ async function runnerTick() {
|
|
|
630
795
|
}
|
|
631
796
|
throw err;
|
|
632
797
|
}
|
|
633
|
-
debugLog(`[DarkFactory] Pipeline ${pipeline.id} advanced: ${currentStep} → ${
|
|
798
|
+
debugLog(`[DarkFactory] Pipeline ${pipeline.id} advanced: ${currentStep} → ${nextStepInfo.step} (iter ${nextStepInfo.iteration}, rev ${nextStepInfo.eval_revisions ?? 0})`);
|
|
634
799
|
}
|
|
635
800
|
}
|
|
636
801
|
catch (err) {
|
|
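The `extractJsonFromLlmOutput` helper introduced in the runner hunks above tries three strategies in order: raw input that already starts with `{`, a markdown code fence, and finally a first-brace-to-last-brace slice. The sketch below is a condensed restatement for experimentation, not the shipped code. The name `extractJson` is hypothetical, and the body of the brace strategy is an assumption, since the diff elides the lines after `lastIndexOf`.

```javascript
// The fence marker is built with repeat() only so this example can itself sit in a fenced block.
const FENCE = '`'.repeat(3);
const FENCE_RE = new RegExp(FENCE + '(?:json)?\\s*\\n?([\\s\\S]*?)\\n?\\s*' + FENCE);

function extractJson(raw) {
  if (typeof raw !== 'string' || raw.trim() === '') return null;
  const cleaned = raw.trim();

  // Strategy 1: the model obeyed the prompt and returned a bare object.
  if (cleaned.startsWith('{')) return cleaned;

  // Strategy 2: the object is wrapped in a fenced block somewhere in the text.
  const fenced = cleaned.match(FENCE_RE);
  if (fenced) return fenced[1].trim();

  // Strategy 3 (assumed body): slice from the first "{" to the last "}".
  const first = cleaned.indexOf('{');
  const last = cleaned.lastIndexOf('}');
  return first !== -1 && last > first ? cleaned.slice(first, last + 1) : null;
}

const samples = [
  '{"actions": []}',
  'Here you go:\n' + FENCE + 'json\n{"pass": true}\n' + FENCE,
  'Sure! {"pass": false} Hope that helps.',
  'no json here',
];
for (const s of samples) console.log(extractJson(s));
```

Extraction only produces a candidate string. The step-specific parsers still run `JSON.parse` and the shape checks on it, so a sloppy slice cannot slip through as a valid payload.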
@@ -1,4 +1,4 @@
|
|
|
1
|
-
import { VALID_ACTION_TYPES } from './schema.js';
|
|
1
|
+
import { VALID_ACTION_TYPES, DEFAULT_MAX_REVISIONS } from './schema.js';
|
|
2
2
|
import { PRISM_DARK_FACTORY_MAX_RUNTIME_MS } from '../config.js';
|
|
3
3
|
import { debugLog } from '../utils/logger.js';
|
|
4
4
|
import path from 'path';
|
|
@@ -31,13 +31,6 @@ export class SafetyController {
|
|
|
31
31
|
'COMPLETED': [], // Terminal — no exits
|
|
32
32
|
'FAILED': ['RUNNING'], // Allow retry from failed state
|
|
33
33
|
};
|
|
34
|
-
/**
|
|
35
|
-
* Legal step transitions for the pipeline execution state machine.
|
|
36
|
-
* FINALIZE is entered from VERIFY when iteration == maxIterations or success.
|
|
37
|
-
*/
|
|
38
|
-
static STEP_ORDER = [
|
|
39
|
-
'INIT', 'PLAN', 'EXECUTE', 'VERIFY', 'FINALIZE'
|
|
40
|
-
];
|
|
41
34
|
/**
|
|
42
35
|
* Prevents runaway LLM invocation loops by enforcing the max iteration envelope.
|
|
43
36
|
*/
|
|
@@ -147,8 +140,15 @@ export class SafetyController {
|
|
|
147
140
|
* Used by clawInvocation.ts instead of inline prompt construction.
|
|
148
141
|
*/
|
|
149
142
|
static generateBoundaryPrompt(spec, state) {
|
|
143
|
+
let modeDescription = 'an autonomous code agent';
|
|
144
|
+
if (state.current_step === 'PLAN_CONTRACT' || state.current_step === 'EVALUATE') {
|
|
145
|
+
modeDescription = 'an ADVERSARIAL EVALUATOR enforcing strict quality constraints against a generated output';
|
|
146
|
+
}
|
|
147
|
+
else if (state.current_step === 'EXECUTE') {
|
|
148
|
+
modeDescription = 'a GENERATOR executing code constrained by a strict rubric';
|
|
149
|
+
}
|
|
150
150
|
const lines = [
|
|
151
|
-
`You are Prism Dark Factory, operating in the background as
|
|
151
|
+
`You are Prism Dark Factory, operating in the background as ${modeDescription}.`,
|
|
152
152
|
`You are strictly limited to code actions within the defined scope.`,
|
|
153
153
|
``,
|
|
154
154
|
`── Operational Boundaries ──`,
|
|
@@ -156,6 +156,7 @@ export class SafetyController {
|
|
|
156
156
|
`Project: ${state.project}`,
|
|
157
157
|
`Current Step: ${state.current_step}`,
|
|
158
158
|
`Iteration: ${state.iteration} / ${spec.maxIterations}`,
|
|
159
|
+
`Revision: ${state.eval_revisions ?? 0} / ${spec.maxRevisions ?? DEFAULT_MAX_REVISIONS}`,
|
|
159
160
|
`Restricted Workspace: ${spec.workingDirectory || '(unrestricted)'}`,
|
|
160
161
|
];
|
|
161
162
|
if (spec.contextFiles && spec.contextFiles.length > 0) {
|
|
@@ -164,29 +165,54 @@ export class SafetyController {
|
|
|
164
165
|
lines.push(``, `── Objective ──`, spec.objective, ``, `── Safety Rules ──`, `1. Do NOT modify files outside the Restricted Workspace.`, `2. Do NOT make network requests unless the objective explicitly requires it.`, `3. Do NOT execute destructive operations (rm -rf, DROP TABLE, etc.).`, `4. Respond ONLY with actions relevant to the current step.`, `5. If you cannot complete the step, explain why and stop.`);
|
|
165
166
|
return lines.join('\n');
|
|
166
167
|
}
|
|
167
|
-
|
|
168
|
-
|
|
169
|
-
|
|
170
|
-
|
|
171
|
-
static getNextStep(currentStep, iteration, spec, verifyPassed) {
|
|
168
|
+
static getNextStep(state, spec, stepPassed, planViable = true) {
|
|
169
|
+
const currentStep = state.current_step;
|
|
170
|
+
const iteration = state.iteration;
|
|
171
|
+
const eval_revisions = state.eval_revisions ?? 0;
|
|
172
172
|
switch (currentStep) {
|
|
173
173
|
case 'INIT':
|
|
174
|
-
return { step: 'PLAN', iteration };
|
|
174
|
+
return { step: 'PLAN', iteration, eval_revisions };
|
|
175
175
|
case 'PLAN':
|
|
176
|
-
return { step: '
|
|
176
|
+
return { step: 'PLAN_CONTRACT', iteration, eval_revisions };
|
|
177
|
+
case 'PLAN_CONTRACT':
|
|
178
|
+
return { step: 'EXECUTE', iteration, eval_revisions };
|
|
177
179
|
case 'EXECUTE':
|
|
178
|
-
return { step: '
|
|
180
|
+
return { step: 'EVALUATE', iteration, eval_revisions };
|
|
181
|
+
case 'EVALUATE':
|
|
182
|
+
if (stepPassed) {
|
|
183
|
+
// Contract passed, move to VERIFY
|
|
184
|
+
return { step: 'VERIFY', iteration, eval_revisions: 0 };
|
|
185
|
+
}
|
|
186
|
+
// Contract failed.
|
|
187
|
+
if (planViable) {
|
|
188
|
+
// Fall back to EXECUTE but increment revision counter
|
|
189
|
+
const nextRevision = eval_revisions + 1;
|
|
190
|
+
const maxRev = spec.maxRevisions ?? DEFAULT_MAX_REVISIONS;
|
|
191
|
+
if (nextRevision >= maxRev) {
|
|
192
|
+
// Revision budget exhausted (next revision would reach maxRev): pipeline fails
|
|
193
|
+
return null;
|
|
194
|
+
}
|
|
195
|
+
return { step: 'EXECUTE', iteration, eval_revisions: nextRevision };
|
|
196
|
+
}
|
|
197
|
+
else {
|
|
198
|
+
// Fall back all the way to PLAN
|
|
199
|
+
const nextIteration = iteration + 1;
|
|
200
|
+
if (!SafetyController.validateIterationLimit(nextIteration, spec)) {
|
|
201
|
+
return null;
|
|
202
|
+
}
|
|
203
|
+
return { step: 'PLAN', iteration: nextIteration, eval_revisions: 0 };
|
|
204
|
+
}
|
|
179
205
|
case 'VERIFY':
|
|
180
|
-
if (
|
|
181
|
-
return { step: 'FINALIZE', iteration };
|
|
206
|
+
if (stepPassed) {
|
|
207
|
+
return { step: 'FINALIZE', iteration, eval_revisions };
|
|
182
208
|
}
|
|
183
209
|
// Verification failed — loop back to PLAN with incremented iteration
|
|
184
|
-
const
|
|
185
|
-
if (!SafetyController.validateIterationLimit(
|
|
210
|
+
const nextIterationVerify = iteration + 1;
|
|
211
|
+
if (!SafetyController.validateIterationLimit(nextIterationVerify, spec)) {
|
|
186
212
|
// Exceeded max iterations — force finalize with failure
|
|
187
213
|
return null;
|
|
188
214
|
}
|
|
189
|
-
return { step: 'PLAN', iteration:
|
|
215
|
+
return { step: 'PLAN', iteration: nextIterationVerify, eval_revisions: 0 };
|
|
190
216
|
case 'FINALIZE':
|
|
191
217
|
return null; // Terminal step
|
|
192
218
|
default:
|
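To see how the revision budget and the `planViable` flag interact, the transitions in the `getNextStep` hunk above can be restated compactly and stepped through. This is a sketch with assumptions called out: `ASSUMED_DEFAULT_MAX_REVISIONS` is a placeholder (the real `DEFAULT_MAX_REVISIONS` value is in `schema.js` and not shown), the iteration-limit check stands in for `validateIterationLimit`, whose body is not part of this diff, and the `default` branch is assumed to return `null`.

```javascript
// Placeholder value; the real default is defined in schema.js.
const ASSUMED_DEFAULT_MAX_REVISIONS = 3;

function nextStep(state, spec, stepPassed, planViable = true) {
  const iteration = state.iteration;
  const rev = state.eval_revisions ?? 0;
  // Shared re-plan path. Assumption: the next iteration must not exceed spec.maxIterations.
  const replan = () => {
    const next = iteration + 1;
    return next > spec.maxIterations ? null : { step: 'PLAN', iteration: next, eval_revisions: 0 };
  };
  switch (state.current_step) {
    case 'INIT': return { step: 'PLAN', iteration, eval_revisions: rev };
    case 'PLAN': return { step: 'PLAN_CONTRACT', iteration, eval_revisions: rev };
    case 'PLAN_CONTRACT': return { step: 'EXECUTE', iteration, eval_revisions: rev };
    case 'EXECUTE': return { step: 'EVALUATE', iteration, eval_revisions: rev };
    case 'EVALUATE': {
      if (stepPassed) return { step: 'VERIFY', iteration, eval_revisions: 0 };
      if (!planViable) return replan();
      const nextRev = rev + 1;
      const maxRev = spec.maxRevisions ?? ASSUMED_DEFAULT_MAX_REVISIONS;
      return nextRev >= maxRev ? null : { step: 'EXECUTE', iteration, eval_revisions: nextRev };
    }
    case 'VERIFY':
      return stepPassed ? { step: 'FINALIZE', iteration, eval_revisions: rev } : replan();
    default:
      return null; // FINALIZE and unknown steps end the pipeline (assumed)
  }
}

// Walk a pipeline whose evaluator keeps failing while the plan stays viable.
const spec = { maxIterations: 3, maxRevisions: 3 };
const visited = [];
let cursor = { current_step: 'EVALUATE', iteration: 1, eval_revisions: 0 };
while (cursor !== null) {
  visited.push(`${cursor.current_step}@rev${cursor.eval_revisions}`);
  const passed = cursor.current_step !== 'EVALUATE'; // EXECUTE completes, EVALUATE fails
  const next = nextStep(cursor, spec, passed, true);
  cursor = next && { current_step: next.step, iteration: next.iteration, eval_revisions: next.eval_revisions };
}
console.log(visited.join(' -> '));
```

With `maxRevisions: 3`, the walk stops once the next revision would reach 3, so the generator gets revisions 1 and 2 after the initial attempt. Passing `planViable = false` instead routes the same failure back to `PLAN` with a fresh revision counter.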
package/dist/server.js
CHANGED
|
@@ -1184,6 +1184,25 @@ export async function startServer() {
|
|
|
1184
1184
|
console.error(`[DarkFactory] Startup failed (non-fatal): ${err instanceof Error ? err.message : String(err)}`);
|
|
1185
1185
|
});
|
|
1186
1186
|
}
|
|
1187
|
+
// ─── v7.4: TurboQuant Compressor Async Warmup ────────────
|
|
1188
|
+
// The first call to getDefaultCompressor() triggers construction of a
|
|
1189
|
+
// 768×768 rotation matrix (~15ms of synchronous CPU). Pre-warm via
|
|
1190
|
+
// setImmediate so it runs after the current event-loop tick completes,
|
|
1191
|
+
// preventing the stdio handshake from being blocked during startup.
|
|
1192
|
+
// Fire-and-forget — non-critical; subsequent calls hit the singleton cache.
|
|
1193
|
+
setImmediate(() => {
|
|
1194
|
+
try {
|
|
1195
|
+
// Dynamic import avoids loading turboquant.ts at module-parse time
|
|
1196
|
+
// (the construction side-effects run only when actually needed).
|
|
1197
|
+
import("./utils/turboquant.js").then(({ getDefaultCompressor }) => {
|
|
1198
|
+
getDefaultCompressor();
|
|
1199
|
+
console.error("[Prism] TurboQuant compressor pre-warmed (rotation matrix ready)");
|
|
1200
|
+
}).catch(err => {
|
|
1201
|
+
console.error(`[TurboQuant] Warmup failed (non-fatal): ${err instanceof Error ? err.message : String(err)}`);
|
|
1202
|
+
});
|
|
1203
|
+
}
|
|
1204
|
+
catch { /* warmup is a best-effort optimization */ }
|
|
1205
|
+
});
|
|
1187
1206
|
// Keep the process alive — without this, Node.js would exit
|
|
1188
1207
|
// because there are no active event loop handles after the
|
|
1189
1208
|
// synchronous setup completes.
|
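The warmup hunk above relies on two ideas: a lazily built singleton and `setImmediate` to push the expensive first construction past the current tick. A minimal standalone sketch of that pattern follows. `getCompressorSketch` and the identity matrix are stand-ins invented here; the real `getDefaultCompressor()` builds a rotation matrix whose construction is not shown in this diff.

```javascript
let cached = null;

// Stand-in for an expensive synchronous constructor behind a singleton cache.
function getCompressorSketch(dim = 768) {
  if (cached === null) {
    // Placeholder work: a dim x dim identity matrix instead of a real rotation matrix.
    const matrix = new Float32Array(dim * dim);
    for (let i = 0; i < dim; i++) matrix[i * dim + i] = 1;
    cached = { dim, matrix };
  }
  return cached;
}

const order = [];
order.push('startup: synchronous setup');

// Fire-and-forget warmup: scheduled now, executed after the current synchronous code finishes.
setImmediate(() => {
  try {
    getCompressorSketch();
    order.push('warmup: singleton built');
  } catch {
    // Warmup is best-effort; a failure here must not affect startup.
  }
});

order.push('startup: handshake can proceed');
```

By the time the callback runs, both startup entries are already in `order`, which is the property that keeps the handshake from waiting on the matrix build. Later callers simply receive the cached object.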
package/dist/storage/sqlite.js
CHANGED
|
@@ -566,13 +566,37 @@ export class SqliteStorage {
|
|
|
566
566
|
status TEXT NOT NULL,
|
|
567
567
|
current_step TEXT NOT NULL,
|
|
568
568
|
iteration INTEGER NOT NULL,
|
|
569
|
+
eval_revisions INTEGER DEFAULT 0,
|
|
569
570
|
started_at TEXT NOT NULL,
|
|
570
571
|
updated_at TEXT NOT NULL,
|
|
571
572
|
spec TEXT NOT NULL,
|
|
572
573
|
error TEXT,
|
|
573
|
-
last_heartbeat TEXT
|
|
574
|
+
last_heartbeat TEXT,
|
|
575
|
+
contract_payload TEXT,
|
|
576
|
+
notes TEXT
|
|
574
577
|
)
|
|
575
578
|
`);
|
|
579
|
+
// ─── v7.4.0 Migration: Adversarial Eval Revisions ─────────
|
|
580
|
+
try {
|
|
581
|
+
await this.db.execute(`ALTER TABLE dark_factory_pipelines ADD COLUMN eval_revisions INTEGER DEFAULT 0`);
|
|
582
|
+
debugLog("[SqliteStorage] v7.4.0 migration: added eval_revisions column");
|
|
583
|
+
// Backfill existing rows — ALTER TABLE DEFAULT only applies to new inserts;
|
|
584
|
+
// rows that existed before the migration will have NULL until explicitly set.
|
|
585
|
+
await this.db.execute(`UPDATE dark_factory_pipelines SET eval_revisions = 0 WHERE eval_revisions IS NULL`);
|
|
586
|
+
debugLog("[SqliteStorage] v7.4.0 migration: backfilled eval_revisions = 0");
|
|
587
|
+
}
|
|
588
|
+
catch (e) {
|
|
589
|
+
if (!e.message?.includes("duplicate column name"))
|
|
590
|
+
throw e;
|
|
591
|
+
}
|
|
592
|
+
try {
|
|
593
|
+
await this.db.execute(`ALTER TABLE dark_factory_pipelines ADD COLUMN contract_payload TEXT`);
|
|
594
|
+
await this.db.execute(`ALTER TABLE dark_factory_pipelines ADD COLUMN notes TEXT`);
|
|
595
|
+
}
|
|
596
|
+
catch (e) {
|
|
597
|
+
if (!e.message?.includes("duplicate column name"))
|
|
598
|
+
throw e;
|
|
599
|
+
}
|
|
576
600
|
await this.db.execute(`CREATE INDEX IF NOT EXISTS idx_pipelines_status ON dark_factory_pipelines(user_id, project, status)`);
|
|
577
601
|
// ─── v7.2.0 Migration: Verification Harness ────────────────
|
|
578
602
|
await this.db.execute(`
|
|
@@ -2888,16 +2912,19 @@ export class SqliteStorage {
|
|
|
2888
2912
|
}
|
|
2889
2913
|
await this.db.execute({
|
|
2890
2914
|
sql: `
|
|
2891
|
-
INSERT INTO dark_factory_pipelines (id, project, user_id, status, current_step, iteration, started_at, updated_at, spec, error, last_heartbeat)
|
|
2892
|
-
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|
2915
|
+
INSERT INTO dark_factory_pipelines (id, project, user_id, status, current_step, iteration, eval_revisions, started_at, updated_at, spec, error, last_heartbeat, contract_payload, notes)
|
|
2916
|
+
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
|
2893
2917
|
ON CONFLICT(id) DO UPDATE SET
|
|
2894
2918
|
status = excluded.status,
|
|
2895
2919
|
current_step = excluded.current_step,
|
|
2896
2920
|
iteration = excluded.iteration,
|
|
2921
|
+
eval_revisions = excluded.eval_revisions,
|
|
2897
2922
|
updated_at = excluded.updated_at,
|
|
2898
2923
|
spec = excluded.spec,
|
|
2899
2924
|
error = excluded.error,
|
|
2900
|
-
last_heartbeat = excluded.last_heartbeat
|
|
2925
|
+
last_heartbeat = excluded.last_heartbeat,
|
|
2926
|
+
contract_payload = excluded.contract_payload,
|
|
2927
|
+
notes = excluded.notes
|
|
2901
2928
|
`,
|
|
2902
2929
|
args: [
|
|
2903
2930
|
updatedState.id,
|
|
@@ -2906,11 +2933,14 @@ export class SqliteStorage {
|
|
|
2906
2933
|
updatedState.status,
|
|
2907
2934
|
updatedState.current_step,
|
|
2908
2935
|
updatedState.iteration,
|
|
2936
|
+
updatedState.eval_revisions ?? 0,
|
|
2909
2937
|
updatedState.started_at,
|
|
2910
2938
|
updatedState.updated_at,
|
|
2911
2939
|
updatedState.spec,
|
|
2912
2940
|
updatedState.error || null,
|
|
2913
|
-
updatedState.last_heartbeat || null
|
|
2941
|
+
updatedState.last_heartbeat || null,
|
|
2942
|
+
updatedState.contract_payload ? JSON.stringify(updatedState.contract_payload) : null,
|
|
2943
|
+
updatedState.notes || null
|
|
2914
2944
|
]
|
|
2915
2945
|
});
|
|
2916
2946
|
}
|
|
@@ -2921,7 +2951,11 @@ export class SqliteStorage {
|
|
|
2921
2951
|
});
|
|
2922
2952
|
if (result.rows.length === 0)
|
|
2923
2953
|
return null;
|
|
2924
|
-
|
|
2954
|
+
const row = result.rows[0];
|
|
2955
|
+
return {
|
|
2956
|
+
...row,
|
|
2957
|
+
contract_payload: row.contract_payload ? JSON.parse(row.contract_payload) : undefined
|
|
2958
|
+
};
|
|
2925
2959
|
}
|
|
2926
2960
|
async listPipelines(project, status, userId) {
|
|
2927
2961
|
const conditions = ['user_id = ?'];
|
|
@@ -2937,7 +2971,10 @@ export class SqliteStorage {
|
|
|
2937
2971
|
const where = conditions.length > 0 ? `WHERE ${conditions.join(' AND ')}` : '';
|
|
2938
2972
|
const sql = `SELECT * FROM dark_factory_pipelines ${where} ORDER BY updated_at DESC`;
|
|
2939
2973
|
const result = await this.db.execute({ sql, args });
|
|
2940
|
-
return result.rows
|
|
2974
|
+
return result.rows.map((row) => ({
|
|
2975
|
+
...row,
|
|
2976
|
+
contract_payload: row.contract_payload ? JSON.parse(row.contract_payload) : undefined
|
|
2977
|
+
}));
|
|
2941
2978
|
}
|
|
2942
2979
|
// ─── Verification Harness (v7.2.0) ───────────────────────────
|
|
2943
2980
|
async saveVerificationHarness(harness, userId) {
|
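The SQLite hunks above combine two ideas: an idempotent `ADD COLUMN` migration that tolerates "duplicate column name" errors, and a JSON round trip for `contract_payload` stored as TEXT. The sketch below illustrates both under stated assumptions; `addColumnIfMissing`, `toRow`, and `fromRow` are hypothetical helpers, not functions from the package. It issues one `ALTER TABLE` per call, since in the diff the second `try` block groups two statements and a duplicate-column error on `contract_payload` would appear to skip the `notes` statement. Separately, SQLite's documentation indicates that a column added with a `DEFAULT` reports that default for pre-existing rows, so the backfill `UPDATE` in the diff is likely a harmless safeguard rather than a requirement.

```javascript
// Hypothetical helper: add one column per call so an already-present column
// cannot prevent a later column from being added.
async function addColumnIfMissing(db, table, columnDef) {
  try {
    await db.execute(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
    return true;
  } catch (e) {
    if (!e.message?.includes("duplicate column name")) throw e;
    return false; // column already present: the migration is a no-op
  }
}

// Serialize the structured field to TEXT on write, applying the same
// fallbacks the diff uses for the new columns.
function toRow(state) {
  return {
    ...state,
    eval_revisions: state.eval_revisions ?? 0,
    contract_payload: state.contract_payload
      ? JSON.stringify(state.contract_payload)
      : null,
    notes: state.notes || null,
  };
}

// Parse the TEXT column back on read, mapping NULL to undefined.
function fromRow(row) {
  return {
    ...row,
    contract_payload: row.contract_payload
      ? JSON.parse(row.contract_payload)
      : undefined,
  };
}
```

Any object exposing an async `execute(sql)` method can be passed as `db` here; the error-message check mirrors the one in the diff and would need adjusting for a driver that words the error differently.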
package/dist/storage/supabase.js
CHANGED
|
@@ -1222,7 +1222,13 @@ export class SupabaseStorage {
|
|
|
1222
1222
|
updated_at: updatedState.updated_at,
|
|
1223
1223
|
spec: updatedState.spec,
|
|
1224
1224
|
error: updatedState.error || null,
|
|
1225
|
-
last_heartbeat: updatedState.last_heartbeat || null
|
|
1225
|
+
last_heartbeat: updatedState.last_heartbeat || null,
|
|
1226
|
+
// ─── v7.4: Adversarial Evaluation fields ───
|
|
1227
|
+
eval_revisions: updatedState.eval_revisions ?? 0,
|
|
1228
|
+
contract_payload: updatedState.contract_payload
|
|
1229
|
+
? JSON.stringify(updatedState.contract_payload)
|
|
1230
|
+
: null,
|
|
1231
|
+
notes: updatedState.notes || null,
|
|
1226
1232
|
}, { on_conflict: "id" }, { Prefer: "return=minimal,resolution=merge-duplicates" });
|
|
1227
1233
|
}
|
|
1228
1234
|
catch (e) {
|
|
@@ -1244,7 +1250,15 @@ export class SupabaseStorage {
|
|
|
1244
1250
|
const rows = Array.isArray(result) ? result : [];
|
|
1245
1251
|
if (rows.length === 0)
|
|
1246
1252
|
return null;
|
|
1247
|
-
|
|
1253
|
+
const row = rows[0];
|
|
1254
|
+
// ─── v7.4: Deserialize contract_payload from JSON TEXT ───
|
|
1255
|
+
if (row.contract_payload && typeof row.contract_payload === "string") {
|
|
1256
|
+
try {
|
|
1257
|
+
row.contract_payload = JSON.parse(row.contract_payload);
|
|
1258
|
+
}
|
|
1259
|
+
catch { /* leave as-is */ }
|
|
1260
|
+
}
|
|
1261
|
+
return row;
|
|
1248
1262
|
}
|
|
1249
1263
|
catch (e) {
|
|
1250
1264
|
if (e.message?.includes("PGRST202") || e.message?.includes("Could not find the relation"))
|
|
@@ -1263,7 +1277,17 @@ export class SupabaseStorage {
|
|
|
1263
1277
|
if (status)
|
|
1264
1278
|
query.status = `eq.${status}`;
|
|
1265
1279
|
const result = await supabaseGet("dark_factory_pipelines", query);
|
|
1266
|
-
|
|
1280
|
+
const rows = (Array.isArray(result) ? result : []);
|
|
1281
|
+
// ─── v7.4: Deserialize contract_payload from JSON TEXT ───
|
|
1282
|
+
return rows.map(row => {
|
|
1283
|
+
if (row.contract_payload && typeof row.contract_payload === "string") {
|
|
1284
|
+
try {
|
|
1285
|
+
row.contract_payload = JSON.parse(row.contract_payload);
|
|
1286
|
+
}
|
|
1287
|
+
catch { /* leave as-is */ }
|
|
1288
|
+
}
|
|
1289
|
+
return row;
|
|
1290
|
+
});
|
|
1267
1291
|
}
|
|
1268
1292
|
catch (e) {
|
|
1269
1293
|
if (e.message?.includes("PGRST202") || e.message?.includes("Could not find the relation"))
|
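The Supabase read paths above repeat the same tolerant deserialization in both the single-row and list queries. A minimal sketch of that logic factored into one place is shown here; `parseContractPayload` and `parseRows` are hypothetical names, and unlike the diff (which mutates the row in place) this version returns a copy.

```javascript
// Hypothetical helper: returns a copy of the row with contract_payload parsed
// when it is a non-empty JSON string; objects, nulls, and malformed strings
// pass through unchanged.
function parseContractPayload(row) {
  if (typeof row.contract_payload !== "string" || row.contract_payload === "") {
    return row;
  }
  try {
    return { ...row, contract_payload: JSON.parse(row.contract_payload) };
  } catch {
    return row; // leave malformed payloads as-is
  }
}

// Normalize a query result that may not be an array, then parse each row.
function parseRows(result) {
  const rows = Array.isArray(result) ? result : [];
  return rows.map(parseContractPayload);
}
```

The `typeof` guard matters if the backend returns the column as an already-decoded object rather than TEXT, in which case the value is passed through untouched.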
package/package.json
CHANGED
|
@@ -1,8 +1,8 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "prism-mcp-server",
|
|
3
|
-
"version": "7.
|
|
3
|
+
"version": "7.4.0",
|
|
4
4
|
"mcpName": "io.github.dcostenco/prism-mcp",
|
|
5
|
-
"description": "The Mind Palace for AI Agents — fail-closed Dark Factory autonomous pipelines (3-gate parse→type→scope
|
|
5
|
+
"description": "The Mind Palace for AI Agents — adversarial evaluation (PLAN_CONTRACT→EVALUATE anti-sycophancy), fail-closed Dark Factory autonomous pipelines (3-gate parse→type→scope), persistent memory (SQLite/Supabase), ACT-R cognitive retrieval, behavioral learning & IDE rules sync, multi-agent Hivemind, time travel, visual dashboard. Zero-config local mode.",
|
|
6
6
|
"module": "index.ts",
|
|
7
7
|
"type": "module",
|
|
8
8
|
"main": "dist/server.js",
|