prism-mcp-server 7.0.1 → 7.3.1

package/README.md CHANGED
@@ -28,6 +28,8 @@ Works with **Claude Desktop · Claude Code · Cursor · Windsurf · Cline · Gem
28
28
  - [What Makes Prism Different](#-what-makes-prism-different)
29
29
  - [Use Cases](#-use-cases)
30
30
  - [What's New](#-whats-new)
31
+ - [v7.3.1 Dark Factory (Fail-Closed Execution)](#v731--dark-factory-fail-closed-execution-)
32
+ - [v7.2.0 The "Executive Function" Update (Planned)](#v720--the-executive-function-update-)
31
33
  - [How Prism Compares](#-how-prism-compares)
32
34
  - [Tool Reference](#-tool-reference)
33
35
  - [Environment Variables](#environment-variables)
@@ -385,6 +387,9 @@ Powered by a pure TypeScript port of Google's TurboQuant (inspired by Google's I
385
387
  ### 🐝 Multi-Agent Hivemind
386
388
  Multiple agents (dev, QA, PM) can work on the same project with **role-isolated memory**. Agents discover each other automatically, share context in real-time via Telepathy sync, and see a team roster during context loading. → [Multi-agent setup example](examples/multi-agent-hivemind/)
387
389
 
390
+ ### 🚦 Task Router
391
+ Prism can score coding tasks and recommend whether to keep execution on the host model or delegate to the local Claw agent. This speeds up small, local-safe edits while keeping non-delegable or higher-complexity work on the host. In client startup/skill flows, use defensive delegation: route only coding tasks, call `session_task_route` only when it is available, delegate to `claw` only when executor tooling exists and the task is non-destructive, and fall back to the host when the router or executor is unavailable. → [Task router real-life example](examples/router_real_life_test.ts)
392
+
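The defensive delegation rules in the Task Router paragraph above can be sketched as a small guard function. This is an illustration only: `chooseExecutor`, the `task` fields (`isCoding`, `isDestructive`), and the `route.target` field are hypothetical names, not Prism's API. Only the tool names `session_task_route` and `claw_run_task` come from the README.

```javascript
// Hypothetical sketch of the defensive delegation rules; not Prism's client code.
// `availableTools` is the set of tool names the MCP client reports.
// `route` is an assumed, simplified result of calling `session_task_route`.
function chooseExecutor(task, availableTools, route) {
    // Rule 1: only coding tasks are ever routed.
    if (!task.isCoding) return "host";
    // Rule 2: the router must be available and must have answered.
    if (!availableTools.has("session_task_route") || !route) return "host";
    // Rule 3: delegation needs executor tooling and a non-destructive task.
    const executorReady = availableTools.has("claw_run_task");
    if (route.target === "claw" && executorReady && !task.isDestructive) {
        return "claw";
    }
    // Rule 4: everything else falls back to the host.
    return "host";
}

const tools = new Set(["session_task_route", "claw_run_task"]);
console.log(chooseExecutor({ isCoding: true, isDestructive: false }, tools, { target: "claw" })); // claw
console.log(chooseExecutor({ isCoding: true, isDestructive: true }, tools, { target: "claw" })); // host
```

Every branch that is not an explicit "yes" returns `"host"`, so a missing router, a missing executor, or an unexpected route value never results in delegation.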
388
393
  ### 🖼️ Visual Memory
389
394
  Save UI screenshots, architecture diagrams, and bug states to a searchable vault. Images are auto-captioned by a VLM (Claude Vision / GPT-4V / Gemini) and become semantically searchable across sessions.
390
395
 
@@ -407,7 +412,7 @@ Soft/hard delete (Art. 17), full export in JSON, Markdown, or Obsidian vault `.z
407
412
 
408
413
  **Consulting / multi-project** — Switch between client projects with progressive loading: `quick` (~50 tokens), `standard` (~200), or `deep` (~1000+).
409
414
 
410
- **Visual debugging** — Save UI screenshots to searchable memory. Find that CSS bug from last week by description.
415
+ **Complex refactoring (v7.2 planned)** — Prism’s roadmap adds plan-first execution for multi-step changes with persistent plan-state tracking across sessions.
411
416
 
412
417
  **Team onboarding** — New team member's agent loads the full project history instantly.
413
418
 
@@ -436,8 +441,100 @@ Then continue a specific thread with a follow-up message to the selected agent,
436
441
 
437
442
  ## 🆕 What's New
438
443
 
444
+ ### v7.3.1 — Dark Factory (Fail-Closed Execution) 🏭
445
+ > **Current stable release.** Hardened autonomous pipeline execution with a structured JSON action contract.
446
+
447
+ When an AI agent executes code autonomously — no human watching, no approval step — a single hallucinated file path can write outside your project, corrupt sibling repos, or hit system files. This is the "dark factory" problem: **lights-out execution demands machine-enforced safety, not LLM good behavior.**
448
+
449
+ > *"I started building testing harnesses with programmatic checks in the planning phase across 3 layers. I got this idea when I was doing a complex ETL process across 3 databases and I needed to stack 9's on data accuracy, but also across the agent layer. After a considerable amount of hair pulling, I started to front load. It's now part of my lifecycle harness that my dark factory uses by default."*
450
+ > — [Stephen Driggs](https://linkedin.com/in/stephendriggs), VP Product AI at Shift4
451
+
452
+ Prism v7.3.1 implements exactly this: a **3-gate fail-closed pipeline** where every `EXECUTE` step must pass parse, type, and scope validation before any filesystem side effect occurs.
453
+
454
+ - 🔒 **Structured Action Contract** — `EXECUTE` steps must return machine-parseable JSON conforming to `{ actions: [{ type, targetPath, content? }] }`. Free-form text is rejected at the gate.
455
+ - 🛡️ **3-Strategy Defensive Parser** — Raw JSON → fenced code block extraction → brace extraction. Handles adversarial LLM output (preamble text, markdown fences, trailing commentary) without ever executing malformed payloads.
456
+ - ✅ **Type Validation** — Only `READ_FILE | WRITE_FILE | PATCH_FILE | RUN_TEST` are permitted. Novel action types invented by the LLM are rejected.
457
+ - 📏 **Scope Validation** — Every `targetPath` is resolved against the pipeline's `workingDirectory` via `SafetyController.validateActionsInScope()`. Path traversal (`../`), sibling-prefix bypasses, and absolute paths outside the boundary are blocked.
458
+ - 🚫 **Pipeline-Level Termination** — A scope violation doesn't just fail the step — it **terminates the entire pipeline** with `status: FAILED` and emits a `failure` experience event for the ML routing layer.
459
+
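The three parsing strategies listed above (raw JSON, then fenced code block extraction, then brace extraction) could look roughly like the sketch below. It is not Prism's parser: `parseActionPayload` and `tryParse` are hypothetical names, and the acceptance rule (an object with an `actions` array) is an assumption based on the contract shape quoted in the list.

```javascript
// Hypothetical sketch of a three-strategy parser; names are illustrative.
const FENCE = "`".repeat(3); // markdown fence marker, built at runtime

function tryParse(text) {
    try {
        return JSON.parse(text);
    } catch {
        return undefined;
    }
}

function parseActionPayload(raw) {
    if (typeof raw !== "string" || raw.trim() === "") return null;

    // Strategy 1: the whole response is raw JSON.
    const candidates = [raw.trim()];

    // Strategy 2: JSON wrapped in a markdown fence, optionally tagged "json".
    const fenced = raw.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
    if (fenced) candidates.push(fenced[1].trim());

    // Strategy 3: everything between the first "{" and the last "}".
    const start = raw.indexOf("{");
    const end = raw.lastIndexOf("}");
    if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

    for (const candidate of candidates) {
        const parsed = tryParse(candidate);
        // Accept only an object that carries an `actions` array.
        if (parsed && typeof parsed === "object" && Array.isArray(parsed.actions)) {
            return parsed;
        }
    }
    return null; // fail closed: the caller executes nothing
}

console.log(parseActionPayload('Here you go: {"actions": []} Hope that helps!')); // { actions: [] }
console.log(parseActionPayload("I could not produce JSON, sorry.")); // null
```

Returning `null` rather than throwing keeps the decision with the caller, which can then fail the step without touching the filesystem.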
460
+ <details>
461
+ <summary><strong>🔬 The 3-Gate Architecture: How a Path Traversal Attack Fails</strong></summary>
462
+
463
+ **Scenario:** An LLM running autonomously in a Dark Factory pipeline targeting `/home/user/my-app` produces this output for an EXECUTE step:
464
+
465
+ ```json
466
+ {
467
+ "actions": [
468
+ { "type": "WRITE_FILE", "targetPath": "src/utils.ts", "content": "// valid" },
469
+ { "type": "WRITE_FILE", "targetPath": "../../.ssh/authorized_keys", "content": "ssh-rsa ATTACK..." }
470
+ ]
471
+ }
472
+ ```
473
+
474
+ **Gate 1 — Parse:** ✅ Valid JSON. The 3-strategy parser extracts it cleanly.
475
+
476
+ **Gate 2 — Type:** ✅ Both actions use `WRITE_FILE`, a valid `ActionType`.
477
+
478
+ **Gate 3 — Scope:** 🚫 **BLOCKED.** `SafetyController.validateActionsInScope()` resolves `../../.ssh/authorized_keys` against `/home/user/my-app` to `/home/.ssh/authorized_keys`, which is **outside** the working directory. The violation is detected *before* any write occurs.
479
+
480
+ **Result:** The entire pipeline is terminated immediately. No files are written — not even the valid `src/utils.ts`. A `failure` experience event is emitted so the ML router can learn to avoid this pattern.
481
+
482
+ ```
483
+ Pipeline FAILED: Scope violation — action[1] targetPath
484
+ "../../.ssh/authorized_keys" resolves outside workingDirectory
485
+ ```
486
+
487
+ **Without v7.3.1:** The LLM's raw text output would be interpreted as instructions, and the agent runtime would attempt the write — potentially succeeding depending on filesystem permissions.
488
+
489
+ **With v7.3.1:** The structured contract makes this class of attack impossible. The LLM never touches the filesystem directly; every action is validated through the 3-gate pipeline first.
490
+
491
+ </details>
492
+
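Gates 2 and 3 from the scenario above can be approximated with the sketch below. It is not `SafetyController.validateActionsInScope()`: `findViolation` and `resolvePosix` are hypothetical helpers, the path handling is POSIX-only and written with plain string operations so the logic is visible, and real code would more likely build on Node's `path` module.

```javascript
// Hypothetical sketch of type and scope checks; POSIX-style paths only.
const ALLOWED_TYPES = new Set(["READ_FILE", "WRITE_FILE", "PATCH_FILE", "RUN_TEST"]);

// Resolve `target` against `base`, collapsing "." and ".." segments.
function resolvePosix(base, target) {
    const joined = target.startsWith("/") ? target : base + "/" + target;
    const out = [];
    for (const segment of joined.split("/")) {
        if (segment === "" || segment === ".") continue;
        if (segment === "..") out.pop();
        else out.push(segment);
    }
    return "/" + out.join("/");
}

// Returns null when every action passes, otherwise a description of the first violation.
function findViolation(actions, workingDirectory) {
    const root = resolvePosix("/", workingDirectory);
    for (let i = 0; i < actions.length; i++) {
        const action = actions[i];
        if (action === null || typeof action !== "object") return `action[${i}]: not an object`;
        if (!ALLOWED_TYPES.has(action.type)) return `action[${i}]: type is not allowed`;
        const target = action.targetPath;
        if (typeof target !== "string" || target === "" || target.includes("\0")) {
            return `action[${i}]: invalid targetPath`;
        }
        const resolved = resolvePosix(root, target);
        // Compare against root + "/" so a sibling such as "/home/user/my-app-evil" is not accepted.
        if (resolved !== root && !resolved.startsWith(root + "/")) {
            return `action[${i}]: targetPath resolves outside workingDirectory`;
        }
    }
    return null;
}

const actions = [
    { type: "WRITE_FILE", targetPath: "src/utils.ts", content: "// valid" },
    { type: "WRITE_FILE", targetPath: "../../.ssh/authorized_keys", content: "ssh-rsa ATTACK..." },
];
console.log(findViolation(actions, "/home/user/my-app"));
// action[1]: targetPath resolves outside workingDirectory
```

The whole batch is checked before anything is applied, which is what lets a caller reject the valid first action together with the malicious second one.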
493
+ <details>
494
+ <summary><strong>🧪 Edge Cases Covered (67 tests)</strong></summary>
495
+
496
+ | Category | Examples |
497
+ |----------|----------|
498
+ | **Parse adversarial output** | Prose preamble + JSON, nested fences, empty input, non-string input |
499
+ | **Type coercion** | `"DELETE_FILE"`, `"EXEC_CMD"`, numeric types, null types |
500
+ | **Path traversal** | `../`, `../../`, `/etc/passwd`, null bytes, unicode normalization, embedded newlines |
501
+ | **Shape validation** | Missing `actions` array, non-object actions, empty `targetPath`, root-type coercion |
502
+ | **Stress payloads** | 100-action arrays, 100KB content strings, 500-segment deep paths |
503
+
504
+ </details>
505
+
506
+ ### v7.2.0 — The "Executive Function" Update 🔭
507
+ > **Planned roadmap release.** Extends Prism from persistent memory toward autonomous plan execution.
508
+
509
+ - 🗺️ **Autonomous Plan Decomposition (planned)** — Proposed `session_plan_decompose` tool to transform ambiguous multi-step goals into a structured task DAG.
510
+ - 🔄 **Self-Healing Execution Loop (planned)** — Proposed plan-state engine to capture failed steps, suggest corrective actions, and re-queue recoverable sub-tasks before escalation.
511
+ - 📉 **DAG Plan Visualizer (planned)** — Proposed dashboard Plan/Goal Monitor to render step progress, dependency state, and execution pivots in real time.
512
+ - 🧠 **Context-Aware Goal Tracking (planned)** — Proposed active-plan injection during context loading so agents track not only prior work but current plan position.
513
+ - ⚙️ **Recursive Tool Chaining (planned)** — Proposed middleware path for lower-latency plan-step updates across complex workflows.
514
+ - 🧪 **Plan Integrity Tests (planned)** — Proposed suite validating plan-state persistence across interruptions and session handoffs.
515
+
516
+ <details>
517
+ <summary><strong>🔬 Concept Example: Before vs. After v7.2</strong></summary>
518
+
519
+ **Scenario:** "Refactor the Auth module and update the unit tests."
520
+
521
+ **Before (linear prompting):** The agent executes in sequence but can lose place after errors unless the host prompt restates state.
522
+
523
+ **After (executive planning):** Agent decomposes to a DAG, executes per-step, recovers from failures via plan-state retries, and resumes from the correct dependency node.
524
+
525
+ </details>
526
+
527
+ ### v7.1.0 — Prism Task Router (Heuristic + ML Experience) ✅
528
+ > **Previous stable release.** Multi-agent task routing with dynamic local vs. host model delegation.
529
+
530
+ - 🚦 **Heuristic Routing Engine** — The deterministic `session_task_route` tool routes each task to either the host cloud model or the local agent (Claw) based on task description, file count, and scope, scored over 5 core signals.
531
+ - 🤖 **Experience-Based ML Routing** — A cold-start-protected ML layer uses the historical performance (Win Rate) extracted by the `routerExperience` system to apply dynamic confidence boosts or penalties to the routing score.
532
+ - 🧪 **Live Testing Samples** — Demo script added in [`examples/router_real_life_test.ts`](examples/router_real_life_test.ts) for deterministic `computeRoute()` scenarios (simple vs. complex tasks), with a note that experience-adjusted routing is applied in the `session_task_route` handler path.
533
+ - 🖥️ **Dashboard Integration** — Added visual monitor and configuration toggles directly in `src/dashboard/ui.ts` under Node Editor settings.
534
+ - 🧩 **Tool Discoverability** — Fully integrates `session_task_route` into the external registry.
535
+
439
536
  ### v7.0.0 — ACT-R Activation Memory ✅
440
- > **Current stable release.** Memory retrieval now uses a scientifically-grounded cognitive model.
537
+ > **Previous stable release.** Memory retrieval now uses a scientifically-grounded cognitive model.
441
538
 
442
539
  - 🧠 **ACT-R Base-Level Activation** — `B_i = ln(Σ t_j^(-d))` computes recency × frequency activation per memory. Recent, frequently-accessed memories surface first; cold memories fade to near-zero. Based on Anderson's *Adaptive Control of Thought—Rational* (ACM, 2025).
443
540
  - 🔗 **Candidate-Scoped Spreading Activation** — `S_i = Σ(W × strength)` for links within the current search result set only. Prevents "God node" centrality from dominating rankings (Rule #5).
@@ -693,6 +790,41 @@ Requires `PRISM_ENABLE_HIVEMIND=true`.
693
790
 
694
791
  </details>
695
792
 
793
+ <details>
794
+ <summary><strong>Task Routing (1 tool)</strong></summary>
795
+
796
+ Requires `PRISM_TASK_ROUTER_ENABLED=true` (or dashboard toggle).
797
+
798
+ | Tool | Purpose |
799
+ |------|---------|
800
+ | `session_task_route` | Scores task complexity and recommends host vs. local Claw delegation (`claw_run_task` when delegable; host fallback when executor/tooling is unavailable) |
801
+
802
+ </details>
803
+
804
+ <details>
805
+ <summary><strong>Dark Factory Orchestration (3 tools)</strong></summary>
806
+
807
+ Requires `PRISM_DARK_FACTORY_ENABLED=true`.
808
+
809
+ | Tool | Purpose |
810
+ |------|---------|
811
+ | `session_start_pipeline` | Create and enqueue a background autonomous pipeline |
812
+ | `session_check_pipeline_status` | Poll the current step, iteration, and status of a pipeline |
813
+ | `session_abort_pipeline` | Emergency kill switch to halt a running background pipeline |
814
+
815
+ </details>
816
+
817
+ <details>
818
+ <summary><strong>Executive Planning (Planned for v7.2)</strong></summary>
819
+
820
+ | Tool | Purpose |
821
+ |------|---------|
822
+ | `session_plan_decompose` | Decompose natural language goals into a structured DAG of tasks |
823
+ | `session_plan_step_update` | Atomically update the status/result of a specific sub-task |
824
+ | `session_plan_get_active` | Retrieve the current execution DAG and task statuses |
825
+
826
+ </details>
827
+
696
828
  ---
697
829
 
698
830
  ## Environment Variables
@@ -731,6 +863,9 @@ Requires `PRISM_ENABLE_HIVEMIND=true`.
731
863
  | `PRISM_SCHOLAR_INTERVAL_MS` | No | Scholar interval in ms (default: `0` = manual only) |
732
864
  | `PRISM_SCHOLAR_TOPICS` | No | Comma-separated research topics (default: `"ai,agents"`) |
733
865
  | `PRISM_SCHOLAR_MAX_ARTICLES_PER_RUN` | No | Max articles per Scholar run (default: `3`) |
866
+ | `PRISM_TASK_ROUTER_ENABLED` | No | `"true"` to enable task-router tool registration |
867
+ | `PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD` | No | Min confidence required to delegate to Claw (default: `0.6`) |
868
+ | `PRISM_TASK_ROUTER_MAX_CLAW_COMPLEXITY` | No | Max complexity score delegable to Claw (default: `4`) |
734
869
  | `PRISM_HDC_ENABLED` | No | `"true"` (default) to enable HDC cognitive routing pipeline |
735
870
  | `PRISM_HDC_EXPLAINABILITY_ENABLED` | No | `"true"` (default) to include convergence/distance/ambiguity in cognitive route responses |
736
871
  | `PRISM_ACTR_ENABLED` | No | `"true"` (default) to enable ACT-R activation re-ranking on semantic search |
@@ -738,6 +873,7 @@ Requires `PRISM_ENABLE_HIVEMIND=true`.
738
873
  | `PRISM_ACTR_WEIGHT_SIMILARITY` | No | Composite score similarity weight (default: `0.7`) |
739
874
  | `PRISM_ACTR_WEIGHT_ACTIVATION` | No | Composite score ACT-R activation weight (default: `0.3`) |
740
875
  | `PRISM_ACTR_ACCESS_LOG_RETENTION_DAYS` | No | Days before access logs are pruned by background scheduler (default: `90`) |
876
+ | `PRISM_DARK_FACTORY_ENABLED` | No | `"true"` to enable Dark Factory autonomous pipeline tools (`session_start_pipeline`, `session_check_pipeline_status`, `session_abort_pipeline`) |
741
877
 
742
878
  </details>
743
879
 
@@ -768,6 +904,7 @@ Prism is a **stdio-based MCP server** that manages persistent agent memory. Here
768
904
  │ ↕ │
769
905
  │ ┌────────────────────────────────────────────────────┐ │
770
906
  │ │ Background Workers │ │
907
+ │ │ • Dark Factory (3-gate fail-closed pipelines) │ │
771
908
  │ │ • Scheduler (TTL, decay, compaction, purge) │ │
772
909
  │ │ • Web Scholar (Brave → Firecrawl → LLM → Ledger) │ │
773
910
  │ │ • Hivemind heartbeats & Telepathy broadcasts │ │
@@ -800,6 +937,7 @@ Each MCP client has its own mechanism for ensuring Prism context loads on sessio
800
937
 
801
938
  - **Claude Code** — Lifecycle hooks (`SessionStart` / `Stop`)
802
939
  - **Gemini / Antigravity** — Three-layer architecture (User Rules + AGENTS.md + Startup Skill)
940
+ - **Task Router Integration (v7.2 guidance)** — For client startup/skills, use a defensive delegation flow: route only coding tasks, call `session_task_route` only when it is available, delegate to `claw` only when an executor exists and the task is non-destructive, and fall back to the host if the router or executor is unavailable.
803
941
  - **Cursor / Windsurf / VS Code** — System prompt instructions
804
942
 
805
943
  All platforms benefit from the **server-side fallback** (v5.2.1): if `session_load_context` hasn't been called within 10 seconds, Prism auto-pushes context via `sendLoggingMessage`.
@@ -831,6 +969,8 @@ Prism is evolving from smart session logging toward a **cognitive memory archite
831
969
  | **v7.0** | Candidate-Scoped Spreading Activation — `S_i = Σ(W × strength)` bounded to search result set; prevents God-node dominance | Spreading activation networks (Collins & Loftus, 1975) | ✅ Shipped |
832
970
  | **v7.0** | Composite Retrieval Scoring — `0.7 × similarity + 0.3 × σ(activation)`; configurable via `PRISM_ACTR_WEIGHT_*` | Hybrid cognitive-neural retrieval models | ✅ Shipped |
833
971
  | **v7.0** | AccessLogBuffer — in-memory batch-write buffer with 5s flush; prevents SQLite `SQLITE_BUSY` under parallel agents | Production reliability engineering | ✅ Shipped |
972
+ | **v7.3** | Dark Factory — 3-gate fail-closed EXECUTE pipeline (parse → type → scope) with structured JSON action contract | Industrial safety systems (defense-in-depth, fail-closed valves) | ✅ Shipped |
973
+ | **v7.2** | Executive Planning & DAG tracking | Prefrontal cortex executive control + Directed Acyclic Graph planning | 🔭 Horizon |
834
974
  | **v7.x** | Affect-Tagged Memory — sentiment shapes what gets recalled | Affect-modulated retrieval (neuroscience) | 🔭 Horizon |
835
975
  | **v8+** | Zero-Search Retrieval — no index, no ANN, just ask the vector | Holographic Reduced Representations | 🔭 Horizon |
836
976
 
@@ -848,9 +988,18 @@ Shipped in v6.2.0. Edge synthesis, graph pruning with SLO observability, tempora
848
988
  ### v6.5: Cognitive Architecture ✅
849
989
  Shipped. Full Superposed Memory (SDM) + Hyperdimensional Computing (HDC/VSA) cognitive routing pipeline. Compositional memory states via XOR binding, Hamming resolution, and policy-gated routing (direct / clarify / fallback). 705 tests passing.
850
990
 
991
+ ### v7.3: Dark Factory — Fail-Closed Execution ✅
992
+ Shipped. Structured JSON action contract for autonomous `EXECUTE` steps. 3-gate validation pipeline (parse → type → scope) terminates pipelines on any violation before filesystem side effects. 67 edge-case tests covering adversarial LLM output, path traversal, and type coercion.
993
+
994
+ ### v7.1: Prism Task Router ✅
995
+ Shipped. Deterministic task routing (`session_task_route`) with optional experience-based confidence adjustment for host vs. local Claw delegation.
996
+
851
997
  ### v7.0: ACT-R Activation Memory ✅
852
998
  Shipped. Scientifically-grounded retrieval re-ranking via ACT-R base-level activation (`B_i = ln(Σ t_j^(-d))`), candidate-scoped spreading activation, parameterized sigmoid normalization, composite scoring, and zero-cold-start access log infrastructure. 49 dedicated unit tests, 705 total passing.
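As a rough numeric illustration of the two formulas quoted for v7.0, the sketch below computes base-level activation and the composite score. Only the formulas and the 0.7 / 0.3 weights come from the README. The decay `d = 0.5` (the conventional ACT-R default), the use of seconds as the time unit, the clamp to one second, and the plain logistic standing in for the "parameterized sigmoid" are all assumptions.

```javascript
// Hypothetical sketch; not Prism's retrieval code.
// Base-level activation: B_i = ln(sum over accesses j of t_j^(-d)),
// where t_j is the time since access j and d is the decay rate.
function baseLevelActivation(agesInSeconds, decay = 0.5) {
    if (agesInSeconds.length === 0) return -Infinity; // never accessed
    // Clamp ages to at least 1s so a just-accessed memory does not divide by zero.
    const sum = agesInSeconds.reduce((acc, t) => acc + Math.pow(Math.max(t, 1), -decay), 0);
    return Math.log(sum);
}

// Plain logistic function, standing in for the parameterized sigmoid.
function sigmoid(x) {
    return 1 / (1 + Math.exp(-x));
}

// Composite score: wSim * similarity + wAct * sigmoid(activation).
function compositeScore(similarity, activation, wSim = 0.7, wAct = 0.3) {
    return wSim * similarity + wAct * sigmoid(activation);
}

const recent = baseLevelActivation([60, 3600]); // accessed 1 minute and 1 hour ago
const stale = baseLevelActivation([86400 * 30]); // accessed once, 30 days ago
console.log(recent > stale); // true
console.log(compositeScore(0.8, recent) > compositeScore(0.8, stale)); // true
```

With equal similarity, the recently and repeatedly accessed memory ranks higher, which is the recency × frequency effect the release note describes.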
853
999
 
1000
+ ### v7.2: Executive Function 🔭
1001
+ Planned. Adds autonomous plan decomposition, DAG-backed step tracking, and self-healing execution loops for complex multi-step operations.
1002
+
854
1003
  ### Future Tracks
855
1004
  - **v7.x: Affect-Tagged Memory** — Recall prioritization improves by weighting memories with affective/contextual valence, making surfaced context more behaviorally useful.
856
1005
  - **v8+: Zero-Search Retrieval** — Direct vector-addressed recall (“just ask the vector”) reduces retrieval indirection and moves Prism toward truly native associative memory.
package/dist/config.js CHANGED
@@ -231,3 +231,29 @@ export const PRISM_ACTR_MAX_ACCESSES_PER_ENTRY = parseInt(process.env.PRISM_ACTR
  export const PRISM_ACTR_BUFFER_FLUSH_MS = parseInt(process.env.PRISM_ACTR_BUFFER_FLUSH_MS || "5000", 10);
  /** Days to retain access log entries before pruning. (Default: 90) */
  export const PRISM_ACTR_ACCESS_LOG_RETENTION_DAYS = parseInt(process.env.PRISM_ACTR_ACCESS_LOG_RETENTION_DAYS || "90", 10);
+ // ─── v7.1: Task Router Configuration ─────────────────────────
+ // Deterministic heuristic-based routing for delegating coding tasks
+ // between the host cloud model and the local claw-code-agent.
+ // Set PRISM_TASK_ROUTER_ENABLED=true to unlock the session_task_route tool.
+ /** Master switch for the task router tool. */
+ export const PRISM_TASK_ROUTER_ENABLED_ENV = process.env.PRISM_TASK_ROUTER_ENABLED === "true";
+ /** Confidence threshold below which routing defaults to the host model. (Default: 0.6) */
+ export const PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD = parseFloat(process.env.PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD || "0.6");
+ /** Maximum complexity score (1-10) that Claw can handle. Tasks above this → host. (Default: 4) */
+ export const PRISM_TASK_ROUTER_MAX_CLAW_COMPLEXITY = parseInt(process.env.PRISM_TASK_ROUTER_MAX_CLAW_COMPLEXITY || "4", 10);
+ // ─── v7.2: Verification Harness ──────────────────────────────
+ /** Master switch for the v7.2.0 enhanced verification harness. */
+ export const PRISM_VERIFICATION_HARNESS_ENABLED = process.env.PRISM_VERIFICATION_HARNESS_ENABLED === "true";
+ /** Comma-separated list of verification layers to run. */
+ export const PRISM_VERIFICATION_LAYERS = (process.env.PRISM_VERIFICATION_LAYERS || "data,agent,pipeline").split(",").map(l => l.trim()).filter(Boolean);
+ /** Default severity floor for all assertions. Overrides individual assertion severity when higher. */
+ export const PRISM_VERIFICATION_DEFAULT_SEVERITY = (process.env.PRISM_VERIFICATION_DEFAULT_SEVERITY || "warn");
+ // ─── v7.3: Dark Factory Orchestration ─────────────────────────
+ // Autonomous pipeline runner: PLAN → EXECUTE → VERIFY → iterate.
+ // Opt-in because it executes LLM calls in the background.
+ /** Master switch for the Dark Factory background runner. */
+ export const PRISM_DARK_FACTORY_ENABLED = process.env.PRISM_DARK_FACTORY_ENABLED === "true"; // Opt-in
+ /** Poll interval for the runner loop (ms). Default: 30s. */
+ export const PRISM_DARK_FACTORY_POLL_MS = parseInt(process.env.PRISM_DARK_FACTORY_POLL_MS || "30000", 10);
+ /** Default max wall-clock time per pipeline (ms). Default: 15 minutes. */
+ export const PRISM_DARK_FACTORY_MAX_RUNTIME_MS = parseInt(process.env.PRISM_DARK_FACTORY_MAX_RUNTIME_MS || "900000", 10);
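One property of the `parseInt(... || "default", 10)` pattern used in this hunk: the default applies only when the variable is unset or empty, so a malformed value such as `PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD=high` yields `NaN`. A guarded reader could look like the sketch below. `envNumber` is a hypothetical helper, not part of Prism's config module, and the 0 to 1 range for the confidence threshold is an assumption (the 1 to 10 complexity range comes from the comment in the hunk).

```javascript
// Hypothetical helper; falls back to the default on missing, malformed, or out-of-range values.
function envNumber(env, name, fallback, { integer = false, min = -Infinity, max = Infinity } = {}) {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return fallback;
    const value = integer ? Number.parseInt(raw, 10) : Number.parseFloat(raw);
    // Reject NaN and out-of-range values instead of passing them on.
    if (!Number.isFinite(value) || value < min || value > max) return fallback;
    return value;
}

// A plain object stands in for process.env so the example runs anywhere.
const env = {
    PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD: "high",
    PRISM_TASK_ROUTER_MAX_CLAW_COMPLEXITY: "7",
};
const threshold = envNumber(env, "PRISM_TASK_ROUTER_CONFIDENCE_THRESHOLD", 0.6, { min: 0, max: 1 });
const maxComplexity = envNumber(env, "PRISM_TASK_ROUTER_MAX_CLAW_COMPLEXITY", 4, { integer: true, min: 1, max: 10 });
console.log(threshold, maxComplexity); // 0.6 7
```

Whether silently falling back or failing at startup is preferable depends on the deployment. The sketch picks the fallback to mirror the existing default-on-missing behavior.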
@@ -0,0 +1,77 @@
1
+ import { getLLMProvider } from '../utils/llm/factory.js';
+ import { OpenAIAdapter } from '../utils/llm/adapters/openai.js';
+ import { SafetyController } from './safetyController.js';
+ import { debugLog } from '../utils/logger.js';
+ /**
+  * JSON output schema instruction injected into EXECUTE step prompts.
+  * Forces the LLM to respond with machine-parseable structured output
+  * instead of free-form text. The runner validates this shape before
+  * any actions are applied.
+  */
+ const EXECUTE_JSON_SCHEMA = `
+ You MUST respond with ONLY a valid JSON object matching this schema:
+ {
+   "actions": [
+     {
+       "type": "READ_FILE" | "WRITE_FILE" | "PATCH_FILE" | "RUN_TEST",
+       "targetPath": "<relative path within the workspace>",
+       "content": "<file content for WRITE_FILE>",
+       "patch": "<patch content for PATCH_FILE>",
+       "command": "<test command for RUN_TEST>"
+     }
+   ],
+   "notes": "<optional summary of what you did>"
+ }
+
+ RULES:
+ - type MUST be one of: READ_FILE, WRITE_FILE, PATCH_FILE, RUN_TEST
+ - targetPath MUST be a relative path within the workspace
+ - Do NOT include any text outside the JSON object
+ - Do NOT use markdown code fences
+ - If you cannot complete the task, return: {"actions": [], "notes": "reason"}
+ `.trim();
+ /**
+  * Invocation wrapper that routes payload specs to the local Claw agent model (Qwen 2.5),
+  * or the active LLM provider as fallback.
+  *
+  * Uses SafetyController.generateBoundaryPrompt() for scope injection
+  * instead of inline prompt construction — single source of truth for safety rules.
+  *
+  * v7.3.1: EXECUTE steps request structured JSON output via EXECUTE_JSON_SCHEMA.
+  * Non-EXECUTE steps continue to use free-form text output.
+  */
+ export async function invokeClawAgent(spec, state, timeoutMs = 120000 // 2 min default timeout for internal executions
+ ) {
+     // BYOM Override: Provide path to use alternative open-source pipelines
+     // (e.g. through the OpenAI structured adapter which also points to local endpoints like Ollama / vLLM if configured)
+     const llm = spec.modelOverride
+         ? new OpenAIAdapter() // Bypasses the factory to route locally
+         : getLLMProvider();
+     // Scope injection via SafetyController — single source of truth
+     const systemPrompt = SafetyController.generateBoundaryPrompt(spec, state);
+     // v7.3.1: EXECUTE steps get structured JSON output instructions
+     const isExecuteStep = state.current_step === 'EXECUTE';
+     const executePrompt = isExecuteStep
+         ? `Based on the system instructions, execute the necessary actions for the current step (${state.current_step}).\n\n${EXECUTE_JSON_SCHEMA}`
+         : `Based on the system instructions, execute the necessary task for the current step (${state.current_step}). Respond with your actions and observations.`;
+     debugLog(`[ClawInvocation] Launching agent on pipeline ${state.id} step=${state.current_step} iter=${state.iteration} with ${timeoutMs}ms limit.${isExecuteStep ? ' (JSON mode)' : ''}`);
+     try {
+         // Timeout Promise to ensure the runner thread does not block indefinitely
+         const timeboundExecution = Promise.race([
+             llm.generateText(executePrompt, systemPrompt),
+             new Promise((_, reject) => setTimeout(() => reject(new Error('LLM_EXECUTION_TIMEOUT')), timeoutMs))
+         ]);
+         const result = await timeboundExecution;
+         return {
+             success: true,
+             resultText: result
+         };
+     }
+     catch (error) {
+         debugLog(`[ClawInvocation] Exception during generation: ${error.message}`);
+         return {
+             success: false,
+             resultText: error.message
+         };
+     }
+ }
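A note on the `Promise.race` timeout in `invokeClawAgent`: the `setTimeout` is never cancelled, so when `generateText` settles first the timer stays scheduled until `timeoutMs` elapses. One way to tidy that up is sketched below. `withTimeout` is a hypothetical helper, not part of the package, and like the original it does not cancel the underlying LLM request on timeout.

```javascript
// Hypothetical helper: race a promise against a timer and always clear the timer.
function withTimeout(promise, timeoutMs, message = "LLM_EXECUTION_TIMEOUT") {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(message)), timeoutMs);
    });
    // Whichever promise settles first wins; the timer is cleared either way.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a fast task resolves normally and leaves no pending timer behind.
const fastTask = new Promise((resolve) => setTimeout(() => resolve("done"), 10));
withTimeout(fastTask, 1000).then((value) => console.log(value)); // done
```

In the function above, the call site would become something like `await withTimeout(llm.generateText(executePrompt, systemPrompt), timeoutMs)`, with the existing `catch` block unchanged.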