npm - mindforge-cc - Versions diffs - 11.4.0 → 11.5.1 - Mend

mindforge-cc 11.4.0 → 11.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (98) hide show

package/.agent/CLAUDE.md +13 -0
package/.agent/hooks/lib/hook-flags.js +78 -0
package/.agent/hooks/lib/pretooluse-visible-output.js +46 -0
package/.agent/hooks/mindforge-block-no-verify.js +552 -0
package/.agent/hooks/mindforge-config-protection.js +144 -0
package/.agent/hooks/run-with-flags.js +207 -0
package/.agent/mindforge/checkpoint.md +76 -0
package/.agent/mindforge/harness-audit.md +59 -0
package/.agent/mindforge/instinct.md +46 -0
package/.agent/mindforge/orch-add-feature.md +43 -0
package/.agent/mindforge/orch-build-mvp.md +48 -0
package/.agent/mindforge/orch-change-feature.md +45 -0
package/.agent/mindforge/orch-fix-defect.md +43 -0
package/.agent/mindforge/orch-refine-code.md +43 -0
package/.claude/CLAUDE.md +13 -0
package/.claude/commands/mindforge/checkpoint.md +76 -0
package/.claude/commands/mindforge/execute-phase.md +47 -6
package/.claude/commands/mindforge/harness-audit.md +59 -0
package/.claude/commands/mindforge/instinct.md +46 -0
package/.claude/commands/mindforge/orch-add-feature.md +43 -0
package/.claude/commands/mindforge/orch-build-mvp.md +48 -0
package/.claude/commands/mindforge/orch-change-feature.md +45 -0
package/.claude/commands/mindforge/orch-fix-defect.md +43 -0
package/.claude/commands/mindforge/orch-refine-code.md +43 -0
package/.claude/commands/mindforge/plan-write.md +11 -0
package/.claude/commands/mindforge/product-spec.md +76 -0
package/.mindforge/config.json +2 -2
package/.mindforge/engine/instincts/instinct-schema.md +17 -9
package/.mindforge/imported-agents.jsonl +10 -0
package/.mindforge/manifests/install-components.json +36 -0
package/.mindforge/manifests/install-modules.json +193 -0
package/.mindforge/manifests/install-profiles.json +57 -0
package/.mindforge/memory/sync-manifest.json +1 -1
package/.mindforge/personas/gan-evaluator.md +226 -0
package/.mindforge/personas/gan-generator.md +151 -0
package/.mindforge/personas/gan-planner.md +118 -0
package/.mindforge/personas/harness-optimizer.md +55 -0
package/.mindforge/personas/loop-operator.md +58 -0
package/.mindforge/schemas/hooks.schema.json +199 -0
package/.mindforge/schemas/install-modules.schema.json +44 -0
package/.mindforge/schemas/install-state.schema.json +95 -0
package/.mindforge/schemas/plugin.schema.json +75 -0
package/.mindforge/schemas/provenance.schema.json +31 -0
package/.mindforge/skills/agent-architecture-audit/SKILL.md +272 -0
package/.mindforge/skills/continuous-learning/SKILL.md +16 -0
package/.mindforge/skills/orch-pipeline/SKILL.md +284 -0
package/.mindforge/skills/writing-plans/SKILL.md +76 -0
package/CHANGELOG.md +120 -0
package/MINDFORGE.md +3 -3
package/README.md +0 -1
package/RELEASENOTES.md +131 -0
package/SECURITY.md +16 -0
package/bin/autonomous/auto-runner.js +46 -5
package/bin/autonomous/handoff-schema.js +114 -0
package/bin/autonomous/session-guardian.sh +138 -0
package/bin/autonomous/supervisor.js +98 -0
package/bin/change-classifier.js +19 -5
package/bin/dashboard/api-router.js +10 -1
package/bin/governance/approve.js +65 -28
package/bin/governance/config-manager.js +3 -1
package/bin/governance/rbac-manager.js +14 -6
package/bin/harness-audit.js +520 -0
package/bin/hooks/instinct-capture-hook.js +16 -1
package/bin/hooks/lib/detect-project.js +72 -0
package/bin/installer/harness-adapter-compliance.js +321 -0
package/bin/installer/install-manifests.js +200 -0
package/bin/installer/install-state.js +243 -0
package/bin/installer-core.js +1 -1
package/bin/learning/instinct-cli.js +359 -0
package/bin/learning/lib/ssrf-guard.js +252 -0
package/bin/memory/eis-client.js +31 -10
package/bin/memory/federated-sync.js +11 -2
package/bin/memory/knowledge-capture.js +10 -1
package/bin/memory/pillar-health-tracker.js +9 -1
package/bin/models/llm-errors.js +79 -0
package/bin/models/model-client.js +39 -4
package/bin/models/ollama-provider.js +115 -0
package/bin/models/openai-provider.js +40 -9
package/bin/models/profiles-loader.js +147 -0
package/bin/models/provider-registry.js +59 -0
package/bin/review/ads-engine.js +2 -2
package/bin/revops/market-evaluator.js +23 -2
package/bin/revops/router-steering-v2.js +17 -2
package/bin/security/trust-boundaries.js +20 -3
package/bin/utils/readiness-gate.js +169 -0
package/bin/worktree/engine.js +497 -0
package/package.json +8 -2
package/subagents/categories/04-quality-security/.claude-plugin/plugin.json +10 -0
package/subagents/categories/04-quality-security/go-build-resolver.md +105 -0
package/subagents/categories/04-quality-security/go-reviewer.md +87 -0
package/subagents/categories/04-quality-security/python-reviewer.md +109 -0
package/subagents/categories/04-quality-security/react-build-resolver.md +215 -0
package/subagents/categories/04-quality-security/react-reviewer.md +167 -0
package/subagents/categories/04-quality-security/rust-build-resolver.md +159 -0
package/subagents/categories/04-quality-security/rust-reviewer.md +105 -0
package/subagents/categories/04-quality-security/silent-failure-hunter.md +67 -0
package/subagents/categories/04-quality-security/type-design-analyzer.md +58 -0
package/subagents/categories/04-quality-security/typescript-reviewer.md +126 -0

package/.mindforge/skills/agent-architecture-audit/SKILL.md ADDED Viewed

@@ -0,0 +1,272 @@
+---
+name: agent-architecture-audit
+version: 1.0.0
+min_mindforge_version: 11.4.0
+status: stable
+triggers: agent audit, wrapper regression, memory contamination, tool discipline, hidden repair loop, 12-layer stack, multi-layer self-audit, agent architecture, harness diagnostic, layer-by-layer review, NexusTracer audit, soul-engine audit, swarm wave audit
+compose: agent-introspection-debugging
+---
+# Skill — Agent Architecture Audit
+A full-stack diagnostic for MindForge's own multi-layer agent stack. Audits the
+12-layer stack for wrapper regression, memory contamination, tool-discipline
+failures, hidden repair loops, and rendering/transport corruption. Produces
+severity-ranked findings with code-first fixes — for agent systems that hide
+failures behind wrapper layers, stale memory, retry loops, or transport
+mutations.
+This skill **composes with `agent-introspection-debugging`** (already in
+`.mindforge/skills/`): introspection debugs a single runtime failure (loop,
+timeout, hallucination); this skill audits the whole stack for the systemic
+wrapper-layer causes behind those failures. Invoke via `/mindforge:introspect`
+when a single-failure debug keeps recurring or points at a structural cause.
+## When this skill activates
+**MANDATORY for:**
+- Releasing any MindForge-driven agent or LLM-powered behavior to production
+- Shipping features touching tool calling, the Sharded Memory Loop, or
+  multi-step swarm workflows
+- Agent behavior degrades after adding a wrapper layer (new PersonaFactory
+  patch, new prompt-assembly stage, new WaveExecutor step)
+- User reports "the agent is getting worse" or "tools are flaky"
+- Same model works in playground but breaks inside the MindForge wrapper stack
+- Debugging agent behavior for more than 15 minutes without finding root cause
+**Especially critical when:**
+- New prompt layers, tool definitions, or memory systems have been added
+- Different swarm specialists behave inconsistently on the same input
+- The model was fine yesterday but is hallucinating today
+- You suspect a hidden repair/retry loop is silently mutating responses
+**Do not use for:**
+- Single-failure runtime debugging — use `agent-introspection-debugging`
+- Code review — use language-specific reviewer agents
+- Security scanning — use `/mindforge:security-scan` / `security-reviewer`
+- Agent performance benchmarking — use `/mindforge:agent-eval`
+- Writing new features — use the appropriate workflow skill
+## Mandatory actions when this skill is active
+### The MindForge 12-Layer Stack
+Every MindForge agent run passes through these layers. Any one can corrupt the
+answer. The concrete MindForge component owning each layer is named so the
+audit has a real target, not an abstraction.
+| # | Layer | MindForge Component | What Goes Wrong |
+|---|-------|---------------------|-----------------|
+| 1 | System prompt | SOUL.md + MINDFORGE.md + `.agent/CLAUDE.md` assembly | Conflicting directives, instruction bloat across the source-of-truth hierarchy |
+| 2 | Session history | NexusTracer turn log | Stale context injected from previous turns |
+| 3 | Long-term memory | shard-controller (Cold tier) | Pollution across sessions, old topics in new conversations |
+| 4 | Distillation | shard-controller compaction (Hot/Warm rotation) | Compressed shards re-entering as pseudo-facts |
+| 5 | Active recall | continuous-learning instinct recall | Redundant re-summary / instinct replay wasting context |
+| 6 | Tool selection | PersonaFactory + hooks_route routing | Wrong tool routing, model skips a required tool |
+| 7 | Tool execution | swarm WaveExecutor dispatch | Hallucinated execution — claims to call but doesn't |
+| 8 | Tool interpretation | WaveExecutor result consolidation | Misread or ignored tool output |
+| 9 | Answer shaping | SWARM-SUMMARY consolidation | Format corruption in the final response |
+| 10 | Platform rendering | Dashboard (localhost:7339) / CLI / API transport | Transport-layer mutation of a valid answer |
+| 11 | Hidden repair loops | soul-engine ADS rewrite + Temporal hindsight regeneration | Silent fallback/retry running a second LLM pass |
+| 12 | Persistence | auto-state.json + Merkle audit log | Expired state or cached artifacts reused as live evidence |
+### Common Failure Patterns
+#### 1. Wrapper Regression
+The base model produces correct answers, but MindForge's wrapper layers make it
+worse.
+**Symptoms:**
+- Model works fine in playground or direct API call, breaks in the swarm
+- Added a new PersonaFactory patch or prompt stage, existing behavior degraded
+- Agent sounds confident but is confidently wrong
+- "It was working before the last update"
+#### 2. Memory Contamination
+Old topics leak into new conversations through NexusTracer history, shard
+recall, or distillation.
+**Symptoms:**
+- Agent brings up unrelated past topics
+- User corrections don't stick (old shard/instinct overwrites new)
+- Same-session artifacts re-enter as pseudo-facts
+- Cold-tier memory grows without bound, degrading response quality over time
+#### 3. Tool Discipline Failure
+Tools are declared in the prompt but not enforced in code. The model skips them
+or hallucinates execution.
+**Symptoms:**
+- "Must use tool X" in the prompt, but model answers without calling it
+- Tool results look correct but were never actually executed by WaveExecutor
+- Different swarm specialists fight over the same responsibility
+- Model uses a tool when it shouldn't, or skips it when it must
+#### 4. Rendering/Transport Corruption
+The agent's internal answer is correct, but the platform layer mutates it during
+delivery.
+**Symptoms:**
+- NexusTracer logs show the correct answer, user sees broken output
+- Markdown rendering, JSON parsing, or streaming fragments corrupt valid output
+- A hidden fallback quietly replaces the answer before delivery
+- Output differs between the CLI and the dashboard
+#### 5. Hidden Agent Layers
+Silent repair, retry, summarization, or recall layers run without explicit
+contracts.
+**Symptoms:**
+- Output changes between internal generation and user delivery
+- soul-engine ADS "auto-fix" runs a second LLM pass the user doesn't know about
+- Multiple swarm layers modify the same output without coordination
+- Answers get "smoothed" or "corrected" by invisible layers
+### Audit Workflow
+#### Phase 1: Scope
+Define what you're auditing:
+- **Target system** — which MindForge agent / swarm / command?
+- **Entrypoints** — how do users interact (CLI command, dashboard, hook)?
+- **Model stack** — which LLM(s) and providers?
+- **Symptoms** — what does the user report?
+- **Time window** — when did it start?
+- **Layers to audit** — which of the 12 layers apply?
+#### Phase 2: Evidence Collection
+Gather evidence from the codebase:
+- **Source code** — swarm loop, hooks_route tool router, shard admission, prompt
+  assembly across the source-of-truth hierarchy
+- **Logs** — NexusTracer session traces, Merkle-linked AUDIT entries, tool-call
+  records
+- **Config** — MINDFORGE.md parameters, tool schemas, PersonaFactory patches,
+  provider settings
+- **Memory files** — instinct store, shard archives, `auto-state.json`
+Use `rg` to search for anti-patterns:
+```bash
+# Tool requirements expressed only in prompt text (not code)
+rg "must.*tool|required.*call" --type md
+# Tool execution without validation
+rg "tool_call|toolCall|tool_use"
+# Hidden LLM calls outside the main swarm loop
+rg "completion|chat\.create|messages\.create|llm\.invoke"
+# Shard/instinct admission without user-correction priority
+rg "memory.*admit|shard.*admit|instinct.*capture|persist.*memory"
+# Fallback / repair loops that run additional LLM calls (soul-engine, hindsight)
+rg "fallback|retry.*llm|repair.*prompt|re-?prompt|soul-engine|regenerat"
+# Silent output mutation
+rg "mutate|rewrite.*response|transform.*output|shap"
+```
+#### Phase 3: Failure Mapping
+For each finding, document:
+- **Symptom** — what the user sees
+- **Mechanism** — how the wrapper causes it
+- **Source layer** — which of the 12 layers (and which MindForge component)
+- **Root cause** — the deepest cause
+- **Evidence** — file:line or NexusTracer/AUDIT row reference
+- **Confidence** — 0.0 to 1.0
+#### Phase 4: Fix Strategy
+Default fix order (code-first, not prompt-first):
+1. **Code-gate tool requirements** — enforce in WaveExecutor, not prompt text
+2. **Remove or narrow hidden repair layers** — make soul-engine / hindsight
+   regeneration explicit with contracts
+3. **Reduce context duplication** — same info through prompt + NexusTracer
+   history + shard memory + distillation
+4. **Tighten memory admission** — user corrections > agent assertions (shard +
+   instinct admission)
+5. **Tighten distillation triggers** — don't compact shards that shouldn't be
+   compacted
+6. **Reduce rendering mutation** — dashboard/CLI pass-through, don't transform
+7. **Convert to typed JSON envelopes** — structured internal flow (SWARM-SUMMARY
+   as schema), not freeform prose
+### Severity Model
+| Level | Meaning | Action |
+|-------|---------|--------|
+| `critical` | Agent can confidently produce wrong operational behavior | Fix before next release |
+| `high` | Agent frequently degrades correctness or stability | Fix this sprint |
+| `medium` | Correctness usually survives but output is fragile or wasteful | Plan for next cycle |
+| `low` | Mostly cosmetic or maintainability issues | Backlog |
+### Output Format
+Present findings to the user in this order:
+1. **Severity-ranked findings** (most critical first)
+2. **Architecture diagnosis** (which layer / component corrupted what, and why)
+3. **Ordered fix plan** (code-first, not prompt-first)
+Do not lead with compliments or summaries. If the system is broken, say so
+directly.
+### Quick Diagnostic Questions
+When auditing a MindForge agent system, answer these:
+| # | Question | If Yes → |
+|---|----------|----------|
+| 1 | Can the model skip a required tool and still answer? | Tool not code-gated in WaveExecutor |
+| 2 | Does old conversation content appear in new turns? | Memory contamination (NexusTracer/shard) |
+| 3 | Is the same info in system prompt AND shard memory AND history? | Context duplication |
+| 4 | Does soul-engine or hindsight run a second LLM pass before delivery? | Hidden repair loop |
+| 5 | Does output differ between internal generation and user delivery? | Rendering corruption (dashboard/CLI) |
+| 6 | Are "must use tool X" rules only in prompt text? | Tool discipline failure |
+| 7 | Can the agent's own monologue become a persistent instinct/shard? | Memory poisoning |
+### Anti-Patterns to Avoid
+- Avoid blaming the model before falsifying wrapper-layer regressions.
+- Avoid blaming memory without showing the contamination path (which shard /
+  instinct / NexusTracer turn).
+- Do not let a clean current `auto-state.json` erase a dirty historical incident.
+- Do not treat markdown prose as a trustworthy internal protocol.
+- Do not accept "must use tool" in prompt text when WaveExecutor never enforces it.
+- Keep findings direct, evidence-backed, and severity-ranked.
+### Report Schema
+Audits MUST produce a structured report following this shape:
+```json
+{
+  "schema_version": "mindforge.agent-architecture-audit.report.v1",
+  "executive_verdict": {
+    "overall_health": "high_risk",
+    "primary_failure_mode": "string",
+    "most_urgent_fix": "string"
+  },
+  "scope": {
+    "target_name": "string",
+    "model_stack": ["string"],
+    "layers_to_audit": ["string"]
+  },
+  "findings": [
+    {
+      "severity": "critical|high|medium|low",
+      "title": "string",
+      "mechanism": "string",
+      "source_layer": "string",
+      "root_cause": "string",
+      "evidence_refs": ["file:line"],
+      "confidence": 0.0,
+      "recommended_fix": "string"
+    }
+  ],
+  "ordered_fix_plan": [
+    { "order": 1, "goal": "string", "why_now": "string", "expected_effect": "string" }
+  ]
+}
+```
+## Self-check before task completion
+- [ ] Did I scope the audit to a concrete MindForge target, entrypoint, and model stack?
+- [ ] Did I map each finding to one of the 12 layers AND its MindForge component (NexusTracer, soul-engine, shard-controller, PersonaFactory, WaveExecutor, etc.)?
+- [ ] Did I check all five failure patterns (wrapper regression, memory contamination, tool discipline, rendering/transport corruption, hidden repair loops)?
+- [ ] Is every finding evidence-backed with a file:line or NexusTracer/AUDIT reference and a confidence score?
+- [ ] Did I emit the `mindforge.agent-architecture-audit.report.v1` JSON report (severity-ranked findings + ordered, code-first fix plan)?
+- [ ] Did I hand off to `agent-introspection-debugging` for any single runtime failure that still needs contained recovery?

package/.mindforge/skills/continuous-learning/SKILL.md CHANGED Viewed

@@ -60,6 +60,22 @@ The instinct engine runs in auto-capture mode, watching for:
 - Instincts inactive for 30 days are auto-pruned
 - User can manually deprecate any instinct
+### Project scoping (no cross-project leak)
+Every captured instinct carries a `project_id` (a stable 12-char SHA256 of the
+git remote URL, normalized so https/ssh/scp clones of the same repo match) plus
+a human-readable `project` name. Outside a git repo, both fall back to
+`global`. This enforces the invariant that instincts learned in repo A never
+surface in repo B:
+- **Read-time filter:** when recalling instincts (status, evolve-skills,
+  cluster-instincts), match `project_id` to the current repo's detected id, OR
+  `global`. Ignore entries from other projects.
+- **Detection:** `bin/hooks/lib/detect-project.js` (`detectProject(cwd)`) is the
+  single source of the id; the capture hook stamps it on write.
+- **Promotion across projects:** an instinct seen in 2+ projects with avg
+  confidence >= 0.8 is a candidate to promote to a `global` instinct (deeper
+  cross-project promotion criteria are handled by evolve-skills).
 ### During any task (passive observation)
 - Note patterns that repeat across tasks
 - When user corrects behavior: acknowledge and create instinct

package/.mindforge/skills/orch-pipeline/SKILL.md ADDED Viewed

@@ -0,0 +1,284 @@
+---
+name: orch-pipeline
+version: 1.0.0
+min_mindforge_version: 11.4.0
+status: stable
+triggers: orchestration, multi-agent build, feature pipeline, size classification, blast radius, orch-add-feature, orch-fix-defect, orch-change-feature, orch-refine-code, orch-build-mvp, gated pipeline, research-plan-tdd-review-commit
+origin: ECC (ported and adapted to MindForge)
+---
+# Skill — Orchestrator Pipeline (shared engine)
+The `orch-*` commands are thin wrappers. They do not re-implement any work — they
+classify the request, choose which phases of *this* pipeline run, and delegate
+each phase to an existing MindForge skill, command, persona, or subagent. This
+file is that pipeline.
+> Invoke an operation command (`/mindforge:orch-add-feature`,
+> `/mindforge:orch-fix-defect`, …) rather than this engine directly. This file is
+> the reference they point at.
+## When this skill activates
+- Loaded indirectly whenever an `orch-*` operation command runs.
+- Activates on the triggers above: orchestration, multi-agent build, feature
+  pipeline, size classification, blast radius.
+- Read directly only when adding a new operation to the family or tuning the
+  shared phases, gates, agent map, or the size classifier.
+## Mandatory actions when this skill is active
+1. Run **Step 0 — the SIZE CLASSIFIER** first; ceremony scales to blast radius.
+   Never skip it — emitting the size + plan is the contract.
+2. Drive the work through the phases (research → plan → execute → verify →
+   review → commit) using the MindForge XML plan format.
+3. Honor **both gates**: the plan-approval gate and GATE 2 (commit + AUDIT.jsonl
+   with conventional commits).
+4. Route work through the remapped MindForge agent/capability map — never invoke
+   the original ECC agent names.
+5. Respect the security auto-trigger and Tier-3 governance; the GAN inner loop is
+   descoped to `WaveExecutor` / the swarm-execution protocol.
+## The operation family
+| Command | Operation | Trigger | First move |
+|---------|-----------|---------|------------|
+| `/mindforge:orch-add-feature` | feature | capability does not exist yet | research + plan a new slice |
+| `/mindforge:orch-change-feature` | tweak | works, but desired behavior differs | amend existing behavior *and its tests* |
+| `/mindforge:orch-fix-defect` | fix | broken; behavior is wrong | reproduce as a failing test, then fix |
+| `/mindforge:orch-refine-code` | refactor | behavior stays, structure improves | restructure while keeping tests green |
+| `/mindforge:orch-build-mvp` | mvp | bootstrap from a design/spec doc | ingest doc → vertical slices |
+> These wrappers **compose** existing MindForge capabilities rather than replace
+> them: the `writing-plans` skill + `/mindforge:plan-write`, the
+> `mindforge-tdd_extended` protocol, `/mindforge:cross-review` + `/mindforge:review`,
+> the `04-quality-security` reviewer subagents, and the `quick.md` security
+> auto-trigger. The orch-* family adds the shared **size classifier** and the
+> **two human gates** on top of them, so one umbrella covers all five operations
+> consistently.
+---
+## Step 0 — SIZE CLASSIFIER (right-sizing) — the novel control
+> **This is the whole point of the port. Ceremony scales to blast radius.** Do
+> not skip it. Always run Step 0 first and always state the result in one line so
+> the user can override.
+Score the request on three signals, take the **highest** tier any signal
+reaches, and emit one line:
+```
+[orch] size: <tier> — phases <mask> — rationale: <one clause>  (override with: size=<tier>)
+```
+| Tier | Files touched | New dependency / contract | Design ambiguity | Phases that run |
+|------|---------------|---------------------------|------------------|-----------------|
+| **tiny** | 1, a few lines | none | none — the change is obvious | 4 → 5 → 6 |
+| **small** | 1 file / 1 function | none | clear once you read the code | (1 light) → 4 → 5 → 6 |
+| **medium** | 2–5 files | maybe a new internal module | one real choice to make | 1 → 2 → 4 → 5 → 6 |
+| **large** | many / cross-cutting | new external dep, public API, or a spec doc | multiple open questions | 1 → 2 → (3) → 4 → 5 → 6 |
+Phase 0 (Intake) always runs and is omitted from the mask column above.
+**Tie-breakers (force a floor regardless of file count):**
+- Anything touching a **security trigger** (see below) is **at least medium**.
+- Anything touching a **public API / contract / spec doc** is **at least medium**.
+- Anything classified **Tier 3** under the MindForge security policy (see
+  `.claude/CLAUDE.md` → CRITICAL SECURITY & AUTO-TRIGGER) is **at least large**
+  and requires manual overhead before Gate 1.
+- If the swarm controller (`.mindforge/engine/swarm-controller.md`) would trigger
+  on this task (`compositeScore >= AUTO_SWARM_THRESHOLD`, or a multi-disciplinary
+  marker like `UI + API` / `Auth + Database`), treat the tier as **large** and
+  run the Implement phase via the swarm-execution protocol (see Phase 4).
+Per-operation default floors (the wrappers pass these in):
+- `orch-fix-defect` → floor **small** (often **tiny**).
+- `orch-change-feature` → floor **small**.
+- `orch-refine-code` → floor **medium** (restructures touch multiple files).
+- `orch-build-mvp` → floor **large**, full pipeline incl. Scaffold (Phase 3).
+- `orch-add-feature` → no floor; classify on the three signals.
+---
+## The phases
+Each phase delegates — it does not do the work inline.
+- **0. Intake** — restate the request in one line. For `orch-build-mvp`, read the
+  spec/design doc and extract scope, locked decisions, and a feature list.
+- **1. Research & Reuse** — Search-Before-Building (Builder Ethos): `gh search
+  repos` / `gh search code`, then Context7 / vendor docs, then package registries,
+  then Exa / web search. Prefer adopting a proven implementation over net-new
+  code. Use `/mindforge:research` for a focused research subagent when the task
+  involves unfamiliar libraries.
+- **2. Plan** — delegate to the `writing-plans` skill via `/mindforge:plan-write`.
+  Output a MindForge **XML plan** (format below), ordered as thin vertical
+  slices, written under `.planning/`. → **GATE 1.**
+- **3. Scaffold** — `orch-build-mvp` only: stand up the first end-to-end slice.
+- **4. Implement (TDD)** — drive each task through the `mindforge-tdd_extended`
+  protocol (Red → Green → Refactor). Honor the operation's first-move rule. For
+  **large**/swarm-triggered tasks, delegate the implement wave to
+  `WaveExecutor` / the `mindforge-swarm-execution` protocol
+  (`.mindforge/engine/wave-executor.md`) instead of a single-threaded loop.
+- **5. Review** — `/mindforge:review` (code-quality + security), escalating to
+  `/mindforge:cross-review` (multi-model consensus) for **large** tiers. Pull in
+  the matching `04-quality-security` reviewer subagent for the repo's language
+  (`typescript-reviewer`, `python-reviewer`, `rust-reviewer`, `go-reviewer`, …)
+  and `security-auditor` / `penetration-tester` whenever the diff touches a
+  security trigger.
+- **6. Commit** — conventional commits (`feat:` / `fix:` / `refactor:` / …), one
+  per logical chunk, **+ a Merkle-linked AUDIT.jsonl entry per commit**. → **GATE 2.**
+---
+## The MindForge XML plan format (Phase 2 output)
+Plans are MindForge XML, written under `.planning/` — **never** ECC
+`.claude/PRPs/` paths. For orch-* work the planning home is
+`.planning/quick/[NNN]-[slug]/PLAN.md` (sequential numbering, per `quick.md`) for
+tiny/small/medium, or `.planning/phases/[phase]/PLAN.md` when a phase is active.
+```xml
+<task type="orch-[operation]">
+  <n>[task name]</n>
+  <size>[tiny|small|medium|large]</size>
+  <persona>[appropriate persona — security-reviewer if a trigger is touched]</persona>
+  <files>[files to touch]</files>
+  <context>[why this work exists; cited file:line patterns from EXPLORE]</context>
+  <action>[ordered, no-placeholder vertical slices — the task_list]</action>
+  <verify>[exact verification command + expected output]</verify>
+  <done>[definition of done — incl. tests + coverage ≥ 80%]</done>
+</task>
+```
+The `writing-plans` skill's EXPLORE protocol and "No Prior Knowledge" gate apply:
+every convention the plan references must be cited from the actual codebase.
+---
+## The two gates
+This family is **gated, not autonomous**:
+1. **GATE 1 — after Plan.** Present the XML `<action>` task_list; do not write
+   implementation code until the user approves. For **Tier 3** security-classified
+   work, the manual security overhead is completed here before approval.
+2. **GATE 2 — before Commit.** Present the diff summary and proposed conventional
+   commit messages; do not commit until the user confirms. Each confirmed commit
+   writes a Merkle-linked AUDIT.jsonl entry (below).
+Everything between the gates flows without stopping.
+---
+## Agent / capability map (remapped to MindForge)
+| Phase | Primary (MindForge) | Fallback / escalation |
+|-------|---------------------|-----------------------|
+| Intake / understand | `/mindforge:map-codebase` / `/mindforge:code-tour` | trace existing paths before a tweak, fix, or refactor |
+| Research | `/mindforge:research` | Context7 / vendor docs / package registries / Exa |
+| Plan | `writing-plans` skill via `/mindforge:plan-write` | `/mindforge:system-design`, `/mindforge:rfc` for structural calls |
+| Implement | `mindforge-tdd_extended` protocol (Red-Green-Refactor) | `/mindforge:debug` on build/test breaks |
+| Implement (large/swarm) | `WaveExecutor` / `mindforge-swarm-execution` protocol | `SwarmController` cluster selection (`.mindforge/engine/swarm-controller.md`) |
+| Review | `/mindforge:review` (code-quality + security) | `/mindforge:cross-review` (multi-model) for large tiers |
+| Language review | `04-quality-security/<lang>-reviewer.md` subagent | match the reviewer to the repo (see its `CLAUDE.md`) |
+| Security | `quick.md` security auto-trigger → `security-reviewer.md` persona + `security-auditor` / `penetration-tester` subagents | `/mindforge:security-scan` PRE-COMMIT |
+Match the language reviewer to the repo. The `04-quality-security` subagents live
+under `subagents/categories/04-quality-security/`.
+---
+## Security-review trigger (the quick.md auto-trigger)
+Reuse the `quick.md` **security auto-trigger**: scan the task description and the
+files in scope for security keywords —
+`auth, authentication, login, password, token, JWT, session, payment, PII, upload,
+credential, secret, key` — and for changes to authorization, user-input handling,
+database queries, file-system paths, external API calls, or cryptography.
+If any match:
+1. Load `security-review/SKILL.md` and activate the `security-reviewer.md` persona
+   for the Implement and Review phases (required even for tiny/small tiers).
+2. Pull in the `security-auditor` (+ `penetration-tester` for large) subagents
+   from `04-quality-security/` during Phase 5.
+3. Run `/mindforge:security-scan` PRE-COMMIT. **Fail the gate if any Medium+
+   findings are unaddressed** (per `.claude/CLAUDE.md`).
+4. Treat the task as **at least medium** in Step 0 (Tier 3 → at least large, with
+   manual overhead before Gate 1).
+---
+## Handoff artifacts
+The pipeline carries no hidden state — the planning docs *are* the handoff:
+- The XML `<action>` task_list (from Plan) drives the Implement loop.
+- Larger work may also emit PRD / architecture / system_design under `.planning/`
+  (via `/mindforge:system-design`, `/mindforge:product-spec`, `/mindforge:rfc`).
+- Review findings (Blocking / CRITICAL / HIGH) must be resolved before Gate 2.
+---
+## GATE 2 — commit + AUDIT.jsonl (Phase 6)
+On user confirmation at Gate 2, for **each** logical commit:
+1. Commit with a **conventional** message scoped to one logical change:
+   `feat(<scope>): …` / `fix(<scope>): …` / `refactor(<scope>): …` / etc.
+2. Append a **Merkle-linked** AUDIT.jsonl entry to `.planning/AUDIT.jsonl`. Each
+   entry sets `previous_hash` to the prior entry's `_hash` and computes its own
+   `_hash` (per `.mindforge/audit/AUDIT-SCHEMA.md`):
+```json
+{
+  "id": "uuid",
+  "timestamp": "ISO-8601",
+  "event": "orch_commit",
+  "agent": "orch-pipeline",
+  "operation": "add-feature | fix-defect | change-feature | refine-code | build-mvp",
+  "size_tier": "tiny | small | medium | large",
+  "phase": null,
+  "task_name": "[task name]",
+  "commit_sha": "abc1234",
+  "commit_type": "feat | fix | refactor",
+  "files_changed": ["src/..."],
+  "security_trigger": false,
+  "review_verdict": "approved | approved_with_conditions",
+  "gates_honored": ["gate1_plan", "gate2_commit"],
+  "previous_hash": "<prior _hash>",
+  "_hash": "<sha256 of this entry>"
+}
+```
+---
+## Verification (self-check before task completion)
+- [ ] **Step 0 size tier was stated in one line** and matched the work.
+- [ ] **Gate 1 (plan)** and **Gate 2 (commit)** were both honored.
+- [ ] The plan was MindForge XML written under `.planning/` (never ECC `PRPs/`).
+- [ ] The `quick.md` security auto-trigger ran; `security-reviewer` engaged iff a
+      security trigger was touched; `/mindforge:security-scan` passed PRE-COMMIT
+      with no unaddressed Medium+ findings.
+- [ ] Commits are conventional and scoped to one logical change.
+- [ ] Each commit wrote a Merkle-linked AUDIT.jsonl entry (`previous_hash`/`_hash`).
+- [ ] New / changed behavior has tests; coverage ≥ 80%.
+---
+## Note — GAN inner loop is DESCOPED
+The ECC original drove `orch-build-mvp`'s inner build loop through a **GAN
+generate/evaluate harness** (`/gan-build`, `gan-generator` → `gan-evaluator`,
+`gan-harness/spec.md` + `eval-rubric.md`). **That GAN harness is not ported.** In
+MindForge, `orch-build-mvp`'s build loop instead **delegates each vertical slice
+to `WaveExecutor` / the `mindforge-swarm-execution` protocol**
+(`.mindforge/engine/wave-executor.md`), with `SwarmController`
+(`.mindforge/engine/swarm-controller.md`) selecting the cluster. The adversarial
+quality bar is provided by the existing `mindforge-tdd_extended` (Red-Green) loop
+plus `/mindforge:cross-review`, not by a GAN evaluator.
+> **Deferred:** porting the GAN generate/evaluate harness (and its spec /
+> eval-rubric artifacts) into MindForge is out of scope for v1.0.0 of this engine.

package/.mindforge/skills/writing-plans/SKILL.md CHANGED Viewed

@@ -23,6 +23,79 @@ plans, migration plans, refactoring plans, and any multi-step work breakdown.
 2. **Identify the scope boundary.** What is IN the plan vs OUT of scope?
 3. **Identify dependencies.** Which steps depend on other steps? Which can be parallel?
+### EXPLORE — Structured Codebase Discovery (run BEFORE plan authoring)
+> Adapted from the PRP EXPLORE protocol (PRPs-agentic-eng by Wirasm). This is a
+> discovery discipline that PRECEDES authoring the MindForge XML/structured plan —
+> it does NOT replace the plan format. The output of EXPLORE feeds the plan's
+> Context, Files, and Details sections with real, cited patterns.
+**Golden Rule**: If you would need to search the codebase *during* implementation,
+capture that knowledge NOW, during EXPLORE.
+#### The 8 Search Categories
+Search the codebase directly (grep, find, file reads) for each category. Do NOT
+skip a category by assuming — confirm it against the actual repository.
+1. **Existing Patterns** — Architectural patterns already in use (repository,
+   service, controller, middleware, hook, etc.) in the area you'll modify.
+2. **Similar Features** — Existing features that resemble the planned one. The
+   closest analogue is the single most valuable reference.
+3. **Conventions** — How files, functions, variables, classes, and exports are
+   named and organized in the relevant area.
+4. **Dependencies** — Packages, imports, and internal modules used by similar
+   features. Note versions where they matter.
+5. **Tests** — How similar features are tested: test file locations, naming,
+   setup/teardown, fixtures, and assertion style.
+6. **Configs** — Relevant config files, environment variables, and feature flags.
+7. **Error Handling** — How errors are caught, propagated, logged, and surfaced
+   to users in similar code paths.
+8. **Integration Points** — Entry points, data flow, state changes, and the
+   contracts/interfaces the new code must honor.
+#### Patterns to Mirror (capture as file:line snippet references)
+For each category that yields a concrete convention, record a **Patterns to Mirror**
+entry. Every entry MUST cite a real `file:line` source and quote the actual snippet
+— never paraphrase, never invent:
+```markdown
+## Patterns to Mirror
+### NAMING_CONVENTION
+// SOURCE: src/services/userService.ts:1-5
+[actual snippet copied from the file]
+### ERROR_HANDLING
+// SOURCE: src/middleware/errorHandler.ts:10-25
+[actual snippet copied from the file]
+### TEST_STRUCTURE
+// SOURCE: tests/services/userService.test.ts:1-30
+[actual snippet copied from the file]
+```
+These mirrored snippets get folded into the plan's **Details** blocks so the
+executor writes code indistinguishable from existing code.
+#### The "No Prior Knowledge" Gate (MANDATORY before authoring)
+The plan must be executable by someone with **NO prior knowledge of this repo**.
+Therefore:
+- Every convention the plan references MUST be cited from the actual codebase
+  (a real `file:line` + snippet). If you cannot cite it, you have not explored
+  enough — go search.
+- NEVER invent a convention, path, type, or import. If the codebase does not
+  establish it, the plan does not assume it.
+- If a needed pattern genuinely does not exist yet, say so explicitly and define
+  the new convention in the plan (rather than pretending it already exists).
+**Gate check**: Before moving from EXPLORE to plan authoring, confirm —
+"Could a developer who has never seen this repo implement every step using ONLY
+this plan, without searching or asking?" If not, the EXPLORE pass is incomplete.
 ### During plan writing
 #### Core Principle: NO PLACEHOLDERS
@@ -168,3 +241,6 @@ Before marking a task done when this skill was active:
 - [ ] Plan scope (IN/OUT) is clearly defined.
 - [ ] If > 15 steps, organized into phases with phase-level gates.
 - [ ] All file paths verified to exist (or marked as "new file").
+- [ ] EXPLORE ran across all 8 search categories before authoring.
+- [ ] Patterns to Mirror captured as real `file:line` snippets (no invented conventions).
+- [ ] "No Prior Knowledge" gate passes — every referenced convention is cited from the codebase.