@really-knows-ai/foundry 3.5.7 → 3.5.9

package/README.md CHANGED
@@ -32,7 +32,7 @@ the loop and records every step in git, so the path from draft to approved artef
  is auditable, repeatable, and defensible to auditors and stakeholders. You can show
  exactly how the output was made. Confidence is engineered; it is not hoped for.
 
- ### The operating model: assay, then forge → quench → appraise
+ ### The operating model: assay, then forge → quench → appraise → attest → finish
 
  A codebase-aware cycle can begin with **assay**: a deterministic pre-forge stage
  that runs project-authored extractor scripts, parses the strict JSONL facts they
@@ -51,15 +51,19 @@ gates. Each loop has four distinct roles that turn a candidate into a verified o
  fast and non-negotiable, catching errors before they reach appraisers.
 
  - **Appraise** judges quality against written laws. Independent evaluators inspect
- whether the work meets the subjective standards you define.
+ whether the work meets the rules or criteria you define.
 
- - **Human-appraise** provides direct judgement when the stakes require it or the loop
- deadlocks. Offers human oversight at critical decision points.
+ - **Human-appraise** provides direct judgement when the stakes require it or the
+ cycle reaches its iteration limit. Offers human oversight at critical
+ decision points.
 
  Every stage commits separately, so every step leaves a record. Every decision is
  timestamped. A single loop produces an **output** — a verified draft. A flow
- composes one or more such loops to produce an **outcome** — the final artefact that
- reaches your codebase or customers.
+ composes one or more such loops to produce an **outcome** — the final artefact.
+
+ When the loop clears, completing the work branch requires **attest** — a final
+ verification that writes and commits `ATTEST.md` — followed by **finish**, which
+ squash-merges the approved work to the base branch with a signed attestation block.
 
  ### What you describe, what Foundry enforces
 
@@ -174,7 +178,8 @@ quench → 5/7/5 — passes [commit]
  appraise → 2 appraisers, one flags weak imagery [commit]
  forge → revises [commit]
  appraise → clean [commit]
- done squash-merged to main with attestation
+ attest ATTEST.md committed [commit]
+ finish → squash-merged to main with attestation
  ```
 
  Every stage commits. Every decision is recorded. Every piece of feedback and every
@@ -191,12 +196,12 @@ declare the entity and edge vocabulary, add extractors, and opt a cycle into
  `assay.extractors`. See [Optional: flow memory](docs/getting-started.md#optional-flow-memory)
  and [Assay](docs/concepts.md#assay) for the configuration path.
 
- > **Note (3.0.0):** flow memory currently persists to `cozo-node`, which is
+ > **Note:** flow memory currently persists to `cozo-node`, which is
  > unmaintained upstream. Installation produces six cosmetic deprecation warnings
  > from transitive dependencies (`pnpm audit` is clean). Foundry will migrate to
  > a maintained backend in a future release; the public `foundry_memory_*` tools
  > and on-disk vocabulary/NDJSON format are designed to survive that migration.
- > See `CHANGELOG.md` and [docs/memory-maintenance.md](docs/memory-maintenance.md#backend-status-as-of-300).
+ > See `CHANGELOG.md` and [docs/memory-maintenance.md](docs/memory-maintenance.md#backend-status).
 
  ---
 
@@ -233,7 +238,8 @@ reproducible.
  state live in tested plugin code, outside LLM control.
 
  - **Written quality criteria** — laws are markdown files; an appraiser panel scores
- each artefact against them, so quality is objective.
+ each artefact against them, providing structured quality assessment from
+ multiple perspectives.
 
  - **Multi-model diversity** — forge on one model, appraise on another, every
  appraiser on a different model if you want. Different models catch different
@@ -162,9 +162,8 @@ function cycleArgs(s) { return {
  artefacts: s.array(s.string()).describe('Artefact type IDs this cycle reads'),
  }).optional().describe('Input contract for this cycle. Omit for source cycles that start from the user goal; empty artefacts arrays are invalid.'),
  targets: s.array(s.string()).optional().describe('Downstream cycle IDs this cycle can route to'),
- humanAppraise: s.boolean().optional().describe('Include human-appraise in every iteration'),
- deadlockAppraise: s.boolean().optional().describe('Route to human-appraise on LLM appraiser deadlock'),
- deadlockIterations: s.number().optional().describe('Iteration threshold for deadlock detection'),
+ alwaysHumanAppraise: s.boolean().optional().describe('Include human-appraise in every iteration'),
+ deadlockHumanAppraise: s.boolean().optional().describe('Route to human-appraise when max-iterations is reached'),
  maxIterations: s.number().optional().describe('Maximum forge iterations before cycle blocks'),
  assay: s.object({
  extractors: s.array(s.string()).describe('Extractor IDs for the assay stage'),
@@ -36,7 +36,7 @@ import { createMemoryAdminTools } from './foundry-tools/memory-admin-tools.js';
  import { createSnapshotTools } from './foundry-tools/snapshot-tools.js';
  import { createAttestationTools } from './foundry-tools/attestation-tools.js';
  import { createRefreshAgentsTool } from './foundry-tools/refresh-agents-tool.js';
- import { resolveGit } from '../../scripts/lib/tool-paths.js';
+ import { resolveGit, resolvePnpm } from '../../scripts/lib/tool-paths.js';
 
  function findPackageRoot(startDir) {
  let dir = startDir;
@@ -103,7 +103,17 @@ function initGitRepo(worktree) {
  }
  }
 
+ function ensurePackageJson(worktree) {
+ if (existsSync(path.join(worktree, 'package.json'))) return;
+ try {
+ execFileSync(resolvePnpm(), ['init'], { cwd: worktree, stdio: 'pipe' });
+ } catch (err) {
+ console.error('Foundry pnpm init error:', err.message);
+ }
+ }
+
  function runBootstrapSequence(worktree, pkgRoot) {
+ ensurePackageJson(worktree);
  bootstrapDirectories(worktree);
  bootstrapGitignore(worktree);
  refreshAgents(worktree);
package/dist/CHANGELOG.md CHANGED
@@ -1,5 +1,28 @@
  # Changelog
 
+ ## [3.5.9] - 2026-05-24
+
+ ### Changed
+
+ - Cycle frontmatter keys renamed: `human-appraise` → `always-human-appraise`, `deadlock-appraise` → `deadlock-human-appraise`, `deadlock-iterations` replaced with `max-iterations`-based deadlock routing.
+ - Deadlock routing replaced per-item deadlocked state with iteration-limit detection using `max-iterations` and `deadlock-human-appraise`.
+ - Orchestration uses iteration cap routing instead of per-item deadlock history.
+ - Skills and public documentation (`README.md`, `docs/*`) refreshed against the current 65-tool implementation: updated tool reference with structured config-creation arguments, corrected quench/appraise execution models, documented attestation-before-finish workflow, added Validator and ATTEST.md concepts, and fixed memory paths. All stale v3.0.x terminology removed.
+
+ ### Fixed
+
+ - Guidance audit tests aligned with current user-facing documentation style (no direct `foundry_git_finish({)` call syntax in walkthrough).
+
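Reviewer note: the 3.5.9 key renames above imply that existing cycle definitions need updating. The following is a minimal sketch of that migration over an already-parsed frontmatter object. `migrateCycleFrontmatter` is a hypothetical helper, not something the package ships, and dropping `deadlock-iterations` outright is an assumption based on the changelog saying it was replaced by `max-iterations`-based routing.

```javascript
// Hypothetical helper, not part of Foundry: renames pre-3.5.9 cycle
// frontmatter keys on an already-parsed frontmatter object.
const RENAMED_KEYS = new Map([
  ['human-appraise', 'always-human-appraise'],
  ['deadlock-appraise', 'deadlock-human-appraise'],
]);

function migrateCycleFrontmatter(frontmatter) {
  const migrated = {};
  const notes = [];
  for (const [key, value] of Object.entries(frontmatter)) {
    if (key === 'deadlock-iterations') {
      // Assumption: there is no one-to-one replacement, so the key is
      // dropped and reported for manual review against max-iterations.
      notes.push(`dropped deadlock-iterations (${value}); review max-iterations`);
      continue;
    }
    const newKey = RENAMED_KEYS.get(key) ?? key;
    if (newKey !== key) notes.push(`renamed ${key} -> ${newKey}`);
    migrated[newKey] = value;
  }
  return { migrated, notes };
}
```

Returning the notes alongside the migrated object keeps the dropped key visible, so a human can decide whether `max-iterations` needs adjusting.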
+ ## [3.5.8] - 2026-05-23
+
+ ### Added
+
+ - Foundry bootstrap runs `pnpm init` on the project workspace when no `package.json` exists, so validators can install dependencies immediately.
+
+ ### Changed
+
+ - Validator guidance in the add-law skill hardened from "prefer libraries" to "hand-rolled heuristics are a last resort."
+
  ## [3.5.7] - 2026-05-23
 
  ### Added
package/dist/README.md CHANGED
@@ -32,7 +32,7 @@ the loop and records every step in git, so the path from draft to approved artef
  is auditable, repeatable, and defensible to auditors and stakeholders. You can show
  exactly how the output was made. Confidence is engineered; it is not hoped for.
 
- ### The operating model: assay, then forge → quench → appraise
+ ### The operating model: assay, then forge → quench → appraise → attest → finish
 
  A codebase-aware cycle can begin with **assay**: a deterministic pre-forge stage
  that runs project-authored extractor scripts, parses the strict JSONL facts they
@@ -51,15 +51,19 @@ gates. Each loop has four distinct roles that turn a candidate into a verified o
  fast and non-negotiable, catching errors before they reach appraisers.
 
  - **Appraise** judges quality against written laws. Independent evaluators inspect
- whether the work meets the subjective standards you define.
+ whether the work meets the rules or criteria you define.
 
- - **Human-appraise** provides direct judgement when the stakes require it or the loop
- deadlocks. Offers human oversight at critical decision points.
+ - **Human-appraise** provides direct judgement when the stakes require it or the
+ cycle reaches its iteration limit. Offers human oversight at critical
+ decision points.
 
  Every stage commits separately, so every step leaves a record. Every decision is
  timestamped. A single loop produces an **output** — a verified draft. A flow
- composes one or more such loops to produce an **outcome** — the final artefact that
- reaches your codebase or customers.
+ composes one or more such loops to produce an **outcome** — the final artefact.
+
+ When the loop clears, completing the work branch requires **attest** — a final
+ verification that writes and commits `ATTEST.md` — followed by **finish**, which
+ squash-merges the approved work to the base branch with a signed attestation block.
 
  ### What you describe, what Foundry enforces
 
@@ -174,7 +178,8 @@ quench → 5/7/5 — passes [commit]
  appraise → 2 appraisers, one flags weak imagery [commit]
  forge → revises [commit]
  appraise → clean [commit]
- done squash-merged to main with attestation
+ attest ATTEST.md committed [commit]
+ finish → squash-merged to main with attestation
  ```
 
  Every stage commits. Every decision is recorded. Every piece of feedback and every
@@ -191,12 +196,12 @@ declare the entity and edge vocabulary, add extractors, and opt a cycle into
  `assay.extractors`. See [Optional: flow memory](docs/getting-started.md#optional-flow-memory)
  and [Assay](docs/concepts.md#assay) for the configuration path.
 
- > **Note (3.0.0):** flow memory currently persists to `cozo-node`, which is
+ > **Note:** flow memory currently persists to `cozo-node`, which is
  > unmaintained upstream. Installation produces six cosmetic deprecation warnings
  > from transitive dependencies (`pnpm audit` is clean). Foundry will migrate to
  > a maintained backend in a future release; the public `foundry_memory_*` tools
  > and on-disk vocabulary/NDJSON format are designed to survive that migration.
- > See `CHANGELOG.md` and [docs/memory-maintenance.md](docs/memory-maintenance.md#backend-status-as-of-300).
+ > See `CHANGELOG.md` and [docs/memory-maintenance.md](docs/memory-maintenance.md#backend-status).
 
  ---
 
@@ -233,7 +238,8 @@ reproducible.
  state live in tested plugin code, outside LLM control.
 
  - **Written quality criteria** — laws are markdown files; an appraiser panel scores
- each artefact against them, so quality is objective.
+ each artefact against them, providing structured quality assessment from
+ multiple perspectives.
 
  - **Multi-model diversity** — forge on one model, appraise on another, every
  appraiser on a different model if you want. Different models catch different
@@ -1,6 +1,6 @@
  # Foundry docs
 
- This directory contains the reference set behind the project README. Every document here serves a single purpose; use this index to find what you need.
+ This directory contains the reference set behind the project README. Every document here serves a single purpose; use this index to find what you need. The `docs/superpowers/` directory is future-facing internal scaffolding and is not indexed as public documentation.
 
  **How to navigate:** Work through the sections in order: **Start here** establishes conceptual foundations, **Reference** provides detailed specifications for implementation, and **Contributors** covers subsystem maintenance and extensions.
 
@@ -10,9 +10,9 @@ Getting oriented with Foundry means understanding both the concepts it uses and
 
  **Reading order:** Work through them in order; [getting-started.md](getting-started.md) builds hands-on confidence, and [concepts.md](concepts.md) provides reference depth. Most implementers spend 1–2 hours on getting-started before moving to Reference materials.
 
- - **getting-started.md** — Complete end-to-end installation, auto-bootstrapping, and first flow walkthrough. Read this immediately after installing the plugin and before authoring any of your own configuration.
+ - **getting-started.md** — Complete end-to-end installation, bootstrap, Foundry agent wizard, and first flow walkthrough. Read this immediately after installing the plugin and before authoring any of your own configuration.
 
- It establishes the operating model, directory structure, and practical confidence in one pass. Includes hands-on guidance on authoring the five foundational concepts (artefact types, laws, appraisers, cycles, flows) with worked examples you can run against real code. Also covers the optional flow-memory path: initialise memory, declare vocabulary, add extractors, and opt a cycle into assay for codebase-aware flows.
+ It covers bootstrap states, the Understand–Plan–Confirm–Build wizard protocol, structured config authoring, current flow execution with quench and appraise, attestation and branch finishing, dry-run completion, and the optional flow-memory path (initialise memory, declare vocabulary, add extractors, opt a cycle into assay). Uses worked examples you can run against real code.
 
  Implementers must follow every step and complete the bootstrap; architects typically skim for structure before moving to [concepts.md](concepts.md) and [architecture.md](architecture.md) to reason about their designs.
 
@@ -28,19 +28,19 @@ These documents specify formats, tools, and design principles. Use them when imp
 
  **Key property:** These are sources of truth and normative references. Changes to Foundry flow formats or tool behaviour must be reflected here first. Use them together—cross-references appear throughout.
 
- - **work-spec.md** — Complete specification of the `WORK.md`, `WORK.feedback.yaml`, and `WORK.history.yaml` file formats, including frontmatter fields, the artefact registry, and the full feedback state machine with all valid transitions and guards.
+ - **work-spec.md** — Complete specification of the `WORK.md`, `WORK.feedback.yaml`, and `WORK.history.yaml` file formats, including frontmatter fields, feedback and history state, and the full feedback state machine with all valid transitions and guards. Artefacts are discovered from branch diffs, not stored as an artefact table in the workfile.
 
  Use this when implementing tooling around work files, validating state transitions, or understanding what metadata flows carry through an execution. It is the authoritative source of truth for all transient work-branch structures, format validation rules, and field semantics.
 
  Implementers and tool builders rely on this heavily; keep it updated immediately as formats evolve or new fields are added.
 
- - **tools.md** — Categorical index and reference documentation for all custom tools, organised by family (lifecycle, artefacts, feedback, config, memory, etc.) with complete signatures and permissions.
+ - **tools.md** — Complete 65-tool categorical index and reference, organised by family (lifecycle, artefacts, feedback, config, memory, etc.) with complete signatures and permissions.
 
  Consult this when you need to understand what a specific tool does, its branch requirements, what stage locks apply, what arguments it accepts, and how it integrates with the overall system. Covers calling conventions, enforcement invariants, and the permission model for memory access. References `foundry_assay_run`, `foundry_extractor_create`, and memory data and admin tools.
 
  Tool authors and system integrators use this constantly; it is the comprehensive reference for all custom tools in the Foundry ecosystem.
 
- - **architecture.md** — The design and enforcement model covering token lifecycle, stage-locked mutations, write invariants, branch namespaces, multi-model routing, and core design principles.
+ - **architecture.md** — The design and enforcement model covering orchestration actions (`dispatch`, `dispatch_multi`, `human_appraise`, `done`, `blocked`, `violation`), internal quench and appraise execution, branch finish paths with attestation preconditions, token lifecycle, stage-locked mutations, write invariants, branch namespaces, multi-model routing with `defaultModel`, forge required-tool verification, and core design principles.
 
  Read this when you need to understand how Foundry maintains safety (how tokens prevent replay, why stages lock mutations, how writes are validated), what guarantees it makes and where they live in the code, or why it is structured the way it is. Explains the memory layout, assay write boundary, and failed-flow behaviour that keep extractor-populated memory auditable.
 
@@ -34,7 +34,7 @@ A forge sub-agent can decline subjective feedback with a justification, and an a
 
  ### Humans can step in at known points
 
- Human-in-the-loop gates are first-class stages. A cycle can declare `human-appraise: true` to run a human quality gate every iteration, or rely on `deadlock-appraise: true` (the default) to pull a human in only when LLM appraisers and forge ping-pong on the same items. Human feedback takes absolute priority and cannot be wont-fixed.
+ Human-in-the-loop gates are first-class stages. A cycle can declare `always-human-appraise: true` to run a human quality gate every iteration, or rely on `deadlock-human-appraise: true` (the default) to pull a human in when the iteration count reaches `max-iterations`. Human feedback takes absolute priority and cannot be wont-fixed.
 
  ### Multi-model diversity
 
@@ -75,9 +75,10 @@ The following guarantees live in plugin code and are outside LLM control:
  The `orchestrate` skill is a thin driver around `foundry_orchestrate`. Its entire loop is:
 
  ```text
- call foundry_orchestrate({lastResult})
+ call foundry_orchestrate({lastResult, lastResults, baseBranch, defaultModel})
  switch on action:
- dispatch → dispatch the requested subagent → report back
+ dispatch → dispatch single subagent → report back
+ dispatch_multi → dispatch parallel appraiser tasks → consolidate → report back
  human_appraise → run human-appraise inline → report back
  done / blocked / violation → terminate the loop
  ```
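Reviewer note: the updated pseudocode loop can be read as the following JavaScript sketch. Everything passed in (`orchestrate`, `dispatchOne`, `dispatchMany`, `humanAppraise`) is a caller-supplied stand-in rather than a Foundry API, and the `tasks` field on the `dispatch_multi` action is an assumed name for the task list the docs mention.

```javascript
// Hypothetical driver: the four callbacks are stand-ins supplied by the
// caller, not Foundry APIs. `step.tasks` is an assumed field name.
async function driveCycle({ orchestrate, dispatchOne, dispatchMany, humanAppraise }) {
  let report = {}; // nothing to report on the first call
  for (;;) {
    const step = await orchestrate(report);
    switch (step.action) {
      case 'dispatch':
        report = { lastResult: await dispatchOne(step) };
        break;
      case 'dispatch_multi':
        // Parallel appraiser tasks; their results are reported back together.
        report = { lastResults: await dispatchMany(step.tasks) };
        break;
      case 'human_appraise':
        report = { lastResult: await humanAppraise(step) };
        break;
      case 'done':
      case 'blocked':
      case 'violation':
        return step.action; // terminal actions end the loop
      default:
        throw new Error(`unknown action: ${step.action}`);
    }
  }
}
```

The driver holds no routing logic of its own: it only relays results, which matches the claim that the protocol lives in the plugin tool rather than in the caller.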
@@ -92,6 +93,14 @@ switch on action:
 
  Because the protocol lives in a plugin tool, the LLM cannot skip steps, reorder them, or silently drop a commit.
 
+ ### Internal quench execution
+
+ Quench runs inside the orchestrator as `runQuench(ctx)` — a deterministic, non-LLM validation pass. It reads the active stage from `.foundry/active-stage.json`, discovers artefact changes via branch-based artefact discovery, runs validators for each applicable law, and posts feedback with tags in the format `law:<law-id>:<validator-id>`. It also resolves prior quench feedback: items whose issues remain are set to `rejected`; items no longer present are set to `approved` (transitioned to `resolved`). The quench module is available at `src/scripts/quench-module.js`.
+
+ ### Internal appraise execution
+
+ Appraise uses `gatherAppraiseContext()` to build parallel subagent tasks, one per (artefact, appraiser) pair. It returns a `dispatch_multi` action containing the task list. The orchestrator's loop dispatches each task independently. After all appraisers report back, `consolidateAppraise()` processes the `lastResults` array: it parses each successful output for structured issues, de-duplicates across appraisers, posts feedback with `law:<law-id>` tags, and resolves prior appraise feedback items (resolves stale items, rejects items still present). The appraise module is available at `src/scripts/appraise-module.js`.
+
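Reviewer note: the consolidation step described for appraise (parse successful outputs, de-duplicate across appraisers, tag with `law:<law-id>`) might look roughly like the sketch below. The result and issue shapes (`ok`, `issues`, `lawId`, `artefact`, `message`) and the de-duplication key are assumptions made for illustration; the diff does not show how `consolidateAppraise()` actually does this.

```javascript
// Hypothetical shapes: each result is { ok, issues: [{ lawId, artefact, message }] }.
// This is an illustration only, not the code in src/scripts/appraise-module.js.
function consolidateIssues(lastResults) {
  const seen = new Set();
  const feedback = [];
  for (const result of lastResults) {
    if (!result.ok) continue; // only successful outputs are parsed
    for (const issue of result.issues) {
      // Assumed de-duplication key: same law, artefact, and message text.
      const key = JSON.stringify([issue.lawId, issue.artefact, issue.message]);
      if (seen.has(key)) continue;
      seen.add(key);
      feedback.push({
        tag: `law:${issue.lawId}`,
        artefact: issue.artefact,
        message: issue.message,
      });
    }
  }
  return feedback;
}
```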
  ---
 
  ## Token lifecycle
@@ -134,9 +143,9 @@ Foundry partitions mutation across three branch namespaces. The plugin enforces
 
  | Namespace | Pattern | Owns | Created from | Finished by |
  |-----------|---------|------|--------------|-------------|
- | **config** | `config/<description>` | `foundry/` (schema and config) | `main` via `foundry_git_branch({ kind: "config", description })` | PR or direct merge to `main` |
- | **work** | `work/<flowId>-<description>` | `WORK.md`, `WORK.feedback.yaml`, `WORK.history.yaml`, `foundry-memory/` (row data) | `main` via `foundry_git_branch({ kind: "work", flowId, description })` | `foundry_git_finish` (squash-merges to base, preserves forensic archive) |
- | **dry-run** | `dry-run/<parentConfig>/<flowId>-<description>` | Same as `work/*` | `config/*` via `foundry_git_branch({ kind: "dry-run", flowId, description })` | `foundry_git_finish` (captures snapshot, force-deletes branch) |
+ | **config** | `config/<description>` | `foundry/` (schema and config) | `main` via `foundry_git_branch({ kind: "config", description })` | Squash-merge to base branch; no attestation required |
+ | **work** | `work/<flowId>-<description>` | `WORK.md`, `WORK.feedback.yaml`, `WORK.history.yaml`, `foundry-memory/` (row data) | `main` via `foundry_git_branch({ kind: "work", flowId, description })` | Requires `ATTEST.md` at HEAD (created by `foundry_attest({ confirm: true })`). `foundry_git_finish({ confirm: true })` verifies the attestation, preserves `archive/<work-branch>-<short-sha>`, squash-merges to base, creates a signed commit (`-S`), deletes the work branch |
+ | **dry-run** | `dry-run/<parentConfig>/<flowId>-<description>` | Same as `work/*` | `config/*` via `foundry_git_branch({ kind: "dry-run", flowId, description })` | `foundry_git_finish` (captures snapshot to `.snapshots/<run-id>/`, force-deletes branch) |
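
Reviewer note: the three namespace patterns in the table can be expressed as a small classifier. This is a sketch only. The real guard is in `src/scripts/lib/branch-guard.js` and is not shown in this diff; `classifyBranch` is a hypothetical name, and the regexes assume that `<description>` and `<parentConfig>` are single path segments with no `/`.

```javascript
// Hypothetical classifier derived from the patterns in the table above.
// Assumption: descriptions and <parentConfig> never contain a slash.
function classifyBranch(name) {
  if (/^config\/[^/]+$/.test(name)) return 'config';
  // work/<flowId>-<description>: require a hyphen separating the two parts.
  if (/^work\/[^/]+-[^/]+$/.test(name)) return 'work';
  // dry-run/<parentConfig>/<flowId>-<description>
  if (/^dry-run\/[^/]+\/[^/]+-[^/]+$/.test(name)) return 'dry-run';
  return null; // not a Foundry-managed namespace
}
```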
 
  ### Guard implementation
 
@@ -150,7 +159,7 @@ Implementation: `src/scripts/lib/branch-guard.js`.
 
  ### Forensic branches and snapshots
 
- - **Work branches** are preserved as `archive/work/<flowId>-<description>-<hash>` when `foundry_git_finish` completes. The full stage micro-commit history, `WORK.*` files, and all intermediate artefact states remain intact. The signed squash commit on the base branch references the archive branch tip SHA in its attestation block.
+ - **Work branches** require `ATTEST.md` at HEAD, created by `foundry_attest({ confirm: true })` before `foundry_git_finish({ confirm: true })` runs. The finish tool verifies the attestation, checks the diff SHA matches, preserves the branch as `archive/work/<flowId>-<description>-<hash>`, squash-merges to the base branch with a signed commit (`-S`), and deletes the work branch. The full stage micro-commit history, `WORK.*` files, and all intermediate artefact states remain intact. The signed squash commit on the base branch references the archive branch tip SHA in its attestation block.
  - **Dry-run branches** are force-deleted after `foundry_git_finish` captures a snapshot to `.snapshots/<runId>/` on the parent `config/*` working tree. Each snapshot includes `README.md` (metadata), `work/WORK*` (workfile triple), `diff.patch` (full diff), and `trace.jsonl` (tool-call trace).
 
  ---
@@ -233,6 +242,20 @@ Every stage runs inside a token-gated lifecycle. The sub-agent must call `foundr
 
  Input artefacts (files matching an input type's `file-patterns`) are read-only. Files outside any artefact type's patterns are read-only. Violations hard-stop the cycle with `{error: 'unexpected_files'}`.
 
+ ### Forge required-tool verification
+
+ During `foundry_stage_end` for a forge stage, the plugin verifies that the forge sub-agent called five required context-reading tools:
+
+ 1. `foundry_config_cycle`
+ 2. `foundry_workfile_get`
+ 3. `foundry_config_artefact_type`
+ 4. `foundry_config_laws`
+ 5. `foundry_feedback_list`
+
+ Tool calls are logged to `.foundry/.forge-tool-calls.jsonl` during stage execution. When `foundry_stage_end` runs, it checks the log against the required set. Missing required calls generate system feedback with the tag `system:missing-tool-calls` and the forge stage completes normally — the missing-tool feedback acts as a signal to the sort router. When all required tools are present, any prior `system:missing-tool-calls` feedback is resolved.
+
+ Implementation: `src/plugin/tools/stage-tools.js` (`verifyAndManageForgeTools`) and `src/scripts/lib/stage-calls.js`.
+
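Reviewer note: checking a JSONL call log against the five required tools is a set-difference problem. The sketch below shows one way to do it. The record shape (a `tool` field per line) is an assumption, since the diff does not show the log format, and `missingForgeTools` is a hypothetical name rather than the `verifyAndManageForgeTools` implementation.

```javascript
// The five required context-reading tools, as listed in the added docs.
const REQUIRED_FORGE_TOOLS = [
  'foundry_config_cycle',
  'foundry_workfile_get',
  'foundry_config_artefact_type',
  'foundry_config_laws',
  'foundry_feedback_list',
];

// Assumption: each non-empty JSONL line is an object with a `tool` field.
function missingForgeTools(jsonlText) {
  const called = new Set();
  for (const line of jsonlText.split('\n')) {
    if (line.trim() === '') continue; // tolerate blank and trailing lines
    called.add(JSON.parse(line).tool);
  }
  // Preserve the documented order so any feedback message is stable.
  return REQUIRED_FORGE_TOOLS.filter((tool) => !called.has(tool));
}
```

An empty result corresponds to the case where prior `system:missing-tool-calls` feedback would be resolved; a non-empty one corresponds to posting that feedback.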
  ### Failed flow state
 
  When an unrecoverable error occurs (e.g. assay extractor abort, invalid JSONL, or memory-sync failure), the orchestrator marks `WORK.md` frontmatter with `status: failed` and a `reason`. The flow is then locked:
@@ -258,10 +281,10 @@ All pipeline skills (`orchestrate`, `flow`, stage skills) check for this state a
 
  `src/scripts/sort.js` (exported as `runSort`) owns the routing engine. It reads `WORK.md`, `WORK.feedback.yaml`, and `WORK.history.yaml`, then decides which stage runs next based on:
 
- - **Unresolved feedback.** If `quench` or `appraise` feedback exists in a non-terminal state (`open`, `actioned`, `wont-fix`), the next stage is usually `forge` or the originating evaluation stage.
- - **Deadlock detection.** If the same feedback items ping-pong between forge and appraise for `deadlock-iterations` (default 5) iterations, sort marks them `deadlocked` and inserts a `human-appraise` stage (if `deadlock-appraise: true`, the default).
- - **Iteration limits.** If `max-iterations` is exceeded, the cycle is marked `blocked` and control returns to the user.
+ - **Unresolved feedback.** If feedback exists in a non-terminal state (`open`, `actioned`, `wont-fix`), the next stage is `forge` (for items needing action) or the originating evaluation stage (for items pending approval).
+ - **Iteration limits and deadlock routing.** When the forge iteration count reaches `max-iterations` with unresolved feedback, sort routes to `human-appraise` (if `deadlock-human-appraise: true`, the default) or marks the cycle `blocked` if human routing is disabled. Sort does not write per-item deadlocked state; deadlock is a routing decision, not a feedback item state.
  - **Clean state.** If all feedback is resolved and no new validation or appraisal failures exist, the cycle is `done`.
+ - **Blocked.** If `max-iterations` is exceeded and `deadlock-human-appraise` is `false`, the cycle is marked `blocked` and control returns to the user.
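
Reviewer note: the new iteration-limit rule reduces to a three-way decision. The sketch below is a simplification under stated assumptions: the real logic is in `src/scripts/sort.js` and is not shown here, `routeAtIterationLimit` and its parameter names are hypothetical, and the limit is treated as hit when the count reaches `max-iterations` (the added bullets say both "reaches" and "is exceeded").

```javascript
// Hypothetical reduction of the documented rule, not the code in sort.js.
// Assumption: the limit triggers when iteration >= maxIterations.
function routeAtIterationLimit({
  iteration,
  maxIterations,
  hasUnresolvedFeedback,
  deadlockHumanAppraise = true, // documented default
}) {
  if (!hasUnresolvedFeedback) return 'done';
  // Below the cap, normal forge/evaluation routing continues.
  if (iteration < maxIterations) return 'continue';
  // At the cap, deadlock is a routing decision, not a feedback item state.
  return deadlockHumanAppraise ? 'human-appraise' : 'blocked';
}
```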
 
  ### Feedback state machine
 
@@ -269,15 +292,15 @@ Feedback items live in `WORK.feedback.yaml` with a full transition history. Each
 
  - `id` — a ULID.
  - `source` — the stage that created it (e.g. `quench:check-syllables`, `appraise:pedantic`, `human-appraise:hitl`).
- - `state` — current state (`open`, `actioned`, `wont-fix`, `resolved`, `rejected`, `deadlocked`).
+ - `state` — current state (`open`, `actioned`, `wont-fix`, `resolved`, `rejected`).
  - `history` — append-only log of state transitions with timestamps and metadata.
 
  Transitions are **source-based**:
 
  | Source stage | Forge can `wont-fix`? | Resolved by |
  |--------------|------------------------|-------------|
- | `quench` (CLI validation) | No — must `actioned` | the originating `quench` stage, or `human-appraise` override |
- | `appraise` (subjective law) | Yes (with reason) | the originating `appraise` stage, or `human-appraise` override |
+ | `quench` (deterministic validation) | No — must `actioned` | the originating `quench` stage, or `human-appraise` override |
+ | `appraise` (law evaluation) | Yes (with reason) | the originating `appraise` stage, or `human-appraise` override |
  | `human-appraise` (user instruction) | No — must `actioned` | the originating `human-appraise` stage |
 
  Implementation: `src/scripts/lib/feedback-transitions.js` and `src/scripts/lib/feedback-store.js`. See [work-spec.md](work-spec.md) for the full state machine table.
@@ -290,24 +313,27 @@ Different stages can run on different models for cognitive diversity. Cycle defi
290
313
 
291
314
  ### Configuration
292
315
 
316
+ - **Orchestrator argument.** `defaultModel` (optional) can be passed as an orchestrator argument. When set, it serves as the fallback for any stage or appraiser that does not declare a model.
293
317
  - **Cycle-level.** Declare a `models` map in the cycle frontmatter:
294
318
  ```yaml
295
319
  models:
320
+ default: anthropic/claude-sonnet-4
296
321
  forge: anthropic/claude-opus-4.7
297
322
  appraise: openai/gpt-5
298
323
  ```
299
- - **Appraiser-level.** Individual appraisers can declare a `model` field in their personality definition; this overrides the cycle-level appraise model on a per-appraiser basis.
324
+ `models.default` provides a cycle-level fallback when no per-stage override exists and no `defaultModel` is passed to the orchestrator.
325
+ - **Appraiser-level.** Individual appraisers can declare a `model` field in their personality definition; this overrides the cycle-level appraise model and `models.default` on a per-appraiser basis.
300
326
 
301
327
  ### Agent files
302
328
 
303
329
  The user-facing `Foundry` agent is installed by the plugin's `config` hook as `.opencode/agents/foundry.md`. Users switch to this agent after restarting OpenCode. It guides authoring and flow execution while generated `foundry-*` stage agents remain hidden routing targets for specific models.
304
330
 
305
- `foundry_refresh_agents()` generates a `foundry-<slug>.md` agent file in `.opencode/agents/` for every model available in the session, where `<slug>` is the model ID with both `/` and `.` replaced by `-` (e.g. `anthropic-claude-opus-4-7.md`).
331
+ `foundry_refresh_agents` generates a `foundry-<slug>.md` agent file in `.opencode/agents/` for every model available in the session, where `<slug>` is the model ID with both `/` and `.` replaced by `-` (e.g. `anthropic-claude-opus-4-7.md`). Call `foundry_refresh_agents()` in code examples when referring to the tool invocation.
306
332
 
307
333
  ### Dispatch behaviour
308
334
 
309
- - **Non-appraise stages** (forge, quench, assay): if the cycle declares `models.<stage>`, the orchestrator dispatches to `foundry-<slug>` and hard-fails if `.opencode/agents/foundry-<slug>.md` is missing. If `models.<stage>` is not set, the stage is dispatched with the `general` subagent (session default).
310
- - **Appraise stage**: each appraiser is dispatched independently by the `appraise` skill. If an appraiser has its own `model`, the skill dispatches to `foundry-<slug>` and hard-fails if that agent file is missing; otherwise the appraiser runs under the `general` subagent.
335
+ - **Non-appraise stages** (forge, quench, assay): the orchestrator resolves the model by checking `models.<stage>`, then `defaultModel`, then `models.default`, then falls back to `general` (session default). If a specific model is resolved, the orchestrator dispatches to `foundry-<slug>` and hard-fails if `.opencode/agents/foundry-<slug>.md` is missing.
336
+ - **Appraise stage**: each appraiser is dispatched independently by the appraise module. The model resolution order is: appraiser's own `model` field, then `defaultModel`, then `models.default`, then `general`. If a specific model is resolved, the task is dispatched to `foundry-<slug>` and the orchestrator hard-fails if that agent file is missing.
311
337
 
312
338
  Implementation: `src/plugin/tools/helpers.js` (`buildCyclePromptExtras`) and `src/skills/appraise/SKILL.md`.
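
The precedence described under "Dispatch behaviour" can be read as a simple fallback chain. The sketch below is illustrative only: `resolveModel` and its parameter names are assumptions, not the signature used in `helpers.js`, and the string `"general"` stands in for dispatch to the `general` subagent.

```javascript
// Sketch of the documented precedence (names are illustrative):
//   own          -> models.<stage> for a non-appraise stage, or the
//                   appraiser's own `model` field for an appraiser
//   defaultModel -> the optional orchestrator argument
//   cycleModels  -> the cycle's `models` map (only `.default` is read here)
// Returns a model ID, or "general" when nothing is declared.
function resolveModel({ own, defaultModel, cycleModels = {} }) {
  return own ?? defaultModel ?? cycleModels.default ?? "general";
}

const cycleModels = {
  default: "anthropic/claude-sonnet-4",
  forge: "anthropic/claude-opus-4.7",
};

// forge declares models.forge, so that value wins.
console.log(resolveModel({ own: cycleModels.forge, cycleModels }));
// quench declares nothing, so models.default applies.
console.log(resolveModel({ own: cycleModels.quench, cycleModels }));
// An orchestrator-level defaultModel outranks models.default.
console.log(
  resolveModel({ own: cycleModels.quench, defaultModel: "openai/gpt-5", cycleModels })
);
// Nothing declared anywhere: fall back to the general subagent.
console.log(resolveModel({}));
```

When the result is a model ID rather than `"general"`, the text above says dispatch goes to the matching `foundry-<slug>` agent and hard-fails if that agent file is missing.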
313
339
 
@@ -331,12 +357,13 @@ Implementation: `src/plugin/tools/helpers.js` (`buildCyclePromptExtras`) and `sr
331
357
  │ │ ├── appraise/
332
358
  │ │ ├── human-appraise/
333
359
  │ │ ├── add-artefact-type/ # authoring
334
- │ │ ├── add-artefact-type/
335
360
  │ │ ├── add-law/
336
361
  │ │ ├── add-appraiser/
337
362
  │ │ ├── add-cycle/
338
363
  │ │ ├── add-flow/
339
364
  │ │ ├── add-extractor/
365
+ │ │ ├── assay/ # deterministic extractor execution
366
+ │ │ ├── dry-run/ # dry-run execution and snapshots
340
367
  │ │ ├── list-agents/ # utility
341
368
  │ │ ├── refresh-agents/ # utility (now backed by foundry_refresh_agents tool)
342
369
  │ │ ├── upgrade-foundry/
@@ -352,7 +379,7 @@ Implementation: `src/plugin/tools/helpers.js` (`buildCyclePromptExtras`) and `sr
352
379
  │ └── scripts/
353
380
  │ ├── lib/ # shared libraries (injectable I/O)
354
381
  │ │ ├── workfile.js # WORK.md frontmatter
355
- │ ├── artefacts.js # artefact table operations
382
+ │ ├── artefacts.js # artefact discovery via branch diffs
356
383
  │ │ ├── history.js # WORK.history.yaml operations
357
384
  │ │ ├── feedback-store.js
358
385
  │ │ ├── feedback-transitions.js
@@ -367,10 +394,18 @@ Implementation: `src/plugin/tools/helpers.js` (`buildCyclePromptExtras`) and `sr
367
394
  │ │ ├── state.js
368
395
  │ │ ├── config.js # foundry/ config readers
369
396
  │ │ ├── slug.js
397
+ │ │ ├── tool-paths.js
398
+ │ │ ├── stage-calls.js # forge tool-call logging and verification
399
+ │ │ ├── sort-routing.js
400
+ │ │ ├── sort-reason.js
401
+ │ │ ├── sort-fs-check.js
402
+ │ │ ├── validation.js
370
403
  │ │ ├── ulid.js
371
404
  │ │ ├── tracing.js
372
405
  │ │ ├── failed-flow.js
373
406
  │ │ ├── git-bridge.js
407
+ │ │ ├── git-finish/ # branch finishing logic
408
+ │ │ ├── attestation/ # ATTEST.md generation and verification
374
409
  │ │ ├── git-policy.js
375
410
  │ │ ├── assay/
376
411
  │ │ ├── config-creators/
@@ -378,6 +413,11 @@ Implementation: `src/plugin/tools/helpers.js` (`buildCyclePromptExtras`) and `sr
378
413
  │ │ ├── snapshot/
379
414
  │ │ └── memory/ # flow memory (Cozo 0.7)
380
415
  │ ├── orchestrate.js # orchestration loop (exports runOrchestrate)
416
+ │ ├── orchestrate-cycle.js
417
+ │ ├── orchestrate-phases.js
418
+ │ ├── orchestrate-terminals.js
419
+ │ ├── quench-module.js # deterministic validation (runQuench)
420
+ │ ├── appraise-module.js # appraise gather and consolidate
381
421
  │ └── sort.js # routing engine (exports runSort)
382
422
  ├── scripts/
383
423
  │ └── build.js # builds src/ into dist/
@@ -20,9 +20,9 @@ An iterative unit that produces one artefact type and routes to later cycles thr
20
20
  - `output-type` — the artefact type it produces (read-write).
21
21
  - `inputs` — a contract (`any-of` / `all-of`) over other artefact types. Inputs are discovered on disk; they are read-only unless the output type's patterns happen to cover them.
22
22
  - `targets` — the cycle(s) that may run after this one. May be empty (terminal cycle).
23
- - `human-appraise` — whether a human quality gate runs every iteration (default: `false`).
24
- - `deadlock-appraise` — whether a human is pulled in when LLM appraisers deadlock (default: `true`).
25
- - `deadlock-iterations` — deadlock threshold (default: `5`).
23
+ - `always-human-appraise` — whether a human quality gate runs every iteration (default: `false`).
24
+ - `deadlock-human-appraise` — whether a human is pulled in when the deadlock threshold is reached (default: `true`).
25
+ - `max-iterations` — maximum forge iterations (default: `3`).
26
26
  - `models` — optional per-stage model overrides.
27
27
 
28
28
  A cycle runs **assay** first when configured, then **forge → quench → appraise** (and optionally **human-appraise**), looping until all feedback is resolved or `max-iterations` is hit.
@@ -35,9 +35,9 @@ The stage names come from the foundry metaphor because the system treats AI outp
35
35
 
36
36
  - **assay** — opt-in pre-forge stage that populates flow memory by running project-authored extractor scripts (iteration 0 only). No artefact, no feedback, no output beyond memory writes. See the [Assay](#assay) and [Extractor](#extractor) entries below.
37
37
  - **forge** — produce or revise the artefact.
38
- - **quench** — run deterministic CLI checks declared in laws (via their optional `validators:` blocks).
39
- - **appraise** — subjective evaluation by multiple appraiser sub-agents.
40
- - **human-appraise** — human quality gate. Can run every iteration, only on deadlock, or both.
38
+ - **quench** — deterministic validation run inside the orchestrator against laws that contain validators.
39
+ - **appraise** — orchestrator-managed `dispatch_multi` fan-out to appraiser sub-agents, with internal consolidation.
40
+ - **human-appraise** — human quality gate. Runs every iteration when `always-human-appraise` is true, or on deadlock when `deadlock-human-appraise` is true.
41
41
 
42
42
  Feedback is always *about an artefact* and flows backward to forge. Assay sits outside the artefact-feedback loop because it precedes the artefact and its only failure mode (a broken extractor under `foundry/memory/extractors/`) lives outside forge's `file-patterns`.
43
43
 
@@ -63,31 +63,35 @@ See also: [Extractor](#extractor).
63
63
  A definition of what is being produced. Lives in `foundry/artefacts/<type>/`:
64
64
 
65
65
  - `definition.md` — identity, file patterns, output directory, appraiser config, prose description.
66
- - `laws.md` *(optional)* — type-specific subjective criteria, with optional validators for deterministic checks.
66
+ - `laws.md` *(optional)* — type-specific criteria, with optional validators for deterministic checks.
67
67
 
68
68
  File patterns must not overlap with any other artefact type's patterns — the write-invariant enforcer needs to know which type owns a given file.
69
69
 
70
70
  ## Law
71
71
 
72
- A subjective pass/fail criterion. Two scopes:
72
+ A rule or criterion that defines expectations for artefacts. Two scopes:
73
73
 
74
74
  - **Global** — `foundry/laws/*.md`, all files concatenated, applies to every artefact.
75
75
  - **Type-specific** — `foundry/artefacts/<type>/laws.md`.
76
76
 
77
- Each law is a `## heading` (its identifier, used in feedback tags as `law:<id>`) with a description, passing criteria, and failing criteria.
77
+ Each law is a `## heading` (its identifier, used in feedback tags as `law:<id>`) with a description, passing criteria, and failing criteria. Laws may declare optional validators, which are deterministic scripts that verify the artefact programmatically.
78
+
79
+ ## Validator
80
+
81
+ An optional deterministic script attached to a law. Declared in a law's `validators` field and run during the quench stage. Each validator produces feedback tagged `law:<law-id>:<validator-id>` when its check fails. Validators are the mechanism for automated, deterministic enforcement of law requirements.
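
As a rough illustration of the two tag shapes (`law:<id>` and `law:<law-id>:<validator-id>`), the sketch below builds and parses them. The helper names and the example IDs are hypothetical, and it assumes that IDs never contain a `:`.

```javascript
// Hypothetical helpers for the documented feedback tag shapes.
// Assumption: law and validator IDs do not themselves contain ":".
function lawTag(lawId) {
  return `law:${lawId}`;
}

function validatorTag(lawId, validatorId) {
  return `law:${lawId}:${validatorId}`;
}

// Split a tag back into its parts; returns null for non-law tags.
function parseLawTag(tag) {
  const [prefix, lawId, validatorId] = tag.split(":");
  if (prefix !== "law" || !lawId) return null;
  return { lawId, validatorId: validatorId ?? null };
}

// Example IDs are made up for illustration.
console.log(validatorTag("no-todo-markers", "grep-todo"));
console.log(parseLawTag("law:no-todo-markers:grep-todo"));
console.log(parseLawTag(lawTag("no-todo-markers")));
```
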
78
82
 
79
83
  ## Appraiser
80
84
 
81
- An independent evaluator with a defined personality. Lives in `foundry/appraisers/*.md`. Appraisers may specify a `model` field to override the cycle-level appraise model. Each artefact type picks which appraisers may evaluate it (`appraisers.allowed`) and how many run per iteration (`appraisers.count`). Selection distributes evenly across allowed personalities.
85
+ An evaluator that judges artefacts against laws through its personality or perspective. Lives in `foundry/appraisers/*.md`. Appraisers may specify a `model` field to override the cycle-level appraise model. Each artefact type picks which appraisers may evaluate it (`appraisers.allowed`) and how many run per iteration (`appraisers.count`). Selection distributes evenly across allowed personalities. Appraisers receive the artefact content and applicable laws; they do not receive validator metadata.
82
86
 
83
87
  ## WORK.md
84
88
 
85
- The transient shared state for a flow. Created on the work branch by the flow skill, it tracks:
89
+ The transient shared state for a flow. Created on the work branch, it tracks:
86
90
 
87
91
  - Current position (flow, cycle, stage list, iteration limits) in frontmatter.
88
92
  - The goal (prose — written once).
89
- - An artefact registry (file, type, cycle, status).
90
- - Feedback state lives alongside it in `WORK.feedback.yaml`.
93
+
94
+ Artefacts are discovered from branch changes against the current cycle's output-type file patterns, not stored as an artefact table in `WORK.md`. Feedback state lives alongside `WORK.md` in `WORK.feedback.yaml`.
91
95
 
92
96
  See [work-spec.md](work-spec.md) for the full spec.
93
97
 
@@ -100,12 +104,12 @@ Append-only log of every stage execution, sitting next to WORK.md. Used by sort
100
104
  Feedback items live in `WORK.feedback.yaml` — a YAML file at the worktree
101
105
  root, alongside `WORK.md`. Every item has a ULID, a source stage, and a
102
106
  full history of state transitions (open → actioned → resolved, or variants
103
- including wont-fix / rejected / deadlocked).
107
+ including wont-fix / rejected).
104
108
 
105
109
  Plugins read and write feedback through the `foundry_feedback_*` tools;
106
- skills never edit the yaml directly. Sort-side detection of deadlocked
107
- items (per-item history depth) replaces the earlier global-iteration
108
- counter.
110
+ skills never edit the YAML directly. Sort routing uses the iteration count
111
+ and `max-iterations` / `deadlock-human-appraise` settings to detect
112
+ deadlock, not per-item deadlock history.
109
113
 
110
114
  See `docs/work-spec.md` for the full schema and state machine.
111
115
 
@@ -113,8 +117,8 @@ See `docs/work-spec.md` for the full schema and state machine.
113
117
 
114
118
  Human-in-the-loop checkpoint. A stage where Foundry pauses and asks a human for input. Two triggers:
115
119
 
116
- 1. **Every-iteration** — the cycle declares `human-appraise: true`. The `human-appraise` stage runs after LLM appraise each iteration.
117
- 2. **Deadlock** — the cycle declares `deadlock-appraise: true` (default). If forge and appraisers ping-pong on the same items for `deadlock-iterations` (default 5) iterations, sort inserts a `human-appraise` stage to break the tie.
120
+ 1. **Every-iteration** — the cycle declares `always-human-appraise: true`. The `human-appraise` stage runs after LLM appraise each iteration.
121
+ 2. **Deadlock** — the cycle declares `deadlock-human-appraise: true` (default). If the iteration count reaches `max-iterations`, sort inserts a `human-appraise` stage so a human can break the deadlock.
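
The two triggers can be sketched as a single decision function. This is an assumption-laden illustration, not the routing code in `sort.js`: `humanAppraiseTrigger` is a hypothetical name, the camelCase fields mirror the documented frontmatter keys, `iteration` is assumed to be a 1-based count, and "reaches" is read as `>=`.

```javascript
// Hypothetical decision function for the two documented triggers.
// Defaults follow the glossary: always-human-appraise = false,
// deadlock-human-appraise = true, max-iterations = 3.
function humanAppraiseTrigger(cycle, iteration) {
  const {
    alwaysHumanAppraise = false,
    deadlockHumanAppraise = true,
    maxIterations = 3,
  } = cycle;

  // Trigger 1: a human gate runs after LLM appraise on every iteration.
  if (alwaysHumanAppraise) return "every-iteration";

  // Trigger 2: the iteration count has reached the configured limit.
  if (deadlockHumanAppraise && iteration >= maxIterations) return "deadlock";

  return null; // no human-appraise stage this iteration
}

console.log(humanAppraiseTrigger({}, 1)); // null
console.log(humanAppraiseTrigger({}, 3)); // "deadlock"
console.log(humanAppraiseTrigger({ alwaysHumanAppraise: true }, 1)); // "every-iteration"
```
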
118
122
 
119
123
  Human feedback is tagged `human` and takes priority over LLM feedback on the same topic.
120
124
 
@@ -194,6 +198,28 @@ The signed squash commit on the base branch references the archive
194
198
  branch tip SHA in its attestation block. Archive branches accumulate
195
199
  indefinitely — periodic manual pruning is outside the tool's scope.
196
200
 
201
+ ## Attestation
202
+
203
+ A cryptographic claim that a work branch completed all required stages
204
+ and all feedback was resolved. Created by `foundry_attest({ confirm: true })`,
205
+ which writes and commits `ATTEST.md` at `HEAD` containing the cycle
206
+ goal, a diff SHA-256 of the branch changes, and a canonical JSON payload
207
+ with the flow contract, governance hashes, output artefact list, process
208
+ log, and archive branch reference. Work-branch `foundry_git_finish`
209
+ verifies the attestation before allowing the squash merge. Config and
210
+ dry-run branches do not require attestation.
211
+
212
+ ## ATTEST.md
213
+
214
+ The attestation document written by `foundry_attest` as the work-branch
215
+ `HEAD` commit. Contains the goal prose, `diff-sha256` of the branch
216
+ diff, and a signed attestation block between `-----BEGIN FOUNDRY
217
+ ATTESTATION-----` and `-----END FOUNDRY ATTESTATION-----` delimiters.
218
+ `foundry_git_finish` verifies `ATTEST.md` exists at `HEAD`, checks the
219
+ `diff-sha256` matches the recomputed branch diff, uses the attestation
220
+ payload in the signed final commit message, and preserves the archive
221
+ branch reference.
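
The `diff-sha256` check amounts to recomputing a digest and comparing it with the recorded one. The sketch below uses Node's built-in `crypto` module. How Foundry produces and canonicalises the branch diff text is not stated here, so the diff string, the hex encoding, and the function names are all assumptions.

```javascript
const { createHash } = require("node:crypto");

// SHA-256 of a UTF-8 string, hex-encoded (encoding is an assumption).
function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical check: does the recomputed digest of the branch diff
// match the `diff-sha256` value recorded in ATTEST.md?
function diffMatchesAttestation(attestedDiffSha256, branchDiffText) {
  return sha256Hex(branchDiffText) === attestedDiffSha256;
}

// Made-up diff text standing in for the real branch diff.
const diff = "diff --git a/a.md b/a.md\n+hello\n";
const attested = sha256Hex(diff);

console.log(diffMatchesAttestation(attested, diff)); // true
console.log(diffMatchesAttestation(attested, diff + "+more\n")); // false
```

Any change to the branch after attestation alters the recomputed digest, which is why a mismatch is enough for `foundry_git_finish` to refuse the squash merge.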
222
+
197
223
  ## Stage token
198
224
 
199
225
  A single-use HMAC-signed string, minted by `foundry_orchestrate` when a stage is dispatched. The sub-agent must redeem the token via `foundry_stage_begin`; mutation tools then check the active stage matches their role. Keys live in `.foundry/.secret` (mode 0600, gitignored, one per worktree). This prevents out-of-band mutations, replayed stages, and sub-agents skipping the lifecycle.
@@ -267,6 +293,16 @@ All deterministic pipeline operations are exposed as custom tools by the Foundry
267
293
 
268
294
  A self-contained workflow written as markdown with YAML frontmatter. Foundry ships a user-facing `Foundry` guide agent plus skills for pipeline execution, authoring, maintenance, and memory administration. The guide agent is the normal interface for users; skills and tools provide the internal workflows it uses to initialise projects, create artefact types, define laws, configure appraisers, build cycles and flows, and run governed artefact generation. Skills are either **atomic** (do one thing) or **composite** (orchestrate other skills).
269
295
 
296
+ ## Foundry agent wizard
297
+
298
+ The interactive configuration process that runs when the user first
299
+ interacts with the Foundry agent. The wizard walks through four phases
300
+ — Understand, Plan, Confirm, Build — asking one question at a time.
301
+ Configuration files are created only after the user confirms the plan.
302
+ The wizard eliminates hand-authoring of normal setup by using the
303
+ structured config creation tools (`foundry_config_create_*`) on a
304
+ `config/*` branch.
305
+
270
306
  ---
271
307
 
272
308
  ## Extractor