baldart 4.41.0 → 4.42.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/CHANGELOG.md +21 -0
package/VERSION +1 -1
package/framework/.claude/agents/REGISTRY.md +1 -1
package/framework/.claude/skills/new/references/setup.md +19 -7
package/framework/.claude/workflows/new2.js +41 -3
package/framework/scripts/validate-card-baseline.js +133 -3
package/framework/templates/ci/check-card-baseline.yml +0 -3
package/package.json +1 -1

package/CHANGELOG.md CHANGED

@@ -5,6 +5,27 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [4.42.0] - 2026-06-15
+**The `/new` worktree-setup can no longer pass on a fabricated baseline: the orchestrator now verifies the worktree on disk instead of trusting the subagent's self-report, and the setup subagent moves off `haiku`.** A real `/new` run reproduced **2/2** a silent failure — the background **worktree-setup** subagent (on `haiku`) returned a well-formed block reporting `baseline: pass` in ~6s **with no worktree on disk**: it pattern-matched the expected output instead of running the multi-step `/nw` skill, and the orchestrator trusted it because the baseline gate only ever checked the *returned* field, never the disk. This release closes both halves: (1) the worktree-setup subagent moves **`haiku → sonnet`** (running `/nw` via the Skill tool is a sustained tool-execution chain a too-weak model fabricates rather than executes); (2) a new **worktree integrity gate** (`setup.md §6a`) verifies the worktree with **orchestrator Bash** — `git worktree list --porcelain` + `test -d` + branch + `node_modules` — which a subagent cannot fabricate, and routes any failure to a **non-circular fallback chain** (`subagent → inline /nw → HALT`, no loop, with a build `timeout`). `new2`'s pre-flight gets the parallel mitigation (it cannot run Bash, so it returns non-falsifiable evidence the workflow string-matches — explicitly declared structurally weaker). Separately, the card-baseline validator is now **dependency-free**: it parsed cards with `js-yaml`, absent from the framework payload, so `require('js-yaml')` failed in consumers and silently disabled card-baseline validation (1b-iii) — replaced with a node-core `parseCardYaml`. **MINOR** (additive integrity gate + a behavior change to the worktree subagent model + a dependency-removing bugfix; no removed surface, the `/nw` `{path,branch,port}` contract is unchanged, and no new `baldart.config.yml` key ⇒ schema-change propagation rule N/A).
+### Added
+- **`framework/.claude/skills/new/references/setup.md` §6a — worktree integrity gate (BLOCKING, orchestrator-Bash).** On `baseline: pass`, the orchestrator verifies the worktree on disk *itself* (`git -C "$MAIN" worktree list --porcelain` lists the path, `test -d`, branch matches, `node_modules` present; registry `buildVerified` + a build-artifact dir as best-effort corroboration) — **never** a `verified` field returned by a subagent (which is exactly as fabricable as the block). Any of checks 1-4 failing routes to the §4d fallback. Honest-failure signals (`baseline: fail`/`timeout`) are trusted and STOP first (recreating cannot fix a broken or hung build).
+- **`framework/scripts/validate-card-baseline.js` — `parseCardYaml()`**, a node-core reader for the card-YAML subset (`|`/`>` block scalars, nested maps, scalar/map lists, inline flow collections, number fidelity for `group.sequence` which drives epic detection). Replaces the `js-yaml` dependency. Exported and guarded by a new **Check C** in `scripts/check-card-baseline.js`.
+### Changed
+- **`framework/.claude/skills/new/references/setup.md` §4b — worktree-setup subagent `model: "haiku" → "sonnet"`** (rationale rewritten: a sustained Skill-tool execution chain, not one-shot plumbing; its return is verified by the §6a disk gate regardless). §4d fallback broadened from "empty / 0-tool-uses" to "did not produce a VERIFIED worktree", made a genuinely **different executor** (inline `/nw`, never a re-spawn or a frozen script) with an explicit cap (`subagent → inline → HALT`; transient-vs-deterministic classification), and the briefing wraps the build in `timeout 600` → new `baseline: timeout`.
+- **`framework/.claude/agents/REGISTRY.md`** — the haiku plumbing carve-out drops the worktree-setup sub-clause (now sonnet); only the file-scoped revert agent remains a sanctioned haiku use.
+- **`framework/.claude/workflows/new2.js`** — `PREFLIGHT_SCHEMA` gains `worktreeVerified` + `worktreeEvidence` (literal `worktree list --porcelain` / `ls` / baseline-log tail) and `baseline: 'timeout'`; a new **E2.5 worktree integrity gate** string-matches that evidence (the workflow JS cannot run Bash, so it is declared structurally weaker than classic `/new`'s orchestrator-Bash gate), and a timeout is batch-fatal.
+- **`framework/templates/ci/check-card-baseline.yml`** — drops the now-unnecessary `npm i js-yaml` step (the validator is dependency-free).
+### Fixed
+- **Card-baseline validation (1b-iii) silently skipped in every consumer lacking `js-yaml`.** The shipped validator `require`-d a package not in the framework payload, so `/new` / `/new2` pre-flight could not run it and degraded to skip-with-note. Now node-core — it runs everywhere.
+- **`/new` worktree-setup accepting a fabricated `baseline: pass`** (reproduced 2/2 on a real consumer) — closed by the §6a integrity gate + the sonnet model + the broadened fallback above.
 ## [4.41.0] - 2026-06-15
 **BALDART becomes opinionated about the *tools*, not just the workflow: a new curated toolchain layer installs best-in-class JS/TS dev tools at first install and makes the agents actually use them.** Until now BALDART shipped agents and workflows but was agnostic about linters/formatters/test-runners — every quality gate hard-coded `eslint`/`tsc`/`jest`. This release adds an opt-in toolchain layer (`features.has_toolchain`) that, on a JS/TS project, PRESELECTS and installs a curated set as devDependencies — **Biome** (format + lint + import organizer), **Vitest**, **tsc**, **Lefthook** (pre-commit) — then records literal gate commands in `toolchain.commands.*` that the gate flows (`/new`, `/new2`, `/qa`, `qa-sentinel`, `coder`) run verbatim instead of guessing. The design is the fourth member of the install-adapter family (alongside routine-/tool-/lsp-adapters) and inherits their invariants: **opinionated but askable** (default Y, opt-out), **non-destructive** (configs written only when absent; existing ESLint/Prettier/Jest/husky are detected and a migration is only ever PROPOSED, never automatic — `.husky/` is never overwritten), **never silent in CI** (`--non-interactive` writes the flag only; `baldart doctor` backfills), and **silent fallback** (an unset command degrades to the project-standard default; the layer is invisible to non-JS projects and consumers with their own toolchain). **MINOR** (additive capability + new `features.has_toolchain` + `toolchain.*` config keys, propagated end-to-end per the schema-change propagation rule; backwards-compatible — flag defaults `false`, every gate falls back to today's behavior when unset).

package/VERSION CHANGED

@@ -1 +1 @@
-4.41.0
+4.42.0

package/framework/.claude/agents/REGISTRY.md CHANGED

@@ -200,7 +200,7 @@ Use this table when spawning agents via the `Task` tool. The `model` field in ea
 **Rules**: Never use haiku for any **named specialist agent** in the matrix above. Opus for code writing and creative/complex work. Sonnet for analysis, review, and documentation. `qa-sentinel` stays on sonnet even though it is a mechanical gate runner — its failure interpretation and SCOPED-vs-FULL tiering benefit from sonnet.
-**Plumbing carve-out (haiku allowed):** the "never haiku" rule above governs the specialist agents. **Mechanical plumbing spawned as `general-purpose`** — with no reasoning and an explicit ROLE BOUNDARY — MAY use `model: "haiku"` via override. Two sanctioned uses today, both in `/new`: (1) the background **worktree-setup** subagent that runs `/nw` (create worktree, install deps, allocate port, write registry — `references/setup.md` §4b); (2) the **file-scoped revert agent** that restores unauthorized files to their pre-commit state (`references/implement.md` Phase 2.4). Both are deterministic git/file ops, never code authoring.
+**Plumbing carve-out (haiku allowed):** the "never haiku" rule above governs the specialist agents. **Mechanical plumbing spawned as `general-purpose`** — with no reasoning and an explicit ROLE BOUNDARY — MAY use `model: "haiku"` via override. One sanctioned use today, in `/new`: the **file-scoped revert agent** that restores unauthorized files to their pre-commit state (`references/implement.md` Phase 2.4) — a deterministic git/file op, never code authoring. (The background **worktree-setup** subagent — `references/setup.md` §4b — was haiku through v4.41.0 but moved to **`model: "sonnet"`** in v4.42.0: it runs the multi-step `/nw` skill **via the Skill tool**, a sustained tool-execution chain a too-weak model fabricates rather than executes — the observed 2/2 `baseline: pass`-with-no-worktree failures. Its return is verified by the orchestrator's disk gate regardless, so a capable model is the right trade.)
 ## Notes

package/framework/.claude/skills/new/references/setup.md CHANGED

@@ -281,29 +281,41 @@
       - **`WT_PATH` set AND `registry.json` has a complete code entry for this slug** (a finished prior run — `buildVerified` recorded) → **resume**: read `path`/`branch`/`port`/`createdAt`/`buildVerified` from that entry, skip to step 6; re-run the baseline as a single background `Bash` (output to `/tmp`) **only if** `buildVerified` is not `true`.
       - **`WT_PATH` set but NO complete registry entry** (a prior attempt interrupted mid-setup — the normal compaction-mid-barrier state: worktree created, build unfinished, entry never written) → it is a half-built orphan with **no card work** (we are still in pre-flight, zero commits). **Reset clean and recreate**: `git -C "$MAIN" worktree remove --force "$WT_PATH"` then `git -C "$MAIN" branch -D "$WT_BRANCH"` (both ignore-if-absent), then proceed to 4b. A pre-flight worktree has nothing to lose, so a clean recreate is always safe — and it sidesteps the fail-loud collision a naive re-spawn would hit.
       This — detection by `git worktree list`, not the lagging registry — is what makes the deferred-flush pre-flight genuinely idempotent across compaction.
-   b. **Spawn ONE background subagent** (Agent tool, **`mode: "bypassPermissions"`** — mandatory per the SKILL.md meta-rules; a background agent that hit a permission prompt with no human present would stall the barrier forever — `run_in_background: true`, `name: "worktree-setup-<FIRST-CARD-ID>"`, a subagent type that can use the Skill tool — `general-purpose`, **`model: "haiku"`** — this mission is pure deterministic plumbing (create the worktree, install deps, allocate the dev-server port, write the registry entry) with no reasoning; haiku is sufficient and is faster on the barrier, and it is the sanctioned plumbing carve-out to REGISTRY.md's "never haiku" rule) whose ENTIRE mission is to run `/nw` and return the block below. Briefing:
+   b. **Spawn ONE background subagent** (Agent tool, **`mode: "bypassPermissions"`** — mandatory per the SKILL.md meta-rules; a background agent that hit a permission prompt with no human present would stall the barrier forever — `run_in_background: true`, `name: "worktree-setup-<FIRST-CARD-ID>"`, a subagent type that can use the Skill tool — `general-purpose`, **`model: "sonnet"`** — the mission runs the multi-step `/nw` skill (group cards, create the worktree, install deps, allocate the dev-server port, write the registry entry, run the baseline) **via the Skill tool**: a sustained tool-execution chain, NOT one-shot plumbing. A too-weak model pattern-matches the expected return block instead of doing the work — the observed 2/2 fabrication was a well-formed block reporting `baseline: pass` in ~6s with **no worktree on disk**. Sonnet executes it for real or fails honestly; either way the returned block is **never trusted as evidence** — the orchestrator's disk-verification gate (step 6a below) is the source of truth) whose ENTIRE mission is to run `/nw` and return the block below. Briefing:
       ```
       Invoke the worktree-manager skill in `/nw` programmatic mode with:
         { cards: [<all card IDs>], groupParent: <PARENT-ID|null>, slug: "<slug>" }
       Let it: group cards, derive the branch from git_strategy.branch (fallback
       feat/<PARENT-ID>-<slug>), create the worktree in .worktrees/, install deps,
       copy env files, assign a free port, update .worktrees/registry.json (all card
-      IDs in the `cards` field), and run the baseline (tsc + lint + build).
+      IDs in the `cards` field), and run the baseline (tsc + lint + build)
+      UNDER A HARD TIMEOUT — wrap the build step in `timeout 600 <build-cmd>`
+      (10 min) so a hung or interactive build cannot stall the pre-flight barrier.
       Redirect ALL install/build/git output to files under /tmp — never inline it.
       Return ONLY this block, nothing else (no logs, no narration, no skill recap):
         worktree_path: <absolute path>
         branch: <branch>
         port: <n>
         created_at: <ISO-8601, stamped WHEN you create the worktree — before install/build>
-        baseline: pass | fail
-        baseline_log: <path on failure, else "-">
-      If the build fails, return `baseline: fail` with the log path and STOP — do not continue.
+        baseline: pass | fail | timeout
+        baseline_log: <path on failure/timeout, else "-">
+      If the build fails, return `baseline: fail` with the log path and STOP.
+      If the timeout kills the build, return `baseline: timeout` with the partial
+      log path and STOP — do not continue.
       ```
    c. The subagent's context (the worktree-manager skill body, install/build logs, git output) **lives and dies inside it** — the orchestrator receives only the structured block. Combined with the Codex check (3d, also background), this replaces the old ~30-turn foreground pre-flight tail with background ops and a single resume.
-   d. **Fallback if the subagent cannot run `/nw` (the Skill-from-subagent capability is NOT assumed).** If the subagent returns empty, or without the structured block — e.g. it cannot invoke the Skill tool in this consumer, or it returns the known "0 tool uses · Done" empty-result failure — do **NOT** strand the barrier: the orchestrator **falls back to invoking `/nw` inline itself** (the pre-v4.15.0 path — slug from 4a, then record `worktree_path`/`branch`/`port` and stamp `created_at` from the inline return). You lose the prefix saving for this one run, but pre-flight completes. This is the same opt-in-with-fallback discipline the Codex check (3d) and the dynamic-workflow gate use — never leave the worktree uncreated on a missing capability.
+   d. **Fallback executor — when the subagent does not produce a VERIFIED worktree.** The subagent's returned block is **never trusted as evidence**; step 6a verifies the worktree on disk. Trigger this fallback when EITHER (i) the subagent returns empty / without the structured block (cannot invoke the Skill tool in this consumer, or the known "0 tool uses · Done" empty-result), OR (ii) it returns a well-formed block but the **step-6a disk gate is VERIFIED:false** (the observed 2/2 fabrication: `baseline: pass` in ~6s with no worktree on disk). In both cases do **NOT** strand the barrier: the orchestrator **falls back to invoking `/nw` inline itself** — a genuinely **DIFFERENT executor** (the full-model orchestrator interpreting the skill prose), **not** a re-spawn of the same subagent and **not** a frozen script, so neither a Skill-from-subagent capability gap nor a weak-model fabrication can recur on the fallback. Slug from 4a; record `worktree_path`/`branch`/`port` and stamp `created_at` from the inline return; then **re-run the step-6a disk gate ONCE**. **Cap (no loop):** the chain is strictly `subagent → inline /nw → HALT+report` — never re-spawn the subagent, never loop the inline path. If the inline attempt ALSO fails the gate, or the failure is **deterministic** (port exhaustion across 3001-3099, a corrupted lockfile, or a genuine `baseline: fail`/`timeout` build error — recreating cannot fix these), **HALT and report** rather than retry. You lose the prefix saving for this one run, but pre-flight completes or halts cleanly — never silently on a phantom worktree. This is the same opt-in-with-fallback discipline the Codex check (3d) and the dynamic-workflow gate use.
 5. **End the turn — barrier on ALL launched background ops (wait for every one, not the first).** Having launched the Codex cross-card check (3d) and the worktree-setup subagent (4), the orchestrator has nothing to do until they return. **End the turn** — do NOT poll with `sleep`/`echo "waiting"` loops (§ "Context economy"; same rule as team-mode Step C). Background agents and background `Bash` re-invoke the orchestrator automatically on completion. **Wait for EVERY launched op before step 6**: each completion wakes you separately, so on each wake check whether *all* launched ops have returned — if one is still in flight, **end the turn again** and wait. Do NOT proceed to step 6 on the first completion, or you would read a half-written `$AUDIT_FILE` (and 3d's "If PASS or file empty: proceed normally" would silently swallow real conflicts) or a missing worktree block. (If 3d was SKIPPED by the provenance gate, the only op is the worktree subagent — or none, if step 4a2 resumed an existing worktree.) **Recovery**: a compaction mid-barrier re-enters pre-flight from step 4; the 4a2 git pre-check makes that safe (the worktree is detected via `git worktree list` and resumed-or-reset, never blindly re-created into a fail-loud collision).
 6. **On resume — flush the pre-flight tracker sections in one pass (no incremental per-sub-step churn).** When all launched ops have returned:
-   a. **Baseline gate**: if the worktree subagent returned `baseline: fail` → STOP and report (point the user at `baseline_log`). Do NOT continue to Phase 1.
+   a. **Worktree integrity gate (BLOCKING — the disk is the source of truth, not the returned block).**
+      - **Honest-failure signals first** (trust these — a reported failure is real, not a fabrication, and recreating the worktree would not fix it): if the block reports `baseline: fail` → STOP and report (point the user at `baseline_log`); if `baseline: timeout` (the build exceeded the launch timeout, §4b/§5) → STOP and report the timeout. Do NOT continue to Phase 1; do NOT recreate.
+      - **On `baseline: pass`, VERIFY the worktree on disk YOURSELF** — these are orchestrator `Bash` calls run in this (full-model) context; **never** trust a `verified`-style field returned by a subagent (it is exactly as fabricable as the block itself). With `$WT_PATH`/`$WT_BRANCH` from the block and `$MAIN` from the tracker (`## Worktree` `Main repo:`):
+        1. `git -C "$MAIN" worktree list --porcelain` lists `$WT_PATH`;
+        2. `test -d "$WT_PATH"`;
+        3. `git -C "$WT_PATH" rev-parse --abbrev-ref HEAD` equals `$WT_BRANCH`;
+        4. `test -d "$WT_PATH/node_modules"` (proves install actually ran);
+        5. *(corroboration)* the `.worktrees/registry.json` entry for this worktree has `buildVerified: true`, and — best-effort — a build artifact dir if the project emits one (`.next`/`dist`/`build`/`out`) is present with mtime newer than `$WT_PATH/package.json`. A missing build dir is **not** a hard fail on its own (a lib/CLI may emit none); checks 1-4 are the real evidence.
+      - **All pass → VERIFIED**: proceed to (b). **Any of 1-4 fails →** the block was fabricated or setup is incomplete (the observed 2/2: `baseline: pass`, ~6s, nothing on disk) → route to the **4d fallback chain** (inline `/nw`, re-run this gate ONCE, then HALT — never loop). This gate is the actual fix for the haiku-fabrication failure; do not skip it on the happy path.
    b. **Codex verdict**: handle it via the verdict-extraction discipline in 3d (read `$AUDIT_FILE` through the `[codex]`-stripping filter; keep distilled findings only).
    c. **One-pass tracker flush (no round-trips).** Assemble the pre-flight sections **in-context** (they are all small) and fill them with **back-to-back `Edit`s and no intervening reads**. The win is killing the old read-modify-read-modify churn (~5 incremental edits), **not** the literal tool-call count. **Do NOT `Write`-overwrite the whole file from in-context memory**: Phase 0 already wrote `Main repo:` / `Trunk branch:` / `Metrics dir:` into `## Worktree` and the `Status`/divergence lines into `## Phase 0`, and after a barrier compaction you may no longer hold those in-context (`$MAIN` "does not survive context compaction" — § Phase 0 step 1) — an overwrite-from-memory would silently drop them and HALT later with "`$MAIN` absent from tracker". Surgical `Edit`s on the placeholder sections leave Phase 0's content intact. Sections filled here:
       - `## File Ownership Map` (3b).
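
Reviewer sketch: checks 1-4 of the step-6a gate in this hunk are run as orchestrator Bash. For readers who want to see the decision logic in isolation, the following is a rough JavaScript rendering that evaluates the four checks from already-captured command output. `checkWorktree`, `listedWorktrees`, and `isDirOnDisk` are hypothetical helper names, and the example assumes the paths in `git worktree list --porcelain` can be compared literally (symlinked paths may need normalizing first).

```javascript
const fs = require('node:fs');

// Default directory probe, equivalent in spirit to `test -d`.
function isDirOnDisk(p) {
  try { return fs.statSync(p).isDirectory(); } catch { return false; }
}

// Extract worktree paths from `git worktree list --porcelain` stdout.
// Each worktree record starts with a line of the form `worktree <path>`.
function listedWorktrees(porcelain) {
  return String(porcelain)
    .split('\n')
    .filter((line) => line.startsWith('worktree '))
    .map((line) => line.slice('worktree '.length).trim());
}

// Hypothetical helper: evaluates checks 1-4 from captured output plus a directory probe.
// `headBranch` is the stdout of `git -C <wt> rev-parse --abbrev-ref HEAD`.
function checkWorktree({ porcelain, headBranch, wtPath, wtBranch }, isDir = isDirOnDisk) {
  const checks = {
    listed: listedWorktrees(porcelain).includes(wtPath),      // check 1
    dirExists: isDir(wtPath),                                  // check 2
    branchMatches: String(headBranch).trim() === wtBranch,     // check 3
    depsInstalled: isDir(`${wtPath}/node_modules`),            // check 4
  };
  return { verified: Object.values(checks).every(Boolean), checks };
}

// Usage with an injected probe so the example runs without a real repository.
const porcelain = [
  'worktree /repo', 'HEAD abc', 'branch refs/heads/main', '',
  'worktree /repo/.worktrees/feat-x', 'HEAD def', 'branch refs/heads/feat/x', '',
].join('\n');
const onDisk = new Set(['/repo/.worktrees/feat-x', '/repo/.worktrees/feat-x/node_modules']);
const gate = checkWorktree(
  { porcelain, headBranch: 'feat/x\n', wtPath: '/repo/.worktrees/feat-x', wtBranch: 'feat/x' },
  (p) => onDisk.has(p),
);
console.log(gate.verified); // prints true
```

Returning the individual check results alongside `verified` makes it possible to report which check failed before routing to the 4d fallback.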

package/framework/.claude/workflows/new2.js CHANGED

@@ -141,15 +141,29 @@ if (!cardIds.length) {
 // ───────────────────────────────────────────────────────────────────────────
 const PREFLIGHT_SCHEMA = {
   type: 'object',
-  required: ['ok', 'worktreePath', 'branch', 'baseline', 'cards', 'cardGraph'],
+  required: ['ok', 'worktreePath', 'branch', 'baseline', 'worktreeVerified', 'cards', 'cardGraph'],
   additionalProperties: false,
   properties: {
     ok: { type: 'boolean' },
     worktreePath: { type: 'string' },
     branch: { type: 'string' },
     port: { type: ['number', 'string'] },
-    baseline: { enum: ['pass', 'fail'] },
+    baseline: { enum: ['pass', 'fail', 'timeout'] },
     baselineLog: { type: 'string' },
+    // v4.42.0 — worktree integrity. The workflow JS CANNOT run bash, so it cannot re-verify the
+    // worktree itself the way classic /new does with orchestrator Bash (setup.md §6a) — this gate
+    // is STRUCTURALLY WEAKER. Best mitigation: the agent returns worktreeVerified + the LITERAL
+    // command evidence, which the workflow string-matches (a bare boolean would be as fabricable
+    // as the haiku block was). Real backstop: every card cd's into the worktree and re-runs gates.
+    worktreeVerified: { type: 'boolean', description: 'true ONLY after the agent ran `git worktree list --porcelain` + `test -d <wt>/node_modules` and they confirm the worktree on disk' },
+    worktreeEvidence: {
+      type: 'object', additionalProperties: true,
+      properties: {
+        worktreeListPorcelain: { type: 'string', description: 'literal stdout of `git -C <main> worktree list --porcelain`' },
+        artifactsLs: { type: 'string', description: 'literal stdout of `ls -la <wt>/node_modules <wt>/.next 2>/dev/null | head`' },
+        baselineLogTail: { type: 'string', description: 'last ~20 lines of the baseline log' },
+      },
+    },
     // B4 — executionMode/groups removed: the DAG scheduler is strictly sequential (single
     // worktree); computing team-mode groups was pre-flight work nobody read. Real parallelism
     // is a future release, after A/B data.
@@ -246,7 +260,7 @@ try {
       `ROLE BOUNDARY (specialization integrity): you are the OPS/GIT agent. You NEVER edit source or doc files — any needed content change belongs to the coder specialist; report it instead.\n\n` +
       `DETERMINISTIC GATE POLICIES (NO user prompts):\n` +
       `• G1 dirty-tree (main repo ${MAIN}): partition framework-managed noise exactly as setup.md step 3 ($METRICS=${METRICS}, .baldart/generated|state.json|skill-conflicts.json — NOT overlays/). Genuine user work → auto-stash 'baldart-new2-${firstCard}' (main checkout) and record the label. Never commit/abort/prompt.\n` +
-      `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files).\n` +
+      `• Worktree (setup.md step 4): create ONE code worktree off ${TRUNK}; install deps; assign a port; run the baseline (tsc+lint+build). Copy ONLY the artifacts needed (env/.env.local/.env.example/supabase/.temp) — do NOT bulk-copy untracked files from the main repo (avoids stray backlog cards in the worktree). Use the git-authoritative idempotency pre-check. E2: baseline FAILS → do NOT fix it yourself (role boundary — the coder specialist repairs it); return baseline:'fail' + a baselineLog precise enough for a coder to act (failing command, error excerpt, suspect files). Wrap the build in \`timeout 600 <build-cmd>\` (10 min); if killed, return baseline:'timeout' + the partial log. On baseline PASS, VERIFY the worktree on disk and return EVIDENCE (not just a flag): set worktreeVerified:true ONLY after running \`git -C ${MAIN} worktree list --porcelain\` (the worktree path MUST appear in the output) AND \`test -d <wt>/node_modules\` AND confirming the branch; put the LITERAL stdout into worktreeEvidence{ worktreeListPorcelain, artifactsLs:\`ls -la <wt>/node_modules <wt>/.next 2>/dev/null | head\`, baselineLogTail }. The workflow string-matches this evidence — NEVER report worktreeVerified:true without actually running the commands.\n` +
       codexResolveBullet +
       g3Bullet +
       `• G4 card-field validation (setup.md 1b/1c): card missing requirements/acceptance_criteria/files_likely_touched → EXCLUDE (excluded[] + reason). Never HALT for one bad card.\n` +
@@ -271,6 +285,12 @@ if (!preflight || preflight.ok === false) {
   ledger(firstCard, 'preflight', 'BATCH-FATAL', (preflight && preflight.workspaceNote) || 'workspace unworkable')
   return finalReturn({ fatal: true, reason: 'workspace unworkable — see pre-flight' })
 }
+if (preflight.baseline === 'timeout') {
+  // v4.42.0 — a hung/interactive build hit the §4b 10-min timeout. An autonomous batch cannot
+  // ask the user (classic /new STOPs here); recreating cannot fix a hang → batch-fatal.
+  ledger(firstCard, 'baseline', 'BATCH-FATAL', 'baseline build timed out — see baselineLog')
+  return finalReturn({ fatal: true, reason: 'baseline build timed out (hung/interactive build) — see baselineLog' })
+}
 if (preflight.baseline === 'fail') {
   // E2 (specialization integrity) — baseline repair is CODE work: it belongs to the coder
   // specialist, not the ops pre-flight agent (which never edits source). ONE bounded attempt;
@@ -291,6 +311,24 @@ if (preflight.baseline === 'fail') {
   }
 }
+// E2.5 (v4.42.0) — worktree integrity gate. Mirrors classic /new setup.md §6a, but WEAKER: the
+// workflow JS cannot run bash, so it can only string-match the LITERAL evidence the pre-flight
+// agent returned, not re-derive it on disk. Catches the haiku-class fabrication (baseline:'pass'
+// with no worktree) by requiring the worktree path to appear in the agent's `worktree list
+// --porcelain` output and node_modules in its ls. Skipped on the fail→repair path (that path
+// provably did real work and the coder cd'd in). Backstop: every card re-runs the gates anyway.
+if (preflight.baseline === 'pass') {
+  const ev = preflight.worktreeEvidence || {}
+  const wt = String(preflight.worktreePath || '')
+  const porcelain = String(ev.worktreeListPorcelain || '')
+  const verified = preflight.worktreeVerified === true && !!wt && porcelain.includes(wt) && /node_modules/.test(String(ev.artifactsLs || ''))
+  if (!verified) {
+    ledger(firstCard, 'E2.5-worktree', 'BATCH-FATAL', `worktree not verified on disk (verified=${preflight.worktreeVerified}, pathInPorcelain=${porcelain.includes(wt)}) — possible fabricated baseline:pass`)
+    return finalReturn({ fatal: true, reason: 'worktree integrity gate failed — baseline:pass but the returned evidence does not confirm a worktree on disk' })
+  }
+  ledger(firstCard, 'E2.5-worktree', 'VERIFIED', 'porcelain lists the worktree + node_modules present')
+}
 for (const ex of preflight.excluded || []) ledger(ex.card, 'preflight-exclude', 'EXCLUDED', ex.reason)
 if (preflight.workspaceNote) ledger(firstCard, 'G1/G2-workspace', 'AUTO', preflight.workspaceNote)
 if (preflight.crossCard) ledger(firstCard, 'G3-cross-card', 'INFO', preflight.crossCard)
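
Reviewer sketch: the E2.5 gate in this hunk confirms the worktree with `porcelain.includes(wt)`, a substring test. A substring test also accepts a path that is only a prefix of a listed worktree (for example `/repo/.worktrees/feat` against `/repo/.worktrees/feat-2`). The following is one possible stricter variant that compares whole `worktree <path>` lines. It is an illustration under that assumption, not part of the package. `evidenceConfirmsWorktree` is a hypothetical name, and the comparison assumes the agent reports the path exactly as git prints it (no trailing slash).

```javascript
// Hypothetical stricter variant of the E2.5 evidence check.
// Field names (worktreeVerified, worktreeEvidence, ...) follow PREFLIGHT_SCHEMA in the diff.
function evidenceConfirmsWorktree(preflight) {
  const ev = preflight.worktreeEvidence || {};
  const wt = String(preflight.worktreePath || '');
  if (preflight.worktreeVerified !== true || wt === '') return false;
  // Require an exact `worktree <path>` record line rather than a substring hit.
  const listed = String(ev.worktreeListPorcelain || '')
    .split('\n')
    .some((line) => line.trim() === `worktree ${wt}`);
  // Same node_modules test as the shipped gate.
  const hasDeps = /node_modules/.test(String(ev.artifactsLs || ''));
  return listed && hasDeps;
}

// Usage: the claimed path is only a prefix of a real worktree.
const prefixOnly = {
  baseline: 'pass',
  worktreePath: '/repo/.worktrees/feat',
  worktreeVerified: true,
  worktreeEvidence: {
    worktreeListPorcelain: 'worktree /repo\nworktree /repo/.worktrees/feat-2\n',
    artifactsLs: 'node_modules',
  },
};
console.log(evidenceConfirmsWorktree(prefixOnly)); // prints false
```

Like the shipped gate, this still only matches strings the agent returned, so it remains structurally weaker than the orchestrator-Bash gate in classic `/new`.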

package/framework/scripts/validate-card-baseline.js CHANGED

@@ -28,7 +28,6 @@
 const fs = require('fs');
 const path = require('path');
-const yaml = require('js-yaml');
 const SCRIPT_DIR = __dirname;
 const SCHEMA_MD = path.join(SCRIPT_DIR, '..', 'agents', 'card-schema.md');
@@ -141,6 +140,137 @@ function parseEnumBlock(text, header) {
   return vals.length ? vals : null;
 }
+// --- minimal node-core YAML reader (card subset) ----------------------------
+// Why (v4.42.0): the shipped validator must run in ANY consumer via `node <path>`,
+// but the framework payload carries no node_modules — `require('js-yaml')` failed
+// with "Cannot find module" and silently disabled card baseline validation (1b-iii)
+// in every consumer lacking js-yaml. This dependency-free reader parses the regular,
+// machine-generated card YAML (2-space indent, `|`/`>` block scalars, nested maps,
+// scalar/map lists, inline flow collections) well enough for the fields validateCard
+// reads: top-level presence/non-emptiness, group.sequence (as a NUMBER — drives epic
+// detection), group.parent / links.prd (strings), and the scalar enums. Exercised by
+// scripts/check-card-baseline.js Check C — keep them in sync.
+function stripInlineComment(s) {
+  // Drop a trailing ` # comment`, respecting single/double quotes.
+  let inS = false, inD = false;
+  for (let i = 0; i < s.length; i++) {
+    const c = s[i];
+    if (c === "'" && !inD) inS = !inS;
+    else if (c === '"' && !inS) inD = !inD;
+    else if (c === '#' && !inS && !inD && (i === 0 || /\s/.test(s[i - 1]))) return s.slice(0, i).trimEnd();
+  }
+  return s;
+}
+function splitFlow(inner) {
+  // Split a flow collection body on top-level commas (ignore nested [] {} and quotes).
+  const out = [];
+  let depth = 0, inS = false, inD = false, buf = '';
+  for (const c of inner) {
+    if (c === "'" && !inD) inS = !inS;
+    else if (c === '"' && !inS) inD = !inD;
+    if (!inS && !inD) {
+      if (c === '[' || c === '{') depth++;
+      else if (c === ']' || c === '}') depth--;
+      else if (c === ',' && depth === 0) { out.push(buf); buf = ''; continue; }
+    }
+    buf += c;
+  }
+  if (buf.trim() !== '') out.push(buf);
+  return out;
+}
+function parseScalar(raw) {
+  const s = stripInlineComment(String(raw)).trim();
+  if (s === '' || s === '~' || s === 'null') return null;
+  if (s === '[]') return [];
+  if (s === '{}') return {};
+  if (s === 'true') return true;
+  if (s === 'false') return false;
+  const q = s[0];
+  if ((q === '"' || q === "'") && s[s.length - 1] === q && s.length >= 2) return s.slice(1, -1);
+  if (s[0] === '[' && s[s.length - 1] === ']') {
+    const inner = s.slice(1, -1).trim();
+    return inner === '' ? [] : splitFlow(inner).map(parseScalar);
+  }
+  if (s[0] === '{' && s[s.length - 1] === '}') {
+    const inner = s.slice(1, -1).trim();
+    const o = {};
+    if (inner !== '') for (const part of splitFlow(inner)) {
+      const ci = part.indexOf(':');
+      if (ci >= 0) o[part.slice(0, ci).trim()] = parseScalar(part.slice(ci + 1));
+    }
+    return o;
+  }
+  if (/^-?\d+$/.test(s)) return parseInt(s, 10);
+  if (/^-?\d*\.\d+$/.test(s)) return parseFloat(s);
+  return s;
+}
+function parseCardYaml(text) {
+  const rows = [];
+  for (const rawLine of String(text).split('\n')) {
+    const line = rawLine.replace(/\t/g, '  ');           // defensive: tabs → 2 spaces
+    const trimmed = line.trim();
+    if (trimmed === '' || trimmed[0] === '#') continue;   // blank / full-line comment
+    rows.push({ indent: line.length - line.trimStart().length, content: trimmed });
+  }
+  let i = 0;
+  const isSeq = (r) => r.content === '-' || r.content.startsWith('- ');
+  const isBlockScalar = (rest) => /^[|>][+-]?\d*$/.test(rest);
+  function consumeKeyInto(obj, indent) {
+    const m = rows[i].content.match(/^([^:]+):(.*)$/);
+    if (!m) { i++; return; }                              // not a key line — skip defensively
+    const key = m[1].trim();
+    const rest = m[2].trim();
+    i++;
+    if (isBlockScalar(rest)) {
+      const buf = [];
+      while (i < rows.length && rows[i].indent > indent) { buf.push(rows[i].content); i++; }
+      obj[key] = buf.join('\n');                          // value content irrelevant — only non-emptiness
+    } else if (rest === '') {
+      obj[key] = (i < rows.length && rows[i].indent > indent) ? parseBlock(rows[i].indent) : null;
+    } else {
+      obj[key] = parseScalar(rest);
+    }
+  }
+  function parseBlock(indent) {
+    if (i < rows.length && rows[i].indent === indent && isSeq(rows[i])) {
+      const arr = [];
+      while (i < rows.length && rows[i].indent === indent && isSeq(rows[i])) {
+        const dashIndent = rows[i].indent;
+        const item = rows[i].content === '-' ? '' : rows[i].content.slice(2);
+        i++;
+        if (item === '') {
+          arr.push((i < rows.length && rows[i].indent > dashIndent) ? parseBlock(rows[i].indent) : null);
+        } else if (isBlockScalar(item)) {
+          const buf = [];
+          while (i < rows.length && rows[i].indent > dashIndent) { buf.push(rows[i].content); i++; }
+          arr.push(buf.join('\n'));
+        } else if (/^[A-Za-z_][\w.-]*:(\s|$)/.test(item)) {
+          const obj = {};
+          const m = item.match(/^([^:]+):(.*)$/);
+          const rest = m[2].trim();
+          obj[m[1].trim()] = rest === '' ? null : parseScalar(rest);
+          while (i < rows.length && rows[i].indent > dashIndent && !isSeq(rows[i])) consumeKeyInto(obj, rows[i].indent);
+          arr.push(obj);
+        } else {
+          arr.push(parseScalar(item));
+        }
+      }
+      return arr;
+    }
+    const obj = {};
+    while (i < rows.length && rows[i].indent === indent && !isSeq(rows[i])) consumeKeyInto(obj, indent);
+    return obj;
+  }
+  return rows.length ? parseBlock(rows[0].indent) : null;
+}
 // --- profile detection + validation ----------------------------------------
 function detectProfile(card, filename = '') {
@@ -233,7 +363,7 @@ function main(argv) {
   for (const file of files) {
     let card;
     try {
-      card = yaml.load(fs.readFileSync(file, 'utf8'));
+      card = parseCardYaml(fs.readFileSync(file, 'utf8'));
     } catch (e) {
       process.stdout.write(`✖ ${file}: YAML parse error — ${e.message}\n`);
       failed++;
@@ -260,4 +390,4 @@ if (require.main === module) {
   process.exit(main(process.argv));
 }
-module.exports = { validateCard, detectProfile, loadSchema, loadEnums, nonEmpty };
+module.exports = { validateCard, detectProfile, loadSchema, loadEnums, nonEmpty, parseCardYaml };
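
The quote-aware comment stripping in `stripInlineComment` above can be tried in isolation. The following is a minimal sketch, not part of the package: it copies the function from the hunk so it runs standalone, and the sample card lines are hypothetical, chosen only to illustrate the rule that a `#` starts a comment when it sits outside quotes and follows whitespace (or begins the string).

```javascript
// Copied from the hunk above so this snippet runs on its own (no js-yaml needed).
function stripInlineComment(s) {
  let inS = false, inD = false;
  for (let i = 0; i < s.length; i++) {
    const c = s[i];
    if (c === "'" && !inD) inS = !inS;
    else if (c === '"' && !inS) inD = !inD;
    else if (c === '#' && !inS && !inD && (i === 0 || /\s/.test(s[i - 1]))) return s.slice(0, i).trimEnd();
  }
  return s;
}

// Hypothetical sample lines (assumptions for illustration only).
const samples = [
  'status: TODO # pending',    // trailing comment is dropped
  'title: "fix #42 # keep"',   // '#' inside double quotes is kept
  'prd: docs/prd.md#goals',    // '#' not preceded by whitespace is kept
  "note: it's # kept too",     // an unpaired apostrophe opens a quote, so nothing is stripped
];
for (const line of samples) console.log(JSON.stringify(stripInlineComment(line)));
```

The last sample shows a limitation that follows from the code as written: quote state is tracked per character, so an apostrophe in an unquoted value suppresses comment stripping for the rest of that string. For machine-generated cards this is presumably acceptable, but it is worth knowing when hand-editing a card.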

package/framework/templates/ci/check-card-baseline.yml CHANGED

@@ -28,9 +28,6 @@ jobs:
         with:
           node-version: '20'
-      - name: Install js-yaml (validator dependency)
-        run: npm i js-yaml --no-save
       - name: Validate backlog card baseline
         run: |
           VALIDATOR=.framework/framework/scripts/validate-card-baseline.js

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "baldart",
-  "version": "4.41.0",
+  "version": "4.42.0",
   "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
   "bin": {
     "baldart": "./bin/baldart.js"