baldart 4.42.0 → 4.43.1

package/CHANGELOG.md CHANGED
@@ -5,6 +5,29 @@ All notable changes to BALDART will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+ ## [4.43.1] - 2026-06-15
+
+ **The npm publish workflow is now race-safe: pushing several `v*.*.*` tags close together can no longer leave `latest` on a lower version.** Publishing v4.41.0/4.42.0/4.43.0 together exposed the bug — the three tag-push workflow runs executed in **parallel**, and `npm publish` (with no explicit dist-tag) points `latest` at whichever version finishes **last chronologically**, not the highest. Real outcome: `latest` landed on **4.42.0**, so `npx baldart` installed a non-latest version (fixed by hand with `npm dist-tag add baldart@4.43.0 latest`). This release closes the race two ways: (1) a workflow-level **`concurrency` group** (`group: publish-npm`, `cancel-in-progress: false`) serializes all publish runs so they never race on the dist-tag and a publish is never cancelled; (2) a new final step **reconciles `latest` to the highest published version** — it computes the max stable semver across `npm view baldart versions` (unioned with the just-published `VERSION` to survive registry propagation lag, numeric per-segment compare so `4.10.0 > 4.9.0`) and repoints `latest` only when it has drifted. Because the runs are serialized, the last run always leaves `latest` on the maximum regardless of push order. The existing "Verify tag matches VERSION" guard is untouched. **PATCH** (CI/release-machinery bugfix, no change to installed surface or `baldart.config.yml` ⇒ schema-change propagation rule N/A).
+
+ ### Fixed
+
+ - **`.github/workflows/publish-npm.yml` — race-safe `latest` dist-tag.** Added a `concurrency` group to serialize parallel tag-push publish runs, and a `Reconcile 'latest' dist-tag to highest published version` step that repoints `latest` at `max(npm view baldart versions ∪ VERSION)` via a self-contained numeric semver comparator (no `semver` dependency), only when `latest` drifts. Fixes `latest` landing on a lower version when multiple tags are pushed close together.
+
+ ### Changed
+
+ - **`MAINTAINING.md` — release protocol note** that pushing multiple tags at once is now safe (since v4.43.1) and why.
+
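The reconcile logic described in this entry can be sketched in shell. This is a minimal illustration under stated assumptions, not the contents of `publish-npm.yml`: the function names `max_stable_semver` and `reconcile_latest` are hypothetical, and the JSON handling assumes `npm view <pkg> versions --json` prints the versions as quoted strings.

```shell
# Sketch only: function names are hypothetical, not taken from publish-npm.yml.

# Print the highest stable X.Y.Z version read from stdin (one per line).
# Pre-releases such as 4.44.0-beta.1 are dropped, and each segment is compared
# as a number, so 4.10.0 sorts above 4.9.0.
max_stable_semver() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}

# Repoint `latest` only when it has drifted from the maximum.
# $1 = package name, $2 = the version that was just published.
reconcile_latest() {
  local pkg="$1" just_published="$2"
  local published highest current
  # Assumption: the registry answers with a JSON array of quoted versions.
  published="$(npm view "$pkg" versions --json | grep -oE '"[^"]+"' | tr -d '"')"
  # Union with the just-published version to survive propagation lag.
  highest="$(printf '%s\n%s\n' "$published" "$just_published" | max_stable_semver)"
  current="$(npm view "$pkg" dist-tags.latest)"
  if [ -n "$highest" ] && [ "$current" != "$highest" ]; then
    npm dist-tag add "$pkg@$highest" latest
  fi
}
```

For example, `printf '4.9.0\n4.10.0\n' | max_stable_semver` prints `4.10.0`, which a plain lexical sort would get wrong.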
+ ## [4.43.0] - 2026-06-15
+
+ **The classic `/new` team-mode director stops bloating its own context by legitimately running the review cluster inline — the delegation gate is hardened to the same no-discretion enforcement the sequential path already carries.** A real run analysis (the same FEAT-0028/0029 investigation that produced v4.42.0) found the dominant orchestrator-context driver in team mode is NOT the review token volume (that already runs out-of-context in the `new-card-review` workflow) but the **director's turn count**: the team-mode delegation gate `D.1.6` was labelled "opt-in, additive" with a soft `IF available → delegate`, while the sequential path (`review-cycle.md` Phase 2.5x) carries a strict **MECHANICAL · NO discretion · MUST delegate · GATE VIOLATION if inline** clause. That asymmetry let a director legitimately run the chattiest sub-steps (the per-card Codex loop + per-card Simplify fan-out) inline, re-reading a growing prefix every turn. This release ports the strong clause verbatim into `D.1.6`, plus two pure-distillation fixes (batch the D.1.5 per-card diffs to `/tmp` and read only the compact summary; the same diffs-to-disk discipline on the sequential path for symmetry). Also lands the one **safe** survivor of the deferred worktree-prose cleanup: the `/nw` code-mode now **auto-appends** `.worktrees/` to `.gitignore` (idempotent + WARN) instead of leaving a programmatic `/new` run to create a git-tracked worktree. An adversarial review refuted the rest of that cleanup and a separate review-gate proposal (per-card AC-conformance gate + a second relevance-gating layer) as net-negative/regressive — both deferred. **MINOR** (FIX A changes which path runs by default in team mode — an observable behavior change; no removed surface, no new `baldart.config.yml` key ⇒ schema-change propagation rule N/A).
+
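The hardened `D.1.6` gate is a pure function of a few observable facts, which a short shell sketch can make concrete. The function name `review_cluster_route`, its `yes`/`no` arguments, and the mapping of a trivial group to the fallback are assumptions made for illustration; the framework states the gate in prose for the director, not as a script.

```shell
# Hypothetical sketch of the D.1.6 decision. Only the three inputs below are
# consulted; wave size, diff size, or "inline feels safer" are deliberately
# not parameters.
# $1 = "yes" if the Workflow tool is available (detection is out of scope here)
# $2 = project root that may contain .claude/workflows/new-card-review.js
# $3 = "yes" if the group is non-trivial
review_cluster_route() {
  local workflow_available="$1" root="$2" non_trivial="$3"
  if [ "$workflow_available" = "yes" ] \
    && [ -f "$root/.claude/workflows/new-card-review.js" ] \
    && [ "$non_trivial" = "yes" ]; then
    echo "delegate"
  else
    # Assumption: every other combination takes the inline D.2 to D.6 fallback.
    echo "inline-fallback"
  fi
}
```

Because no judgment-call input exists in the signature, the only way to reach the inline path while the conditions hold is to bypass the function, which is what the `GATE VIOLATION` log line is meant to surface.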
+ ### Changed
+
+ - **`framework/.claude/skills/new/references/team-mode.md` — `D.1.6` delegation gate hardened** (`v4.34.0 → enforcement hardened`). The "opt-in, additive" framing is replaced with the verbatim no-discretion clause from `review-cycle.md` Phase 2.5x ("MECHANICAL, BINARY decision — you have NO discretion … the factors that feel like reasons to run inline are NOT inputs to this gate"), plus the twin telemetry log `D.1.6: GATE VIOLATION — ran inline despite Workflow available + new-card-review.js linked + non-trivial group`. Forces the cluster out of the director's context whenever the `Workflow` tool + `new-card-review.js` are present and the group is non-trivial.
+ - **`framework/.claude/skills/new/references/team-mode.md` — D.1.5 diff batching (context economy).** The per-card effective-profile computation is now an explicit ONE-bash pass that writes each card's diff to `/tmp/diff-<CARD-ID>.txt` and derives `source|non-source` + doc-touch + the Step-A trigger count there; only the compact per-card summary + `--name-only` lists enter the director's context, never the raw diffs.
+ - **`framework/.claude/skills/new/references/review-cycle.md` — sequential-path symmetry.** Phase 2.5x input-building writes the card's diff to `/tmp/diff-<CARD-ID>.txt` once and derives both the Step-A grep and `scopeFiles` from it (same diffs-to-disk discipline as team-mode, so the two paths stay symmetric).
+ - **`framework/.claude/skills/worktree-manager/SKILL.md` — `/nw` code-mode `.gitignore` auto-heal.** When `.worktrees/` is not ignored, the code-mode path now auto-appends it (idempotent) and WARNs, instead of either silently creating a git-tracked worktree or hard-aborting (`exit 1`) on a non-attended `/new` run. (nw-docs keeps its interactive `exit 1` — that path is human-driven.)
+
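The diffs-to-disk discipline from the entries above can be sketched as one shell pass. Everything specific here is an assumption for illustration: the helper names, the `CARD-ID:git-range` argument convention, the file-extension lists, and the `STEP_A_PATTERN` default are placeholders, not the framework's real Step-A detector or source classifier.

```shell
# Hypothetical helper: classify one card from its on-disk diff and print a
# single compact summary line. Only this line is meant to enter the context.
summarize_card_diff() {
  local card_id="$1" diff_file="$2"
  local files kind docs triggers
  # Placeholder trigger pattern; the real Step-A detector is not shown in the source.
  local step_a_pattern="${STEP_A_PATTERN:-auth|secret|migration}"
  # Changed paths, taken from the "+++ b/<path>" headers of a unified diff.
  files="$(sed -n 's|^+++ b/||p' "$diff_file")"
  kind="non-source"
  if printf '%s\n' "$files" | grep -qE '\.(js|ts|tsx|py|go|rs)$'; then kind="source"; fi
  docs="no"
  if printf '%s\n' "$files" | grep -qE '\.md$'; then docs="yes"; fi
  # Count added lines (file headers excluded) that match the trigger pattern.
  triggers="$(grep -E '^\+' "$diff_file" | grep -vE '^\+\+\+ ' | grep -cE "$step_a_pattern" || true)"
  printf '%s: triggers=%s diff=%s docs=%s\n' "$card_id" "$triggers" "$kind" "$docs"
}

# One pass over the whole group: each raw diff goes to /tmp, never to stdout.
# Arguments are "CARD-ID:git-range" pairs (a hypothetical calling convention).
batch_card_diffs() {
  local pair card_id range
  for pair in "$@"; do
    card_id="${pair%%:*}"
    range="${pair#*:}"
    git diff "$range" > "/tmp/diff-$card_id.txt"
    summarize_card_diff "$card_id" "/tmp/diff-$card_id.txt"
  done
}
```

A call such as `batch_card_diffs "FEAT-0028:main...HEAD"` would then emit one summary line per card while the full diffs stay in `/tmp/diff-<CARD-ID>.txt`.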
  ## [4.42.0] - 2026-06-15
 
  **The `/new` worktree-setup can no longer pass on a fabricated baseline: the orchestrator now verifies the worktree on disk instead of trusting the subagent's self-report, and the setup subagent moves off `haiku`.** A real `/new` run reproduced **2/2** a silent failure — the background **worktree-setup** subagent (on `haiku`) returned a well-formed block reporting `baseline: pass` in ~6s **with no worktree on disk**: it pattern-matched the expected output instead of running the multi-step `/nw` skill, and the orchestrator trusted it because the baseline gate only ever checked the *returned* field, never the disk. This release closes both halves: (1) the worktree-setup subagent moves **`haiku → sonnet`** (running `/nw` via the Skill tool is a sustained tool-execution chain a too-weak model fabricates rather than executes); (2) a new **worktree integrity gate** (`setup.md §6a`) verifies the worktree with **orchestrator Bash** — `git worktree list --porcelain` + `test -d` + branch + `node_modules` — which a subagent cannot fabricate, and routes any failure to a **non-circular fallback chain** (`subagent → inline /nw → HALT`, no loop, with a build `timeout`). `new2`'s pre-flight gets the parallel mitigation (it cannot run Bash, so it returns non-falsifiable evidence the workflow string-matches — explicitly declared structurally weaker). Separately, the card-baseline validator is now **dependency-free**: it parsed cards with `js-yaml`, absent from the framework payload, so `require('js-yaml')` failed in consumers and silently disabled card-baseline validation (1b-iii) — replaced with a node-core `parseCardYaml`. **MINOR** (additive integrity gate + a behavior change to the worktree subagent model + a dependency-removing bugfix; no removed surface, the `/nw` `{path,branch,port}` contract is unchanged, and no new `baldart.config.yml` key ⇒ schema-change propagation rule N/A).
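The four on-disk checks named for the worktree integrity gate can be sketched as a single shell function. This is an illustrative reading of the description, not the text of `setup.md §6a`: the name `verify_worktree`, its arguments, and the failure messages are hypothetical, and the fallback chain and build `timeout` are left out.

```shell
# Hypothetical helper. $1 = expected worktree path, $2 = expected branch name.
# Returns 0 only when every check passes against the real filesystem.
verify_worktree() {
  local wt_path="$1" branch="$2"
  local abs actual
  # 1. The directory must exist on disk.
  [ -d "$wt_path" ] || { echo "FAIL: no directory at $wt_path" >&2; return 1; }
  abs="$(cd "$wt_path" && pwd -P)"
  # 2. git itself must list that path as a worktree.
  git -C "$wt_path" worktree list --porcelain 2>/dev/null \
    | grep -qxF "worktree $abs" \
    || { echo "FAIL: $abs is not a registered worktree" >&2; return 1; }
  # 3. The worktree must be on the expected branch.
  actual="$(git -C "$wt_path" rev-parse --abbrev-ref HEAD)"
  [ "$actual" = "$branch" ] \
    || { echo "FAIL: on branch '$actual', expected '$branch'" >&2; return 1; }
  # 4. Dependencies must have been installed.
  [ -d "$wt_path/node_modules" ] \
    || { echo "FAIL: node_modules missing in $abs" >&2; return 1; }
  echo "OK: $abs on $branch"
}
```

The point of running this from the orchestrator is that the result depends on the filesystem and on git's own bookkeeping, so a well-formed but fabricated `baseline: pass` block from a subagent cannot satisfy it.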
package/VERSION CHANGED
@@ -1 +1 @@
- 4.42.0
+ 4.43.1
package/framework/.claude/skills/new/references/review-cycle.md CHANGED
@@ -37,7 +37,7 @@ so it surfaces in telemetry.
  `skip`/`light` ⇒ `qaTier:"light"` (qa-sentinel deferred to Final), `deep` **or any Phase 3.7 Step-A
  high-risk trigger on this card's diff** ⇒ `qaTier:"full"`). Compute the **Step-A detector**
  (`references/codex-gate.md` Step A) once here — it also tells the workflow's Codex pass the depth.
- - `scopeFiles` ← this card's committed diff (`git diff --name-only "$TRUNK...HEAD"`, fallback `HEAD~1..HEAD`).
+ - `scopeFiles` ← this card's committed diff (`git diff --name-only "$TRUNK...HEAD"`, fallback `HEAD~1..HEAD`). **Diffs to disk (context economy, v4.43.0):** write the card's full diff to `/tmp/diff-<CARD-ID>.txt` once and derive BOTH the Step-A grep (above) and this `--name-only` list from it — never read the raw diff inline (mirrors the team-mode D.1.5 batching discipline, so the two paths stay symmetric).
  - `editableFiles` ← this card's **File Ownership Map** entries (the coder's write scope; `setup.md` step 3b).
  - `archBaselinePath` ← `/tmp/arch-baseline-<CARD-ID>.md` (persisted at `implement.md` step 5b).
  - `hasSecurityFiles` ← any `scopeFiles` path matches `paths.high_risk_modules`.
package/framework/.claude/skills/new/references/team-mode.md CHANGED
@@ -162,12 +162,23 @@ After ALL agents in the group complete successfully:
  - **Sub-classify `TRIVIAL_CARDS` ⊆ `LIGHT_CARDS`** = cards that are `IS_TRIVIAL` on the committed diff (§ "Trivial-card fast-lane": `review_profile == skip` AND 0 Step-A triggers AND **non-source diff**). These are the LIGHT cards that have nothing for `code-reviewer` to review. (A `skip` card whose diff DID touch a source file is in `LIGHT_CARDS` but NOT `TRIVIAL_CARDS` — the guard keeps it on the code-review path.)
  - **Sub-classify `DOC_DEFER_CARDS`** (since v4.7.0, for #2 doc deferral) = cards with `review_profile == light` whose committed diff touches **NO documentation file** (no `.md`, no path under `${paths.references_dir}`, no data-model/ssot/api doc). Their per-card doc-review is deferred to the Final F.3 doc-reviewer. (Trivial cards, and any card whose diff touches a doc file, are NOT in this set — doc-review stays relevant for them.)
  - Log: `## D.1.5 Effective Profiles\n<CARD-ID>: profile=<floor> triggers=<n> diff=<source|non-source> → effective=<light|full> (<LIGHT_CARDS|FULL_CARDS>)<, TRIVIAL / DOC_DEFER if applicable>` per card. This single computation is the SSOT for D.2 (doc-reviewer scoping), D.3b/D.3c (already skipped for trivial), and D.4b (inclusion + per-card Codex profile: `light` cards → `/codexreview` `light`, `full` cards → `full`) — do NOT recompute it downstream.
+ - **Batching (context economy, v4.43.0):** compute the whole group in ONE bash pass, not N inline round-trips — write each card's committed diff to `/tmp/diff-<CARD-ID>.txt`, and in the SAME script derive per-card `source|non-source`, doc-file touch, and the Step-A trigger count (grep over the `/tmp` file). Read only the compact per-card summary (the `## D.1.5 Effective Profiles` log above) + the `--name-only` lists into context — **never the raw diffs inline** (§ "Context economy" — diffs to disk; same discipline D.3b already uses for its scope diff).
 
- 1.6. **D.1.6 — Review-cluster workflow delegation gate (v4.34.0 opt-in, additive)** — The group's
+ 1.6. **D.1.6 — Review-cluster workflow delegation gate (v4.34.0; enforcement hardened v4.43.0)** — The group's
  code-review cluster — **D.3b Simplify + D.4 QA + D.4b Codex** (per-card discovery) **and their fix
  application** — runs ONCE per wave OUTSIDE this orchestrator's context when delegated to a dynamic
  workflow. This is the single biggest context-economy win in team mode (the D.4b per-card Codex loop +
- the D.3b per-card Simplify fan-out are the chattiest sub-steps).
+ the D.3b per-card Simplify fan-out are the chattiest sub-steps — running them inline is precisely what
+ bloats the director's context window across the wave).
+ **This is a MECHANICAL, BINARY decision — you have NO discretion** (same enforcement as the sequential
+ path, `references/review-cycle.md` § "Phase 2.5x — Branch"): if the delegation conditions below hold you
+ **MUST** delegate. The factors that feel like reasons to "just run it inline" — a single-card wave, a
+ small/well-scoped diff, "the context-economy benefit is marginal here", "a workflow once degenerated on
+ this project" (that concerns the `new2` whole-batch host — a *different* workflow), "inline is safer / I
+ have full control" — are **NOT inputs to this gate** and **MUST NOT** override it. The inline D.2→D.6
+ fallback exists **only** for installs where the `Workflow` tool is genuinely absent (the ELSE branch
+ below), never as a judgment-call alternative. If you run inline while the delegation conditions were met,
+ that is a **gate violation**: log `D.1.6: GATE VIOLATION — ran inline despite Workflow available + new-card-review.js linked + non-trivial group` so it surfaces in telemetry.
 
  - **IF** the `Workflow` tool is available **AND** `.claude/workflows/new-card-review.js` is present →
  **delegate the cluster for the whole group in ONE call.** First run **D.3a (AC-Closure Gate)** for
package/framework/.claude/skills/worktree-manager/SKILL.md CHANGED
@@ -423,6 +423,16 @@ Supports three modes:
  # persist both onto the registry entry created in step 6 (R6). Every later step
  # reads them from the registry with a presence guard — never from in-context state.
  MAIN="$(git rev-parse --show-toplevel)"
+ # .gitignore safety (auto-heal, NON-blocking — code-mode differs from nw-docs, which
+ # hard-aborts). Without `.worktrees/` ignored the worktree is git-tracked and `git status`
+ # on the main repo explodes. The programmatic /new path is NON-attended, so we MUST NOT
+ # `exit 1` here (it would break a batch that today completes dirty-but-successful); instead
+ # auto-append the line (idempotent, lossless — owning a one-line .gitignore edit is
+ # framework-legitimate) and WARN. The grep guard means we append at most once.
+ if [ ! -f "$MAIN/.gitignore" ] || ! grep -qE '^\.worktrees/?$' "$MAIN/.gitignore"; then
+ printf '\n.worktrees/\n' >> "$MAIN/.gitignore"
+ echo "WARN: appended '.worktrees/' to $MAIN/.gitignore (was missing — the worktree would otherwise be git-tracked)." >&2
+ fi
  # Prefer git.trunk_branch; autodetect the repo's real default branch if the
  # config key is absent (consumer updated to >=4.0.0 without re-running
  # `configure`). Same fallback as nw-docs — never hard-fail on a resolvable repo.
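The idempotency claim for the `.gitignore` auto-heal in this hunk can be checked by wrapping the same guard in a function and calling it twice. The wrapper name `ensure_worktrees_ignored` and its directory argument are hypothetical; the `grep` guard and the `printf` append are the ones added in the hunk.

```shell
# Hypothetical wrapper around the guard shown in the hunk above.
# $1 = repository root (the hunk uses "$MAIN" for this).
ensure_worktrees_ignored() {
  local gitignore="$1/.gitignore"
  # Append only when the file is missing or has no `.worktrees` / `.worktrees/` line.
  if [ ! -f "$gitignore" ] || ! grep -qE '^\.worktrees/?$' "$gitignore"; then
    printf '\n.worktrees/\n' >> "$gitignore"
    echo "WARN: appended '.worktrees/' to $gitignore (was missing)." >&2
  fi
}
```

Calling it twice on the same directory appends the line and warns on the first call only; the second call finds the line and does nothing. Note that the guard matches the exact lines `.worktrees` and `.worktrees/`, so a variant such as `/.worktrees/` would still trigger an append.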
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "baldart",
- "version": "4.42.0",
+ "version": "4.43.1",
  "description": "Claude Agent Framework - Reusable framework for coordinating AI agents and humans in software projects",
  "bin": {
  "baldart": "./bin/baldart.js"