baldart 4.47.0 → 4.48.0
- package/CHANGELOG.md +8 -0
- package/VERSION +1 -1
- package/framework/.claude/skills/new2/SKILL.md +61 -10
- package/package.json +1 -1
package/CHANGELOG.md
CHANGED
@@ -5,6 +5,14 @@ All notable changes to BALDART will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [4.48.0] - 2026-06-16
+
+**`new2` is relaxed: a post-batch, interactive-only escape hatch hands genuinely-blocked hard cases to `/new`'s real human gate — without re-introducing rigidity, a twin, or breaking the A/B.** `new2` runs the batch autonomously, so a card the deterministic policy can't salvage becomes a tracked follow-up + is left `IN_PROGRESS` — the "edge case that wants human intelligence" the user flagged. A 3-lens **adversarial review before implementation** killed the obvious fix (a Step-5 `AskUserQuestion` that lets the skill implement/resolve the residual): (1) **correctness** — the workflow auto-merges and removes the worktree *before* Step 5, so a skill-side fix has no worktree, lands unreviewed code on trunk, and bypasses the F-029 DONE gate; (2) **value** — on the only run with a full `deferral_breakdown` (14 residuals) a post-batch fix changes the outcome of *zero* of them (the batch already merged); only a mid-batch checkpoint could salvage-before-merge, and the data (n=1) doesn't justify breaking `new2`'s background/no-poll contract; (3) **prior-art** — it would twin `/new`'s Phase 2.5b AC-Closure gate and muddy the autonomous-vs-`/new` A/B. What survived all three: the sound escalation is **not to re-implement a gate but to invoke `/new` on the already-materialised follow-up** — `/new` owns the real worktree+review+AC-Closure+F-029+merge pipeline. **MINOR** (additive skill behaviour, interactive-only, autonomous-mode-safe; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed surface).
+
+### Changed
+
+- **`framework/.claude/skills/new2/SKILL.md`** — new **Step 3b "Escape-hatch escalation"** in the skill's post-batch reconciliation: in INTERACTIVE mode only (skipped when `BALDART_AUTONOMOUS`/`CI`/`GITHUB_ACTIONS` is set), after follow-ups are materialised on disk (offline-safe ordering preserved — the offer is additive over an already-safe ledger) and any `degraded` resume has converged, it presents **one batched `AskUserQuestion`** offering to run `/new` on the **code-actionable** hard-case follow-ups (`deferralClass ∈ {unresolved, out-of-ownership, scope-expansion}`; `owner-gated`/`not-a-code-defect`/`policy-deferred-ac` excluded — `/new` can't perform infra steps). "Sì" invokes `/new <followup-id …>` via the Skill tool (which closes them through its own gates — the skill never marks DONE itself, never re-implements the gate). The **ZERO-ASK CONTRACT** banner is rewritten to scope it precisely to *the workflow during the batch* (the skill may interact pre-launch AND post-batch, interactive-only). New `escape_hatch` telemetry field. Honest limitation documented: post-batch, so it gives the human gate on the follow-up but does NOT salvage a card before its merge (that would need a mid-batch checkpoint — out of scope by design).
+
 ## [4.47.0] - 2026-06-16
 
 **`/new`'s orchestrator context economy is re-aimed at its real driver — turn count — and the user-visible Progress Bar + native Task spine are removed.** Telemetry of two real 8-card batches (FEAT-0028/0029 on a consumer) showed the orchestrator paying ~285M `cache_read`: 613 turns each replaying a ~490k-token accumulated context (growing toward ~800k), so total cost ≈ **turn count × accumulated context**. A 3-lens **adversarial review before implementation** refuted the obvious diagnoses: the static prefix is only ~77k (not the ~225k first assumed — context is ~86% *accumulated*, not static); narration prose is only ~7% of the fuel; and the existing § "Context economy" (bulk-content-inline) rule targets a channel that totals only ~119k cumulatively. The measurement reviewer surfaced the actual missed lever — **0 of 274 tool turns batched any calls, and ~55% of turns carried no tool call at all** — and the correctness + prior-art reviewers established that delegating bookkeeping out of the orchestrator is a previously-trodden trap (v4.15.0 reverted a Write-from-memory tracker flush; the tracker is the recovery SSOT; `card_status: DONE` needs orchestrator-side disk re-read; the weak-subagent fabrication precedent applies). What survived: (1) a new turn-economy HARD RULE (batch independent tool calls; no narration-only turns; never poll/wait), and (2) since the Progress Bar + Task spine are pure *mirrors* of the internal tracker (recovery reads the tracker, never the spine), removing them is correctness-safe and eliminates ~45 dedicated visibility turns (~8% of a batch's `cache_read`, guaranteed, not batching-dependent). **MINOR** (skill behaviour change; **no new `baldart.config.yml` key** ⇒ schema-change propagation rule N/A; no removed agent/command/skill/routine).
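As a rough sanity check on the 4.47.0 entry's cost model (total cost ≈ turn count × accumulated context), the figures it quotes can be multiplied out. This is only a back-of-envelope sketch: it assumes a flat ~490k tokens per turn, whereas the entry says the context keeps growing, so the result should match the reported ~285M only in order of magnitude. The variable names are illustrative.

```python
# Figures quoted in the 4.47.0 changelog entry; names are illustrative.
turns = 613                          # orchestrator turns
context_tokens = 490_000             # ~accumulated context replayed per turn
reported_cache_read = 285_000_000    # ~285M cache_read reported by telemetry

# Assumption: every turn replays the same ~490k context (a simplification).
estimated = turns * context_tokens
ratio = estimated / reported_cache_read

print(f"estimated ~{estimated / 1e6:.0f}M tokens, {ratio:.2f}x the reported figure")
# prints: estimated ~300M tokens, 1.05x the reported figure
```

The estimate lands within about 5% of the reported number, which is consistent with the entry's claim that turn count, not the static prefix, drives the cost.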
package/VERSION
CHANGED
@@ -1 +1 @@
-4.
+4.48.0
package/framework/.claude/skills/new2/SKILL.md
CHANGED
@@ -5,9 +5,11 @@ description: >
 EXPERIMENTAL workflow-hosted variant of /new (A/B testing). Implements one or
 more backlog cards end-to-end by delegating the WHOLE batch to a background
 dynamic workflow — so subagent output never enters the main orchestrator
-context.
-a deterministic policy + a self-healing resolution pass
-
+context. The batch runs autonomously (zero AskUserQuestion during the run): every
+/new gate is replaced by a deterministic policy + a self-healing resolution pass;
+in interactive mode an optional post-batch escape hatch can hand the hard-case
+follow-ups to /new for the real human gate. Claude-only (needs the Workflow tool).
+Usage: /new2 CARD-IDS (same arg grammar as /new). Triggers on:
 /new2, "implementa le card con workflow", "new2".
 ---
 
@@ -17,12 +19,18 @@ description: >
 > default, and the recovery-safe path. Do NOT route to `new2` unless the user
 > explicitly asks for it.
 
-> **ZERO-ASK CONTRACT
->
-> gate is replaced by a deterministic
-> fail → self-healing `new2-resolve`, or — last
-> tracked follow-up card).
->
+> **ZERO-ASK CONTRACT — scoped to the *batch*, not the skill.** A dynamic workflow
+> cannot prompt the user mid-run, so the **workflow runs the entire batch autonomously**:
+> every `/new` `AskUserQuestion` gate *inside the batch* is replaced by a deterministic
+> policy (auto-resolve seamless defaults, or fail → self-healing `new2-resolve`, or — last
+> resort — auto-materialise a tracked follow-up card). The **skill main loop** (which CAN
+> prompt) may interact at exactly two boundaries that are NOT mid-batch: **pre-launch**
+> (Step 2 card-ID question, Step 3.5 migration gate) and **post-batch** (Step 3b
+> escape-hatch escalation + Step 5 reconciliation) — both are interactive-only and skipped
+> in autonomous mode (`BALDART_AUTONOMOUS`/`CI`/`GITHUB_ACTIONS`). The zero-ask invariant is
+> about the **workflow during the batch**, which stays untouched. Destructive/outward ops
+> (`reset --hard`, force-push, stash drop) are NEVER auto-run; they degrade to "leave intact
+> + report".
 
 ## Project Context
 
@@ -226,10 +234,49 @@ returns when the batch is done. It returns:
 per-card **skip-completed** guard makes the resume idempotent — already-committed
 cards are skipped, only the incomplete/blocked ones run. Repeat until `degraded`
 is false (or the same cards stall twice → surface to the user).
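The resume rule in the context lines above (repeat until `degraded` is false, or stop when the same cards stall twice) could be sketched roughly as follows. This is not the skill's actual implementation: `BatchResult`, `resume_until_converged`, and the `resume` callback are hypothetical names, and "stall twice" is read here as "the resumed run reports the same incomplete set as the run before it".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BatchResult:
    """Hypothetical shape of a workflow return value."""
    degraded: bool
    incomplete: frozenset[str] = frozenset()


def resume_until_converged(
    first: BatchResult,
    resume: Callable[[frozenset[str]], BatchResult],
) -> tuple[BatchResult, bool]:
    """Return (final_result, needs_user).

    needs_user is True when a resume leaves exactly the same cards
    incomplete, i.e. they stalled twice in a row.
    """
    result = first
    while result.degraded:
        previous = result.incomplete
        # Already-committed cards are assumed to be skipped by the workflow
        # itself, so only the incomplete ones are passed along.
        result = resume(previous)
        if result.degraded and result.incomplete == previous:
            return result, True  # same cards stalled twice: surface to the user
    return result, False
```

Comparing the incomplete set between consecutive runs is one simple way to detect a stall without keeping a per-card retry counter.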
+3b. **Escape-hatch escalation for the hard cases (INTERACTIVE mode only — the `new2`
+"relaxation").** `new2` is autonomous *during the batch* — but a genuinely-blocked
+card (the workflow rolled it back / left it `IN_PROGRESS`, DoD not met) is exactly the
+"edge case that wants human intelligence" the deterministic policy cannot supply. The
+sound way to give that intelligence is NOT to re-implement a gate here (that would twin
+`/new`'s Phase 2.5b and bypass review/F-029) — it is to hand the card's **already-tracked
+follow-up** to `/new`, which owns the real per-card pipeline (worktree + review + the
+interactive AC-Closure gate + F-029 + gated merge). Ordering is load-bearing: this runs
+**after** step 1 materialised every follow-up on disk and step 3's resume converged, so
+the offer is purely additive over an already-safe ledger — declining (or a closed
+terminal) never drops a residual.
+- **Skip this step entirely in AUTONOMOUS mode** (env `BALDART_AUTONOMOUS` / `CI` /
+`GITHUB_ACTIONS` set, or no TTY) — leave the cards `IN_PROGRESS` + their follow-ups,
+exactly as before. The escape hatch is interactive-only.
+- **Eligible set** = the follow-ups whose residual `deferralClass` is **code-actionable**:
+`unresolved`, `out-of-ownership`, `scope-expansion`. EXCLUDE `owner-gated` /
+`not-a-code-defect` / `policy-deferred-ac` (external infra steps — `/new` cannot perform
+a DB deploy / secret / DNS action, so escalating them is noise; they stay tracked
+follow-ups). If the eligible set is empty → skip silently.
+- In interactive mode, present **ONE batched `AskUserQuestion`** (never one-per-residual —
+that would re-introduce the ~25-question profile `new2` exists to remove): *"N card sono
+rimaste IN_PROGRESS / con residui code-actionable (DoD non soddisfatta) — i follow-up
+sono già tracciati su disco. Vuoi che lanci `/new` su quei follow-up adesso, per chiuderli
+col gate umano completo?"* Options: **[Sì — lancia `/new` sui follow-up]** / **[No —
+lasciali tracciati]**.
+- **Sì** → invoke `/new <followup-id …>` via the **Skill tool**, passing the materialised
+follow-up card IDs. `/new` runs its full pipeline on the current trunk; do NOT
+re-implement any of it here and do NOT mark anything DONE yourself — `/new` closes each
+follow-up through its own gates. (This is post-batch follow-up work at the skill layer —
+the same class as the Step 3.5 / Step 5 skill interactions; the autonomous workflow has
+already returned, so the zero-ask-**during-batch** invariant is untouched.)
+- **No** → leave as-is (prior behaviour).
+- **Honest limitation (do not over-sell):** this is post-batch — it gives the human the real
+gate on the *follow-up*, but it does NOT salvage a card *before* its merge (the workflow
+already merged the committed cards). Pre-merge salvage would require a mid-batch checkpoint
+(out of scope by design — the workflow is autonomous).
+- Record `escape_hatch: { eligible: N, offered: <bool>, ran_new: <bool>, followups: [...] }`
+in telemetry (step 5 below) so the A/B stays honest about when the hatch was used.
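The decision logic that Step 3b describes (skip in autonomous mode, filter to the code-actionable classes, offer once, record the outcome) might look something like the sketch below. It is an illustration only: the follow-up record shape (`id`, `deferralClass` keys) and every function name are assumptions, and "set" is read literally as "the variable is present in the environment". The actual `AskUserQuestion` prompt and the `/new` invocation go through the skill's tools and are not modelled here.

```python
from typing import Mapping, Sequence

# Class names and environment variables are taken from the Step 3b text.
CODE_ACTIONABLE = {"unresolved", "out-of-ownership", "scope-expansion"}
AUTONOMOUS_ENV_VARS = ("BALDART_AUTONOMOUS", "CI", "GITHUB_ACTIONS")


def is_autonomous(env: Mapping[str, str], has_tty: bool) -> bool:
    """Autonomous when any marker variable is set, or there is no TTY."""
    return any(name in env for name in AUTONOMOUS_ENV_VARS) or not has_tty


def eligible_followups(followups: Sequence[Mapping[str, str]]) -> list[str]:
    """IDs of follow-ups whose deferralClass is code-actionable."""
    return [f["id"] for f in followups if f.get("deferralClass") in CODE_ACTIONABLE]


def plan_escape_hatch(
    followups: Sequence[Mapping[str, str]],
    env: Mapping[str, str],
    has_tty: bool,
) -> dict:
    """Build the escape_hatch record before any question is asked.

    ran_new starts False; the caller would flip it only after the user
    accepts and /new has actually been invoked.
    """
    eligible = eligible_followups(followups)
    offered = bool(eligible) and not is_autonomous(env, has_tty)
    return {
        "eligible": len(eligible),
        "offered": offered,
        "ran_new": False,
        "followups": eligible,
    }
```

In a real run the inputs would come from something like `os.environ` and `sys.stdin.isatty()`; passing them in as arguments keeps the sketch easy to exercise without touching the process environment.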
 4. **Present.** Print `report` verbatim. Surface `residuals` prominently
 ("questi residui sono tracciati come follow-up: …") — the post-run review that
 replaced the ~25 mid-run questions. If `degraded`, say so plainly (the run was
-incomplete and resumed).
+incomplete and resumed). If the escape hatch ran `/new` (step 3b), fold its outcome
+into the presentation (which follow-ups were closed by `/new`).
 5. **Record truthful telemetry — reconciled against disk (F-040).** Before appending `telemetry`
 to `${metricsDir}/skill-runs.jsonl`, fill the fields the workflow could not compute and
 **reconcile the report against the real disk state** (agent `reason` strings can over-claim — a
@@ -264,6 +311,10 @@ returns when the batch is done. It returns:
 already satisfied (work the skill used to suppress by hand; a persistently high value signals
 deferrals resolving too late — order the dependent card earlier), and `owner_gated_deduped` > 0
 means N defers were collapsed to one external action.
+Also record `escape_hatch: { eligible, offered, ran_new, followups }` (Step 3b) — it keeps the
+A/B honest about when the post-batch human escalation was used and whether the user chose to run
+`/new` on the hard-case follow-ups (vs leaving them tracked). In autonomous mode it is
+`{ eligible:N, offered:false, ran_new:false }`.
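Appending the `escape_hatch` field to `${metricsDir}/skill-runs.jsonl` could be done along these lines. This is a minimal sketch assuming one JSON object per line; `append_run_record` and the other keys in the example record are hypothetical, and only the file name and the `escape_hatch` field names come from the text above.

```python
import json
from pathlib import Path


def append_run_record(metrics_dir: Path, telemetry: dict, escape_hatch: dict) -> Path:
    """Append one run record, including escape_hatch, as a single JSON line."""
    record = {**telemetry, "escape_hatch": escape_hatch}
    path = metrics_dir / "skill-runs.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier runs in the log are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


# Autonomous-mode shape described above: nothing offered, nothing run.
autonomous_hatch = {"eligible": 3, "offered": False, "ran_new": False}
```

Keeping the record on a single line means each run can be read back independently with `json.loads` on that line.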
 Do NOT re-summarise the cards — the workflow already did.
 6. **Process hygiene — reap orphaned Codex MCP servers (NON-BLOCKING).** The batch's per-card Codex
 finder calls drive `codex app-server`, whose broker spawns the `~/.codex/config.toml` MCP servers