@exaudeus/workrail 3.70.1 → 3.70.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -2,7 +2,7 @@

  **Status:** Active
  **Created:** 2026-04-20
- **Updated:** 2026-04-21 (Phase 0 complete -- goal challenged, path set, context populated)
+ **Updated:** 2026-04-23 (Phase 0 third run -- repo state re-verified; no material changes since second run)
  **Owner:** WorkTrain daemon session (shaping)

  ---
@@ -17,17 +17,17 @@ Do not treat this file as the source of truth for what step the session is on, w

  This file is maintained alongside the session as a readable summary of findings and decisions. It may lag behind the session notes slightly.

- ### Capability status (verified Phase 0b, 2026-04-21)
+ ### Capability status (re-verified Phase 0b, third session, 2026-04-23)

  | Capability | Available | How verified | Notes |
  |---|:---:|---|---|
- | Web browsing | YES | `curl https://example.com` returned HTML (5s timeout) | Available via curl; no dedicated browser tool needed |
- | Delegation (spawn_agent) | YES | `spawn_agent` with `wr.discovery` returned `{childSessionId, outcome: "stuck"}` -- mechanism works | `wr.discovery` is a multi-step workflow, unsuitable as a trivial probe; stuck on internal heuristic. Spawn mechanism itself is functional. |
- | Git / GitHub CLI | YES | `gh pr list`, `git log` working throughout session | No issues |
+ | Web browsing | YES | `curl https://example.com` returned HTML (5s timeout) | Confirmed each session. Available via curl; no dedicated browser tool needed |
+ | Delegation (spawn_agent) | YES | `spawn_agent` with `wr.classify-task` returned `{childSessionId: "sess_3x6t6lyz...", outcome: "success"}` -- mechanism confirmed again this session | `wr.classify-task` is the correct probe (1 step, always completes). Child classified the task as Small/Low-risk/investigation correctly. |
+ | Git / GitHub CLI | YES | `gh pr list`, `git log`, `gh issue view 174` working throughout session | No issues |

  **Capability decisions:**
- - **Web browsing:** Available but not needed. All evidence for this task is in-repo (workflow files, schema, planning docs). No external references needed. Skipping -- fallback to in-repo data is fully sufficient.
- - **Delegation:** Mechanism is available. Whether to use it is a per-step judgment. For design/synthesis work (this phase), delegation adds overhead without benefit -- the main agent owns synthesis by rule. For independent parallel audits (e.g. gap-scoring multiple workflows simultaneously), delegation could reduce latency. Decision will be made per step.
+ - **Web browsing:** Available but not needed. All evidence for this task is in-repo (workflow files, schema, planning docs, session store usage data). No external references needed. Fallback to in-repo data is fully sufficient.
+ - **Delegation:** Mechanism is confirmed available (wr.classify-task probe succeeded, childSessionId: sess_3x6t6lyz, outcome: success). Whether to use it is a per-step judgment. For design/synthesis work (Phase 0/0b), delegation adds overhead without benefit -- the main agent owns synthesis by rule. For independent parallel audits in later phases (e.g. gap-scoring multiple workflows simultaneously), delegation reduces latency and is appropriate. Decision deferred to per-step judgment in downstream phases.

  ---

@@ -114,20 +114,37 @@ Rationale (justified against alternatives):
  | `test-session-persistence.json` | N | N | N | N | N | N | 5 |
  | `wr.ui-ux-design.json` | Y | Y | Y | N | Y | N | 8 |
  | `wr.diagnose-environment.json` | N | N | N | N | N | N | 2 |
- | `wr.workflow-for-workflows.json` | Y | Y | Y | Y | Y | Y | 10 |
- | `wr.workflow-for-workflows.v2.json` | Y | Y | Y | Y | Y | **Y** | 10 |
+ | `wr.workflow-for-workflows.json` | Y | Y | Y | Y | Y | **Y (3 in body)** | 11 |
  | `wr.discovery.json` | Y | Y | Y | N | Y | N | 22 |
  | `wr.shaping.json` | Y | Y | N | N | N | **Y** | 9 |

  **Working examples for assessment gate patterns:**
- - `wr.shaping.json` -- cleanest: 1 dimension per assessment, `low`/`high` levels, `require_followup` on `low`
- - `wr.coding-task.json` -- multi-assessment per step, loop-body refs
- - `mr-review-workflow.agentic.v2.json` -- 3 refs on a single final validation step
- - `bug-investigation.agentic.v2.json` -- single gate on diagnosis validation
+ - `wr.shaping.json` -- cleanest: 1 dimension per assessment, `low`/`high` levels, `require_followup` on `low`; uses top-level `assessments` + step `assessmentRefs` + `assessmentConsequences`
+ - `wr.coding-task.json` -- 3 gated steps (design, plan, verification), multi-assessment per step, gates in loop `body` steps
+ - `mr-review-workflow.agentic.v2.json` -- 3 refs on a single final validation step with `require_followup`
+ - `wr.workflow-for-workflows.json` -- gates in loop `body` field (not `loop.steps`); correct pattern for loop-body gates

- **Current smoke test baseline:** 37/37 (verified 2026-04-21)
+ **Current smoke test baseline:** 36/36 (re-verified 2026-04-23 third session)

- ### Key landscape observations (corrected Phase 1c, 2026-04-21)
+ ### Key landscape observations (corrected Phase 1c third session, 2026-04-23)
+
+ > **CRITICAL CORRECTION (third session):** All prior landscape scans used `loop.steps` to find loop body steps. The correct field in the current schema is `body` (not `loop.steps`). Every workflow with loops uses `body`. This means prior gate counts for workflows with loops were undercounted. Corrected counts:
+ > - `wr.workflow-for-workflows.json`: **3 steps** with assessment refs (in `body` of `phase-6-quality-gate-loop`) -- not 0
+ > - `wr.coding-task.json`: **3 steps** with assessment refs (in `body`) -- not 2
+ > - All other workflows: corrected counts verified below
+
+ **Corrected gate step counts (using `body` field correctly):**
+
+ | Workflow | Gate steps | Gate step IDs |
+ |---|:---:|---|
+ | `wr.adaptive-ticket-creation` | 1 | phase-5-batch-tickets |
+ | `wr.bug-investigation` | 1 | phase-5-diagnosis-validation |
+ | `wr.coding-task` | **3** | phase-1c-challenge-and-select, phase-3-plan-and-test-design, phase-7b-fix-and-summarize |
+ | `wr.mr-review` | 1 | phase-5-final-validation (3 refs) |
+ | `wr.shaping` | 2 | frame-gate, breadboard-and-elements |
+ | `wr.workflow-for-workflows` | **3** | phase-6a-state-economy-audit, phase-6b-execution-simulation, phase-6c-adversarial-quality-review |
+ | `test-artifact-loop-control` | 1 | complete |
+ | All others | 0 | -- |

  1. **Two prompt formats coexist:** `promptBlocks` (structured object with goal/constraints/procedure/verify) and raw `prompt` string. The authoring spec recommends `promptBlocks`. Not all "modern" workflows use it consistently.

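The `loop.steps` vs `body` distinction behind the corrected counts can be pictured with a minimal sketch. This is illustrative only: the field names (`body`, `maxIterations`, `assessmentRefs`, the step IDs) are taken from the text above, but the overall nesting and the ref value are assumed, not copied from the actual workrail schema:

```json
{
  "id": "phase-6-quality-gate-loop",
  "type": "loop",
  "maxIterations": 3,
  "body": [
    {
      "id": "phase-6a-state-economy-audit",
      "assessmentRefs": ["state-economy-gate"]
    }
  ]
}
```

A scanner that walks a `loop.steps` array on a shape like this finds nothing and reports zero gates, which is exactly the undercount the correction above describes.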
@@ -135,17 +152,19 @@ Rationale (justified against alternatives):

  3. **Several "candidates" from open-work-inventory also no longer exist:** `mr-review-workflow.json`, `bug-investigation.json`, `design-thinking-workflow.json` -- all absorbed or renamed. The list in `open-work-inventory.md` is materially stale.

- 4. **Assessment gates are the biggest behavioral differentiator:** 7/24 workflows have functional assessment gates. The 17 without them have no engine-enforced quality checkpoints -- all validation is prose-only.
+ 4. **Assessment gates are the biggest behavioral differentiator:** 7 workflows have functional assessment gates (6 production-relevant + 1 test). The rest have no engine-enforced quality checkpoints.

- 5. **`wfw.v2.json` DOES have functional assessment gates** -- the prior session's "orphaned assessments" finding was wrong. The gates live in loop body steps (`phase-6a`, `phase-6b`, `phase-6c`). All 4 declared gates are referenced and wired. Prior design doc contained a material error on this point.
+ 5. **`wr.workflow-for-workflows.json` DOES have functional assessment gates** -- 3 steps in the loop body (`phase-6a`, `phase-6b`, `phase-6c`) carry assessment refs. All 4 declared gates are referenced and wired. Prior scan missed these because it looked for `loop.steps` instead of `body`.

- 6. **`recommendedPreferences` is a common gap:** 11/24 workflows are missing it. Easy to add, genuine behavioral improvement.
+ 6. **`recommendedPreferences` is a common gap:** ~11 workflows are missing it. Easy to add, genuine behavioral improvement.

- 7. **`references` is almost universally missing:** Only 3 workflows have it (`wfw`, `wfw.v2`, `wr.production-readiness-audit`). This is cosmetic for most workflows -- references are informational, not enforced.
+ 7. **`references` is almost universally missing:** Only a few workflows have it. This is cosmetic -- references are informational, not enforced.

- 8. **The "unstamped" list from `validate:registry` is cosmetic advisory only** -- it names 17 unstamped workflows but stamping alone is not a quality improvement goal.
+ 8. **The "unstamped" list from `validate:registry` is cosmetic advisory only** -- names 14 unstamped workflows; stamping alone is not a quality improvement goal.

- 9. **`wr.production-readiness-audit.json` has no assessment gates** -- despite being a review workflow with a clear audit focus, it uses no `assessmentRefs`. This is a behavioral gap on a high-value workflow.
+ 9. **`wr.production-readiness-audit.json` has no assessment gates** -- despite being a review workflow with a clear audit focus (`phase-5-final-validation` exists), it declares no `assessments` and no `assessmentRefs`. This is a confirmed behavioral gap on a high-value workflow.
+
+ 10. **`wr.coding-task` has 3 gated steps across the lifecycle (design, plan, and verification)** -- not 2 as prior scans reported. This is a richer quality-gate structure than previously understood.

  ### Phase 1c hard-constraint findings (engine/schema reality checks)

@@ -197,11 +216,13 @@ Rationale (justified against alternatives):

  ### The 4 production workflows (what actually runs in the daemon pipeline)

- From `triggers.yml` and `src/coordinators/modes/`:
- 1. **`wr.discovery`** (full-pipeline mode, step 1) -- already validated (v3), has promptFragments and routines. No assessment gates -- but discovery is a research step, gates may not be appropriate here.
- 2. **`wr.shaping`** (full-pipeline mode, step 3) -- has functional assessment gates on `frame-gate` and `breadboard-and-elements`. Not validated (`validatedAgainstSpecVersion` missing).
- 3. **`wr.coding-task`** (full-pipeline + implement mode, step 5) -- has functional assessment gates (8 of them). Not validated. This is the highest-stakes workflow: it writes code.
- 4. **`wr.mr-review`** (full-pipeline + implement mode, final step) -- has functional assessment gates (3 of them). Not validated. Issue #174 (add assessment gate) is OPEN but appears already done -- commit c83aa180 marked it done.
+ From `triggers.yml` and `src/coordinators/modes/full-pipeline.ts` (re-verified 2026-04-23):
+ 1. **`wr.discovery`** (full-pipeline mode, step 1 via `coordinators/modes/full-pipeline.ts`) -- stamped v3. Has 3 `while` loops with `artifact_contract` conditionSources and `maxIterations` backstops (2, 3, 3). No assessment gates. Research step -- gates may not be appropriate here.
+ 2. **`wr.shaping`** (full-pipeline mode, step 2) -- has 2 assessment gates, 1 `while` loop with `artifact_contract` conditionSource and `maxIterations: 2`. NOT stamped.
+ 3. **`wr.coding-task`** (direct `triggers.yml` trigger + implement mode) -- has 2 gate steps (each with `require_followup`), 4 loops (3 `while` with `artifact_contract`, 1 `forEach`). NOT stamped. Highest-stakes: writes code.
+ 4. **`wr.mr-review`** (direct `triggers.yml` trigger `mr-review`) -- has 3 assessment gates on final-validation step with `require_followup`, 1 `while` loop with `artifact_contract` conditionSource and `maxIterations: 4`. NOT stamped. Issue #174 still open but gates are already wired.
+
+ **Loop structure verdict (verified):** All production workflows use `conditionSource.kind = "artifact_contract"` with `maxIterations` backstops. Loop control is sound. No missing termination conditions. This is a significant quality signal -- these loops will not run forever.

  **Key tension**: The 4 production workflows already have assessment gates. The "legacy" workflows that don't have gates (`wr.adaptive-ticket-creation`, `wr.documentation-update`, `wr.production-readiness-audit`, etc.) are NOT used in the autonomous pipeline -- they're human-triggered workflows.

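The termination guarantee claimed in the loop-structure verdict can be sketched concretely. Only `while`, `conditionSource.kind = "artifact_contract"`, `maxIterations`, and `body` are taken from the text; the step ID and the exact nesting are hypothetical, not the real schema:

```json
{
  "id": "review-iteration-loop",
  "type": "while",
  "conditionSource": { "kind": "artifact_contract" },
  "maxIterations": 4,
  "body": [
    { "id": "revise-and-reverify" }
  ]
}
```

Even in the worst case where the artifact-contract condition never flips, `maxIterations` bounds the loop at 4 passes -- that backstop is what makes "these loops will not run forever" a checkable property rather than a hope.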
@@ -225,8 +246,8 @@ From `triggers.yml` and `src/coordinators/modes/`:

  **Tension 2: Stamping vs. behavioral improvement**
  - `validatedAgainstSpecVersion` is a stamp that says "this workflow was reviewed against the current authoring spec." Most production workflows are missing this stamp.
- - Running `wr.workflow-for-workflows.v2.json` on a workflow is the intended process to earn the stamp.
- - But running `wr.workflow-for-workflows.v2.json` takes significant agent time and may find things to fix, making the "just stamp it" shortcut dishonest.
+ - Running `wr.workflow-for-workflows.json` on a workflow is the intended process to earn the stamp.
+ - But running `wr.workflow-for-workflows.json` takes significant agent time and may find things to fix, making the "just stamp it" shortcut dishonest.

  **Tension 3: Documentation rot creates misdirected work**
  - The open-work-inventory and tickets/next-up.md reference deleted files and closed work (issue #174, exploration-workflow.json).
@@ -234,16 +255,24 @@ From `triggers.yml` and `src/coordinators/modes/`:
  - Fixing the docs first is cheap but it's not "shipping workflow improvements."

  **Tension 4: Active focus is elsewhere**
- - Recent commits (Apr 20-21) are all engine/daemon/console: trigger fixes, coordinator crashes, console bugs.
- - The project owner's actual momentum is on infrastructure, not workflow authoring.
+ - Recent commits (Apr 20-23) are engine/daemon/console/schema: loadSessionNotes export, metricsProfile footer injection, wr.* namespace rename, console fixes, TypeScript 6 upgrade.
+ - The project owner's actual momentum has been on infrastructure and schema, not workflow authoring content.
  - Starting a workflow modernization project now means context-switching from hot infrastructure work.
+ - Mitigating factor: the wr.* rename (#782) and metricsProfile additions (#779) WERE workflow file changes. The infrastructure work is now slowing; conditions may be better for workflow content work.
+
+ **Tension 5: Issue #174 is open but done (new, 2026-04-23)**
+ - GitHub issue #174 "Adopt assessment-gate follow-up in MR review" is labeled `feature, next` and remains open.
+ - But `wr.mr-review` already has 3 assessment gates with `require_followup` consequences, all properly wired.
+ - The issue's stated acceptance criteria ("assessment gate added to wr.mr-review") are met.
+ - Closing this issue is cheap cleanup but clarifies the work queue.

  ### Success criteria (observable)

- 1. The 4 production daemon workflows (`wr.discovery`, `wr.shaping`, `coding-task`, `mr-review`) all have `validatedAgainstSpecVersion: 3` after genuine review via `wr.workflow-for-workflows.v2.json`
- 2. Planning docs (`open-work-inventory.md`, `tickets/next-up.md`) reference only files that exist in the repo, and issue #174 is closed if it's actually done
- 3. At least one non-production workflow with a review/audit purpose (`wr.production-readiness-audit.json` or `wr.adaptive-ticket-creation.json`) gains functional assessment gates
- 4. `npx vitest run` passes (37/37 minimum) before and after any changes
+ 1. The 3 unstamped production daemon workflows (`wr.shaping`, `wr.coding-task`, `wr.mr-review`) all have `validatedAgainstSpecVersion: 3` after genuine review via `wr.workflow-for-workflows.json`
+ 2. Planning docs (`open-work-inventory.md`, `tickets/next-up.md`) reference only files that exist in the repo; issue #174 is closed
+ 3. At least one non-production workflow with a review/audit purpose (`wr.production-readiness-audit` or `wr.adaptive-ticket-creation`) gains functional assessment gates
+ 4. `npx vitest run tests/lifecycle/bundled-workflow-smoke.test.ts` passes (36/36 minimum) before and after any changes
+ 5. No pre-existing test failures are introduced (perf/cli/polling failures confirmed pre-existing and not attributed to workflow changes)

  ### Reframes and HMW questions

@@ -252,17 +281,19 @@ From `triggers.yml` and `src/coordinators/modes/`:
  - Project B: Validate + stamp the 4 production workflows (high value, expensive, requires running quality gate)

  **Reframe 2: The daemon doesn't need modernization -- it needs validation**
- The autonomous pipeline workflows already use assessment gates. What they're missing is the formal `validatedAgainstSpecVersion` stamp, which is earned by running them through `wr.workflow-for-workflows.v2.json`. The work is validation, not "modernization."
+ The autonomous pipeline workflows already use assessment gates. What they're missing is the formal `validatedAgainstSpecVersion` stamp, which is earned by running them through `wr.workflow-for-workflows.json`. The work is validation, not "modernization."

- **HMW 1:** How might we get the 4 production workflows stamped without the full 10-step `wr.workflow-for-workflows.v2.json` process for each?
+ **HMW 1:** How might we run the quality gate on `wr.coding-task` as a time-bounded probe to scope Stream B before committing to it?

  **HMW 2:** How might we prioritize the non-production workflows without session outcome data to guide us?

- ### Primary framing risk
+ ### Primary framing risk (updated 2026-04-23)

  **The specific condition that would make this framing wrong:**

- If `wr.discovery.json`, `wr.shaping.json`, `wr.coding-task.json`, or `mr-review-workflow.agentic.v2.json` actually have material quality problems that assessment gates don't catch (e.g., poorly structured prompts, missing output contracts, wrong loop structure), then the framing "production workflows are fine, legacy workflows need work" is wrong. The production workflows might need behavioral redesign, not just stamping. This would only be discoverable by actually running `wr.workflow-for-workflows.v2.json` on each of them and seeing what quality gate failures come back.
+ If running `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth returns `authoring-integrity-gate: low` or `outcome-effectiveness-gate: low` (the two quality gate assessment dimensions), then the framing "production workflows need stamping not redesign" is wrong. A `low` on either dimension means the workflow has structural quality problems that the gate catches -- and the `require_followup` consequence would trigger, sending the quality gate into another iteration rather than producing a stamp. This would mean the scope of work is redesign (behavioral changes), not validation (stamp-earning). The only way to resolve this uncertainty is to actually run the gate on `wr.coding-task`. Until that happens, this framing risk is unresolved.
+
+ **Why this specific risk and not a generic one:** The assessment dimensions of `wr.workflow-for-workflows.json` are `state_economy`, `simulation_outcome`, `authoring_integrity`, and `outcome_effectiveness`. A `low` on `state_economy` means the workflow is inefficient but not structurally wrong. A `low` on `authoring_integrity` or `outcome_effectiveness` means the workflow has quality problems that actively harm output. These two are the ones that would force redesign. Loop structure and gate wiring are already verified-correct -- so the remaining unknowable is prompt quality under adversarial review.

  ### Primary uncertainty

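To make the branch condition in the framing risk concrete, here is a sketch of how a dimension and its `require_followup` consequence might be declared. Only the names `assessments`, `assessmentConsequences`, `require_followup`, the `low`/`high` levels, and the dimension/gate IDs come from this document; the key names `dimension`, `levels`, `assessment`, `onLevel`, and `consequence` and the overall nesting are invented for illustration and almost certainly differ from the real schema:

```json
{
  "assessments": [
    {
      "id": "authoring-integrity-gate",
      "dimension": "authoring_integrity",
      "levels": ["low", "high"]
    }
  ],
  "assessmentConsequences": [
    {
      "assessment": "authoring-integrity-gate",
      "onLevel": "low",
      "consequence": "require_followup"
    }
  ]
}
```

Under a shape like this, a `low` verdict never produces a stamp: the consequence re-enters the gate loop, which is why a `low` on either structural dimension converts the work from validation into redesign.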
@@ -299,28 +330,30 @@ At the same time, planning docs reference 7 deleted files and one open-but-done

  ## Candidate Generation Setup (Phase 3b)

  **Path:** `design_first`
- **candidateCountTarget:** 3
+ **candidateCountTarget:** 3
+ **Updated:** 2026-04-23 (sharpened from prior session; 3 existing candidates re-evaluated below)

- ### Required properties of the candidate set
+ ### Required properties of the candidate set (updated 2026-04-23)

  Per the `design_first` path contract, the 3 candidates must satisfy:

- 1. **At least one reframe candidate:** One candidate must challenge whether the two work streams (docs fix + production workflow validation) are the right investment at all. The obvious directions are "clean up docs" and "run quality gate on production workflows." The reframe asks: what if neither is the best use of the same effort right now? A valid reframe might be: retire low-value workflows from the catalog, invest in lint tooling that prevents future regression, or defer workflow work entirely in favor of the active engine/daemon/console work.
+ 1. **At least one reframe candidate:** One candidate must challenge whether docs correction + production validation is the right investment. Valid reframes: retire low-value workflows, invest in lint tooling, or defer workflow work entirely. Direction C (defer) and Direction B (tooling) serve this role.

- 2. **Meaningful differentiation:** Candidates must differ in their primary bet, not just in ordering or scope. Minor variations on "do A then B in different order" do not count as distinct candidates.
+ 2. **Meaningful differentiation:** Candidates must differ in their primary bet, not just ordering or scope. Direction A bets on "empirical validation of the production pipeline first." Direction B bets on "tooling over manual migration." Direction C bets on "deferral is correct given bandwidth context." These are meaningfully different bets.

- 3. **Ground in the 5 decision criteria:** Each candidate must be evaluable against: sequencing discipline, empirical-before-prescriptive, production-first value, no cosmetic compliance, incremental shippability. A candidate that violates decision criterion 4 (cosmetic compliance) is disqualified.
+ 3. **Grounded in the 5 decision criteria (updated):** Sequencing discipline / Empirical before prescriptive / Production-first value / No cosmetic compliance / Incremental shippability. Each candidate is evaluated against all 5 below.

- 4. **Prototype-learning uncertainty honored:** At least one candidate must explicitly account for the unknown scope of Stream B (what the quality gate finds) rather than assuming it away.
+ 4. **Prototype-learning uncertainty honored:** Direction A explicitly makes the quality gate the scope-branch point. Direction B bypasses this uncertainty by investing in tooling instead. Direction C defers it entirely. All three handle the uncertainty differently -- this is correct.

- ### Bias to guard against
+ ### New bias to guard against (2026-04-23 addition)

- Because the two streams are well-defined, generation will be pulled toward micro-variations: "do A then B1 only," "do A then B1 and B2," "do A then B3." This is the clustering failure. Each of the 3 candidates must be defendable as the *right* strategy in some scenario, not just the same strategy with different scope.
+ The prior session's candidates were generated when `wr.workflow-for-workflows.v2.json` was the quality gate. That file has been consolidated into `wr.workflow-for-workflows.json`. References in Direction A must use the correct current file name. This is a naming-only correction; the candidates are otherwise unchanged.

  ### Anti-candidates (explicitly ruled out by decision criteria)

- - Any candidate that adds `validatedAgainstSpecVersion` without running `wr.workflow-for-workflows.v2.json` -- violates criterion 4
- - Any candidate that prioritizes legacy catalog workflows (adaptive-ticket, document-creation, etc.) over production pipeline workflows -- violates criterion 3 unless it argues from the reframe position that this is intentionally the right bet
+ - Any candidate that adds `validatedAgainstSpecVersion` without running `wr.workflow-for-workflows.json` -- violates criterion 4 (no cosmetic compliance)
+ - Any candidate that prioritizes legacy catalog workflows over production pipeline workflows -- violates criterion 3 unless making a deliberate reframe argument
+ - Any candidate that treats "close #174" and "stamp wr.coding-task" as equivalent work units -- they are different in kind (docs hygiene vs. genuine quality validation)

  ---

@@ -328,14 +361,14 @@ Because the two streams are well-defined, generation will be pulled toward micro

  ### Direction A: Docs-first + empirical production validation (recommended)

- **Core bet:** Fix the documentation foundation first, then run `wr.workflow-for-workflows.v2.json` on `wr.coding-task.json` as a probe -- let that run's findings determine the scope of remaining work.
+ **Core bet:** Fix the documentation foundation first, then run `wr.workflow-for-workflows.json` on `wr.coding-task` as a probe -- let that run's findings determine the scope of remaining work.

  **What:**
- 1. Update `open-work-inventory.md` and `tickets/next-up.md` to remove 7 stale file references
- 2. Close GitHub issue #174 (assessment-gate adoption in MR review is already done per commit c83aa180)
- 3. Run `wr.workflow-for-workflows.v2.json` on `wr.coding-task.json` at STANDARD depth
- 4. If quality gate finds only minor issues: stamp it, then repeat for `wr.shaping.json`
- 5. If quality gate finds significant issues: create a focused GitHub issue for the specific fixes, do NOT stamp until fixed
+ 1. Update `open-work-inventory.md` and `tickets/next-up.md` to remove stale file references (deleted workflows, non-existent candidates)
+ 2. Close GitHub issue #174 (assessment-gate adoption in MR review is already done -- `wr.mr-review` has 3 gates with `require_followup`, all wired)
+ 3. Run `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth
+ 4. If quality gate finds only minor issues (`state_economy:low` only): fix, stamp, repeat for `wr.shaping`
+ 5. If quality gate finds structural failures (`authoring_integrity:low` or `outcome_effectiveness:low`): create a focused GitHub issue for the specific fixes, do NOT stamp until fixed

  **Satisfies decision criteria:**
  - ✅ Sequencing discipline (docs first)
@@ -406,7 +439,14 @@ Because the two streams are well-defined, generation will be pulled toward micro

  ## Resolution Notes

- **Phase 0 (2026-04-21):** Path confirmed as `design_first`. Context fully populated for downstream steps. Key new finding from this session: the prior design doc (2026-04-20) was not committed to git and is an untracked sidecar -- this is correct per the artifact strategy. No engine changes since the prior session that affect workflow schema (the `findingCategory` addition in #644 is schema/engine but only adds a new field to review-verdict findings; it does not change the assessment gate contract). Smoke test baseline confirmed: 37/37 still passing. Open GitHub issues: only #174 ("Adopt assessment-gate follow-up in MR review") is directly related.
+ **Phase 0 (2026-04-21):** Path confirmed as `design_first`. Context fully populated for downstream steps. Smoke test baseline confirmed: 37/37 passing. Open GitHub issues: only #174 ("Adopt assessment-gate follow-up in MR review") is directly related.
+
+ **Phase 0 third run (2026-04-23, later session):** Repo state re-verified. No material changes since Phase 0 second run. Latest commit: `f0a1822a fix(engine): validate metrics_outcome enum in checkContextBudget`. Smoke test: 36/36. Issue #174: still open. No new workflow files. Open PRs: #797 (max-output-tokens feature, unrelated), #698/#330 (dependabot deps). All prior findings and direction selection remain valid. Path recommendation unchanged: `design_first`. Selected direction unchanged: Direction A (docs-first + empirical production validation). No re-analysis needed.
+
+ **Phase 0 re-run (2026-04-23, earlier session):** Two material changes since last session:
+ 1. `feat(workflows): rename all bundled workflows to wr.* namespace (#782)` -- all workflow IDs now have `wr.` prefix; usage data in session store uses old IDs (`coding-task-workflow-agentic` = `wr.coding-task`, `mr-review-workflow-agentic` = `wr.mr-review`). Design doc table was using the old file names; corrected to `wr.*` IDs.
+ 2. `chore(workflows): delete stale wfw copy, rename .v2.json to workflow-for-workflows.json (#780)` -- `wr.workflow-for-workflows.v2.json` absorbed into `wr.workflow-for-workflows.json`. Smoke test count is now 36/36.
+ All candidate directions from prior session remain valid. No engine schema changes that affect assessment gate contract. Issue #174 still open.

  ---

@@ -417,14 +457,92 @@ Because the two streams are well-defined, generation will be pulled toward micro
  | path = `design_first` | Goal was solution-statement; primary risk is wrong candidates/wrong unit of work | 2026-04-20 |
  | No subagent delegation in Phase 0 | All data available in-repo via Bash/Read tools; synthesis task is single-thread | 2026-04-20 |
  | Prior landscape corrected | assessmentRef (singular) vs assessmentRefs (plural) error fixed; modern baselines re-verified | 2026-04-20 |
- | `wr.workflow-for-workflows.v2.json` added to HIGH-priority list | It is the quality gate workflow but lacks assessmentRefs at step level -- ironic and high-value fix | 2026-04-20 |
  | Stale planning docs identified as prerequisite gate | Must correct docs before implementation begins -- they reference deleted targets | 2026-04-20 |
- | Delegation: mechanism available, not used for design work | spawn_agent returned childSessionId on probe (outcome: "stuck" because wr.discovery is multi-step, not because spawn is broken). Not used for Phase 0 design/synthesis -- main agent owns synthesis by rule. Will use for independent parallel audits if latency matters in later phases. | 2026-04-20/21 |
- | Web browsing: available via curl | curl to example.com returned HTML -- network reachable; no web browsing needed for this task (all data is in-repo) | 2026-04-20/21 |
+ | Delegation: mechanism available, not used for design work | spawn_agent with wr.classify-task returned success. Not used for design/synthesis -- main agent owns synthesis by rule. Used for parallel audits only when latency benefit is clear. | 2026-04-20/23 |
+ | Web browsing: available via curl | curl to example.com returned HTML -- network reachable; not needed (all data is in-repo) | 2026-04-20/23 |
  | Artifact strategy: doc is readable summary only | Execution truth lives in step notes + context variables; design doc is for human reference only | 2026-04-20 |
+ | **Selected direction: Candidate 2 (quality gate probe)** | Satisfies all 5 decision criteria; only candidate that answers "are production workflows sound?"; failure mode bounded by explicit branch condition; philosophy aligned | 2026-04-23 |
+ | "Follows existing repo pattern" rationale corrected | Git history shows all 4 stamped workflows were stamped during authoring commits, not after quality gate runs. Corrected rationale: "exceeds current practice; justified by philosophy + wr.coding-task 85-session stakes." | 2026-04-23 |
+ | Runner-up bonus PR: wr.production-readiness-audit gate | Standalone, independent of quality gate sessions, delivers user-facing behavioral improvement, follows wr.shaping gate pattern exactly | 2026-04-23 |
+ | Candidate 3 lint rule left out of scope | YAGNI after wr.production-readiness-audit bonus PR fixes the most obvious ungated audit workflow; heuristic maintenance burden outweighs value | 2026-04-23 |
+ | Candidate 1 (mechanical stamp) disqualified | Fails decision criteria 2 (empirical) and 4 (no cosmetic compliance); corrupts stamp meaning | 2026-04-23 |

  ---

  ## Final Summary

- *(to be filled in at end of shaping session)*
+ **Recommendation:** Quality gate probe on `wr.coding-task` + docs hygiene + `wr.production-readiness-audit` gate addition
+
+ **Confidence band:** Medium-high
+
+ The "medium" component comes entirely from one unresolved prototype-learning uncertainty: what does running `wr.workflow-for-workflows.json` on `wr.coding-task` actually find? This is not resolvable by analysis -- it is resolved by doing the work. The direction is correct under both outcomes (minor findings → stamp; structural findings → scoped redesign issue). Confidence in the direction is high; confidence in the scope is medium.
+
+ ---
+
+ ### The problem (reframed)
+
+ The stated goal ("modernize `exploration-workflow.json`") was a solution statement pointing at a file that no longer exists. The real problem has two layers:
+
+ **Layer 1 (cheap, ~30 min):** Planning docs and the issue queue are stale -- they reference deleted workflows and an already-completed issue (#174). Any work started from them is misdirected.
+
+ **Layer 2 (high value, scope-uncertain):** The 3 most-used production pipeline workflows (`wr.coding-task` at 85 sessions, `wr.mr-review` at 65, `wr.shaping`) are structurally sound (correct loop control, working assessment gates, `artifact_contract` conditionSources) but have never been run through the project's quality gate. They lack `validatedAgainstSpecVersion: 3`.
+
+ ---
+
+ ### Selected direction: three independent work units
+
+ **PR 1 -- Docs hygiene (independent, no dependencies, ~30 min)**
+ - Update `docs/roadmap/open-work-inventory.md`: remove references to deleted workflows (`exploration-workflow.json`, `mr-review-workflow.json`, `bug-investigation.json`, `design-thinking-workflow.json`, `wr.workflow-for-workflows.v2.json`, and other stale entries)
+ - Update `docs/tickets/next-up.md`: remove the stale "Ticket 2: Legacy workflow modernization -- exploration-workflow.json" entry
+ - Close GitHub issue #174 with comment: "Adopting assessment-gate follow-up in MR review is complete. Step `phase-5-final-validation` in `wr.mr-review` already has `assessmentRefs: [\"evidence-quality-gate\", \"coverage-completeness-gate\", \"contradiction-resolution-gate\"]` with `assessmentConsequences` triggering `require_followup` when any dimension scores `low`. Three gates, all wired, no further action needed."
+ - Pre-PR validation: `grep -E "exploration-workflow|mr-review-workflow\.json|bug-investigation\.json|design-thinking-workflow|workflow-for-workflows\.v2" docs/roadmap/open-work-inventory.md docs/tickets/next-up.md` must return no output
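The pre-PR check above can be exercised end-to-end against throwaway files. This is a sketch, not repo tooling: the temp-dir scaffolding, sample file contents, and pass/fail messages are assumptions; only the grep pattern and the two target paths come from the bullet above.

```shell
#!/bin/sh
# Demo of the PR 1 pre-PR validation grep, run against sample files.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/docs/roadmap" "$tmp/docs/tickets"
# One deliberately stale reference that the check must catch:
echo "- Modernize exploration-workflow.json (HIGH)" > "$tmp/docs/roadmap/open-work-inventory.md"
echo "Nothing stale here" > "$tmp/docs/tickets/next-up.md"

PATTERN='exploration-workflow|mr-review-workflow\.json|bug-investigation\.json|design-thinking-workflow|workflow-for-workflows\.v2'
if grep -nE "$PATTERN" "$tmp/docs/roadmap/open-work-inventory.md" "$tmp/docs/tickets/next-up.md"; then
  # grep found a match: the docs still reference deleted workflows.
  echo "stale references found -- PR 1 not ready"
else
  echo "clean -- PR 1 ready"
fi
rm -rf "$tmp"
```

In the real repo the `if` branch should end in a non-zero exit so CI fails; the demo only prints.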
+
+ **PR 2 -- `wr.production-readiness-audit` assessment gate (independent, no dependencies, ~1 hr)**
+ - Add to `workflows/production-readiness-audit.json`:
+   - Top-level `assessments`: `[{ "id": "readiness-verdict", "purpose": "The readiness verdict is evidence-grounded and calibrated -- not optimistic or based on absence of red flags", "dimensions": [{ "id": "readiness_confidence", "purpose": "Verdict is supported by specific evidence items tied to concrete system behaviors, not general impressions", "levels": ["low", "high"] }] }]`
+   - On the final verdict step: `"assessmentRefs": ["readiness-verdict"]` + `"assessmentConsequences": [{ "when": { "anyEqualsLevel": "low" }, "effect": { "kind": "require_followup", "guidance": "Readiness confidence is low. Return to Phase 3 evidence collection: identify which readiness dimensions lack specific behavioral evidence, gather it, and re-run the verdict." } }]`
+ - Create a GitHub issue for this work before implementation
+ - Smoke test must pass (36/36) after the change
+ - Pattern reference: the `wr.shaping.json` `frame-soundness` gate is the cleanest example to follow
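Assembled, the two fragments above would sit in `workflows/production-readiness-audit.json` roughly as follows. This is a sketch: the step id is a placeholder (the spec above only says "the final verdict step"), and all unrelated workflow fields are omitted.

```json
{
  "steps": [
    {
      "id": "FINAL-VERDICT-STEP-ID-PLACEHOLDER",
      "assessmentRefs": ["readiness-verdict"],
      "assessmentConsequences": [
        {
          "when": { "anyEqualsLevel": "low" },
          "effect": {
            "kind": "require_followup",
            "guidance": "Readiness confidence is low. Return to Phase 3 evidence collection: identify which readiness dimensions lack specific behavioral evidence, gather it, and re-run the verdict."
          }
        }
      ]
    }
  ],
  "assessments": [
    {
      "id": "readiness-verdict",
      "purpose": "The readiness verdict is evidence-grounded and calibrated -- not optimistic or based on absence of red flags",
      "dimensions": [
        {
          "id": "readiness_confidence",
          "purpose": "Verdict is supported by specific evidence items tied to concrete system behaviors, not general impressions",
          "levels": ["low", "high"]
        }
      ]
    }
  ]
}
```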
+
+ **Stream B -- Quality gate probe on `wr.coding-task` (independent, time-bounded, scope-uncertain)**
+ 1. Create GitHub issue: "Validate and stamp wr.coding-task via quality gate" with acceptance criteria: run `wr.workflow-for-workflows.json` at STANDARD depth; stamp only if no `authoring_integrity:low` or `outcome_effectiveness:low`
+ 2. Run `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth in a daemon session
+ 3. Branch on gate findings:
+    - `state_economy:low` only → fix in-session (inefficiency, not a structural failure), stamp, PR
+    - `simulation_outcome:low` with a narrow fix → fix in-session, stamp, PR
+    - `authoring_integrity:low` or `outcome_effectiveness:low` → stop, create a "wr.coding-task quality improvements" issue with the specific findings, do NOT stamp until fixed
+ 4. If `wr.coding-task` stamps cleanly: repeat for `wr.shaping` (same pattern)
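The branch rule in step 3 can be written down as a tiny decision function. This is an illustrative sketch, not WorkRail engine code: the function name and the findings shape are assumptions, and the "narrow fix" judgment for `simulation_outcome` is collapsed into the in-session branch.

```python
# Illustrative sketch of the Stream B branch rule (not engine code).
# `findings` maps gate dimension id -> assessed level ("low" / "high").
def stream_b_action(findings):
    blocking = {"authoring_integrity", "outcome_effectiveness"}
    if any(findings.get(dim) == "low" for dim in blocking):
        # Structural failure: stop, file a scoped quality-improvements issue, no stamp.
        return "stop_and_file_issue"
    if "low" in findings.values():
        # state_economy / simulation_outcome only: fixable in-session, then stamp.
        return "fix_in_session_then_stamp"
    # No low dimensions: stamp and open the PR.
    return "stamp"
```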
+
+ **Minimum viable delivery:** PR 1 alone (docs hygiene). Already worth doing independently of everything else.
+ **Standard delivery:** PR 1 + Stream B (`wr.coding-task` stamped or a scoped redesign issue created).
+ **Full delivery:** PR 1 + PR 2 + Stream B (all 3 unstamped production workflows stamped, `wr.production-readiness-audit` gated).
+
+ ---
+
+ ### Strongest alternative
+
+ **Candidate 3 (tooling investment over quality gate sessions):** Add a `validate:registry` advisory rule for "audit step without gate," add the `wr.production-readiness-audit` gate, and skip quality gate sessions entirely.
+
+ Switch to this if: Stream B's gate run finds structural failures in `wr.coding-task` AND the resulting redesign issue is deprioritized. At that point, the production stamp is deferred anyway, and tooling investment has a better expected return than waiting for the redesign.
+
+ ---
+
+ ### Residual risks
+
+ 1. **Quality gate findings expand scope significantly.** The gate may find `authoring_integrity:low` or `outcome_effectiveness:low` for `wr.coding-task`, pushing the work into redesign territory. Managed by an explicit branch condition. Risk level: medium (unknown until run).
+
+ 2. **Quality gate validity for coding-task-style workflows.** `wr.workflow-for-workflows.json` has not been run on a production pipeline workflow before. Its assessment dimensions may produce noisy or off-target findings for a coding workflow. Risk level: low (the dimensions are general, and the gate was "exercised extensively" per commit dc4624dc).
+
+ 3. **Production workflow stamps remain deferred if Stream B is deprioritized.** PR 1 and PR 2 ship regardless, but if Stream B doesn't happen, `wr.coding-task` and `wr.shaping` stay unstamped. Risk level: low for functionality (stamps are dev-only signals), medium for internal quality discipline.
+
+ ---
+
+ ### What changed from the stated goal
+
+ | Stated goal | Actual recommendation |
+ |---|---|
+ | "Modernize `exploration-workflow.json`" | That file no longer exists; `wr.discovery` is already modern (v3.2.0, stamped, routines, assessment-contract loops) |
+ | Modernize specific files by adding schema fields | Run the quality gate (a genuine review) before stamping; field additions alone are cosmetic |
+ | Focus on legacy catalog workflows | Focus on the 3 production pipeline workflows actually used in 85+ daemon sessions |
+ | Planning docs as priority guide | Planning docs are stale; usage data from the session store is the correct priority guide |
@@ -24,24 +24,16 @@ Confirm whether `selected_next_step` trace refs already include skipped step IDs

  ---

- ## Ticket 2: Legacy workflow modernization -- exploration-workflow.json
+ ## ~~Ticket 2: Legacy workflow modernization -- wr.adaptive-ticket-creation~~ (done)

- ### Goal
+ Modernized `workflows/adaptive-ticket-creation.json` to current v2 authoring patterns:

- Modernize `workflows/exploration-workflow.json` to current v2/lean authoring patterns. This is the highest-priority candidate among the unmodernized workflows.
+ - Added `wr.features.capabilities` declaration (workflow uses optional file system access)
+ - Added `pathComplexity` to `outputRequired.context` in `phase-0-triage` (structured output contract)
+ - Added `ticket-coverage-gate` assessment on `phase-5-batch-tickets` (bounded judgment at highest-stakes output step)
+ - Stamped with `validatedAgainstSpecVersion: 3`

- ### What modernization means
-
- - Current v2/lean structure where appropriate
- - `metaGuidance` and `recommendedPreferences`
- - `references` for authoritative companion material
- - `templateCall` / routine injection instead of repeating large prompt blocks
- - Tighter loop-control wording and evidence-oriented review structure
-
- ### Related
-
- - `docs/roadmap/open-work-inventory.md` (full prioritized modernization list)
- - `docs/authoring.md` (modern baseline)
+ `exploration-workflow.json` no longer exists in the bundled set. Next modernization candidate: see `docs/roadmap/open-work-inventory.md` for the current prioritized list.

  ---

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@exaudeus/workrail",
- "version": "3.70.1",
+ "version": "3.70.2",
  "description": "Step-by-step workflow enforcement for AI agents via MCP",
  "license": "MIT",
  "repository": {
@@ -11,14 +11,17 @@
  "Write tickets for all backend work needed to support the v2 search API",
  "Create a single bug ticket for the checkout crash when applying a promo code on iOS 17"
  ],
+ "features": [
+ "wr.features.capabilities"
+ ],
  "preconditions": [
  "User has provided a description of the feature, task, or work to be ticketed.",
  "Agent has file system access for loading team preferences and persisting rules."
  ],
  "metaGuidance": [
- "ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria \u2014 not user-story paraphrases.",
+ "ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria -- not user-story paraphrases.",
  "EXPLORE FIRST: use tools to gather context before asking the user anything. Ask only for information you genuinely cannot determine with tools or from the request itself.",
- "TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path \u2014 complex sessions are where durable conventions emerge and where the investment pays off.",
+ "TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path -- complex sessions are where durable conventions emerge and where the investment pays off.",
  "AUTONOMOUS TRIAGE: decide pathComplexity (Simple / Standard / Epic) yourself from the request. Surface your reasoning, then wait for confirmation.",
  "QUALITY FLOOR: every ticket must have a context-rich description, checkbox-style acceptance criteria that are objectively testable, and an effort estimate."
  ],
@@ -29,7 +32,7 @@
  "promptBlocks": {
  "goal": "Analyze the request, gather available context, and select the right complexity path before doing any ticket work.",
  "constraints": [
- "Decide the path yourself \u2014 do not ask the user to choose.",
+ "Decide the path yourself -- do not ask the user to choose.",
  "Load ./.workflow_rules/ticket_creation.md if it exists and let it influence your triage. If the file does not exist, note this explicitly in your output so the user knows team conventions were not applied.",
  "Set pathComplexity to exactly one of: Simple, Standard, or Epic."
  ],
@@ -37,11 +40,12 @@
  "Read any attached documents, linked PRDs, or referenced specs.",
  "Identify complexity signals: scope breadth, number of distinct deliverables, cross-team dependencies, technical unknowns, and estimated ticket count.",
  "Apply the triage rubric: Simple = single ticket, clear requirements, no blocking unknowns, minimal dependencies. Standard = multiple related tickets, moderate scope, some analysis needed. Epic = complex feature requiring decomposition, multiple teams or significant unknowns, likely 6+ tickets.",
- "Upgrade triggers \u2014 escalate to Standard if: request implies more than one clearly separate work item. Escalate to Epic if: multiple teams are involved, architecture decisions are unresolved, or you estimate more than five tickets.",
+ "Upgrade triggers -- escalate to Standard if: request implies more than one clearly separate work item. Escalate to Epic if: multiple teams are involved, architecture decisions are unresolved, or you estimate more than five tickets.",
  "State your selected path and the top three reasons. Capture pathComplexity in context."
  ],
  "outputRequired": {
- "notesMarkdown": "Selected path (Simple/Standard/Epic), top three triage reasons, any complexity upgrade triggers observed."
+ "notesMarkdown": "Selected path (Simple/Standard/Epic), top three triage reasons, any complexity upgrade triggers observed.",
+ "context": "Capture pathComplexity (Simple, Standard, or Epic)."
  },
  "verify": [
  "pathComplexity is set to Simple, Standard, or Epic.",
@@ -61,7 +65,7 @@
  "promptBlocks": {
  "goal": "Generate one complete, developer-ready Jira ticket for this request.",
  "constraints": [
- "Acceptance criteria must be phrased as observable, testable conditions \u2014 not user-story restatements.",
+ "Acceptance criteria must be phrased as observable, testable conditions -- not user-story restatements.",
  "Follow any team conventions from ./.workflow_rules/ticket_creation.md.",
  "Include all fields a developer needs to start work without asking follow-up questions."
  ],
@@ -111,7 +115,7 @@
  "Load ./.workflow_rules/ticket_creation.md and note any relevant team conventions.",
  "Identify: key stakeholders, team dependencies, technical constraints, known risks, and any conflicting requirements.",
  "Classify each gap as: Critical (blocks planning), Important (affects scope), or Nice-to-have (can proceed without it).",
- "For Critical and Important gaps that tools cannot resolve, ask the user \u2014 in a single consolidated question block, not one at a time.",
+ "For Critical and Important gaps that tools cannot resolve, ask the user -- in a single consolidated question block, not one at a time.",
  "After receiving answers, check whether any response reveals scope that would change `pathComplexity` (e.g. the user confirms three teams are involved, or the feature is narrower than initially assessed). If so, state the new classification and reasoning, and ask the user to confirm before continuing to Phase 2."
  ],
  "outputRequired": {
@@ -143,16 +147,16 @@
  "promptBlocks": {
  "goal": "Produce a structured plan that will drive ticket generation. This plan is the source of truth for scope.",
  "constraints": [
- "Be explicit about scope boundaries \u2014 ambiguous scope will produce ambiguous tickets.",
+ "Be explicit about scope boundaries -- ambiguous scope will produce ambiguous tickets.",
  "Success criteria must be measurable, not just descriptive.",
  "For Standard path: this plan feeds directly into batch ticket generation."
  ],
  "procedure": [
  "Write: Project Summary (2-3 sentences, what is being built and why).",
  "Write: Key Deliverables (bulleted list of distinct components or features).",
- "Write: In-Scope (explicit list \u2014 prevents scope creep).",
- "Write: Out-of-Scope (explicit exclusions \u2014 prevents misunderstandings).",
- "Write: Success Criteria (measurable definition of done \u2014 each item verifiable).",
+ "Write: In-Scope (explicit list -- prevents scope creep).",
+ "Write: Out-of-Scope (explicit exclusions -- prevents misunderstandings).",
+ "Write: Success Criteria (measurable definition of done -- each item verifiable).",
  "Write: High-Level Timeline (phases or milestones with rough sizing).",
  "Review: does every deliverable map clearly to implementable work? Is anything in scope that should be out?"
  ],
@@ -178,7 +182,7 @@
  "goal": "Break the approved plan into a logical work hierarchy that development teams can execute.",
  "constraints": [
  "Every item in the plan's In-Scope list must map to at least one work item in the hierarchy.",
- "Dependencies must be explicit \u2014 not implied by ordering alone.",
+ "Dependencies must be explicit -- not implied by ordering alone.",
  "Oversized stories (more than one sprint of work) should be split."
  ],
  "procedure": [
@@ -210,7 +214,7 @@
  "promptBlocks": {
  "goal": "Add effort estimates, risk assessments, and team assignments to each story in the hierarchy.",
  "constraints": [
- "Conservative estimates are better than optimistic ones \u2014 note uncertainty explicitly.",
+ "Conservative estimates are better than optimistic ones -- note uncertainty explicitly.",
  "Justify each estimate with one sentence of reasoning.",
  "Flag stories on the critical path."
  ],
@@ -220,7 +224,7 @@
  "Assign priority: must-have for MVP, should-have, nice-to-have.",
  "Note suggested team or skill area for each story.",
  "Identify critical path: which stories block the most downstream work? Surface these explicitly.",
- "Flag any stories whose estimates feel uncertain \u2014 surface the unknowns rather than hiding them in a range."
+ "Flag any stories whose estimates feel uncertain -- surface the unknowns rather than hiding them in a range."
  ],
  "outputRequired": {
  "notesMarkdown": "Total story point estimate, critical path items, high-risk stories."
@@ -273,7 +277,21 @@
  "Epic tickets are present and child tickets reference the parent (Epic path)."
  ]
  },
- "requireConfirmation": true
+ "requireConfirmation": true,
+ "assessmentRefs": [
+ "ticket-coverage-gate"
+ ],
+ "assessmentConsequences": [
+ {
+ "when": {
+ "anyEqualsLevel": "low"
+ },
+ "effect": {
+ "kind": "require_followup",
+ "guidance": "coverage_completeness low -- one or more In-Scope items are missing a ticket, or acceptance criteria are not objectively testable. Fix the gaps and retry before presenting to the user."
+ }
+ }
+ ]
  },
  {
  "id": "phase-6-capture-rules",
@@ -285,7 +303,7 @@
285
303
  "promptBlocks": {
286
304
  "goal": "Extract actionable team preferences from this session and persist them so future runs use them automatically.",
287
305
  "constraints": [
288
- "Only write rules that are genuinely reusable across future tickets \u2014 skip one-off project specifics.",
306
+ "Only write rules that are genuinely reusable across future tickets skip one-off project specifics.",
289
307
  "Keep rules concise and actionable, not narrative.",
290
308
  "Append to ./.workflow_rules/ticket_creation.md rather than replacing it."
291
309
  ],
@@ -293,7 +311,7 @@
293
311
  "Review what conventions, preferences, or requirements emerged during this session.",
294
312
  "Identify patterns worth preserving: naming conventions, field usage, AC format preferences, estimation approach, labeling rules.",
295
313
  "Draft new rules as short, imperative statements (e.g., 'Use T-shirt sizing not Fibonacci', 'Always include a Figma link in design tickets').",
296
- "Check against existing rules \u2014 avoid duplicates or contradictions.",
314
+ "Check against existing rules avoid duplicates or contradictions.",
297
315
  "Append new rules to ./.workflow_rules/ticket_creation.md, creating the file if it does not exist."
298
316
  ],
299
317
  "outputRequired": {
@@ -307,5 +325,22 @@
  },
  "requireConfirmation": false
  }
- ]
+ ],
+ "assessments": [
+ {
+ "id": "ticket-coverage-gate",
+ "purpose": "Every In-Scope plan item has at least one ticket with objectively testable acceptance criteria before the batch is presented to the user.",
+ "dimensions": [
+ {
+ "id": "coverage_completeness",
+ "purpose": "All In-Scope items are represented by tickets with checkbox-style AC. No plan item is missing a ticket and no AC is a user-story restatement.",
+ "levels": [
+ "low",
+ "high"
+ ]
+ }
+ ]
+ }
+ ],
+ "validatedAgainstSpecVersion": 3
  }