@exaudeus/workrail 3.70.1 → 3.70.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/dist/console-ui/assets/{index-BcZJOyVG.js → index-Gmbzhc2B.js} +1 -1
- package/dist/console-ui/index.html +1 -1
- package/dist/daemon/daemon-events.d.ts +1 -1
- package/dist/daemon/workflow-runner.js +4 -2
- package/dist/manifest.json +15 -15
- package/dist/trigger/polling-scheduler.d.ts +2 -1
- package/dist/trigger/polling-scheduler.js +3 -2
- package/dist/v2/durable-core/domain/prompt-renderer.js +18 -8
- package/docs/discovery/design-review-findings.md +62 -65
- package/docs/ideas/backlog.md +222 -106
- package/docs/plans/workflow-modernization-design.md +177 -59
- package/docs/tickets/next-up.md +7 -15
- package/package.json +1 -1
- package/workflows/adaptive-ticket-creation.json +53 -18
- package/workflows/mr-review-workflow.agentic.v2.json +10 -4
@@ -2,7 +2,7 @@

**Status:** Active
**Created:** 2026-04-20
**Updated:** 2026-04-23 (Phase 0 third run -- repo state re-verified; no material changes since second run)
**Owner:** WorkTrain daemon session (shaping)

---
@@ -17,17 +17,17 @@ Do not treat this file as the source of truth for what step the session is on, w

This file is maintained alongside the session as a readable summary of findings and decisions. It may lag behind the session notes slightly.

### Capability status (re-verified Phase 0b, third session, 2026-04-23)

| Capability | Available | How verified | Notes |
|---|:---:|---|---|
| Web browsing | YES | `curl https://example.com` returned HTML (5s timeout) | Confirmed each session. Available via curl; no dedicated browser tool needed |
| Delegation (spawn_agent) | YES | `spawn_agent` with `wr.classify-task` returned `{childSessionId: "sess_3x6t6lyz...", outcome: "success"}` -- mechanism confirmed again this session | `wr.classify-task` is the correct probe (1 step, always completes). Child classified the task as Small/Low-risk/investigation correctly. |
| Git / GitHub CLI | YES | `gh pr list`, `git log`, `gh issue view 174` working throughout session | No issues |

**Capability decisions:**

- **Web browsing:** Available but not needed. All evidence for this task is in-repo (workflow files, schema, planning docs, session store usage data). No external references needed. Fallback to in-repo data is fully sufficient.
- **Delegation:** Mechanism is confirmed available (wr.classify-task probe succeeded, childSessionId: sess_3x6t6lyz, outcome: success). Whether to use it is a per-step judgment. For design/synthesis work (Phase 0/0b), delegation adds overhead without benefit -- the main agent owns synthesis by rule. For independent parallel audits in later phases (e.g. gap-scoring multiple workflows simultaneously), delegation reduces latency and is appropriate. Decision deferred to per-step judgment in downstream phases.

---
@@ -114,20 +114,37 @@ Rationale (justified against alternatives):

| `test-session-persistence.json` | N | N | N | N | N | N | 5 |
| `wr.ui-ux-design.json` | Y | Y | Y | N | Y | N | 8 |
| `wr.diagnose-environment.json` | N | N | N | N | N | N | 2 |
| `wr.workflow-for-workflows.json` | Y | Y | Y | Y | Y | **Y (3 in body)** | 11 |
| `wr.discovery.json` | Y | Y | Y | N | Y | N | 22 |
| `wr.shaping.json` | Y | Y | N | N | N | **Y** | 9 |

**Working examples for assessment gate patterns:**

- `wr.shaping.json` -- cleanest: 1 dimension per assessment, `low`/`high` levels, `require_followup` on `low`; uses top-level `assessments` + step `assessmentRefs` + `assessmentConsequences`
- `wr.coding-task.json` -- 3 gated steps (design, plan, verification), multi-assessment per step, gates in loop `body` steps
- `mr-review-workflow.agentic.v2.json` -- 3 refs on a single final validation step with `require_followup`
- `wr.workflow-for-workflows.json` -- gates in loop `body` field (not `loop.steps`); correct pattern for loop-body gates
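The four examples above share one wiring shape. A minimal sketch of that shape, in the workflow-file JSON this doc describes -- the field names (`assessments`, `dimensions`, `levels`, `assessmentRefs`, `assessmentConsequences`, `anyEqualsLevel`, `require_followup`) are the ones quoted elsewhere in this doc; the ids, prose values, and surrounding step structure are illustrative only and should be checked against the authoring spec:

```json
{
  "assessments": [
    {
      "id": "example-quality-gate",
      "purpose": "The step output meets the quality bar",
      "dimensions": [
        {
          "id": "output_quality",
          "purpose": "Output is evidence-grounded, not impressionistic",
          "levels": ["low", "high"]
        }
      ]
    }
  ],
  "steps": [
    {
      "id": "gated-step",
      "assessmentRefs": ["example-quality-gate"],
      "assessmentConsequences": [
        {
          "when": { "anyEqualsLevel": "low" },
          "effect": {
            "kind": "require_followup",
            "guidance": "Quality is low; revise and re-run this step."
          }
        }
      ]
    }
  ]
}
```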

**Current smoke test baseline:** 36/36 (re-verified 2026-04-23 third session)

### Key landscape observations (corrected Phase 1c third session, 2026-04-23)

> **CRITICAL CORRECTION (third session):** All prior landscape scans used `loop.steps` to find loop body steps. The correct field in the current schema is `body` (not `loop.steps`). Every workflow with loops uses `body`. This means prior gate counts for workflows with loops were undercounted. Corrected counts:
> - `wr.workflow-for-workflows.json`: **3 steps** with assessment refs (in `body` of `phase-6-quality-gate-loop`) -- not 0
> - `wr.coding-task.json`: **3 steps** with assessment refs (in `body`) -- not 2
> - All other workflows: corrected counts verified below

**Corrected gate step counts (using `body` field correctly):**

| Workflow | Gate steps | Gate step IDs |
|---|:---:|---|
| `wr.adaptive-ticket-creation` | 1 | phase-5-batch-tickets |
| `wr.bug-investigation` | 1 | phase-5-diagnosis-validation |
| `wr.coding-task` | **3** | phase-1c-challenge-and-select, phase-3-plan-and-test-design, phase-7b-fix-and-summarize |
| `wr.mr-review` | 1 | phase-5-final-validation (3 refs) |
| `wr.shaping` | 2 | frame-gate, breadboard-and-elements |
| `wr.workflow-for-workflows` | **3** | phase-6a-state-economy-audit, phase-6b-execution-simulation, phase-6c-adversarial-quality-review |
| `test-artifact-loop-control` | 1 | complete |
| All others | 0 | -- |
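The corrected scan logic can be sketched as follows -- a recursive walk that descends into loop `body` steps (the field the correction identifies) instead of the non-existent `loop.steps`. The workflow dict shape and step ids here are illustrative assumptions, not the real files:

```python
def gate_steps(steps):
    """Return ids of steps carrying assessment refs, including loop-body steps."""
    found = []
    for step in steps:
        if step.get("assessmentRefs"):
            found.append(step["id"])
        # The prior (wrong) scan read step.get("loop", {}).get("steps"),
        # which is always empty in the current schema; `body` is correct.
        found.extend(gate_steps(step.get("body", [])))
    return found

# Illustrative workflow shape only (not a real bundled workflow).
workflow = {
    "steps": [
        {"id": "phase-1", "body": []},
        {
            "id": "quality-gate-loop",
            "body": [
                {"id": "audit", "assessmentRefs": ["gate-a"]},
                {"id": "review", "assessmentRefs": ["gate-b"]},
            ],
        },
    ]
}

print(gate_steps(workflow["steps"]))  # ['audit', 'review']
```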

1. **Two prompt formats coexist:** `promptBlocks` (structured object with goal/constraints/procedure/verify) and raw `prompt` string. The authoring spec recommends `promptBlocks`. Not all "modern" workflows use it consistently.
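A side-by-side sketch of the two coexisting formats -- the `promptBlocks` keys (goal/constraints/procedure/verify) are taken from the observation above; the step ids and field values are hypothetical:

```json
{
  "steps": [
    {
      "id": "modern-step",
      "promptBlocks": {
        "goal": "What the step must achieve",
        "constraints": "Boundaries the agent must respect",
        "procedure": "How to carry the step out",
        "verify": "How to confirm the result"
      }
    },
    {
      "id": "legacy-step",
      "prompt": "A single free-form instruction string."
    }
  ]
}
```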

@@ -135,17 +152,19 @@ Rationale (justified against alternatives):

3. **Several "candidates" from open-work-inventory also no longer exist:** `mr-review-workflow.json`, `bug-investigation.json`, `design-thinking-workflow.json` -- all absorbed or renamed. The list in `open-work-inventory.md` is materially stale.

4. **Assessment gates are the biggest behavioral differentiator:** 7 workflows have functional assessment gates (6 production-relevant + 1 test). The rest have no engine-enforced quality checkpoints.

5. **`wr.workflow-for-workflows.json` DOES have functional assessment gates** -- 3 steps in the loop body (`phase-6a`, `phase-6b`, `phase-6c`) carry assessment refs. All 4 declared gates are referenced and wired. Prior scan missed these because it looked for `loop.steps` instead of `body`.

6. **`recommendedPreferences` is a common gap:** ~11 workflows are missing it. Easy to add, genuine behavioral improvement.

7. **`references` is almost universally missing:** Only a few workflows have it. This is cosmetic -- references are informational, not enforced.

8. **The "unstamped" list from `validate:registry` is cosmetic advisory only** -- names 14 unstamped workflows; stamping alone is not a quality improvement goal.

9. **`wr.production-readiness-audit.json` has no assessment gates** -- despite being a review workflow with a clear audit focus (`phase-5-final-validation` exists), it declares no `assessments` and no `assessmentRefs`. This is a confirmed behavioral gap on a high-value workflow.

10. **`wr.coding-task` has 3 gated steps across the lifecycle (design, plan, and verification)** -- not 2 as prior scans reported. This is a richer quality-gate structure than previously understood.

### Phase 1c hard-constraint findings (engine/schema reality checks)
@@ -197,11 +216,13 @@ Rationale (justified against alternatives):

### The 4 production workflows (what actually runs in the daemon pipeline)

From `triggers.yml` and `src/coordinators/modes/full-pipeline.ts` (re-verified 2026-04-23):
1. **`wr.discovery`** (full-pipeline mode, step 1 via `coordinators/modes/full-pipeline.ts`) -- stamped v3. Has 3 `while` loops with `artifact_contract` conditionSources and `maxIterations` backstops (2, 3, 3). No assessment gates. Research step -- gates may not be appropriate here.
2. **`wr.shaping`** (full-pipeline mode, step 2) -- has 2 assessment gates, 1 `while` loop with `artifact_contract` conditionSource and `maxIterations: 2`. NOT stamped.
3. **`wr.coding-task`** (direct `triggers.yml` trigger + implement mode) -- has 3 gate steps per the corrected `body`-aware count (the earlier scan reported 2), 4 loops (3 `while` with `artifact_contract`, 1 `forEach`). NOT stamped. Highest-stakes: writes code.
4. **`wr.mr-review`** (direct `triggers.yml` trigger `mr-review`) -- has 3 assessment gates on final-validation step with `require_followup`, 1 `while` loop with `artifact_contract` conditionSource and `maxIterations: 4`. NOT stamped. Issue #174 still open but gates are already wired.

**Loop structure verdict (verified):** All production workflows use `conditionSource.kind = "artifact_contract"` with `maxIterations` backstops. Loop control is sound. No missing termination conditions. This is a significant quality signal -- these loops will not run forever.
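The loop shape this verdict describes can be sketched as a config fragment. `while`, `forEach`, `maxIterations`, `body`, and `conditionSource.kind = "artifact_contract"` are the names the findings use; the key that carries the loop type, the `contract` value, and the step content are assumptions for illustration:

```json
{
  "id": "refinement-loop",
  "kind": "while",
  "maxIterations": 4,
  "conditionSource": {
    "kind": "artifact_contract",
    "contract": "review-artifact-complete"
  },
  "body": [
    {
      "id": "refine",
      "prompt": "Revise the artifact until its contract is satisfied."
    }
  ]
}
```

The `maxIterations` backstop is what makes the termination guarantee hold even if the artifact contract is never satisfied.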

**Key tension**: The 4 production workflows already have assessment gates. The "legacy" workflows that don't have gates (`wr.adaptive-ticket-creation`, `wr.documentation-update`, `wr.production-readiness-audit`, etc.) are NOT used in the autonomous pipeline -- they're human-triggered workflows.
@@ -225,8 +246,8 @@ From `triggers.yml` and `src/coordinators/modes/`:

**Tension 2: Stamping vs. behavioral improvement**
- `validatedAgainstSpecVersion` is a stamp that says "this workflow was reviewed against the current authoring spec." Most production workflows are missing this stamp.
- Running `wr.workflow-for-workflows.json` on a workflow is the intended process to earn the stamp.
- But running `wr.workflow-for-workflows.json` takes significant agent time and may find things to fix, making the "just stamp it" shortcut dishonest.

**Tension 3: Documentation rot creates misdirected work**
- The open-work-inventory and tickets/next-up.md reference deleted files and closed work (issue #174, exploration-workflow.json).
@@ -234,16 +255,24 @@ From `triggers.yml` and `src/coordinators/modes/`:

- Fixing the docs first is cheap but it's not "shipping workflow improvements."

**Tension 4: Active focus is elsewhere**
- Recent commits (Apr 20-23) are engine/daemon/console/schema: loadSessionNotes export, metricsProfile footer injection, wr.* namespace rename, console fixes, TypeScript 6 upgrade.
- The project owner's actual momentum has been on infrastructure and schema, not workflow authoring content.
- Starting a workflow modernization project now means context-switching from hot infrastructure work.
- Mitigating factor: the wr.* rename (#782) and metricsProfile additions (#779) WERE workflow file changes. The infrastructure work is now slowing; conditions may be better for workflow content work.

**Tension 5: Issue #174 is open but done (new, 2026-04-23)**
- GitHub issue #174 "Adopt assessment-gate follow-up in MR review" is labeled `feature, next` and remains open.
- But `wr.mr-review` already has 3 assessment gates with `require_followup` consequences, all properly wired.
- The issue's stated acceptance criteria ("assessment gate added to wr.mr-review") are met.
- Closing this issue is cheap cleanup and clarifies the work queue.

### Success criteria (observable)

1. The 3 unstamped production daemon workflows (`wr.shaping`, `wr.coding-task`, `wr.mr-review`) all have `validatedAgainstSpecVersion: 3` after genuine review via `wr.workflow-for-workflows.json`
2. Planning docs (`open-work-inventory.md`, `tickets/next-up.md`) reference only files that exist in the repo; issue #174 is closed
3. At least one non-production workflow with a review/audit purpose (`wr.production-readiness-audit` or `wr.adaptive-ticket-creation`) gains functional assessment gates
4. `npx vitest run tests/lifecycle/bundled-workflow-smoke.test.ts` passes (36/36 minimum) before and after any changes
5. No pre-existing test failures are introduced (perf/cli/polling failures confirmed pre-existing and not attributed to workflow changes)

### Reframes and HMW questions
@@ -252,17 +281,19 @@ From `triggers.yml` and `src/coordinators/modes/`:

- Project B: Validate + stamp the 4 production workflows (high value, expensive, requires running quality gate)

**Reframe 2: The daemon doesn't need modernization -- it needs validation**
The autonomous pipeline workflows already use assessment gates. What they're missing is the formal `validatedAgainstSpecVersion` stamp, which is earned by running them through `wr.workflow-for-workflows.json`. The work is validation, not "modernization."

**HMW 1:** How might we run the quality gate on `wr.coding-task` as a time-bounded probe to scope Stream B before committing to it?

**HMW 2:** How might we prioritize the non-production workflows without session outcome data to guide us?

### Primary framing risk (updated 2026-04-23)

**The specific condition that would make this framing wrong:**

If running `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth returns `authoring-integrity-gate: low` or `outcome-effectiveness-gate: low` (the two structural-quality dimensions among the quality gate's four), then the framing "production workflows need stamping not redesign" is wrong. A `low` on either dimension means the workflow has structural quality problems that the gate catches -- and the `require_followup` consequence would trigger, sending the quality gate into another iteration rather than producing a stamp. This would mean the scope of work is redesign (behavioral changes), not validation (stamp-earning). The only way to resolve this uncertainty is to actually run the gate on `wr.coding-task`. Until that happens, this framing risk is unresolved.

**Why this specific risk and not a generic one:** The assessment dimensions of `wr.workflow-for-workflows.json` are `state_economy`, `simulation_outcome`, `authoring_integrity`, and `outcome_effectiveness`. A `low` on `state_economy` means the workflow is inefficient but not structurally wrong. A `low` on `authoring_integrity` or `outcome_effectiveness` means the workflow has quality problems that actively harm output. These two are the ones that would force redesign. Loop structure and gate wiring are already verified-correct -- so the remaining unknowable is prompt quality under adversarial review.

### Primary uncertainty
@@ -299,28 +330,30 @@ At the same time, planning docs reference 7 deleted files and one open-but-done 

## Candidate Generation Setup (Phase 3b)

**Path:** `design_first`
**candidateCountTarget:** 3
**Updated:** 2026-04-23 (sharpened from prior session; 3 existing candidates re-evaluated below)

### Required properties of the candidate set (updated 2026-04-23)

Per the `design_first` path contract, the 3 candidates must satisfy:

1. **At least one reframe candidate:** One candidate must challenge whether docs correction + production validation is the right investment. Valid reframes: retire low-value workflows, invest in lint tooling, or defer workflow work entirely. Direction C (defer) and Direction B (tooling) serve this role.

2. **Meaningful differentiation:** Candidates must differ in their primary bet, not just ordering or scope. Direction A bets on "empirical validation of the production pipeline first." Direction B bets on "tooling over manual migration." Direction C bets on "deferral is correct given bandwidth context." These are meaningfully different bets.

3. **Grounded in the 5 decision criteria (updated):** Sequencing discipline / Empirical before prescriptive / Production-first value / No cosmetic compliance / Incremental shippability. Each candidate is evaluated against all 5 below.

4. **Prototype-learning uncertainty honored:** Direction A explicitly makes the quality gate the scope-branch point. Direction B bypasses this uncertainty by investing in tooling instead. Direction C defers it entirely. All three handle the uncertainty differently -- this is correct.

### New bias to guard against (2026-04-23 addition)

The prior session's candidates were generated when `wr.workflow-for-workflows.v2.json` was the quality gate. That file has been consolidated into `wr.workflow-for-workflows.json`. References in Direction A must use the correct current file name. This is a naming-only correction; the candidates are otherwise unchanged.

### Anti-candidates (explicitly ruled out by decision criteria)

- Any candidate that adds `validatedAgainstSpecVersion` without running `wr.workflow-for-workflows.json` -- violates criterion 4 (no cosmetic compliance)
- Any candidate that prioritizes legacy catalog workflows over production pipeline workflows -- violates criterion 3 unless making a deliberate reframe argument
- Any candidate that treats "close #174" and "stamp wr.coding-task" as equivalent work units -- they are different in kind (docs hygiene vs. genuine quality validation)

---
@@ -328,14 +361,14 @@ Because the two streams are well-defined, generation will be pulled toward micro

### Direction A: Docs-first + empirical production validation (recommended)

**Core bet:** Fix the documentation foundation first, then run `wr.workflow-for-workflows.json` on `wr.coding-task` as a probe -- let that run's findings determine the scope of remaining work.

**What:**
1. Update `open-work-inventory.md` and `tickets/next-up.md` to remove stale file references (deleted workflows, non-existent candidates)
2. Close GitHub issue #174 (assessment-gate adoption in MR review is already done -- `wr.mr-review` has 3 gates with `require_followup`, all wired)
3. Run `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth
4. If quality gate finds only minor issues (`state_economy:low` only): fix, stamp, repeat for `wr.shaping`
5. If quality gate finds structural failures (`authoring_integrity:low` or `outcome_effectiveness:low`): create a focused GitHub issue for the specific fixes, do NOT stamp until fixed

**Satisfies decision criteria:**
- ✅ Sequencing discipline (docs first)
@@ -406,7 +439,14 @@ Because the two streams are well-defined, generation will be pulled toward micro

## Resolution Notes

**Phase 0 (2026-04-21):** Path confirmed as `design_first`. Context fully populated for downstream steps. Smoke test baseline confirmed: 37/37 passing. Open GitHub issues: only #174 ("Adopt assessment-gate follow-up in MR review") is directly related.

**Phase 0 third run (2026-04-23, later session):** Repo state re-verified. No material changes since Phase 0 second run. Latest commit: `f0a1822a fix(engine): validate metrics_outcome enum in checkContextBudget`. Smoke test: 36/36. Issue #174: still open. No new workflow files. Open PRs: #797 (max-output-tokens feature, unrelated), #698/#330 (dependabot deps). All prior findings and direction selection remain valid. Path recommendation unchanged: `design_first`. Selected direction unchanged: Direction A (docs-first + empirical production validation). No re-analysis needed.

**Phase 0 re-run (2026-04-23, earlier session):** Two material changes since last session:
1. `feat(workflows): rename all bundled workflows to wr.* namespace (#782)` -- all workflow IDs now have `wr.` prefix; usage data in session store uses old IDs (`coding-task-workflow-agentic` = `wr.coding-task`, `mr-review-workflow-agentic` = `wr.mr-review`). Design doc table was using the old file names; corrected to `wr.*` IDs.
2. `chore(workflows): delete stale wfw copy, rename .v2.json to workflow-for-workflows.json (#780)` -- `wr.workflow-for-workflows.v2.json` absorbed into `wr.workflow-for-workflows.json`. Smoke test count is now 36/36.

All candidate directions from prior session remain valid. No engine schema changes that affect assessment gate contract. Issue #174 still open.

---
@@ -417,14 +457,92 @@ Because the two streams are well-defined, generation will be pulled toward micro

| path = `design_first` | Goal was solution-statement; primary risk is wrong candidates/wrong unit of work | 2026-04-20 |
| No subagent delegation in Phase 0 | All data available in-repo via Bash/Read tools; synthesis task is single-thread | 2026-04-20 |
| Prior landscape corrected | assessmentRef (singular) vs assessmentRefs (plural) error fixed; modern baselines re-verified | 2026-04-20 |
| Stale planning docs identified as prerequisite gate | Must correct docs before implementation begins -- they reference deleted targets | 2026-04-20 |
| Delegation: mechanism available, not used for design work | spawn_agent with wr.classify-task returned success. Not used for design/synthesis -- main agent owns synthesis by rule. Used for parallel audits only when latency benefit is clear. | 2026-04-20/23 |
| Web browsing: available via curl | curl to example.com returned HTML -- network reachable; not needed (all data is in-repo) | 2026-04-20/23 |
| Artifact strategy: doc is readable summary only | Execution truth lives in step notes + context variables; design doc is for human reference only | 2026-04-20 |
| **Selected direction: Candidate 2 (quality gate probe)** | Satisfies all 5 decision criteria; only candidate that answers "are production workflows sound?"; failure mode bounded by explicit branch condition; philosophy aligned | 2026-04-23 |
| "Follows existing repo pattern" rationale corrected | Git history shows all 4 stamped workflows were stamped during authoring commits, not after quality gate runs. Corrected rationale: "exceeds current practice; justified by philosophy + wr.coding-task 85-session stakes." | 2026-04-23 |
| Runner-up bonus PR: wr.production-readiness-audit gate | Standalone, independent of quality gate sessions, delivers user-facing behavioral improvement, follows wr.shaping gate pattern exactly | 2026-04-23 |
| Candidate 3 lint rule left out of scope | YAGNI after wr.production-readiness-audit bonus PR fixes the most obvious ungated audit workflow; heuristic maintenance burden outweighs value | 2026-04-23 |
| Candidate 1 (mechanical stamp) disqualified | Fails decision criteria 2 (empirical) and 4 (no cosmetic compliance); corrupts stamp meaning | 2026-04-23 |

---
## Final Summary

**Recommendation:** Quality gate probe on `wr.coding-task` + docs hygiene + `wr.production-readiness-audit` gate addition

**Confidence band:** Medium-high

The "medium" component comes entirely from one unresolved prototype-learning uncertainty: what does running `wr.workflow-for-workflows.json` on `wr.coding-task` actually find? This is not resolvable by analysis -- it's resolved by doing the work. The direction is correct in both outcomes (minor findings → stamp; structural findings → scoped redesign issue). The confidence in the direction is high; the confidence in the scope is medium.

---

### The problem (reframed)

The stated goal ("modernize `exploration-workflow.json`") was a solution statement pointing at a file that no longer exists. The real problem has two layers:

**Layer 1 (cheap, ~30 min):** Planning docs and issue queue are stale -- they reference deleted workflows and an already-completed issue (#174). Any work started from them is misdirected.

**Layer 2 (high value, scope-uncertain):** The 3 most-used production pipeline workflows (`wr.coding-task` at 85 sessions, `wr.mr-review` at 65, `wr.shaping`) are structurally sound (correct loop control, working assessment gates, `artifact_contract` conditionSources) but have never been run through the project's quality gate. They lack `validatedAgainstSpecVersion: 3`.

---

### Selected direction: three independent work units

**PR 1 -- Docs hygiene (independent, no dependencies, ~30 min)**
|
|
495
|
+
- Update `docs/roadmap/open-work-inventory.md`: remove references to deleted workflows (`exploration-workflow.json`, `mr-review-workflow.json`, `bug-investigation.json`, `design-thinking-workflow.json`, `wr.workflow-for-workflows.v2.json`, and other stale entries)
|
|
496
|
+
- Update `docs/tickets/next-up.md`: remove stale "Ticket 2: Legacy workflow modernization -- exploration-workflow.json" entry
|
|
497
|
+
- Close GitHub issue #174 with comment: "Adopting assessment-gate follow-up in MR review is complete. Step `phase-5-final-validation` in `wr.mr-review` already has `assessmentRefs: [\"evidence-quality-gate\", \"coverage-completeness-gate\", \"contradiction-resolution-gate\"]` with `assessmentConsequences` triggering `require_followup` when any dimension scores `low`. Three gates, all wired, no further action needed."
|
|
498
|
+
- Pre-PR validation: `grep -E "exploration-workflow|mr-review-workflow\.json|bug-investigation\.json|design-thinking-workflow|workflow-for-workflows\.v2" docs/roadmap/open-work-inventory.md docs/tickets/next-up.md` must return no output
**PR 2 -- `wr.production-readiness-audit` assessment gate (independent, no dependencies, ~1 hr)**

- Add to `workflows/production-readiness-audit.json`:
  - Top-level `assessments`: `[{ "id": "readiness-verdict", "purpose": "The readiness verdict is evidence-grounded and calibrated -- not optimistic or based on absence of red flags", "dimensions": [{ "id": "readiness_confidence", "purpose": "Verdict is supported by specific evidence items tied to concrete system behaviors, not general impressions", "levels": ["low", "high"] }] }]`
  - On the final verdict step: `"assessmentRefs": ["readiness-verdict"]` + `"assessmentConsequences": [{ "when": { "anyEqualsLevel": "low" }, "effect": { "kind": "require_followup", "guidance": "Readiness confidence is low. Return to Phase 3 evidence collection: identify which readiness dimensions lack specific behavioral evidence, gather it, and re-run the verdict." } }]`
- Create a GitHub issue for this work before implementation
- Smoke test must pass (36/36) after the change
- Pattern reference: the `frame-soundness` gate in `wr.shaping.json` is the cleanest example to follow
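Assembled into one fragment, the two additions read as follows. This is a sketch only: the `steps` wrapper and the `final-verdict` step id are assumptions standing in for the workflow's actual final verdict step, and all surrounding fields are omitted.

```json
{
  "assessments": [
    {
      "id": "readiness-verdict",
      "purpose": "The readiness verdict is evidence-grounded and calibrated -- not optimistic or based on absence of red flags",
      "dimensions": [
        {
          "id": "readiness_confidence",
          "purpose": "Verdict is supported by specific evidence items tied to concrete system behaviors, not general impressions",
          "levels": ["low", "high"]
        }
      ]
    }
  ],
  "steps": [
    {
      "id": "final-verdict",
      "assessmentRefs": ["readiness-verdict"],
      "assessmentConsequences": [
        {
          "when": { "anyEqualsLevel": "low" },
          "effect": {
            "kind": "require_followup",
            "guidance": "Readiness confidence is low. Return to Phase 3 evidence collection: identify which readiness dimensions lack specific behavioral evidence, gather it, and re-run the verdict."
          }
        }
      ]
    }
  ]
}
```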
**Stream B -- Quality gate probe on `wr.coding-task` (independent, time-bounded, scope-uncertain)**

1. Create GitHub issue: "Validate and stamp wr.coding-task via quality gate" with acceptance criteria: run `wr.workflow-for-workflows.json` at STANDARD depth; stamp only if no `authoring_integrity:low` or `outcome_effectiveness:low`
2. Run `wr.workflow-for-workflows.json` on `wr.coding-task` at STANDARD depth in a daemon session
3. Branch on gate findings:
   - `state_economy:low` only → fix in-session (inefficiency, not structural failure), stamp, PR
   - `simulation_outcome:low` with narrow fix → fix in-session, stamp, PR
   - `authoring_integrity:low` or `outcome_effectiveness:low` → stop, create "wr.coding-task quality improvements" issue with specific findings, do NOT stamp until fixed
4. If wr.coding-task stamps cleanly: repeat for `wr.shaping` (same pattern)
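The branch rules in step 3 reduce to a small decision function; a sketch, assuming the gate result can be modeled as a plain dict of dimension id to level (the gate's real output format may differ):

```python
def triage_gate_findings(levels: dict) -> str:
    """Map quality-gate dimension levels to the Stream B branch.

    `levels` maps dimension id -> "low" or "high", e.g.
    {"state_economy": "low", "authoring_integrity": "high"}.
    """
    low = {dim for dim, level in levels.items() if level == "low"}
    # Structural failures block the stamp outright (branch 3 above).
    if low & {"authoring_integrity", "outcome_effectiveness"}:
        return "stop: file quality-improvements issue, do not stamp"
    # Any remaining low dimension is fixable in-session (branches 1-2).
    if low:
        return "fix in-session, stamp, PR"
    return "stamp, PR"
```

If it returns the stamp path for `wr.coding-task`, the same triage repeats for `wr.shaping`.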
**Minimum viable delivery:** PR 1 alone (docs hygiene). Already worth doing independently of everything else.

**Standard delivery:** PR 1 + Stream B (wr.coding-task stamped or scoped redesign issue created).

**Full delivery:** PR 1 + PR 2 + Stream B (all 3 unstamped production workflows stamped, wr.production-readiness-audit gated).
---

### Strongest alternative

**Candidate 3 (tooling investment over quality gate sessions):** Add a `validate:registry` advisory rule for "audit step without gate," add the `wr.production-readiness-audit` gate, and skip quality gate sessions entirely.

Switch to this if Stream B's gate run finds structural failures in `wr.coding-task` AND the resulting redesign issue is deprioritized. At that point the production stamp is deferred anyway, and tooling investment has a better expected return than waiting for the redesign.
---

### Residual risks

1. **Quality gate findings expand scope significantly.** The gate may find `authoring_integrity:low` or `outcome_effectiveness:low` for `wr.coding-task`, pushing the work into redesign territory. Managed by the explicit branch condition in Stream B. Risk level: medium (unknown until run).

2. **Quality gate validity for coding-task-style workflows.** `wr.workflow-for-workflows.json` has not been run on a production pipeline workflow before. Its assessment dimensions may produce noisy or off-target findings for a coding workflow. Risk level: low (the dimensions are general, and the gate was "exercised extensively" per commit dc4624dc).

3. **Production workflow stamps remain deferred if Stream B is deprioritized.** PR 1 and PR 2 ship regardless, but if Stream B doesn't happen, `wr.coding-task` and `wr.shaping` stay unstamped. Risk level: low for functionality (stamps are dev-only signals), medium for internal quality discipline.
---

### What changed from the stated goal

| Stated goal | Actual recommendation |
|---|---|
| "Modernize `exploration-workflow.json`" | That file no longer exists; `wr.discovery` is already modern (v3.2.0, stamped, routines, assessment-contract loops) |
| Modernize specific files by adding schema fields | Run the quality gate (genuine review) before stamping; field additions alone are cosmetic |
| Focus on legacy catalog workflows | Focus on the 3 production pipeline workflows actually used in 85+ daemon sessions |
| Planning docs as priority guide | Planning docs are stale; usage data from the session store is the correct priority guide |
package/docs/tickets/next-up.md
CHANGED

@@ -24,24 +24,16 @@ Confirm whether `selected_next_step` trace refs already include skipped step IDs
 
 ---
 
-## Ticket 2: Legacy workflow modernization -- exploration-workflow.json
+## ~~Ticket 2: Legacy workflow modernization -- wr.adaptive-ticket-creation~~ (done)
 
+Modernized `workflows/adaptive-ticket-creation.json` to current v2 authoring patterns:
 
+- Added `wr.features.capabilities` declaration (workflow uses optional file system access)
+- Added `pathComplexity` to `outputRequired.context` in `phase-0-triage` (structured output contract)
+- Added `ticket-coverage-gate` assessment on `phase-5-batch-tickets` (bounded judgment at highest-stakes output step)
+- Stamped with `validatedAgainstSpecVersion: 3`
 
-- Current v2/lean structure where appropriate
-- `metaGuidance` and `recommendedPreferences`
-- `references` for authoritative companion material
-- `templateCall` / routine injection instead of repeating large prompt blocks
-- Tighter loop-control wording and evidence-oriented review structure
-
-### Related
-
-- `docs/roadmap/open-work-inventory.md` (full prioritized modernization list)
-- `docs/authoring.md` (modern baseline)
+`exploration-workflow.json` no longer exists in the bundled set. Next modernization candidate: see `docs/roadmap/open-work-inventory.md` for the current prioritized list.
 
 ---
 
package/workflows/adaptive-ticket-creation.json
CHANGED

@@ -11,14 +11,17 @@
 "Write tickets for all backend work needed to support the v2 search API",
 "Create a single bug ticket for the checkout crash when applying a promo code on iOS 17"
 ],
+"features": [
+"wr.features.capabilities"
+],
 "preconditions": [
 "User has provided a description of the feature, task, or work to be ticketed.",
 "Agent has file system access for loading team preferences and persisting rules."
 ],
 "metaGuidance": [
-"ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria
+"ROLE: expert Product Manager and Mobile Tech Lead. Triage autonomously, write developer-ready tickets with full context, and produce objectively testable acceptance criteria — not user-story paraphrases.",
 "EXPLORE FIRST: use tools to gather context before asking the user anything. Ask only for information you genuinely cannot determine with tools or from the request itself.",
-"TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path
+"TEAM RULES: load and follow ./.workflow_rules/ticket_creation.md when it exists. Preferences there override your defaults. Rules are captured only on the Epic path — complex sessions are where durable conventions emerge and where the investment pays off.",
 "AUTONOMOUS TRIAGE: decide pathComplexity (Simple / Standard / Epic) yourself from the request. Surface your reasoning, then wait for confirmation.",
 "QUALITY FLOOR: every ticket must have a context-rich description, checkbox-style acceptance criteria that are objectively testable, and an effort estimate."
 ],
@@ -29,7 +32,7 @@
 "promptBlocks": {
 "goal": "Analyze the request, gather available context, and select the right complexity path before doing any ticket work.",
 "constraints": [
-"Decide the path yourself
+"Decide the path yourself — do not ask the user to choose.",
 "Load ./.workflow_rules/ticket_creation.md if it exists and let it influence your triage. If the file does not exist, note this explicitly in your output so the user knows team conventions were not applied.",
 "Set pathComplexity to exactly one of: Simple, Standard, or Epic."
 ],
@@ -37,11 +40,12 @@
 "Read any attached documents, linked PRDs, or referenced specs.",
 "Identify complexity signals: scope breadth, number of distinct deliverables, cross-team dependencies, technical unknowns, and estimated ticket count.",
 "Apply the triage rubric: Simple = single ticket, clear requirements, no blocking unknowns, minimal dependencies. Standard = multiple related tickets, moderate scope, some analysis needed. Epic = complex feature requiring decomposition, multiple teams or significant unknowns, likely 6+ tickets.",
-"Upgrade triggers
+"Upgrade triggers — escalate to Standard if: request implies more than one clearly separate work item. Escalate to Epic if: multiple teams are involved, architecture decisions are unresolved, or you estimate more than five tickets.",
 "State your selected path and the top three reasons. Capture pathComplexity in context."
 ],
 "outputRequired": {
-"notesMarkdown": "Selected path (Simple/Standard/Epic), top three triage reasons, any complexity upgrade triggers observed."
+"notesMarkdown": "Selected path (Simple/Standard/Epic), top three triage reasons, any complexity upgrade triggers observed.",
+"context": "Capture pathComplexity (Simple, Standard, or Epic)."
 },
 "verify": [
 "pathComplexity is set to Simple, Standard, or Epic.",
@@ -61,7 +65,7 @@
 "promptBlocks": {
 "goal": "Generate one complete, developer-ready Jira ticket for this request.",
 "constraints": [
-"Acceptance criteria must be phrased as observable, testable conditions
+"Acceptance criteria must be phrased as observable, testable conditions — not user-story restatements.",
 "Follow any team conventions from ./.workflow_rules/ticket_creation.md.",
 "Include all fields a developer needs to start work without asking follow-up questions."
 ],
@@ -111,7 +115,7 @@
 "Load ./.workflow_rules/ticket_creation.md and note any relevant team conventions.",
 "Identify: key stakeholders, team dependencies, technical constraints, known risks, and any conflicting requirements.",
 "Classify each gap as: Critical (blocks planning), Important (affects scope), or Nice-to-have (can proceed without it).",
-"For Critical and Important gaps that tools cannot resolve, ask the user
+"For Critical and Important gaps that tools cannot resolve, ask the user — in a single consolidated question block, not one at a time.",
 "After receiving answers, check whether any response reveals scope that would change `pathComplexity` (e.g. the user confirms three teams are involved, or the feature is narrower than initially assessed). If so, state the new classification and reasoning, and ask the user to confirm before continuing to Phase 2."
 ],
 "outputRequired": {
@@ -143,16 +147,16 @@
 "promptBlocks": {
 "goal": "Produce a structured plan that will drive ticket generation. This plan is the source of truth for scope.",
 "constraints": [
-"Be explicit about scope boundaries
+"Be explicit about scope boundaries — ambiguous scope will produce ambiguous tickets.",
 "Success criteria must be measurable, not just descriptive.",
 "For Standard path: this plan feeds directly into batch ticket generation."
 ],
 "procedure": [
 "Write: Project Summary (2-3 sentences, what is being built and why).",
 "Write: Key Deliverables (bulleted list of distinct components or features).",
-"Write: In-Scope (explicit list
-"Write: Out-of-Scope (explicit exclusions
-"Write: Success Criteria (measurable definition of done
+"Write: In-Scope (explicit list — prevents scope creep).",
+"Write: Out-of-Scope (explicit exclusions — prevents misunderstandings).",
+"Write: Success Criteria (measurable definition of done — each item verifiable).",
 "Write: High-Level Timeline (phases or milestones with rough sizing).",
 "Review: does every deliverable map clearly to implementable work? Is anything in scope that should be out?"
 ],
@@ -178,7 +182,7 @@
 "goal": "Break the approved plan into a logical work hierarchy that development teams can execute.",
 "constraints": [
 "Every item in the plan's In-Scope list must map to at least one work item in the hierarchy.",
-"Dependencies must be explicit
+"Dependencies must be explicit — not implied by ordering alone.",
 "Oversized stories (more than one sprint of work) should be split."
 ],
 "procedure": [
@@ -210,7 +214,7 @@
 "promptBlocks": {
 "goal": "Add effort estimates, risk assessments, and team assignments to each story in the hierarchy.",
 "constraints": [
-"Conservative estimates are better than optimistic ones
+"Conservative estimates are better than optimistic ones — note uncertainty explicitly.",
 "Justify each estimate with one sentence of reasoning.",
 "Flag stories on the critical path."
 ],
@@ -220,7 +224,7 @@
 "Assign priority: must-have for MVP, should-have, nice-to-have.",
 "Note suggested team or skill area for each story.",
 "Identify critical path: which stories block the most downstream work? Surface these explicitly.",
-"Flag any stories whose estimates feel uncertain
+"Flag any stories whose estimates feel uncertain — surface the unknowns rather than hiding them in a range."
 ],
 "outputRequired": {
 "notesMarkdown": "Total story point estimate, critical path items, high-risk stories."
@@ -273,7 +277,21 @@
 "Epic tickets are present and child tickets reference the parent (Epic path)."
 ]
 },
-"requireConfirmation": true
+"requireConfirmation": true,
+"assessmentRefs": [
+"ticket-coverage-gate"
+],
+"assessmentConsequences": [
+{
+"when": {
+"anyEqualsLevel": "low"
+},
+"effect": {
+"kind": "require_followup",
+"guidance": "ticket_coverage low -- one or more In-Scope items are missing a ticket, or acceptance criteria are not objectively testable. Fix the gaps and retry before presenting to the user."
+}
+}
+]
 },
 {
 "id": "phase-6-capture-rules",
@@ -285,7 +303,7 @@
 "promptBlocks": {
 "goal": "Extract actionable team preferences from this session and persist them so future runs use them automatically.",
 "constraints": [
-"Only write rules that are genuinely reusable across future tickets
+"Only write rules that are genuinely reusable across future tickets — skip one-off project specifics.",
 "Keep rules concise and actionable, not narrative.",
 "Append to ./.workflow_rules/ticket_creation.md rather than replacing it."
 ],

@@ -293,7 +311,7 @@
 "Review what conventions, preferences, or requirements emerged during this session.",
 "Identify patterns worth preserving: naming conventions, field usage, AC format preferences, estimation approach, labeling rules.",
 "Draft new rules as short, imperative statements (e.g., 'Use T-shirt sizing not Fibonacci', 'Always include a Figma link in design tickets').",
-"Check against existing rules
+"Check against existing rules — avoid duplicates or contradictions.",
 "Append new rules to ./.workflow_rules/ticket_creation.md, creating the file if it does not exist."
 ],
 "outputRequired": {
@@ -307,5 +325,22 @@
 },
 "requireConfirmation": false
 }
-]
+],
+"assessments": [
+{
+"id": "ticket-coverage-gate",
+"purpose": "Every In-Scope plan item has at least one ticket with objectively testable acceptance criteria before the batch is presented to the user.",
+"dimensions": [
+{
+"id": "coverage_completeness",
+"purpose": "All In-Scope items are represented by tickets with checkbox-style AC. No plan item is missing a ticket and no AC is a user-story restatement.",
+"levels": [
+"low",
+"high"
+]
+}
+]
+}
+],
+"validatedAgainstSpecVersion": 3
 }