prizmkit 1.1.10 → 1.1.12

Files changed (47)
  1. package/bundled/VERSION.json +3 -3
  2. package/bundled/dev-pipeline/README.md +10 -46
  3. package/bundled/dev-pipeline/reset-bug.sh +84 -10
  4. package/bundled/dev-pipeline/reset-feature.sh +86 -10
  5. package/bundled/dev-pipeline/reset-refactor.sh +68 -4
  6. package/bundled/dev-pipeline/scripts/generate-bootstrap-prompt.py +47 -46
  7. package/bundled/dev-pipeline/scripts/generate-bugfix-prompt.py +7 -12
  8. package/bundled/dev-pipeline/scripts/generate-refactor-prompt.py +124 -20
  9. package/bundled/dev-pipeline/scripts/utils.py +20 -0
  10. package/bundled/dev-pipeline/templates/agent-prompts/dev-implement.md +13 -7
  11. package/bundled/dev-pipeline/templates/bootstrap-tier1.md +62 -66
  12. package/bundled/dev-pipeline/templates/bootstrap-tier2.md +37 -40
  13. package/bundled/dev-pipeline/templates/bootstrap-tier3.md +35 -48
  14. package/bundled/dev-pipeline/templates/bugfix-bootstrap-prompt.md +135 -182
  15. package/bundled/dev-pipeline/templates/feature-list-schema.json +6 -21
  16. package/bundled/dev-pipeline/templates/refactor-bootstrap-prompt.md +9 -9
  17. package/bundled/dev-pipeline/templates/sections/context-budget-rules.md +1 -1
  18. package/bundled/dev-pipeline/templates/sections/feature-context.md +4 -0
  19. package/bundled/dev-pipeline/templates/sections/phase-browser-verification.md +41 -24
  20. package/bundled/dev-pipeline/templates/sections/phase-commit-full.md +4 -12
  21. package/bundled/dev-pipeline/templates/sections/phase-deploy-verification.md +9 -17
  22. package/bundled/dev-pipeline/templates/sections/phase-implement-lite.md +1 -1
  23. package/bundled/dev-pipeline/templates/sections/phase-plan-agent.md +3 -2
  24. package/bundled/dev-pipeline/templates/sections/phase-plan-lite.md +4 -2
  25. package/bundled/dev-pipeline/templates/sections/phase-specify-plan-full.md +0 -18
  26. package/bundled/dev-pipeline/templates/sections/session-context.md +1 -2
  27. package/bundled/dev-pipeline/templates/sections/test-failure-recovery-agent.md +75 -0
  28. package/bundled/dev-pipeline/templates/sections/test-failure-recovery-lite.md +66 -0
  29. package/bundled/skills/_metadata.json +1 -1
  30. package/bundled/skills/bugfix-pipeline-launcher/SKILL.md +3 -8
  31. package/bundled/skills/feature-pipeline-launcher/SKILL.md +4 -16
  32. package/bundled/skills/feature-planner/SKILL.md +8 -4
  33. package/bundled/skills/feature-planner/assets/planning-guide.md +16 -11
  34. package/bundled/skills/feature-planner/references/browser-interaction.md +9 -8
  35. package/bundled/skills/feature-planner/references/completeness-review.md +1 -1
  36. package/bundled/skills/feature-planner/references/error-recovery.md +1 -1
  37. package/bundled/skills/feature-planner/references/incremental-feature-planning.md +1 -1
  38. package/bundled/skills/feature-planner/scripts/validate-and-generate.py +10 -7
  39. package/bundled/skills/recovery-workflow/SKILL.md +3 -3
  40. package/bundled/skills/refactor-pipeline-launcher/SKILL.md +4 -15
  41. package/package.json +1 -1
  42. package/bundled/dev-pipeline/retry-bugfix.sh +0 -429
  43. package/bundled/dev-pipeline/retry-feature.sh +0 -445
  44. package/bundled/dev-pipeline/retry-refactor.sh +0 -441
  45. package/bundled/dev-pipeline/templates/sections/failure-log-check.md +0 -9
  46. package/bundled/dev-pipeline/templates/sections/resume-header.md +0 -5
  47. package/bundled/dev-pipeline/templates/sections/test-failure-recovery.md +0 -75
@@ -1,12 +1,5 @@
1
1
  ### Architecture Sync & Commit (SINGLE COMMIT) — DO NOT SKIP
2
2
 
3
- **Bug Fix Documentation Policy**:
4
- - DEFAULT: Run `/prizmkit-retrospective` with structural sync only (update file counts, interfaces, dependencies). Skip knowledge injection.
5
- - UPDATE DOCS (run full retrospective — Job 1 + Job 2) when bug fix causes: interface signature changes, dependency additions/removals, observable behavior changes to existing features, or newly discovered TRAPs.
6
- - Simple bugs: No new spec.md/plan.md needed. Use fast path.
7
- - Complex bugs (multi-module, cascading): Use `/prizmkit-plan` with `artifact_dir=.prizmkit/bugfix/<BUG_ID>/`.
8
- - Commit prefix: `fix(<scope>):` (not `feat:`).
9
-
10
3
  **a.** Check if feature already committed:
11
4
  ```bash
12
5
  git log --oneline | grep "{{FEATURE_ID}}" | head -3
@@ -15,12 +8,11 @@ git log --oneline | grep "{{FEATURE_ID}}" | head -3
15
8
  - If no existing commit → proceed normally with b–d.
16
9
 
17
10
  **b.** Run `/prizmkit-retrospective` (**before commit**, maintains `.prizm-docs/` architecture index):
18
- - **Structural sync**: update KEY_FILES/INTERFACES/DEPENDENCIES/file counts for changed modules
19
- - **Architecture knowledge** (feature sessions only): extract TRAPS, RULES, DECISIONS from completed work into `.prizm-docs/`
20
- - **L2 coverage check**: For any module/sub-module with source files created or significantly modified in this session but no L2 `.prizm` doc — evaluate whether L2 is warranted and create if so. The current session has the best context for accurate KEY_FILES, TRAPS, and DECISIONS.
21
- - Stage doc changes: `git add .prizm-docs/`
11
+ 1. **Structural sync**: update KEY_FILES/INTERFACES/DEPENDENCIES/file counts for changed modules
12
+ 2. **Architecture knowledge**: extract TRAPS, RULES, DECISIONS from completed work into `.prizm-docs/`
13
+ 3. **L2 coverage check**: For any module/sub-module with source files created or significantly modified in this session but no L2 `.prizm` doc — evaluate whether L2 is warranted and create if so. The current session has the best context for accurate KEY_FILES, TRAPS, and DECISIONS.
14
+ 4. Stage doc changes: `git add .prizm-docs/`
22
15
  ⚠️ Do NOT commit here. Only stage.
23
- - **For bug-fix sessions**: structural sync (Job 1) by default. Run knowledge injection (Job 2) when the fix causes interface signature changes, dependency additions/removals, observable behavior changes, or reveals new TRAPs
24
16
 
25
17
  **c.** Stage all feature code explicitly (NEVER use `git add -A` or `git add .`):
26
18
  ```bash
@@ -7,23 +7,7 @@ You just implemented this feature — you know the project's tech stack and buil
7
7
  3. **Assess and record** — append to context-snapshot.md:
8
8
  - **ALL builds pass** → `## Deploy Verification: PASS` — proceed to commit
9
9
  - **Some builds fail with fixable errors** → fix and re-verify (already handled in step 2)
10
- - **Cannot build locally** (missing system-level deps you cannot install) → Generate `.prizmkit/deploy.md` with:
11
- ```
12
- # Local Development Setup
13
-
14
- ## Prerequisites
15
- - [tool]: [install instruction]
16
-
17
- ## Build Steps
18
- 1. [exact command]
19
-
20
- ## Run / Dev Mode
21
- [exact command to start the app locally]
22
-
23
- ## Verify
24
- [how to confirm the app is running correctly]
25
- ```
26
- Record: `## Deploy Verification: PARTIAL — see .prizmkit/deploy.md for missing prerequisites`
10
+ - **Cannot build locally** (missing system-level deps you cannot install) → Record: `## Deploy Verification: PARTIAL — missing system deps (see below)`
27
11
 
28
12
  Deploy verification does NOT block the commit, but you MUST attempt it.
29
13
 
@@ -35,5 +19,13 @@ Deploy verification does NOT block the commit, but you MUST attempt it.
35
19
 
36
20
  If the project cannot be started locally (e.g., requires external services, databases, credentials), skip the smoke test and note why.
37
21
 
22
+ **Deploy documentation update** — Run `/prizmkit-deploy` ONLY if this feature introduced new infrastructure or deployment-affecting changes:
23
+ - New database, cache, message queue, or external service dependency
24
+ - New environment variables required
25
+ - New build steps or deployment configuration (Dockerfile, CI/CD, cloud config)
26
+ - Changed ports, protocols, or service topology
27
+
28
+ If none of the above apply (pure application logic change), skip `/prizmkit-deploy`.
29
+
38
30
 
39
31
  **Checkpoint update**: Update `workflow-checkpoint.json` — set step `deploy-verification` to `"completed"`.
@@ -27,7 +27,7 @@ $TEST_CMD 2>&1 | tee /tmp/test-baseline.txt | tail -20
27
27
  1. All tasks in plan.md are `[x]`
28
28
  2. Run the full test suite to ensure nothing is broken
29
29
  3. Verify each acceptance criterion from Section 1 of context-snapshot.md is met — check mentally, do NOT re-read files you already wrote
30
- 4. If any criterion is not met, fix it now (max 2 fix rounds)
30
+ 4. If any criterion is not met, fix it now using the convergence-based recovery loop (see Test Failure Recovery Protocol)
31
31
 
32
32
  **CP-2**: All acceptance criteria met, all tests pass.
33
33
 
@@ -4,8 +4,9 @@
4
4
  ls .prizmkit/specs/{{FEATURE_SLUG}}/plan.md 2>/dev/null
5
5
  ```
6
6
 
7
- If missing, write it yourself:
8
- - `plan.md`: architecture — components, interfaces, data flow, files to create/modify, testing approach, and a Tasks section with `[ ]` checkboxes ordered by dependency
7
+ If missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/` to generate `plan.md`:
8
+ - The plan.md should include: architecture — components, interfaces, data flow, files to create/modify, testing approach, and a Tasks section with `[ ]` checkboxes ordered by dependency.
9
+ - Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
9
10
 
10
11
  **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
11
12
  Before proceeding past CP-1, verify:
@@ -4,8 +4,10 @@
4
4
  ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
5
5
  ```
6
6
 
7
- If plan.md missing, write it directly:
8
- - `plan.md`: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
7
+ If plan.md missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/`:
8
+ - Pass the feature description and acceptance criteria from the Feature Context section above as input
9
+ - The plan.md should include: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
10
+ - Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
9
11
 
10
12
  **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
11
13
  Before proceeding past CP-1:
@@ -1,14 +1,5 @@
1
1
  ### Specify + Plan (Full Workflow)
2
2
 
3
- **Check for previous failure log:**
4
- ```bash
5
- cat .prizmkit/specs/{{FEATURE_SLUG}}/failure-log.md 2>/dev/null || echo "NO_PREVIOUS_FAILURE"
6
- ```
7
- If failure-log.md exists:
8
- - Read ROOT_CAUSE and SUGGESTION — adjust your approach accordingly
9
- - Read DISCOVERED_TRAPS — if any are genuine, inject into .prizm-docs/ during Phase 6 retrospective
10
- - Do NOT delete failure-log.md until this session completes all phases and commits successfully
11
-
12
3
  Check existing artifacts first:
13
4
  ```bash
14
5
  ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
@@ -18,15 +9,6 @@ ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
18
9
  - `context-snapshot.md` exists → use it directly, skip context snapshot building
19
10
  - Some missing → generate only missing files
20
11
 
21
- Before planning, check whether feature code already exists in the project (search in source directories identified from `root.prizm` or the project tree scan):
22
- ```bash
23
- grep -r "{{FEATURE_SLUG}}" . --include="*.js" --include="*.ts" --include="*.py" --include="*.go" --include="*.java" --include="*.rb" --include="*.rs" -l --exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist --exclude-dir=build --exclude-dir=vendor --exclude-dir=.prizmkit 2>/dev/null | head -20
24
- ```
25
-
26
- Record result as `EXISTING_CODE` (list of files, or empty).
27
-
28
- If `EXISTING_CODE` is non-empty: your spec/plan/tasks must reflect this existing implementation — document what exists, identify gaps, do NOT re-implement what is already done.
29
-
30
12
  **Step A — Build Context Snapshot** (skip if `context-snapshot.md` already exists):
31
13
 
32
14
  1. Read `.prizm-docs/root.prizm` and relevant L1/L2 prizm docs
@@ -1,6 +1,5 @@
1
1
  ## Session Context
2
2
 
3
3
  - **Feature ID**: {{FEATURE_ID}} | **Session**: {{SESSION_ID}} | **Run**: {{RUN_ID}}
4
- - **Complexity**: {{COMPLEXITY}} | **Retry**: {{RETRY_COUNT}} / {{MAX_RETRIES}}
5
- - **Previous Status**: {{PREV_SESSION_STATUS}} | **Resume From**: {{RESUME_PHASE}}
4
+ - **Complexity**: {{COMPLEXITY}}
6
5
  - **Init**: {{INIT_DONE}} | Artifacts: spec={{HAS_SPEC}} plan={{HAS_PLAN}}
@@ -0,0 +1,75 @@
1
+ ## Test Failure Recovery Protocol
2
+
3
+ When tests fail during implementation (Phase 3 / Phase 4), use **convergence-based recovery** — keep fixing as long as progress is being made.
4
+
5
+ ### Recovery Loop
6
+
7
+ 1. **Run tests and record results**:
8
+ - Count total failures and note which tests failed
9
+ - Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
10
+
11
+ 2. **Check termination conditions** (evaluate BEFORE each fix attempt):
12
+ - **All tests pass** → Done. Exit recovery loop.
13
+ - **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
14
+ - **Still making progress** — failure count decreased compared to previous round → Continue fixing.
15
+ - **First round** — no history yet → Proceed to fix.
16
+
17
+ 3. **Fix and iterate**:
18
+ - Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
19
+ - Categorize:
20
+ - **Pre-existing baseline failure**: Expected, do NOT fix
21
+ - **New regression**: Fix the code
22
+ - **Brittle test**: Fix the test or environment setup
23
+ - Apply fix, re-run `$TEST_CMD`, go back to step 1
24
+
25
+ ### Convergence Tracking
26
+
27
+ Maintain a mental (or logged) record each round:
28
+
29
+ ```
30
+ Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
31
+ Round 2: 3 failures [test_b, test_d, test_e] ← progress, continue
32
+ Round 3: 3 failures [test_b, test_d, test_e] ← same as round 2 (plateau 1/3)
33
+ Round 4: 3 failures [test_b, test_d, test_e] ← plateau 2/3
34
+ Round 5: 3 failures [test_b, test_d, test_e] ← plateau 3/3 → STOP
35
+ ```
36
+
37
+ **Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
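
The plateau rule above can be sketched as a small tracker. This is a minimal illustration, not the pipeline's implementation: `track_convergence` and its return shape are hypothetical names, and the handling of cases the protocol does not spell out (failures increasing, or a different set of the same size) is an assumption. Here any round whose failing set differs from the previous round resets the counter.

```python
from typing import Iterable, List, Set, Tuple

PLATEAU_LIMIT = 3  # "3 consecutive rounds" in the protocol text


def track_convergence(
    rounds: Iterable[Set[str]],
    baseline: Iterable[str] = (),
) -> List[Tuple[int, int, str]]:
    """Return (round_number, failure_count, decision) for each round.

    `rounds` yields the set of failing test names after each test run.
    `baseline` holds pre-existing failures, which are excluded.
    """
    history: List[Tuple[int, int, str]] = []
    previous = None
    plateau = 0
    baseline_set = set(baseline)

    for number, failing in enumerate(rounds, start=1):
        current = set(failing) - baseline_set
        if not current:
            history.append((number, 0, "done"))
            break
        # Plateau: same failing tests (hence same count) as the previous round.
        if previous is not None and current == previous:
            plateau += 1
        else:
            plateau = 0  # assumption: any change resets the counter
        if plateau >= PLATEAU_LIMIT:
            history.append((number, len(current), "stop"))
            break
        history.append((number, len(current), "continue"))
        previous = current
    return history
```

Fed the five rounds from the example log, the sketch continues through round 4 and stops at round 5. Because the counter resets on any change, failures that oscillate between two sets would never trigger a stop; a real loop would probably need an extra guard for that case.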
38
+
39
+ ### Escalation — Dev + Reviewer Workflow
40
+
41
+ When the recovery loop exits with remaining failures:
42
+ - Dev appends failure details to Implementation Log
43
+ - Reviewer agent runs full test suite in Phase 5
44
+ - If Reviewer confirms NEW regressions (not in baseline): mark verdict as `NEEDS_FIXES`
45
+ - If Reviewer confirms only baseline failures remain: proceed with `PASS_WITH_WARNINGS`
46
+
47
+ ### Context-Aware Test Re-run (Performance Optimization)
48
+
49
+ **Skip redundant re-runs**:
50
+ - If Implementation Log section in context-snapshot.md already confirms "all tests passing"
51
+ - → Skip Phase 5 test suite re-run (Reviewer will verify baseline log instead)
52
+ - This avoids rebuilding/re-running tests when already verified
53
+
54
+ **When to re-run**:
55
+ - If Implementation Log is missing or incomplete
56
+ - If any new code was added after the last test run
57
+ - If Reviewer suspects brittleness or environment drift
58
+
59
+ ### Failure Capture Rules
60
+
61
+ If tests remain broken after recovery:
62
+
63
+ ```
64
+ ## Test Failures Encountered
65
+
66
+ - **Test**: [test name/path]
67
+ - Root Cause: [explanation]
68
+ - Category: [pre-existing baseline | new regression | brittle test | environment]
69
+ - Rounds Attempted: [N rounds, plateau at round M]
70
+ - Status: [still failing | requires next session | known limitation]
71
+
72
+ - **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
73
+ ```
74
+
75
+ **Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
@@ -0,0 +1,66 @@
1
+ ## Test Failure Recovery Protocol
2
+
3
+ When tests fail during implementation, use **convergence-based recovery** — keep fixing as long as progress is being made.
4
+
5
+ ### Recovery Loop
6
+
7
+ 1. **Run tests and record results**:
8
+ - Count total failures and note which tests failed
9
+ - Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
10
+
11
+ 2. **Check termination conditions** (evaluate BEFORE each fix attempt):
12
+ - **All tests pass** → Done. Exit recovery loop.
13
+ - **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
14
+ - **Still making progress** — failure count decreased compared to previous round → Continue fixing.
15
+ - **First round** — no history yet → Proceed to fix.
16
+
17
+ 3. **Fix and iterate**:
18
+ - Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
19
+ - Categorize:
20
+ - **Pre-existing baseline failure**: Expected, do NOT fix
21
+ - **New regression**: Fix the code
22
+ - **Brittle test**: Fix the test or environment setup
23
+ - Apply fix, re-run `$TEST_CMD`, go back to step 1
24
+
25
+ ### Convergence Tracking
26
+
27
+ Maintain a mental (or logged) record each round:
28
+
29
+ ```
30
+ Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
31
+ Round 2: 3 failures [test_b, test_d, test_e] ← progress, continue
32
+ Round 3: 3 failures [test_b, test_d, test_e] ← same as round 2 (plateau 1/3)
33
+ Round 4: 3 failures [test_b, test_d, test_e] ← plateau 2/3
34
+ Round 5: 3 failures [test_b, test_d, test_e] ← plateau 3/3 → STOP
35
+ ```
36
+
37
+ **Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
38
+
39
+ ### Escalation — Single Agent
40
+
41
+ When the recovery loop exits with remaining failures:
42
+ - Document all remaining failures in Implementation Log with root cause analysis
43
+ - Record PARTIAL status with known failure list
44
+ - **Do NOT block commit** — unresolved test failures are deferred to next session
45
+
46
+ ### Context-Aware Optimization
47
+
48
+ **Skip redundant re-runs**: If Implementation Log already confirms "all tests passing", skip full suite re-run.
49
+
50
+ ### Failure Capture Rules
51
+
52
+ If tests remain broken after recovery:
53
+
54
+ ```
55
+ ## Test Failures Encountered
56
+
57
+ - **Test**: [test name/path]
58
+ - Root Cause: [explanation]
59
+ - Category: [pre-existing baseline | new regression | brittle test | environment]
60
+ - Rounds Attempted: [N rounds, plateau at round M]
61
+ - Status: [still failing | requires next session | known limitation]
62
+
63
+ - **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
64
+ ```
65
+
66
+ **Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
@@ -1,5 +1,5 @@
1
1
  {
2
- "version": "1.1.10",
2
+ "version": "1.1.12",
3
3
  "skills": {
4
4
  "prizm-kit": {
5
5
  "description": "Full-lifecycle dev toolkit. Covers spec-driven development, Prizm context docs, code quality, debugging, deployment, and knowledge management.",
@@ -204,7 +204,7 @@ Detect user intent from their message, then follow the corresponding workflow:
204
204
  **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
205
205
  - Summarize results: total bugs, fixed, failed, skipped
206
206
  - If all fixed: each bug session has already run `prizmkit-retrospective` internally (structural sync by default; full retrospective when the fix changed interfaces, dependencies, or observable behavior). Ask user what's next.
207
- - If some failed: show failed bug IDs and suggest `retry-bugfix.sh <B-XXX>` or `dev-pipeline/reset-bug.sh <B-XXX> --clean --run`
207
+ - If some failed: show failed bug IDs and suggest `dev-pipeline/reset-bug.sh <B-XXX> --clean --run` for a fresh retry
208
208
 
209
209
  **If background daemon**:
210
210
  1. Verify launch:
@@ -295,15 +295,10 @@ Detect user intent from their message, then follow the corresponding workflow:
295
295
  When user says "retry B-001":
296
296
 
297
297
  ```bash
298
- dev-pipeline/retry-bugfix.sh B-001 .prizmkit/plans/bug-fix-list.json
298
+ dev-pipeline/reset-bug.sh B-001 --clean --run .prizmkit/plans/bug-fix-list.json
299
299
  ```
300
300
 
301
- **Note:** `retry-bugfix.sh` runs exactly one bug session and exits. It **preserves prior session artifacts and checkpoint state**: it reads `retry_count` and `resume_from_phase` from `status.json` so the AI session can resume from where it left off. For a full clean retry, use `dev-pipeline/reset-bug.sh <B-XXX> --clean --run`.
302
-
303
- Environment variables (optional):
304
- ```bash
305
- SESSION_TIMEOUT=3600 dev-pipeline/retry-bugfix.sh B-001 .prizmkit/plans/bug-fix-list.json
306
- ```
301
+ **Note:** `reset-bug.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying; this gives a fresh start.
307
302
 
308
303
  ### Error Handling
309
304
 
@@ -231,7 +231,7 @@ Detect user intent from their message, then follow the corresponding workflow:
231
231
  **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
232
232
  - Summarize results: total features, succeeded, failed, skipped
233
233
  - If all succeeded: each feature session has already run `prizmkit-retrospective` internally. Ask user what's next.
234
- - If some failed: show failed feature IDs and suggest `retry-feature.sh <F-XXX>` or `reset-feature.sh <F-XXX> --clean --run`
234
+ - If some failed: show failed feature IDs and suggest `reset-feature.sh <F-XXX> --clean --run` for a fresh retry
235
235
  - **Browser verification**: If any completed features have `browser_interaction` and `playwright-cli` is installed, offer to run browser verification (see §Post-Pipeline Browser Verification)
236
236
 
237
237
  **If background daemon**:
@@ -320,26 +320,14 @@ Detect user intent from their message, then follow the corresponding workflow:
320
320
 
321
321
  #### Intent E: Retry Single Feature Node
322
322
 
323
- When user says "retry F-003":
324
-
325
- ```bash
326
- dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
327
- ```
328
-
329
- When user says "clean retry F-003" or "retry F-003 from scratch":
323
+ When user says "retry F-003" or "clean retry F-003":
330
324
 
331
325
  ```bash
332
326
  dev-pipeline/reset-feature.sh F-003 --clean --run .prizmkit/plans/feature-list.json
333
327
  ```
334
328
 
335
- Environment variables (optional):
336
- ```bash
337
- SESSION_TIMEOUT=3600 dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
338
- ```
339
-
340
329
  Notes:
341
- - `retry-feature.sh` runs exactly one feature session and exits. It **preserves prior session artifacts and checkpoint state**: it reads `retry_count` and `resume_from_phase` from `status.json` so the AI session can resume from where it left off rather than starting from scratch.
342
- - `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — use this for a fresh start when checkpoint recovery is not desired.
330
+ - `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying; this gives a fresh start.
343
331
  - Keep pipeline daemon mode for main run management (`launch-feature-daemon.sh`).
344
332
 
345
333
  ---
@@ -387,7 +375,7 @@ After pipeline completion, if features have `browser_interaction` fields and `pl
387
375
  | Pipeline already running | Show status, ask if user wants to stop and restart |
388
376
  | PID file stale (process dead) | `launch-feature-daemon.sh` auto-cleans, retry start |
389
377
  | Launch failed (process died immediately) | Show last 20 lines of log: `tail -20 .prizmkit/state/features/pipeline-daemon.log` |
390
- | Feature stuck/blocked | Use `retry-feature.sh <F-XXX>` to retry; use `reset-feature.sh <F-XXX> --clean --run` for fresh start |
378
+ | Feature stuck/blocked | Use `reset-feature.sh <F-XXX> --clean --run` for a fresh retry |
391
379
  | All features blocked/failed | Show status, suggest daemon-safe recovery: `dev-pipeline/reset-feature.sh <F-XXX> --clean --run .prizmkit/plans/feature-list.json` |
392
380
  | `playwright-cli` not installed | Browser verification skipped (non-blocking). Suggest: `npm install -g @playwright/cli@latest && playwright-cli install --skills` |
393
381
  | Permission denied on script | Run `chmod +x dev-pipeline/launch-feature-daemon.sh dev-pipeline/run-feature.sh` |
@@ -80,13 +80,13 @@ Do NOT use this skill when:
80
80
  If the script is not available, perform these manual validation checks:
81
81
  1. **ID sequence**: All feature IDs are sequential (F-001, F-002, F-003, ...)
82
82
  2. **No circular dependencies**: No feature depends (directly or transitively) on itself
83
- 3. **Description length**: Minimum 15 words per description (error), 30/50/80 recommended
83
+ 3. **Description length**: Minimum 15 words per description (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail is always better
84
84
  4. **Dependency references**: All referenced features in dependencies exist in features array
85
85
  5. **Priority enums**: All priority values are exactly "critical", "high", "medium", or "low" (case-sensitive)
86
86
  6. **Status enum**: All status values are one of: pending, in_progress, completed, failed, skipped, split, auto_skipped
87
87
  7. **Acceptance criteria**: At least 1 criterion per feature, each is a concrete, measurable statement
88
- 8. **Browser interaction**: If present, has url, verify_steps array, and optional setup_command
89
- 9. **Complexity enum**: If present, is one of: low, medium, high
88
+ 8. **Browser interaction**: If present, has verify_steps array (optional — AI auto-detects dev server, URL, port at runtime)
89
+ 9. **Complexity enum**: If present, is one of: low, medium, high, critical
90
90
  10. **Model field**: If present, is a non-empty string
91
91
  11. **Critic field**: If present, is boolean; if true, critic_count should be 1 or 3
92
92
  12. **Root schema**: Has $schema='dev-pipeline-feature-list-v1', project_name, and non-empty features array
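
As an illustration of check 2 (no circular dependencies), a depth-first search can report the first cycle it finds. This is a sketch only: the function name `find_cycle` and the `id` key are assumptions, `dependencies` is taken from the checklist wording, and nothing here reflects how `validate-and-generate.py` actually performs the check.

```python
from typing import Dict, List, Optional


def find_cycle(features: List[dict]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of IDs, or None if there is none."""
    graph: Dict[str, List[str]] = {
        f["id"]: list(f.get("dependencies", [])) for f in features
    }
    UNVISITED, IN_PROGRESS, FINISHED = 0, 1, 2
    state = {fid: UNVISITED for fid in graph}

    def visit(fid: str, path: List[str]) -> Optional[List[str]]:
        state[fid] = IN_PROGRESS
        path.append(fid)
        for dep in graph[fid]:
            if dep not in graph:
                continue  # unknown reference; covered by a separate check
            if state[dep] == IN_PROGRESS:
                # dep is already on the current path, so the path loops back
                return path[path.index(dep):] + [dep]
            if state[dep] == UNVISITED:
                cycle = visit(dep, path)
                if cycle:
                    return cycle
        path.pop()
        state[fid] = FINISHED
        return None

    for fid in graph:
        if state[fid] == UNVISITED:
            cycle = visit(fid, [])
            if cycle:
                return cycle
    return None
```

The returned list starts and ends with the same ID, which makes the loop easy to show in an error message. A feature that lists itself as a dependency is reported as a two-element cycle.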
@@ -258,7 +258,11 @@ Key requirements:
258
258
  - English feature titles for stable slug generation
259
259
  - `critic` / `critic_count` defaults per Testing Defaults section
260
260
  - `browser_interaction` auto-generated for qualifying frontend features
261
- - descriptions: minimum 15 words (error), recommended 30/50/80 for low/medium/high complexity (warning)
261
+ - descriptions: minimum 15 words (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail prevents AI guessing
262
+ - `estimated_complexity` determines pipeline execution tier:
263
+ - `low` / `medium` → **lite** (single agent, no subagents)
264
+ - `high` → **standard** (orchestrator + dev + reviewer, 3 agents)
265
+ - `critical` → **full** (full team + critic agents, 5 agents). Use for: architectural changes touching 10+ files, cross-module refactoring with API surface changes, features requiring multi-critic voting
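
The tier mapping above is small enough to express as a lookup table. The sketch below is illustrative only: `PIPELINE_TIERS` and `pipeline_tier` are hypothetical names, and treating the value as an exact lowercase match is an assumption.

```python
from typing import Dict, Tuple

# (tier name, agent count) per estimated_complexity, as listed above
PIPELINE_TIERS: Dict[str, Tuple[str, int]] = {
    "low": ("lite", 1),
    "medium": ("lite", 1),
    "high": ("standard", 3),
    "critical": ("full", 5),
}


def pipeline_tier(estimated_complexity: str) -> Tuple[str, int]:
    """Look up the execution tier for a feature's estimated_complexity."""
    try:
        return PIPELINE_TIERS[estimated_complexity]
    except KeyError:
        allowed = ", ".join(PIPELINE_TIERS)
        raise ValueError(
            f"unknown complexity {estimated_complexity!r}; expected one of: {allowed}"
        ) from None
```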
262
266
 
263
267
  Run the validation script after generation:
264
268
  ```bash
@@ -12,13 +12,16 @@ Feature descriptions are the **primary input** for autonomous pipeline sessions.
12
12
 
13
13
  ### Minimum Word Counts
14
14
 
15
- | Complexity | Minimum Words | Warning Threshold |
16
- |------------|---------------|-------------------|
17
- | low | 15 | 30 |
18
- | medium | 15 | 50 |
19
- | high | 15 | 80 |
15
+ | Complexity | Hard Minimum (error) | Recommended Minimum (warning below) |
16
+ |------------|---------------------|-------------------------------------|
17
+ | low | 15 | 30+ |
18
+ | medium | 15 | 50+ |
19
+ | high | 15 | 80+ |
20
+ | critical | 15 | 100+ |
20
21
 
21
- Below 15 words is a validation error. Below the threshold triggers a warning.
22
+ Below 15 words is a validation error. Below the recommended minimum triggers a warning.
23
+
24
+ **There is NO upper limit** — the more detail the better. Rich descriptions prevent the AI from guessing, producing higher quality code. Always aim to describe the feature as thoroughly as possible: what to build, how it should behave, what data it touches, and what edge cases to handle.
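
A minimal sketch of the two thresholds in the table above, under stated assumptions: words are counted by splitting on whitespace (the guide does not define the counting rule), and `check_description` is a hypothetical helper rather than the function used by the validation script.

```python
HARD_MINIMUM = 15  # below this: validation error
RECOMMENDED_MINIMUM = {"low": 30, "medium": 50, "high": 80, "critical": 100}


def check_description(description: str, complexity: str) -> str:
    """Classify a feature description as "error", "warning", or "ok"."""
    word_count = len(description.split())  # assumption: whitespace-separated words
    if word_count < HARD_MINIMUM:
        return "error"
    if word_count < RECOMMENDED_MINIMUM[complexity]:
        return "warning"
    return "ok"  # no upper limit is enforced
```

For example, a 20-word description passes the hard minimum at every complexity level but still draws a warning, since the lowest recommended minimum is 30.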
22
25
 
23
26
  ### What to Include
24
27
 
@@ -113,11 +116,12 @@ Then [expected outcome]
113
116
 
114
117
  ## Complexity Estimation Guide
115
118
 
116
- | Complexity | Characteristics | Typical Scope |
117
- |------------|----------------|---------------|
118
- | low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages |
119
- | medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages |
120
- | high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions |
119
+ | Complexity | Characteristics | Typical Scope | Pipeline Tier |
120
+ |------------|----------------|---------------|---------------|
121
+ | low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages | lite (1 agent) |
122
+ | medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages | lite (1 agent) |
123
+ | high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions | standard (3 agents) |
124
+ | critical | Architectural changes, 10+ files, multi-module API surface changes | System-wide refactoring, new infrastructure + app logic | full (5 agents + critic) |
121
125
 
122
126
  ### Complexity Red Flags
123
127
 
@@ -134,6 +138,7 @@ Consider splitting a feature if it exhibits any of the following:
134
138
 
135
139
  - If a feature is marked as "low" complexity, it should not have more than 5 acceptance criteria.
136
140
  - If a feature is marked as "high" complexity, it should have a clear justification (e.g., "involves payment processing with webhook handling and idempotency").
141
+ - Use "critical" complexity only for features requiring architectural changes that touch 10+ files, involve cross-module API surface changes, or need multi-critic voting for safety.
137
142
  - When in doubt, estimate higher -- it is better to over-allocate than to under-allocate.
138
143
 
139
144
  ---
@@ -11,23 +11,24 @@ For each qualifying feature, generate the `browser_interaction` object:
11
11
  ```json
12
12
  {
13
13
  "browser_interaction": {
14
- "url": "http://localhost:3000/login",
15
- "setup_command": "npm run dev",
16
14
  "verify_steps": [
17
15
  "Verify login form renders with email and password fields",
18
16
  "Verify valid credentials redirect to dashboard",
19
17
  "Verify invalid password shows error message"
20
- ],
21
- "screenshot": true
18
+ ]
22
19
  }
23
20
  }
24
21
  ```
25
22
 
26
23
  ## Field Rules
27
24
 
28
- - `url` is required: the page URL to verify
29
- - `setup_command` is optional: command to start the dev server (omit if already running)
30
- - `verify_steps` are **verification goals**, not specific playwright-cli commands. Describe WHAT to verify, not HOW to verify it. The pipeline AI will read the actual code, use `playwright-cli snapshot` to discover real element refs, and decide the concrete click/fill/assert operations itself. This works better than prescribing steps at planning time because: (1) element refs don't exist yet, (2) UI structure may change during implementation, (3) the AI has full context of the actual code when it runs verification.
25
+ - `verify_steps` are **verification goals**, not specific playwright-cli commands. Describe WHAT to verify, not HOW to verify it. The pipeline AI will:
26
+ 1. Auto-detect the dev server start command from project config (`package.json`, `Makefile`, etc.)
27
+ 2. Start the server and discover the URL/port at runtime
28
+ 3. Use `playwright-cli snapshot` to discover real element refs
29
+ 4. Decide the concrete click/fill/assert operations itself
30
+ This works better than prescribing URLs/commands at planning time because: (1) ports may differ across environments, (2) element refs don't exist yet, (3) UI structure may change during implementation, (4) the AI has full context of the actual code when it runs verification.
31
31
  - **Good**: `"Verify login form accepts valid credentials and redirects to dashboard"`
32
32
  - **Bad**: `"click <ref> — click login button"` (guesses at refs that don't exist yet)
33
- - `screenshot` defaults to `true` — capture final state for human review
33
+ - Do NOT specify `url`, `setup_command`, or `port` — the AI detects these at runtime from the actual project configuration
34
+ - An empty `browser_interaction: {}` object (no verify_steps) is valid — the AI will explore the app and verify the feature works as expected
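The new field rules in the hunk above can be read as a small validation contract. The following is a minimal sketch of such a check, not code from the package: the helper name `check_browser_interaction` and the wording of the messages are hypothetical, and only the three keys the rules name (`url`, `setup_command`, `port`) are treated as disallowed.

```python
# Hypothetical helper, not part of prizmkit: checks a browser_interaction
# object against the field rules stated in the hunk above.
FORBIDDEN_KEYS = {"url", "setup_command", "port"}


def check_browser_interaction(obj):
    """Return a list of problems; an empty list means the object is acceptable."""
    if not isinstance(obj, dict):
        return ["browser_interaction must be an object"]

    problems = []
    # The rules say these values are detected at runtime, so flag them.
    for key in sorted(FORBIDDEN_KEYS.intersection(obj)):
        problems.append(
            "'{}' should not be specified; it is detected at runtime".format(key)
        )

    # verify_steps is optional: an empty object is explicitly valid.
    steps = obj.get("verify_steps", [])
    if not isinstance(steps, list) or not all(
        isinstance(s, str) and s.strip() for s in steps
    ):
        problems.append("verify_steps must be a list of non-empty strings")
    return problems
```

Under these assumptions, `check_browser_interaction({})` returns an empty list, while an object that still carries the old `url` field yields one problem entry.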
@@ -7,7 +7,7 @@ Before generating `.prizmkit/plans/feature-list.json`, review the full feature s
  For each feature, evaluate against the word-count thresholds in `planning-guide.md`:
  - Does the description cover: what to build, key behaviors, integration points, data model (if applicable), error/edge cases?
  - Is the description specific enough for an AI coding session to implement without guessing?
- - Flag any feature below the recommended word count for its complexity level (30/50/80 words for low/medium/high).
+ - Flag any feature below the recommended minimum word count for its complexity level (30+/50+/80+/100+ words for low/medium/high/critical). There is no upper limit — more detail is always better.
 
  **Implementation clarity check** — Every feature description will be consumed by an autonomous AI session. Verify each description specifies:
  1. Concrete deliverables (files to create, endpoints to build, components to implement, models to define)
@@ -26,7 +26,7 @@ Group errors by type and apply targeted fixes:
  | **Dependency errors** | Circular dependency, undefined target features | "Show cycle chain (e.g., `F-003 → F-005 → F-003`), suggest break point" | No |
  | **Missing fields** | Feature missing required keys (title, description, AC) | "List each feature + missing keys, guide patch" | Partial |
  | **Insufficient AC** | Feature has <2 acceptance criteria | "Show feature, suggest AC examples" | No |
- | **Invalid values** | complexity not in [low/medium/high], status not pending | "Show field, valid values" | Yes |
+ | **Invalid values** | complexity not in [low/medium/high/critical], status not pending | "Show field, valid values" | Yes |
 
  ### Execution
 
@@ -73,7 +73,7 @@ For each new feature:
  - keep title in English
  - **write rich descriptions** (see `planning-guide.md` §4):
    - minimum 15 words (validation error below this)
-   - recommended: 30+ words (low), 50+ words (medium), 80+ words (high complexity)
+   - recommended minimum: 30+ (low), 50+ (medium), 80+ (high), 100+ (critical) — no upper limit, more detail is always better
    - include: what to build, key behaviors, integration points, data model, error/edge cases
 
  ### Step 4: Rebalance Priority
@@ -33,7 +33,7 @@ from datetime import datetime, timezone
  SCHEMA_VERSION = "dev-pipeline-feature-list-v1"
 
  VALID_STATUSES = {"pending", "in_progress", "completed", "failed", "skipped", "split", "auto_skipped"}
- VALID_COMPLEXITIES = {"low", "medium", "high"}
+ VALID_COMPLEXITIES = {"low", "medium", "high", "critical"}
  VALID_PRIORITIES = {"critical", "high", "medium", "low"}
  VALID_GRANULARITIES = {"feature", "sub_feature", "auto"}
  VALID_PLANNING_MODES = {"new", "incremental"}
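The documentation hunks above state a hard minimum of 15 words (a validation error below that) and recommended minimums of 30/50/80/100 words for low/medium/high/critical. A minimal sketch of how those two thresholds interact, assuming a standalone helper: the name `classify_description` and the three return labels are hypothetical, and how the validator actually reports a description below the recommended minimum is not shown in this diff.

```python
# Hypothetical helper illustrating the thresholds described above.
RECOMMENDED_MIN_WORDS = {"low": 30, "medium": 50, "high": 80, "critical": 100}
HARD_MIN_WORDS = 15  # below this the docs call it a validation error


def classify_description(desc, complexity="medium"):
    """Classify a description as 'error', 'below_recommended', or 'ok'."""
    word_count = len(desc.split())
    # Unknown complexity values fall back to the medium threshold (50),
    # mirroring the .get(complexity, 50) default seen in the validator.
    recommended = RECOMMENDED_MIN_WORDS.get(complexity, 50)
    if word_count < HARD_MIN_WORDS:
        return "error"
    if word_count < recommended:
        return "below_recommended"
    return "ok"
```

With this sketch, a 90-word description passes for `high` but is still below the recommended minimum for `critical`.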
@@ -206,7 +206,7 @@ def validate_feature_list(data, planning_mode="new"):
 
      seen_ids = set()
      priorities = []
-     complexity_dist = {"low": 0, "medium": 0, "high": 0}
+     complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
      total_sub_features = 0
 
      for idx, feat in enumerate(features):
@@ -244,7 +244,9 @@ def validate_feature_list(data, planning_mode="new"):
          if isinstance(desc, str) and desc.strip():
              word_count = len(desc.split())
              complexity = feat.get("estimated_complexity", "medium")
-             min_words = {"low": 30, "medium": 50, "high": 80}.get(complexity, 50)
+             min_words = {
+                 "low": 30, "medium": 50, "high": 80, "critical": 100,
+             }.get(complexity, 50)
              if word_count < 15:
                  errors.append(
                      "{}: description too short ({} words, minimum 15). "
@@ -580,7 +582,7 @@ def generate_summary_markdown(data):
      lines.append("")
 
      # Statistics
-     complexity_dist = {"low": 0, "medium": 0, "high": 0}
+     complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
      total_sub = 0
      for feat in features:
          c = feat.get("estimated_complexity")
@@ -596,8 +598,9 @@ def generate_summary_markdown(data):
      lines.append("- Total features: {}".format(len(features)))
      if total_sub > 0:
          lines.append("- Total sub-features: {}".format(total_sub))
-     lines.append("- Complexity: {} low, {} medium, {} high".format(
-         complexity_dist["low"], complexity_dist["medium"], complexity_dist["high"]
+     lines.append("- Complexity: {} low, {} medium, {} high, {} critical".format(
+         complexity_dist["low"], complexity_dist["medium"],
+         complexity_dist["high"], complexity_dist["critical"]
      ))
      lines.append("- Max dependency depth: {}".format(max_depth))
 
@@ -608,7 +611,7 @@ def generate_summary_json(data):
      """Generate a JSON summary of the feature list."""
      features = data.get("features", [])
 
-     complexity_dist = {"low": 0, "medium": 0, "high": 0}
+     complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
      total_sub = 0
      for feat in features:
          c = feat.get("estimated_complexity")
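The three `complexity_dist` hunks all pre-seed a `"critical"` key so that the later `complexity_dist["critical"]` lookup in the summary line cannot raise `KeyError`. The counting step itself falls outside the hunks shown here, so the sketch below assumes a simple membership guard; the function names `complexity_distribution` and `complexity_line` are hypothetical.

```python
# Illustrative sketch; the counting logic is an assumption, since the
# diff hunks end before the loop body that updates the distribution.
COMPLEXITY_LEVELS = ("low", "medium", "high", "critical")


def complexity_distribution(features):
    """Count features per complexity level, ignoring unknown or missing values."""
    dist = {level: 0 for level in COMPLEXITY_LEVELS}
    for feat in features:
        c = feat.get("estimated_complexity")
        if c in dist:  # assumed guard against invalid or absent values
            dist[c] += 1
    return dist


def complexity_line(dist):
    """Format the summary line in the shape shown in the markdown hunk."""
    return "- Complexity: {} low, {} medium, {} high, {} critical".format(
        dist["low"], dist["medium"], dist["high"], dist["critical"]
    )
```

Seeding every level up front keeps the formatted line stable even when a feature list contains no critical features.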
@@ -16,7 +16,7 @@ User says:
  - "Don't want to restart from scratch"
 
  **Do NOT use when:**
- - Pipeline interrupted → use `retry-feature.sh` / `retry-bugfix.sh`
+ - Pipeline interrupted → use `reset-feature.sh --clean --run` / `reset-bug.sh --clean --run` for a fresh retry
  - User wants a clean restart → use the original workflow skill directly (`/feature-workflow`, `/bug-fix-workflow`, `/refactor-workflow`)
  - Nothing was ever started → use the original workflow skill
 
@@ -257,8 +257,8 @@ Recovery complete.
  | `bug-fix-workflow` | **Recovery target** — this skill can resume interrupted bug-fix-workflow sessions |
  | `refactor-workflow` | **Recovery target** — this skill can resume interrupted refactor-workflow sessions |
  | `feature-pipeline-launcher` | **Called in Phase 2.2** — launches or checks pipeline status for feature recovery |
- | `retry-feature.sh` | **Alternative** — full clean retry for pipeline failures; this skill is the smart interactive alternative |
- | `retry-bugfix.sh` | **Alternative** — full clean retry for bugfix pipeline failures |
+ | `reset-feature.sh --clean --run` | **Alternative** — full clean retry for pipeline failures; this skill is the smart interactive alternative |
+ | `reset-bug.sh --clean --run` | **Alternative** — full clean retry for bugfix pipeline failures |
  | `/prizmkit-code-review` | **Called in Phase 2.1** — reviews recovered bug-fix code |
  | `/prizmkit-committer` | **Called in Phase 2.1** — commits the recovered result |