prizmkit 1.1.10 → 1.1.12
- package/bundled/VERSION.json +3 -3
- package/bundled/dev-pipeline/README.md +10 -46
- package/bundled/dev-pipeline/reset-bug.sh +84 -10
- package/bundled/dev-pipeline/reset-feature.sh +86 -10
- package/bundled/dev-pipeline/reset-refactor.sh +68 -4
- package/bundled/dev-pipeline/scripts/generate-bootstrap-prompt.py +47 -46
- package/bundled/dev-pipeline/scripts/generate-bugfix-prompt.py +7 -12
- package/bundled/dev-pipeline/scripts/generate-refactor-prompt.py +124 -20
- package/bundled/dev-pipeline/scripts/utils.py +20 -0
- package/bundled/dev-pipeline/templates/agent-prompts/dev-implement.md +13 -7
- package/bundled/dev-pipeline/templates/bootstrap-tier1.md +62 -66
- package/bundled/dev-pipeline/templates/bootstrap-tier2.md +37 -40
- package/bundled/dev-pipeline/templates/bootstrap-tier3.md +35 -48
- package/bundled/dev-pipeline/templates/bugfix-bootstrap-prompt.md +135 -182
- package/bundled/dev-pipeline/templates/feature-list-schema.json +6 -21
- package/bundled/dev-pipeline/templates/refactor-bootstrap-prompt.md +9 -9
- package/bundled/dev-pipeline/templates/sections/context-budget-rules.md +1 -1
- package/bundled/dev-pipeline/templates/sections/feature-context.md +4 -0
- package/bundled/dev-pipeline/templates/sections/phase-browser-verification.md +41 -24
- package/bundled/dev-pipeline/templates/sections/phase-commit-full.md +4 -12
- package/bundled/dev-pipeline/templates/sections/phase-deploy-verification.md +9 -17
- package/bundled/dev-pipeline/templates/sections/phase-implement-lite.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-plan-agent.md +3 -2
- package/bundled/dev-pipeline/templates/sections/phase-plan-lite.md +4 -2
- package/bundled/dev-pipeline/templates/sections/phase-specify-plan-full.md +0 -18
- package/bundled/dev-pipeline/templates/sections/session-context.md +1 -2
- package/bundled/dev-pipeline/templates/sections/test-failure-recovery-agent.md +75 -0
- package/bundled/dev-pipeline/templates/sections/test-failure-recovery-lite.md +66 -0
- package/bundled/skills/_metadata.json +1 -1
- package/bundled/skills/bugfix-pipeline-launcher/SKILL.md +3 -8
- package/bundled/skills/feature-pipeline-launcher/SKILL.md +4 -16
- package/bundled/skills/feature-planner/SKILL.md +8 -4
- package/bundled/skills/feature-planner/assets/planning-guide.md +16 -11
- package/bundled/skills/feature-planner/references/browser-interaction.md +9 -8
- package/bundled/skills/feature-planner/references/completeness-review.md +1 -1
- package/bundled/skills/feature-planner/references/error-recovery.md +1 -1
- package/bundled/skills/feature-planner/references/incremental-feature-planning.md +1 -1
- package/bundled/skills/feature-planner/scripts/validate-and-generate.py +10 -7
- package/bundled/skills/recovery-workflow/SKILL.md +3 -3
- package/bundled/skills/refactor-pipeline-launcher/SKILL.md +4 -15
- package/package.json +1 -1
- package/bundled/dev-pipeline/retry-bugfix.sh +0 -429
- package/bundled/dev-pipeline/retry-feature.sh +0 -445
- package/bundled/dev-pipeline/retry-refactor.sh +0 -441
- package/bundled/dev-pipeline/templates/sections/failure-log-check.md +0 -9
- package/bundled/dev-pipeline/templates/sections/resume-header.md +0 -5
- package/bundled/dev-pipeline/templates/sections/test-failure-recovery.md +0 -75
@@ -1,12 +1,5 @@
 ### Architecture Sync & Commit (SINGLE COMMIT) — DO NOT SKIP
 
-**Bug Fix Documentation Policy**:
-- DEFAULT: Run `/prizmkit-retrospective` with structural sync only (update file counts, interfaces, dependencies). Skip knowledge injection.
-- UPDATE DOCS (run full retrospective — Job 1 + Job 2) when bug fix causes: interface signature changes, dependency additions/removals, observable behavior changes to existing features, or newly discovered TRAPs.
-- Simple bugs: No new spec.md/plan.md needed. Use fast path.
-- Complex bugs (multi-module, cascading): Use `/prizmkit-plan` with `artifact_dir=.prizmkit/bugfix/<BUG_ID>/`.
-- Commit prefix: `fix(<scope>):` (not `feat:`).
-
 **a.** Check if feature already committed:
 ```bash
 git log --oneline | grep "{{FEATURE_ID}}" | head -3
@@ -15,12 +8,11 @@ git log --oneline | grep "{{FEATURE_ID}}" | head -3
 - If no existing commit → proceed normally with b–d.
 
 **b.** Run `/prizmkit-retrospective` (**before commit**, maintains `.prizm-docs/` architecture index):
-
-
-
-
+1. **Structural sync**: update KEY_FILES/INTERFACES/DEPENDENCIES/file counts for changed modules
+2. **Architecture knowledge**: extract TRAPS, RULES, DECISIONS from completed work into `.prizm-docs/`
+3. **L2 coverage check**: For any module/sub-module with source files created or significantly modified in this session but no L2 `.prizm` doc — evaluate whether L2 is warranted and create if so. The current session has the best context for accurate KEY_FILES, TRAPS, and DECISIONS.
+4. Stage doc changes: `git add .prizm-docs/`
 ⚠️ Do NOT commit here. Only stage.
-- **For bug-fix sessions**: structural sync (Job 1) by default. Run knowledge injection (Job 2) when the fix causes interface signature changes, dependency additions/removals, observable behavior changes, or reveals new TRAPs
 
 **c.** Stage all feature code explicitly (NEVER use `git add -A` or `git add .`):
 ```bash
@@ -7,23 +7,7 @@ You just implemented this feature — you know the project's tech stack and buil
 3. **Assess and record** — append to context-snapshot.md:
 - **ALL builds pass** → `## Deploy Verification: PASS` — proceed to commit
 - **Some builds fail with fixable errors** → fix and re-verify (already handled in step 2)
-- **Cannot build locally** (missing system-level deps you cannot install) →
-```
-# Local Development Setup
-
-## Prerequisites
-- [tool]: [install instruction]
-
-## Build Steps
-1. [exact command]
-
-## Run / Dev Mode
-[exact command to start the app locally]
-
-## Verify
-[how to confirm the app is running correctly]
-```
-Record: `## Deploy Verification: PARTIAL — see .prizmkit/deploy.md for missing prerequisites`
+- **Cannot build locally** (missing system-level deps you cannot install) → Record: `## Deploy Verification: PARTIAL — missing system deps (see below)`
 
 Deploy verification does NOT block the commit, but you MUST attempt it.
 
@@ -35,5 +19,13 @@ Deploy verification does NOT block the commit, but you MUST attempt it.
 
 If the project cannot be started locally (e.g., requires external services, databases, credentials), skip the smoke test and note why.
 
+**Deploy documentation update** — Run `/prizmkit-deploy` ONLY if this feature introduced new infrastructure or deployment-affecting changes:
+- New database, cache, message queue, or external service dependency
+- New environment variables required
+- New build steps or deployment configuration (Dockerfile, CI/CD, cloud config)
+- Changed ports, protocols, or service topology
+
+If none of the above apply (pure application logic change), skip `/prizmkit-deploy`.
+
 
 **Checkpoint update**: Update `workflow-checkpoint.json` — set step `deploy-verification` to `"completed"`.
@@ -27,7 +27,7 @@ $TEST_CMD 2>&1 | tee /tmp/test-baseline.txt | tail -20
 1. All tasks in plan.md are `[x]`
 2. Run the full test suite to ensure nothing is broken
 3. Verify each acceptance criterion from Section 1 of context-snapshot.md is met — check mentally, do NOT re-read files you already wrote
-4. If any criterion is not met, fix it now (
+4. If any criterion is not met, fix it now using the convergence-based recovery loop (see Test Failure Recovery Protocol)
 
 **CP-2**: All acceptance criteria met, all tests pass.
 
@@ -4,8 +4,9 @@
 ls .prizmkit/specs/{{FEATURE_SLUG}}/plan.md 2>/dev/null
 ```
 
-If missing,
--
+If missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/` to generate `plan.md`:
+- The plan.md should include: architecture — components, interfaces, data flow, files to create/modify, testing approach, and a Tasks section with `[ ]` checkboxes ordered by dependency.
+- Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
 
 **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
 Before proceeding past CP-1, verify:
@@ -4,8 +4,10 @@
 ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
 ```
 
-If plan.md missing,
--
+If plan.md missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/`:
+- Pass the feature description and acceptance criteria from the Feature Context section above as input
+- The plan.md should include: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
+- Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
 
 **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
 Before proceeding past CP-1:
@@ -1,14 +1,5 @@
 ### Specify + Plan (Full Workflow)
 
-**Check for previous failure log:**
-```bash
-cat .prizmkit/specs/{{FEATURE_SLUG}}/failure-log.md 2>/dev/null || echo "NO_PREVIOUS_FAILURE"
-```
-If failure-log.md exists:
-- Read ROOT_CAUSE and SUGGESTION — adjust your approach accordingly
-- Read DISCOVERED_TRAPS — if any are genuine, inject into .prizm-docs/ during Phase 6 retrospective
-- Do NOT delete failure-log.md until this session completes all phases and commits successfully
-
 Check existing artifacts first:
 ```bash
 ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
@@ -18,15 +9,6 @@ ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
 - `context-snapshot.md` exists → use it directly, skip context snapshot building
 - Some missing → generate only missing files
 
-Before planning, check whether feature code already exists in the project (search in source directories identified from `root.prizm` or the project tree scan):
-```bash
-grep -r "{{FEATURE_SLUG}}" . --include="*.js" --include="*.ts" --include="*.py" --include="*.go" --include="*.java" --include="*.rb" --include="*.rs" -l --exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist --exclude-dir=build --exclude-dir=vendor --exclude-dir=.prizmkit 2>/dev/null | head -20
-```
-
-Record result as `EXISTING_CODE` (list of files, or empty).
-
-If `EXISTING_CODE` is non-empty: your spec/plan/tasks must reflect this existing implementation — document what exists, identify gaps, do NOT re-implement what is already done.
-
 **Step A — Build Context Snapshot** (skip if `context-snapshot.md` already exists):
 
 1. Read `.prizm-docs/root.prizm` and relevant L1/L2 prizm docs
@@ -1,6 +1,5 @@
 ## Session Context
 
 - **Feature ID**: {{FEATURE_ID}} | **Session**: {{SESSION_ID}} | **Run**: {{RUN_ID}}
-- **Complexity**: {{COMPLEXITY}}
-- **Previous Status**: {{PREV_SESSION_STATUS}} | **Resume From**: {{RESUME_PHASE}}
+- **Complexity**: {{COMPLEXITY}}
 - **Init**: {{INIT_DONE}} | Artifacts: spec={{HAS_SPEC}} plan={{HAS_PLAN}}
@@ -0,0 +1,75 @@
+## Test Failure Recovery Protocol
+
+When tests fail during implementation (Phase 3 / Phase 4), use **convergence-based recovery** — keep fixing as long as progress is being made.
+
+### Recovery Loop
+
+1. **Run tests and record results**:
+- Count total failures and note which tests failed
+- Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
+
+2. **Check termination conditions** (evaluate BEFORE each fix attempt):
+- **All tests pass** → Done. Exit recovery loop.
+- **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
+- **Still making progress** — failure count decreased compared to previous round → Continue fixing.
+- **First round** — no history yet → Proceed to fix.
+
+3. **Fix and iterate**:
+- Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
+- Categorize:
+- **Pre-existing baseline failure**: Expected, do NOT fix
+- **New regression**: Fix the code
+- **Brittle test**: Fix the test or environment setup
+- Apply fix, re-run `$TEST_CMD`, go back to step 1
+
+### Convergence Tracking
+
+Maintain a mental (or logged) record each round:
+
+```
+Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
+Round 2: 3 failures [test_b, test_d, test_e] ← progress, continue
+Round 3: 3 failures [test_b, test_d, test_e] ← same as round 2 (plateau 1/3)
+Round 4: 3 failures [test_b, test_d, test_e] ← plateau 2/3
+Round 5: 3 failures [test_b, test_d, test_e] ← plateau 3/3 → STOP
+```
+
+**Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
+
+### Escalation — Dev + Reviewer Workflow
+
+When the recovery loop exits with remaining failures:
+- Dev appends failure details to Implementation Log
+- Reviewer agent runs full test suite in Phase 5
+- If Reviewer confirms NEW regressions (not in baseline): mark verdict as `NEEDS_FIXES`
+- If Reviewer confirms only baseline failures remain: proceed with `PASS_WITH_WARNINGS`
+
+### Context-Aware Test Re-run (Performance Optimization)
+
+**Skip redundant re-runs**:
+- If Implementation Log section in context-snapshot.md already confirms "all tests passing"
+- → Skip Phase 5 test suite re-run (Reviewer will verify baseline log instead)
+- This avoids rebuilding/re-running tests when already verified
+
+**When to re-run**:
+- If Implementation Log is missing or incomplete
+- If any new code was added after the last test run
+- If Reviewer suspects brittleness or environment drift
+
+### Failure Capture Rules
+
+If tests remain broken after recovery:
+
+```
+## Test Failures Encountered
+
+- **Test**: [test name/path]
+- Root Cause: [explanation]
+- Category: [pre-existing baseline | new regression | brittle test | environment]
+- Rounds Attempted: [N rounds, plateau at round M]
+- Status: [still failing | requires next session | known limitation]
+
+- **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
+```
+
+**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
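The termination rule in the added template above can be illustrated in code. This is a minimal sketch under stated assumptions, not code shipped in the package: `RecoveryTracker`, `record`, and `plateau_limit` are hypothetical names. It follows the template's worked example, where each round that repeats the previous failing set advances the plateau counter. Treating any other change to the failing set as a reset is an assumption, since the template only states that a decrease resets the counter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryTracker:
    """Hypothetical helper that tracks failing tests across recovery rounds."""

    baseline: frozenset = frozenset()      # pre-existing failures to exclude
    plateau_limit: int = 3                 # repeated rounds tolerated before stopping
    plateau: int = 0                       # current plateau counter
    previous: Optional[frozenset] = None   # failing set from the previous round

    def record(self, failing):
        """Record one test round and return "done", "stop", or "continue"."""
        current = frozenset(failing) - self.baseline
        if not current:
            return "done"                  # only baseline failures (or none) remain
        if self.previous is not None and current == self.previous:
            self.plateau += 1              # same count and same tests as last round
        else:
            self.plateau = 0               # first round, or the failing set changed
        self.previous = current
        return "stop" if self.plateau >= self.plateau_limit else "continue"


# Replay the five rounds from the template's example.
tracker = RecoveryTracker()
rounds = [
    ["test_a", "test_b", "test_c", "test_d", "test_e"],
    ["test_b", "test_d", "test_e"],
    ["test_b", "test_d", "test_e"],
    ["test_b", "test_d", "test_e"],
    ["test_b", "test_d", "test_e"],
]
for number, failing in enumerate(rounds, start=1):
    print(f"Round {number}: {tracker.record(failing)}")
```

Replaying the example rounds returns `continue` for rounds 1 to 4 and `stop` for round 5, matching the `plateau 3/3 → STOP` line in the template.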
@@ -0,0 +1,66 @@
+## Test Failure Recovery Protocol
+
+When tests fail during implementation, use **convergence-based recovery** — keep fixing as long as progress is being made.
+
+### Recovery Loop
+
+1. **Run tests and record results**:
+- Count total failures and note which tests failed
+- Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
+
+2. **Check termination conditions** (evaluate BEFORE each fix attempt):
+- **All tests pass** → Done. Exit recovery loop.
+- **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
+- **Still making progress** — failure count decreased compared to previous round → Continue fixing.
+- **First round** — no history yet → Proceed to fix.
+
+3. **Fix and iterate**:
+- Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
+- Categorize:
+- **Pre-existing baseline failure**: Expected, do NOT fix
+- **New regression**: Fix the code
+- **Brittle test**: Fix the test or environment setup
+- Apply fix, re-run `$TEST_CMD`, go back to step 1
+
+### Convergence Tracking
+
+Maintain a mental (or logged) record each round:
+
+```
+Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
+Round 2: 3 failures [test_b, test_d, test_e] ← progress, continue
+Round 3: 3 failures [test_b, test_d, test_e] ← same as round 2 (plateau 1/3)
+Round 4: 3 failures [test_b, test_d, test_e] ← plateau 2/3
+Round 5: 3 failures [test_b, test_d, test_e] ← plateau 3/3 → STOP
+```
+
+**Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
+
+### Escalation — Single Agent
+
+When the recovery loop exits with remaining failures:
+- Document all remaining failures in Implementation Log with root cause analysis
+- Record PARTIAL status with known failure list
+- **Do NOT block commit** — unresolved test failures are deferred to next session
+
+### Context-Aware Optimization
+
+**Skip redundant re-runs**: If Implementation Log already confirms "all tests passing", skip full suite re-run.
+
+### Failure Capture Rules
+
+If tests remain broken after recovery:
+
+```
+## Test Failures Encountered
+
+- **Test**: [test name/path]
+- Root Cause: [explanation]
+- Category: [pre-existing baseline | new regression | brittle test | environment]
+- Rounds Attempted: [N rounds, plateau at round M]
+- Status: [still failing | requires next session | known limitation]
+
+- **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
+```
+
+**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
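The failure capture format in the added template above is regular enough to generate mechanically. The sketch below is illustrative only and assumes nothing about how the pipeline actually writes its logs: `FailureEntry`, `render_failures`, and `feature_incomplete` are hypothetical names, while the category and status values are copied from the template.

```python
from dataclasses import dataclass

# Allowed values, copied from the template's bracketed placeholders.
CATEGORIES = ("pre-existing baseline", "new regression", "brittle test", "environment")
STATUSES = ("still failing", "requires next session", "known limitation")


@dataclass
class FailureEntry:
    """Hypothetical record for one test that is still failing after recovery."""

    test: str
    root_cause: str
    category: str
    rounds: int
    plateau_round: int
    status: str
    blocks_ac: bool  # True when an acceptance criterion cannot be verified


def render_failures(entries):
    """Render entries in the "Test Failures Encountered" layout."""
    lines = ["## Test Failures Encountered", ""]
    for entry in entries:
        if entry.category not in CATEGORIES:
            raise ValueError(f"unknown category: {entry.category!r}")
        if entry.status not in STATUSES:
            raise ValueError(f"unknown status: {entry.status!r}")
        impact = (
            "blocks AC verification"
            if entry.blocks_ac
            else "can AC be verified despite failure"
        )
        lines += [
            f"- **Test**: {entry.test}",
            f"- Root Cause: {entry.root_cause}",
            f"- Category: {entry.category}",
            f"- Rounds Attempted: {entry.rounds} rounds, plateau at round {entry.plateau_round}",
            f"- Status: {entry.status}",
            "",
            f"- **Impact on Feature**: {impact}",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"


def feature_incomplete(entries):
    """Apply the template's rule: any blocked AC makes the feature incomplete."""
    return any(entry.blocks_ac for entry in entries)
```

Keeping the incompleteness rule in its own function separates the decision (whether to document in failure-log.md for the next session) from the formatting.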
@@ -204,7 +204,7 @@ Detect user intent from their message, then follow the corresponding workflow:
 **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
 - Summarize results: total bugs, fixed, failed, skipped
 - If all fixed: each bug session has already run `prizmkit-retrospective` internally (structural sync by default; full retrospective when the fix changed interfaces, dependencies, or observable behavior). Ask user what's next.
-- If some failed: show failed bug IDs and suggest `
+- If some failed: show failed bug IDs and suggest `dev-pipeline/reset-bug.sh <B-XXX> --clean --run` for a fresh retry
 
 **If background daemon**:
 1. Verify launch:
@@ -295,15 +295,10 @@ Detect user intent from their message, then follow the corresponding workflow:
 When user says "retry B-001":
 
 ```bash
-dev-pipeline/
+dev-pipeline/reset-bug.sh B-001 --clean --run .prizmkit/plans/bug-fix-list.json
 ```
 
-**Note:** `
-
-Environment variables (optional):
-```bash
-SESSION_TIMEOUT=3600 dev-pipeline/retry-bugfix.sh B-001 .prizmkit/plans/bug-fix-list.json
-```
+**Note:** `reset-bug.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — this gives a fresh start.
 
 ### Error Handling
 
@@ -231,7 +231,7 @@ Detect user intent from their message, then follow the corresponding workflow:
 **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
 - Summarize results: total features, succeeded, failed, skipped
 - If all succeeded: each feature session has already run `prizmkit-retrospective` internally. Ask user what's next.
-- If some failed: show failed feature IDs and suggest `
+- If some failed: show failed feature IDs and suggest `reset-feature.sh <F-XXX> --clean --run` for a fresh retry
 - **Browser verification**: If any completed features have `browser_interaction` and `playwright-cli` is installed, offer to run browser verification (see §Post-Pipeline Browser Verification)
 
 **If background daemon**:
@@ -320,26 +320,14 @@ Detect user intent from their message, then follow the corresponding workflow:
 
 #### Intent E: Retry Single Feature Node
 
-When user says "retry F-003":
-
-```bash
-dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
-```
-
-When user says "clean retry F-003" or "retry F-003 from scratch":
+When user says "retry F-003" or "clean retry F-003":
 
 ```bash
 dev-pipeline/reset-feature.sh F-003 --clean --run .prizmkit/plans/feature-list.json
 ```
 
-Environment variables (optional):
-```bash
-SESSION_TIMEOUT=3600 dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
-```
-
 Notes:
-- `
-- `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — use this for a fresh start when checkpoint recovery is not desired.
+- `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — this gives a fresh start.
 - Keep pipeline daemon mode for main run management (`launch-feature-daemon.sh`).
 
 ---
@@ -387,7 +375,7 @@ After pipeline completion, if features have `browser_interaction` fields and `pl
 | Pipeline already running | Show status, ask if user wants to stop and restart |
 | PID file stale (process dead) | `launch-feature-daemon.sh` auto-cleans, retry start |
 | Launch failed (process died immediately) | Show last 20 lines of log: `tail -20 .prizmkit/state/features/pipeline-daemon.log` |
-| Feature stuck/blocked | Use `
+| Feature stuck/blocked | Use `reset-feature.sh <F-XXX> --clean --run` for a fresh retry |
 | All features blocked/failed | Show status, suggest daemon-safe recovery: `dev-pipeline/reset-feature.sh <F-XXX> --clean --run .prizmkit/plans/feature-list.json` |
 | `playwright-cli` not installed | Browser verification skipped (non-blocking). Suggest: `npm install -g @playwright/cli@latest && playwright-cli install --skills` |
 | Permission denied on script | Run `chmod +x dev-pipeline/launch-feature-daemon.sh dev-pipeline/run-feature.sh` |
@@ -80,13 +80,13 @@ Do NOT use this skill when:
 If the script is not available, perform these manual validation checks:
 1. **ID sequence**: All feature IDs are sequential (F-001, F-002, F-003, ...)
 2. **No circular dependencies**: No feature depends (directly or transitively) on itself
-3. **Description length**: Minimum 15 words per description (error), 30/50/80
+3. **Description length**: Minimum 15 words per description (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail is always better
 4. **Dependency references**: All referenced features in dependencies exist in features array
 5. **Priority enums**: All priority values are exactly "critical", "high", "medium", or "low" (case-sensitive)
 6. **Status enum**: All status values are one of: pending, in_progress, completed, failed, skipped, split, auto_skipped
 7. **Acceptance criteria**: At least 1 criterion per feature, each is a concrete, measurable statement
-8. **Browser interaction**: If present, has
-9. **Complexity enum**: If present, is one of: low, medium, high
+8. **Browser interaction**: If present, has verify_steps array (optional — AI auto-detects dev server, URL, port at runtime)
+9. **Complexity enum**: If present, is one of: low, medium, high, critical
 10. **Model field**: If present, is a non-empty string
 11. **Critic field**: If present, is boolean; if true, critic_count should be 1 or 3
 12. **Root schema**: Has $schema='dev-pipeline-feature-list-v1', project_name, and non-empty features array
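Checks 1, 2, and 4 of the manual validation list above lend themselves to a short script. The following is an illustrative sketch, not the package's `validate-and-generate.py`: the function names are hypothetical, and the `id` and `dependencies` keys are assumed field names for each entry in the features array.

```python
def check_id_sequence(features):
    """Return the IDs that break the F-001, F-002, F-003, ... sequence."""
    expected = [f"F-{n:03d}" for n in range(1, len(features) + 1)]
    actual = [feature["id"] for feature in features]
    return [got for got, want in zip(actual, expected) if got != want]


def missing_dependencies(features):
    """Return referenced feature IDs that do not exist in the features array."""
    known = {feature["id"] for feature in features}
    referenced = {
        dep for feature in features for dep in feature.get("dependencies", [])
    }
    return sorted(referenced - known)


def find_cycle(features):
    """Return one dependency cycle as a list of IDs, or None if there is none."""
    graph = {f["id"]: f.get("dependencies", []) for f in features}
    state = {}   # id -> "visiting" while on the current path, "done" afterwards
    path = []    # current depth-first path

    def visit(node):
        state[node] = "visiting"
        path.append(node)
        for dep in graph.get(node, []):
            if state.get(dep) == "visiting":
                # dep is already on the path, so the path from dep closes a cycle
                return path[path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in graph:
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Returning the cycle as an ordered list makes it easy to print a chain such as `F-002 → F-003 → F-002` when reporting the error.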
@@ -258,7 +258,11 @@ Key requirements:
 - English feature titles for stable slug generation
 - `critic` / `critic_count` defaults per Testing Defaults section
 - `browser_interaction` auto-generated for qualifying frontend features
-- descriptions: minimum 15 words (error), recommended 30/50/80 for low/medium/high
+- descriptions: minimum 15 words (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail prevents AI guessing
+- `estimated_complexity` determines pipeline execution tier:
+- `low` / `medium` → **lite** (single agent, no subagents)
+- `high` → **standard** (orchestrator + dev + reviewer, 3 agents)
+- `critical` → **full** (full team + critic agents, 5 agents). Use for: architectural changes touching 10+ files, cross-module refactoring with API surface changes, features requiring multi-critic voting
 
 Run the validation script after generation:
 ```bash
@@ -12,13 +12,16 @@ Feature descriptions are the **primary input** for autonomous pipeline sessions.
 
 ### Minimum Word Counts
 
-| Complexity | Minimum
-
-| low | 15
-| medium | 15
-| high | 15
+| Complexity | Hard Minimum (error) | Recommended Minimum (warning below) |
+|------------|---------------------|-------------------------------------|
+| low | 15 | 30+ |
+| medium | 15 | 50+ |
+| high | 15 | 80+ |
+| critical | 15 | 100+ |
 
-Below 15 words is a validation error. Below the
+Below 15 words is a validation error. Below the recommended minimum triggers a warning.
+
+**There is NO upper limit** — the more detail the better. Rich descriptions prevent the AI from guessing, producing higher quality code. Always aim to describe the feature as thoroughly as possible: what to build, how it should behave, what data it touches, and what edge cases to handle.
 
 ### What to Include
 
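The two-level threshold in the table above (a hard minimum that is an error, a recommended minimum that is a warning) can be sketched as follows. This is an illustration under assumptions rather than the package's validator: `check_description` is a hypothetical name, and counting words by splitting on whitespace is an assumed method that the source does not specify.

```python
HARD_MINIMUM = 15
RECOMMENDED_MINIMUM = {"low": 30, "medium": 50, "high": 80, "critical": 100}


def check_description(description, complexity):
    """Return ("error" | "warning" | "ok", word_count) for one description."""
    words = len(description.split())  # assumption: whitespace-delimited words
    if words < HARD_MINIMUM:
        return ("error", words)
    if words < RECOMMENDED_MINIMUM[complexity]:
        return ("warning", words)
    return ("ok", words)  # there is no upper limit to check
```

For example, a 20-word description of a `low` feature clears the hard minimum but falls below the recommended 30, so it would produce a warning rather than an error.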
@@ -113,11 +116,12 @@ Then [expected outcome]
 
 ## Complexity Estimation Guide
 
-| Complexity | Characteristics | Typical Scope |
-
-| low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages |
-| medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages |
-| high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions |
+| Complexity | Characteristics | Typical Scope | Pipeline Tier |
+|------------|----------------|---------------|---------------|
+| low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages | lite (1 agent) |
+| medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages | lite (1 agent) |
+| high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions | standard (3 agents) |
+| critical | Architectural changes, 10+ files, multi-module API surface changes | System-wide refactoring, new infrastructure + app logic | full (5 agents + critic) |
 
 ### Complexity Red Flags
 
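The new Pipeline Tier column above amounts to a lookup from complexity to tier and agent count. A minimal sketch of that mapping follows; `PIPELINE_TIERS` and `pipeline_tier` are hypothetical names, and how the pipeline scripts actually select a tier is not shown in this diff.

```python
# Complexity -> (tier name, agent count), taken from the table above.
PIPELINE_TIERS = {
    "low": ("lite", 1),
    "medium": ("lite", 1),
    "high": ("standard", 3),
    "critical": ("full", 5),
}


def pipeline_tier(estimated_complexity):
    """Return (tier, agents) for a complexity value, rejecting unknown values."""
    try:
        return PIPELINE_TIERS[estimated_complexity]
    except KeyError:
        allowed = ", ".join(PIPELINE_TIERS)
        raise ValueError(
            f"unknown complexity {estimated_complexity!r}; expected one of: {allowed}"
        ) from None
```

Raising on unknown values rather than defaulting keeps a misspelled complexity from silently running in the wrong tier.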
@@ -134,6 +138,7 @@ Consider splitting a feature if it exhibits any of the following:
 
 - If a feature is marked as "low" complexity, it should not have more than 5 acceptance criteria.
 - If a feature is marked as "high" complexity, it should have a clear justification (e.g., "involves payment processing with webhook handling and idempotency").
+- Use "critical" complexity only for features requiring architectural changes that touch 10+ files, involve cross-module API surface changes, or need multi-critic voting for safety.
 - When in doubt, estimate higher -- it is better to over-allocate than to under-allocate.
 
 ---
@@ -11,23 +11,24 @@ For each qualifying feature, generate the `browser_interaction` object:
 ```json
 {
   "browser_interaction": {
-    "url": "http://localhost:3000/login",
-    "setup_command": "npm run dev",
     "verify_steps": [
       "Verify login form renders with email and password fields",
       "Verify valid credentials redirect to dashboard",
       "Verify invalid password shows error message"
-    ]
-    "screenshot": true
+    ]
   }
 }
 ```
 
 ## Field Rules
 
-- `
--
-
+- `verify_steps` are **verification goals**, not specific playwright-cli commands. Describe WHAT to verify, not HOW to verify it. The pipeline AI will:
+  1. Auto-detect the dev server start command from project config (`package.json`, `Makefile`, etc.)
+  2. Start the server and discover the URL/port at runtime
+  3. Use `playwright-cli snapshot` to discover real element refs
+  4. Decide the concrete click/fill/assert operations itself
+  This works better than prescribing URLs/commands at planning time because: (1) ports may differ across environments, (2) element refs don't exist yet, (3) UI structure may change during implementation, (4) the AI has full context of the actual code when it runs verification.
 - **Good**: `"Verify login form accepts valid credentials and redirects to dashboard"`
 - **Bad**: `"click <ref> — click login button"` (guesses at refs that don't exist yet)
-- `
+- Do NOT specify `url`, `setup_command`, or `port` — the AI detects these at runtime from the actual project configuration
+- An empty `browser_interaction: {}` object (no verify_steps) is valid — the AI will explore the app and verify the feature works as expected
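The field rules in this hunk can be checked mechanically. The following is a minimal sketch and not part of prizmkit: the helper name `check_browser_interaction` and its return shape are hypothetical, and it assumes each `verify_steps` entry is a plain, non-empty string.

```python
# Hypothetical helper, not part of prizmkit: checks a `browser_interaction`
# object against the field rules listed in the hunk above.
FORBIDDEN_KEYS = {"url", "setup_command", "port"}  # detected at runtime instead


def check_browser_interaction(obj):
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(obj, dict):
        return ["browser_interaction must be an object"]

    problems = []
    # Rule: do not specify url, setup_command, or port.
    for key in sorted(FORBIDDEN_KEYS.intersection(obj)):
        problems.append("'{}' should not be specified".format(key))

    # Rule: verify_steps is optional, so an empty object is valid.
    steps = obj.get("verify_steps", [])
    if not isinstance(steps, list):
        problems.append("verify_steps must be a list")
        return problems
    for i, step in enumerate(steps):
        # Assumption: each step is a non-empty verification goal in prose.
        if not isinstance(step, str) or not step.strip():
            problems.append(
                "verify_steps[{}] must be a non-empty string".format(i))
    return problems
```

The sketch checks only the shape of the object. Whether a step reads as a goal ("what") rather than a command ("how") still needs a human or AI reviewer.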
@@ -7,7 +7,7 @@ Before generating `.prizmkit/plans/feature-list.json`, review the full feature s
 For each feature, evaluate against the word-count thresholds in `planning-guide.md`:
 - Does the description cover: what to build, key behaviors, integration points, data model (if applicable), error/edge cases?
 - Is the description specific enough for an AI coding session to implement without guessing?
-- Flag any feature below the recommended word count for its complexity level (30
+- Flag any feature below the recommended minimum word count for its complexity level (30+/50+/80+/100+ words for low/medium/high/critical). There is no upper limit — more detail is always better.
 
 **Implementation clarity check** — Every feature description will be consumed by an autonomous AI session. Verify each description specifies:
 1. Concrete deliverables (files to create, endpoints to build, components to implement, models to define)
@@ -26,7 +26,7 @@ Group errors by type and apply targeted fixes:
 | **Dependency errors** | Circular dependency, undefined target features | "Show cycle chain (e.g., `F-003 → F-005 → F-003`), suggest break point" | No |
 | **Missing fields** | Feature missing required keys (title, description, AC) | "List each feature + missing keys, guide patch" | Partial |
 | **Insufficient AC** | Feature has <2 acceptance criteria | "Show feature, suggest AC examples" | No |
-| **Invalid values** | complexity not in [low/medium/high], status not pending | "Show field, valid values" | Yes |
+| **Invalid values** | complexity not in [low/medium/high/critical], status not pending | "Show field, valid values" | Yes |
 
 ### Execution
 
@@ -73,7 +73,7 @@ For each new feature:
 - keep title in English
 - **write rich descriptions** (see `planning-guide.md` §4):
   - minimum 15 words (validation error below this)
-  - recommended: 30+
+  - recommended minimum: 30+ (low), 50+ (medium), 80+ (high), 100+ (critical) — no upper limit, more detail is always better
   - include: what to build, key behaviors, integration points, data model, error/edge cases
 
 ### Step 4: Rebalance Priority
@@ -33,7 +33,7 @@ from datetime import datetime, timezone
 SCHEMA_VERSION = "dev-pipeline-feature-list-v1"
 
 VALID_STATUSES = {"pending", "in_progress", "completed", "failed", "skipped", "split", "auto_skipped"}
-VALID_COMPLEXITIES = {"low", "medium", "high"}
+VALID_COMPLEXITIES = {"low", "medium", "high", "critical"}
 VALID_PRIORITIES = {"critical", "high", "medium", "low"}
 VALID_GRANULARITIES = {"feature", "sub_feature", "auto"}
 VALID_PLANNING_MODES = {"new", "incremental"}
@@ -206,7 +206,7 @@ def validate_feature_list(data, planning_mode="new"):
 
     seen_ids = set()
     priorities = []
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub_features = 0
 
     for idx, feat in enumerate(features):
@@ -244,7 +244,9 @@ def validate_feature_list(data, planning_mode="new"):
         if isinstance(desc, str) and desc.strip():
             word_count = len(desc.split())
             complexity = feat.get("estimated_complexity", "medium")
-            min_words = {
+            min_words = {
+                "low": 30, "medium": 50, "high": 80, "critical": 100,
+            }.get(complexity, 50)
             if word_count < 15:
                 errors.append(
                     "{}: description too short ({} words, minimum 15). "
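The hunk above is cut off before the recommended-minimum branch, so the sketch below is only an illustration of how the two thresholds might interact. The threshold table, the fallback of 50, and the hard minimum of 15 come from the diff. The function name, the `(level, message)` tuples, and the choice to treat a shortfall against the recommended minimum as a warning are assumptions.

```python
# Sketch of a per-complexity description check. Threshold values mirror the
# hunk above; the function name and return shape are hypothetical.
MIN_WORDS = {"low": 30, "medium": 50, "high": 80, "critical": 100}
HARD_MINIMUM = 15  # below this the validator reports an error


def check_description(feature_id, description, complexity="medium"):
    """Return a list of (level, message) tuples for one description."""
    word_count = len(description.split())
    recommended = MIN_WORDS.get(complexity, 50)  # unknown values fall back to 50
    if word_count < HARD_MINIMUM:
        return [("error", "{}: description too short ({} words, minimum {})".format(
            feature_id, word_count, HARD_MINIMUM))]
    if word_count < recommended:
        # Assumption: a shortfall against the recommendation is only a warning.
        return [("warning", "{}: {} words, recommended minimum {} for {}".format(
            feature_id, word_count, recommended, complexity))]
    return []
```

There is deliberately no upper bound, matching the guidance that more detail is always better.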
@@ -580,7 +582,7 @@ def generate_summary_markdown(data):
     lines.append("")
 
     # Statistics
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub = 0
     for feat in features:
         c = feat.get("estimated_complexity")
@@ -596,8 +598,9 @@ def generate_summary_markdown(data):
     lines.append("- Total features: {}".format(len(features)))
     if total_sub > 0:
         lines.append("- Total sub-features: {}".format(total_sub))
-    lines.append("- Complexity: {} low, {} medium, {} high".format(
-        complexity_dist["low"], complexity_dist["medium"],
+    lines.append("- Complexity: {} low, {} medium, {} high, {} critical".format(
+        complexity_dist["low"], complexity_dist["medium"],
+        complexity_dist["high"], complexity_dist["critical"]
     ))
     lines.append("- Max dependency depth: {}".format(max_depth))
 
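The tally and the summary line touched by the two hunks above can be read together as one small routine. This is a sketch under stated assumptions: the four complexity levels and the output format come from the diff, while the function name `complexity_summary` and the decision to skip missing or unrecognized values are hypothetical.

```python
# Sketch of the complexity tally behind the summary line. The function name
# is hypothetical; the levels and format string follow the diff.
VALID_COMPLEXITIES = ("low", "medium", "high", "critical")


def complexity_summary(features):
    """Count features per complexity level and format the summary line."""
    dist = {level: 0 for level in VALID_COMPLEXITIES}
    for feat in features:
        c = feat.get("estimated_complexity")
        if c in dist:  # assumption: missing or invalid values are not counted
            dist[c] += 1
    return "- Complexity: {} low, {} medium, {} high, {} critical".format(
        dist["low"], dist["medium"], dist["high"], dist["critical"])
```

Building the dictionary from one tuple keeps the tally and the list of valid values in one place, which avoids repeating the literal in several functions as the diff has to.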
@@ -608,7 +611,7 @@ def generate_summary_json(data):
     """Generate a JSON summary of the feature list."""
     features = data.get("features", [])
 
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub = 0
     for feat in features:
         c = feat.get("estimated_complexity")
@@ -16,7 +16,7 @@ User says:
 - "Don't want to restart from scratch"
 
 **Do NOT use when:**
-- Pipeline interrupted → use `
+- Pipeline interrupted → use `reset-feature.sh --clean --run` / `reset-bug.sh --clean --run` for a fresh retry
 - User wants a clean restart → use the original workflow skill directly (`/feature-workflow`, `/bug-fix-workflow`, `/refactor-workflow`)
 - Nothing was ever started → use the original workflow skill
 
@@ -257,8 +257,8 @@ Recovery complete.
 | `bug-fix-workflow` | **Recovery target** — this skill can resume interrupted bug-fix-workflow sessions |
 | `refactor-workflow` | **Recovery target** — this skill can resume interrupted refactor-workflow sessions |
 | `feature-pipeline-launcher` | **Called in Phase 2.2** — launches or checks pipeline status for feature recovery |
-| `
-| `
+| `reset-feature.sh --clean --run` | **Alternative** — full clean retry for pipeline failures; this skill is the smart interactive alternative |
+| `reset-bug.sh --clean --run` | **Alternative** — full clean retry for bugfix pipeline failures |
 | `/prizmkit-code-review` | **Called in Phase 2.1** — reviews recovered bug-fix code |
 | `/prizmkit-committer` | **Called in Phase 2.1** — commits the recovered result |
 
|