npm - prizmkit - Versions diffs - 1.1.10 → 1.1.12 - Mend

prizmkit 1.1.10 → 1.1.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (47)

package/bundled/dev-pipeline/templates/sections/phase-commit-full.md CHANGED

@@ -1,12 +1,5 @@
 ### Architecture Sync & Commit (SINGLE COMMIT) — DO NOT SKIP
-**Bug Fix Documentation Policy**:
-- DEFAULT: Run `/prizmkit-retrospective` with structural sync only (update file counts, interfaces, dependencies). Skip knowledge injection.
-- UPDATE DOCS (run full retrospective — Job 1 + Job 2) when bug fix causes: interface signature changes, dependency additions/removals, observable behavior changes to existing features, or newly discovered TRAPs.
-- Simple bugs: No new spec.md/plan.md needed. Use fast path.
-- Complex bugs (multi-module, cascading): Use `/prizmkit-plan` with `artifact_dir=.prizmkit/bugfix/<BUG_ID>/`.
-- Commit prefix: `fix(<scope>):` (not `feat:`).
 **a.** Check if feature already committed:
 ```bash
 git log --oneline | grep "{{FEATURE_ID}}" | head -3
@@ -15,12 +8,11 @@ git log --oneline | grep "{{FEATURE_ID}}" | head -3
 - If no existing commit → proceed normally with b–d.
 **b.** Run `/prizmkit-retrospective` (**before commit**, maintains `.prizm-docs/` architecture index):
-- **Structural sync**: update KEY_FILES/INTERFACES/DEPENDENCIES/file counts for changed modules
-- **Architecture knowledge** (feature sessions only): extract TRAPS, RULES, DECISIONS from completed work into `.prizm-docs/`
-- **L2 coverage check**: For any module/sub-module with source files created or significantly modified in this session but no L2 `.prizm` doc — evaluate whether L2 is warranted and create if so. The current session has the best context for accurate KEY_FILES, TRAPS, and DECISIONS.
-- Stage doc changes: `git add .prizm-docs/`
+1. **Structural sync**: update KEY_FILES/INTERFACES/DEPENDENCIES/file counts for changed modules
+2. **Architecture knowledge**: extract TRAPS, RULES, DECISIONS from completed work into `.prizm-docs/`
+3. **L2 coverage check**: For any module/sub-module with source files created or significantly modified in this session but no L2 `.prizm` doc — evaluate whether L2 is warranted and create if so. The current session has the best context for accurate KEY_FILES, TRAPS, and DECISIONS.
+4. Stage doc changes: `git add .prizm-docs/`
 ⚠️ Do NOT commit here. Only stage.
-- **For bug-fix sessions**: structural sync (Job 1) by default. Run knowledge injection (Job 2) when the fix causes interface signature changes, dependency additions/removals, observable behavior changes, or reveals new TRAPs
 **c.** Stage all feature code explicitly (NEVER use `git add -A` or `git add .`):
 ```bash

package/bundled/dev-pipeline/templates/sections/phase-deploy-verification.md CHANGED

@@ -7,23 +7,7 @@ You just implemented this feature — you know the project's tech stack and buil
 3. **Assess and record** — append to context-snapshot.md:
    - **ALL builds pass** → `## Deploy Verification: PASS` — proceed to commit
    - **Some builds fail with fixable errors** → fix and re-verify (already handled in step 2)
-   - **Cannot build locally** (missing system-level deps you cannot install) → Generate `.prizmkit/deploy.md` with:
-     ```
-     # Local Development Setup
-     ## Prerequisites
-     - [tool]: [install instruction]
-     ## Build Steps
-     1. [exact command]
-     ## Run / Dev Mode
-     [exact command to start the app locally]
-     ## Verify
-     [how to confirm the app is running correctly]
-     ```
-     Record: `## Deploy Verification: PARTIAL — see .prizmkit/deploy.md for missing prerequisites`
+   - **Cannot build locally** (missing system-level deps you cannot install) → Record: `## Deploy Verification: PARTIAL — missing system deps (see below)`
 Deploy verification does NOT block the commit, but you MUST attempt it.
@@ -35,5 +19,13 @@ Deploy verification does NOT block the commit, but you MUST attempt it.
 If the project cannot be started locally (e.g., requires external services, databases, credentials), skip the smoke test and note why.
+**Deploy documentation update** — Run `/prizmkit-deploy` ONLY if this feature introduced new infrastructure or deployment-affecting changes:
+- New database, cache, message queue, or external service dependency
+- New environment variables required
+- New build steps or deployment configuration (Dockerfile, CI/CD, cloud config)
+- Changed ports, protocols, or service topology
+If none of the above apply (pure application logic change), skip `/prizmkit-deploy`.
 **Checkpoint update**: Update `workflow-checkpoint.json` — set step `deploy-verification` to `"completed"`.

package/bundled/dev-pipeline/templates/sections/phase-implement-lite.md CHANGED

@@ -27,7 +27,7 @@ $TEST_CMD 2>&1 | tee /tmp/test-baseline.txt | tail -20
 1. All tasks in plan.md are `[x]`
 2. Run the full test suite to ensure nothing is broken
 3. Verify each acceptance criterion from Section 1 of context-snapshot.md is met — check mentally, do NOT re-read files you already wrote
-4. If any criterion is not met, fix it now (max 2 fix rounds)
+4. If any criterion is not met, fix it now using the convergence-based recovery loop (see Test Failure Recovery Protocol)
 **CP-2**: All acceptance criteria met, all tests pass.

package/bundled/dev-pipeline/templates/sections/phase-plan-agent.md CHANGED

@@ -4,8 +4,9 @@
 ls .prizmkit/specs/{{FEATURE_SLUG}}/plan.md 2>/dev/null
 ```
-If missing, write it yourself:
-- `plan.md`: architecture — components, interfaces, data flow, files to create/modify, testing approach, and a Tasks section with `[ ]` checkboxes ordered by dependency
+If missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/` to generate `plan.md`:
+- The plan.md should include: architecture — components, interfaces, data flow, files to create/modify, testing approach, and a Tasks section with `[ ]` checkboxes ordered by dependency.
+- Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
 **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
 Before proceeding past CP-1, verify:

package/bundled/dev-pipeline/templates/sections/phase-plan-lite.md CHANGED

@@ -4,8 +4,10 @@
 ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
 ```
-If plan.md missing, write it directly:
-- `plan.md`: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
+If plan.md missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/`:
+- Pass the feature description and acceptance criteria from the Feature Context section above as input
+- The plan.md should include: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
+- Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
 **Database Design Gate** (if feature involves data persistence — new tables, schema changes, new entities):
 Before proceeding past CP-1:

package/bundled/dev-pipeline/templates/sections/phase-specify-plan-full.md CHANGED

@@ -1,14 +1,5 @@
 ### Specify + Plan (Full Workflow)
-**Check for previous failure log:**
-```bash
-cat .prizmkit/specs/{{FEATURE_SLUG}}/failure-log.md 2>/dev/null || echo "NO_PREVIOUS_FAILURE"
-```
-If failure-log.md exists:
-- Read ROOT_CAUSE and SUGGESTION — adjust your approach accordingly
-- Read DISCOVERED_TRAPS — if any are genuine, inject into .prizm-docs/ during Phase 6 retrospective
-- Do NOT delete failure-log.md until this session completes all phases and commits successfully
 Check existing artifacts first:
 ```bash
 ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
@@ -18,15 +9,6 @@ ls .prizmkit/specs/{{FEATURE_SLUG}}/ 2>/dev/null
 - `context-snapshot.md` exists → use it directly, skip context snapshot building
 - Some missing → generate only missing files
-Before planning, check whether feature code already exists in the project (search in source directories identified from `root.prizm` or the project tree scan):
-```bash
-grep -r "{{FEATURE_SLUG}}" . --include="*.js" --include="*.ts" --include="*.py" --include="*.go" --include="*.java" --include="*.rb" --include="*.rs" -l --exclude-dir=node_modules --exclude-dir=.git --exclude-dir=dist --exclude-dir=build --exclude-dir=vendor --exclude-dir=.prizmkit 2>/dev/null | head -20
-```
-Record result as `EXISTING_CODE` (list of files, or empty).
-If `EXISTING_CODE` is non-empty: your spec/plan/tasks must reflect this existing implementation — document what exists, identify gaps, do NOT re-implement what is already done.
 **Step A — Build Context Snapshot** (skip if `context-snapshot.md` already exists):
 1. Read `.prizm-docs/root.prizm` and relevant L1/L2 prizm docs

package/bundled/dev-pipeline/templates/sections/session-context.md CHANGED

@@ -1,6 +1,5 @@
 ## Session Context
 - **Feature ID**: {{FEATURE_ID}} | **Session**: {{SESSION_ID}} | **Run**: {{RUN_ID}}
-- **Complexity**: {{COMPLEXITY}} | **Retry**: {{RETRY_COUNT}} / {{MAX_RETRIES}}
-- **Previous Status**: {{PREV_SESSION_STATUS}} | **Resume From**: {{RESUME_PHASE}}
+- **Complexity**: {{COMPLEXITY}}
 - **Init**: {{INIT_DONE}} | Artifacts: spec={{HAS_SPEC}} plan={{HAS_PLAN}}

package/bundled/dev-pipeline/templates/sections/test-failure-recovery-agent.md ADDED

@@ -0,0 +1,75 @@
+## Test Failure Recovery Protocol
+When tests fail during implementation (Phase 3 / Phase 4), use **convergence-based recovery** — keep fixing as long as progress is being made.
+### Recovery Loop
+1. **Run tests and record results**:
+   - Count total failures and note which tests failed
+   - Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
+2. **Check termination conditions** (evaluate BEFORE each fix attempt):
+   - **All tests pass** → Done. Exit recovery loop.
+   - **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
+   - **Still making progress** — failure count decreased compared to previous round → Continue fixing.
+   - **First round** — no history yet → Proceed to fix.
+3. **Fix and iterate**:
+   - Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
+   - Categorize:
+     - **Pre-existing baseline failure**: Expected, do NOT fix
+     - **New regression**: Fix the code
+     - **Brittle test**: Fix the test or environment setup
+   - Apply fix, re-run `$TEST_CMD`, go back to step 1
+### Convergence Tracking
+Maintain a mental (or logged) record each round:
+```
+Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
+Round 2: 3 failures [test_b, test_d, test_e]          ← progress, continue
+Round 3: 3 failures [test_b, test_d, test_e]          ← same as round 2 (plateau 1/3)
+Round 4: 3 failures [test_b, test_d, test_e]          ← plateau 2/3
+Round 5: 3 failures [test_b, test_d, test_e]          ← plateau 3/3 → STOP
+```
+**Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
+### Escalation — Dev + Reviewer Workflow
+When the recovery loop exits with remaining failures:
+- Dev appends failure details to Implementation Log
+- Reviewer agent runs full test suite in Phase 5
+- If Reviewer confirms NEW regressions (not in baseline): mark verdict as `NEEDS_FIXES`
+- If Reviewer confirms only baseline failures remain: proceed with `PASS_WITH_WARNINGS`
+### Context-Aware Test Re-run (Performance Optimization)
+**Skip redundant re-runs**:
+- If Implementation Log section in context-snapshot.md already confirms "all tests passing"
+- → Skip Phase 5 test suite re-run (Reviewer will verify baseline log instead)
+- This avoids rebuilding/re-running tests when already verified
+**When to re-run**:
+- If Implementation Log is missing or incomplete
+- If any new code was added after the last test run
+- If Reviewer suspects brittleness or environment drift
+### Failure Capture Rules
+If tests remain broken after recovery:
+```
+## Test Failures Encountered
+- **Test**: [test name/path]
+  - Root Cause: [explanation]
+  - Category: [pre-existing baseline | new regression | brittle test | environment]
+  - Rounds Attempted: [N rounds, plateau at round M]
+  - Status: [still failing | requires next session | known limitation]
+- **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
+```
+**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
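
Annotation (not part of the package diff): the Recovery Loop and Convergence Tracking rules in the added template could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not code shipped in prizmkit. `ConvergenceTracker`, `record`, and the verdict strings are hypothetical names. The template does not say what to do when the failing set changes without shrinking, so the handling of that case here is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

PLATEAU_LIMIT = 3  # from the template: 3 consecutive unchanged rounds


@dataclass
class ConvergenceTracker:
    """Hypothetical helper that tracks failing tests across fix rounds."""

    baseline: frozenset = frozenset()     # pre-existing failures, never fixed
    previous: Optional[frozenset] = None  # new failures seen in the last round
    plateau: int = 0                      # consecutive unchanged rounds

    def record(self, failing: Iterable[str]) -> str:
        """Record one test run and return 'done', 'fix', or 'stop'."""
        # Exclude pre-existing baseline failures before comparing rounds.
        current = frozenset(failing) - self.baseline
        if not current:
            return "done"
        if self.previous is None:
            verdict = "fix"  # first round: no history yet
        elif len(current) < len(self.previous):
            self.plateau = 0  # any decrease resets the plateau counter
            verdict = "fix"
        elif current == self.previous:
            self.plateau += 1
            verdict = "stop" if self.plateau >= PLATEAU_LIMIT else "fix"
        else:
            # ASSUMPTION: a changed set that did not shrink is not covered
            # by the template; this sketch keeps fixing and leaves the
            # counter unchanged.
            verdict = "fix"
        self.previous = current
        return verdict
```

Feeding the five rounds from the template's example trace into `record` produces `fix`, `fix`, `fix`, `fix`, `stop`, which matches the "plateau 3/3 → STOP" annotation. Because the uncovered case never advances the counter, a real implementation would probably want an additional cap on total rounds.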

package/bundled/dev-pipeline/templates/sections/test-failure-recovery-lite.md ADDED

@@ -0,0 +1,66 @@
+## Test Failure Recovery Protocol
+When tests fail during implementation, use **convergence-based recovery** — keep fixing as long as progress is being made.
+### Recovery Loop
+1. **Run tests and record results**:
+   - Count total failures and note which tests failed
+   - Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
+2. **Check termination conditions** (evaluate BEFORE each fix attempt):
+   - **All tests pass** → Done. Exit recovery loop.
+   - **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
+   - **Still making progress** — failure count decreased compared to previous round → Continue fixing.
+   - **First round** — no history yet → Proceed to fix.
+3. **Fix and iterate**:
+   - Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
+   - Categorize:
+     - **Pre-existing baseline failure**: Expected, do NOT fix
+     - **New regression**: Fix the code
+     - **Brittle test**: Fix the test or environment setup
+   - Apply fix, re-run `$TEST_CMD`, go back to step 1
+### Convergence Tracking
+Maintain a mental (or logged) record each round:
+```
+Round 1: 5 failures [test_a, test_b, test_c, test_d, test_e]
+Round 2: 3 failures [test_b, test_d, test_e]          ← progress, continue
+Round 3: 3 failures [test_b, test_d, test_e]          ← same as round 2 (plateau 1/3)
+Round 4: 3 failures [test_b, test_d, test_e]          ← plateau 2/3
+Round 5: 3 failures [test_b, test_d, test_e]          ← plateau 3/3 → STOP
+```
+**Key rule**: If failures decrease (even by 1), the plateau counter resets to 0.
+### Escalation — Single Agent
+When the recovery loop exits with remaining failures:
+- Document all remaining failures in Implementation Log with root cause analysis
+- Record PARTIAL status with known failure list
+- **Do NOT block commit** — unresolved test failures are deferred to next session
+### Context-Aware Optimization
+**Skip redundant re-runs**: If Implementation Log already confirms "all tests passing", skip full suite re-run.
+### Failure Capture Rules
+If tests remain broken after recovery:
+```
+## Test Failures Encountered
+- **Test**: [test name/path]
+  - Root Cause: [explanation]
+  - Category: [pre-existing baseline | new regression | brittle test | environment]
+  - Rounds Attempted: [N rounds, plateau at round M]
+  - Status: [still failing | requires next session | known limitation]
+- **Impact on Feature**: [can AC be verified despite failure | blocks AC verification]
+```
+**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.

package/bundled/skills/_metadata.json CHANGED

@@ -1,5 +1,5 @@
 {
-  "version": "1.1.10",
+  "version": "1.1.12",
   "skills": {
     "prizm-kit": {
       "description": "Full-lifecycle dev toolkit. Covers spec-driven development, Prizm context docs, code quality, debugging, deployment, and knowledge management.",

package/bundled/skills/bugfix-pipeline-launcher/SKILL.md CHANGED

@@ -204,7 +204,7 @@ Detect user intent from their message, then follow the corresponding workflow:
    **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
    - Summarize results: total bugs, fixed, failed, skipped
    - If all fixed: each bug session has already run `prizmkit-retrospective` internally (structural sync by default; full retrospective when the fix changed interfaces, dependencies, or observable behavior). Ask user what's next.
-   - If some failed: show failed bug IDs and suggest `retry-bugfix.sh <B-XXX>` or `dev-pipeline/reset-bug.sh <B-XXX> --clean --run`
+   - If some failed: show failed bug IDs and suggest `dev-pipeline/reset-bug.sh <B-XXX> --clean --run` for a fresh retry
    **If background daemon**:
    1. Verify launch:
@@ -295,15 +295,10 @@ Detect user intent from their message, then follow the corresponding workflow:
 When user says "retry B-001":
 ```bash
-dev-pipeline/retry-bugfix.sh B-001 .prizmkit/plans/bug-fix-list.json
+dev-pipeline/reset-bug.sh B-001 --clean --run .prizmkit/plans/bug-fix-list.json
 ```
-**Note:** `retry-bugfix.sh` runs exactly one bug session and exits. It **preserves prior session artifacts and checkpoint state** — reads `retry_count` and `resume_from_phase` from `status.json` so the AI session can resume from where it left off. For a full clean retry, use `dev-pipeline/reset-bug.sh <B-XXX> --clean --run`.
-Environment variables (optional):
-```bash
-SESSION_TIMEOUT=3600 dev-pipeline/retry-bugfix.sh B-001 .prizmkit/plans/bug-fix-list.json
-```
+**Note:** `reset-bug.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — this gives a fresh start.
 ### Error Handling

package/bundled/skills/feature-pipeline-launcher/SKILL.md CHANGED

@@ -231,7 +231,7 @@ Detect user intent from their message, then follow the corresponding workflow:
    **If foreground**: Pipeline runs to completion in the terminal. After it finishes:
    - Summarize results: total features, succeeded, failed, skipped
    - If all succeeded: each feature session has already run `prizmkit-retrospective` internally. Ask user what's next.
-   - If some failed: show failed feature IDs and suggest `retry-feature.sh <F-XXX>` or `reset-feature.sh <F-XXX> --clean --run`
+   - If some failed: show failed feature IDs and suggest `reset-feature.sh <F-XXX> --clean --run` for a fresh retry
    - **Browser verification**: If any completed features have `browser_interaction` and `playwright-cli` is installed, offer to run browser verification (see §Post-Pipeline Browser Verification)
    **If background daemon**:
@@ -320,26 +320,14 @@ Detect user intent from their message, then follow the corresponding workflow:
 #### Intent E: Retry Single Feature Node
-When user says "retry F-003":
-```bash
-dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
-```
-When user says "clean retry F-003" or "retry F-003 from scratch":
+When user says "retry F-003" or "clean retry F-003":
 ```bash
 dev-pipeline/reset-feature.sh F-003 --clean --run .prizmkit/plans/feature-list.json
 ```
-Environment variables (optional):
-```bash
-SESSION_TIMEOUT=3600 dev-pipeline/retry-feature.sh F-003 .prizmkit/plans/feature-list.json
-```
 Notes:
-- `retry-feature.sh` runs exactly one feature session and exits. It **preserves prior session artifacts and checkpoint state** — reads `retry_count` and `resume_from_phase` from `status.json` so the AI session can resume from where it left off rather than starting from scratch.
-- `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — use this for a fresh start when checkpoint recovery is not desired.
+- `reset-feature.sh --clean --run` performs a full clean (deletes session history and artifacts) before retrying — this gives a fresh start.
 - Keep pipeline daemon mode for main run management (`launch-feature-daemon.sh`).
 ---
@@ -387,7 +375,7 @@ After pipeline completion, if features have `browser_interaction` fields and `pl
 | Pipeline already running | Show status, ask if user wants to stop and restart |
 | PID file stale (process dead) | `launch-feature-daemon.sh` auto-cleans, retry start |
 | Launch failed (process died immediately) | Show last 20 lines of log: `tail -20 .prizmkit/state/features/pipeline-daemon.log` |
-| Feature stuck/blocked | Use `retry-feature.sh <F-XXX>` to retry; use `reset-feature.sh <F-XXX> --clean --run` for fresh start |
+| Feature stuck/blocked | Use `reset-feature.sh <F-XXX> --clean --run` for a fresh retry |
 | All features blocked/failed | Show status, suggest daemon-safe recovery: `dev-pipeline/reset-feature.sh <F-XXX> --clean --run .prizmkit/plans/feature-list.json` |
 | `playwright-cli` not installed | Browser verification skipped (non-blocking). Suggest: `npm install -g @playwright/cli@latest && playwright-cli install --skills` |
 | Permission denied on script | Run `chmod +x dev-pipeline/launch-feature-daemon.sh dev-pipeline/run-feature.sh` |

package/bundled/skills/feature-planner/SKILL.md CHANGED

@@ -80,13 +80,13 @@ Do NOT use this skill when:
    If the script is not available, perform these manual validation checks:
    1. **ID sequence**: All feature IDs are sequential (F-001, F-002, F-003, ...)
    2. **No circular dependencies**: No feature depends (directly or transitively) on itself
-   3. **Description length**: Minimum 15 words per description (error), 30/50/80 recommended
+   3. **Description length**: Minimum 15 words per description (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail is always better
    4. **Dependency references**: All referenced features in dependencies exist in features array
    5. **Priority enums**: All priority values are exactly "critical", "high", "medium", or "low" (case-sensitive)
    6. **Status enum**: All status values are one of: pending, in_progress, completed, failed, skipped, split, auto_skipped
    7. **Acceptance criteria**: At least 1 criterion per feature, each is a concrete, measurable statement
-   8. **Browser interaction**: If present, has url, verify_steps array, and optional setup_command
-   9. **Complexity enum**: If present, is one of: low, medium, high
+   8. **Browser interaction**: If present, has verify_steps array (optional — AI auto-detects dev server, URL, port at runtime)
+   9. **Complexity enum**: If present, is one of: low, medium, high, critical
    10. **Model field**: If present, is a non-empty string
    11. **Critic field**: If present, is boolean; if true, critic_count should be 1 or 3
    12. **Root schema**: Has $schema='dev-pipeline-feature-list-v1', project_name, and non-empty features array
@@ -258,7 +258,11 @@ Key requirements:
 - English feature titles for stable slug generation
 - `critic` / `critic_count` defaults per Testing Defaults section
 - `browser_interaction` auto-generated for qualifying frontend features
-- descriptions: minimum 15 words (error), recommended 30/50/80 for low/medium/high complexity (warning)
+- descriptions: minimum 15 words (error), recommended minimum 30/50/80/100+ for low/medium/high/critical (warning). No upper limit — more detail prevents AI guessing
+- `estimated_complexity` determines pipeline execution tier:
+  - `low` / `medium` → **lite** (single agent, no subagents)
+  - `high` → **standard** (orchestrator + dev + reviewer, 3 agents)
+  - `critical` → **full** (full team + critic agents, 5 agents). Use for: architectural changes touching 10+ files, cross-module refactoring with API surface changes, features requiring multi-critic voting
 Run the validation script after generation:
 ```bash

package/bundled/skills/feature-planner/assets/planning-guide.md CHANGED

@@ -12,13 +12,16 @@ Feature descriptions are the **primary input** for autonomous pipeline sessions.
 ### Minimum Word Counts
-| Complexity | Minimum Words | Warning Threshold |
-|------------|---------------|-------------------|
-| low        | 15            | 30                |
-| medium     | 15            | 50                |
-| high       | 15            | 80                |
+| Complexity | Hard Minimum (error) | Recommended Minimum (warning below) |
+|------------|---------------------|-------------------------------------|
+| low        | 15                  | 30+                                 |
+| medium     | 15                  | 50+                                 |
+| high       | 15                  | 80+                                 |
+| critical   | 15                  | 100+                                |
-Below 15 words is a validation error. Below the threshold triggers a warning.
+Below 15 words is a validation error. Below the recommended minimum triggers a warning.
+**There is NO upper limit** — the more detail the better. Rich descriptions prevent the AI from guessing, producing higher quality code. Always aim to describe the feature as thoroughly as possible: what to build, how it should behave, what data it touches, and what edge cases to handle.
 ### What to Include
@@ -113,11 +116,12 @@ Then [expected outcome]
 ## Complexity Estimation Guide
-| Complexity | Characteristics | Typical Scope |
-|------------|----------------|---------------|
-| low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages |
-| medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages |
-| high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions |
+| Complexity | Characteristics | Typical Scope | Pipeline Tier |
+|------------|----------------|---------------|---------------|
+| low | Single module, straightforward CRUD, minimal UI | 1-2 API endpoints, 1-2 pages | lite (1 agent) |
+| medium | Multiple modules, business logic, moderate UI | 3-5 API endpoints, 2-4 pages | lite (1 agent) |
+| high | Cross-cutting concerns, complex state, advanced UI | 5+ API endpoints, complex interactions | standard (3 agents) |
+| critical | Architectural changes, 10+ files, multi-module API surface changes | System-wide refactoring, new infrastructure + app logic | full (5 agents + critic) |
 ### Complexity Red Flags
@@ -134,6 +138,7 @@ Consider splitting a feature if it exhibits any of the following:
 - If a feature is marked as "low" complexity, it should not have more than 5 acceptance criteria.
 - If a feature is marked as "high" complexity, it should have a clear justification (e.g., "involves payment processing with webhook handling and idempotency").
+- Use "critical" complexity only for features requiring architectural changes that touch 10+ files, involve cross-module API surface changes, or need multi-critic voting for safety.
 - When in doubt, estimate higher -- it is better to over-allocate than to under-allocate.
 ---
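
Annotation (not part of the package diff): taken together, the two tables changed in this file map each complexity level to a recommended description length and a pipeline tier. The sketch below restates that mapping in Python as an illustration only. `PIPELINE_TIER`, `RECOMMENDED_MIN_WORDS`, and `check_description` are hypothetical names, and the fallback of 50 words for an unrecognised complexity mirrors the `.get(complexity, 50)` call visible in the `validate-and-generate.py` hunk of this diff.

```python
HARD_MIN_WORDS = 15  # below this, validation reports an error

# Recommended minimums per complexity (a warning is raised below these).
RECOMMENDED_MIN_WORDS = {"low": 30, "medium": 50, "high": 80, "critical": 100}

# Complexity -> (pipeline tier, agent count), as listed in the guide.
PIPELINE_TIER = {
    "low": ("lite", 1),
    "medium": ("lite", 1),
    "high": ("standard", 3),
    "critical": ("full", 5),
}


def check_description(description: str, complexity: str = "medium") -> str:
    """Return 'error', 'warning', or 'ok' for a feature description."""
    word_count = len(description.split())
    if word_count < HARD_MIN_WORDS:
        return "error"
    if word_count < RECOMMENDED_MIN_WORDS.get(complexity, 50):
        return "warning"
    return "ok"  # there is no upper limit on description length
```

For example, a 40-word description is fine at `low` complexity but draws a warning at `medium`, and a feature marked `critical` runs on the full tier with five agents.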

package/bundled/skills/feature-planner/references/browser-interaction.md CHANGED

@@ -11,23 +11,24 @@ For each qualifying feature, generate the `browser_interaction` object:
 ```json
 {
   "browser_interaction": {
-    "url": "http://localhost:3000/login",
-    "setup_command": "npm run dev",
     "verify_steps": [
       "Verify login form renders with email and password fields",
       "Verify valid credentials redirect to dashboard",
       "Verify invalid password shows error message"
-    ],
-    "screenshot": true
+    ]
   }
 }
 ```
 ## Field Rules
-- `url` is required — the page URL to verify
-- `setup_command` is optional — command to start dev server (omit if already running)
-- `verify_steps` are **verification goals**, not specific playwright-cli commands. Describe WHAT to verify, not HOW to verify it. The pipeline AI will read the actual code, use `playwright-cli snapshot` to discover real element refs, and decide the concrete click/fill/assert operations itself. This works better than prescribing steps at planning time because: (1) element refs don't exist yet, (2) UI structure may change during implementation, (3) the AI has full context of the actual code when it runs verification.
+- `verify_steps` are **verification goals**, not specific playwright-cli commands. Describe WHAT to verify, not HOW to verify it. The pipeline AI will:
+  1. Auto-detect the dev server start command from project config (`package.json`, `Makefile`, etc.)
+  2. Start the server and discover the URL/port at runtime
+  3. Use `playwright-cli snapshot` to discover real element refs
+  4. Decide the concrete click/fill/assert operations itself
+  This works better than prescribing URLs/commands at planning time because: (1) ports may differ across environments, (2) element refs don't exist yet, (3) UI structure may change during implementation, (4) the AI has full context of the actual code when it runs verification.
   - **Good**: `"Verify login form accepts valid credentials and redirects to dashboard"`
   - **Bad**: `"click <ref> — click login button"` (guesses at refs that don't exist yet)
-- `screenshot` defaults to `true` — capture final state for human review
+- Do NOT specify `url`, `setup_command`, or `port` — the AI detects these at runtime from the actual project configuration
+- An empty `browser_interaction: {}` object (no verify_steps) is valid — the AI will explore the app and verify the feature works as expected
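
Annotation (not part of the package diff): the revised field rules could be checked along the following lines. This is a hedged sketch, not the validator shipped with the package. `check_browser_interaction` and `DISCOURAGED_KEYS` are hypothetical names, and reporting `url`, `setup_command`, or `port` as problems is an assumption drawn from the "Do NOT specify" rule; the diff does not show how the real script treats them.

```python
from typing import Any, List

# Keys the updated guide says should be left for runtime detection.
DISCOURAGED_KEYS = ("url", "setup_command", "port")


def check_browser_interaction(obj: Any) -> List[str]:
    """Return a list of problems; an empty list means the object is acceptable."""
    if not isinstance(obj, dict):
        return ["browser_interaction must be an object"]
    problems = []
    for key in DISCOURAGED_KEYS:
        if key in obj:
            # ASSUMPTION: flagged here; the real validator may ignore it.
            problems.append("'{}' should be omitted (detected at runtime)".format(key))
    steps = obj.get("verify_steps")
    if steps is not None:  # verify_steps is optional; {} is valid
        if not isinstance(steps, list) or not all(
            isinstance(step, str) and step.strip() for step in steps
        ):
            problems.append("verify_steps must be an array of non-empty strings")
    return problems
```

Under these assumptions an empty object passes, a goal-style `verify_steps` list passes, and the 1.1.10-style example with `url` and `setup_command` is flagged twice.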

package/bundled/skills/feature-planner/references/completeness-review.md CHANGED

@@ -7,7 +7,7 @@ Before generating `.prizmkit/plans/feature-list.json`, review the full feature s
 For each feature, evaluate against the word-count thresholds in `planning-guide.md`:
 - Does the description cover: what to build, key behaviors, integration points, data model (if applicable), error/edge cases?
 - Is the description specific enough for an AI coding session to implement without guessing?
-- Flag any feature below the recommended word count for its complexity level (30/50/80 words for low/medium/high).
+- Flag any feature below the recommended minimum word count for its complexity level (30+/50+/80+/100+ words for low/medium/high/critical). There is no upper limit — more detail is always better.
 **Implementation clarity check** — Every feature description will be consumed by an autonomous AI session. Verify each description specifies:
 1. Concrete deliverables (files to create, endpoints to build, components to implement, models to define)

package/bundled/skills/feature-planner/references/error-recovery.md CHANGED

@@ -26,7 +26,7 @@ Group errors by type and apply targeted fixes:
 | **Dependency errors** | Circular dependency, undefined target features | "Show cycle chain (e.g., `F-003 → F-005 → F-003`), suggest break point" | No |
 | **Missing fields** | Feature missing required keys (title, description, AC) | "List each feature + missing keys, guide patch" | Partial |
 | **Insufficient AC** | Feature has <2 acceptance criteria | "Show feature, suggest AC examples" | No |
-| **Invalid values** | complexity not in [low/medium/high], status not pending | "Show field, valid values" | Yes |
+| **Invalid values** | complexity not in [low/medium/high/critical], status not pending | "Show field, valid values" | Yes |
 ### Execution

package/bundled/skills/feature-planner/references/incremental-feature-planning.md CHANGED

@@ -73,7 +73,7 @@ For each new feature:
 - keep title in English
 - **write rich descriptions** (see `planning-guide.md` §4):
   - minimum 15 words (validation error below this)
-  - recommended: 30+ words (low), 50+ words (medium), 80+ words (high complexity)
+  - recommended minimum: 30+ (low), 50+ (medium), 80+ (high), 100+ (critical) — no upper limit, more detail is always better
   - include: what to build, key behaviors, integration points, data model, error/edge cases
 ### Step 4: Rebalance Priority

package/bundled/skills/feature-planner/scripts/validate-and-generate.py CHANGED

@@ -33,7 +33,7 @@ from datetime import datetime, timezone
 SCHEMA_VERSION = "dev-pipeline-feature-list-v1"
 VALID_STATUSES = {"pending", "in_progress", "completed", "failed", "skipped", "split", "auto_skipped"}
-VALID_COMPLEXITIES = {"low", "medium", "high"}
+VALID_COMPLEXITIES = {"low", "medium", "high", "critical"}
 VALID_PRIORITIES = {"critical", "high", "medium", "low"}
 VALID_GRANULARITIES = {"feature", "sub_feature", "auto"}
 VALID_PLANNING_MODES = {"new", "incremental"}
@@ -206,7 +206,7 @@ def validate_feature_list(data, planning_mode="new"):
     seen_ids = set()
     priorities = []
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub_features = 0
     for idx, feat in enumerate(features):
@@ -244,7 +244,9 @@ def validate_feature_list(data, planning_mode="new"):
         if isinstance(desc, str) and desc.strip():
             word_count = len(desc.split())
             complexity = feat.get("estimated_complexity", "medium")
-            min_words = {"low": 30, "medium": 50, "high": 80}.get(complexity, 50)
+            min_words = {
+                "low": 30, "medium": 50, "high": 80, "critical": 100,
+            }.get(complexity, 50)
             if word_count < 15:
                 errors.append(
                     "{}: description too short ({} words, minimum 15). "
@@ -580,7 +582,7 @@ def generate_summary_markdown(data):
     lines.append("")
     # Statistics
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub = 0
     for feat in features:
         c = feat.get("estimated_complexity")
@@ -596,8 +598,9 @@ def generate_summary_markdown(data):
     lines.append("- Total features: {}".format(len(features)))
     if total_sub > 0:
         lines.append("- Total sub-features: {}".format(total_sub))
-    lines.append("- Complexity: {} low, {} medium, {} high".format(
-        complexity_dist["low"], complexity_dist["medium"], complexity_dist["high"]
+    lines.append("- Complexity: {} low, {} medium, {} high, {} critical".format(
+        complexity_dist["low"], complexity_dist["medium"],
+        complexity_dist["high"], complexity_dist["critical"]
     ))
     lines.append("- Max dependency depth: {}".format(max_depth))
@@ -608,7 +611,7 @@ def generate_summary_json(data):
     """Generate a JSON summary of the feature list."""
     features = data.get("features", [])
-    complexity_dist = {"low": 0, "medium": 0, "high": 0}
+    complexity_dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
     total_sub = 0
     for feat in features:
         c = feat.get("estimated_complexity")
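
The hunks above add `critical` as a fourth complexity level in three places: the set of valid values, the recommended minimum word count (100), and the per-level distribution counters. The following is a minimal standalone sketch of that behavior, not the package's actual code. The helper names `check_description` and `complexity_distribution` are hypothetical, and the `"warning"` result for descriptions between the hard minimum and the recommended minimum is an assumption, since the diff only shows the 15-word error branch.

```python
# Thresholds copied from the 1.1.12 side of the diff.
RECOMMENDED_MIN_WORDS = {"low": 30, "medium": 50, "high": 80, "critical": 100}
HARD_MIN_WORDS = 15  # the diff shows an error being appended below this count


def check_description(feature):
    """Classify one feature's description length (hypothetical helper).

    Returns "error", "warning", or "ok". The "warning" level and the
    handling of a missing description are assumptions for illustration.
    """
    desc = feature.get("description")
    if not isinstance(desc, str) or not desc.strip():
        return "error"  # assumption: a missing description is an error
    word_count = len(desc.split())
    complexity = feature.get("estimated_complexity", "medium")
    # As in the diff, unknown complexity values fall back to 50 words.
    min_words = RECOMMENDED_MIN_WORDS.get(complexity, 50)
    if word_count < HARD_MIN_WORDS:
        return "error"
    if word_count < min_words:
        return "warning"
    return "ok"


def complexity_distribution(features):
    """Count features per complexity level (hypothetical helper).

    Assumption: values outside the four known levels are ignored.
    """
    dist = {"low": 0, "medium": 0, "high": 0, "critical": 0}
    for feat in features:
        c = feat.get("estimated_complexity")
        if c in dist:
            dist[c] += 1
    return dist


if __name__ == "__main__":
    feature = {"description": "word " * 90, "estimated_complexity": "critical"}
    print(check_description(feature))
    print(complexity_distribution([feature]))
```

Under these assumptions, a 90-word description passes the recommended minimum for a `high` feature but falls short of it for a `critical` one, which is the practical effect of the new 100-word threshold.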

package/bundled/skills/recovery-workflow/SKILL.md CHANGED

@@ -16,7 +16,7 @@ User says:
 - "Don't want to restart from scratch"
 **Do NOT use when:**
-- Pipeline interrupted → use `retry-feature.sh` / `retry-bugfix.sh`
+- Pipeline interrupted → use `reset-feature.sh --clean --run` / `reset-bug.sh --clean --run` for a fresh retry
 - User wants a clean restart → use the original workflow skill directly (`/feature-workflow`, `/bug-fix-workflow`, `/refactor-workflow`)
 - Nothing was ever started → use the original workflow skill
@@ -257,8 +257,8 @@ Recovery complete.
 | `bug-fix-workflow` | **Recovery target** — this skill can resume interrupted bug-fix-workflow sessions |
 | `refactor-workflow` | **Recovery target** — this skill can resume interrupted refactor-workflow sessions |
 | `feature-pipeline-launcher` | **Called in Phase 2.2** — launches or checks pipeline status for feature recovery |
-| `retry-feature.sh` | **Alternative** — full clean retry for pipeline failures; this skill is the smart interactive alternative |
-| `retry-bugfix.sh` | **Alternative** — full clean retry for bugfix pipeline failures |
+| `reset-feature.sh --clean --run` | **Alternative** — full clean retry for pipeline failures; this skill is the smart interactive alternative |
+| `reset-bug.sh --clean --run` | **Alternative** — full clean retry for bugfix pipeline failures |
 | `/prizmkit-code-review` | **Called in Phase 2.1** — reviews recovered bug-fix code |
 | `/prizmkit-committer` | **Called in Phase 2.1** — commits the recovered result |