prizmkit 1.1.69 → 1.1.70
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bundled/VERSION.json +3 -3
- package/bundled/dev-pipeline/scripts/generate-bootstrap-prompt.py +11 -12
- package/bundled/dev-pipeline/templates/agent-prompts/dev-implement.md +36 -22
- package/bundled/dev-pipeline/templates/agent-prompts/reviewer-review.md +1 -1
- package/bundled/dev-pipeline/templates/bugfix-bootstrap-prompt.md +24 -21
- package/bundled/dev-pipeline/templates/refactor-bootstrap-prompt.md +13 -26
- package/bundled/dev-pipeline/templates/sections/ac-verification-checklist.md +4 -10
- package/bundled/dev-pipeline/templates/sections/context-budget-rules.md +1 -0
- package/bundled/dev-pipeline/templates/sections/feature-context.md +16 -11
- package/bundled/dev-pipeline/templates/sections/phase-browser-verification-auto.md +17 -26
- package/bundled/dev-pipeline/templates/sections/phase-browser-verification-opencli.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-browser-verification.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-context-snapshot-base.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-implement-agent.md +2 -9
- package/bundled/dev-pipeline/templates/sections/phase-implement-full.md +2 -9
- package/bundled/dev-pipeline/templates/sections/phase-implement-lite.md +8 -17
- package/bundled/dev-pipeline/templates/sections/phase-plan-lite.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-review-full.md +1 -1
- package/bundled/dev-pipeline/templates/sections/phase-specify-plan-full.md +1 -1
- package/bundled/dev-pipeline/templates/sections/task-contract.md +34 -0
- package/bundled/dev-pipeline/templates/sections/test-failure-recovery-agent.md +27 -46
- package/bundled/dev-pipeline/templates/sections/test-failure-recovery-lite.md +27 -37
- package/bundled/dev-pipeline/tests/test_generate_bootstrap_prompt.py +13 -0
- package/bundled/dev-pipeline-windows/scripts/generate-bootstrap-prompt.py +11 -12
- package/bundled/dev-pipeline-windows/templates/agent-prompts/dev-implement.md +36 -22
- package/bundled/dev-pipeline-windows/templates/agent-prompts/reviewer-review.md +1 -1
- package/bundled/dev-pipeline-windows/templates/bugfix-bootstrap-prompt.md +24 -21
- package/bundled/dev-pipeline-windows/templates/refactor-bootstrap-prompt.md +13 -26
- package/bundled/dev-pipeline-windows/templates/sections/ac-verification-checklist.md +4 -10
- package/bundled/dev-pipeline-windows/templates/sections/context-budget-rules.md +1 -0
- package/bundled/dev-pipeline-windows/templates/sections/feature-context.md +16 -11
- package/bundled/dev-pipeline-windows/templates/sections/phase-browser-verification-auto.md +22 -10
- package/bundled/dev-pipeline-windows/templates/sections/phase-context-snapshot-base.md +1 -1
- package/bundled/dev-pipeline-windows/templates/sections/phase-implement-agent.md +2 -9
- package/bundled/dev-pipeline-windows/templates/sections/phase-implement-full.md +2 -9
- package/bundled/dev-pipeline-windows/templates/sections/phase-implement-lite.md +8 -19
- package/bundled/dev-pipeline-windows/templates/sections/phase-plan-lite.md +1 -1
- package/bundled/dev-pipeline-windows/templates/sections/phase-review-full.md +1 -1
- package/bundled/dev-pipeline-windows/templates/sections/phase-specify-plan-full.md +1 -1
- package/bundled/dev-pipeline-windows/templates/sections/task-contract.md +34 -0
- package/bundled/dev-pipeline-windows/templates/sections/test-failure-recovery-agent.md +27 -46
- package/bundled/dev-pipeline-windows/templates/sections/test-failure-recovery-lite.md +27 -37
- package/bundled/skills/_metadata.json +1 -1
- package/package.json +1 -1

@@ -1,25 +1,30 @@
 <feature-context>
-
+## Source Semantics
 
-
+Use this material according to the following priority:
 
-
+1. **Task Contract** — defines current scope and Verification Gates.
+2. **User Raw Context** — authoritative constraints, but not automatic scope expansion.
+3. **Completed Dependencies** — existing behavior and interfaces; do not re-implement them.
+4. **Project Brief** — product background and architecture alignment.
+5. **App Global Context** — stack, runtime, and testing conventions.
+6. **Project Conventions** — repository-specific rules.
 
-###
+### User Raw Context
 
-{{
+{{USER_CONTEXT}}
 
-###
+### Completed Dependencies
 
->
+> Use this section to understand available interfaces and avoid duplicating completed work.
 
-{{
+{{COMPLETED_DEPENDENCIES}}
 
-###
+### Project Brief
 
->
+> Use this section for alignment only. Do not treat unrelated backlog items as current feature scope.
 
-{{
+{{PROJECT_BRIEF}}
 
 ### App Global Context
 
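The new feature-context section is assembled from `{{NAME}}` placeholders such as `{{USER_CONTEXT}}`, `{{COMPLETED_DEPENDENCIES}}`, and `{{PROJECT_BRIEF}}`. This diff does not show how `generate-bootstrap-prompt.py` fills them, so the following is only a minimal sketch that assumes plain single-pass string substitution. `render_template` and `PLACEHOLDER` are hypothetical names. Leaving unknown placeholders untouched is a choice made for this sketch, not documented package behavior.

```python
from __future__ import annotations

import re

# Matches {{NAME}} placeholders built from uppercase letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Fill {{NAME}} placeholders from `values` in a single pass.

    Hypothetical helper for illustration; not taken from
    generate-bootstrap-prompt.py. Unknown placeholders are left as-is, and
    substituted text is never re-scanned, so user context that happens to
    contain "{{...}}" cannot trigger further expansion.
    """

    def substitute(match: re.Match[str]) -> str:
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    section = (
        "### User Raw Context\n\n{{USER_CONTEXT}}\n\n"
        "### Project Brief\n\n{{PROJECT_BRIEF}}\n"
    )
    # Only USER_CONTEXT is supplied, so {{PROJECT_BRIEF}} stays in the output.
    print(render_template(section, {"USER_CONTEXT": "Target Node.js 20."}))
```

Single-pass `re.sub` matters here because raw user context is pasted verbatim. A naive loop of `str.replace` calls could expand placeholder-like text inside an earlier substitution.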

@@ -1,8 +1,9 @@
 ### Browser Verification — MANDATORY
 
-You MUST
+You MUST attempt this phase. Mark it as skipped only when no usable browser tool can be found, installed, or started. Use native PowerShell commands for setup, server lifecycle, and cleanup.
+
+**Step 0 — Tool Selection (BLOCKING — decide before any browser action)**:
 
-**Tool readiness**:
 ```powershell
 $playwright = Get-Command playwright-cli -ErrorAction SilentlyContinue
 $opencli = Get-Command opencli -ErrorAction SilentlyContinue
@@ -10,16 +11,25 @@ if ($playwright) { playwright-cli --version } else { "PLAYWRIGHT_CLI:NOT_INSTALL
 if ($opencli) { opencli --version } else { "OPENCLI:NOT_INSTALLED" }
 ```
 
-
+Use this single decision tree:
+1. If `playwright-cli` is available, use it by default for local dev-server verification.
+2. Else if `opencli` is available, run `opencli doctor`; use `opencli` only if doctor passes.
+3. Else install `playwright-cli` as the default:
+```powershell
+npm install -g @playwright/cli@latest
+playwright-cli --version
+```
+4. If no browser tool is usable after these steps, append `## Browser Verification: SKIPPED — no usable browser tool` to `context-snapshot.md`, then continue to reporting/checkpoint.
+
+Use `opencli` instead of `playwright-cli` only when the scenario requires an existing Chrome login/session or third-party integration cookies.
+
+Record your choice:
 ```powershell
-
-
+$BROWSER_TOOL = "playwright-cli" # or "opencli" or "skipped"
+"Selected browser tool: $BROWSER_TOOL"
 ```
 
-
-- Use `playwright-cli` for isolated local dev-server verification.
-- Use `opencli` when the scenario needs an existing Chrome login/session.
-- If opencli is selected, verify connectivity with `opencli doctor` before browser actions.
+If `$BROWSER_TOOL -eq "skipped"`, do NOT start a dev server and do NOT run browser commands. Go directly to result reporting and the checkpoint update.
 
 **Dev server setup**:
 ```powershell
@@ -56,6 +66,8 @@ Use the selected browser tool to inspect current page state, perform actions, an
 - `playwright-cli open http://localhost:$DEV_PORT`, `playwright-cli snapshot`, `playwright-cli screenshot`, `playwright-cli close`
 - `opencli browser open http://localhost:$DEV_PORT`, `opencli browser state`, `opencli browser screenshot`, `opencli browser close`
 
+Base concrete actions on the Task Contract's Verification Gates.
+
 **Cleanup**:
 ```powershell
 if ($DEV_SERVER_PID) { Stop-Process -Id $DEV_SERVER_PID -Force -ErrorAction SilentlyContinue }
@@ -64,7 +76,7 @@ Get-NetTCPConnection -LocalPort ([int]$DEV_PORT) -ErrorAction SilentlyContinue |
 ForEach-Object { Stop-Process -Id $_ -Force -ErrorAction SilentlyContinue }
 ```
 
-Append results to `context-snapshot.md` with tool, URL, commands used, screenshot path, PASS/FAIL reason, and cleanup status.
+Append results to `context-snapshot.md` with tool, URL, commands used, screenshot path, PASS/FAIL/SKIPPED reason, and cleanup status.
 
 **Checkpoint update**:
 ```powershell
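The Step 0 decision tree above can be written as a pure function, which makes the fall-through order easy to check. This is a hedged Python sketch, not code from the package. `ToolProbe`, `select_browser_tool`, and `on_path` are hypothetical names, and the probe values are assumed to be collected beforehand: PATH lookups, plus the exit status of `opencli doctor` and of the `npm install` step. The sketch also assumes two readings of the template. First, a failed `opencli doctor` falls through to step 3 (install `playwright-cli`). Second, the Chrome-session override applies only when `opencli` is actually usable.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass


def on_path(command: str) -> bool:
    """True if `command` resolves to an executable on PATH (similar to Get-Command)."""
    return shutil.which(command) is not None


@dataclass(frozen=True)
class ToolProbe:
    """Probe results gathered by the caller before choosing a tool."""

    playwright_available: bool   # playwright-cli found on PATH
    opencli_available: bool      # opencli found on PATH
    opencli_doctor_passed: bool  # `opencli doctor` succeeded
    playwright_install_ok: bool  # global install of playwright-cli succeeded
    needs_chrome_session: bool   # scenario needs an existing Chrome login/session


def select_browser_tool(p: ToolProbe) -> str:
    """Return "playwright-cli", "opencli", or "skipped"."""
    opencli_usable = p.opencli_available and p.opencli_doctor_passed

    # Override: use opencli only when an existing Chrome session or cookies are required.
    if p.needs_chrome_session and opencli_usable:
        return "opencli"
    # 1. Default to playwright-cli when it is already installed.
    if p.playwright_available:
        return "playwright-cli"
    # 2. Fall back to opencli, but only if `opencli doctor` passed.
    if opencli_usable:
        return "opencli"
    # 3. Assumption: a failed doctor also falls through to installing playwright-cli.
    if p.playwright_install_ok:
        return "playwright-cli"
    # 4. Nothing usable: record SKIPPED; no dev server and no browser commands.
    return "skipped"
```

Keeping the probing separate from the decision means the tree can be unit-tested without any browser tooling installed.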

@@ -15,7 +15,7 @@ If MISSING — build it now:
 Identify the top-level source directories from the results.
 3. Scan the detected source directories for files related to this feature; read each one
 4. Write `.prizmkit/specs/{{FEATURE_SLUG}}/context-snapshot.md`:
-- **Section 1 —
+- **Section 1 — Task Contract**: Objective, scope rule, non-scope rule, and Verification Gates from the Task Contract above
 - **Section 2 — Project Structure**: run the following to get a visual directory tree, then paste output:
 ```powershell
 Get-ChildItem -Path . -Directory -Recurse -Depth 2 -ErrorAction SilentlyContinue |

@@ -1,13 +1,6 @@
 ### Implement — Dev Subagent
 
-**
-
-**Dependency version gate (BLOCKING — pass to Dev agent)**: Before running ANY package install command (`npm install`, `pip install`, `cargo build`, `go mod tidy`, `bundle install`, etc.):
-1. Every version number in the dependency manifest MUST be verified against the real registry (see Context Budget Rules §9)
-2. If a scaffold tool generated a `package.json` / `requirements.txt` / etc., verify the versions it wrote too — scaffold tools can emit outdated versions
-3. Do NOT proceed with install until all versions are confirmed real. Violation = wasted timeout cycles that can crash the session
-
-**Scaffold file rule (pass to Dev agent)**: After running any init/scaffold command, record generated files in context-snapshot.md under `### Scaffold Files (do not re-read)`. Never re-read these files — their content is standard boilerplate (see Context Budget Rules §8). When spawning subagents, explicitly list scaffold files so they skip reading them.
+**Protocol handoff to Dev**: The Dev prompt already carries the required subset of Context Budget Rules, the Test Failure Recovery Protocol, and Task Contract / Verification Gate constraints. Do not duplicate those rules here; verify Dev output against the gates below.
 
 **Spawn Agent**:
 | Parameter | Value |
@@ -24,7 +17,7 @@
 **Prompt**:
 > {{AGENT_PROMPT_DEV_IMPLEMENT}}
 
-Wait for Dev to return.
+Wait for Dev to return. Implementation may proceed only when tasks are complete and the Test Failure Recovery Protocol's Success Rule is satisfied.
 
 **No silent artifact polling**:
 - Do NOT run a long no-output loop that only waits for `## Implementation Log` or any other file marker.

@@ -1,13 +1,6 @@
 ### Implement — Dev Agent
 
-**
-
-**Dependency version gate (BLOCKING — pass to Dev agent)**: Before running ANY package install command (`npm install`, `pip install`, `cargo build`, `go mod tidy`, `bundle install`, etc.):
-1. Every version number in the dependency manifest MUST be verified against the real registry (see Context Budget Rules §9)
-2. If a scaffold tool generated a `package.json` / `requirements.txt` / etc., verify the versions it wrote too — scaffold tools can emit outdated versions
-3. Do NOT proceed with install until all versions are confirmed real. Violation = wasted timeout cycles that can crash the session
-
-**Scaffold file rule (pass to Dev agent)**: After running any init/scaffold command, record generated files in context-snapshot.md under `### Scaffold Files (do not re-read)`. Never re-read these files — their content is standard boilerplate (see Context Budget Rules §8). When spawning subagents, explicitly list scaffold files so they skip reading them.
+**Protocol handoff to Dev**: The Dev prompt already carries the required subset of Context Budget Rules, the Test Failure Recovery Protocol, and Task Contract / Verification Gate constraints. Do not duplicate those rules here; verify Dev output against the gates below.
 
 Before spawning Dev, check plan.md Tasks section:
 ```powershell
@@ -50,7 +43,7 @@ Wait for Dev to return. **If Dev times out before all tasks are `[x]`**:
 > {{AGENT_PROMPT_DEV_RESUME}}
 3. Max 2 recovery retries. After 2 failures, orchestrator implements remaining tasks directly.
 
-
+Implementation phase is complete only when all tasks are `[x]` and the Test Failure Recovery Protocol's Success Rule is satisfied.
 
 
 **Checkpoint update**: Run the update script to set step `prizmkit-implement` to `"completed"`:
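The implement phase now has two exit conditions: all plan.md tasks are `[x]` and the Success Rule holds. It also has a retry cap of 2 recovery attempts before the orchestrator takes over. A minimal Python sketch of that bookkeeping follows, assuming plan.md uses markdown checkbox lines under a heading whose text is exactly "Tasks". `task_status`, `implementation_complete`, `next_action`, and the action strings are hypothetical names for illustration only.

```python
from __future__ import annotations

import re

MAX_RECOVERY_RETRIES = 2  # "Max 2 recovery retries" from the template

HEADING = re.compile(r"^(#+)\s+(.*?)\s*$")
TASK_LINE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*?)\s*$")


def task_status(plan_md: str) -> tuple[list[str], list[str]]:
    """Return (done, pending) task titles from plan.md's Tasks section.

    Assumes the section starts at a heading whose text is "Tasks" and ends
    at the next heading of any level.
    """
    done: list[str] = []
    pending: list[str] = []
    in_tasks = False
    for line in plan_md.splitlines():
        heading = HEADING.match(line)
        if heading:
            if in_tasks:
                break
            in_tasks = heading.group(2).lower() == "tasks"
            continue
        task = TASK_LINE.match(line) if in_tasks else None
        if task:
            target = done if task.group(1).lower() == "x" else pending
            target.append(task.group(2))
    return done, pending


def implementation_complete(plan_md: str, success_rule_satisfied: bool) -> bool:
    """All tasks are [x] AND the Test Failure Recovery Success Rule holds."""
    done, pending = task_status(plan_md)
    # Design choice: an empty Tasks section is treated as not complete.
    return bool(done) and not pending and success_rule_satisfied


def next_action(pending_count: int, failed_retries: int) -> str:
    """Decide what happens after Dev returns or times out."""
    if pending_count == 0:
        return "verify"                  # check the Success Rule next
    if failed_retries < MAX_RECOVERY_RETRIES:
        return "resume-dev"              # spawn Dev again with the resume prompt
    return "orchestrator-implements"     # after 2 failed recovery retries
```

Treating an empty Tasks section as incomplete is deliberately conservative: a plan that lost its checkboxes should not count as finished work.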

@@ -1,20 +1,9 @@
 ### Implement + Test
 
-**
-
-
-
-Add-Content -Path .gitignore -Value '/binary-name'
-}
-```
-Never commit compiled binaries, build output, or generated artifacts.
-
-**Dependency version gate (BLOCKING)**: Before running ANY package install command (`npm install`, `pip install`, `cargo build`, `go mod tidy`, `bundle install`, etc.):
-1. Every version number in your dependency manifest MUST be verified against the real registry (see Context Budget Rules §9)
-2. If you used a scaffold tool that generated a `package.json` / `requirements.txt` / etc., verify the versions it wrote too — scaffold tools can emit outdated versions
-3. Do NOT proceed with install until all versions are confirmed real. Violation = wasted timeout cycles
-
-**Scaffold file rule**: After running any init/scaffold command, record generated files in context-snapshot.md under `### Scaffold Files (do not re-read)`. Never re-read these files — their content is standard boilerplate (see Context Budget Rules §8).
+**Protocol references**:
+- Follow Context Budget Rules §8 for scaffold/generated files.
+- Follow Context Budget Rules §9 before package install/build commands that resolve dependencies.
+- Follow Context Budget Rules §10 after build/compile commands.
 
 **3a.** Detect test commands and record baseline:
 
@@ -36,11 +25,11 @@ You know this project's tech stack. Identify ALL test commands that apply (e.g.,
 
 **3c.** After implement completes, verify:
 1. All tasks in plan.md are `[x]`
-2. Run the full test suite to ensure
-3. Verify each
-4. If any
+2. Run the full test suite to ensure no new regressions remain
+3. Verify each Verification Gate from the Task Contract — check mentally, do NOT re-read files you already wrote
+4. If any gate is unmet or blocked, follow the Test Failure Recovery Protocol
 
-**CP-2**:
+**CP-2**: Implementation may proceed only when all tasks are `[x]` and the Test Failure Recovery Protocol's Success Rule is satisfied. Blocked gates must be documented in `failure-log.md` and are not success.
 
 
 **Checkpoint update**: Run the update script to set step `prizmkit-implement` to `"completed"`:

@@ -5,7 +5,7 @@ Get-ChildItem .prizmkit/specs/{{FEATURE_SLUG}}/ -ErrorAction SilentlyContinue
 ```
 
 If plan.md missing, run `/prizmkit-plan` with `artifact_dir=.prizmkit/specs/{{FEATURE_SLUG}}/`:
-- Pass the
+- Pass the Objective and Verification Gates from the Task Contract as input
 - The plan.md should include: key components, data flow, files to create/modify, and a Tasks section with `[ ]` checkboxes (each task = one implementable unit). Keep under 80 lines.
 - Resolve any `[NEEDS CLARIFICATION]` markers using the feature description — do NOT pause for interactive input.
 

@@ -21,7 +21,7 @@ Read `review-report.md` and check the Verdict:
 
 Run the full test suite: `({{TEST_CMD}}) 2>&1 | Tee-Object (Join-Path $env:TEMP "review-test-out.txt") | Select-Object -Last 20`
 
-**CP-3**: Review complete,
+**CP-3**: Review complete, report written, and the Test Failure Recovery Protocol's Success Rule is satisfied.
 
 
 **Checkpoint update**: Run the update script to set step `prizmkit-code-review` to `"completed"`:

@@ -21,7 +21,7 @@ Get-ChildItem .prizmkit/specs/{{FEATURE_SLUG}}/ -ErrorAction SilentlyContinue
 Identify the top-level source directories from the results.
 3. Scan the detected source directories for files related to this feature; read each one
 4. Write `.prizmkit/specs/{{FEATURE_SLUG}}/context-snapshot.md`:
-- **Section 1 — Task
+- **Section 1 — Task Contract**: Objective, scope rule, non-scope rule, and Verification Gates from the Task Contract above
 - **Section 2 — Project Structure**: run the following to get a visual directory tree, then paste output:
 ```powershell
 Get-ChildItem -Path . -Directory -Recurse -Depth 2 -ErrorAction SilentlyContinue |

@@ -0,0 +1,34 @@
+## Task Contract
+
+This section defines the only work that belongs to this session.
+
+### Objective
+
+Implement {{FEATURE_ID}}: "{{FEATURE_TITLE}}".
+
+{{FEATURE_DESCRIPTION}}
+
+### Scope Rule
+
+Current scope is limited to the intersection of:
+
+1. The Objective above
+2. The Verification Gates below
+3. Dependencies required to complete those gates
+
+Raw user context, project brief, and completed dependency notes are authoritative context, but they do not expand scope by themselves.
+
+### Non-Scope Rule
+
+Do NOT implement unrelated backlog items, already completed features, or adjacent modules unless they are required by the Objective or a Verification Gate.
+
+### Verification Gates
+
+These gates are generated from `feature.acceptance_criteria` and are the only acceptance requirements for this session.
+
+{{AC_CHECKLIST}}
+
+Gate rule:
+- `[x]` means verified with implementation or test evidence.
+- Any remaining `[ ]` means the feature is incomplete.
+- If a gate is blocked, document the reason in `failure-log.md`; blocked is not success.
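The Gate rule distinguishes three outcomes: every gate `[x]`, some gate still `[ ]`, or an unchecked gate that is blocked. Blocked must never be reported as success. A hedged Python sketch follows, assuming `{{AC_CHECKLIST}}` renders as markdown checkbox lines (the diff does not show its format). `Verdict`, `evaluate_gates`, and the failure-log line format are hypothetical, not part of the package.

```python
from __future__ import annotations

import re
from enum import Enum

GATE_LINE = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.+?)\s*$")


class Verdict(Enum):
    SUCCESS = "success"        # every gate is [x]
    INCOMPLETE = "incomplete"  # at least one gate is still [ ]
    BLOCKED = "blocked"        # an unchecked gate is blocked; never success


def evaluate_gates(checklist: str, blocked: dict[str, str]) -> tuple[Verdict, list[str]]:
    """Apply the Gate rule to a rendered checklist.

    `blocked` maps gate text to the reason it cannot be verified. Returns the
    verdict plus the lines a caller would append to failure-log.md.
    """
    gates: list[tuple[bool, str]] = []
    for line in checklist.splitlines():
        m = GATE_LINE.match(line)
        if m:
            gates.append((m.group(1).lower() == "x", m.group(2)))

    if not gates:
        # Design choice: no parsable gates means nothing was verified.
        return Verdict.INCOMPLETE, []

    unchecked = [text for checked, text in gates if not checked]
    log = [f"- Blocked gate: {g} (reason: {blocked[g]})" for g in unchecked if g in blocked]
    if log:
        return Verdict.BLOCKED, log
    if unchecked:
        return Verdict.INCOMPLETE, []
    return Verdict.SUCCESS, []
```

Returning the log lines rather than writing `failure-log.md` directly keeps the rule testable and leaves file handling to the caller.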

@@ -1,67 +1,48 @@
 ## Test Failure Recovery Protocol
 
-
+Use this protocol whenever implementation or review tests fail. Its purpose is to distinguish tolerated pre-existing failures from blockers introduced by this session.
 
-###
-
-1. **Run tests and record results**:
-- Count total failures and note the failed tests
-- Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
-
-2. **Check termination conditions** (evaluate BEFORE each fix attempt):
-- **All tests pass** → Done. Exit recovery loop.
-- **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
-- **Still making progress** — failure count decreased compared to previous round → Continue fixing.
-- **First round** — no history yet → Proceed to fix.
-
-3. **Fix and iterate**:
-- Analyze remaining failures: root cause (code bug vs. test brittleness vs. environment issue)
-- Categorize:
-- **Pre-existing baseline failure**: Expected, do NOT fix
-- **New regression**: Fix the code
-- **Brittle test**: Fix the test or environment setup
-- Apply fix, re-run `{{TEST_CMD}}`, go back to step 1
+### Failure Classes
 
-
+| Class | Meaning | Required Action | May Continue? |
+|-------|---------|-----------------|---------------|
+| Baseline failure | Existed before this session | Document in Implementation Log or review notes | Yes |
+| New regression | Introduced by this session | Fix before reporting success | No |
+| Brittle test | Test expectation/setup is wrong for the intended behavior | Fix the test or environment setup, then rerun | Only after fixed or documented as blocked |
+| Environment/tooling failure | External tool, network, install, or local environment prevents verification | Document in `failure-log.md` with impact on gates | Only if no Verification Gate is blocked |
 
-
+### Recovery Loop
 
-
+1. Run tests and record: failing test names, failure count, and class for each failure.
+2. Compare with `BASELINE_FAILURES`; never blame baseline failures on this feature.
+3. Fix new regressions and brittle tests while progress is being made.
+4. Stop after a plateau: same failure count and same failing tests for 3 consecutive rounds.
+5. If failures decrease, reset the plateau counter.
 
-###
+### Success Rule
 
-
-- Dev appends failure details to Implementation Log
-- Reviewer agent runs full test suite in Phase 5
-- If Reviewer confirms NEW regressions (not in baseline): mark verdict as `NEEDS_FIXES`
-- If Reviewer confirms only baseline failures remain: proceed with `PASS_WITH_WARNINGS`
+Proceed to review only when:
 
-
+1. all new regressions are fixed;
+2. baseline failures are documented;
+3. every Verification Gate is verified.
 
-
-- If Implementation Log section in context-snapshot.md already confirms "all tests passing"
-- → Skip Phase 5 test suite re-run (Reviewer will verify baseline log instead)
-- This avoids rebuilding/re-running tests when already verified
+Blocked gates are not success. If any gate cannot be verified, follow the Blocked Rule.
 
-
-- If Implementation Log is missing or incomplete
-- If any new code was added after the last test run
-- If Reviewer suspects brittleness or environment drift
+### Blocked Rule
 
-
+If a remaining failure prevents any Verification Gate from being verified, the feature is incomplete. Write `failure-log.md` and do not report success.
 
-
+### Failure Capture Format
 
-```
+```markdown
 ## Test Failures Encountered
 
 - **Test**: [test name/path]
 - Root Cause: [explanation]
-- Category: [
+- Category: [baseline failure | new regression | brittle test | environment/tooling]
 - Rounds Attempted: [N rounds, plateau at round M]
-- Status: [
+- Status: [fixed | still failing | blocked]
 
-- **Impact on
+- **Impact on Verification Gates**: [verified | not affected | blocked + reason]
 ```
-
-**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
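The Recovery Loop's stop condition can be tracked with a small counter. Baseline failures are excluded first; the counter increments while the remaining failing set is unchanged and resets whenever it changes, which includes a decrease. The loop stops once the same set has been seen for 3 consecutive rounds. This is a minimal Python sketch, not package code: `RecoveryTracker` and its status strings are hypothetical. Two interpretations are assumptions of the sketch. First, "3 consecutive rounds" means three identical runs in a row. Second, the plateau is measured on non-baseline failures only.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass

PLATEAU_ROUNDS = 3  # "same failing tests for 3 consecutive rounds"


@dataclass
class RecoveryTracker:
    """Tracks failing tests across rounds of the Recovery Loop."""

    baseline: frozenset[str]               # BASELINE_FAILURES recorded before work began
    previous: frozenset[str] | None = None
    same_rounds: int = 0

    def record(self, failing: Iterable[str]) -> frozenset[str]:
        """Record one test run and return the failures not in the baseline."""
        current = frozenset(failing) - self.baseline
        if current == self.previous:
            self.same_rounds += 1
        else:
            # Any change in the failing set, including a decrease, restarts the count.
            self.same_rounds = 1
        self.previous = current
        return current

    def status(self) -> str:
        """'continue' (keep fixing), 'done' (loop ends), or 'plateau' (document and stop)."""
        if self.previous is None:
            return "continue"  # no run recorded yet
        if not self.previous:
            return "done"      # only baseline failures (or none) remain
        if self.same_rounds >= PLATEAU_ROUNDS:
            return "plateau"
        return "continue"
```

"done" here only ends the loop. The Success Rule still requires documenting the baseline failures and verifying every gate before proceeding.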

@@ -1,58 +1,48 @@
 ## Test Failure Recovery Protocol
 
-
+Use this protocol whenever implementation tests fail. Its purpose is to distinguish tolerated pre-existing failures from blockers introduced by this session.
 
-###
-
-1. **Run tests and record results**:
-- Count total failures and note the failed tests
-- Compare against baseline (BASELINE_FAILURES) — exclude pre-existing failures
-
-2. **Check termination conditions** (evaluate BEFORE each fix attempt):
-- **All tests pass** → Done. Exit recovery loop.
-- **Plateau detected** — same failure count AND same failing tests for 3 consecutive rounds → AI cannot resolve these failures. Document and exit.
-- **Still making progress** — failure count decreased compared to previous round → Continue fixing.
-- **First round** — no history yet → Proceed to fix.
+### Failure Classes
 
-
-
-
-
-
-
-- Apply fix, re-run `{{TEST_CMD}}`, go back to step 1
+| Class | Meaning | Required Action | May Continue? |
+|-------|---------|-----------------|---------------|
+| Baseline failure | Existed before this session | Document in Implementation Log | Yes |
+| New regression | Introduced by this session | Fix before reporting success | No |
+| Brittle test | Test expectation/setup is wrong for the intended behavior | Fix the test or environment setup, then rerun | Only after fixed or documented as blocked |
+| Environment/tooling failure | External tool, network, install, or local environment prevents verification | Document in `failure-log.md` with impact on gates | Only if no Verification Gate is blocked |
 
-###
+### Recovery Loop
 
-
+1. Run tests and record: failing test names, failure count, and class for each failure.
+2. Compare with `BASELINE_FAILURES`; never blame baseline failures on this feature.
+3. Fix new regressions and brittle tests while progress is being made.
+4. Stop after a plateau: same failure count and same failing tests for 3 consecutive rounds.
+5. If failures decrease, reset the plateau counter.
 
-
+### Success Rule
 
-
+Proceed only when:
 
-
-
-
-- **Do NOT block commit** — unresolved test failures are deferred to next session
+1. all new regressions are fixed;
+2. baseline failures are documented;
+3. every Verification Gate is verified.
 
-
+Blocked gates are not success. If any gate cannot be verified, follow the Blocked Rule.
 
-
+### Blocked Rule
 
-
+If a remaining failure prevents any Verification Gate from being verified, the feature is incomplete. Write `failure-log.md` and do not report success.
 
-
+### Failure Capture Format
 
-```
+```markdown
 ## Test Failures Encountered
 
 - **Test**: [test name/path]
 - Root Cause: [explanation]
-- Category: [
+- Category: [baseline failure | new regression | brittle test | environment/tooling]
 - Rounds Attempted: [N rounds, plateau at round M]
-- Status: [
+- Status: [fixed | still failing | blocked]
 
-- **Impact on
+- **Impact on Verification Gates**: [verified | not affected | blocked + reason]
 ```
-
-**Rule**: If any AC cannot be verified due to test failure, the feature is incomplete. Document in failure-log.md for next session.
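The Failure Capture Format now restricts Category and Status to fixed vocabularies. A record type can enforce those values before anything reaches `failure-log.md`. This hedged Python sketch uses the hypothetical names `FailureRecord` and `render_failures`. The diff view does not preserve indentation, so the two-space nesting of the sub-fields below is an assumption about the template's layout.

```python
from __future__ import annotations

from dataclasses import dataclass

CATEGORIES = ("baseline failure", "new regression", "brittle test", "environment/tooling")
STATUSES = ("fixed", "still failing", "blocked")


@dataclass(frozen=True)
class FailureRecord:
    test: str
    root_cause: str
    category: str
    rounds: int
    plateau_round: int | None  # None if no plateau was reached
    status: str
    gate_impact: str           # e.g. "verified", "not affected", "blocked: <reason>"

    def __post_init__(self) -> None:
        # Reject values outside the template's enumerated choices.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


def render_failures(records: list[FailureRecord]) -> str:
    """Render records in the Failure Capture Format (nesting is an assumption)."""
    lines = ["## Test Failures Encountered", ""]
    for r in records:
        rounds = f"{r.rounds} round{'s' if r.rounds != 1 else ''}"
        if r.plateau_round is not None:
            rounds += f", plateau at round {r.plateau_round}"
        lines += [
            f"- **Test**: {r.test}",
            f"  - Root Cause: {r.root_cause}",
            f"  - Category: {r.category}",
            f"  - Rounds Attempted: {rounds}",
            f"  - Status: {r.status}",
            "",
            f"- **Impact on Verification Gates**: {r.gate_impact}",
            "",
        ]
    return "\n".join(lines)
```

Validating in `__post_init__` means a misspelled category such as "flaky" fails at construction time instead of producing a log entry the next session cannot classify.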