@codyswann/lisa 1.81.4 → 1.81.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +2 -2
- package/plugins/lisa/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa/agents/builder.md +2 -2
- package/plugins/lisa/agents/verification-specialist.md +13 -8
- package/plugins/lisa/rules/verification.md +16 -9
- package/plugins/lisa/skills/plan-execute/SKILL.md +5 -5
- package/plugins/lisa/skills/tdd-implementation/SKILL.md +3 -3
- package/plugins/lisa/skills/verification-lifecycle/SKILL.md +27 -20
- package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
- package/plugins/src/base/agents/builder.md +2 -2
- package/plugins/src/base/agents/verification-specialist.md +13 -8
- package/plugins/src/base/rules/verification.md +16 -9
- package/plugins/src/base/skills/plan-execute/SKILL.md +5 -5
- package/plugins/src/base/skills/tdd-implementation/SKILL.md +3 -3
- package/plugins/src/base/skills/verification-lifecycle/SKILL.md +27 -20
package/package.json
CHANGED

@@ -76,7 +76,7 @@
     "lodash": ">=4.18.1"
   },
   "name": "@codyswann/lisa",
-  "version": "1.81.4",
+  "version": "1.81.5",
   "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
   "main": "dist/index.js",
   "exports": {

@@ -182,7 +182,7 @@
     "vitest": "^4.1.0"
   },
   "devDependencies": {
-    "@codyswann/lisa": "^1.81.
+    "@codyswann/lisa": "^1.81.3",
     "vite": "^8.0.5"
   },
   "type": "module"
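The devDependencies bump above uses a caret range (`^1.81.3`). As a reminder of what that range admits, here is a minimal sketch of npm's caret rule for versions with a non-zero major; the real resolver also handles prereleases and treats `0.x` majors differently:

```python
def satisfies_caret(version: str, base: str = "1.81.3") -> bool:
    # Caret semantics for major >= 1: same major version, and at least
    # the base version. "^1.81.3" therefore matches 1.81.3 up to <2.0.0.
    v = tuple(int(p) for p in version.split("."))
    b = tuple(int(p) for p in base.split("."))
    return v[0] == b[0] and v >= b

print(satisfies_caret("1.81.5"))  # same major, newer patch
print(satisfies_caret("2.0.0"))   # major bump is excluded
```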
package/plugins/lisa/agents/builder.md
CHANGED

@@ -26,7 +26,7 @@ If any of these are missing, ask the team for them before proceeding.
 1. **Write failing tests** — translate each acceptance criterion into one or more tests. This is your RED phase. Tests define the contract before any implementation exists.
 2. **Implement** — write the minimum code to make each test pass, one at a time. This is your GREEN phase.
 3. **Refactor** — clean up while keeping all tests green. Follow existing patterns identified in the architecture plan.
-4. **Run
+4. **Run quality checks** — run tests, typecheck, and lint. These are quality gates (prerequisites), NOT verification. Empirical verification (running the actual system) is done separately by the `verification-specialist`.
 5. **Update documentation** — add/update JSDoc preambles explaining the "why" behind each new piece of code.
 6. **Commit atomically** — use the `/git-commit` skill. Group related changes into logical commits.
 

@@ -38,4 +38,4 @@ If any of these are missing, ask the team for them before proceeding.
 - One task at a time — complete the current task before moving on
 - If you discover a gap in the acceptance criteria, ask the team — don't guess
 - If a dependency is missing (API not built, schema not migrated), report it as a blocker
-- Never mark the task complete without running the
+- Never mark the task complete without running quality checks (tests, typecheck, lint). Note: this is NOT verification — empirical verification is handled by the `verification-specialist`
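The RED/GREEN/Refactor loop in the builder hunk above can be sketched with a toy example. The `slugify` function and its expectation are invented for illustration and are not part of this package:

```python
# GREEN phase: the minimum implementation that makes a previously
# failing (RED phase) test pass.
def slugify(title: str) -> str:
    return title.lower().strip().replace(" ", "-")

# The RED phase wrote this expectation first; GREEN makes it pass.
assert slugify("Hello World") == "hello-world"
```

Passing this assertion is a quality check in the diff's terminology, not verification; proving the behavior in the running system is a separate step.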
package/plugins/lisa/agents/verification-specialist.md
CHANGED

@@ -15,17 +15,21 @@ Read `.claude/rules/verification.md` at the start of every investigation for the
 
 ## Core Philosophy
 
-**"If you didn't run it, you didn't verify it."** Code review is not verification. Reading a test file is not verification. Only executing the system and observing output counts as proof.
+**"If you didn't run it, you didn't verify it."** Code review is not verification. Reading a test file is not verification. **Running tests, typecheck, and lint is not verification either — those are quality gates (prerequisites).** Only executing the actual system and observing output counts as proof. Verification means making HTTP requests, clicking through the UI, running CLI commands, querying the database, or otherwise interacting with the running software as an end user would.
 
 ## Verification Process
 
-Follow the verification lifecycle: **classify, check tooling, fail fast, plan, execute, loop.**
+Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
 
-### 1.
+### 1. Confirm Quality Gates
 
-
+Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites — if they fail, fix them first. But passing quality gates does NOT mean the change is verified.
 
-### 2.
+### 2. Classify
+
+Read `.claude/rules/verification.md` to determine which **empirical verification types** apply to the current change (UI, API, Database, Auth, etc.). Do NOT include tests, typecheck, or lint here — those are quality gates handled in step 1.
+
+### 3. Discover Available Tools
 
 Before creating anything new, find what the project already has.
 

@@ -46,7 +50,7 @@ Before creating anything new, find what the project already has.
 **MCP tools:**
 - Check available MCP server tools for browser automation, observability, issue tracking, and other capabilities
 
-###
+### 4. Plan the Verification
 
 For each required verification type, determine:
 

@@ -61,7 +65,7 @@ For each required verification type, determine:
 
 If any required verification type has no available tool and no reasonable alternative, escalate immediately.
 
-###
+### 5. Execute and Report
 
 Run the verification and capture output. Always include:
 

@@ -113,7 +117,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
 ## Rules
 
 - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
-- Follow the verification lifecycle: classify, check tooling, fail fast, plan, execute, loop
+- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+- Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
 - Discover existing project scripts and tools before creating new ones
 - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
 - Verification scripts must be runnable locally without CI/CD dependencies
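The agent doc above insists that only observable output from a running system counts as proof. A self-contained sketch of that idea follows; the in-process server, ephemeral port, and `/health` endpoint are hypothetical stand-ins for a project's real system, not lisa tooling:

```python
# Empirical verification sketch: start a (stand-in) running system, make
# a real HTTP request, and assert on observable output (status code and
# response body) rather than on lint or test-runner results.
import http.server
import json
import threading
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep verification output readable

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    status = resp.status
    payload = json.loads(resp.read())
server.shutdown()

# Observable proof for the report: the request, the status, the body.
print(f"GET /health -> {status} {payload}")
```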
package/plugins/lisa/rules/verification.md
CHANGED

@@ -16,11 +16,13 @@ No agent may claim success without evidence from runtime verification.
 
 Never assume something works because the code "looks correct." Run a command, observe the output, compare to expected result.
 
-**Verification is not linting, typechecking, or testing.** Those are quality checks. Verification is using the resulting software the way a user would — interacting with the UI, calling the API, running the CLI command, observing the behavior. Tests pass in isolation; verification proves the system works as a whole.
+**Verification is not linting, typechecking, or testing.** Those are **quality checks** — prerequisites that must pass before verification begins, but they are NOT verification themselves. Running `bun run test`, `bun run typecheck`, `bun run lint`, or `bun run format` is a quality check. Verification is using the resulting software the way a user would — interacting with the UI, calling the API, running the CLI command, observing the behavior. Tests pass in isolation; verification proves the system works as a whole.
+
+**If all you did was run tests, typecheck, and lint — you have NOT verified.** You have only confirmed quality checks pass. Verification requires running the actual system and observing the results: making HTTP requests, clicking through the UI, executing CLI commands, querying the database, or otherwise interacting with the running software as an end user would.
 
 Verification is mandatory. Never skip it, defer it, or claim it was unnecessary. Every task must be verified before claiming completion.
 
-Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. Do not begin implementation until the plan is confirmed.
+Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.
 
 After verifying a change empirically, encode that verification as automated tests. The manual proof that something works should become a repeatable regression test that catches future regressions. Every verification should answer: "How do I turn this into a test?"
 

@@ -56,19 +58,23 @@ Agents must label every task outcome with exactly one of these:
 
 ---
 
-##
-
-Every change requires one or more verification types. Classify the change first, then verify each applicable type.
+## Quality Gates (Prerequisites)
 
-
+These are NOT verification. They are prerequisites that must pass before verification begins. Running these does not constitute verification — it only confirms code quality.
 
-
+| Gate | What to prove | Acceptable proof |
 |------|---------------|------------------|
 | **Test** | Unit and integration tests pass for all changed code paths | Test runner output showing all relevant tests green with no skips |
 | **Type Safety** | No type errors introduced by the change | Type checker exits clean on the full project |
 | **Lint/Format** | Code meets project style and quality rules | Linter and formatter exit clean on changed files |
 
-
+Quality gates are enforced automatically by the self-correction loop (hooks, pre-commit, pre-push). Passing all quality gates is necessary but NOT sufficient — you must still verify empirically.
+
+---
+
+## Verification Types
+
+Every change requires one or more verification types. Classify the change first, then verify each applicable type by **running the actual system and observing results**.
 
 | Type | When it applies | What to prove | Acceptable proof |
 |------|----------------|---------------|------------------|

@@ -94,7 +100,8 @@ Every change requires one or more verification types. Classify the change first,
 
 Verification happens at two stages in the workflow:
 
-- **
+- **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
+- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as an e2e test.
 - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.
 
 Both levels use the same verification types table above. The difference is the environment, not the rigor.
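The rules above say local and remote verification share the same types and rigor, with only the environment differing. A sketch of that idea as a small plan builder; the base URLs, endpoint, and `psql` command are illustrative stand-ins, not commands shipped by the package:

```python
def verification_plan(base_url: str) -> list[dict]:
    # Each entry names a command and an expected observable outcome.
    # Note that no quality-gate command (test/typecheck/lint) appears:
    # every entry exercises the running system.
    return [
        {
            "type": "api-test",
            "command": f"curl -s -o /dev/null -w '%{{http_code}}' {base_url}/health",
            "expected": "200",
        },
        {
            "type": "database-check",
            "command": "psql -c 'select count(*) from widgets'",  # hypothetical table
            "expected": "row count increases after the API call",
        },
    ]

local = verification_plan("http://localhost:3000")
remote = verification_plan("https://app.example.com")

# Same types and rigor in both environments; only the base URL differs.
assert [s["type"] for s in local] == [s["type"] for s in remote]
```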
package/plugins/lisa/skills/plan-execute/SKILL.md
CHANGED

@@ -77,9 +77,9 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`
 "skills": ["..."],
 "learnings": ["..."],
 "verification": {
-  "type": "
-  "command": "the proof command",
-  "expected": "what success looks like"
+  "type": "ui-recording|api-test|cli-test|database-check|manual-check|documentation",
+  "command": "the proof command — must run the actual system (NOT test/typecheck/lint, those are quality gates)",
+  "expected": "what success looks like — observable system behavior"
 }
 }
 ```

@@ -91,8 +91,8 @@ Each task must have their learnings reviewed by the learner subagent.
 
 Before shutting down the team, execute the Verify flow:
 
-1. Run
-2. `verification-specialist`: verify locally (empirical proof that the change works)
+1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
+2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
 3. Write e2e test encoding the verification
 4. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
 5. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
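The metadata template above now forbids quality-gate commands in the `verification.command` field. A hypothetical filled-in block, plus a guard sketching how that rule could be checked; the endpoint, learnings, and gate list are invented for illustration:

```python
# Quality-gate command prefixes that must never appear as verification.
QUALITY_GATES = ("bun run test", "bun run typecheck", "bun run lint", "bun run format")

# Hypothetical task metadata following the template in the diff above.
metadata = {
    "skills": ["plan-execute"],
    "learnings": ["health endpoint must return JSON, not plain text"],
    "verification": {
        "type": "api-test",
        "command": "curl -s http://localhost:3000/health",  # runs the actual system
        "expected": "HTTP 200 with body {\"ok\": true}",
    },
}

command = metadata["verification"]["command"]
is_quality_gate = any(command.startswith(g) for g in QUALITY_GATES)
print("valid verification command" if not is_quality_gate else "rejected: quality gate")
```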
package/plugins/lisa/skills/tdd-implementation/SKILL.md
CHANGED

@@ -21,9 +21,9 @@ Each task you work on must have the following in its metadata:
 "skills": ["..."],
 "learnings": ["..."],
 "verification": {
-  "type": "
-  "command": "the proof command",
-  "expected": "what success looks like"
+  "type": "ui-recording|api-test|cli-test|database-check|manual-check|documentation",
+  "command": "the proof command — must run the actual system (NOT test/typecheck/lint, those are quality gates)",
+  "expected": "what success looks like — observable system behavior"
 }
 }
 ```
package/plugins/lisa/skills/verification-lifecycle/SKILL.md
CHANGED

@@ -1,42 +1,48 @@
 ---
 name: verification-lifecycle
-description: "Verification lifecycle: classify types, discover tools, fail fast, plan, execute, loop.
+description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
 ---
 
 # Verification Lifecycle
 
-This skill defines the complete verification lifecycle that agents must follow for every change: classify, check tooling, fail fast, plan, execute, and loop.
+This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
 
 ## Verification Lifecycle
 
 Agents must follow this mandatory sequence for every change:
 
-### 1.
+### 1. Confirm Quality Gates
 
-
+Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.
 
-### 2.
+### 2. Classify
+
+Determine which **empirical verification types** apply based on the change. Check each type in the Verification Types table in `.claude/rules/verification.md` against the change scope. Every applicable type requires running the actual system and observing results — not just running tests.
+
+### 3. Check Tooling
 
 For each required verification type, discover what tools are available in the project. Use the Tool Discovery Process below.
 
-Report what is available for each required type. If a required type has no available tool, proceed to step
+Report what is available for each required type. If a required type has no available tool, proceed to step 4.
 
-###
+### 4. Fail Fast
 
 If a required verification type has no available tool and no reasonable alternative, escalate immediately using the Escalation Protocol. Do not begin implementation without a verification plan for every required type.
 
-###
+### 5. Plan
 
 For each verification type, state:
-- The specific tool or command that will be used
+- The specific tool or command that will be used (NOT test/typecheck/lint — those are quality gates, not verification)
 - The expected outcome that constitutes a pass
 - Any prerequisites (running server, seeded database, auth token)
 
-
+A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.
+
+### 6. Execute
 
 After implementation, run the verification plan. Execute each verification type in order.
 
-###
+### 7. Loop
 
 If any verification fails, fix the issue and re-verify. Do not declare done until all required types pass. If a verification is stuck after 3 attempts, escalate.
 

@@ -166,15 +172,16 @@ Agents must follow this sequence unless explicitly instructed otherwise:
 
 1. Restate goal in one sentence.
 2. Identify the end user of the change.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
-11.
+3. Confirm quality gates pass (tests, typecheck, lint, format) — these are prerequisites, NOT verification.
+4. Classify empirical verification types that apply to the change (UI, API, Database, etc.).
+5. Discover available tools for each verification type.
+6. Confirm required surfaces are available, escalate if not.
+7. Plan verification: state tool, command, and expected outcome for each type. Do NOT list test/typecheck/lint here — those are quality gates from step 3.
+8. Implement the change.
+9. Execute verification plan — run the actual system and observe results.
+10. Collect proof artifacts.
+11. Summarize what changed, what was verified, and remaining risk.
+12. Label the result with a verification level.
 
 ---
 
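The "Loop" step above says to re-verify after each fix and escalate after three failed attempts. A minimal sketch of that control flow, with stand-in callables in place of a real verification plan and a real fix:

```python
def verify_with_loop(run_verification, apply_fix, max_attempts: int = 3) -> str:
    # Re-run verification after each fix; escalate once attempts are spent.
    for attempt in range(1, max_attempts + 1):
        if run_verification():
            return f"verified on attempt {attempt}"
        apply_fix()
    return "escalate"

# Simulated run: two fixes are needed, so verification passes on attempt 3.
state = {"fixes": 0}
result = verify_with_loop(
    run_verification=lambda: state["fixes"] >= 2,
    apply_fix=lambda: state.__setitem__("fixes", state["fixes"] + 1),
)
print(result)
```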
package/plugins/src/base/agents/builder.md
CHANGED

@@ -26,7 +26,7 @@ If any of these are missing, ask the team for them before proceeding.
 1. **Write failing tests** — translate each acceptance criterion into one or more tests. This is your RED phase. Tests define the contract before any implementation exists.
 2. **Implement** — write the minimum code to make each test pass, one at a time. This is your GREEN phase.
 3. **Refactor** — clean up while keeping all tests green. Follow existing patterns identified in the architecture plan.
-4. **Run
+4. **Run quality checks** — run tests, typecheck, and lint. These are quality gates (prerequisites), NOT verification. Empirical verification (running the actual system) is done separately by the `verification-specialist`.
 5. **Update documentation** — add/update JSDoc preambles explaining the "why" behind each new piece of code.
 6. **Commit atomically** — use the `/git-commit` skill. Group related changes into logical commits.
 

@@ -38,4 +38,4 @@ If any of these are missing, ask the team for them before proceeding.
 - One task at a time — complete the current task before moving on
 - If you discover a gap in the acceptance criteria, ask the team — don't guess
 - If a dependency is missing (API not built, schema not migrated), report it as a blocker
-- Never mark the task complete without running the
+- Never mark the task complete without running quality checks (tests, typecheck, lint). Note: this is NOT verification — empirical verification is handled by the `verification-specialist`
package/plugins/src/base/agents/verification-specialist.md
CHANGED

@@ -15,17 +15,21 @@ Read `.claude/rules/verification.md` at the start of every investigation for the
 
 ## Core Philosophy
 
-**"If you didn't run it, you didn't verify it."** Code review is not verification. Reading a test file is not verification. Only executing the system and observing output counts as proof.
+**"If you didn't run it, you didn't verify it."** Code review is not verification. Reading a test file is not verification. **Running tests, typecheck, and lint is not verification either — those are quality gates (prerequisites).** Only executing the actual system and observing output counts as proof. Verification means making HTTP requests, clicking through the UI, running CLI commands, querying the database, or otherwise interacting with the running software as an end user would.
 
 ## Verification Process
 
-Follow the verification lifecycle: **classify, check tooling, fail fast, plan, execute, loop.**
+Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
 
-### 1.
+### 1. Confirm Quality Gates
 
-
+Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites — if they fail, fix them first. But passing quality gates does NOT mean the change is verified.
 
-### 2.
+### 2. Classify
+
+Read `.claude/rules/verification.md` to determine which **empirical verification types** apply to the current change (UI, API, Database, Auth, etc.). Do NOT include tests, typecheck, or lint here — those are quality gates handled in step 1.
+
+### 3. Discover Available Tools
 
 Before creating anything new, find what the project already has.
 

@@ -46,7 +50,7 @@ Before creating anything new, find what the project already has.
 **MCP tools:**
 - Check available MCP server tools for browser automation, observability, issue tracking, and other capabilities
 
-###
+### 4. Plan the Verification
 
 For each required verification type, determine:
 

@@ -61,7 +65,7 @@ For each required verification type, determine:
 
 If any required verification type has no available tool and no reasonable alternative, escalate immediately.
 
-###
+### 5. Execute and Report
 
 Run the verification and capture output. Always include:
 

@@ -113,7 +117,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
 ## Rules
 
 - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
-- Follow the verification lifecycle: classify, check tooling, fail fast, plan, execute, loop
+- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+- Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
 - Discover existing project scripts and tools before creating new ones
 - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
 - Verification scripts must be runnable locally without CI/CD dependencies
package/plugins/src/base/rules/verification.md
CHANGED

@@ -16,11 +16,13 @@ No agent may claim success without evidence from runtime verification.
 
 Never assume something works because the code "looks correct." Run a command, observe the output, compare to expected result.
 
-**Verification is not linting, typechecking, or testing.** Those are quality checks. Verification is using the resulting software the way a user would — interacting with the UI, calling the API, running the CLI command, observing the behavior. Tests pass in isolation; verification proves the system works as a whole.
+**Verification is not linting, typechecking, or testing.** Those are **quality checks** — prerequisites that must pass before verification begins, but they are NOT verification themselves. Running `bun run test`, `bun run typecheck`, `bun run lint`, or `bun run format` is a quality check. Verification is using the resulting software the way a user would — interacting with the UI, calling the API, running the CLI command, observing the behavior. Tests pass in isolation; verification proves the system works as a whole.
+
+**If all you did was run tests, typecheck, and lint — you have NOT verified.** You have only confirmed quality checks pass. Verification requires running the actual system and observing the results: making HTTP requests, clicking through the UI, executing CLI commands, querying the database, or otherwise interacting with the running software as an end user would.
 
 Verification is mandatory. Never skip it, defer it, or claim it was unnecessary. Every task must be verified before claiming completion.
 
-Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. Do not begin implementation until the plan is confirmed.
+Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.
 
 After verifying a change empirically, encode that verification as automated tests. The manual proof that something works should become a repeatable regression test that catches future regressions. Every verification should answer: "How do I turn this into a test?"
 

@@ -56,19 +58,23 @@ Agents must label every task outcome with exactly one of these:
 
 ---
 
-##
-
-Every change requires one or more verification types. Classify the change first, then verify each applicable type.
+## Quality Gates (Prerequisites)
 
-
+These are NOT verification. They are prerequisites that must pass before verification begins. Running these does not constitute verification — it only confirms code quality.
 
-
+| Gate | What to prove | Acceptable proof |
 |------|---------------|------------------|
 | **Test** | Unit and integration tests pass for all changed code paths | Test runner output showing all relevant tests green with no skips |
 | **Type Safety** | No type errors introduced by the change | Type checker exits clean on the full project |
 | **Lint/Format** | Code meets project style and quality rules | Linter and formatter exit clean on changed files |
 
-
+Quality gates are enforced automatically by the self-correction loop (hooks, pre-commit, pre-push). Passing all quality gates is necessary but NOT sufficient — you must still verify empirically.
+
+---
+
+## Verification Types
+
+Every change requires one or more verification types. Classify the change first, then verify each applicable type by **running the actual system and observing results**.
 
 | Type | When it applies | What to prove | Acceptable proof |
 |------|----------------|---------------|------------------|

@@ -94,7 +100,8 @@ Every change requires one or more verification types. Classify the change first,
 
 Verification happens at two stages in the workflow:
 
-- **
+- **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
+- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as an e2e test.
 - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.
 
 Both levels use the same verification types table above. The difference is the environment, not the rigor.
package/plugins/src/base/skills/plan-execute/SKILL.md
CHANGED

@@ -77,9 +77,9 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`
 "skills": ["..."],
 "learnings": ["..."],
 "verification": {
-  "type": "
-  "command": "the proof command",
-  "expected": "what success looks like"
+  "type": "ui-recording|api-test|cli-test|database-check|manual-check|documentation",
+  "command": "the proof command — must run the actual system (NOT test/typecheck/lint, those are quality gates)",
+  "expected": "what success looks like — observable system behavior"
 }
 }
 ```

@@ -91,8 +91,8 @@ Each task must have their learnings reviewed by the learner subagent.
 
 Before shutting down the team, execute the Verify flow:
 
-1. Run
-2. `verification-specialist`: verify locally (empirical proof that the change works)
+1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
+2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
 3. Write e2e test encoding the verification
 4. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
 5. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
@@ -21,9 +21,9 @@ Each task you work on must have the following in its metadata:
   "skills": ["..."],
   "learnings": ["..."],
   "verification": {
-    "type": "
-    "command": "the proof command",
-    "expected": "what success looks like"
+    "type": "ui-recording|api-test|cli-test|database-check|manual-check|documentation",
+    "command": "the proof command — must run the actual system (NOT test/typecheck/lint, those are quality gates)",
+    "expected": "what success looks like — observable system behavior"
   }
 }
 ```
@@ -1,42 +1,48 @@
 ---
 name: verification-lifecycle
-description: "Verification lifecycle: classify types, discover tools, fail fast, plan, execute, loop.
+description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
 ---
 
 # Verification Lifecycle
 
-This skill defines the complete verification lifecycle that agents must follow for every change: classify, check tooling, fail fast, plan, execute, and loop.
+This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
 
 ## Verification Lifecycle
 
 Agents must follow this mandatory sequence for every change:
 
-### 1.
+### 1. Confirm Quality Gates
 
-
+Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.
 
-### 2.
+### 2. Classify
+
+Determine which **empirical verification types** apply based on the change. Check each type in the Verification Types table in `.claude/rules/verification.md` against the change scope. Every applicable type requires running the actual system and observing results — not just running tests.
+
+### 3. Check Tooling
 
 For each required verification type, discover what tools are available in the project. Use the Tool Discovery Process below.
 
-Report what is available for each required type. If a required type has no available tool, proceed to step
+Report what is available for each required type. If a required type has no available tool, proceed to step 4.
 
-###
+### 4. Fail Fast
 
 If a required verification type has no available tool and no reasonable alternative, escalate immediately using the Escalation Protocol. Do not begin implementation without a verification plan for every required type.
 
-###
+### 5. Plan
 
 For each verification type, state:
-- The specific tool or command that will be used
+- The specific tool or command that will be used (NOT test/typecheck/lint — those are quality gates, not verification)
 - The expected outcome that constitutes a pass
 - Any prerequisites (running server, seeded database, auth token)
 
-
+A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.
+
+### 6. Execute
 
 After implementation, run the verification plan. Execute each verification type in order.
 
-###
+### 7. Loop
 
 If any verification fails, fix the issue and re-verify. Do not declare done until all required types pass. If a verification is stuck after 3 attempts, escalate.
 
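The Classify step in the lifecycle above can be pictured as a simple mapping from touched file paths to the verification types they require. This is only an illustrative sketch: the `classify` helper and its path patterns are hypothetical and do not appear in the package, which leaves classification to the agent's judgment against the types table.

```typescript
// Hypothetical sketch of the Classify step: map changed file paths to
// the empirical verification types they require. Patterns are illustrative.
const typeByPattern: Array<[RegExp, string]> = [
  [/\.(tsx|css)$/, "ui-recording"],
  [/(routes|controllers|api)\//, "api-test"],
  [/(cli|commands)\//, "cli-test"],
  [/(migrations|schema)\//, "database-check"],
  [/\.md$/, "documentation"],
];

function classify(changedPaths: string[]): string[] {
  const types = new Set<string>();
  for (const path of changedPaths) {
    for (const [pattern, type] of typeByPattern) {
      if (pattern.test(path)) types.add(type);
    }
  }
  // When nothing matches, fall back to a manual check rather than
  // declaring the change verification-free.
  return types.size > 0 ? [...types] : ["manual-check"];
}

console.log(classify(["src/routes/users.ts", "src/migrations/001_add_users.ts"]));
```

Each type returned this way still needs a concrete tool, command, and expected outcome in the Plan step; the classification only decides which surfaces must be exercised.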
@@ -166,15 +172,16 @@ Agents must follow this sequence unless explicitly instructed otherwise:
 
 1. Restate goal in one sentence.
 2. Identify the end user of the change.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10.
-11.
+3. Confirm quality gates pass (tests, typecheck, lint, format) — these are prerequisites, NOT verification.
+4. Classify empirical verification types that apply to the change (UI, API, Database, etc.).
+5. Discover available tools for each verification type.
+6. Confirm required surfaces are available, escalate if not.
+7. Plan verification: state tool, command, and expected outcome for each type. Do NOT list test/typecheck/lint here — those are quality gates from step 3.
+8. Implement the change.
+9. Execute verification plan — run the actual system and observe results.
+10. Collect proof artifacts.
+11. Summarize what changed, what was verified, and remaining risk.
+12. Label the result with a verification level.
 
 ---
 