@codyswann/lisa 2.6.3 → 2.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/oxlint/expo.json +0 -3
- package/package.json +1 -1
- package/plugins/lisa/.claude-plugin/plugin.json +10 -1
- package/plugins/lisa/agents/verification-specialist.md +8 -2
- package/plugins/lisa/commands/codify-verification.md +6 -0
- package/plugins/lisa/hooks/block-no-verify.sh +37 -0
- package/plugins/lisa/rules/intent-routing.md +5 -4
- package/plugins/lisa/rules/verification.md +2 -2
- package/plugins/lisa/skills/codify-verification/SKILL.md +152 -0
- package/plugins/lisa/skills/verification-lifecycle/SKILL.md +31 -9
- package/plugins/lisa/skills/verify/SKILL.md +7 -6
- package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
- package/plugins/src/base/.claude-plugin/plugin.json +6 -0
- package/plugins/src/base/agents/verification-specialist.md +8 -2
- package/plugins/src/base/commands/codify-verification.md +6 -0
- package/plugins/src/base/hooks/block-no-verify.sh +37 -0
- package/plugins/src/base/rules/intent-routing.md +5 -4
- package/plugins/src/base/rules/verification.md +2 -2
- package/plugins/src/base/skills/codify-verification/SKILL.md +152 -0
- package/plugins/src/base/skills/verification-lifecycle/SKILL.md +31 -9
- package/plugins/src/base/skills/verify/SKILL.md +7 -6
package/oxlint/expo.json
CHANGED
package/package.json
CHANGED
@@ -79,7 +79,7 @@
     "lodash": ">=4.18.1"
   },
   "name": "@codyswann/lisa",
-  "version": "2.6.3",
+  "version": "2.7.0",
   "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
   "main": "dist/index.js",
   "exports": {
package/plugins/lisa/.claude-plugin/plugin.json
CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "lisa",
-  "version": "2.6.3",
+  "version": "2.7.0",
   "description": "Universal governance — agents, skills, commands, hooks, and rules for all projects",
   "author": {
     "name": "Cody Swann"
@@ -46,6 +46,15 @@
           "command": "command -v entire >/dev/null 2>&1 && entire hooks claude-code pre-task || true"
         }
       ]
+    },
+    {
+      "matcher": "Bash",
+      "hooks": [
+        {
+          "type": "command",
+          "command": "${CLAUDE_PLUGIN_ROOT}/hooks/block-no-verify.sh"
+        }
+      ]
     }
   ],
   "Stop": [
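The added matcher wires a `PreToolUse` hook to every Bash tool call: Claude Code pipes a JSON description of the pending call to the script's stdin and a non-zero exit blocks it. A minimal sketch of that stdin handling, assuming only the two fields the hook actually reads (the payload below is a hypothetical example, not the full schema):

```shell
# Hypothetical PreToolUse payload; only tool_name and tool_input.command
# are assumed here, because those are the fields block-no-verify.sh reads.
payload='{"tool_name":"Bash","tool_input":{"command":"git commit --no-verify"}}'

# Same jq extraction the hook performs on its stdin.
tool_name="$(printf '%s' "$payload" | jq -r '.tool_name // empty')"
command_str="$(printf '%s' "$payload" | jq -r '.tool_input.command // empty')"

echo "$tool_name: $command_str"
```

Feeding the same payload to the committed script (`printf '%s' "$payload" | bash hooks/block-no-verify.sh`) should exit 2 for this command.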
package/plugins/lisa/agents/verification-specialist.md
CHANGED

@@ -3,6 +3,7 @@ name: verification-specialist
 description: Verification specialist agent. Discovers project tooling and executes verification for all required types. Plans and executes empirical proof that work is done by running the actual system and observing results.
 skills:
   - verification-lifecycle
+  - codify-verification
   - jira-journey
   - spec-conformance
 ---

@@ -19,7 +20,7 @@ Read `.claude/rules/verification.md` at the start of every investigation for the

 ## Verification Process

-Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
+Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop.**

 ### 1. Confirm Quality Gates

@@ -76,6 +77,10 @@ Run the verification and capture output. Always include:

 If any verification fails, fix and re-verify. Do not declare done until all required types pass.

+### 6. Codify
+
+For every empirical verification that produced PASS evidence, invoke the `codify-verification` skill to encode it as a regression test in the appropriate framework (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.). The new test must run, pass, and be committed in the same PR. Skipping codification is allowed only for non-behavioral types (PR, Documentation, Deploy) and Investigate-Only spikes — for everything else, codify or escalate.
+
 ## Output Format

 ```

@@ -117,7 +122,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
 ## Rules

 - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
-- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop
+- Every passing empirical verification must be codified as a regression test via `codify-verification` before declaring done (skip allowed only for PR / Documentation / Deploy / Investigate-Only)
 - Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
 - Discover existing project scripts and tools before creating new ones
 - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
|
@@ -0,0 +1,6 @@
|
|
|
1
|
+
---
|
|
2
|
+
description: "Convert empirical verification into a regression test (Playwright for UI, integration test for API/DB, benchmark for performance, etc.) so it doesn't regress. Mandatory step after verification passes — invoked from verification-lifecycle and from each Build/Fix/Improve flow."
|
|
3
|
+
argument-hint: "<verification-type> <what-was-verified>"
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
Use the /lisa:codify-verification skill to encode the empirical verification that just passed as a regression test, in the appropriate framework for the verification type. $ARGUMENTS
|
|
package/plugins/lisa/hooks/block-no-verify.sh
ADDED

@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# PreToolUse hook for Bash: blocks any command containing --no-verify.
+# --no-verify on git commit/push (and equivalents) bypasses pre-commit/pre-push
+# hooks that exist for a reason. The fix is to address the underlying issue,
+# not silence the check. See feedback_never_no_verify in user memory.
+#
+# Word-boundary match avoids false positives on flags like --no-verify-ssl,
+# --no-verify-host, etc.
+set -euo pipefail
+
+input="$(cat)"
+
+tool_name="$(printf '%s' "$input" | jq -r '.tool_name // empty')"
+if [ "$tool_name" != "Bash" ]; then
+  exit 0
+fi
+
+command_str="$(printf '%s' "$input" | jq -r '.tool_input.command // empty')"
+if [ -z "$command_str" ]; then
+  exit 0
+fi
+
+# Match --no-verify bounded by non-token characters (not alphanumeric, _, or -).
+# This catches all syntactic positions including subshells (e.g. `(git commit --no-verify)`)
+# while excluding longer flags like --no-verify-ssl, --no-verify-host, etc.
+if printf '%s' "$command_str" | grep -Eq '(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'; then
+  cat >&2 <<'EOF'
+Blocked: --no-verify bypasses pre-commit/pre-push hooks. Fix the underlying
+issue (lint error, failing test, formatting) or ask the user before bypassing.
+
+If the user has explicitly authorized the bypass for this specific command,
+re-run after they confirm.
+EOF
+  exit 2
+fi
+
+exit 0
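The script's grep pattern is the part most worth sanity-checking. A small sketch exercising it against representative commands (the `check` helper is illustrative only; the pattern is copied verbatim from block-no-verify.sh):

```shell
# --no-verify must be bounded by characters that cannot belong to a
# longer flag name (anything other than alphanumerics, _, and -).
pattern='(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'

check() {
  if printf '%s' "$1" | grep -Eq "$pattern"; then
    echo "BLOCKED"
  else
    echo "allowed"
  fi
}

check 'git commit --no-verify -m "wip"'      # BLOCKED
check '(git push --no-verify)'               # BLOCKED -- subshell position
check 'curl --no-verify-ssl https://x.test'  # allowed -- longer flag
check 'git commit -m "looks fine"'           # allowed
```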
package/plugins/lisa/rules/intent-routing.md
CHANGED

@@ -119,7 +119,7 @@ Determine the work type and execute the matching variant:
 5. `builder` -- implement via TDD (acceptance criteria become tests)
 6. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 7. `verification-specialist` -- verify locally (run the software, observe behavior)
-8.
+8. `verification-specialist` -- invoke `codify-verification` skill per passing verification (Playwright for UI, integration test for API/DB/auth, etc.); commit each test in the same PR
 9. **Review sub-flow**
 10. `learner` -- capture discoveries

@@ -133,7 +133,7 @@ Determine the work type and execute the matching variant:
 6. `bug-fixer` -- implement fix via TDD (reproduction becomes failing test)
 7. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 8. `verification-specialist` -- verify locally (prove the bug is fixed)
-9.
+9. `verification-specialist` -- invoke `codify-verification` skill to encode the fix as a regression test (mandatory for bug fixes — the test must fail against the pre-fix commit and pass against the fix); commit in the same PR
 10. **Review sub-flow**
 11. `learner` -- capture discoveries

@@ -145,7 +145,7 @@ Determine the work type and execute the matching variant:
 4. `builder` -- implement improvements via TDD
 5. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 6. `verification-specialist` -- measure again, prove improvement over baseline
-7.
+7. `verification-specialist` -- invoke `codify-verification` skill (typically a benchmark asserting against baseline for performance work, or a regression test for behavioral refactors); commit in the same PR
 8. **Review sub-flow**
 9. `learner` -- capture discoveries

@@ -156,7 +156,7 @@ Determine the work type and execute the matching variant:
 3. Recommend next action (Research, Plan, Implement, or escalate)
 4. `learner` -- capture discoveries

-Output: Code passing all quality gates + local empirical verification +
+Output: Code passing all quality gates + local empirical verification + codified regression test for each verification (except for spikes, which produce findings only, and non-behavioral verification types — PR / Documentation / Deploy — which carry their own proof).

 ### Verify

@@ -165,6 +165,7 @@ When: Code is ready to ship. All quality gates pass and local empirical verifica
 Gate:
 - Code must pass quality gates (lint, typecheck, tests)
 - Local empirical verification must be complete
+- Each passing local verification must be codified as a regression test (or carry a documented skip from the allowed set: PR / Documentation / Deploy / Investigate-Only). If verifications are not codified, return to the Implement flow's codify step before shipping
 - If quality gates fail, go back to **Implement**
 - If no code changes exist, there is nothing to verify
package/plugins/lisa/rules/verification.md
CHANGED

@@ -24,7 +24,7 @@ Verification is mandatory. Never skip it, defer it, or claim it was unnecessary.

 Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.

-After verifying a change empirically, encode that verification as automated
+After verifying a change empirically, encode that verification as an automated regression test via the `codify-verification` skill. The manual proof that something works must become a repeatable test that catches future regressions — Playwright for UI/browser flows, integration test for API/DB/auth, benchmark for performance, etc. Codification is mandatory for every empirical verification type except the inherently non-behavioral ones (PR, Documentation, Deploy) and Investigate-Only spikes. If codification is genuinely impossible, escalate via the Escalation Protocol — never silently skip.

 Every pull request must include step-by-step instructions for reviewers to independently replicate the verification. These are not test commands — they are the exact steps a human would follow to use the software and confirm the change works. If a reviewer cannot reproduce your verification from the PR description alone, the PR is incomplete.

@@ -101,7 +101,7 @@ Every change requires one or more verification types. Classify the change first,
 Verification happens at two stages in the workflow:

 - **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
-- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as
+- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
 - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.

 Both levels use the same verification types table above. The difference is the environment, not the rigor.
package/plugins/lisa/skills/codify-verification/SKILL.md
ADDED

@@ -0,0 +1,152 @@
+---
+name: codify-verification
+description: "Convert empirical verification into a regression test so it never has to be re-proven manually. Runs after a verification passes — picks the appropriate test framework for the verification type (Playwright for UI/browser, integration test for API/DB/auth, benchmark for performance, etc.), generates the test, wires it into the project's test runner, and confirms it executes. Mandatory step in the verification lifecycle and in the Build/Fix/Improve flows."
+allowed-tools: ["Bash", "Read", "Edit", "Write", "Glob", "Grep", "Skill"]
+---
+
+# Codify Verification: $ARGUMENTS
+
+Take the empirical verification that just passed and encode it as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+This skill is invoked from the verification lifecycle (between Execute and Spec Conformance) and from each work-type sub-flow (Build / Fix / Improve) after the local verification step.
+
+## When to invoke
+
+Invoke once per empirical verification that produced PASS evidence. If a single change had three verifications (UI flow, API endpoint, DB query), this skill runs three times — or once with the three verifications batched, but each must produce its own committed test.
+
+## When to skip
+
+Skip codification only for verification types whose proof is inherently non-behavioral:
+
+- **PR** — proof is the PR description itself
+- **Documentation** — proof is content review
+- **Deploy** — proof is deployment output and health endpoints (already covered by ops-verify-health)
+- **Investigate-Only spikes** — produce findings, not shipped code
+
+For every other verification type, codification is mandatory. If the codification is not possible (e.g., the test framework doesn't exist and can't be installed in scope), escalate via the lifecycle's Escalation Protocol — do not silently skip.
+
+## Inputs
+
+The caller must provide:
+
+- The verification type (UI, API, Database, Auth, Security, Performance, Background Jobs, Cache, Configuration, Email/Notification, Observability, Infrastructure)
+- The exact steps that were performed (URL visited, request made, query run, etc.)
+- The expected outcome (status code, UI state, row count, log entry, etc.)
+- The proof artifact captured (screenshot path, response body, query output, log excerpt)
+
+If any of these are missing, ask the caller before generating a test — a test built on guesswork will not match the verification it claims to encode.
+
+## Process
+
+### 1. Discover existing test infrastructure
+
+Before creating anything new, find what the project already has. Use the Tool Discovery Process from `verification-lifecycle`. Specifically check for:
+
+- **Browser/E2E**: `playwright.config.*`, `cypress.config.*`, `e2e/` directory, `tests/e2e/`, Playwright/Cypress in `package.json` devDependencies
+- **API/integration**: `tests/integration/`, `spec/`, `test/integration/`, supertest/fetch helpers, Vitest/Jest integration configs
+- **Database**: integration test setup with migrations, factory files, seed scripts
+- **Performance**: existing benchmark suite (`benchmarks/`, `bench/`), `vitest bench`, k6 scripts
+- **Mobile (RN/Expo)**: Detox config, Maestro flows
+- **Backend jobs**: existing job-test harness, queue integration tests
+
+Do NOT install a new framework if one already exists for the verification type. Use what's there.
+
+### 2. Map verification type → framework
+
+| Verification type | Preferred framework (use whichever the project already has) |
+|---|---|
+| UI (web) | Playwright > Cypress > Selenium |
+| UI (mobile) | Maestro > Detox > Playwright (mobile emulation) |
+| API | project's integration test runner (Vitest / Jest / RSpec / pytest) with HTTP client (supertest / fetch / faraday) |
+| Database | integration test with real DB + migrations applied |
+| Auth | API or UI test asserting role-gated access (multi-role coverage) |
+| Security | regression test that reproduces the attack and asserts safe handling |
+| Performance | benchmark in the project's bench harness, asserting against the baseline captured in the verification |
+| Background Jobs | integration test that enqueues, drains the queue, and asserts terminal state |
+| Cache | integration test asserting hit/miss/invalidation behavior |
+| Configuration | integration test that loads config and asserts effect |
+| Email/Notification | test capturing outbound message via project's mailer test mode |
+| Observability | test asserting structured log/metric/trace emission |
+| Infrastructure | test or script asserting infra state (terraform plan diff, CDK snapshot test) |
+
+If the project lacks the preferred framework AND no acceptable substitute exists, escalate.
+
+### 3. Generate the test
+
+The generated test must:
+
+- **Encode the exact verification that passed**, not a paraphrase. Same URL, same input, same assertion target.
+- **Assert the observable outcome**, not implementation details. If the verification confirmed "user sees order confirmation", the test asserts that text/element is visible — not that a particular function was called.
+- **Be deterministic.** No reliance on timing, network flakiness, real third-party services, or mutable shared state. Use the project's existing fixtures, factories, mocks, and seed data conventions.
+- **Be self-contained.** Set up its own preconditions and clean up after itself, following the project's existing test isolation patterns.
+- **Be named after the behavior, not the bug/ticket.** `displays order confirmation after checkout` not `fixes PROJ-1234`.
+- **Live in the project's existing test directory** for that type. Do not create a parallel test tree.
+
+For Playwright UI tests specifically:
+- Use the project's existing `test` fixture / `page` fixture / auth helper if one exists
+- Prefer role/text selectors (`getByRole`, `getByText`) over CSS/XPath — they survive markup churn
+- Capture a trace or screenshot only if the project's existing tests do; do not invent a new artifact convention
+- Mirror the project's existing config for base URL, retries, and test isolation
+
+### 4. Run the test in isolation
+
+Run only the new test, using whatever per-test invocation the project supports:
+
+- Playwright: `npx playwright test path/to/new.spec.ts`
+- Vitest: `npx vitest run path/to/new.spec.ts`
+- Jest: `npx jest path/to/new.test.ts`
+- RSpec: `bundle exec rspec path/to/new_spec.rb`
+
+Confirm:
+1. The test PASSES against the current code (the change being shipped)
+2. The test would have FAILED before the change (sanity check by mentally reverting, or for bug fixes, by running against the pre-fix commit if cheap)
+
+For a bug fix, step 2 is mandatory and easy: check out the failing commit, run the new test, see it fail, return to the fix branch. This proves the test actually guards the regression.
+
+### 5. Wire it into the suite
+
+Confirm the test is picked up by the project's standard test command (the one CI runs). Run that command and confirm the count went up by exactly the number of tests added.
+
+If the test is in a directory the standard test command excludes (e.g., E2E suite that runs separately in CI), confirm the appropriate CI workflow includes it.
+
+### 6. Commit
+
+Commit the test in the same PR as the change it codifies, in its own atomic commit:
+
+- Build/feature: `test: add e2e for <behavior>`
+- Bug fix: `test: add regression test for <bug behavior>`
+- Performance: `test: add benchmark asserting <metric> <baseline>`
+
+The commit message body should reference the verification it encodes (one line linking to the proof artifact or the verification report section).
+
+### 7. Record evidence
+
+Append to the verification report (or PR description):
+
+```markdown
+### Codified Verifications
+
+| # | Verification | Framework | Test file | Status |
+|---|--------------|-----------|-----------|--------|
+| 1 | <description> | Playwright | `e2e/checkout.spec.ts::displays order confirmation after checkout` | PASS |
+```
+
+This evidence shows the verification is now guarded.
+
+## Output
+
+For each empirical verification passed in:
+
+- A new test file (or extension to an existing test file) committed to the PR
+- Confirmation that the test passes against the current branch
+- The test file path + test name recorded in the verification report
+
+If codification was skipped, an explicit reason recorded in the report (one of the skip conditions above) — never silent.
+
+## Rules
+
+- Never claim a verification is codified without running the new test and observing it pass
+- Never disable, skip, or `.skip()` the new test "temporarily" to make CI green — fix the test or fix the underlying change
+- Never use `expect(true).toBe(true)` placeholders or smoke-only assertions that don't actually exercise the verified behavior
+- Never reuse the verification's manual artifact (screenshot, curl output) as a "test" — those are evidence, not regression coverage
+- If the project lacks the appropriate framework, escalate via Human Action Packet rather than installing one mid-task without approval
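Step 4's bug-fix requirement (the new test must fail on the pre-fix commit and pass on the fix) can be sketched end to end in a throwaway repository. Everything below is hypothetical scaffolding: a `grep` stands in for the real regression test, and the file and message names are invented for the demo.

```shell
# Self-contained demo of "fails pre-fix, passes post-fix" in a temp repo.
set -euo pipefail
repo="$(mktemp -d)"
cd "$repo"
git init -q -b main

echo "buggy behavior" > app.txt
git add app.txt
git -c user.email=dev@example.test -c user.name=dev commit -qm "pre-fix"
pre_fix_sha="$(git rev-parse HEAD)"

echo "fixed behavior" > app.txt
git add app.txt
git -c user.email=dev@example.test -c user.name=dev commit -qm "fix"

run_test() { grep -q "fixed behavior" app.txt; }  # stand-in regression test

git checkout -q "$pre_fix_sha"   # the commit where the bug reproduces
if run_test; then
  echo "pre-fix: PASS (test guards nothing)"
else
  echo "pre-fix: FAIL (good -- the test catches the bug)"
fi

git checkout -q -                # back to the fix
run_test && echo "post-fix: PASS (good)"
```

With a real suite, `run_test` would be the per-test invocation from step 4 (e.g. `npx playwright test path/to/new.spec.ts`).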
package/plugins/lisa/skills/verification-lifecycle/SKILL.md
CHANGED

@@ -1,16 +1,26 @@
 ---
 name: verification-lifecycle
-description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
+description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
 ---

 # Verification Lifecycle

-This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
+This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), and loop.

 ## Verification Lifecycle

 Agents must follow this mandatory sequence for every change:

+1. Confirm Quality Gates
+2. Classify
+3. Check Tooling
+4. Fail Fast (if blocked)
+5. Plan
+6. Execute
+7. Codify (turn each passing verification into a regression test)
+8. Spec Conformance
+9. Loop
+
 ### 1. Confirm Quality Gates

 Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.

@@ -42,7 +52,17 @@ A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun

 After implementation, run the verification plan. Execute each verification type in order.

-### 7.
+### 7. Codify
+
+After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+The `codify-verification` skill maps the verification type to the appropriate framework (Playwright for browser/UI, integration test for API/DB/auth, benchmark for performance, etc.), generates a deterministic test that asserts the same observable outcome the verification just confirmed, runs it in isolation to confirm PASS, and commits it in the same PR as the change.
+
+Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
+
+A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
+
+### 8. Spec Conformance

 After empirical verification produces evidence, run spec conformance as a separate, mandatory step. Invoke the `spec-conformance` skill (or delegate to the `spec-conformance-specialist` agent) with the spec source — plan file, JIRA/Linear/GitHub key, or PRD.

@@ -56,9 +76,9 @@ Required outputs:

 `PARTIAL` or `DIVERGES` blocks completion. Fix the gaps (implement the miss, remove the creep, capture the missing evidence) and re-run both empirical verification AND spec conformance. Never skip this step — it catches failures that empirical verification by itself does not, such as a feature that works but wasn't asked for, or a spec item that was quietly dropped.

-###
+### 9. Loop

-If any verification or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass AND the spec-conformance verdict is `CONFORMS`. If a verification or conformance check is stuck after 3 attempts, escalate.
+If any verification, codification, or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass, all empirical verifications are codified, AND the spec-conformance verdict is `CONFORMS`. If a verification, codification, or conformance check is stuck after 3 attempts, escalate.

 ---

@@ -194,9 +214,10 @@ Agents must follow this sequence unless explicitly instructed otherwise:
 8. Implement the change.
 9. Execute verification plan — run the actual system and observe results.
 10. Collect proof artifacts.
-11.
-12.
-13.
+11. Codify — for each passing empirical verification, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
+12. Run spec conformance — build coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce verdict.
+13. Summarize what changed, what was verified, what was codified, conformance verdict, and remaining risk.
+14. Label the result with a verification level.

 ---

@@ -305,9 +326,10 @@ A task is done only when:

 - End user is identified
 - All applicable verification types are classified and executed
-- Verification lifecycle is completed (classify, check tooling, plan, execute, spec conformance, loop)
+- Verification lifecycle is completed (classify, check tooling, plan, execute, codify, spec conformance, loop)
 - Required verification surfaces and tooling surfaces are used or explicitly unavailable
 - Proof artifacts are captured
+- Every passing empirical verification is codified as a regression test (or has an explicit, documented skip reason from the allowed set)
 - Spec conformance verdict is `CONFORMS` (not `PARTIAL`, not `DIVERGES`)
 - Verification level is declared
 - Risks and gaps are documented
@@ -18,12 +18,13 @@ If you ARE already inside an agent team (e.g., a teammate invoked this skill via
 
 Execute the **Verify** flow as defined in the `intent-routing` rule (loaded via the lisa plugin). The flow includes:
 
-1. **
-2. **
-3. **
-4. **
-5. **
-6. **
+1. **Pre-flight: codification gate** — confirm that every passing local empirical verification on this branch was codified as a regression test (the Implement flow's codify step). If any verification has no committed test and no allowed skip reason (PR / Documentation / Deploy / Investigate-Only), invoke `codify-verification` now and amend the PR before shipping. A change cannot ship until its verifications are guarded.
+2. **Commit** any pending changes via `lisa:git-commit`
+3. **Push and PR** via `lisa:git-submit-pr`
+4. **Review loop** — handle CodeRabbit / human review comments via `lisa:pull-request-review`
+5. **Merge** when CI is green
+6. **Remote verification** — invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined). If remote verification surfaces a behavioral gap that the existing codified tests do not guard, invoke `codify-verification` to add coverage and open a follow-up PR.
+7. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter), including the list of codified tests added on this branch.
 
 The rule contains the canonical step sequence. Change it there, propagate everywhere.
 
@@ -35,6 +35,12 @@
       "hooks": [
         { "type": "command", "command": "command -v entire >/dev/null 2>&1 && entire hooks claude-code pre-task || true" }
       ]
+    },
+    {
+      "matcher": "Bash",
+      "hooks": [
+        { "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/block-no-verify.sh" }
+      ]
     }
   ],
   "Stop": [
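For context, the new matcher's hook receives the standard PreToolUse payload on stdin; the two fields `block-no-verify.sh` reads can be previewed with `jq` (payload shape inferred from that script; assumes `jq` is installed, which the hook itself already requires):

```shell
# Simulated PreToolUse payload for the Bash matcher registered above.
payload='{"tool_name":"Bash","tool_input":{"command":"git commit --no-verify"}}'
tool_name="$(printf '%s' "$payload" | jq -r '.tool_name // empty')"
command_str="$(printf '%s' "$payload" | jq -r '.tool_input.command // empty')"
echo "$tool_name"     # prints: Bash
echo "$command_str"   # prints: git commit --no-verify
```

The `// empty` fallback is what lets the hook exit quietly when a field is absent instead of failing on `null`.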
@@ -3,6 +3,7 @@ name: verification-specialist
 description: Verification specialist agent. Discovers project tooling and executes verification for all required types. Plans and executes empirical proof that work is done by running the actual system and observing results.
 skills:
   - verification-lifecycle
+  - codify-verification
   - jira-journey
   - spec-conformance
 ---
@@ -19,7 +20,7 @@ Read `.claude/rules/verification.md` at the start of every investigation for the
 
 ## Verification Process
 
-Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
+Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop.**
 
 ### 1. Confirm Quality Gates
 
@@ -76,6 +77,10 @@ Run the verification and capture output. Always include:
 
 If any verification fails, fix and re-verify. Do not declare done until all required types pass.
 
+### 6. Codify
+
+For every empirical verification that produced PASS evidence, invoke the `codify-verification` skill to encode it as a regression test in the appropriate framework (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.). The new test must run, pass, and be committed in the same PR. Skipping codification is allowed only for non-behavioral types (PR, Documentation, Deploy) and Investigate-Only spikes — for everything else, codify or escalate.
+
 ## Output Format
 
 ```
@@ -117,7 +122,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
 ## Rules
 
 - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
-- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+- Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop
+- Every passing empirical verification must be codified as a regression test via `codify-verification` before declaring done (skip allowed only for PR / Documentation / Deploy / Investigate-Only)
 - Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
 - Discover existing project scripts and tools before creating new ones
 - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
@@ -0,0 +1,6 @@
+---
+description: "Convert empirical verification into a regression test (Playwright for UI, integration test for API/DB, benchmark for performance, etc.) so the verified behavior doesn't regress. Mandatory step after verification passes — invoked from verification-lifecycle and from each Build/Fix/Improve flow."
+argument-hint: "<verification-type> <what-was-verified>"
+---
+
+Use the /lisa:codify-verification skill to encode the empirical verification that just passed as a regression test, in the appropriate framework for the verification type. $ARGUMENTS
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+# PreToolUse hook for Bash: blocks any command containing --no-verify.
+# --no-verify on git commit/push (and equivalents) bypasses pre-commit/pre-push
+# hooks that exist for a reason. The fix is to address the underlying issue,
+# not silence the check. See feedback_never_no_verify in user memory.
+#
+# Word-boundary match avoids false positives on flags like --no-verify-ssl,
+# --no-verify-host, etc.
+set -euo pipefail
+
+input="$(cat)"
+
+tool_name="$(printf '%s' "$input" | jq -r '.tool_name // empty')"
+if [ "$tool_name" != "Bash" ]; then
+  exit 0
+fi
+
+command_str="$(printf '%s' "$input" | jq -r '.tool_input.command // empty')"
+if [ -z "$command_str" ]; then
+  exit 0
+fi
+
+# Match --no-verify bounded by non-token characters (not alphanumeric, _, or -).
+# This catches all syntactic positions including subshells (e.g. `(git commit --no-verify)`)
+# while excluding longer flags like --no-verify-ssl, --no-verify-host, etc.
+if printf '%s' "$command_str" | grep -Eq '(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'; then
+  cat >&2 <<'EOF'
+Blocked: --no-verify bypasses pre-commit/pre-push hooks. Fix the underlying
+issue (lint error, failing test, formatting) or ask the user before bypassing.
+
+If the user has explicitly authorized the bypass for this specific command,
+re-run after they confirm.
+EOF
+  exit 2
+fi
+
+exit 0
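The word-boundary pattern is the subtle part of the script above; here is a quick sanity check of the same regex (a standalone re-implementation for illustration, since the real hook first extracts the command from the PreToolUse JSON):

```shell
# Re-implementation of the hook's match logic, for demonstration only.
pattern='(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'
blocked() { printf '%s' "$1" | grep -Eq "$pattern"; }

blocked 'git commit --no-verify -m wip'                # matches: plain flag
blocked '(git push --no-verify)'                       # matches: subshell position
! blocked 'curl --no-verify-ssl https://example.com'   # longer flag is allowed
! blocked 'git commit -m "safe change"'                # no flag at all
echo 'pattern behaves as documented'
```

Note that the exclusion works because `-` is inside the bracketed class, so `--no-verify-ssl` has a trailing `-` where the pattern demands a non-token character.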
@@ -119,7 +119,7 @@ Determine the work type and execute the matching variant:
 5. `builder` -- implement via TDD (acceptance criteria become tests)
 6. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 7. `verification-specialist` -- verify locally (run the software, observe behavior)
-8.
+8. `verification-specialist` -- invoke `codify-verification` skill per passing verification (Playwright for UI, integration test for API/DB/auth, etc.); commit each test in the same PR
 9. **Review sub-flow**
 10. `learner` -- capture discoveries
 
@@ -133,7 +133,7 @@ Determine the work type and execute the matching variant:
 6. `bug-fixer` -- implement fix via TDD (reproduction becomes failing test)
 7. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 8. `verification-specialist` -- verify locally (prove the bug is fixed)
-9.
+9. `verification-specialist` -- invoke `codify-verification` skill to encode the fix as a regression test (mandatory for bug fixes — the test must fail against the pre-fix commit and pass against the fix); commit in the same PR
 10. **Review sub-flow**
 11. `learner` -- capture discoveries
 
@@ -145,7 +145,7 @@ Determine the work type and execute the matching variant:
 4. `builder` -- implement improvements via TDD
 5. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
 6. `verification-specialist` -- measure again, prove improvement over baseline
-7.
+7. `verification-specialist` -- invoke `codify-verification` skill (typically a benchmark asserting against baseline for performance work, or a regression test for behavioral refactors); commit in the same PR
 8. **Review sub-flow**
 9. `learner` -- capture discoveries
 
@@ -156,7 +156,7 @@ Determine the work type and execute the matching variant:
 3. Recommend next action (Research, Plan, Implement, or escalate)
 4. `learner` -- capture discoveries
 
-Output: Code passing all quality gates + local empirical verification +
+Output: Code passing all quality gates + local empirical verification + codified regression test for each verification (except for spikes, which produce findings only, and non-behavioral verification types — PR / Documentation / Deploy — which carry their own proof).
 
 ### Verify
 
@@ -165,6 +165,7 @@ When: Code is ready to ship. All quality gates pass and local empirical verifica
 Gate:
 - Code must pass quality gates (lint, typecheck, tests)
 - Local empirical verification must be complete
+- Each passing local verification must be codified as a regression test (or carry a documented skip from the allowed set: PR / Documentation / Deploy / Investigate-Only). If verifications are not codified, return to the Implement flow's codify step before shipping
 - If quality gates fail, go back to **Implement**
 - If no code changes exist, there is nothing to verify
 
@@ -24,7 +24,7 @@ Verification is mandatory. Never skip it, defer it, or claim it was unnecessary.
 
 Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.
 
-After verifying a change empirically, encode that verification as automated
+After verifying a change empirically, encode that verification as an automated regression test via the `codify-verification` skill. The manual proof that something works must become a repeatable test that catches future regressions — Playwright for UI/browser flows, integration test for API/DB/auth, benchmark for performance, etc. Codification is mandatory for every empirical verification type except the inherently non-behavioral ones (PR, Documentation, Deploy) and Investigate-Only spikes. If codification is genuinely impossible, escalate via the Escalation Protocol — never silently skip.
 
 Every pull request must include step-by-step instructions for reviewers to independently replicate the verification. These are not test commands — they are the exact steps a human would follow to use the software and confirm the change works. If a reviewer cannot reproduce your verification from the PR description alone, the PR is incomplete.
 
@@ -101,7 +101,7 @@ Every change requires one or more verification types. Classify the change first,
 Verification happens at two stages in the workflow:
 
 - **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
-- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as
+- **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
 - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.
 
 Both levels use the same verification types table above. The difference is the environment, not the rigor.
@@ -0,0 +1,152 @@
+---
+name: codify-verification
+description: "Convert empirical verification into a regression test so it never has to be re-proven manually. Runs after a verification passes — picks the appropriate test framework for the verification type (Playwright for UI/browser, integration test for API/DB/auth, benchmark for performance, etc.), generates the test, wires it into the project's test runner, and confirms it executes. Mandatory step in the verification lifecycle and in the Build/Fix/Improve flows."
+allowed-tools: ["Bash", "Read", "Edit", "Write", "Glob", "Grep", "Skill"]
+---
+
+# Codify Verification: $ARGUMENTS
+
+Take the empirical verification that just passed and encode it as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+This skill is invoked from the verification lifecycle (between Execute and Spec Conformance) and from each work-type sub-flow (Build / Fix / Improve) after the local verification step.
+
+## When to invoke
+
+Invoke once per empirical verification that produced PASS evidence. If a single change had three verifications (UI flow, API endpoint, DB query), this skill runs three times — or once with the three verifications batched, but each must produce its own committed test.
+
+## When to skip
+
+Skip codification only for verification types whose proof is inherently non-behavioral:
+
+- **PR** — proof is the PR description itself
+- **Documentation** — proof is content review
+- **Deploy** — proof is deployment output and health endpoints (already covered by ops-verify-health)
+- **Investigate-Only spikes** — produce findings, not shipped code
+
+For every other verification type, codification is mandatory. If codification is not possible (e.g., the test framework doesn't exist and can't be installed in scope), escalate via the lifecycle's Escalation Protocol — do not silently skip.
+
+## Inputs
+
+The caller must provide:
+
+- The verification type (UI, API, Database, Auth, Security, Performance, Background Jobs, Cache, Configuration, Email/Notification, Observability, Infrastructure)
+- The exact steps that were performed (URL visited, request made, query run, etc.)
+- The expected outcome (status code, UI state, row count, log entry, etc.)
+- The proof artifact captured (screenshot path, response body, query output, log excerpt)
+
+If any of these are missing, ask the caller before generating a test — a test built on guesswork will not match the verification it claims to encode.
+
+## Process
+
+### 1. Discover existing test infrastructure
+
+Before creating anything new, find what the project already has. Use the Tool Discovery Process from `verification-lifecycle`. Specifically check for:
+
+- **Browser/E2E**: `playwright.config.*`, `cypress.config.*`, `e2e/` directory, `tests/e2e/`, Playwright/Cypress in `package.json` devDependencies
+- **API/integration**: `tests/integration/`, `spec/`, `test/integration/`, supertest/fetch helpers, Vitest/Jest integration configs
+- **Database**: integration test setup with migrations, factory files, seed scripts
+- **Performance**: existing benchmark suite (`benchmarks/`, `bench/`), `vitest bench`, k6 scripts
+- **Mobile (RN/Expo)**: Detox config, Maestro flows
+- **Backend jobs**: existing job-test harness, queue integration tests
+
+Do NOT install a new framework if one already exists for the verification type. Use what's there.
+
+### 2. Map verification type → framework
+
+| Verification type | Preferred framework (use whichever the project already has) |
+|---|---|
+| UI (web) | Playwright > Cypress > Selenium |
+| UI (mobile) | Maestro > Detox > Playwright (mobile emulation) |
+| API | project's integration test runner (Vitest / Jest / RSpec / pytest) with HTTP client (supertest / fetch / faraday) |
+| Database | integration test with real DB + migrations applied |
+| Auth | API or UI test asserting role-gated access (multi-role coverage) |
+| Security | regression test that reproduces the attack and asserts safe handling |
+| Performance | benchmark in the project's bench harness, asserting against the baseline captured in the verification |
+| Background Jobs | integration test that enqueues, drains the queue, and asserts terminal state |
+| Cache | integration test asserting hit/miss/invalidation behavior |
+| Configuration | integration test that loads config and asserts effect |
+| Email/Notification | test capturing outbound message via project's mailer test mode |
+| Observability | test asserting structured log/metric/trace emission |
+| Infrastructure | test or script asserting infra state (terraform plan diff, CDK snapshot test) |
+
+If the project lacks the preferred framework AND no acceptable substitute exists, escalate.
+
+### 3. Generate the test
+
+The generated test must:
+
+- **Encode the exact verification that passed**, not a paraphrase. Same URL, same input, same assertion target.
+- **Assert the observable outcome**, not implementation details. If the verification confirmed "user sees order confirmation", the test asserts that text/element is visible — not that a particular function was called.
+- **Be deterministic.** No reliance on timing, network flakiness, real third-party services, or mutable shared state. Use the project's existing fixtures, factories, mocks, and seed data conventions.
+- **Be self-contained.** Set up its own preconditions and clean up after itself, following the project's existing test isolation patterns.
+- **Be named after the behavior, not the bug/ticket.** `displays order confirmation after checkout` not `fixes PROJ-1234`.
+- **Live in the project's existing test directory** for that type. Do not create a parallel test tree.
+
+For Playwright UI tests specifically:
+- Use the project's existing `test` fixture / `page` fixture / auth helper if one exists
+- Prefer role/text selectors (`getByRole`, `getByText`) over CSS/XPath — they survive markup churn
+- Capture a trace or screenshot only if the project's existing tests do; do not invent a new artifact convention
+- Mirror the project's existing config for base URL, retries, and test isolation
+
+### 4. Run the test in isolation
+
+Run only the new test, using whatever per-test invocation the project supports:
+
+- Playwright: `npx playwright test path/to/new.spec.ts`
+- Vitest: `npx vitest run path/to/new.spec.ts`
+- Jest: `npx jest path/to/new.test.ts`
+- RSpec: `bundle exec rspec path/to/new_spec.rb`
+
+Confirm:
+1. The test PASSES against the current code (the change being shipped)
+2. The test would have FAILED before the change (sanity check by mentally reverting, or for bug fixes, by running against the pre-fix commit if cheap)
+
+For a bug fix, step 2 is mandatory and easy: check out the failing commit, run the new test, see it fail, return to the fix branch. This proves the test actually guards the regression.
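The pre-fix check described above can be sketched end to end in a throwaway repository; the file name, commit messages, and the grep-based stand-in "test" below are illustrative only, and a real project would run its actual test command instead:

```shell
# Demo of "the test must fail against the pre-fix commit" in a scratch repo.
tmp="$(mktemp -d)" && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo

echo 'greet() { echo "helo"; }' > app.sh          # buggy version
git add app.sh && git commit -qm 'introduce bug'
pre_fix="$(git rev-parse HEAD)"

echo 'greet() { echo "hello"; }' > app.sh         # fixed version
git commit -qam 'fix greeting typo'

run_test() { grep -q 'hello' app.sh; }            # stand-in regression test

run_test && echo 'test passes on the fix'
git checkout -q "$pre_fix" -- app.sh              # rewind just the file
run_test || echo 'test fails pre-fix (it guards the regression)'
git checkout -q HEAD -- app.sh                    # restore the fixed version
```

The same shape applies with a real runner: swap `run_test` for `npx playwright test path/to/new.spec.ts` or whatever per-test invocation the project supports.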
105
|
+
|
|
106
|
+
### 5. Wire it into the suite
|
|
107
|
+
|
|
108
|
+
Confirm the test is picked up by the project's standard test command (the one CI runs). Run that command and confirm the count went up by exactly the number of tests added.
|
|
109
|
+
|
|
110
|
+
If the test is in a directory the standard test command excludes (e.g., E2E suite that runs separately in CI), confirm the appropriate CI workflow includes it.
|
|
111
|
+
|
|
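The count check above is simple arithmetic; a minimal sketch, assuming the before/after totals have been read off the test runner's summary line (the numbers here are placeholders):

```shell
# Placeholder counts; in practice they come from the runner's summary output.
before=42   # total tests reported before adding the new spec
after=43    # total tests reported after
added=1     # number of tests this codification added

if [ "$((after - before))" -eq "$added" ]; then
  echo 'suite picked up the new test'
else
  echo 'count mismatch: the new test is not being collected' >&2
fi
```

A mismatch usually means the new file's name or location does not match the runner's include pattern.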
+### 6. Commit
+
+Commit the test in the same PR as the change it codifies, in its own atomic commit:
+
+- Build/feature: `test: add e2e for <behavior>`
+- Bug fix: `test: add regression test for <bug behavior>`
+- Performance: `test: add benchmark asserting <metric> <baseline>`
+
+The commit message body should reference the verification it encodes (one line linking to the proof artifact or the verification report section).
+
+### 7. Record evidence
+
+Append to the verification report (or PR description):
+
+```markdown
+### Codified Verifications
+
+| # | Verification | Framework | Test file | Status |
+|---|--------------|-----------|-----------|--------|
+| 1 | <description> | Playwright | `e2e/checkout.spec.ts::displays order confirmation after checkout` | PASS |
+```
+
+This evidence shows the verification is now guarded.
+
+## Output
+
+For each empirical verification passed in:
+
+- A new test file (or extension to an existing test file) committed to the PR
+- Confirmation that the test passes against the current branch
+- The test file path + test name recorded in the verification report
+
+If codification was skipped, an explicit reason recorded in the report (one of the skip conditions above) — never silent.
+
+## Rules
+
+- Never claim a verification is codified without running the new test and observing it pass
+- Never disable, skip, or `.skip()` the new test "temporarily" to make CI green — fix the test or fix the underlying change
+- Never use `expect(true).toBe(true)` placeholders or smoke-only assertions that don't actually exercise the verified behavior
+- Never reuse the verification's manual artifact (screenshot, curl output) as a "test" — those are evidence, not regression coverage
+- If the project lacks the appropriate framework, escalate via Human Action Packet rather than installing one mid-task without approval
@@ -1,16 +1,26 @@
 ---
 name: verification-lifecycle
-description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
+description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
 ---
 
 # Verification Lifecycle
 
-This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
+This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), and loop.
 
 ## Verification Lifecycle
 
 Agents must follow this mandatory sequence for every change:
 
+1. Confirm Quality Gates
+2. Classify
+3. Check Tooling
+4. Fail Fast (if blocked)
+5. Plan
+6. Execute
+7. Codify (turn each passing verification into a regression test)
+8. Spec Conformance
+9. Loop
+
 ### 1. Confirm Quality Gates
 
 Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.
@@ -42,7 +52,17 @@ A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun
 
 After implementation, run the verification plan. Execute each verification type in order.
 
-### 7.
+### 7. Codify
+
+After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+The `codify-verification` skill maps the verification type to the appropriate framework (Playwright for browser/UI, integration test for API/DB/auth, benchmark for performance, etc.), generates a deterministic test that asserts the same observable outcome the verification just confirmed, runs it in isolation to confirm PASS, and commits it in the same PR as the change.
+
+Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
+
+A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
+
+### 8. Spec Conformance
 
 After empirical verification produces evidence, run spec conformance as a separate, mandatory step. Invoke the `spec-conformance` skill (or delegate to the `spec-conformance-specialist` agent) with the spec source — plan file, JIRA/Linear/GitHub key, or PRD.
 
@@ -56,9 +76,9 @@ Required outputs:
 
 `PARTIAL` or `DIVERGES` blocks completion. Fix the gaps (implement the miss, remove the creep, capture the missing evidence) and re-run both empirical verification AND spec conformance. Never skip this step — it catches failures that empirical verification by itself does not, such as a feature that works but wasn't asked for, or a spec item that was quietly dropped.
 
-### 
+### 9. Loop
 
-If any verification or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass AND the spec-conformance verdict is `CONFORMS`. If a verification or conformance check is stuck after 3 attempts, escalate.
+If any verification, codification, or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass, all empirical verifications are codified, AND the spec-conformance verdict is `CONFORMS`. If a verification, codification, or conformance check is stuck after 3 attempts, escalate.
 
 ---
 
@@ -194,9 +214,10 @@ Agents must follow this sequence unless explicitly instructed otherwise:
 8. Implement the change.
 9. Execute verification plan — run the actual system and observe results.
 10. Collect proof artifacts.
-11.
-12.
-13.
+11. Codify — for each passing empirical verification, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
+12. Run spec conformance — build a coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce a verdict.
+13. Summarize what changed, what was verified, what was codified, conformance verdict, and remaining risk.
+14. Label the result with a verification level.
 
 ---
 
@@ -305,9 +326,10 @@ A task is done only when:
 
 - End user is identified
 - All applicable verification types are classified and executed
-- Verification lifecycle is completed (classify, check tooling, plan, execute, spec conformance, loop)
+- Verification lifecycle is completed (classify, check tooling, plan, execute, codify, spec conformance, loop)
 - Required verification surfaces and tooling surfaces are used or explicitly unavailable
 - Proof artifacts are captured
+- Every passing empirical verification is codified as a regression test (or has an explicit, documented skip reason from the allowed set)
 - Spec conformance verdict is `CONFORMS` (not `PARTIAL`, not `DIVERGES`)
 - Verification level is declared
 - Risks and gaps are documented
@@ -18,12 +18,13 @@ If you ARE already inside an agent team (e.g., a teammate invoked this skill via
 
 Execute the **Verify** flow as defined in the `intent-routing` rule (loaded via the lisa plugin). The flow includes:
 
-1. **
-2. **
-3. **
-4. **
-5. **
-6. **
+1. **Pre-flight: codification gate** — confirm that every passing local empirical verification on this branch was codified as a regression test (the Implement flow's codify step). If any verification has no committed test and no allowed skip reason (PR / Documentation / Deploy / Investigate-Only), invoke `codify-verification` now and amend the PR before shipping. A change cannot ship until its verifications are guarded.
+2. **Commit** any pending changes via `lisa:git-commit`
+3. **Push and PR** via `lisa:git-submit-pr`
+4. **Review loop** — handle CodeRabbit / human review comments via `lisa:pull-request-review`
+5. **Merge** when CI is green
+6. **Remote verification** — invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined). If remote verification surfaces a behavioral gap that the existing codified tests do not guard, invoke `codify-verification` to add coverage and open a follow-up PR.
+7. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter), including the list of codified tests added on this branch.
 
 The rule contains the canonical step sequence. Change it there, propagate everywhere.
 