@codyswann/lisa 2.6.4 → 2.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -79,7 +79,7 @@
  "lodash": ">=4.18.1"
  },
  "name": "@codyswann/lisa",
- "version": "2.6.4",
+ "version": "2.7.0",
  "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
  "main": "dist/index.js",
  "exports": {
@@ -1,6 +1,6 @@
  {
  "name": "lisa",
- "version": "2.6.4",
+ "version": "2.7.0",
  "description": "Universal governance — agents, skills, commands, hooks, and rules for all projects",
  "author": {
  "name": "Cody Swann"
@@ -46,6 +46,15 @@
  "command": "command -v entire >/dev/null 2>&1 && entire hooks claude-code pre-task || true"
  }
  ]
+ },
+ {
+ "matcher": "Bash",
+ "hooks": [
+ {
+ "type": "command",
+ "command": "${CLAUDE_PLUGIN_ROOT}/hooks/block-no-verify.sh"
+ }
+ ]
  }
  ],
  "Stop": [
@@ -3,6 +3,7 @@ name: verification-specialist
  description: Verification specialist agent. Discovers project tooling and executes verification for all required types. Plans and executes empirical proof that work is done by running the actual system and observing results.
  skills:
  - verification-lifecycle
+ - codify-verification
  - jira-journey
  - spec-conformance
  ---
@@ -19,7 +20,7 @@ Read `.claude/rules/verification.md` at the start of every investigation for the

  ## Verification Process

- Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
+ Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop.**

  ### 1. Confirm Quality Gates

@@ -76,6 +77,10 @@ Run the verification and capture output. Always include:

  If any verification fails, fix and re-verify. Do not declare done until all required types pass.

+ ### 6. Codify
+
+ For every empirical verification that produced PASS evidence, invoke the `codify-verification` skill to encode it as a regression test in the appropriate framework (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.). The new test must run, pass, and be committed in the same PR. Skipping codification is allowed only for non-behavioral types (PR, Documentation, Deploy) and Investigate-Only spikes — for everything else, codify or escalate.
+
  ## Output Format

  ```
@@ -117,7 +122,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
  ## Rules

  - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
- - Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+ - Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop
+ - Every passing empirical verification must be codified as a regression test via `codify-verification` before declaring done (skip allowed only for PR / Documentation / Deploy / Investigate-Only)
  - Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
  - Discover existing project scripts and tools before creating new ones
  - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
@@ -0,0 +1,6 @@
+ ---
+ description: "Convert empirical verification into a regression test (Playwright for UI, integration test for API/DB, benchmark for performance, etc.) so it doesn't regress. Mandatory step after verification passes — invoked from verification-lifecycle and from each Build/Fix/Improve flow."
+ argument-hint: "<verification-type> <what-was-verified>"
+ ---
+
+ Use the /lisa:codify-verification skill to encode the empirical verification that just passed as a regression test, in the appropriate framework for the verification type. $ARGUMENTS
@@ -0,0 +1,37 @@
+ #!/usr/bin/env bash
+ # PreToolUse hook for Bash: blocks any command containing --no-verify.
+ # --no-verify on git commit/push (and equivalents) bypasses pre-commit/pre-push
+ # hooks that exist for a reason. The fix is to address the underlying issue,
+ # not silence the check. See feedback_never_no_verify in user memory.
+ #
+ # Word-boundary match avoids false positives on flags like --no-verify-ssl,
+ # --no-verify-host, etc.
+ set -euo pipefail
+
+ input="$(cat)"
+
+ tool_name="$(printf '%s' "$input" | jq -r '.tool_name // empty')"
+ if [ "$tool_name" != "Bash" ]; then
+ exit 0
+ fi
+
+ command_str="$(printf '%s' "$input" | jq -r '.tool_input.command // empty')"
+ if [ -z "$command_str" ]; then
+ exit 0
+ fi
+
+ # Match --no-verify bounded by non-token characters (not alphanumeric, _, or -).
+ # This catches all syntactic positions including subshells (e.g. `(git commit --no-verify)`)
+ # while excluding longer flags like --no-verify-ssl, --no-verify-host, etc.
+ if printf '%s' "$command_str" | grep -Eq '(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'; then
+ cat >&2 <<'EOF'
+ Blocked: --no-verify bypasses pre-commit/pre-push hooks. Fix the underlying
+ issue (lint error, failing test, formatting) or ask the user before bypassing.
+
+ If the user has explicitly authorized the bypass for this specific command,
+ re-run after they confirm.
+ EOF
+ exit 2
+ fi
+
+ exit 0
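The word-boundary behavior the script's comments describe can be exercised directly. This sketch reuses the exact pattern from `block-no-verify.sh`; the sample commands are illustrative only:

```shell
# The exact pattern from block-no-verify.sh
pattern='(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'

# Report whether a command string would be blocked by the hook's grep
check() { printf '%s' "$1" | grep -Eq "$pattern" && echo blocked || echo allowed; }

check 'git commit --no-verify'          # blocked
check '(git push --no-verify)'          # blocked: matches inside a subshell too
check 'curl --no-verify-ssl https://x'  # allowed: longer flag, no false positive
check 'git commit -m "done"'            # allowed
```

The negated class `[^[:alnum:]_-]` is what keeps `--no-verify-ssl` from matching: the trailing `-` is a token character, so the boundary test fails.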
@@ -119,7 +119,7 @@ Determine the work type and execute the matching variant:
  5. `builder` -- implement via TDD (acceptance criteria become tests)
  6. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  7. `verification-specialist` -- verify locally (run the software, observe behavior)
- 8. Write e2e test encoding the verification
+ 8. `verification-specialist` -- invoke `codify-verification` skill per passing verification (Playwright for UI, integration test for API/DB/auth, etc.); commit each test in the same PR
  9. **Review sub-flow**
  10. `learner` -- capture discoveries

@@ -133,7 +133,7 @@ Determine the work type and execute the matching variant:
  6. `bug-fixer` -- implement fix via TDD (reproduction becomes failing test)
  7. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  8. `verification-specialist` -- verify locally (prove the bug is fixed)
- 9. Write e2e test encoding the verification
+ 9. `verification-specialist` -- invoke `codify-verification` skill to encode the fix as a regression test (mandatory for bug fixes — the test must fail against the pre-fix commit and pass against the fix); commit in the same PR
  10. **Review sub-flow**
  11. `learner` -- capture discoveries

@@ -145,7 +145,7 @@ Determine the work type and execute the matching variant:
  4. `builder` -- implement improvements via TDD
  5. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  6. `verification-specialist` -- measure again, prove improvement over baseline
- 7. Write e2e test encoding the verification (if applicable)
+ 7. `verification-specialist` -- invoke `codify-verification` skill (typically a benchmark asserting against baseline for performance work, or a regression test for behavioral refactors); commit in the same PR
  8. **Review sub-flow**
  9. `learner` -- capture discoveries

@@ -156,7 +156,7 @@ Determine the work type and execute the matching variant:
  3. Recommend next action (Research, Plan, Implement, or escalate)
  4. `learner` -- capture discoveries

- Output: Code passing all quality gates + local empirical verification + e2e test (except for spikes, which produce findings only).
+ Output: Code passing all quality gates + local empirical verification + codified regression test for each verification (except for spikes, which produce findings only, and non-behavioral verification types — PR / Documentation / Deploy — which carry their own proof).

  ### Verify

@@ -165,6 +165,7 @@ When: Code is ready to ship. All quality gates pass and local empirical verifica
  Gate:
  - Code must pass quality gates (lint, typecheck, tests)
  - Local empirical verification must be complete
+ - Each passing local verification must be codified as a regression test (or carry a documented skip from the allowed set: PR / Documentation / Deploy / Investigate-Only). If verifications are not codified, return to the Implement flow's codify step before shipping
  - If quality gates fail, go back to **Implement**
  - If no code changes exist, there is nothing to verify

@@ -24,7 +24,7 @@ Verification is mandatory. Never skip it, defer it, or claim it was unnecessary.

  Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.

- After verifying a change empirically, encode that verification as automated tests. The manual proof that something works should become a repeatable regression test that catches future regressions. Every verification should answer: "How do I turn this into a test?"
+ After verifying a change empirically, encode that verification as an automated regression test via the `codify-verification` skill. The manual proof that something works must become a repeatable test that catches future regressions — Playwright for UI/browser flows, integration test for API/DB/auth, benchmark for performance, etc. Codification is mandatory for every empirical verification type except the inherently non-behavioral ones (PR, Documentation, Deploy) and Investigate-Only spikes. If codification is genuinely impossible, escalate via the Escalation Protocol — never silently skip.

  Every pull request must include step-by-step instructions for reviewers to independently replicate the verification. These are not test commands — they are the exact steps a human would follow to use the software and confirm the change works. If a reviewer cannot reproduce your verification from the PR description alone, the PR is incomplete.

@@ -101,7 +101,7 @@ Every change requires one or more verification types. Classify the change first,
  Verification happens at two stages in the workflow:

  - **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
- - **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as an e2e test.
+ - **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
  - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.

  Both levels use the same verification types table above. The difference is the environment, not the rigor.
@@ -0,0 +1,152 @@
+ ---
+ name: codify-verification
+ description: "Convert empirical verification into a regression test so it never has to be re-proven manually. Runs after a verification passes — picks the appropriate test framework for the verification type (Playwright for UI/browser, integration test for API/DB/auth, benchmark for performance, etc.), generates the test, wires it into the project's test runner, and confirms it executes. Mandatory step in the verification lifecycle and in the Build/Fix/Improve flows."
+ allowed-tools: ["Bash", "Read", "Edit", "Write", "Glob", "Grep", "Skill"]
+ ---
+
+ # Codify Verification: $ARGUMENTS
+
+ Take the empirical verification that just passed and encode it as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+ This skill is invoked from the verification lifecycle (between Execute and Spec Conformance) and from each work-type sub-flow (Build / Fix / Improve) after the local verification step.
+
+ ## When to invoke
+
+ Invoke once per empirical verification that produced PASS evidence. If a single change had three verifications (UI flow, API endpoint, DB query), this skill runs three times — or once with the three verifications batched, but each must produce its own committed test.
+
+ ## When to skip
+
+ Skip codification only for verification types whose proof is inherently non-behavioral:
+
+ - **PR** — proof is the PR description itself
+ - **Documentation** — proof is content review
+ - **Deploy** — proof is deployment output and health endpoints (already covered by ops-verify-health)
+ - **Investigate-Only spikes** — produce findings, not shipped code
+
+ For every other verification type, codification is mandatory. If the codification is not possible (e.g., the test framework doesn't exist and can't be installed in scope), escalate via the lifecycle's Escalation Protocol — do not silently skip.
+
+ ## Inputs
+
+ The caller must provide:
+
+ - The verification type (UI, API, Database, Auth, Security, Performance, Background Jobs, Cache, Configuration, Email/Notification, Observability, Infrastructure)
+ - The exact steps that were performed (URL visited, request made, query run, etc.)
+ - The expected outcome (status code, UI state, row count, log entry, etc.)
+ - The proof artifact captured (screenshot path, response body, query output, log excerpt)
+
+ If any of these are missing, ask the caller before generating a test — a test built on guesswork will not match the verification it claims to encode.
+
+ ## Process
+
+ ### 1. Discover existing test infrastructure
+
+ Before creating anything new, find what the project already has. Use the Tool Discovery Process from `verification-lifecycle`. Specifically check for:
+
+ - **Browser/E2E**: `playwright.config.*`, `cypress.config.*`, `e2e/` directory, `tests/e2e/`, Playwright/Cypress in `package.json` devDependencies
+ - **API/integration**: `tests/integration/`, `spec/`, `test/integration/`, supertest/fetch helpers, Vitest/Jest integration configs
+ - **Database**: integration test setup with migrations, factory files, seed scripts
+ - **Performance**: existing benchmark suite (`benchmarks/`, `bench/`), `vitest bench`, k6 scripts
+ - **Mobile (RN/Expo)**: Detox config, Maestro flows
+ - **Backend jobs**: existing job-test harness, queue integration tests
+
+ Do NOT install a new framework if one already exists for the verification type. Use what's there.
+
+ ### 2. Map verification type → framework
+
+ | Verification type | Preferred framework (use whichever the project already has) |
+ |---|---|
+ | UI (web) | Playwright > Cypress > Selenium |
+ | UI (mobile) | Maestro > Detox > Playwright (mobile emulation) |
+ | API | project's integration test runner (Vitest / Jest / RSpec / pytest) with HTTP client (supertest / fetch / faraday) |
+ | Database | integration test with real DB + migrations applied |
+ | Auth | API or UI test asserting role-gated access (multi-role coverage) |
+ | Security | regression test that reproduces the attack and asserts safe handling |
+ | Performance | benchmark in the project's bench harness, asserting against the baseline captured in the verification |
+ | Background Jobs | integration test that enqueues, drains the queue, and asserts terminal state |
+ | Cache | integration test asserting hit/miss/invalidation behavior |
+ | Configuration | integration test that loads config and asserts effect |
+ | Email/Notification | test capturing outbound message via project's mailer test mode |
+ | Observability | test asserting structured log/metric/trace emission |
+ | Infrastructure | test or script asserting infra state (terraform plan diff, CDK snapshot test) |
+
+ If the project lacks the preferred framework AND no acceptable substitute exists, escalate.
+
+ ### 3. Generate the test
+
+ The generated test must:
+
+ - **Encode the exact verification that passed**, not a paraphrase. Same URL, same input, same assertion target.
+ - **Assert the observable outcome**, not implementation details. If the verification confirmed "user sees order confirmation", the test asserts that text/element is visible — not that a particular function was called.
+ - **Be deterministic.** No reliance on timing, network flakiness, real third-party services, or mutable shared state. Use the project's existing fixtures, factories, mocks, and seed data conventions.
+ - **Be self-contained.** Set up its own preconditions and clean up after itself, following the project's existing test isolation patterns.
+ - **Be named after the behavior, not the bug/ticket.** `displays order confirmation after checkout` not `fixes PROJ-1234`.
+ - **Live in the project's existing test directory** for that type. Do not create a parallel test tree.
+
+ For Playwright UI tests specifically:
+ - Use the project's existing `test` fixture / `page` fixture / auth helper if one exists
+ - Prefer role/text selectors (`getByRole`, `getByText`) over CSS/XPath — they survive markup churn
+ - Capture a trace or screenshot only if the project's existing tests do; do not invent a new artifact convention
+ - Mirror the project's existing config for base URL, retries, and test isolation
+
+ ### 4. Run the test in isolation
+
+ Run only the new test, using whatever per-test invocation the project supports:
+
+ - Playwright: `npx playwright test path/to/new.spec.ts`
+ - Vitest: `npx vitest run path/to/new.spec.ts`
+ - Jest: `npx jest path/to/new.test.ts`
+ - RSpec: `bundle exec rspec path/to/new_spec.rb`
+
+ Confirm:
+ 1. The test PASSES against the current code (the change being shipped)
+ 2. The test would have FAILED before the change (sanity check by mentally reverting, or for bug fixes, by running against the pre-fix commit if cheap)
+
+ For a bug fix, step 2 is mandatory and easy: check out the failing commit, run the new test, see it fail, return to the fix branch. This proves the test actually guards the regression.
+
+ ### 5. Wire it into the suite
+
+ Confirm the test is picked up by the project's standard test command (the one CI runs). Run that command and confirm the count went up by exactly the number of tests added.
+
+ If the test is in a directory the standard test command excludes (e.g., E2E suite that runs separately in CI), confirm the appropriate CI workflow includes it.
+
+ ### 6. Commit
+
+ Commit the test in the same PR as the change it codifies, in its own atomic commit:
+
+ - Build/feature: `test: add e2e for <behavior>`
+ - Bug fix: `test: add regression test for <bug behavior>`
+ - Performance: `test: add benchmark asserting <metric> <baseline>`
+
+ The commit message body should reference the verification it encodes (one line linking to the proof artifact or the verification report section).
+
+ ### 7. Record evidence
+
+ Append to the verification report (or PR description):
+
+ ```markdown
+ ### Codified Verifications
+
+ | # | Verification | Framework | Test file | Status |
+ |---|--------------|-----------|-----------|--------|
+ | 1 | <description> | Playwright | `e2e/checkout.spec.ts::displays order confirmation after checkout` | PASS |
+ ```
+
+ This evidence shows the verification is now guarded.
+
+ ## Output
+
+ For each empirical verification passed in:
+
+ - A new test file (or extension to an existing test file) committed to the PR
+ - Confirmation that the test passes against the current branch
+ - The test file path + test name recorded in the verification report
+
+ If codification was skipped, an explicit reason recorded in the report (one of the skip conditions above) — never silent.
+
+ ## Rules
+
+ - Never claim a verification is codified without running the new test and observing it pass
+ - Never disable, skip, or `.skip()` the new test "temporarily" to make CI green — fix the test or fix the underlying change
+ - Never use `expect(true).toBe(true)` placeholders or smoke-only assertions that don't actually exercise the verified behavior
+ - Never reuse the verification's manual artifact (screenshot, curl output) as a "test" — those are evidence, not regression coverage
+ - If the project lacks the appropriate framework, escalate via Human Action Packet rather than installing one mid-task without approval
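The step-5 count check ("confirm the count went up by exactly the number of tests added") can be sketched in shell. The summary-line format here is an assumption (Jest/Vitest-style `N total`); real runners differ, and `count_tests` is a hypothetical helper:

```shell
# Hypothetical helper: pull the total test count out of a runner's summary line
count_tests() { grep -Eo '[0-9]+ total' | head -n1 | grep -Eo '[0-9]+'; }

# Simulated runner output before and after committing the codified test
before=$(printf 'Tests: 41 passed, 41 total\n' | count_tests)
after=$(printf 'Tests: 42 passed, 42 total\n' | count_tests)

echo $((after - before))  # 1 -- exactly the number of tests added
```

In practice you would pipe the real CI test command's output through the helper twice (base branch vs. PR branch) and assert the delta equals the number of tests the skill generated.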
@@ -1,16 +1,26 @@
  ---
  name: verification-lifecycle
- description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
+ description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
  ---

  # Verification Lifecycle

- This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
+ This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), and loop.

  ## Verification Lifecycle

  Agents must follow this mandatory sequence for every change:

+ 1. Confirm Quality Gates
+ 2. Classify
+ 3. Check Tooling
+ 4. Fail Fast (if blocked)
+ 5. Plan
+ 6. Execute
+ 7. Codify (turn each passing verification into a regression test)
+ 8. Spec Conformance
+ 9. Loop
+
  ### 1. Confirm Quality Gates

  Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.
@@ -42,7 +52,17 @@ A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun

  After implementation, run the verification plan. Execute each verification type in order.

- ### 7. Spec Conformance
+ ### 7. Codify
+
+ After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+ The `codify-verification` skill maps the verification type to the appropriate framework (Playwright for browser/UI, integration test for API/DB/auth, benchmark for performance, etc.), generates a deterministic test that asserts the same observable outcome the verification just confirmed, runs it in isolation to confirm PASS, and commits it in the same PR as the change.
+
+ Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
+
+ A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
+
+ ### 8. Spec Conformance

  After empirical verification produces evidence, run spec conformance as a separate, mandatory step. Invoke the `spec-conformance` skill (or delegate to the `spec-conformance-specialist` agent) with the spec source — plan file, JIRA/Linear/GitHub key, or PRD.

@@ -56,9 +76,9 @@ Required outputs:

  `PARTIAL` or `DIVERGES` blocks completion. Fix the gaps (implement the miss, remove the creep, capture the missing evidence) and re-run both empirical verification AND spec conformance. Never skip this step — it catches failures that empirical verification by itself does not, such as a feature that works but wasn't asked for, or a spec item that was quietly dropped.

- ### 8. Loop
+ ### 9. Loop

- If any verification or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass AND the spec-conformance verdict is `CONFORMS`. If a verification or conformance check is stuck after 3 attempts, escalate.
+ If any verification, codification, or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass, all empirical verifications are codified, AND the spec-conformance verdict is `CONFORMS`. If a verification, codification, or conformance check is stuck after 3 attempts, escalate.

  ---

@@ -194,9 +214,10 @@ Agents must follow this sequence unless explicitly instructed otherwise:
  8. Implement the change.
  9. Execute verification plan — run the actual system and observe results.
  10. Collect proof artifacts.
- 11. Run spec conformance build coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce verdict.
- 12. Summarize what changed, what was verified, conformance verdict, and remaining risk.
- 13. Label the result with a verification level.
+ 11. Codify for each passing empirical verification, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
+ 12. Run spec conformance build coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce verdict.
+ 13. Summarize what changed, what was verified, what was codified, conformance verdict, and remaining risk.
+ 14. Label the result with a verification level.

  ---

@@ -305,9 +326,10 @@ A task is done only when:

  - End user is identified
  - All applicable verification types are classified and executed
- - Verification lifecycle is completed (classify, check tooling, plan, execute, spec conformance, loop)
+ - Verification lifecycle is completed (classify, check tooling, plan, execute, codify, spec conformance, loop)
  - Required verification surfaces and tooling surfaces are used or explicitly unavailable
  - Proof artifacts are captured
+ - Every passing empirical verification is codified as a regression test (or has an explicit, documented skip reason from the allowed set)
  - Spec conformance verdict is `CONFORMS` (not `PARTIAL`, not `DIVERGES`)
  - Verification level is declared
  - Risks and gaps are documented
@@ -18,12 +18,13 @@ If you ARE already inside an agent team (e.g., a teammate invoked this skill via

  Execute the **Verify** flow as defined in the `intent-routing` rule (loaded via the lisa plugin). The flow includes:

- 1. **Commit** any pending changes via `lisa:git-commit`
- 2. **Push and PR** via `lisa:git-submit-pr`
- 3. **Review loop** — handle CodeRabbit / human review comments via `lisa:pull-request-review`
- 4. **Merge** when CI is green
- 5. **Remote verification** invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined)
- 6. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter)
+ 1. **Pre-flight: codification gate** — confirm that every passing local empirical verification on this branch was codified as a regression test (the Implement flow's codify step). If any verification has no committed test and no allowed skip reason (PR / Documentation / Deploy / Investigate-Only), invoke `codify-verification` now and amend the PR before shipping. A change cannot ship until its verifications are guarded.
+ 2. **Commit** any pending changes via `lisa:git-commit`
+ 3. **Push and PR** via `lisa:git-submit-pr`
+ 4. **Review loop** handle CodeRabbit / human review comments via `lisa:pull-request-review`
+ 5. **Merge** when CI is green
+ 6. **Remote verification** — invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined). If remote verification surfaces a behavioral gap that the existing codified tests do not guard, invoke `codify-verification` to add coverage and open a follow-up PR.
+ 7. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter), including the list of codified tests added on this branch.

  The rule contains the canonical step sequence. Change it there, propagate everywhere.

@@ -1,6 +1,6 @@
  {
  "name": "lisa-cdk",
- "version": "2.6.4",
+ "version": "2.7.0",
  "description": "AWS CDK-specific plugin",
  "author": {
  "name": "Cody Swann"
@@ -1,6 +1,6 @@
  {
  "name": "lisa-expo",
- "version": "2.6.4",
+ "version": "2.7.0",
  "description": "Expo/React Native-specific skills, agents, rules, and MCP servers",
  "author": {
  "name": "Cody Swann"
@@ -1,6 +1,6 @@
  {
  "name": "lisa-nestjs",
- "version": "2.6.4",
+ "version": "2.7.0",
  "description": "NestJS-specific skills (GraphQL, TypeORM) and hooks (migration write-protection)",
  "author": {
  "name": "Cody Swann"
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "lisa-rails",
3
- "version": "2.6.4",
3
+ "version": "2.7.0",
4
4
  "description": "Ruby on Rails-specific hooks — RuboCop linting/formatting and ast-grep scanning on edit",
5
5
  "author": {
6
6
  "name": "Cody Swann"
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "lisa-typescript",
3
- "version": "2.6.4",
3
+ "version": "2.7.0",
4
4
  "description": "TypeScript-specific hooks — Prettier formatting, ESLint linting, and ast-grep scanning on edit",
5
5
  "author": {
6
6
  "name": "Cody Swann"
@@ -35,6 +35,12 @@
  "hooks": [
  { "type": "command", "command": "command -v entire >/dev/null 2>&1 && entire hooks claude-code pre-task || true" }
  ]
+ },
+ {
+ "matcher": "Bash",
+ "hooks": [
+ { "type": "command", "command": "${CLAUDE_PLUGIN_ROOT}/hooks/block-no-verify.sh" }
+ ]
  }
  ],
  "Stop": [
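The new hunk above registers a second `PreToolUse` entry for the `Bash` matcher. As this plugin's scripts assume, a `PreToolUse` hook receives the pending tool call as JSON on stdin and signals its verdict through the exit code: 0 allows the call, 2 blocks it and feeds stderr back to the agent. A minimal stand-in sketch of that contract (the `rm -rf` rule here is hypothetical, not part of this release):

```shell
# Stand-in PreToolUse hook: read the tool-call JSON from stdin,
# return 0 to allow, 2 to block (stderr carries the agent-visible message).
hook() {
  input="$(cat)"
  case "$input" in
    # hypothetical rule for illustration only
    *'rm -rf'*) echo 'Blocked: destructive command' >&2; return 2 ;;
  esac
  return 0
}

printf '%s' '{"tool_name":"Bash","tool_input":{"command":"ls"}}' | hook \
  && echo "allowed"
printf '%s' '{"tool_name":"Bash","tool_input":{"command":"rm -rf /tmp/x"}}' | hook \
  || echo "blocked (exit $?)"
```

The real `block-no-verify.sh` below follows the same shape, with `jq` doing the JSON field extraction.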
@@ -3,6 +3,7 @@ name: verification-specialist
  description: Verification specialist agent. Discovers project tooling and executes verification for all required types. Plans and executes empirical proof that work is done by running the actual system and observing results.
  skills:
  - verification-lifecycle
+ - codify-verification
  - jira-journey
  - spec-conformance
  ---
@@ -19,7 +20,7 @@ Read `.claude/rules/verification.md` at the start of every investigation for the

  ## Verification Process

- Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, loop.**
+ Follow the verification lifecycle: **confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop.**

  ### 1. Confirm Quality Gates

@@ -76,6 +77,10 @@ Run the verification and capture output. Always include:

  If any verification fails, fix and re-verify. Do not declare done until all required types pass.

+ ### 6. Codify
+
+ For every empirical verification that produced PASS evidence, invoke the `codify-verification` skill to encode it as a regression test in the appropriate framework (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.). The new test must run, pass, and be committed in the same PR. Skipping codification is allowed only for non-behavioral types (PR, Documentation, Deploy) and Investigate-Only spikes — for everything else, codify or escalate.
+
  ## Output Format

  ```
@@ -117,7 +122,8 @@ If any verification fails, fix and re-verify. Do not declare done until all requ
  ## Rules

  - Always read `.claude/rules/verification.md` first for the project's verification standards and type taxonomy
- - Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, loop
+ - Follow the verification lifecycle: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify, spec conformance, loop
+ - Every passing empirical verification must be codified as a regression test via `codify-verification` before declaring done (skip allowed only for PR / Documentation / Deploy / Investigate-Only)
  - Tests, typecheck, lint, and format are quality gates (prerequisites), NOT verification — never report them as verification evidence
  - Discover existing project scripts and tools before creating new ones
  - Every verification must produce observable output -- a status code, a response body, a UI state, a test result
@@ -0,0 +1,6 @@
+ ---
+ description: "Convert empirical verification into a regression test (Playwright for UI, integration test for API/DB, benchmark for performance, etc.) so it doesn't regress. Mandatory step after verification passes — invoked from verification-lifecycle and from each Build/Fix/Improve flow."
+ argument-hint: "<verification-type> <what-was-verified>"
+ ---
+
+ Use the /lisa:codify-verification skill to encode the empirical verification that just passed as a regression test, in the appropriate framework for the verification type. $ARGUMENTS
@@ -0,0 +1,37 @@
+ #!/usr/bin/env bash
+ # PreToolUse hook for Bash: blocks any command containing --no-verify.
+ # --no-verify on git commit/push (and equivalents) bypasses pre-commit/pre-push
+ # hooks that exist for a reason. The fix is to address the underlying issue,
+ # not silence the check. See feedback_never_no_verify in user memory.
+ #
+ # Word-boundary match avoids false positives on flags like --no-verify-ssl,
+ # --no-verify-host, etc.
+ set -euo pipefail
+
+ input="$(cat)"
+
+ tool_name="$(printf '%s' "$input" | jq -r '.tool_name // empty')"
+ if [ "$tool_name" != "Bash" ]; then
+   exit 0
+ fi
+
+ command_str="$(printf '%s' "$input" | jq -r '.tool_input.command // empty')"
+ if [ -z "$command_str" ]; then
+   exit 0
+ fi
+
+ # Match --no-verify bounded by non-token characters (not alphanumeric, _, or -).
+ # This catches all syntactic positions including subshells (e.g. `(git commit --no-verify)`)
+ # while excluding longer flags like --no-verify-ssl, --no-verify-host, etc.
+ if printf '%s' "$command_str" | grep -Eq '(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'; then
+   cat >&2 <<'EOF'
+ Blocked: --no-verify bypasses pre-commit/pre-push hooks. Fix the underlying
+ issue (lint error, failing test, formatting) or ask the user before bypassing.
+
+ If the user has explicitly authorized the bypass for this specific command,
+ re-run after they confirm.
+ EOF
+   exit 2
+ fi
+
+ exit 0
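The boundary pattern the script relies on can be exercised directly. A quick sketch showing which command strings it blocks (the sample commands are illustrative):

```shell
# The same bounded pattern used by the hook: --no-verify must be delimited by
# start/end of string or a character that cannot extend a flag name.
pat='(^|[^[:alnum:]_-])--no-verify($|[^[:alnum:]_-])'

check() {
  if printf '%s' "$1" | grep -Eq "$pat"; then
    echo "BLOCK: $1"
  else
    echo "allow: $1"
  fi
}

check 'git commit --no-verify -m wip'    # BLOCK: plain flag position
check '(git push --no-verify)'           # BLOCK: subshell, delimited by parens
check 'curl --no-verify-ssl https://x'   # allow: longer flag, trailing '-'
check 'echo --no-verifyish'              # allow: alphanumeric continuation
```

Note the character class excludes `-` itself, which is what keeps `--no-verify-ssl` and friends from matching.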
@@ -119,7 +119,7 @@ Determine the work type and execute the matching variant:
  5. `builder` -- implement via TDD (acceptance criteria become tests)
  6. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  7. `verification-specialist` -- verify locally (run the software, observe behavior)
- 8. Write e2e test encoding the verification
+ 8. `verification-specialist` -- invoke `codify-verification` skill per passing verification (Playwright for UI, integration test for API/DB/auth, etc.); commit each test in the same PR
  9. **Review sub-flow**
  10. `learner` -- capture discoveries

@@ -133,7 +133,7 @@ Determine the work type and execute the matching variant:
  6. `bug-fixer` -- implement fix via TDD (reproduction becomes failing test)
  7. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  8. `verification-specialist` -- verify locally (prove the bug is fixed)
- 9. Write e2e test encoding the verification
+ 9. `verification-specialist` -- invoke `codify-verification` skill to encode the fix as a regression test (mandatory for bug fixes — the test must fail against the pre-fix commit and pass against the fix); commit in the same PR
  10. **Review sub-flow**
  11. `learner` -- capture discoveries

@@ -145,7 +145,7 @@ Determine the work type and execute the matching variant:
  4. `builder` -- implement improvements via TDD
  5. Run quality gates: lint, typecheck, tests (these are prerequisites, NOT verification)
  6. `verification-specialist` -- measure again, prove improvement over baseline
- 7. Write e2e test encoding the verification (if applicable)
+ 7. `verification-specialist` -- invoke `codify-verification` skill (typically a benchmark asserting against baseline for performance work, or a regression test for behavioral refactors); commit in the same PR
  8. **Review sub-flow**
  9. `learner` -- capture discoveries

@@ -156,7 +156,7 @@ Determine the work type and execute the matching variant:
  3. Recommend next action (Research, Plan, Implement, or escalate)
  4. `learner` -- capture discoveries

- Output: Code passing all quality gates + local empirical verification + e2e test (except for spikes, which produce findings only).
+ Output: Code passing all quality gates + local empirical verification + codified regression test for each verification (except for spikes, which produce findings only, and non-behavioral verification types — PR / Documentation / Deploy — which carry their own proof).

  ### Verify

@@ -165,6 +165,7 @@ When: Code is ready to ship. All quality gates pass and local empirical verifica
  Gate:
  - Code must pass quality gates (lint, typecheck, tests)
  - Local empirical verification must be complete
+ - Each passing local verification must be codified as a regression test (or carry a documented skip from the allowed set: PR / Documentation / Deploy / Investigate-Only). If verifications are not codified, return to the Implement flow's codify step before shipping
  - If quality gates fail, go back to **Implement**
  - If no code changes exist, there is nothing to verify

@@ -24,7 +24,7 @@ Verification is mandatory. Never skip it, defer it, or claim it was unnecessary.

  Before starting implementation, state your verification plan — how you will use the resulting software to prove it works. A verification plan that only lists test/typecheck/lint commands is not a verification plan. Do not begin implementation until the plan is confirmed.

- After verifying a change empirically, encode that verification as automated tests. The manual proof that something works should become a repeatable regression test that catches future regressions. Every verification should answer: "How do I turn this into a test?"
+ After verifying a change empirically, encode that verification as an automated regression test via the `codify-verification` skill. The manual proof that something works must become a repeatable test that catches future regressions — Playwright for UI/browser flows, integration test for API/DB/auth, benchmark for performance, etc. Codification is mandatory for every empirical verification type except the inherently non-behavioral ones (PR, Documentation, Deploy) and Investigate-Only spikes. If codification is genuinely impossible, escalate via the Escalation Protocol — never silently skip.

  Every pull request must include step-by-step instructions for reviewers to independently replicate the verification. These are not test commands — they are the exact steps a human would follow to use the software and confirm the change works. If a reviewer cannot reproduce your verification from the PR description alone, the PR is incomplete.

@@ -101,7 +101,7 @@ Every change requires one or more verification types. Classify the change first,
  Verification happens at two stages in the workflow:

  - **Quality gates** (enforced automatically): Tests, typecheck, lint, and format run via hooks at write-time, commit-time, and push-time. These are prerequisites, not verification.
- - **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, encode it as an e2e test.
+ - **Local verification** (part of the Implement flow): After quality gates pass, empirically verify the change by running the actual system in a local or preview environment — make HTTP requests, interact with the UI, execute CLI commands, query the database. This proves the change works before shipping. After local verification succeeds, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
  - **Remote verification** (part of the Verify flow): After the PR is merged and deployed, repeat the same empirical verification against the target environment. This proves the change works in production, not just locally. If remote verification fails, fix and re-deploy.

  Both levels use the same verification types table above. The difference is the environment, not the rigor.
@@ -0,0 +1,152 @@
+ ---
+ name: codify-verification
+ description: "Convert empirical verification into a regression test so it never has to be re-proven manually. Runs after a verification passes — picks the appropriate test framework for the verification type (Playwright for UI/browser, integration test for API/DB/auth, benchmark for performance, etc.), generates the test, wires it into the project's test runner, and confirms it executes. Mandatory step in the verification lifecycle and in the Build/Fix/Improve flows."
+ allowed-tools: ["Bash", "Read", "Edit", "Write", "Glob", "Grep", "Skill"]
+ ---
+
+ # Codify Verification: $ARGUMENTS
+
+ Take the empirical verification that just passed and encode it as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+ This skill is invoked from the verification lifecycle (between Execute and Spec Conformance) and from each work-type sub-flow (Build / Fix / Improve) after the local verification step.
+
+ ## When to invoke
+
+ Invoke once per empirical verification that produced PASS evidence. If a single change had three verifications (UI flow, API endpoint, DB query), this skill runs three times — or once with the three verifications batched, but each must produce its own committed test.
+
+ ## When to skip
+
+ Skip codification only for verification types whose proof is inherently non-behavioral:
+
+ - **PR** — proof is the PR description itself
+ - **Documentation** — proof is content review
+ - **Deploy** — proof is deployment output and health endpoints (already covered by ops-verify-health)
+ - **Investigate-Only spikes** — produce findings, not shipped code
+
+ For every other verification type, codification is mandatory. If codification is not possible (e.g., the test framework doesn't exist and can't be installed in scope), escalate via the lifecycle's Escalation Protocol — do not silently skip.
+
+ ## Inputs
+
+ The caller must provide:
+
+ - The verification type (UI, API, Database, Auth, Security, Performance, Background Jobs, Cache, Configuration, Email/Notification, Observability, Infrastructure)
+ - The exact steps that were performed (URL visited, request made, query run, etc.)
+ - The expected outcome (status code, UI state, row count, log entry, etc.)
+ - The proof artifact captured (screenshot path, response body, query output, log excerpt)
+
+ If any of these are missing, ask the caller before generating a test — a test built on guesswork will not match the verification it claims to encode.
+
+ ## Process
+
+ ### 1. Discover existing test infrastructure
+
+ Before creating anything new, find what the project already has. Use the Tool Discovery Process from `verification-lifecycle`. Specifically check for:
+
+ - **Browser/E2E**: `playwright.config.*`, `cypress.config.*`, `e2e/` directory, `tests/e2e/`, Playwright/Cypress in `package.json` devDependencies
+ - **API/integration**: `tests/integration/`, `spec/`, `test/integration/`, supertest/fetch helpers, Vitest/Jest integration configs
+ - **Database**: integration test setup with migrations, factory files, seed scripts
+ - **Performance**: existing benchmark suite (`benchmarks/`, `bench/`), `vitest bench`, k6 scripts
+ - **Mobile (RN/Expo)**: Detox config, Maestro flows
+ - **Backend jobs**: existing job-test harness, queue integration tests
+
+ Do NOT install a new framework if one already exists for the verification type. Use what's there.
+
+ ### 2. Map verification type → framework
+
+ | Verification type | Preferred framework (use whichever the project already has) |
+ |---|---|
+ | UI (web) | Playwright > Cypress > Selenium |
+ | UI (mobile) | Maestro > Detox > Playwright (mobile emulation) |
+ | API | project's integration test runner (Vitest / Jest / RSpec / pytest) with HTTP client (supertest / fetch / faraday) |
+ | Database | integration test with real DB + migrations applied |
+ | Auth | API or UI test asserting role-gated access (multi-role coverage) |
+ | Security | regression test that reproduces the attack and asserts safe handling |
+ | Performance | benchmark in the project's bench harness, asserting against the baseline captured in the verification |
+ | Background Jobs | integration test that enqueues, drains the queue, and asserts terminal state |
+ | Cache | integration test asserting hit/miss/invalidation behavior |
+ | Configuration | integration test that loads config and asserts effect |
+ | Email/Notification | test capturing outbound message via project's mailer test mode |
+ | Observability | test asserting structured log/metric/trace emission |
+ | Infrastructure | test or script asserting infra state (terraform plan diff, CDK snapshot test) |
+
+ If the project lacks the preferred framework AND no acceptable substitute exists, escalate.
+
+ ### 3. Generate the test
+
+ The generated test must:
+
+ - **Encode the exact verification that passed**, not a paraphrase. Same URL, same input, same assertion target.
+ - **Assert the observable outcome**, not implementation details. If the verification confirmed "user sees order confirmation", the test asserts that text/element is visible — not that a particular function was called.
+ - **Be deterministic.** No reliance on timing, network flakiness, real third-party services, or mutable shared state. Use the project's existing fixtures, factories, mocks, and seed data conventions.
+ - **Be self-contained.** Set up its own preconditions and clean up after itself, following the project's existing test isolation patterns.
+ - **Be named after the behavior, not the bug/ticket.** `displays order confirmation after checkout`, not `fixes PROJ-1234`.
+ - **Live in the project's existing test directory** for that type. Do not create a parallel test tree.
+
+ For Playwright UI tests specifically:
+ - Use the project's existing `test` fixture / `page` fixture / auth helper if one exists
+ - Prefer role/text selectors (`getByRole`, `getByText`) over CSS/XPath — they survive markup churn
+ - Capture a trace or screenshot only if the project's existing tests do; do not invent a new artifact convention
+ - Mirror the project's existing config for base URL, retries, and test isolation
+
+ ### 4. Run the test in isolation
+
+ Run only the new test, using whatever per-test invocation the project supports:
+
+ - Playwright: `npx playwright test path/to/new.spec.ts`
+ - Vitest: `npx vitest run path/to/new.spec.ts`
+ - Jest: `npx jest path/to/new.test.ts`
+ - RSpec: `bundle exec rspec path/to/new_spec.rb`
+
+ Confirm:
+ 1. The test PASSES against the current code (the change being shipped)
+ 2. The test would have FAILED before the change (sanity check by mentally reverting, or for bug fixes, by running against the pre-fix commit if cheap)
+
+ For a bug fix, step 2 is mandatory and easy: check out the failing commit, run the new test, see it fail, return to the fix branch. This proves the test actually guards the regression.
+
+ ### 5. Wire it into the suite
+
+ Confirm the test is picked up by the project's standard test command (the one CI runs). Run that command and confirm the count went up by exactly the number of tests added.
+
+ If the test is in a directory the standard test command excludes (e.g., E2E suite that runs separately in CI), confirm the appropriate CI workflow includes it.
+
+ ### 6. Commit
+
+ Commit the test in the same PR as the change it codifies, in its own atomic commit:
+
+ - Build/feature: `test: add e2e for <behavior>`
+ - Bug fix: `test: add regression test for <bug behavior>`
+ - Performance: `test: add benchmark asserting <metric> <baseline>`
+
+ The commit message body should reference the verification it encodes (one line linking to the proof artifact or the verification report section).
+
+ ### 7. Record evidence
+
+ Append to the verification report (or PR description):
+
+ ```markdown
+ ### Codified Verifications
+
+ | # | Verification | Framework | Test file | Status |
+ |---|--------------|-----------|-----------|--------|
+ | 1 | <description> | Playwright | `e2e/checkout.spec.ts::displays order confirmation after checkout` | PASS |
+ ```
+
+ This evidence shows the verification is now guarded.
+
+ ## Output
+
+ For each empirical verification passed in:
+
+ - A new test file (or extension to an existing test file) committed to the PR
+ - Confirmation that the test passes against the current branch
+ - The test file path + test name recorded in the verification report
+
+ If codification was skipped, an explicit reason recorded in the report (one of the skip conditions above) — never silent.
+
+ ## Rules
+
+ - Never claim a verification is codified without running the new test and observing it pass
+ - Never disable, skip, or `.skip()` the new test "temporarily" to make CI green — fix the test or fix the underlying change
+ - Never use `expect(true).toBe(true)` placeholders or smoke-only assertions that don't actually exercise the verified behavior
+ - Never reuse the verification's manual artifact (screenshot, curl output) as a "test" — those are evidence, not regression coverage
+ - If the project lacks the appropriate framework, escalate via Human Action Packet rather than installing one mid-task without approval
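Step 4's pre-fix check can be scripted end to end. A minimal sketch in a throwaway repo, using a one-line grep script as a stand-in regression test (all file names hypothetical):

```shell
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

# pre-fix state: the bug is present
printf 'broken\n' > lib.txt
git add lib.txt
git commit -qm 'pre-fix'
prefix="$(git rev-parse HEAD)"

# the fix plus its regression test, committed together
printf 'fixed\n' > lib.txt
printf 'grep -q fixed lib.txt\n' > regression_test.sh
git add lib.txt regression_test.sh
git commit -qm 'fix: correct lib behavior + regression test'

bash regression_test.sh && echo 'passes on fix'

# restore only the pre-fix file; the same test must now fail
git checkout -q "$prefix" -- lib.txt
if bash regression_test.sh; then
  echo 'test does not guard the regression'
else
  echo 'fails on pre-fix: test guards the regression'
fi
git checkout -q HEAD -- lib.txt
```

The `git checkout <commit> -- <path>` form restores just the implementation file, so the new test stays on disk while the code reverts, which is usually cheaper than switching branches.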
@@ -1,16 +1,26 @@
  ---
  name: verification-lifecycle
- description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
+ description: "Verification lifecycle: confirm quality gates, classify types, discover tools, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), loop. Quality gates (tests/typecheck/lint) are prerequisites, NOT verification. Verification means running the actual system and observing results."
  ---

  # Verification Lifecycle

- This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, and loop.
+ This skill defines the complete verification lifecycle that agents must follow for every change: confirm quality gates, classify, check tooling, fail fast, plan, execute, codify (turn each passing verification into a regression test), spec conformance (verify shipped work matches the spec), and loop.

  ## Verification Lifecycle

  Agents must follow this mandatory sequence for every change:

+ 1. Confirm Quality Gates
+ 2. Classify
+ 3. Check Tooling
+ 4. Fail Fast (if blocked)
+ 5. Plan
+ 6. Execute
+ 7. Codify (turn each passing verification into a regression test)
+ 8. Spec Conformance
+ 9. Loop
+
  ### 1. Confirm Quality Gates

  Confirm that quality gates (tests, typecheck, lint, format) pass. These are prerequisites, NOT verification. Do not count them as verification — they are enforced automatically by hooks and CI. If quality gates fail, fix them before proceeding.
@@ -42,7 +52,17 @@ A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun

  After implementation, run the verification plan. Execute each verification type in order.

- ### 7. Spec Conformance
+ ### 7. Codify
+
+ After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
+
+ The `codify-verification` skill maps the verification type to the appropriate framework (Playwright for browser/UI, integration test for API/DB/auth, benchmark for performance, etc.), generates a deterministic test that asserts the same observable outcome the verification just confirmed, runs it in isolation to confirm PASS, and commits it in the same PR as the change.
+
+ Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
+
+ A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
+
+ ### 8. Spec Conformance

  After empirical verification produces evidence, run spec conformance as a separate, mandatory step. Invoke the `spec-conformance` skill (or delegate to the `spec-conformance-specialist` agent) with the spec source — plan file, JIRA/Linear/GitHub key, or PRD.

@@ -56,9 +76,9 @@ Required outputs:

  `PARTIAL` or `DIVERGES` blocks completion. Fix the gaps (implement the miss, remove the creep, capture the missing evidence) and re-run both empirical verification AND spec conformance. Never skip this step — it catches failures that empirical verification by itself does not, such as a feature that works but wasn't asked for, or a spec item that was quietly dropped.

- ### 8. Loop
+ ### 9. Loop

- If any verification or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass AND the spec-conformance verdict is `CONFORMS`. If a verification or conformance check is stuck after 3 attempts, escalate.
+ If any verification, codification, or spec-conformance check fails, fix the issue and re-verify. Do not declare done until all required types pass, all empirical verifications are codified, AND the spec-conformance verdict is `CONFORMS`. If a verification, codification, or conformance check is stuck after 3 attempts, escalate.

  ---

@@ -194,9 +214,10 @@ Agents must follow this sequence unless explicitly instructed otherwise:
  8. Implement the change.
  9. Execute verification plan — run the actual system and observe results.
  10. Collect proof artifacts.
- 11. Run spec conformance build coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce verdict.
- 12. Summarize what changed, what was verified, conformance verdict, and remaining risk.
- 13. Label the result with a verification level.
+ 11. Codify — for each passing empirical verification, invoke `codify-verification` to encode it as a regression test (Playwright for UI, integration test for API/DB/auth, benchmark for performance, etc.) and commit the test in the same PR.
+ 12. Run spec conformance — build coverage matrix against the spec source (plan/ticket/issue), flag scope creep and untraceable changes, produce verdict.
+ 13. Summarize what changed, what was verified, what was codified, conformance verdict, and remaining risk.
+ 14. Label the result with a verification level.

  ---

@@ -305,9 +326,10 @@ A task is done only when:

  - End user is identified
  - All applicable verification types are classified and executed
- - Verification lifecycle is completed (classify, check tooling, plan, execute, spec conformance, loop)
+ - Verification lifecycle is completed (classify, check tooling, plan, execute, codify, spec conformance, loop)
  - Required verification surfaces and tooling surfaces are used or explicitly unavailable
  - Proof artifacts are captured
+ - Every passing empirical verification is codified as a regression test (or has an explicit, documented skip reason from the allowed set)
  - Spec conformance verdict is `CONFORMS` (not `PARTIAL`, not `DIVERGES`)
  - Verification level is declared
  - Risks and gaps are documented
@@ -18,12 +18,26 @@ If you ARE already inside an agent team (e.g., a teammate invoked this skill via

  Execute the **Verify** flow as defined in the `intent-routing` rule (loaded via the lisa plugin). The flow includes:

- 1. **Commit** any pending changes via `lisa:git-commit`
- 2. **Push and PR** via `lisa:git-submit-pr`
- 3. **Review loop** — handle CodeRabbit / human review comments via `lisa:pull-request-review`
- 4. **Merge** when CI is green
- 5. **Remote verification** invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined)
- 6. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter)
+ 1. **Pre-flight: codification gate** — confirm that every passing local empirical verification on this branch was codified as a regression test (the Implement flow's codify step). If any verification has no committed test and no allowed skip reason (PR / Documentation / Deploy / Investigate-Only), invoke `codify-verification` now and amend the PR before shipping. A change cannot ship until its verifications are guarded.
+ 2. **Commit** any pending changes via `lisa:git-commit`
+ 3. **Push and PR** via `lisa:git-submit-pr`
+ 4. **Review loop** — handle CodeRabbit / human review comments via `lisa:pull-request-review`
+ 5. **Merge** when CI is green
+ 6. **Remote verification** — invoke the `lisa:monitor` skill against the target environment to confirm the deploy actually works (health endpoints, recent logs/errors, Validation Journey replay if defined). If remote verification surfaces a behavioral gap that the existing codified tests do not guard, invoke `codify-verification` to add coverage and open a follow-up PR.
+ 7. **Evidence** — post results to the originating ticket via `lisa:jira-evidence` (or equivalent tracker adapter), including the list of codified tests added on this branch.

  The rule contains the canonical step sequence. Change it there, propagate everywhere.