@codyswann/lisa 2.159.5 → 2.159.6
- package/package.json +1 -1
- package/plugins/lisa/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa/skills/implement/SKILL.md +15 -2
- package/plugins/lisa/skills/tdd-implementation/SKILL.md +3 -0
- package/plugins/lisa/skills/verification-lifecycle/SKILL.md +10 -0
- package/plugins/lisa-agy/plugin.json +1 -1
- package/plugins/lisa-agy/skills/implement/SKILL.md +15 -2
- package/plugins/lisa-agy/skills/tdd-implementation/SKILL.md +3 -0
- package/plugins/lisa-agy/skills/verification-lifecycle/SKILL.md +10 -0
- package/plugins/lisa-cdk/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-cdk/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-cdk-agy/plugin.json +1 -1
- package/plugins/lisa-cdk-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-cdk-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-copilot/skills/implement/SKILL.md +15 -2
- package/plugins/lisa-copilot/skills/tdd-implementation/SKILL.md +3 -0
- package/plugins/lisa-copilot/skills/verification-lifecycle/SKILL.md +10 -0
- package/plugins/lisa-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-cursor/skills/implement/SKILL.md +15 -2
- package/plugins/lisa-cursor/skills/tdd-implementation/SKILL.md +3 -0
- package/plugins/lisa-cursor/skills/verification-lifecycle/SKILL.md +10 -0
- package/plugins/lisa-expo/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-expo-agy/plugin.json +1 -1
- package/plugins/lisa-expo-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-expo-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-harper-fabric/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-harper-fabric/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-harper-fabric-agy/plugin.json +1 -1
- package/plugins/lisa-harper-fabric-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-harper-fabric-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs-agy/plugin.json +1 -1
- package/plugins/lisa-nestjs-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-nestjs-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-openclaw/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-openclaw/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-openclaw-agy/plugin.json +1 -1
- package/plugins/lisa-openclaw-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-openclaw-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-rails-agy/plugin.json +1 -1
- package/plugins/lisa-rails-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-rails-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript-agy/plugin.json +1 -1
- package/plugins/lisa-typescript-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-typescript-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-wiki/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-wiki/.codex-plugin/plugin.json +1 -1
- package/plugins/lisa-wiki-agy/plugin.json +1 -1
- package/plugins/lisa-wiki-copilot/.claude-plugin/plugin.json +1 -1
- package/plugins/lisa-wiki-cursor/.claude-plugin/plugin.json +1 -1
- package/plugins/src/base/skills/implement/SKILL.md +15 -2
- package/plugins/src/base/skills/tdd-implementation/SKILL.md +3 -0
- package/plugins/src/base/skills/verification-lifecycle/SKILL.md +10 -0
package/package.json
CHANGED
@@ -84,7 +84,7 @@
 "lodash": ">=4.18.1"
 },
 "name": "@codyswann/lisa",
-"version": "2.159.5",
+"version": "2.159.6",
 "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
 "main": "dist/index.js",
 "exports": {
@@ -111,6 +111,17 @@ IF it is a Fix (bug), execute the Reproduce sub-flow FIRST:
 1. Write a simple API client and call the offending API
 2. Start the server on localhost and use the Playwright CLI or Chrome DevTools

+For any Fix flow, and for any Build flow that changes user-visible behavior, regression coverage is a required deliverable at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the task plan and definition of done MUST include a deterministic regression spec against the reported surface, using mocked or seeded data where needed. This is alongside unit or integration coverage, not a substitute for it.
+
+The team lead may not waive, defer, demote, or phrase this regression spec as "optional", "if cheap", "nice to have", or equivalent. The only permitted exits are:
+
+1. The project genuinely has no end-to-end harness for the affected platform; record the checked locations and that absence in the task metadata, PR, and work-item evidence.
+2. A genuine technical blocker prevents adding or executing the spec in this PR; before merge, create a linked build-ready follow-up ticket, reference it from the PR and source work item, and keep the current item blocked or explicitly non-terminal until that follow-up is accepted.
+
+Completion evidence for the regression spec must prove execution, not mere existence. A green CI run is insufficient unless the PR evidence includes a CI log line, reporter output, or equivalent record naming the new spec and showing that it ran and passed. Guard explicitly against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
+
+If the required regression spec is still in flight on an auto-merge-enabled PR, pause auto-merge or use an equivalent merge gate until the spec commit is pushed and its execution proof is available. The flow must not allow the PR to merge before this non-demotable deliverable is satisfied or formally blocked through the linked follow-up path above.
+
 Using the general-purpose agent in Team Lead session, determine how you will know that the task is fully complete. Write this as an **effective completion condition** — one an independent verifier could confirm from observed output alone, not from your assertion that it works. A strong condition has:

 - **One measurable end state** — a status code, an exit code, a row count, an observable UI state, an empty queue. Not "it looks right" or "the code is correct".
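The execution-proof requirement added in this hunk can be illustrated with a small check. This is a minimal sketch and not part of the package: it assumes the CI job publishes a JUnit-style XML report, and the helper name `spec_ran_and_passed` and the sample spec names are hypothetical. A spec that is missing, skipped, failed, or errored counts as "no proof", which also covers the "0 tests" case.

```python
import xml.etree.ElementTree as ET

# Hypothetical report content; a real CI job would write this to a file.
SAMPLE_REPORT = """<testsuites>
  <testsuite name="e2e" tests="2">
    <testcase name="checkout shows the order total" classname="checkout.spec"/>
    <testcase name="cart badge updates" classname="cart.spec"><skipped/></testcase>
  </testsuite>
</testsuites>"""


def spec_ran_and_passed(junit_xml: str, spec_name: str) -> bool:
    """Return True only if a test case whose name contains spec_name ran and passed."""
    root = ET.fromstring(junit_xml)
    # iter() walks the whole tree, so both <testsuites> and <testsuite> roots work.
    matches = [
        case for case in root.iter("testcase")
        if spec_name in case.get("name", "")
    ]
    if not matches:
        # The spec is not named in the report at all: no execution proof.
        return False
    for case in matches:
        # A <skipped>, <failure>, or <error> child means the spec did not pass.
        if any(case.find(tag) is not None for tag in ("skipped", "failure", "error")):
            return False
    return True
```

With the sample report, the check accepts `checkout shows the order total` and rejects both the skipped `cart badge updates` case and any spec name that does not appear in the report.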
@@ -146,13 +157,15 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`

 Before any task is implemented, the agent team must explore the codebase for relevant research (documentation, code, git history, etc) and update each task's `metadata.relevant_documentation` with the findings.

+For Fix tasks and user-visible Build tasks, `testing_requirements` must include the highest-practical-observation regression requirement above, including the selected harness or the recorded absence/blocker path. The completion condition must include the proof command and the required CI execution evidence for the new spec.
+
 Each task must be reviewed by the team to make sure their verification passes.
 Each task must have their learnings reviewed by the learner subagent.

 Before shutting down the team, execute the Verify flow:

 1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
-2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
+2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step. For UI-surface bugs, the proof must observe the UI surface with browser/device automation against the target environment whenever such a harness exists; unit-level or API-only proof cannot satisfy the empirical verification contract for a UI-surface defect.
 2a. **Record the verification verdict** — the independent, machine-readable proof that gates completion. The `verification-specialist` writes `${CLAUDE_PROJECT_DIR:-.}/.lisa/verification-status.json` with one entry per acceptance criterion, each carrying the proof command's observed evidence:

 ```json
@@ -169,7 +182,7 @@ Before shutting down the team, execute the Verify flow:
 Set `status: "pass"` only when every criterion is `pass` with real evidence (output from running the system, not a claim). The verdict must be judged by an agent that did NOT implement the change (the `verification-specialist`), never self-certified by the implementer. This is runtime scratch — it is gitignored and MUST NOT be committed (treat it like the secrets exclusion in the commit step).

 On Claude, the `enforce-verification-gate.sh` Stop hook reads this file and **will not let the flow stop** until it shows a terminal, all-`pass` verdict — carrying over the non-bypassable completion gate of the `/goal` primitive, but checked deterministically against real evidence rather than by a transcript-only evaluator model. If you must stop before completion (a readiness gate failed, a blocker was found, a dependency is unresolved), write the verdict with `status: "blocked"` and the reason: that records the outcome and releases the gate instead of leaving it to spin. Other harnesses fall back to this prose obligation.
-3. Write
+3. Write the highest-practical-observation regression test encoding the verification. For user-visible bugs or user-visible Build changes with an available browser/device/e2e harness, this means a deterministic spec on the reported surface. Prove the new spec actually executed and passed in PR CI by recording a named spec log/reporter line or equivalent execution record; green CI without that named evidence does not satisfy this step.
 4. Record Implement usage on the originating work artifact via `lisa:usage-accounting` so the work item (or other implementation-owned artifact) gains a direct `implement` usage entry in the canonical `## Lisa Usage` section. If the parent / child graph is already known, prefer `record_and_rollup` so ancestor totals refresh in the same write; otherwise still write the direct entry, and if runtime usage is unavailable, use `source: unavailable` with nullable token/cost fields instead of skipping the row.
 5. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
 6. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
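The verdict rule in this hunk (overall `pass` only when every criterion passes with real evidence, `blocked` with a reason when stopping early) could be sketched as follows. The diff does not show the actual schema of `verification-status.json`, so the field names `criteria`, `criterion`, `evidence`, and `reason`, the `"fail"` label, and the helper `build_verdict` are all assumptions made for illustration only.

```python
import json


def build_verdict(criteria, blocked_reason=None):
    """Build a verdict dict under an assumed schema (not the package's real one)."""
    if blocked_reason:
        # Stopping early: record the outcome and the reason.
        return {"status": "blocked", "reason": blocked_reason, "criteria": criteria}
    # "pass" needs at least one criterion, and each one must pass with non-empty evidence.
    all_pass = bool(criteria) and all(
        c.get("status") == "pass" and str(c.get("evidence", "")).strip()
        for c in criteria
    )
    # "fail" is an assumed label for any outcome that is neither pass nor blocked.
    return {"status": "pass" if all_pass else "fail", "criteria": criteria}


if __name__ == "__main__":
    example = [
        {"criterion": "Order total renders", "status": "pass", "evidence": "1 passed (checkout.spec)"},
        {"criterion": "Cart badge updates", "status": "pass", "evidence": ""},
    ]
    # The second criterion has no evidence, so the overall status is not "pass".
    print(json.dumps(build_verdict(example), indent=2))
```

The point of the sketch is that a criterion marked `pass` without observed output does not count, which mirrors the "real evidence, not a claim" wording above.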
@@ -62,6 +62,9 @@ TDD Cycle:
 - Focus on testing behavior, not implementation details
 - The test must fail before you write any production code
 - If the imported module doesn't exist, Jest reports 0 tests found (not N failed) — this is expected RED behavior
+- For a Fix task, or a Build task that changes user-visible behavior, include a regression test at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the RED test plan must include a deterministic spec against the reported surface, using mocked or seeded data where needed.
+- The team lead may not waive, defer, or mark that user-visible regression spec as optional, "if cheap", or equivalent. The only exits are a recorded absence of an end-to-end harness for the affected platform, or a genuine technical blocker with a linked build-ready follow-up ticket created before merge and referenced from the PR and source work item.
+- A regression spec is not complete merely because it exists. Completion evidence must prove the spec actually ran and passed in PR CI with a named log line, reporter output, or equivalent execution record. Guard against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.

 ### GREEN Phase

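One of the guards named in this hunk, `test.skip`, lends itself to a coarse static check on the spec source. The sketch below is illustrative only: the helper `find_skip_markers`, the pattern, and the sample spec text are hypothetical. It is a plain text scan, so it can also flag commented-out code, and it cannot detect environment gates or shard filters.

```python
import re

# Matches calls such as test.skip(...), it.skip(...), describe.skip(...),
# and chained forms such as test.describe.skip(...).
SKIP_PATTERN = re.compile(r"\b(?:test|it|describe)(?:\.\w+)*\.skip\s*\(")

# Hypothetical spec source used only to demonstrate the scan.
SAMPLE_SPEC = """\
test('shows the order total', async ({ page }) => {});
test.skip('updates the cart badge', async ({ page }) => {});
test.describe.skip('legacy checkout', () => {});
"""


def find_skip_markers(spec_source: str) -> list[int]:
    """Return the 1-based line numbers that contain a skip call."""
    return [
        number
        for number, line in enumerate(spec_source.splitlines(), start=1)
        if SKIP_PATTERN.search(line)
    ]
```

A scan like this complements, but does not replace, the named execution record from CI: it shows that a spec is not skipped in source, not that it ran.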
@@ -58,12 +58,20 @@ For each verification type, state:

 A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.

+For a user-visible Fix, or a Build change that affects user-visible behavior, the verification plan must include the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for the affected platform, plan a deterministic regression spec against that surface and the empirical command that observes the same surface. Unit-level or API-only verification does not satisfy a UI-surface defect when browser/device automation is available.
+
+The lead cannot waive, defer, or demote this regression spec as optional, "if cheap", or equivalent. The only acceptable exits are a recorded absence of an end-to-end harness for the platform, or a genuine technical blocker that is captured before merge as a linked build-ready follow-up ticket referenced from the PR and source work item.
+
 ### 6. Execute

 After implementation, run the verification plan. Execute each verification type in order.

 Evidence output must explicitly label each verification result as either `verified empirically` or `artifact-only / verification deferred`. Artifact-only evidence can support a blocked escalation packet, but it cannot mark a required runtime verification complete.

+For a required user-visible regression spec, evidence must prove execution, not only existence. Record a CI log line, reporter output, or equivalent artifact that names the new spec and shows it ran and passed in the PR. A green CI run without named execution proof is not enough; explicitly check for `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
+
+If auto-merge is enabled while the regression spec is still in flight, disable auto-merge or apply an equivalent merge gate until the spec commit is pushed and its CI execution proof is available. Do not let the PR merge before the required regression deliverable is satisfied or formally blocked through the linked follow-up path.
+
 ### 7. Codify

 After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
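The auto-merge rule added in this hunk amounts to a small decision: merge may proceed only once the spec is pushed and its execution proof exists, or once one of the two permitted exits has been recorded. The sketch below models that decision; the class `RegressionSpecState`, its field names, and `may_enable_auto_merge` are hypothetical and do not correspond to anything shipped in the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionSpecState:
    """Assumed summary of where the required regression spec stands on a PR."""
    spec_commit_pushed: bool = False
    execution_proof_recorded: bool = False
    harness_absence_recorded: bool = False  # exit 1: no e2e harness for the platform
    followup_ticket_linked: bool = False    # exit 2: blocker with linked follow-up ticket


def may_enable_auto_merge(state: RegressionSpecState) -> bool:
    """Return True only when the deliverable is satisfied or formally exited."""
    satisfied = state.spec_commit_pushed and state.execution_proof_recorded
    formally_exited = state.harness_absence_recorded or state.followup_ticket_linked
    return satisfied or formally_exited
```

A pushed spec without recorded proof keeps the gate closed, which is the "still in flight" situation the text describes.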
@@ -72,6 +80,8 @@ The `codify-verification` skill maps the verification type to the appropriate fr

 Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.

+For UI-surface defects with an available browser/device/e2e harness, codification must happen in that harness or the nearest surface-equivalent automated harness. Lower-level tests may be added for diagnosis or edge cases, but they do not replace the reported-surface regression spec.
+
 A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.

 ### 8. Spec Conformance
@@ -111,6 +111,17 @@ IF it is a Fix (bug), execute the Reproduce sub-flow FIRST:
|
|
|
111
111
|
1. Write a simple API client and call the offending API
|
|
112
112
|
2. Start the server on localhost and use the Playwright CLI or Chrome DevTools
|
|
113
113
|
|
|
114
|
+
For any Fix flow, and for any Build flow that changes user-visible behavior, regression coverage is a required deliverable at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the task plan and definition of done MUST include a deterministic regression spec against the reported surface, using mocked or seeded data where needed. This is alongside unit or integration coverage, not a substitute for it.
|
|
115
|
+
|
|
116
|
+
The team lead may not waive, defer, demote, or phrase this regression spec as "optional", "if cheap", "nice to have", or equivalent. The only permitted exits are:
|
|
117
|
+
|
|
118
|
+
1. The project genuinely has no end-to-end harness for the affected platform; record the checked locations and that absence in the task metadata, PR, and work-item evidence.
|
|
119
|
+
2. A genuine technical blocker prevents adding or executing the spec in this PR; before merge, create a linked build-ready follow-up ticket, reference it from the PR and source work item, and keep the current item blocked or explicitly non-terminal until that follow-up is accepted.
|
|
120
|
+
|
|
121
|
+
Completion evidence for the regression spec must prove execution, not mere existence. A green CI run is insufficient unless the PR evidence includes a CI log line, reporter output, or equivalent record naming the new spec and showing that it ran and passed. Guard explicitly against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
122
|
+
|
|
123
|
+
If the required regression spec is still in flight on an auto-merge-enabled PR, pause auto-merge or use an equivalent merge gate until the spec commit is pushed and its execution proof is available. The flow must not allow the PR to merge before this non-demotable deliverable is satisfied or formally blocked through the linked follow-up path above.
|
|
124
|
+
|
|
114
125
|
Using the general-purpose agent in Team Lead session, determine how you will know that the task is fully complete. Write this as an **effective completion condition** — one an independent verifier could confirm from observed output alone, not from your assertion that it works. A strong condition has:
|
|
115
126
|
|
|
116
127
|
- **One measurable end state** — a status code, an exit code, a row count, an observable UI state, an empty queue. Not "it looks right" or "the code is correct".
|
|
@@ -146,13 +157,15 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`
|
|
|
146
157
|
|
|
147
158
|
Before any task is implemented, the agent team must explore the codebase for relevant research (documentation, code, git history, etc.) and update each task's `metadata.relevant_documentation` with the findings.
|
|
148
159
|
|
|
160
|
+
For Fix tasks and user-visible Build tasks, `testing_requirements` must include the highest-practical-observation regression requirement above, including the selected harness or the recorded absence/blocker path. The completion condition must include the proof command and the required CI execution evidence for the new spec.
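One way to picture the three allowed outcomes (a selected harness, a recorded absence, or a recorded blocker) is as a tagged union. The sketch below is an assumption for illustration only: the actual schema of `testing_requirements` is not shown in this diff, and every field name and the `missingFields` helper are hypothetical.

```typescript
// Hypothetical model of the regression requirement. The real schema of
// `testing_requirements` is not shown in this diff.
type RegressionRequirement =
  | { kind: "harness"; harness: string; specPath: string; proofCommand: string }
  | { kind: "no-harness"; checkedLocations: string[] }
  | { kind: "blocked"; blocker: string; followUpTicket: string };

// Lists what is missing from a requirement; an empty array means every field
// needed for that path has been filled in.
function missingFields(req: RegressionRequirement): string[] {
  const missing: string[] = [];
  switch (req.kind) {
    case "harness":
      if (req.harness.trim() === "") missing.push("harness");
      if (req.specPath.trim() === "") missing.push("specPath");
      if (req.proofCommand.trim() === "") missing.push("proofCommand");
      break;
    case "no-harness":
      // The absence path requires a record of where a harness was looked for.
      if (req.checkedLocations.length === 0) missing.push("checkedLocations");
      break;
    case "blocked":
      if (req.blocker.trim() === "") missing.push("blocker");
      if (req.followUpTicket.trim() === "") missing.push("followUpTicket");
      break;
  }
  return missing;
}
```

Modeling the exits as distinct variants means an "optional" or empty requirement simply has no valid representation, which matches the intent that the deliverable cannot be demoted.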
|
|
161
|
+
|
|
149
162
|
Each task must be reviewed by the team to make sure their verification passes.
|
|
150
163
|
Each task must have its learnings reviewed by the learner subagent.
|
|
151
164
|
|
|
152
165
|
Before shutting down the team, execute the Verify flow:
|
|
153
166
|
|
|
154
167
|
1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
|
|
155
|
-
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
|
|
168
|
+
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step. For UI-surface bugs, the proof must observe the UI surface with browser/device automation against the target environment whenever such a harness exists; unit-level or API-only proof cannot satisfy the empirical verification contract for a UI-surface defect.
|
|
156
169
|
2a. **Record the verification verdict** — the independent, machine-readable proof that gates completion. The `verification-specialist` writes `${CLAUDE_PROJECT_DIR:-.}/.lisa/verification-status.json` with one entry per acceptance criterion, each carrying the proof command's observed evidence:
|
|
157
170
|
|
|
158
171
|
```json
|
|
@@ -169,7 +182,7 @@ Before shutting down the team, execute the Verify flow:
|
|
|
169
182
|
Set `status: "pass"` only when every criterion is `pass` with real evidence (output from running the system, not a claim). The verdict must be judged by an agent that did NOT implement the change (the `verification-specialist`), never self-certified by the implementer. This is runtime scratch — it is gitignored and MUST NOT be committed (treat it like the secrets exclusion in the commit step).
|
|
170
183
|
|
|
171
184
|
On Claude, the `enforce-verification-gate.sh` Stop hook reads this file and **will not let the flow stop** until it shows a terminal, all-`pass` verdict — carrying over the non-bypassable completion gate of the `/goal` primitive, but checked deterministically against real evidence rather than by a transcript-only evaluator model. If you must stop before completion (a readiness gate failed, a blocker was found, a dependency is unresolved), write the verdict with `status: "blocked"` and the reason: that records the outcome and releases the gate instead of leaving it to spin. Other harnesses fall back to this prose obligation.
|
|
172
|
-
3. Write
|
|
185
|
+
3. Write the highest-practical-observation regression test encoding the verification. For user-visible bugs or user-visible Build changes with an available browser/device/e2e harness, this means a deterministic spec on the reported surface. Prove the new spec actually executed and passed in PR CI by recording a named spec log/reporter line or equivalent execution record; green CI without that named evidence does not satisfy this step.
|
|
173
186
|
4. Record Implement usage on the originating work artifact via `lisa:usage-accounting` so the work item (or other implementation-owned artifact) gains a direct `implement` usage entry in the canonical `## Lisa Usage` section. If the parent / child graph is already known, prefer `record_and_rollup` so ancestor totals refresh in the same write; otherwise still write the direct entry, and if runtime usage is unavailable, use `source: unavailable` with nullable token/cost fields instead of skipping the row.
|
|
174
187
|
5. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
|
|
175
188
|
6. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
|
|
@@ -62,6 +62,9 @@ TDD Cycle:
|
|
|
62
62
|
- Focus on testing behavior, not implementation details
|
|
63
63
|
- The test must fail before you write any production code
|
|
64
64
|
- If the imported module doesn't exist, Jest reports 0 tests found (not N failed) — this is expected RED behavior
|
|
65
|
+
- For a Fix task, or a Build task that changes user-visible behavior, include a regression test at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the RED test plan must include a deterministic spec against the reported surface, using mocked or seeded data where needed.
|
|
66
|
+
- The team lead may not waive, defer, or mark that user-visible regression spec as optional, "if cheap", or equivalent. The only exits are a recorded absence of an end-to-end harness for the affected platform, or a genuine technical blocker with a linked build-ready follow-up ticket created before merge and referenced from the PR and source work item.
|
|
67
|
+
- A regression spec is not complete merely because it exists. Completion evidence must prove the spec actually ran and passed in PR CI with a named log line, reporter output, or equivalent execution record. Guard against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
65
68
|
|
|
66
69
|
### GREEN Phase
|
|
67
70
|
|
|
@@ -58,12 +58,20 @@ For each verification type, state:
|
|
|
58
58
|
|
|
59
59
|
A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.
|
|
60
60
|
|
|
61
|
+
For a user-visible Fix, or a Build change that affects user-visible behavior, the verification plan must include the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for the affected platform, plan a deterministic regression spec against that surface and the empirical command that observes the same surface. Unit-level or API-only verification does not satisfy a UI-surface defect when browser/device automation is available.
|
|
62
|
+
|
|
63
|
+
The lead cannot waive, defer, or demote this regression spec as optional, "if cheap", or equivalent. The only acceptable exits are a recorded absence of an end-to-end harness for the platform, or a genuine technical blocker that is captured before merge as a linked build-ready follow-up ticket referenced from the PR and source work item.
|
|
64
|
+
|
|
61
65
|
### 6. Execute
|
|
62
66
|
|
|
63
67
|
After implementation, run the verification plan. Execute each verification type in order.
|
|
64
68
|
|
|
65
69
|
Evidence output must explicitly label each verification result as either `verified empirically` or `artifact-only / verification deferred`. Artifact-only evidence can support a blocked escalation packet, but it cannot mark a required runtime verification complete.
|
|
66
70
|
|
|
71
|
+
For a required user-visible regression spec, evidence must prove execution, not only existence. Record a CI log line, reporter output, or equivalent artifact that names the new spec and shows it ran and passed in the PR. A green CI run without named execution proof is not enough; explicitly check for `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
72
|
+
|
|
73
|
+
If auto-merge is enabled while the regression spec is still in flight, disable auto-merge or apply an equivalent merge gate until the spec commit is pushed and its CI execution proof is available. Do not let the PR merge before the required regression deliverable is satisfied or formally blocked through the linked follow-up path.
|
|
74
|
+
|
|
67
75
|
### 7. Codify
|
|
68
76
|
|
|
69
77
|
After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
|
|
@@ -72,6 +80,8 @@ The `codify-verification` skill maps the verification type to the appropriate fr
|
|
|
72
80
|
|
|
73
81
|
Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
|
|
74
82
|
|
|
83
|
+
For UI-surface defects with an available browser/device/e2e harness, codification must happen in that harness or the nearest surface-equivalent automated harness. Lower-level tests may be added for diagnosis or edge cases, but they do not replace the reported-surface regression spec.
|
|
84
|
+
|
|
75
85
|
A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
|
|
76
86
|
|
|
77
87
|
### 8. Spec Conformance
|
|
@@ -111,6 +111,17 @@ IF it is a Fix (bug), execute the Reproduce sub-flow FIRST:
|
|
|
111
111
|
1. Write a simple API client and call the offending API
|
|
112
112
|
2. Start the server on localhost and use the Playwright CLI or Chrome DevTools
|
|
113
113
|
|
|
114
|
+
For any Fix flow, and for any Build flow that changes user-visible behavior, regression coverage is a required deliverable at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the task plan and definition of done MUST include a deterministic regression spec against the reported surface, using mocked or seeded data where needed. This is alongside unit or integration coverage, not a substitute for it.
|
|
115
|
+
|
|
116
|
+
The team lead may not waive, defer, demote, or phrase this regression spec as "optional", "if cheap", "nice to have", or equivalent. The only permitted exits are:
|
|
117
|
+
|
|
118
|
+
1. The project genuinely has no end-to-end harness for the affected platform; record the checked locations and that absence in the task metadata, PR, and work-item evidence.
|
|
119
|
+
2. A genuine technical blocker prevents adding or executing the spec in this PR; before merge, create a linked build-ready follow-up ticket, reference it from the PR and source work item, and keep the current item blocked or explicitly non-terminal until that follow-up is accepted.
|
|
120
|
+
|
|
121
|
+
Completion evidence for the regression spec must prove execution, not mere existence. A green CI run is insufficient unless the PR evidence includes a CI log line, reporter output, or equivalent record naming the new spec and showing that it ran and passed. Guard explicitly against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
122
|
+
|
|
123
|
+
If the required regression spec is still in flight on an auto-merge-enabled PR, pause auto-merge or use an equivalent merge gate until the spec commit is pushed and its execution proof is available. The flow must not allow the PR to merge before this non-demotable deliverable is satisfied or formally blocked through the linked follow-up path above.
|
|
124
|
+
|
|
114
125
|
Using the general-purpose agent in Team Lead session, determine how you will know that the task is fully complete. Write this as an **effective completion condition** — one an independent verifier could confirm from observed output alone, not from your assertion that it works. A strong condition has:
|
|
115
126
|
|
|
116
127
|
- **One measurable end state** — a status code, an exit code, a row count, an observable UI state, an empty queue. Not "it looks right" or "the code is correct".
|
|
@@ -146,13 +157,15 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`
|
|
|
146
157
|
|
|
147
158
|
Before any task is implemented, the agent team must explore the codebase for relevant research (documentation, code, git history, etc.) and update each task's `metadata.relevant_documentation` with the findings.
|
|
148
159
|
|
|
160
|
+
For Fix tasks and user-visible Build tasks, `testing_requirements` must include the highest-practical-observation regression requirement above, including the selected harness or the recorded absence/blocker path. The completion condition must include the proof command and the required CI execution evidence for the new spec.
|
|
161
|
+
|
|
149
162
|
Each task must be reviewed by the team to make sure their verification passes.
|
|
150
163
|
Each task must have its learnings reviewed by the learner subagent.
|
|
151
164
|
|
|
152
165
|
Before shutting down the team, execute the Verify flow:
|
|
153
166
|
|
|
154
167
|
1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
|
|
155
|
-
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
|
|
168
|
+
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step. For UI-surface bugs, the proof must observe the UI surface with browser/device automation against the target environment whenever such a harness exists; unit-level or API-only proof cannot satisfy the empirical verification contract for a UI-surface defect.
|
|
156
169
|
2a. **Record the verification verdict** — the independent, machine-readable proof that gates completion. The `verification-specialist` writes `${CLAUDE_PROJECT_DIR:-.}/.lisa/verification-status.json` with one entry per acceptance criterion, each carrying the proof command's observed evidence:
|
|
157
170
|
|
|
158
171
|
```json
|
|
@@ -169,7 +182,7 @@ Before shutting down the team, execute the Verify flow:
|
|
|
169
182
|
Set `status: "pass"` only when every criterion is `pass` with real evidence (output from running the system, not a claim). The verdict must be judged by an agent that did NOT implement the change (the `verification-specialist`), never self-certified by the implementer. This is runtime scratch — it is gitignored and MUST NOT be committed (treat it like the secrets exclusion in the commit step).
|
|
170
183
|
|
|
171
184
|
On Claude, the `enforce-verification-gate.sh` Stop hook reads this file and **will not let the flow stop** until it shows a terminal, all-`pass` verdict — carrying over the non-bypassable completion gate of the `/goal` primitive, but checked deterministically against real evidence rather than by a transcript-only evaluator model. If you must stop before completion (a readiness gate failed, a blocker was found, a dependency is unresolved), write the verdict with `status: "blocked"` and the reason: that records the outcome and releases the gate instead of leaving it to spin. Other harnesses fall back to this prose obligation.
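The gate behavior described here can be approximated as a small predicate. This is only a sketch of the prose, not the logic of `enforce-verification-gate.sh`: the `Verdict` and `Criterion` shapes, the `"fail"` value, and the `mayStop` function are assumptions, since this diff names only the `status` values `"pass"` and `"blocked"` and a reason.

```typescript
// Assumed shape of the verdict file. Only the "pass" and "blocked" statuses
// and the reason are named in the text; the other fields are hypothetical.
interface Criterion {
  id: string;
  status: "pass" | "fail";
  evidence: string; // observed output from the proof command
}

interface Verdict {
  status: "pass" | "fail" | "blocked";
  reason?: string;
  criteria: Criterion[];
}

// True when the flow may stop: either a recorded blocker with a reason, or an
// overall pass in which every criterion passed and carries real evidence.
function mayStop(verdict: Verdict): boolean {
  if (verdict.status === "blocked") {
    return typeof verdict.reason === "string" && verdict.reason.trim() !== "";
  }
  if (verdict.status !== "pass") {
    return false;
  }
  return (
    verdict.criteria.length > 0 &&
    verdict.criteria.every((c) => c.status === "pass" && c.evidence.trim() !== "")
  );
}
```

Treating an empty criteria list as not passing is a design choice in this sketch; it mirrors the document's broader concern about vacuous "0 tests" successes.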
|
|
172
|
-
3. Write
|
|
185
|
+
3. Write the highest-practical-observation regression test encoding the verification. For user-visible bugs or user-visible Build changes with an available browser/device/e2e harness, this means a deterministic spec on the reported surface. Prove the new spec actually executed and passed in PR CI by recording a named spec log/reporter line or equivalent execution record; green CI without that named evidence does not satisfy this step.
|
|
173
186
|
4. Record Implement usage on the originating work artifact via `lisa:usage-accounting` so the work item (or other implementation-owned artifact) gains a direct `implement` usage entry in the canonical `## Lisa Usage` section. If the parent / child graph is already known, prefer `record_and_rollup` so ancestor totals refresh in the same write; otherwise still write the direct entry, and if runtime usage is unavailable, use `source: unavailable` with nullable token/cost fields instead of skipping the row.
|
|
174
187
|
5. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
|
|
175
188
|
6. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
|
|
@@ -62,6 +62,9 @@ TDD Cycle:
|
|
|
62
62
|
- Focus on testing behavior, not implementation details
|
|
63
63
|
- The test must fail before you write any production code
|
|
64
64
|
- If the imported module doesn't exist, Jest reports 0 tests found (not N failed) — this is expected RED behavior
|
|
65
|
+
- For a Fix task, or a Build task that changes user-visible behavior, include a regression test at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the RED test plan must include a deterministic spec against the reported surface, using mocked or seeded data where needed.
|
|
66
|
+
- The team lead may not waive, defer, or mark that user-visible regression spec as optional, "if cheap", or equivalent. The only exits are a recorded absence of an end-to-end harness for the affected platform, or a genuine technical blocker with a linked build-ready follow-up ticket created before merge and referenced from the PR and source work item.
|
|
67
|
+
- A regression spec is not complete merely because it exists. Completion evidence must prove the spec actually ran and passed in PR CI with a named log line, reporter output, or equivalent execution record. Guard against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
65
68
|
|
|
66
69
|
### GREEN Phase
|
|
67
70
|
|
|
@@ -58,12 +58,20 @@ For each verification type, state:
|
|
|
58
58
|
|
|
59
59
|
A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.
|
|
60
60
|
|
|
61
|
+
For a user-visible Fix, or a Build change that affects user-visible behavior, the verification plan must include the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for the affected platform, plan a deterministic regression spec against that surface and the empirical command that observes the same surface. Unit-level or API-only verification does not satisfy a UI-surface defect when browser/device automation is available.
|
|
62
|
+
|
|
63
|
+
The lead cannot waive, defer, or demote this regression spec as optional, "if cheap", or equivalent. The only acceptable exits are a recorded absence of an end-to-end harness for the platform, or a genuine technical blocker that is captured before merge as a linked build-ready follow-up ticket referenced from the PR and source work item.
|
|
64
|
+
|
|
61
65
|
### 6. Execute
|
|
62
66
|
|
|
63
67
|
After implementation, run the verification plan. Execute each verification type in order.
|
|
64
68
|
|
|
65
69
|
Evidence output must explicitly label each verification result as either `verified empirically` or `artifact-only / verification deferred`. Artifact-only evidence can support a blocked escalation packet, but it cannot mark a required runtime verification complete.
|
|
66
70
|
|
|
71
|
+
For a required user-visible regression spec, evidence must prove execution, not only existence. Record a CI log line, reporter output, or equivalent artifact that names the new spec and shows it ran and passed in the PR. A green CI run without named execution proof is not enough; explicitly check for `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
72
|
+
|
|
73
|
+
If auto-merge is enabled while the regression spec is still in flight, disable auto-merge or apply an equivalent merge gate until the spec commit is pushed and its CI execution proof is available. Do not let the PR merge before the required regression deliverable is satisfied or formally blocked through the linked follow-up path.
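The merge rule in this paragraph reduces to a two-branch condition, sketched below. The `RegressionState` fields and the `autoMergeMayProceed` function are hypothetical names chosen for illustration; how a project actually pauses auto-merge depends on its hosting platform and is not covered here.

```typescript
// Hypothetical snapshot of the regression deliverable for one PR.
interface RegressionState {
  specCommitPushed: boolean;
  executionProofAvailable: boolean;
  linkedFollowUpTicket: string | null; // set only on the formal blocker path
}

// Auto-merge may proceed only when the spec is pushed with execution proof,
// or when the deliverable is formally blocked through a linked follow-up.
function autoMergeMayProceed(state: RegressionState): boolean {
  const satisfied = state.specCommitPushed && state.executionProofAvailable;
  const formallyBlocked =
    state.linkedFollowUpTicket !== null && state.linkedFollowUpTicket.trim() !== "";
  return satisfied || formallyBlocked;
}
```

A pushed spec without execution proof does not open the gate, which is the in-flight case the paragraph is written to catch.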
|
|
74
|
+
|
|
67
75
|
### 7. Codify
|
|
68
76
|
|
|
69
77
|
After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
|
|
@@ -72,6 +80,8 @@ The `codify-verification` skill maps the verification type to the appropriate fr
|
|
|
72
80
|
|
|
73
81
|
Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
|
|
74
82
|
|
|
83
|
+
For UI-surface defects with an available browser/device/e2e harness, codification must happen in that harness or the nearest surface-equivalent automated harness. Lower-level tests may be added for diagnosis or edge cases, but they do not replace the reported-surface regression spec.
|
|
84
|
+
|
|
75
85
|
A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
|
|
76
86
|
|
|
77
87
|
### 8. Spec Conformance
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "lisa-openclaw",
|
|
3
|
-
"version": "2.159.
|
|
3
|
+
"version": "2.159.6",
|
|
4
4
|
"description": "Connect staff roles to Telegram or Slack via OpenClaw — facilitator/specialist hub-and-spoke routing and repo-coding topics, for Claude Code and Codex",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Cody Swann"
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "lisa-openclaw",
|
|
3
|
-
"version": "2.159.
|
|
3
|
+
"version": "2.159.6",
|
|
4
4
|
"description": "Connect staff roles to Telegram or Slack via OpenClaw — facilitator/specialist hub-and-spoke routing and repo-coding topics, across Claude and Codex.",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Cody Swann"
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "lisa-openclaw",
|
|
3
|
-
"version": "2.159.
|
|
3
|
+
"version": "2.159.6",
|
|
4
4
|
"description": "Connect staff roles to Telegram or Slack via OpenClaw — facilitator/specialist hub-and-spoke routing and repo-coding topics, for Claude Code and Codex",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Cody Swann"
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "lisa-openclaw",
|
|
3
|
-
"version": "2.159.
|
|
3
|
+
"version": "2.159.6",
|
|
4
4
|
"description": "Connect staff roles to Telegram or Slack via OpenClaw — facilitator/specialist hub-and-spoke routing and repo-coding topics, for Claude Code and Codex",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Cody Swann"
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "lisa-openclaw",
|
|
3
|
-
"version": "2.159.
|
|
3
|
+
"version": "2.159.6",
|
|
4
4
|
"description": "Connect staff roles to Telegram or Slack via OpenClaw — facilitator/specialist hub-and-spoke routing and repo-coding topics, for Claude Code and Codex",
|
|
5
5
|
"author": {
|
|
6
6
|
"name": "Cody Swann"
|
|
@@ -111,6 +111,17 @@ IF it is a Fix (bug), execute the Reproduce sub-flow FIRST:
|
|
|
111
111
|
1. Write a simple API client and call the offending API
|
|
112
112
|
2. Start the server on localhost and use the Playwright CLI or Chrome DevTools
|
|
113
113
|
|
|
114
|
+
For any Fix flow, and for any Build flow that changes user-visible behavior, regression coverage is a required deliverable at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the task plan and definition of done MUST include a deterministic regression spec against the reported surface, using mocked or seeded data where needed. This is alongside unit or integration coverage, not a substitute for it.
|
|
115
|
+
|
|
116
|
+
The team lead may not waive, defer, demote, or phrase this regression spec as "optional", "if cheap", "nice to have", or equivalent. The only permitted exits are:
|
|
117
|
+
|
|
118
|
+
1. The project genuinely has no end-to-end harness for the affected platform; record the checked locations and that absence in the task metadata, PR, and work-item evidence.
|
|
119
|
+
2. A genuine technical blocker prevents adding or executing the spec in this PR; before merge, create a linked build-ready follow-up ticket, reference it from the PR and source work item, and keep the current item blocked or explicitly non-terminal until that follow-up is accepted.
|
|
120
|
+
|
|
121
|
+
Completion evidence for the regression spec must prove execution, not mere existence. A green CI run is insufficient unless the PR evidence includes a CI log line, reporter output, or equivalent record naming the new spec and showing that it ran and passed. Guard explicitly against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
122
|
+
|
|
123
|
+
If the required regression spec is still in flight on an auto-merge-enabled PR, pause auto-merge or use an equivalent merge gate until the spec commit is pushed and its execution proof is available. The flow must not allow the PR to merge before this non-demotable deliverable is satisfied or formally blocked through the linked follow-up path above.
|
|
124
|
+
|
|
114
125
|
Using the general-purpose agent in Team Lead session, determine how you will know that the task is fully complete. Write this as an **effective completion condition** — one an independent verifier could confirm from observed output alone, not from your assertion that it works. A strong condition has:
|
|
115
126
|
|
|
116
127
|
- **One measurable end state** — a status code, an exit code, a row count, an observable UI state, an empty queue. Not "it looks right" or "the code is correct".
|
|
@@ -146,13 +157,15 @@ Every task MUST include this JSON metadata block. Do NOT omit `skills` (use `[]`
|
|
|
146
157
|
|
|
147
158
|
Before any task is implemented, the agent team must explore the codebase for relevant research (documentation, code, git history, etc.) and update each task's `metadata.relevant_documentation` with the findings.
|
|
148
159
|
|
|
160
|
+
For Fix tasks and user-visible Build tasks, `testing_requirements` must include the highest-practical-observation regression requirement above, including the selected harness or the recorded absence/blocker path. The completion condition must include the proof command and the required CI execution evidence for the new spec.
|
|
161
|
+
|
|
149
162
|
Each task must be reviewed by the team to make sure their verification passes.
|
|
150
163
|
Each task must have its learnings reviewed by the learner subagent.
|
|
151
164
|
|
|
152
165
|
Before shutting down the team, execute the Verify flow:
|
|
153
166
|
|
|
154
167
|
1. Run quality gates: lint, typecheck, tests — all must pass. These are prerequisites, NOT verification.
|
|
155
|
-
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step.
|
|
168
|
+
2. `verification-specialist`: verify locally by running the actual system and observing results (empirical proof that the change works). This is the real verification step. For UI-surface bugs, the proof must observe the UI surface with browser/device automation against the target environment whenever such a harness exists; unit-level or API-only proof cannot satisfy the empirical verification contract for a UI-surface defect.
|
|
156
169
|
2a. **Record the verification verdict** — the independent, machine-readable proof that gates completion. The `verification-specialist` writes `${CLAUDE_PROJECT_DIR:-.}/.lisa/verification-status.json` with one entry per acceptance criterion, each carrying the proof command's observed evidence:
|
|
157
170
|
|
|
158
171
|
```json
|
|
@@ -169,7 +182,7 @@ Before shutting down the team, execute the Verify flow:
|
|
|
169
182
|
Set `status: "pass"` only when every criterion is `pass` with real evidence (output from running the system, not a claim). The verdict must be judged by an agent that did NOT implement the change (the `verification-specialist`), never self-certified by the implementer. This is runtime scratch — it is gitignored and MUST NOT be committed (treat it like the secrets exclusion in the commit step).
|
|
170
183
|
|
|
171
184
|
On Claude, the `enforce-verification-gate.sh` Stop hook reads this file and **will not let the flow stop** until it shows a terminal, all-`pass` verdict — carrying over the non-bypassable completion gate of the `/goal` primitive, but checked deterministically against real evidence rather than by a transcript-only evaluator model. If you must stop before completion (a readiness gate failed, a blocker was found, a dependency is unresolved), write the verdict with `status: "blocked"` and the reason: that records the outcome and releases the gate instead of leaving it to spin. Other harnesses fall back to this prose obligation.
|
|
172
|
-
3. Write
|
|
185
|
+
3. Write the highest-practical-observation regression test encoding the verification. For user-visible bugs or user-visible Build changes with an available browser/device/e2e harness, this means a deterministic spec on the reported surface. Prove the new spec actually executed and passed in PR CI by recording a named spec log/reporter line or equivalent execution record; green CI without that named evidence does not satisfy this step.
|
|
173
186
|
4. Record Implement usage on the originating work artifact via `lisa:usage-accounting` so the work item (or other implementation-owned artifact) gains a direct `implement` usage entry in the canonical `## Lisa Usage` section. If the parent / child graph is already known, prefer `record_and_rollup` so ancestor totals refresh in the same write; otherwise still write the direct entry, and if runtime usage is unavailable, use `source: unavailable` with nullable token/cost fields instead of skipping the row.
|
|
174
187
|
5. Commit ALL outstanding changes in logical batches on the branch (minus sensitive data/information) — not just changes made by the agent team. This includes pre-existing uncommitted changes that were on the branch before the plan started. Do NOT filter commits to only "task-related" files. If it shows up in git status, it gets committed (unless it contains secrets).
|
|
175
188
|
6. Push the changes - if any pre-push hook blocks you, create a task for the agent team to fix the error/problem whether it was pre-existing or not
|
|
@@ -62,6 +62,9 @@ TDD Cycle:
|
|
|
62
62
|
- Focus on testing behavior, not implementation details
|
|
63
63
|
- The test must fail before you write any production code
|
|
64
64
|
- If the imported module doesn't exist, Jest reports 0 tests found (not N failed) — this is expected RED behavior
|
|
65
|
+
- For a Fix task, or a Build task that changes user-visible behavior, include a regression test at the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for that platform (for example Playwright, Maestro, Detox, Cypress, or an equivalent runtime), the RED test plan must include a deterministic spec against the reported surface, using mocked or seeded data where needed.
|
|
66
|
+
- The team lead may not waive, defer, or mark that user-visible regression spec as optional, "if cheap", or equivalent. The only exits are a recorded absence of an end-to-end harness for the affected platform, or a genuine technical blocker with a linked build-ready follow-up ticket created before merge and referenced from the PR and source work item.
|
|
67
|
+
- A regression spec is not complete merely because it exists. Completion evidence must prove the spec actually ran and passed in PR CI with a named log line, reporter output, or equivalent execution record. Guard against `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
|
|
65
68
|
|
|
66
69
|
### GREEN Phase
|
|
67
70
|
|
|
@@ -58,12 +58,20 @@ For each verification type, state:
|
|
|
58
58
|
|
|
59
59
|
A verification plan that only lists `bun run test`, `bun run typecheck`, or `bun run lint` is NOT a verification plan. Those are quality gates handled in step 1.
|
|
60
60
|
|
|
61
|
+
For a user-visible Fix, or a Build change that affects user-visible behavior, the verification plan must include the highest practical observation level for the reported surface. If the project has a browser, device, or end-to-end harness for the affected platform, plan a deterministic regression spec against that surface and the empirical command that observes the same surface. Unit-level or API-only verification does not satisfy a UI-surface defect when browser/device automation is available.
|
|
62
|
+
|
|
63
|
+
The lead cannot waive, defer, or demote this regression spec as optional, "if cheap", or equivalent. The only acceptable exits are a recorded absence of an end-to-end harness for the platform, or a genuine technical blocker that is captured before merge as a linked build-ready follow-up ticket referenced from the PR and source work item.
|
|
64
|
+
|
|
61
65
|
### 6. Execute
|
|
62
66
|
|
|
63
67
|
After implementation, run the verification plan. Execute each verification type in order.
|
|
64
68
|
|
|
65
69
|
Evidence output must explicitly label each verification result as either `verified empirically` or `artifact-only / verification deferred`. Artifact-only evidence can support a blocked escalation packet, but it cannot mark a required runtime verification complete.
|
|
66
70
|
|
|
71
|
+
For a required user-visible regression spec, evidence must prove execution, not only existence. Record a CI log line, reporter output, or equivalent artifact that names the new spec and shows it ran and passed in the PR. A green CI run without named execution proof is not enough; explicitly check for `test.skip`, suite-level environment gates, shard filters, and "0 tests" passes.
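The execution-proof requirement above can be sketched as a small check over per-test results. This is a minimal illustration, assuming the CI reporter output has already been parsed into simple records: the `SpecRecord` shape, the `provesExecution` helper, and the example file path are all hypothetical and do not correspond to any specific reporter's format.

```typescript
// Hypothetical, already-parsed reporter records; not a real reporter schema.
type SpecStatus = "passed" | "failed" | "skipped";

interface SpecRecord {
  file: string; // spec file the test belongs to
  title: string; // test title as printed by the reporter
  status: SpecStatus;
}

interface ExecutionProof {
  ok: boolean;
  reason: string;
}

// Decides whether the records prove that the named spec ran and passed.
function provesExecution(records: SpecRecord[], specFile: string): ExecutionProof {
  const matching = records.filter((r) => r.file === specFile);

  // A green run with no record of the spec is the "0 tests" case: the file
  // may have been dropped by a shard filter or a suite-level environment gate.
  if (matching.length === 0) {
    return { ok: false, reason: `no executed tests recorded for ${specFile}` };
  }

  // Conservative choice for this sketch: any skipped test in the new spec
  // (for example via test.skip) means execution is not proven.
  const skipped = matching.filter((r) => r.status === "skipped");
  if (skipped.length > 0) {
    return { ok: false, reason: `${skipped.length} skipped test(s) in ${specFile}` };
  }

  const failed = matching.filter((r) => r.status === "failed");
  if (failed.length > 0) {
    return { ok: false, reason: `${failed.length} failed test(s) in ${specFile}` };
  }

  return { ok: true, reason: `${matching.length} test(s) ran and passed in ${specFile}` };
}
```

The returned `reason` string is meant to be the kind of named line that could be pasted into the evidence record, so a reviewer can see which spec was checked and why it did or did not count.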
|
|
72
|
+
|
|
73
|
+
If auto-merge is enabled while the regression spec is still in flight, disable auto-merge or apply an equivalent merge gate until the spec commit is pushed and its CI execution proof is available. Do not let the PR merge before the required regression deliverable is satisfied or formally blocked through the linked follow-up path.
|
|
74
|
+
|
|
67
75
|
### 7. Codify
|
|
68
76
|
|
|
69
77
|
After each empirical verification produces PASS evidence, invoke the `codify-verification` skill to encode the verification as an automated regression test. The manual proof becomes a repeatable check that catches future regressions.
|
|
@@ -72,6 +80,8 @@ The `codify-verification` skill maps the verification type to the appropriate fr
|
|
|
72
80
|
|
|
73
81
|
Codification is mandatory for every empirical verification type with one exception set: PR, Documentation, Deploy, and Investigate-Only spikes — those have inherently non-behavioral proof. For every other type, skipping codification is not allowed; if codification is genuinely impossible (e.g., the test framework does not exist and cannot be installed in scope), escalate via the Escalation Protocol rather than silently skipping.
|
|
74
82
|
|
|
83
|
+
For UI-surface defects with an available browser/device/e2e harness, codification must happen in that harness or the nearest surface-equivalent automated harness. Lower-level tests may be added for diagnosis or edge cases, but they do not replace the reported-surface regression spec.
|
|
84
|
+
|
|
75
85
|
A change is not "verified" in the lifecycle sense until each empirical verification has both passed AND been codified.
|
|
76
86
|
|
|
77
87
|
### 8. Spec Conformance
|