@codyswann/lisa 1.41.0 → 1.42.0

@@ -1,22 +1,145 @@
1
1
  # Empirical Verification
2
2
 
3
- Every task and plan requires a **proof command** - a single command that empirically demonstrates the work is done.
3
+ This repository supports AI agents as first-class contributors.
4
+
5
+ This file is the operational contract that defines how agents plan work, execute changes, verify outcomes, and escalate when blocked.
6
+
7
+ If anything here conflicts with other repo docs, treat this file as authoritative for agent behavior.
8
+
9
+ ---
4
10
 
5
11
  ## Core Principle
6
12
 
13
+ Agents must close the loop between code changes and observable system behavior.
14
+
15
+ No agent may claim success without evidence from runtime verification.
16
+
7
17
  Never assume something works because the code "looks correct." Run a command, observe the output, compare to expected result.
8
18
 
9
- ## Verification Types
19
+ ---
20
+
21
+ ## Roles
22
+
23
+ ### Builder Agent
24
+
25
+ Implements the change in code and infrastructure.
26
+
27
+ ### Verifier Agent
28
+
29
+ Acts as the end user (human user, API client, operator, attacker, or system) and produces proof artifacts.
30
+
31
+ The Verifier Agent must be independent of the Builder Agent whenever possible.
32
+
33
+ ### Human Overseer
34
+
35
+ Approves risky operations, security boundary crossings, and any work the agents cannot fully verify.
36
+
37
+ ---
38
+
39
+ ## Verification Levels
40
+
41
+ Agents must label every task outcome with exactly one of these:
42
+
43
+ - **FULLY VERIFIED**: Verified in the target environment with end-user simulation and captured artifacts.
44
+ - **PARTIALLY VERIFIED**: Verified in a lower-fidelity environment or with incomplete surfaces, with explicit gaps documented.
45
+ - **UNVERIFIED**: Verification blocked, human action required, no claim of correctness permitted.
46
+
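As a minimal sketch of how the three labels above could be assigned mechanically, the helper below maps two yes/no facts onto exactly one label. The function name `verification_level` and its two arguments are illustrative assumptions, not part of the package.

```bash
# Hypothetical helper: pick exactly one verification level.
#   $1 = "yes" if verification ran in the target environment with artifacts captured
#   $2 = "yes" if any lower-fidelity verification (local, staging, mocks) ran
verification_level() {
  if [ "$1" = "yes" ]; then
    echo "FULLY VERIFIED"
  elif [ "$2" = "yes" ]; then
    echo "PARTIALLY VERIFIED"
  else
    echo "UNVERIFIED"
  fi
}

verification_level no yes   # prints: PARTIALLY VERIFIED
```

A PARTIALLY VERIFIED result would still need its gaps documented by hand; the sketch only chooses the label.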
47
+ ---
48
+
49
+ ## Verification Types Quick Reference
10
50
 
11
51
  | Type | Use Case | Example |
12
52
  |------|----------|---------|
13
53
  | `test` | Unit/integration tests | `npm test -- path/to/file.spec.ts` |
14
54
  | `api-test` | API endpoints | `curl -s localhost:3000/api/endpoint` |
15
55
  | `test-coverage` | Coverage thresholds | `npm run test:cov -- --collectCoverageFrom=...` |
16
- | `ui-recording` | UI changes | Start local server; recorded session with Playwright/Maestro/Chrome Browser |
56
+ | `ui-recording` | UI changes | Start local server; recorded session with Playwright/Maestro/Chrome Browser |
17
57
  | `documentation` | Doc changes | `grep "expected" path/to/doc.md` |
18
58
  | `manual-check` | Config/setup | `cat config.json \| jq '.setting'` |
19
59
 
60
+ ---
61
+
62
+ ## Verification Surfaces
63
+
64
+ Agents may only self-verify when the required verification surfaces are available.
65
+
66
+ Verification surfaces include:
67
+
68
+ ### Action Surfaces
69
+
70
+ - Build and test execution
71
+ - Deployment and rollback
72
+ - Infrastructure apply and drift detection
73
+ - Feature flag toggling
74
+ - Data seeding and state reset
75
+ - Load generation and fault injection
76
+
77
+ ### Observation Surfaces
78
+
79
+ - Application logs (local and remote)
80
+ - Metrics (latency, errors, saturation, scaling)
81
+ - Traces and correlation IDs
82
+ - Database queries and schema inspection
83
+ - Browser and device automation
84
+ - Queue depth and consumer execution visibility
85
+ - CDN headers and edge behavior
86
+ - Artifact capture (video, screenshots, traces, diffs)
87
+
88
+ If a required surface is unavailable, agents must follow the Escalation Protocol.
89
+
90
+ ---
91
+
92
+ ## Tooling Surfaces
93
+
94
+ Many verification steps require tools that may not be available by default.
95
+
96
+ Tooling surfaces include:
97
+
98
+ - Required CLIs (cloud, DB, deployment, observability)
99
+ - Required MCP servers and their capabilities
100
+ - Required internal APIs (feature flags, auth, metrics, logs, CI)
101
+ - Required credentials and scopes for those tools
102
+
103
+ If required tooling is missing, misconfigured, blocked, undocumented, or inaccessible, agents must treat this as a verification blocker and escalate before proceeding.
104
+
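One way to detect missing CLIs before starting work is a preflight check built on the POSIX `command -v` builtin. This is a sketch under assumptions: the function name `check_tooling` and the example tool list (`curl`, `jq`, `psql`) are illustrative, and the check covers only CLIs on `PATH`, not MCP servers, internal APIs, or credential scopes.

```bash
# Report which of the given CLIs are missing from PATH.
# Prints one "MISSING: <tool>" line per absent tool; returns non-zero if any are missing.
check_tooling() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "MISSING: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example tool list (an assumption; substitute what the task actually needs).
if ! check_tooling curl jq psql; then
  echo "Verification blocked: escalate before proceeding"
fi
```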
105
+ ---
106
+
107
+ ## Proof Artifacts Requirements
108
+
109
+ Every completed task must include proof artifacts stored in the PR description or linked output location.
110
+
111
+ Proof artifacts must be specific and re-checkable.
112
+
113
+ Examples of acceptable proof:
114
+
115
+ - Playwright video and screenshots for UI work
116
+ - HTTP trace and response payload for API work
117
+ - Before/after DB query outputs for data work
118
+ - Metrics snapshots for autoscaling
119
+ - Log excerpts with correlation IDs for behavior validation
120
+ - Load test results showing threshold behavior
121
+
122
+ Statements like "works" or "should work" are not acceptable.
123
+
124
+ ---
125
+
126
+ ## Standard Workflow
127
+
128
+ Agents must follow this sequence unless explicitly instructed otherwise:
129
+
130
+ 1. Restate goal in one sentence.
131
+ 2. Identify the end user of the change.
132
+ 3. Choose the verification method that matches the end user.
133
+ 4. List required verification surfaces and required tooling surfaces.
134
+ 5. Confirm required surfaces are available.
135
+ 6. Implement the change.
136
+ 7. Run verification from the end-user perspective.
137
+ 8. Collect proof artifacts.
138
+ 9. Summarize what changed, what was verified, and remaining risk.
139
+ 10. Label the result with a verification level.
140
+
141
+ ---
142
+
20
143
  ## Task Completion Rules
21
144
 
22
145
  1. **Run the proof command** before marking any task complete
@@ -25,55 +148,36 @@ Never assume something works because the code "looks correct." Run a command, ob
25
148
  4. **If verification blocked** (missing Docker, services, etc.): Mark as blocked, not complete
26
149
  5. **Must not depend on CI/CD**: if necessary, use the local deploy methods found in `package.json`. Verification methods must be listed in the pull request, so they cannot depend on CI/CD completing.
27
150
 
28
- ## Examples
151
+ ---
29
152
 
30
- ### API Endpoint (E2E with curl)
153
+ ## End-User Verification Patterns
31
154
 
32
- **Task**: Add health check endpoint
155
+ Agents must choose the pattern that fits the task.
33
156
 
34
- **Wrong verification**: "I added the route handler"
157
+ ### UI and UX Feature or Bug
35
158
 
36
- **Correct verification**:
37
- ```bash
38
- curl -s http://localhost:3000/health | jq '.status'
39
- ```
40
- **Expected**: `"ok"`
159
+ End user: human in browser or native device.
41
160
 
42
- ### API Workflow (Multi-step E2E)
161
+ Required proof:
43
162
 
44
- **Task**: Add user registration endpoint
163
+ - Automated session recording (Playwright preferred)
164
+ - Screenshots of key states
165
+ - Network calls and console errors captured when relevant
45
166
 
46
- **Wrong verification**: "The route handler creates a user record"
47
-
48
- **Correct verification** -- write a small client script that exercises the full flow:
49
- ```bash
50
- # Create user
51
- RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://localhost:3000/api/users \
52
- -H "Content-Type: application/json" \
53
- -d '{"email":"test@example.com","name":"Test User"}')
54
- HTTP_CODE=$(echo "$RESPONSE" | tail -1)
55
- BODY=$(echo "$RESPONSE" | sed '$d')
56
- echo "Create status: $HTTP_CODE"
57
- echo "Create body: $BODY"
58
-
59
- # Verify the user exists by fetching it back
60
- USER_ID=$(echo "$BODY" | jq -r '.id')
61
- curl -s "http://localhost:3000/api/users/$USER_ID" | jq '.email'
62
- ```
63
- **Expected**: Create returns `201`, fetch returns `"test@example.com"`
64
-
65
- ### UI Feature (Playwright browser verification)
167
+ #### Example: UI Feature (Playwright browser verification)
66
168
 
67
169
  **Task**: Add logout button to the dashboard header
68
170
 
69
171
  **Wrong verification**: "I added the button component to the header"
70
172
 
71
173
  **Correct verification** -- use Playwright to interact with the app as a real user:
174
+
72
175
  ```bash
73
176
  npx playwright test --headed -g "logout button" 2>&1 | tail -20
74
177
  ```
75
178
 
76
179
  Or for ad-hoc verification without a test file, use the Playwright CLI browser tools or `browser_run_code`:
180
+
77
181
  ```javascript
78
182
  async (page) => {
79
183
  await page.goto('http://localhost:3000/dashboard');
@@ -84,15 +188,17 @@ async (page) => {
84
188
  return { url: page.url(), title: await page.title() };
85
189
  }
86
190
  ```
191
+
87
192
  **Expected**: Browser navigates to `/login` after clicking the logout button
88
193
 
89
- ### UI Visual/Behavioral (Screenshot comparison)
194
+ #### Example: UI Visual/Behavioral (Screenshot comparison)
90
195
 
91
196
  **Task**: Fix mobile nav menu not closing after link click
92
197
 
93
198
  **Wrong verification**: "I added an onClick handler that closes the menu"
94
199
 
95
200
  **Correct verification** -- open a browser and perform the exact user action:
201
+
96
202
  ```javascript
97
203
  async (page) => {
98
204
  await page.setViewportSize({ width: 375, height: 812 });
@@ -104,15 +210,76 @@ async (page) => {
104
210
  return { menuVisibleAfterClick: isVisible, url: page.url() };
105
211
  }
106
212
  ```
213
+
107
214
  **Expected**: `menuVisibleAfterClick: false`, url contains `/about`
108
215
 
109
- ### API with Authentication (E2E flow)
216
+ ### API, GraphQL, or RPC Change
217
+
218
+ End user: API client.
219
+
220
+ Required proof:
221
+
222
+ - A `curl` command or a minimal script checked into the repo or attached to the PR
223
+ - Response payload showing schema and expected data
224
+ - Negative case if applicable (auth failure, validation error)
225
+
226
+ #### Example: API Endpoint (E2E with curl)
227
+
228
+ **Task**: Add health check endpoint
229
+
230
+ **Wrong verification**: "I added the route handler"
231
+
232
+ **Correct verification**:
233
+
234
+ ```bash
235
+ curl -s http://localhost:3000/health | jq '.status'
236
+ ```
237
+
238
+ **Expected**: `"ok"`
239
+
240
+ #### Example: API Workflow (Multi-step E2E)
241
+
242
+ **Task**: Add user registration endpoint
243
+
244
+ **Wrong verification**: "The route handler creates a user record"
245
+
246
+ **Correct verification** -- write a small client script that exercises the full flow:
247
+
248
+ ```bash
249
+ # Create user
250
+ RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://localhost:3000/api/users \
251
+ -H "Content-Type: application/json" \
252
+ -d '{"email":"test@example.com","name":"Test User"}')
253
+ HTTP_CODE=$(echo "$RESPONSE" | tail -1)
254
+ BODY=$(echo "$RESPONSE" | sed '$d')
255
+ echo "Create status: $HTTP_CODE"
256
+ echo "Create body: $BODY"
257
+
258
+ # Verify the user exists by fetching it back
259
+ USER_ID=$(echo "$BODY" | jq -r '.id')
260
+ curl -s "http://localhost:3000/api/users/$USER_ID" | jq '.email'
261
+ ```
262
+
263
+ **Expected**: Create returns `201`, fetch returns `"test@example.com"`
264
+
265
+ ### Authentication and Authorization
266
+
267
+ End user: user with specific identity and role.
268
+
269
+ Required proof:
270
+
271
+ - Verification across at least two roles (allowed and denied)
272
+ - Explicit status codes or UI outcomes
273
+ - Artifact showing enforcement (screenshots or HTTP traces)
274
+
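The allowed-and-denied requirement above can be sketched as a small status assertion that prints one re-checkable line per role. The helper name `expect_status`, the endpoint URL, and the token variables in the commented usage are all hypothetical; only the `curl` flags (`-s`, `-o`, `-w "%{http_code}"`, `-H`) are standard.

```bash
# Compare an observed HTTP status with the expected one.
#   $1 = role label, $2 = expected status, $3 = observed status
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1: got $3"
  else
    echo "FAIL $1: expected $2, got $3"
    return 1
  fi
}

# Hypothetical usage against a running service (URL and tokens are placeholders):
#   ADMIN=$(curl -s -o /dev/null -w "%{http_code}" \
#     -H "Authorization: Bearer $ADMIN_TOKEN" http://localhost:3000/api/admin/users)
#   VIEWER=$(curl -s -o /dev/null -w "%{http_code}" \
#     -H "Authorization: Bearer $VIEWER_TOKEN" http://localhost:3000/api/admin/users)
#   expect_status admin 200 "$ADMIN"
#   expect_status viewer 403 "$VIEWER"
```

Printing both the expected and the observed code keeps the output usable as a proof artifact even when the check fails.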
275
+ #### Example: API with Authentication (E2E flow)
110
276
 
111
277
  **Task**: Add rate limiting to the search endpoint
112
278
 
113
279
  **Wrong verification**: "I added the rate limiter middleware"
114
280
 
115
281
  **Correct verification** -- actually hit the rate limit:
282
+
116
283
  ```bash
117
284
  # Fire requests until rate limited
118
285
  for i in $(seq 1 25); do
@@ -122,15 +289,27 @@ for i in $(seq 1 25); do
122
289
  echo "Request $i: $CODE"
123
290
  done | tail -5
124
291
  ```
292
+
125
293
  **Expected**: First requests return `200`, later requests return `429`
126
294
 
127
- ### Database Migration
295
+ ### Database Migration or Backfill
296
+
297
+ End user: application and operators.
298
+
299
+ Required proof:
300
+
301
+ - Schema verification
302
+ - Backfill verification with before/after counts
303
+ - Rollback plan validated when possible
304
+
305
+ #### Example: Database Migration
128
306
 
129
307
  **Task**: Add `last_login_at` column to users table
130
308
 
131
309
  **Wrong verification**: "The migration file creates the column"
132
310
 
133
311
  **Correct verification**:
312
+
134
313
  ```bash
135
314
  # Run migration
136
315
  npm run migration:run
@@ -138,4 +317,209 @@ npm run migration:run
138
317
  # Verify column exists and has correct type
139
318
  psql "$DATABASE_URL" -c "\d users" | grep last_login_at
140
319
  ```
320
+
141
321
  **Expected**: `last_login_at | timestamp with time zone |`
322
+
323
+ ### Background Jobs, Queues, Events
324
+
325
+ End user: system operator and downstream consumers.
326
+
327
+ Required proof:
328
+
329
+ - Evidence of enqueue, processing, and final state change
330
+ - Queue depth and worker logs
331
+ - Idempotency check when relevant
332
+
333
+ ### Caching and Performance
334
+
335
+ End user: API consumer or UI user.
336
+
337
+ Required proof:
338
+
339
+ - Measured latency or throughput before and after
340
+ - Cache hit evidence (logs, metrics, key inspection)
341
+ - TTL behavior where relevant
342
+
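For the before/after latency requirement, a rough sketch is to capture `curl`'s `%{time_total}` for the same request before and after the change and report the relative difference. The helper name `latency_change_pct` and the URL in the commented usage are assumptions; a single request is noisy, so a real measurement would likely average several runs.

```bash
# Percentage change from a "before" latency to an "after" latency (seconds).
# A negative result means the request got faster.
latency_change_pct() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (after - before) / before * 100 }'
}

# Hypothetical measurement (URL is a placeholder):
#   BEFORE=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:3000/api/products)
#   ...deploy the caching change...
#   AFTER=$(curl -s -o /dev/null -w '%{time_total}' http://localhost:3000/api/products)
#   latency_change_pct "$BEFORE" "$AFTER"

latency_change_pct 0.80 0.20   # prints: -75.0
```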
343
+ ### Infrastructure and Autoscaling
344
+
345
+ End user: operator and workload.
346
+
347
+ Required proof:
348
+
349
+ - Load simulation that triggers scaling or behavior change
350
+ - Metrics showing scale-out and scale-in
351
+ - Evidence of stability (error rates, latency) during the event
352
+
353
+ ### Security Fixes
354
+
355
+ End user: attacker and defender.
356
+
357
+ Required proof:
358
+
359
+ - Reproduction of exploit pre-fix
360
+ - Demonstration of exploit failure post-fix
361
+ - Evidence of safe handling (sanitization, rejection, rate limit)
362
+
363
+ ---
364
+
365
+ ## Escalation Protocol
366
+
367
+ Agents must escalate when verification is blocked, ambiguous, or requires tools that are missing or inaccessible.
368
+
369
+ Common blockers:
370
+
371
+ - VPN required
372
+ - MFA, OTP, SMS codes
373
+ - Hardware token requirement
374
+ - Missing CLI, MCP server, or internal API required for verification
375
+ - Missing documentation on how to access required tooling
376
+ - Production-only access gates
377
+ - Compliance restrictions
378
+
379
+ When blocked, agents must do the following:
380
+
381
+ 1. Identify the exact boundary preventing verification.
382
+ 2. Identify which verification surfaces and tooling surfaces are missing.
383
+ 3. Attempt safe fallback verification (local, staging, mocks) and label it clearly.
384
+ 4. Declare verification level as PARTIALLY VERIFIED or UNVERIFIED.
385
+ 5. Produce a Human Action Packet.
386
+ 6. Pause until explicit human confirmation or tooling is provided.
387
+
388
+ Agents must never proceed past an unverified boundary without surfacing it to the human overseer.
389
+
390
+ ### Human Action Packet Format
391
+
392
+ Agents must provide:
393
+
394
+ - What is blocked and why
395
+ - What tool or access is missing
396
+ - Exactly what the human must do
397
+ - How to confirm completion
398
+ - What the agent will do immediately after
399
+ - What artifacts the agent will produce after access is restored
400
+
401
+ Example:
402
+
403
+ - Blocked: Cannot reach DB, VPN required.
404
+ - Missing: `psql` access to `db.host` and internal logs viewer MCP.
405
+ - Human steps: Connect VPN "CorpVPN", confirm access by running `nc -vz db.host 5432`, provide MCP endpoint or credentials.
406
+ - Confirmation: Reply "VPN ACTIVE" and "MCP READY".
407
+ - Next: Agent runs migration verification script and captures schema diff and query outputs.
408
+
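To keep packets uniform, the example above could be produced by a small template function. This is only a sketch: the name `human_action_packet` is hypothetical, and it emits the five labels used in the example rather than a format the package prescribes.

```bash
# Print a Human Action Packet using the five labels from the example above.
#   $1 = what is blocked and why     $2 = missing tool or access
#   $3 = exact human steps           $4 = how to confirm completion
#   $5 = what the agent does next, including artifacts it will produce
human_action_packet() {
  printf '%s\n' \
    "- Blocked: $1" \
    "- Missing: $2" \
    "- Human steps: $3" \
    "- Confirmation: $4" \
    "- Next: $5"
}

human_action_packet \
  "Cannot reach DB, VPN required." \
  "psql access to db.host" \
  "Connect VPN, then run: nc -vz db.host 5432" \
  'Reply "VPN ACTIVE"' \
  "Agent runs migration verification and captures schema diff."
```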
409
+ Agents must pause until explicit human confirmation.
410
+
411
+ Agents must never bypass security controls to proceed.
412
+
413
+ ---
414
+
415
+ ## Environments and Safety Rules
416
+
417
+ ### Allowed Environments
418
+
419
+ - Local development
420
+ - Preview environments
421
+ - Staging
422
+ - Production read-only, only if explicitly approved and configured for safe access
423
+
424
+ ### Prohibited Actions Without Human Approval
425
+
426
+ - Writing to production data stores
427
+ - Disabling MFA or security policies
428
+ - Modifying IAM roles or firewall rules beyond scoped change requests
429
+ - Running destructive migrations
430
+ - Triggering external billing or payment flows
431
+
432
+ If an operation is irreversible or risky, escalate first.
433
+
434
+ ---
435
+
436
+ ## Repository Conventions
437
+
438
+ ### Code Style and Structure
439
+
440
+ - Follow existing patterns in the codebase.
441
+ - Do not introduce new frameworks or architectural patterns without justification in the PR.
442
+ - Prefer small, reviewable changes with clear commit messages.
443
+
444
+ ### Tests
445
+
446
+ - Run the fastest relevant test suite locally.
447
+ - Expand to integration or end-to-end tests based on impact.
448
+ - If tests are flaky or slow, document it and propose a follow-up.
449
+
450
+ ### Logging and Observability
451
+
452
+ - Include correlation IDs where supported.
453
+ - Prefer structured logs over ad hoc strings.
454
+ - For behavior changes, include log evidence in proof artifacts.
455
+
456
+ ---
457
+
458
+ ## Artifact Storage and PR Requirements
459
+
460
+ Every PR must include:
461
+
462
+ - Goal summary
463
+ - Verification level
464
+ - Proof artifact links or embedded outputs
465
+ - How to reproduce verification locally
466
+ - Known limitations and follow-up items
467
+
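The five required items could be checked before a PR is opened with a simple text scan of the description. The sketch below assumes the description is saved to a file and that each item appears under a recognizable phrase; the function name `check_pr_description` and the phrase list are illustrative assumptions, not conventions defined by the package.

```bash
# Check a PR description file for the five required items.
# The phrases searched for are assumptions about how a team might word them.
check_pr_description() {
  file="$1"
  status=0
  for section in "Goal" "Verification level" "Proof artifacts" \
                 "How to reproduce" "Known limitations"; do
    if ! grep -qiF "$section" "$file"; then
      echo "MISSING SECTION: $section"
      status=1
    fi
  done
  return "$status"
}

# Demo on a throwaway file that lacks the last two items.
tmp=$(mktemp)
printf '%s\n' "## Goal" "## Verification level: PARTIALLY VERIFIED" \
  "## Proof artifacts" > "$tmp"
check_pr_description "$tmp" || echo "PR description incomplete"
rm -f "$tmp"
```

A scan like this only confirms that the items are present; whether the proof is specific and re-checkable still needs review.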
468
+ Preferred artifact locations:
469
+
470
+ - PR description
471
+ - Repo-local scripts under `scripts/verification/`
472
+ - CI artifacts linked from the build
473
+
474
+ ---
475
+
476
+ ## Quick Commands
477
+
478
+ Document the canonical commands agents should use here.
479
+
480
+ Replace placeholders with real commands.
481
+
482
+ ### Local
483
+
484
+ - Install: `REPLACE_ME`
485
+ - Run app: `REPLACE_ME`
486
+ - Run unit tests: `REPLACE_ME`
487
+ - Run integration tests: `REPLACE_ME`
488
+ - Lint and format: `REPLACE_ME`
489
+
490
+ ### UI Verification
491
+
492
+ - Playwright tests: `REPLACE_ME`
493
+ - Record a flow: `REPLACE_ME`
494
+
495
+ ### API Verification
496
+
497
+ - Example curl: `REPLACE_ME`
498
+ - GraphQL query runner: `REPLACE_ME`
499
+
500
+ ### Deployment
501
+
502
+ - Deploy to preview: `REPLACE_ME`
503
+ - Deploy to staging: `REPLACE_ME`
504
+ - Rollback: `REPLACE_ME`
505
+
506
+ ### Observability
507
+
508
+ - Tail logs: `REPLACE_ME`
509
+ - Query metrics: `REPLACE_ME`
510
+ - Trace lookup: `REPLACE_ME`
511
+
512
+ ---
513
+
514
+ ## Definition of Done
515
+
516
+ A task is done only when:
517
+
518
+ - End user is identified
519
+ - Verification pattern is applied
520
+ - Required verification surfaces and tooling surfaces are used or explicitly documented as unavailable
521
+ - Proof artifacts are captured
522
+ - Verification level is declared
523
+ - Risks and gaps are documented
524
+
525
+ If any of these are missing, the work is not complete.
package/package.json CHANGED
@@ -86,10 +86,11 @@
86
86
  "@jest/globals": "^30.0.0"
87
87
  },
88
88
  "resolutions": {
89
- "@isaacs/brace-expansion": "^5.0.1"
89
+ "@isaacs/brace-expansion": "^5.0.1",
90
+ "axios": ">=1.13.5"
90
91
  },
91
92
  "name": "@codyswann/lisa",
92
- "version": "1.41.0",
93
+ "version": "1.42.0",
93
94
  "description": "Claude Code governance framework that applies guardrails, guidance, and automated enforcement to projects",
94
95
  "main": "dist/index.js",
95
96
  "bin": {
@@ -18,3 +18,9 @@ pre-push:
18
18
  run: bundle exec rspec
19
19
  brakeman:
20
20
  run: bundle exec brakeman --no-pager --quiet
21
+ reek:
22
+ run: bundle exec reek app/ lib/
23
+ flog:
24
+ run: bundle exec flog --all --group app/ lib/
25
+ flay:
26
+ run: bundle exec flay app/ lib/