@codyswann/lisa 1.40.0 → 1.42.0
- package/all/copy-overwrite/.claude/rules/verfication.md +421 -37
- package/expo/copy-overwrite/.claude/agents/ops-specialist.md +124 -0
- package/expo/copy-overwrite/.claude/skills/ops-browser-uat/SKILL.md +124 -0
- package/expo/copy-overwrite/.claude/skills/ops-check-logs/SKILL.md +211 -0
- package/expo/copy-overwrite/.claude/skills/ops-db-ops/SKILL.md +119 -0
- package/expo/copy-overwrite/.claude/skills/ops-deploy/SKILL.md +119 -0
- package/expo/copy-overwrite/.claude/skills/ops-monitor-errors/SKILL.md +99 -0
- package/expo/copy-overwrite/.claude/skills/ops-performance/SKILL.md +165 -0
- package/expo/copy-overwrite/.claude/skills/ops-run-local/SKILL.md +166 -0
- package/expo/copy-overwrite/.claude/skills/ops-verify-health/SKILL.md +101 -0
- package/package.json +3 -2
- package/rails/copy-overwrite/lefthook.yml +6 -0
|
@@ -1,22 +1,145 @@
|
|
|
1
1
|
# Empirical Verification
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
This repository supports AI agents as first-class contributors.
|
|
4
|
+
|
|
5
|
+
This file is the operational contract that defines how agents plan work, execute changes, verify outcomes, and escalate when blocked.
|
|
6
|
+
|
|
7
|
+
If anything here conflicts with other repo docs, treat this file as authoritative for agent behavior.
|
|
8
|
+
|
|
9
|
+
---
|
|
4
10
|
|
|
5
11
|
## Core Principle
|
|
6
12
|
|
|
13
|
+
Agents must close the loop between code changes and observable system behavior.
|
|
14
|
+
|
|
15
|
+
No agent may claim success without evidence from runtime verification.
|
|
16
|
+
|
|
7
17
|
Never assume something works because the code "looks correct." Run a command, observe the output, compare to expected result.
|
|
8
18
|
|
|
9
|
-
|
|
19
|
+
---
|
|
20
|
+
|
|
21
|
+
## Roles
|
|
22
|
+
|
|
23
|
+
### Builder Agent
|
|
24
|
+
|
|
25
|
+
Implements the change in code and infrastructure.
|
|
26
|
+
|
|
27
|
+
### Verifier Agent
|
|
28
|
+
|
|
29
|
+
Acts as the end user (human user, API client, operator, attacker, or system) and produces proof artifacts.
|
|
30
|
+
|
|
31
|
+
Verifier Agent must be independent from Builder Agent when possible.
|
|
32
|
+
|
|
33
|
+
### Human Overseer
|
|
34
|
+
|
|
35
|
+
Approves risky operations, security boundary crossings, and any work the agents cannot fully verify.
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Verification Levels
|
|
40
|
+
|
|
41
|
+
Agents must label every task outcome with exactly one of these:
|
|
42
|
+
|
|
43
|
+
- **FULLY VERIFIED**: Verified in the target environment with end-user simulation and captured artifacts.
|
|
44
|
+
- **PARTIALLY VERIFIED**: Verified in a lower-fidelity environment or with incomplete surfaces, with explicit gaps documented.
|
|
45
|
+
- **UNVERIFIED**: Verification blocked, human action required, no claim of correctness permitted.
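
One way to read the three labels above is as a decision over a few yes/no facts about the verification run. The sketch below is only an illustration of that reading: the function name `verification_level` and its three arguments are hypothetical and are not defined anywhere in the rules file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map three yes/no facts about a verification run to one label.
# Arguments (assumptions of this sketch, not part of the rules file):
#   $1  verification was actually run          yes|no
#   $2  it ran in the target environment       yes|no  (no = lower-fidelity environment)
#   $3  proof artifacts were captured          yes|no
verification_level() {
  local ran="$1" target="$2" artifacts="$3"
  if [ "$ran" != "yes" ]; then
    # Blocked: no claim of correctness is permitted.
    echo "UNVERIFIED"
  elif [ "$target" = "yes" ] && [ "$artifacts" = "yes" ]; then
    echo "FULLY VERIFIED"
  else
    # Ran, but in a lower-fidelity environment or without complete artifacts.
    echo "PARTIALLY VERIFIED"
  fi
}
```

For example, `verification_level yes no yes` prints `PARTIALLY VERIFIED`; the documented gaps would still need to be written up by hand.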
|
|
46
|
+
|
|
47
|
+
---
|
|
48
|
+
|
|
49
|
+
## Verification Types Quick Reference
|
|
10
50
|
|
|
11
51
|
| Type | Use Case | Example |
|
|
12
52
|
|------|----------|---------|
|
|
13
53
|
| `test` | Unit/integration tests | `npm test -- path/to/file.spec.ts` |
|
|
14
54
|
| `api-test` | API endpoints | `curl -s localhost:3000/api/endpoint` |
|
|
15
55
|
| `test-coverage` | Coverage thresholds | `npm run test:cov -- --collectCoverageFrom=...` |
|
|
16
|
-
| `ui-recording` | UI changes | Start local server; recorded session with Playwright/Maestro/Chrome Browser
|
|
56
|
+
| `ui-recording` | UI changes | Start local server; recorded session with Playwright/Maestro/Chrome Browser |
|
|
17
57
|
| `documentation` | Doc changes | `grep "expected" path/to/doc.md` |
|
|
18
58
|
| `manual-check` | Config/setup | `cat config.json \| jq '.setting'` |
|
|
19
59
|
|
|
60
|
+
---
|
|
61
|
+
|
|
62
|
+
## Verification Surfaces
|
|
63
|
+
|
|
64
|
+
Agents may only self-verify when the required verification surfaces are available.
|
|
65
|
+
|
|
66
|
+
Verification surfaces include:
|
|
67
|
+
|
|
68
|
+
### Action Surfaces
|
|
69
|
+
|
|
70
|
+
- Build and test execution
|
|
71
|
+
- Deployment and rollback
|
|
72
|
+
- Infrastructure apply and drift detection
|
|
73
|
+
- Feature flag toggling
|
|
74
|
+
- Data seeding and state reset
|
|
75
|
+
- Load generation and fault injection
|
|
76
|
+
|
|
77
|
+
### Observation Surfaces
|
|
78
|
+
|
|
79
|
+
- Application logs (local and remote)
|
|
80
|
+
- Metrics (latency, errors, saturation, scaling)
|
|
81
|
+
- Traces and correlation IDs
|
|
82
|
+
- Database queries and schema inspection
|
|
83
|
+
- Browser and device automation
|
|
84
|
+
- Queue depth and consumer execution visibility
|
|
85
|
+
- CDN headers and edge behavior
|
|
86
|
+
- Artifact capture (video, screenshots, traces, diffs)
|
|
87
|
+
|
|
88
|
+
If a required surface is unavailable, agents must follow the Escalation Protocol.
|
|
89
|
+
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Tooling Surfaces
|
|
93
|
+
|
|
94
|
+
Many verification steps require tools that may not be available by default.
|
|
95
|
+
|
|
96
|
+
Tooling surfaces include:
|
|
97
|
+
|
|
98
|
+
- Required CLIs (cloud, DB, deployment, observability)
|
|
99
|
+
- Required MCP servers and their capabilities
|
|
100
|
+
- Required internal APIs (feature flags, auth, metrics, logs, CI)
|
|
101
|
+
- Required credentials and scopes for those tools
|
|
102
|
+
|
|
103
|
+
If required tooling is missing, misconfigured, blocked, undocumented, or inaccessible, agents must treat this as a verification blocker and escalate before proceeding.
|
|
104
|
+
|
|
105
|
+
---
|
|
106
|
+
|
|
107
|
+
## Proof Artifacts Requirements
|
|
108
|
+
|
|
109
|
+
Every completed task must include proof artifacts stored in the PR description or linked output location.
|
|
110
|
+
|
|
111
|
+
Proof artifacts must be specific and re-checkable.
|
|
112
|
+
|
|
113
|
+
Examples of acceptable proof:
|
|
114
|
+
|
|
115
|
+
- Playwright video and screenshots for UI work
|
|
116
|
+
- HTTP trace and response payload for API work
|
|
117
|
+
- Before/after DB query outputs for data work
|
|
118
|
+
- Metrics snapshots for autoscaling
|
|
119
|
+
- Log excerpts with correlation IDs for behavior validation
|
|
120
|
+
- Load test results showing threshold behavior
|
|
121
|
+
|
|
122
|
+
Statements like "works" or "should work" are not acceptable.
|
|
123
|
+
|
|
124
|
+
---
|
|
125
|
+
|
|
126
|
+
## Standard Workflow
|
|
127
|
+
|
|
128
|
+
Agents must follow this sequence unless explicitly instructed otherwise:
|
|
129
|
+
|
|
130
|
+
1. Restate goal in one sentence.
|
|
131
|
+
2. Identify the end user of the change.
|
|
132
|
+
3. Choose the verification method that matches the end user.
|
|
133
|
+
4. List required verification surfaces and required tooling surfaces.
|
|
134
|
+
5. Confirm required surfaces are available.
|
|
135
|
+
6. Implement the change.
|
|
136
|
+
7. Run verification from the end-user perspective.
|
|
137
|
+
8. Collect proof artifacts.
|
|
138
|
+
9. Summarize what changed, what was verified, and remaining risk.
|
|
139
|
+
10. Label the result with a verification level.
|
|
140
|
+
|
|
141
|
+
---
|
|
142
|
+
|
|
20
143
|
## Task Completion Rules
|
|
21
144
|
|
|
22
145
|
1. **Run the proof command** before marking any task complete
|
|
@@ -25,55 +148,36 @@ Never assume something works because the code "looks correct." Run a command, ob
|
|
|
25
148
|
4. **If verification blocked** (missing Docker, services, etc.): Mark as blocked, not complete
|
|
26
149
|
5. **Must not be dependent on CI/CD**: If necessary, you may use local deploy methods found in `package.json`, but the verification methods must be listed in the pull request and therefore cannot depend on CI/CD completing
|
|
27
150
|
|
|
28
|
-
|
|
151
|
+
---
|
|
29
152
|
|
|
30
|
-
|
|
153
|
+
## End-User Verification Patterns
|
|
31
154
|
|
|
32
|
-
|
|
155
|
+
Agents must choose the pattern that fits the task.
|
|
33
156
|
|
|
34
|
-
|
|
157
|
+
### UI and UX Feature or Bug
|
|
35
158
|
|
|
36
|
-
|
|
37
|
-
```bash
|
|
38
|
-
curl -s http://localhost:3000/health | jq '.status'
|
|
39
|
-
```
|
|
40
|
-
**Expected**: `"ok"`
|
|
159
|
+
End user: human in browser or native device.
|
|
41
160
|
|
|
42
|
-
|
|
161
|
+
Required proof:
|
|
43
162
|
|
|
44
|
-
|
|
163
|
+
- Automated session recording (Playwright preferred)
|
|
164
|
+
- Screenshots of key states
|
|
165
|
+
- Network calls and console errors captured when relevant
|
|
45
166
|
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
**Correct verification** -- write a small client script that exercises the full flow:
|
|
49
|
-
```bash
|
|
50
|
-
# Create user
|
|
51
|
-
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://localhost:3000/api/users \
|
|
52
|
-
-H "Content-Type: application/json" \
|
|
53
|
-
-d '{"email":"test@example.com","name":"Test User"}')
|
|
54
|
-
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
|
|
55
|
-
BODY=$(echo "$RESPONSE" | sed '$d')
|
|
56
|
-
echo "Create status: $HTTP_CODE"
|
|
57
|
-
echo "Create body: $BODY"
|
|
58
|
-
|
|
59
|
-
# Verify the user exists by fetching it back
|
|
60
|
-
USER_ID=$(echo "$BODY" | jq -r '.id')
|
|
61
|
-
curl -s "http://localhost:3000/api/users/$USER_ID" | jq '.email'
|
|
62
|
-
```
|
|
63
|
-
**Expected**: Create returns `201`, fetch returns `"test@example.com"`
|
|
64
|
-
|
|
65
|
-
### UI Feature (Playwright browser verification)
|
|
167
|
+
#### Example: UI Feature (Playwright browser verification)
|
|
66
168
|
|
|
67
169
|
**Task**: Add logout button to the dashboard header
|
|
68
170
|
|
|
69
171
|
**Wrong verification**: "I added the button component to the header"
|
|
70
172
|
|
|
71
173
|
**Correct verification** -- use Playwright to interact with the app as a real user:
|
|
174
|
+
|
|
72
175
|
```bash
|
|
73
176
|
npx playwright test --headed -g "logout button" 2>&1 | tail -20
|
|
74
177
|
```
|
|
75
178
|
|
|
76
179
|
Or for ad-hoc verification without a test file, use the Playwright CLI browser tools or `browser_run_code`:
|
|
180
|
+
|
|
77
181
|
```javascript
|
|
78
182
|
async (page) => {
|
|
79
183
|
await page.goto('http://localhost:3000/dashboard');
|
|
@@ -84,15 +188,17 @@ async (page) => {
|
|
|
84
188
|
return { url: page.url(), title: await page.title() };
|
|
85
189
|
}
|
|
86
190
|
```
|
|
191
|
+
|
|
87
192
|
**Expected**: Browser navigates to `/login` after clicking the logout button
|
|
88
193
|
|
|
89
|
-
|
|
194
|
+
#### Example: UI Visual/Behavioral (Screenshot comparison)
|
|
90
195
|
|
|
91
196
|
**Task**: Fix mobile nav menu not closing after link click
|
|
92
197
|
|
|
93
198
|
**Wrong verification**: "I added an onClick handler that closes the menu"
|
|
94
199
|
|
|
95
200
|
**Correct verification** -- open a browser and perform the exact user action:
|
|
201
|
+
|
|
96
202
|
```javascript
|
|
97
203
|
async (page) => {
|
|
98
204
|
await page.setViewportSize({ width: 375, height: 812 });
|
|
@@ -104,15 +210,76 @@ async (page) => {
|
|
|
104
210
|
return { menuVisibleAfterClick: isVisible, url: page.url() };
|
|
105
211
|
}
|
|
106
212
|
```
|
|
213
|
+
|
|
107
214
|
**Expected**: `menuVisibleAfterClick: false`, url contains `/about`
|
|
108
215
|
|
|
109
|
-
### API
|
|
216
|
+
### API, GraphQL, or RPC Change
|
|
217
|
+
|
|
218
|
+
End user: API client.
|
|
219
|
+
|
|
220
|
+
Required proof:
|
|
221
|
+
|
|
222
|
+
- Curl or a minimal script checked into the repo or attached to PR
|
|
223
|
+
- Response payload showing schema and expected data
|
|
224
|
+
- Negative case if applicable (auth failure, validation error)
|
|
225
|
+
|
|
226
|
+
#### Example: API Endpoint (E2E with curl)
|
|
227
|
+
|
|
228
|
+
**Task**: Add health check endpoint
|
|
229
|
+
|
|
230
|
+
**Wrong verification**: "I added the route handler"
|
|
231
|
+
|
|
232
|
+
**Correct verification**:
|
|
233
|
+
|
|
234
|
+
```bash
|
|
235
|
+
curl -s http://localhost:3000/health | jq '.status'
|
|
236
|
+
```
|
|
237
|
+
|
|
238
|
+
**Expected**: `"ok"`
|
|
239
|
+
|
|
240
|
+
#### Example: API Workflow (Multi-step E2E)
|
|
241
|
+
|
|
242
|
+
**Task**: Add user registration endpoint
|
|
243
|
+
|
|
244
|
+
**Wrong verification**: "The route handler creates a user record"
|
|
245
|
+
|
|
246
|
+
**Correct verification** -- write a small client script that exercises the full flow:
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
# Create user
|
|
250
|
+
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST http://localhost:3000/api/users \
|
|
251
|
+
-H "Content-Type: application/json" \
|
|
252
|
+
-d '{"email":"test@example.com","name":"Test User"}')
|
|
253
|
+
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
|
|
254
|
+
BODY=$(echo "$RESPONSE" | sed '$d')
|
|
255
|
+
echo "Create status: $HTTP_CODE"
|
|
256
|
+
echo "Create body: $BODY"
|
|
257
|
+
|
|
258
|
+
# Verify the user exists by fetching it back
|
|
259
|
+
USER_ID=$(echo "$BODY" | jq -r '.id')
|
|
260
|
+
curl -s "http://localhost:3000/api/users/$USER_ID" | jq '.email'
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
**Expected**: Create returns `201`, fetch returns `"test@example.com"`
|
|
264
|
+
|
|
265
|
+
### Authentication and Authorization
|
|
266
|
+
|
|
267
|
+
End user: user with specific identity and role.
|
|
268
|
+
|
|
269
|
+
Required proof:
|
|
270
|
+
|
|
271
|
+
- Verification across at least two roles (allowed and denied)
|
|
272
|
+
- Explicit status codes or UI outcomes
|
|
273
|
+
- Artifact showing enforcement (screenshots or HTTP traces)
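
The "at least two roles" requirement above could be checked with a pair of small helpers like the following. This is a sketch under assumptions: the function names, the bearer-token header scheme, and the example URL and token variables are all hypothetical and would need to match the real service.

```bash
#!/usr/bin/env bash
# Print the HTTP status code for a GET request made with a given bearer token.
# Assumes the API accepts "Authorization: Bearer <token>" (an assumption of this sketch).
status_for_token() {
  local token="$1" url="$2"
  curl -s -o /dev/null -w '%{http_code}' -H "Authorization: Bearer $token" "$url"
}

# Compare an observed status with the expected one and print a PASS/FAIL line.
# Returns non-zero on mismatch so it can gate a script.
expect_status() {
  local label="$1" expected="$2" actual="$3"
  if [ "$actual" = "$expected" ]; then
    echo "PASS $label: $actual"
  else
    echo "FAIL $label: expected $expected, got $actual"
    return 1
  fi
}

# Hypothetical usage (URL and token variables are placeholders):
#   url="http://localhost:3000/api/admin/reports"
#   expect_status "admin allowed" 200 "$(status_for_token "$ADMIN_TOKEN" "$url")"
#   expect_status "viewer denied" 403 "$(status_for_token "$VIEWER_TOKEN" "$url")"
```

The printed PASS/FAIL lines, together with the raw status codes, can be pasted into the PR as the enforcement artifact.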
|
|
274
|
+
|
|
275
|
+
#### Example: API with Authentication (E2E flow)
|
|
110
276
|
|
|
111
277
|
**Task**: Add rate limiting to the search endpoint
|
|
112
278
|
|
|
113
279
|
**Wrong verification**: "I added the rate limiter middleware"
|
|
114
280
|
|
|
115
281
|
**Correct verification** -- actually hit the rate limit:
|
|
282
|
+
|
|
116
283
|
```bash
|
|
117
284
|
# Fire requests until rate limited
|
|
118
285
|
for i in $(seq 1 25); do
|
|
@@ -122,15 +289,27 @@ for i in $(seq 1 25); do
|
|
|
122
289
|
echo "Request $i: $CODE"
|
|
123
290
|
done | tail -5
|
|
124
291
|
```
|
|
292
|
+
|
|
125
293
|
**Expected**: First requests return `200`, later requests return `429`
|
|
126
294
|
|
|
127
|
-
### Database Migration
|
|
295
|
+
### Database Migration or Backfill
|
|
296
|
+
|
|
297
|
+
End user: application and operators.
|
|
298
|
+
|
|
299
|
+
Required proof:
|
|
300
|
+
|
|
301
|
+
- Schema verification
|
|
302
|
+
- Backfill verification with before/after counts
|
|
303
|
+
- Rollback plan validated when possible
|
|
304
|
+
|
|
305
|
+
#### Example: Database Migration
|
|
128
306
|
|
|
129
307
|
**Task**: Add `last_login_at` column to users table
|
|
130
308
|
|
|
131
309
|
**Wrong verification**: "The migration file creates the column"
|
|
132
310
|
|
|
133
311
|
**Correct verification**:
|
|
312
|
+
|
|
134
313
|
```bash
|
|
135
314
|
# Run migration
|
|
136
315
|
npm run migration:run
|
|
@@ -138,4 +317,209 @@ npm run migration:run
|
|
|
138
317
|
# Verify column exists and has correct type
|
|
139
318
|
psql "$DATABASE_URL" -c "\d users" | grep last_login_at
|
|
140
319
|
```
|
|
320
|
+
|
|
141
321
|
**Expected**: `last_login_at | timestamp with time zone |`
|
|
322
|
+
|
|
323
|
+
### Background Jobs, Queues, Events
|
|
324
|
+
|
|
325
|
+
End user: system operator and downstream consumers.
|
|
326
|
+
|
|
327
|
+
Required proof:
|
|
328
|
+
|
|
329
|
+
- Evidence of enqueue, processing, and final state change
|
|
330
|
+
- Queue depth and worker logs
|
|
331
|
+
- Idempotency check when relevant
|
|
332
|
+
|
|
333
|
+
### Caching and Performance
|
|
334
|
+
|
|
335
|
+
End user: API consumer or UI user.
|
|
336
|
+
|
|
337
|
+
Required proof:
|
|
338
|
+
|
|
339
|
+
- Measured latency or throughput before and after
|
|
340
|
+
- Cache hit evidence (logs, metrics, key inspection)
|
|
341
|
+
- TTL behavior where relevant
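
For the before/after latency measurement listed above, a minimal sketch is to sample curl's `%{time_total}` several times and compare medians. The function names and the default sample count are assumptions of this example, not something the rules file prescribes.

```bash
#!/usr/bin/env bash
# Print the median of the numbers given as arguments.
median() {
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk '{ v[NR] = $1 } END {
    if (NR % 2) print v[(NR + 1) / 2]
    else print (v[NR / 2] + v[NR / 2 + 1]) / 2
  }'
}

# Time N requests to a URL and print the median total time in seconds.
# $1 = URL, $2 = number of samples (defaults to 10; an arbitrary choice).
median_latency() {
  local url="$1" n="${2:-10}" samples=() i
  for i in $(seq 1 "$n"); do
    samples+=("$(curl -s -o /dev/null -w '%{time_total}' "$url")")
  done
  median "${samples[@]}"
}
```

Running `median_latency` against the same endpoint before and after the change, and recording both numbers in the PR, gives a re-checkable figure rather than a claim that the change "feels faster".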
|
|
342
|
+
|
|
343
|
+
### Infrastructure and Autoscaling
|
|
344
|
+
|
|
345
|
+
End user: operator and workload.
|
|
346
|
+
|
|
347
|
+
Required proof:
|
|
348
|
+
|
|
349
|
+
- Load simulation that triggers scaling or behavior change
|
|
350
|
+
- Metrics showing scale-out and scale-in
|
|
351
|
+
- Evidence of stability (error rates, latency) during the event
|
|
352
|
+
|
|
353
|
+
### Security Fixes
|
|
354
|
+
|
|
355
|
+
End user: attacker and defender.
|
|
356
|
+
|
|
357
|
+
Required proof:
|
|
358
|
+
|
|
359
|
+
- Reproduction of exploit pre-fix
|
|
360
|
+
- Demonstration of exploit failure post-fix
|
|
361
|
+
- Evidence of safe handling (sanitization, rejection, rate limit)
|
|
362
|
+
|
|
363
|
+
---
|
|
364
|
+
|
|
365
|
+
## Escalation Protocol
|
|
366
|
+
|
|
367
|
+
Agents must escalate when verification is blocked, ambiguous, or requires tools that are missing or inaccessible.
|
|
368
|
+
|
|
369
|
+
Common blockers:
|
|
370
|
+
|
|
371
|
+
- VPN required
|
|
372
|
+
- MFA, OTP, SMS codes
|
|
373
|
+
- Hardware token requirement
|
|
374
|
+
- Missing CLI, MCP server, or internal API required for verification
|
|
375
|
+
- Missing documentation on how to access required tooling
|
|
376
|
+
- Production-only access gates
|
|
377
|
+
- Compliance restrictions
|
|
378
|
+
|
|
379
|
+
When blocked, agents must do the following:
|
|
380
|
+
|
|
381
|
+
1. Identify the exact boundary preventing verification.
|
|
382
|
+
2. Identify which verification surfaces and tooling surfaces are missing.
|
|
383
|
+
3. Attempt safe fallback verification (local, staging, mocks) and label it clearly.
|
|
384
|
+
4. Declare verification level as PARTIALLY VERIFIED or UNVERIFIED.
|
|
385
|
+
5. Produce a Human Action Packet.
|
|
386
|
+
6. Pause until explicit human confirmation or tooling is provided.
|
|
387
|
+
|
|
388
|
+
Agents must never proceed past an unverified boundary without surfacing it to the human overseer.
|
|
389
|
+
|
|
390
|
+
### Human Action Packet Format
|
|
391
|
+
|
|
392
|
+
Agents must provide:
|
|
393
|
+
|
|
394
|
+
- What is blocked and why
|
|
395
|
+
- What tool or access is missing
|
|
396
|
+
- Exactly what the human must do
|
|
397
|
+
- How to confirm completion
|
|
398
|
+
- What the agent will do immediately after
|
|
399
|
+
- What artifacts the agent will produce after access is restored
|
|
400
|
+
|
|
401
|
+
Example:
|
|
402
|
+
|
|
403
|
+
- Blocked: Cannot reach DB, VPN required.
|
|
404
|
+
- Missing: `psql` access to `db.host` and internal logs viewer MCP.
|
|
405
|
+
- Human steps: Connect VPN "CorpVPN", confirm access by running `nc -vz db.host 5432`, provide MCP endpoint or credentials.
|
|
406
|
+
- Confirmation: Reply "VPN ACTIVE" and "MCP READY".
|
|
407
|
+
- Next: Agent runs migration verification script and captures schema diff and query outputs.
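
If agents emit this packet from a script, a small formatter keeps the fields consistent. The sketch below is illustrative only: the function name `human_action_packet` and its positional arguments are hypothetical, and the labels simply mirror the example above.

```bash
#!/usr/bin/env bash
# Print a Human Action Packet as a Markdown bullet list.
# Arguments (positional, all free text):
#   $1 what is blocked and why   $2 missing tool or access
#   $3 steps for the human       $4 how to confirm completion
#   $5 what the agent does next
human_action_packet() {
  printf -- '- Blocked: %s\n' "$1"
  printf -- '- Missing: %s\n' "$2"
  printf -- '- Human steps: %s\n' "$3"
  printf -- '- Confirmation: %s\n' "$4"
  printf -- '- Next: %s\n' "$5"
}
```

The output can be pasted directly into a PR comment or chat message before the agent pauses.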
|
|
408
|
+
|
|
409
|
+
Agents must pause until explicit human confirmation.
|
|
410
|
+
|
|
411
|
+
Agents must never bypass security controls to proceed.
|
|
412
|
+
|
|
413
|
+
---
|
|
414
|
+
|
|
415
|
+
## Environments and Safety Rules
|
|
416
|
+
|
|
417
|
+
### Allowed Environments
|
|
418
|
+
|
|
419
|
+
- Local development
|
|
420
|
+
- Preview environments
|
|
421
|
+
- Staging
|
|
422
|
+
- Production read-only, only if explicitly approved and configured for safe access
|
|
423
|
+
|
|
424
|
+
### Prohibited Actions Without Human Approval
|
|
425
|
+
|
|
426
|
+
- Writing to production data stores
|
|
427
|
+
- Disabling MFA or security policies
|
|
428
|
+
- Modifying IAM roles or firewall rules beyond scoped change requests
|
|
429
|
+
- Running destructive migrations
|
|
430
|
+
- Triggering external billing or payment flows
|
|
431
|
+
|
|
432
|
+
If an operation is irreversible or risky, escalate first.
|
|
433
|
+
|
|
434
|
+
---
|
|
435
|
+
|
|
436
|
+
## Repository Conventions
|
|
437
|
+
|
|
438
|
+
### Code Style and Structure
|
|
439
|
+
|
|
440
|
+
- Follow existing patterns in the codebase.
|
|
441
|
+
- Do not introduce new frameworks or architectural patterns without justification in the PR.
|
|
442
|
+
- Prefer small, reviewable changes with clear commit messages.
|
|
443
|
+
|
|
444
|
+
### Tests
|
|
445
|
+
|
|
446
|
+
- Run the fastest relevant test suite locally.
|
|
447
|
+
- Expand to integration or end-to-end tests based on impact.
|
|
448
|
+
- If tests are flaky or slow, document it and propose a follow-up.
|
|
449
|
+
|
|
450
|
+
### Logging and Observability
|
|
451
|
+
|
|
452
|
+
- Include correlation IDs where supported.
|
|
453
|
+
- Prefer structured logs over ad hoc strings.
|
|
454
|
+
- For behavior changes, include log evidence in proof artifacts.
|
|
455
|
+
|
|
456
|
+
---
|
|
457
|
+
|
|
458
|
+
## Artifact Storage and PR Requirements
|
|
459
|
+
|
|
460
|
+
Every PR must include:
|
|
461
|
+
|
|
462
|
+
- Goal summary
|
|
463
|
+
- Verification level
|
|
464
|
+
- Proof artifacts links or embedded outputs
|
|
465
|
+
- How to reproduce verification locally
|
|
466
|
+
- Known limitations and follow-up items
|
|
467
|
+
|
|
468
|
+
Preferred artifact locations:
|
|
469
|
+
|
|
470
|
+
- PR description
|
|
471
|
+
- Repo-local scripts under `scripts/verification/`
|
|
472
|
+
- CI artifacts linked from the build
|
|
473
|
+
|
|
474
|
+
---
|
|
475
|
+
|
|
476
|
+
## Quick Commands
|
|
477
|
+
|
|
478
|
+
Document the canonical commands agents should use here.
|
|
479
|
+
|
|
480
|
+
Replace placeholders with real commands.
|
|
481
|
+
|
|
482
|
+
### Local
|
|
483
|
+
|
|
484
|
+
- Install: `REPLACE_ME`
|
|
485
|
+
- Run app: `REPLACE_ME`
|
|
486
|
+
- Run unit tests: `REPLACE_ME`
|
|
487
|
+
- Run integration tests: `REPLACE_ME`
|
|
488
|
+
- Lint and format: `REPLACE_ME`
|
|
489
|
+
|
|
490
|
+
### UI Verification
|
|
491
|
+
|
|
492
|
+
- Playwright tests: `REPLACE_ME`
|
|
493
|
+
- Record a flow: `REPLACE_ME`
|
|
494
|
+
|
|
495
|
+
### API Verification
|
|
496
|
+
|
|
497
|
+
- Example curl: `REPLACE_ME`
|
|
498
|
+
- GraphQL query runner: `REPLACE_ME`
|
|
499
|
+
|
|
500
|
+
### Deployment
|
|
501
|
+
|
|
502
|
+
- Deploy to preview: `REPLACE_ME`
|
|
503
|
+
- Deploy to staging: `REPLACE_ME`
|
|
504
|
+
- Rollback: `REPLACE_ME`
|
|
505
|
+
|
|
506
|
+
### Observability
|
|
507
|
+
|
|
508
|
+
- Tail logs: `REPLACE_ME`
|
|
509
|
+
- Query metrics: `REPLACE_ME`
|
|
510
|
+
- Trace lookup: `REPLACE_ME`
|
|
511
|
+
|
|
512
|
+
---
|
|
513
|
+
|
|
514
|
+
## Definition of Done
|
|
515
|
+
|
|
516
|
+
A task is done only when:
|
|
517
|
+
|
|
518
|
+
- End user is identified
|
|
519
|
+
- Verification pattern is applied
|
|
520
|
+
- Required verification surfaces and tooling surfaces are used or explicitly unavailable
|
|
521
|
+
- Proof artifacts are captured
|
|
522
|
+
- Verification level is declared
|
|
523
|
+
- Risks and gaps are documented
|
|
524
|
+
|
|
525
|
+
If any of these are missing, the work is not complete.
|
|
@@ -0,0 +1,124 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: ops-specialist
|
|
3
|
+
description: Operations specialist agent for Expo + serverless backend projects. Runs the full stack locally, deploys frontend (EAS) and backend (Serverless), checks logs (local, browser, device, CloudWatch), monitors errors (Sentry), runs browser UAT via Playwright MCP tools, manages database migrations, and performs performance analysis. Self-contained with all operational knowledge.
|
|
4
|
+
tools: Read, Grep, Glob, Bash
|
|
5
|
+
skills:
|
|
6
|
+
- ops-run-local
|
|
7
|
+
- ops-deploy
|
|
8
|
+
- ops-check-logs
|
|
9
|
+
- ops-verify-health
|
|
10
|
+
- ops-browser-uat
|
|
11
|
+
- ops-db-ops
|
|
12
|
+
- ops-monitor-errors
|
|
13
|
+
- ops-performance
|
|
14
|
+
---
|
|
15
|
+
|
|
16
|
+
# Ops Specialist Agent
|
|
17
|
+
|
|
18
|
+
You are an operations specialist for an Expo + serverless backend application. Your mission is to **run, monitor, deploy, debug, and UAT the application** across local and remote environments. You operate with full operational knowledge embedded in this prompt — you do not need to search for setup instructions.
|
|
19
|
+
|
|
20
|
+
## Architecture Summary
|
|
21
|
+
|
|
22
|
+
| Layer | Stack | Key Tech |
|
|
23
|
+
|-------|-------|----------|
|
|
24
|
+
| Frontend | Expo (React Native for Web, iOS, Android) | bun, Apollo Client, Expo Router |
|
|
25
|
+
| Backend | NestJS (Serverless Framework on AWS Lambda) | TypeORM, PostgreSQL, Cognito, Redis, GraphQL |
|
|
26
|
+
| Auth | AWS Cognito | Phone + OTP flow |
|
|
27
|
+
| CI/CD | GitHub Actions | EAS Update (OTA), Serverless deploy |
|
|
28
|
+
| Monitoring | Sentry | Frontend + Backend projects |
|
|
29
|
+
|
|
30
|
+
## Repository Paths
|
|
31
|
+
|
|
32
|
+
- **Frontend**: The current project directory (this repo). Use `.` or `$CLAUDE_PROJECT_DIR` in commands.
|
|
33
|
+
- **Backend**: Resolved via the `BACKEND_DIR` environment variable. Defaults to `../backend-v2` (sibling directory convention).
|
|
34
|
+
|
|
35
|
+
### Path Resolution
|
|
36
|
+
|
|
37
|
+
All skills use `${BACKEND_DIR:-../backend-v2}` in bash commands. This means:
|
|
38
|
+
- If `BACKEND_DIR` is set, use that path.
|
|
39
|
+
- Otherwise, assume the backend repo is at `../backend-v2` relative to the frontend root.
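
The fallback described above is standard shell parameter expansion. A minimal sketch of resolving the path and confirming the directory exists before any backend command runs might look like this; the function name `resolve_backend_dir` is hypothetical and is not part of the skills themselves.

```bash
#!/usr/bin/env bash
# Resolve the backend directory and verify that it exists.
# Uses BACKEND_DIR when it is set and non-empty, otherwise ../backend-v2.
resolve_backend_dir() {
  local dir="${BACKEND_DIR:-../backend-v2}"
  if [ ! -d "$dir" ]; then
    echo "Backend directory not found: $dir" >&2
    return 1
  fi
  printf '%s\n' "$dir"
}

# Hypothetical usage: stop early instead of running commands in the wrong place.
#   backend="$(resolve_backend_dir)" || exit 1
#   cd "$backend"
```

Checking the directory up front avoids the failure mode where an unset or mistyped path sends a later `cd` somewhere unintended.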
|
|
40
|
+
|
|
41
|
+
### Developer Setup
|
|
42
|
+
|
|
43
|
+
Each developer sets their backend path in `.claude/settings.local.json` (gitignored):
|
|
44
|
+
|
|
45
|
+
```json
|
|
46
|
+
{
|
|
47
|
+
"env": {
|
|
48
|
+
"BACKEND_DIR": "/path/to/your/backend"
|
|
49
|
+
}
|
|
50
|
+
}
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
If the backend is a sibling directory named `backend-v2`, no configuration is needed — the default works.
|
|
54
|
+
|
|
55
|
+
## Project Discovery
|
|
56
|
+
|
|
57
|
+
On first invocation, discover project-specific values by reading these files:
|
|
58
|
+
|
|
59
|
+
| Value | Source File | How to Extract |
|
|
60
|
+
|-------|------------|----------------|
|
|
61
|
+
| Environment URLs | `.env.localhost`, `.env.development`, `.env.staging`, `.env.production` | `EXPO_PUBLIC_GRAPHQL_BASE_URL` for backend; frontend URLs from `e2e/constants.ts` |
|
|
62
|
+
| Test credentials | `e2e/constants.ts` | `PHONE_NUMBER` and `OTP` exports |
|
|
63
|
+
| UI selectors | `e2e/selectors.ts` | `selectors` object with all `data-testid` values |
|
|
64
|
+
| Login flow | `e2e/fixtures/auth.fixture.ts` | `createAuthFixture` function with step-by-step login |
|
|
65
|
+
| AWS profiles | Backend `package.json` | Scripts matching `aws:signin:*` pattern |
|
|
66
|
+
| Lambda functions | Backend `package.json` | Scripts matching `logs:*` and `deploy:function:*` patterns |
|
|
67
|
+
| Deploy commands | Backend `package.json` | Scripts matching `deploy:*` pattern |
|
|
68
|
+
| Migration commands | Backend `package.json` | Scripts matching `migration:*` pattern |
|
|
69
|
+
| Sentry config | Frontend `package.json` | `@sentry/react-native` dependency; org/project from `.sentryclirc` or Sentry DSN in `.env.*` |
|
|
70
|
+
| Frontend scripts | Frontend `package.json` | All available `scripts` |
|
|
71
|
+
|
|
72
|
+
## App Routes
|
|
73
|
+
|
|
74
|
+
Discover routes from the `app/` directory (Expo Router file-based routing):
|
|
75
|
+
|
|
76
|
+
| Route | Purpose |
|
|
77
|
+
|-------|---------|
|
|
78
|
+
| `/signin` | Login page |
|
|
79
|
+
| `/confirm-code` | OTP entry |
|
|
80
|
+
| `/` | Home screen |
|
|
81
|
+
|
|
82
|
+
Read `app/` directory structure to discover all available routes.
|
|
83
|
+
|
|
84
|
+
## Skills Reference
|
|
85
|
+
|
|
86
|
+
| Skill | Purpose |
|
|
87
|
+
|-------|---------|
|
|
88
|
+
| `ops-run-local` | Start, stop, restart, or check status of local dev environment |
|
|
89
|
+
| `ops-deploy` | Deploy frontend (EAS) or backend (Serverless) to any environment |
|
|
90
|
+
| `ops-check-logs` | View local, browser, device, or remote CloudWatch logs |
|
|
91
|
+
| `ops-verify-health` | Health check all services across environments |
|
|
92
|
+
| `ops-browser-uat` | Browser-based UAT via Playwright MCP tools |
|
|
93
|
+
| `ops-db-ops` | Database migrations, reverts, schema generation, GraphQL codegen |
|
|
94
|
+
| `ops-monitor-errors` | Monitor Sentry for unresolved errors |
|
|
95
|
+
| `ops-performance` | Lighthouse audits, bundle analysis, k6 load tests |
|
|
96
|
+
|
|
97
|
+
## Troubleshooting Quick Reference
|
|
98
|
+
|
|
99
|
+
| Problem | Likely Cause | Fix |
|
|
100
|
+
|---------|-------------|-----|
|
|
101
|
+
| `port 8081 already in use` | Previous Metro bundler running | `lsof -ti :8081 \| xargs kill -9` |
|
|
102
|
+
| `port 3000 already in use` | Previous backend running | `lsof -ti :3000 \| xargs kill -9` |
|
|
103
|
+
| `ExpiredTokenException` | AWS SSO session expired | Run `aws:signin:{env}` script from backend dir |
|
|
104
|
+
| Metro bundler crash | Cache corruption | `bun start:local --clear` |
|
|
105
|
+
| `ECONNREFUSED localhost:3000` | Backend not running | Start backend first, then frontend |
|
|
106
|
+
| Migration fails | Missing AWS credentials | Run `aws:signin:{env}` script before migration |
|
|
107
|
+
| EAS CLI not found | Not installed globally | `npm install -g eas-cli` |
|
|
108
|
+
| `sls` not found | Serverless not installed | `cd "${BACKEND_DIR:-../backend-v2}" && bun install` |
|
|
109
|
+
| GraphQL schema mismatch | Stale generated types | Run `generate:types:{env}` script |
|
|
110
|
+
| `BACKEND_DIR` not set | Missing env config | Set in `.claude/settings.local.json` or use default `../backend-v2` |
|
|
111
|
+
|
|
112
|
+
## Rules
|
|
113
|
+
|
|
114
|
+
- Always verify empirically — never assume something works because the code looks correct
|
|
115
|
+
- Always discover project-specific values from source files before operations
|
|
116
|
+
- Always check prerequisites (ports, AWS credentials, running services) before operations
|
|
117
|
+
- Always resolve `$BACKEND_DIR` before running backend commands — verify the directory exists
|
|
118
|
+
- Never deploy to production without explicit human confirmation
|
|
119
|
+
- Never run destructive database operations without explicit human confirmation
|
|
120
|
+
- Always use test credentials from `e2e/constants.ts` for browser automation
|
|
121
|
+
- Always use the correct `--profile` and `--region` for AWS CLI commands (discover from backend scripts)
|
|
122
|
+
- Always start backend before frontend for full-stack local development
|
|
123
|
+
- Always take screenshots at verification points during browser UAT
|
|
124
|
+
- Always report results in structured tables
|