antpath 0.1.0 → 0.1.1
- package/README.md +66 -67
- package/dist/credentials.js +34 -5
- package/dist/credentials.js.map +1 -1
- package/dist/files/downloader.js +8 -0
- package/dist/files/downloader.js.map +1 -1
- package/dist/index.d.ts +5 -1
- package/dist/index.js +2 -0
- package/dist/index.js.map +1 -1
- package/dist/platform/client.d.ts +73 -0
- package/dist/platform/client.js +107 -0
- package/dist/platform/client.js.map +1 -0
- package/dist/platform/index.d.ts +1 -0
- package/dist/platform/index.js +2 -0
- package/dist/platform/index.js.map +1 -0
- package/dist/providers/anthropic/provider.d.ts +6 -0
- package/dist/providers/anthropic/provider.js +90 -12
- package/dist/providers/anthropic/provider.js.map +1 -1
- package/dist/utils/paths.js +9 -3
- package/dist/utils/paths.js.map +1 -1
- package/docs/cleanup.md +15 -15
- package/docs/credentials.md +23 -23
- package/docs/mcp.md +18 -18
- package/docs/outputs.md +16 -16
- package/docs/quickstart.md +13 -13
- package/docs/release.md +22 -22
- package/docs/skills.md +16 -16
- package/docs/templates.md +24 -24
- package/docs/testing.md +26 -27
- package/examples/mcp-static-bearer.ts +30 -30
- package/examples/quickstart.ts +23 -23
- package/package.json +46 -51
- package/references/architecture-decisions.md +427 -203
- package/references/implementation-plan.md +430 -527
- package/references/research-sources.md +41 -30
- package/references/testing-strategy.md +29 -108
@@ -1,527 +1,430 @@
----
-title: antpath implementation plan
-status:
-scope:
----
-
-# antpath implementation plan
-
-## Goal
-
[removed lines 11 to 431: text not recoverable from this page; surviving fragments are "- provider" (line 193), "## Phase" (line 223), and "Deliverables:" (line 227)]
-```text
-npm run test:unit
-```
-
-### Layer 2: Component integration tests
-
-Purpose: verify interactions between SDK components using fake providers, fake clocks, and local fixtures.
-
-Coverage:
-
-- Client + Template compiler + credential validator;
-- RunController + fake provider event stream;
-- queued message scheduling;
-- output downloader + fake file service;
-- cleanup manager + fake provider resources;
-- logger/event hooks with redaction.
-
-Command:
-
-```text
-npm run test:integration:components
-```
-
-### Layer 3: Recorded API integration tests
-
-Purpose: verify provider adapter behavior against sanitized responses captured from real Claude API calls, without requiring network access during normal CI.
-
-Requirements:
-
-- Real API responses are captured by explicit scripts.
-- Recordings are sanitized before being committed.
-- Secret values, request headers, API keys, bearer tokens, OAuth tokens, and raw sensitive prompts are never stored.
-- Fixtures are deterministic and versioned by provider API/beta header.
-- Tests fail if fixture sanitization leaves secret-shaped values.
-
-Commands:
-
-```text
-npm run fixtures:record:anthropic
-npm run fixtures:sanitize
-npm run test:integration:api
-```
-
-### Layer 4: Live e2e tests
-
-Purpose: verify the full real Claude Managed Agents lifecycle using a local credential.
-
-Coverage:
-
-- create Environment;
-- create Agent;
-- create Vault/Credential when needed;
-- create Session;
-- send queued messages;
-- observe idle completion;
-- list/download session-scoped files;
-- retrieve usage/status where available;
-- terminate if needed;
-- cleanup if configured or explicitly requested.
-
-Rules:
-
-- Live tests are never part of default CI.
-- Live tests require `.env.local` with `ANTHROPIC_API_KEY`.
-- Live tests must use low-cost prompts, strict timeouts, and cleanup guards.
-- Live tests must not print the key or provider auth headers.
-
-Command:
-
-```text
-npm run test:e2e:live
-```
-
-Key invariants:
-
-- no provider call before Template and credentials are fully parsed;
-- no secret value appears in logs, errors, Template snapshots, result objects, or manifests;
-- no secret value appears in recorded API fixtures;
-- every created provider resource is tracked for cleanup;
-- message queue sends each message at most once;
-- cleanup is manual by default and retryable;
-- output download never writes outside the requested local directory.
-
-## Backlog plan
-
-After MVP:
-
-1. Add cloud metadata sync and dashboard.
-2. Add encrypted run-scoped key support for guaranteed cleanup.
-3. Add OpenAI adapter.
-4. Add provider Environment caching by Template hash.
-5. Add cost/token/iteration caps.
-6. Add OAuth refresh credentials.
-7. Add arbitrary MCP headers through an antpath MCP proxy.
-8. Add Template registry and sharing.
-9. Add artifact retention service.
+---
+title: antpath implementation plan
+status: accepted
+scope: platform MVP
+---
+
+# antpath implementation plan
+
+## Goal
+
+Convert antpath from an SDK-only package into a TypeScript workspace containing:
+
+- a platform SDK in `packages/sdk`;
+- a dashboard app in `apps/dashboard`;
+- a worker service in `apps/worker`;
+- shared packages for types, schema, configuration, redaction, and database helpers as needed.
+
+The platform must submit, dispatch, observe, store metadata for, capture outputs from, and clean up Claude Managed Agents runs while preserving tenant isolation and secret boundaries.
+
+Implementation is test-driven. Each behavior starts with the narrowest failing test that proves the desired invariant.
+
+## Acceptance criteria
+
+- The repository is an npm workspace and the current SDK is moved to `packages/sdk`.
+- Dashboard and worker app folders exist with build/test integration.
+- Auth.js authenticates dashboard users.
+- SDK API tokens authenticate programmatic clients.
+- Workspaces are the tenant boundary.
+- BFF/server actions scope every dashboard/API operation by workspace membership or API-token scope.
+- Supabase service-role credentials are never exposed to browser/client bundles.
+- Supabase Postgres stores durable run, attempt, provider resource, event, output, cleanup, usage, workspace, membership, and provider connection metadata.
+- Provider keys are stored encrypted through Supabase Vault and resolved only by trusted server/worker code.
+- Run submission idempotency is enforced by workspace and request hash.
+- Workers claim due runs with leases and `FOR UPDATE SKIP LOCKED`.
+- Multiple workers can run concurrently without duplicate lifecycle ownership.
+- Worker polling recovers from missed `NOTIFY`, worker restart, deploy, and expired leases.
+- Worker creates per-run provider resources and journals them for cleanup.
+- Worker polls provider status and events and stores only redacted metadata.
+- Output capture writes configured files to private Supabase Storage within per-file, per-run, and workspace quotas.
+- BFF returns signed output links only after workspace authorization.
+- Cleanup runs after terminal provider state and output capture.
+- Reconciliation can recover intended provider resources after partial worker crashes.
+- User deletion is pending/soft until cleanup and storage deletion are complete.
+- Exact tier/cap values are configurable through environment variables with conservative defaults and run-level snapshots.
+- Tests follow the accepted taxonomy: deterministic unit tests (including fakes and sanitized recorded snapshots), live external-system integration tests with no skip flags, and top-to-bottom live e2e tests.
+
+## Phase 1: Workspace foundation
+
+Create the workspace structure without changing runtime behavior.
+
+Deliverables:
+
+- Root npm workspace configuration.
+- Current SDK moved to `packages/sdk`.
+- `apps/dashboard` placeholder.
+- `apps/worker` placeholder.
+- Shared TypeScript config/build/test commands.
+- Existing SDK exports preserved or intentionally migrated with compatibility notes.
+- Root validation commands:
+  - `npm run lint`
+  - `npm test`
+  - `npm run build`
+
+TDD gate:
+
+- Add tests or CI command checks proving the moved SDK still builds/tests from the root workspace.
+
+Validation:
+
+- Clean install works from root.
+- Existing SDK tests still pass from root and package scope.
+- Build emits SDK declarations.
+
+## Phase 2: Shared contracts and configuration
+
+Define shared platform contracts before implementing dashboard/worker behavior.
+
+Deliverables:
+
+- Shared run status and cleanup status types.
+- Shared error taxonomy.
+- Shared redaction helpers and secret wrapper.
+- Shared Template/platform submission request schemas.
+- Environment config parser with required-secret validation and conservative defaults.
+- Plan/cap config defaults for:
+  - max run duration;
+  - workspace/user/token concurrency;
+  - polling intervals and jitter;
+  - provider token-bucket rates;
+  - retry backoffs;
+  - lease duration/renewal threshold;
+  - max attempts;
+  - cleanup retries;
+  - output caps;
+  - storage caps;
+  - signed-link TTL;
+  - free user allowance.
+
+TDD gate:
+
+- Unit tests for env parsing, missing required env failure, optional fallback defaults, cap snapshot values, redaction, and status transitions.
+
+Validation:
+
+- Missing required secrets/connectivity config fails service startup.
+- Missing optional config falls back to conservative low limits.
+
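Note on Phase 2: the two validation rules above (required config fails startup, optional config falls back to low limits) can be illustrated with a minimal sketch. The variable names (`ANTPATH_DATABASE_URL` and the rest), the chosen fields, and the default values are all assumptions made for illustration; the plan does not specify them.

```typescript
// Hypothetical config shape; field names and defaults are illustrative only.
type Env = Record<string, string | undefined>;

interface PlatformConfig {
  databaseUrl: string; // required: parsing fails without it
  maxRunDurationMs: number; // optional: conservative default
  maxAttempts: number; // optional: conservative default
  signedLinkTtlSeconds: number; // optional: conservative default
}

class ConfigError extends Error {}

// Returns a required variable or throws. The message names the variable
// but never echoes a value, so secrets cannot leak through errors.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new ConfigError(`missing required config: ${name}`);
  }
  return value;
}

// Returns a positive integer, using the fallback when the variable is unset.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new ConfigError(`invalid value for ${name}`);
  }
  return parsed;
}

function parseConfig(env: Env): PlatformConfig {
  return {
    databaseUrl: requireVar(env, "ANTPATH_DATABASE_URL"),
    maxRunDurationMs: positiveInt(env, "ANTPATH_MAX_RUN_DURATION_MS", 300000),
    maxAttempts: positiveInt(env, "ANTPATH_MAX_ATTEMPTS", 3),
    signedLinkTtlSeconds: positiveInt(env, "ANTPATH_SIGNED_LINK_TTL_SECONDS", 60),
  };
}
```

A service would call `parseConfig` once at startup and let the thrown `ConfigError` abort the process; taking the environment as a plain object keeps the parser easy to unit test.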
+## Phase 3: Database foundation
+
+Create the durable source of truth.
+
+Deliverables:
+
+- Migration framework.
+- Tables:
+  - `users`;
+  - `workspaces`;
+  - `workspace_memberships`;
+  - `api_tokens`;
+  - `provider_connections`;
+  - `runs`;
+  - `run_attempts`;
+  - `provider_resources`;
+  - `run_events`;
+  - `output_objects`;
+  - `cleanup_attempts`;
+  - `usage_ledger`.
+- Constraints:
+  - unique `(workspace_id, idempotency_key)` on runs;
+  - request-hash conflict handling;
+  - unique `(run_attempt_id, provider_event_id)` on events;
+  - foreign keys for workspace and attributed user where practical.
+- DB query helpers for tenant-scoped access.
+
+TDD gate:
+
+- Add failing database/security integration tests for migrations, idempotency, lease claim behavior, event dedupe, usage ledger idempotency, and cross-workspace access denial.
+
+Validation:
+
+- Migrations apply from a clean database.
+- Concurrent claims use `FOR UPDATE SKIP LOCKED`.
+- Event and usage ledger writes are transactional.
+
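Note on Phase 3: one plausible reading of the unique `(workspace_id, idempotency_key)` constraint plus "request-hash conflict handling" is sketched below with an in-memory store. The interpretation (same key and same hash replays the existing run, same key and different hash is a conflict) and every type and method name are assumptions; a real implementation would rely on the database constraint instead of a `Map`.

```typescript
// In-memory model of idempotent run submission; all names are hypothetical.
interface RunRecord {
  id: string;
  workspaceId: string;
  idempotencyKey: string;
  requestHash: string;
}

type SubmitResult =
  | { kind: "created"; run: RunRecord }
  | { kind: "replayed"; run: RunRecord }
  | { kind: "conflict"; existingRunId: string };

class InMemoryRunStore {
  private runs = new Map<string, RunRecord>();
  private nextId = 1;

  submit(workspaceId: string, idempotencyKey: string, requestHash: string): SubmitResult {
    // The composite key mirrors the unique (workspace_id, idempotency_key) constraint.
    const key = JSON.stringify([workspaceId, idempotencyKey]);
    const existing = this.runs.get(key);
    if (existing === undefined) {
      const run: RunRecord = { id: `run_${this.nextId++}`, workspaceId, idempotencyKey, requestHash };
      this.runs.set(key, run);
      return { kind: "created", run };
    }
    // Same key and same payload: return the original run instead of a duplicate.
    if (existing.requestHash === requestHash) return { kind: "replayed", run: existing };
    // Same key but a different payload: refuse rather than silently overwrite.
    return { kind: "conflict", existingRunId: existing.id };
  }
}
```

Scoping the key by workspace means two tenants can reuse the same idempotency key without seeing each other's runs.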
+## Phase 4: Auth, BFF, and SDK API-token access
+
+Establish the authorization boundary.
+
+Deliverables:
+
+- Auth.js dashboard authentication.
+- User mirror or Auth.js adapter integration.
+- Workspace membership resolution.
+- Workspace switch/active workspace model.
+- Hashed SDK API tokens with scopes, creator, revocation, and last-used tracking.
+- Shared authorization helper for Auth.js sessions and API tokens.
+- BFF/server-action routes for run submission/read/update operations.
+- Browser/client bundle boundary preventing service-role import.
+
+TDD gate:
+
+- Add failing tests for membership-scoped queries, cross-workspace denial, API-token scope enforcement, revoked token rejection, attributed user freezing, and no browser service-role imports.
+
+Validation:
+
+- Dashboard reads and mutates only authorized workspace data.
+- SDK token cannot access another workspace.
+- Service-role credentials are server/worker only.
+
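Note on Phase 4: the token checks named in the TDD gate (revoked token rejection, cross-workspace denial, scope enforcement) could be combined in one small helper. This is a sketch under assumptions: the `ApiToken` shape, the scope strings such as `runs:read`, and the order of checks are invented for illustration, and token hashing and lookup are out of scope here.

```typescript
// Hypothetical token record as it might look after lookup by token hash.
interface ApiToken {
  workspaceId: string;
  scopes: string[];
  revokedAt: Date | null;
}

type AuthDecision =
  | { allowed: true }
  | { allowed: false; reason: "revoked" | "wrong_workspace" | "missing_scope" };

// Denies by default: every check must pass before access is granted.
function authorizeToken(token: ApiToken, workspaceId: string, requiredScope: string): AuthDecision {
  if (token.revokedAt !== null) return { allowed: false, reason: "revoked" };
  if (token.workspaceId !== workspaceId) return { allowed: false, reason: "wrong_workspace" };
  if (token.scopes.indexOf(requiredScope) === -1) return { allowed: false, reason: "missing_scope" };
  return { allowed: true };
}
```

Returning a reason code instead of a bare boolean lets the BFF log a redacted denial cause without exposing the token itself.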
+## Phase 5: Platform SDK client
+
+Turn the SDK into the programmatic client for the platform while preserving Template ergonomics where practical.
+
+Deliverables:
+
+- SDK client for platform API base URL and API token.
+- Submit run API.
+- Get run status/detail API.
+- List metadata events API.
+- List outputs API.
+- Create signed output link API.
+- Cancel run API.
+- Delete run API.
+- Typed errors for auth, quota, validation, conflict, not found, and provider/platform failures.
+- Compatibility path from existing Template definitions to platform submission requests.
+
+TDD gate:
+
+- Type/contract tests for public SDK API and runtime tests with fake platform responses.
+
+Validation:
+
+- SDK never accepts or stores provider keys for platform runs.
+- SDK handles idempotency conflict, unauthorized, quota, and terminal states deterministically.
+
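Note on Phase 5: the "typed errors" deliverable could be realized by mapping platform HTTP responses onto a small closed set of error kinds. The sketch below is an assumption throughout: the plan does not define the class name, the kind strings, or which HTTP status corresponds to which kind.

```typescript
// Hypothetical error taxonomy for the platform SDK client.
type PlatformErrorKind = "auth" | "quota" | "validation" | "conflict" | "not_found" | "platform";

class PlatformError extends Error {
  constructor(
    readonly kind: PlatformErrorKind,
    readonly status: number,
    message: string,
  ) {
    super(message);
  }
}

// Maps an HTTP status to an error kind. The mapping is an assumption:
// the platform API's actual status codes are not stated in the plan.
function errorFromStatus(status: number, message: string): PlatformError {
  switch (status) {
    case 400:
    case 422:
      return new PlatformError("validation", status, message);
    case 401:
    case 403:
      return new PlatformError("auth", status, message);
    case 404:
      return new PlatformError("not_found", status, message);
    case 409:
      return new PlatformError("conflict", status, message);
    case 429:
      return new PlatformError("quota", status, message);
    default:
      return new PlatformError("platform", status, message);
  }
}
```

Callers can then branch on `error.kind` rather than parsing messages, which keeps handling of idempotency conflicts and quota errors deterministic.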
+## Phase 6: Worker claim loop and state machine
+
+Implement durable lifecycle ownership independent of provider details.
+
+Deliverables:
+
+- Polling loop for due runs.
+- Optional Postgres `NOTIFY` listener for fast wakeup.
+- Lease claim/release helpers.
+- Lease-guarded status update helper.
+- Per-workspace/provider-key rate limit hooks.
+- Fair due-run ordering across workspaces.
+- Cancellation/delete request checks before side effects.
+- Timeout handling.
+- Error classification.
+- Retry/backoff scheduling through `next_check_at`.
+
+TDD gate:
+
+- Add failing component and database tests for concurrent fake workers, expired lease reclaim, lease-token update failures, cancellation races, timeout races, and polling fallback after missed `NOTIFY`.
+
+Validation:
+
+- Multiple workers do not process the same run step.
+- Expired leases recover.
+- Worker restart leaves no required in-memory state.
+
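Note on Phase 6: the lease rules (claim only due runs with no live lease, and accept status updates only from the current lease holder) can be modeled in memory as below. This is a sketch of the rules only, not of the locking: in Postgres the claim would be a single statement using `FOR UPDATE SKIP LOCKED`, which this model does not reproduce. The row fields other than `next_check_at`, and both function names, are assumptions.

```typescript
// In-memory stand-in for a runs row; field names are hypothetical.
interface RunRow {
  id: string;
  status: string;
  nextCheckAt: number; // epoch ms; the run is due when this is <= now
  leaseToken: string | null;
  leaseExpiresAt: number; // epoch ms; ignored while leaseToken is null
}

// Claims the first due run whose lease is absent or expired.
function claimDueRun(rows: RunRow[], now: number, leaseMs: number, token: string): RunRow | null {
  for (const row of rows) {
    const leaseFree = row.leaseToken === null || row.leaseExpiresAt <= now;
    if (row.nextCheckAt <= now && leaseFree) {
      row.leaseToken = token;
      row.leaseExpiresAt = now + leaseMs;
      return row;
    }
  }
  return null;
}

// Applies a status change only while the caller still holds a live lease,
// so a worker that lost its lease cannot overwrite the new owner's state.
function updateIfLeaseHeld(row: RunRow, token: string, now: number, status: string): boolean {
  if (row.leaseToken !== token || row.leaseExpiresAt <= now) return false;
  row.status = status;
  return true;
}
```

Passing `now` explicitly instead of reading a clock keeps the lease logic testable with fake time, which suits the expired-lease and race tests listed in the TDD gate.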
+## Phase 7: Fake provider lifecycle harness
+
+Prove worker behavior with deterministic provider boundaries before live provider integration.
+
+Deliverables:
+
+- Fake provider implementing create resources, create session, send event, retrieve status, list events, list files, download file, and cleanup.
+- Fake storage adapter.
+- Fake Vault adapter.
+- Table-driven lifecycle tests.
+
+TDD gate:
+
+- Add component tests for happy path, provider errors, terminal states, duplicate events, output capture, cleanup failures, and retryable failures.
+
+Validation:
+
+- Core run lifecycle is correct without network calls.
+- Duplicate/replayed provider events do not double-count usage.
+
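Note on Phase 7: the validation rule that replayed provider events must not double-count usage follows from deduplicating on `(run_attempt_id, provider_event_id)`, the same pair the Phase 3 unique constraint names. A minimal in-memory sketch follows; the `tokensUsed` field and the class and method names are hypothetical, and a real ledger would write the event and the usage row in one transaction.

```typescript
// Hypothetical event shape; only the two id fields come from the plan.
interface ProviderEvent {
  runAttemptId: string;
  providerEventId: string;
  tokensUsed: number;
}

class UsageLedger {
  private seen = new Set<string>();
  private total = 0;

  // Records the event once; returns false for a duplicate or replay.
  record(event: ProviderEvent): boolean {
    const key = JSON.stringify([event.runAttemptId, event.providerEventId]);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.total += event.tokensUsed;
    return true;
  }

  totalTokens(): number {
    return this.total;
  }
}
```

Because the dedupe key includes the attempt id, a retried attempt that receives the same provider event id is still counted separately.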
+## Phase 8: Claude provider adapter
+
+Adapt existing Claude Managed Agents provider code for the platform worker.
+
+Deliverables:
+
+- Provider client wrapper for worker.
+- Create Environment per run.
+- Upload skills/resources as needed.
+- Create Agent with model/system/MCP/skills/tool policy.
+- Create provider Vault/Credentials for MCP credentials.
+- Create Session.
+- Send initial user event.
+- Retrieve session status.
+- List session events with cursor/filter where available.
+- List/download session files.
+- Cleanup/archive/delete resources.
+- Provider metadata naming/tagging for reconciliation.
+- Provider error classification.
+
+TDD gate:
+
+- Add sanitized recorded provider snapshot unit tests for parsing and cleanup behavior before live e2e.
+- Verify exact Claude events pagination/filter semantics and document bounded fallback if needed.
+
+Validation:
+
+- No approval-required tool policy reaches the provider.
+- Provider IDs are persisted for cleanup.
+- Sanitized fixtures contain no secrets.
+
+## Phase 9: Provider resource journaling and reconciliation sweeper
+
+Close resource leak windows.
+
+Deliverables:
+
+- Pre-insert intended `provider_resources` rows before provider side effects where possible.
+- Deterministic provider names/metadata with antpath workspace/run/attempt identifiers.
+- Sweeper for expired leases and unfinished intended resources.
+- Orphan matching by provider list/search APIs where available.
+- Reschedule recoverable runs.
+- Cleanup orphaned resources.
+
+TDD gate:
+
+- Add tests that simulate worker crashes after provider create succeeds but before provider id persistence.
+
+Validation:
+
+- Sweeper can attach or cleanup recoverable resources.
+- Cleanup remains idempotent after partial failures.
+
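Note on Phase 9: deterministic provider names let the sweeper match an orphaned provider resource back to its run even when the worker crashed before persisting the provider id. The sketch below assumes a colon-separated format with an `antpath` prefix; the format, the allowed character set, and the function names are illustrative and not taken from the plan.

```typescript
// Hypothetical identifiers embedded in a provider resource name.
interface ResourceRef {
  workspaceId: string;
  runId: string;
  attemptId: string;
  kind: string; // e.g. "agent" or "environment"
}

const NAME_PREFIX = "antpath";

// Builds the same name for the same inputs, so a retried create is recognizable.
function providerResourceName(ref: ResourceRef): string {
  const parts = [ref.workspaceId, ref.runId, ref.attemptId, ref.kind];
  for (const part of parts) {
    // Reject the separator and anything else that would make parsing ambiguous.
    if (!/^[A-Za-z0-9_-]+$/.test(part)) {
      throw new Error("identifier contains unsupported characters");
    }
  }
  return [NAME_PREFIX, ...parts].join(":");
}

// Recovers the identifiers from a listed provider resource, or null if the
// name was not produced by providerResourceName.
function parseProviderResourceName(name: string): ResourceRef | null {
  const parts = name.split(":");
  if (parts.length !== 5 || parts[0] !== NAME_PREFIX) return null;
  return { workspaceId: parts[1], runId: parts[2], attemptId: parts[3], kind: parts[4] };
}
```

During reconciliation, a sweeper could list provider resources, parse each name, and compare the result against intended `provider_resources` rows that never received a provider id.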
+## Phase 10: Output capture and Supabase Storage
+
+Capture configured outputs safely.
+
+Deliverables:
+
+- Output policy from Template/run request.
+- Provider file metadata inspection.
+- Per-file and per-run caps.
+- Streaming download with hard byte cap and abort.
+- Private Supabase Storage upload.
+- Deterministic plus unguessable storage path policy if required.
+- `output_objects` metadata.
+- Workspace storage quota accounting.
+- Per-user attribution from run row.
+- Signed-link BFF action/API.
+
+TDD gate:
+
+- Add failing tests for quota checks before download, unknown-size streaming caps, storage metadata, signed-link authorization, and cross-workspace denial.
+
+Validation:
+
+- Oversized outputs do not OOM the worker.
+- BFF only creates signed links for authorized workspace users/tokens.
+
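Note on Phase 10: "streaming download with hard byte cap and abort" matters most when the provider does not report a file size up front. The sketch below counts bytes chunk by chunk and aborts as soon as the cap would be exceeded, so memory stays bounded by the chunk size. It is deliberately synchronous and uses a hypothetical pull-style `readChunk` callback to stay self-contained; a real worker would do the same accounting over an asynchronous stream.

```typescript
class ByteCapExceeded extends Error {}

// Tracks bytes consumed against a hard cap.
class ByteBudget {
  private used = 0;
  constructor(private readonly capBytes: number) {}

  // Accounts for one chunk; throws before the caller forwards it.
  consume(chunkBytes: number): void {
    if (this.used + chunkBytes > this.capBytes) {
      throw new ByteCapExceeded(`output exceeds cap of ${this.capBytes} bytes`);
    }
    this.used += chunkBytes;
  }

  usedBytes(): number {
    return this.used;
  }
}

// Copies chunks one at a time. readChunk returns null at end of stream.
// On any failure, abort() is called so the caller can cancel the download
// and discard the partial upload, then the error is rethrown.
function copyWithCap(
  readChunk: () => Uint8Array | null,
  writeChunk: (chunk: Uint8Array) => void,
  capBytes: number,
  abort: () => void,
): number {
  const budget = new ByteBudget(capBytes);
  try {
    let chunk = readChunk();
    while (chunk !== null) {
      budget.consume(chunk.byteLength);
      writeChunk(chunk);
      chunk = readChunk();
    }
  } catch (error) {
    abort();
    throw error;
  }
  return budget.usedBytes();
}
```

Checking the budget before `writeChunk` means the oversized chunk is never forwarded, so storage never holds more than the cap for a single file.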
+## Phase 11: Cleanup, deletion, and retention
+
+Make cleanup and deletion first-class state machines.
+
+Deliverables:
+
+- Cleanup ordering:
+  - provider credentials/vaults;
+  - session files/session where supported;
+  - agent/archive;
+  - environment/archive/delete;
+  - uploaded provider file resources;
+  - local storage only on user deletion.
+- Cleanup retry/backoff.
+- `cleanup_attempts` records.
+- Cleanup state separate from user-facing run terminal state.
+- User `pending_delete` flow.
+- Workspace deletion flow.
+- Provider key revocation behavior.
+
+TDD gate:
+
+- Add tests for cleanup after success, failure, timeout, cancellation, partial provider creation, duplicate cleanup calls, key revocation, and pending-delete races.
+
+Validation:
+
+- Cleanup failures surface actionable redacted errors.
+- Hard deletion only happens after cleanup/storage deletion succeeds.
+
+## Phase 12: Minimal dashboard
+
+Build the tenant-scoped monitoring surface.
+
+Deliverables:
+
+- Sign-in/out.
+- Workspace switcher.
+- Runs list.
+- Run detail page.
+- Status, timestamps, attributed user, template hash, provider IDs where safe, usage, cleanup state, and redacted metadata events.
+- Output list and signed-link actions.
+- Cancel/delete actions.
+- Quota/cap warnings.
+
+TDD gate:
+
+- Add component or integration tests for tenant-scoped data loading through BFF only and role/scope behavior for actions.
+
+Validation:
+
+- Dashboard cannot read another workspace's runs or outputs.
+- Dashboard displays cleanup retry/failure separately from run success/failure.
+
+## Phase 13: Observability and operations
+
+Make the platform operable for future agents.
+
+Deliverables:
+
+- Structured worker logs.
+- Redacted error reporting.
+- Run lifecycle metrics.
+- Worker health endpoint.
+- Queue depth/due run metrics.
+- Cleanup retry/dead-letter visibility.
+- Reconciliation summary logs.
+- Admin-only recovery tools if needed.
+
+TDD gate:
+
+- Add tests proving logs/events/errors cannot serialize secret wrappers and include enough non-secret identifiers for diagnosis.
+
+Validation:
+
+- Worker `/health` reports readiness.
+- Operational traces include run id, workspace id, phase, attempt id, provider resource ids where safe, and cleanup status.
+
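Note on Phase 13: a secret wrapper that "cannot be serialized" can be approximated in TypeScript by keeping the value in a closure and overriding the two hooks that loggers and JSON encoders commonly use. This is a sketch: the class name `Secret`, the `reveal` method, and the `[REDACTED]` marker are assumptions, and it does not protect against code that calls `reveal()` and logs the result.

```typescript
// Hypothetical secret wrapper; the plan names the concept but not the API.
class Secret {
  // The value is captured in a closure, not stored as a string property,
  // so serializers that walk own properties do not find it.
  private readonly read: () => string;

  constructor(value: string) {
    this.read = () => value;
  }

  // The only way to obtain the raw value; call sites are easy to audit.
  reveal(): string {
    return this.read();
  }

  // Used by string concatenation and template literals.
  toString(): string {
    return "[REDACTED]";
  }

  // Used by JSON.stringify.
  toJSON(): string {
    return "[REDACTED]";
  }
}
```

A test in the spirit of the TDD gate would serialize a log record containing a `Secret` and assert that the raw value does not appear anywhere in the output.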
+## Phase 14: Live-gated e2e and release readiness
+
+Verify the complete lifecycle only when credentials are intentionally present.
+
+Deliverables:
+
+- Live e2e command guarded by explicit env flag.
+- Low-cost Claude Managed Agents fixture Template.
+- Cleanup in `finally`.
+- Release/readiness docs.
+- Updated README and examples for platform SDK usage.
+
+TDD gate:
+
+- Live e2e is not a TDD driver for core logic, but must prove final integration before release.
+
+Validation:
+
+- Full submit -> provider session -> metadata poll -> output capture -> signed link -> cleanup works.
+- Default `npm run lint`, `npm test`, and `npm run build` pass.
+
+## Backlog
+
+- Provider webhooks as wakeup/reconciliation accelerator.
+- SSE live event stream for richer dashboard UI.
+- Supabase Realtime with explicit Auth.js-to-Supabase authorization design.
+- Agent/Environment caching by Template/config hash.
+- Additional provider adapters.
+- Runtime human approval flow if product scope changes.
+- Advanced billing and plan management.
+- Cloud Template registry.
+- Curated MCP adapter catalog.