antpath 0.1.0 → 0.2.0

Files changed (49)
  1. package/README.md +102 -67
  2. package/dist/client.js +4 -1
  3. package/dist/client.js.map +1 -1
  4. package/dist/credentials.js +34 -5
  5. package/dist/credentials.js.map +1 -1
  6. package/dist/files/downloader.js +8 -0
  7. package/dist/files/downloader.js.map +1 -1
  8. package/dist/index.d.ts +7 -2
  9. package/dist/index.js +3 -0
  10. package/dist/index.js.map +1 -1
  11. package/dist/platform/client.d.ts +204 -0
  12. package/dist/platform/client.js +203 -0
  13. package/dist/platform/client.js.map +1 -0
  14. package/dist/platform/index.d.ts +1 -0
  15. package/dist/platform/index.js +2 -0
  16. package/dist/platform/index.js.map +1 -0
  17. package/dist/providers/anthropic/provider.d.ts +6 -0
  18. package/dist/providers/anthropic/provider.js +90 -12
  19. package/dist/providers/anthropic/provider.js.map +1 -1
  20. package/dist/providers/known-events.d.ts +60 -0
  21. package/dist/providers/known-events.js +64 -0
  22. package/dist/providers/known-events.js.map +1 -0
  23. package/dist/run/controller.d.ts +4 -1
  24. package/dist/run/controller.js +103 -13
  25. package/dist/run/controller.js.map +1 -1
  26. package/dist/template/index.d.ts +1 -1
  27. package/dist/types.d.ts +20 -0
  28. package/dist/utils/events.d.ts +21 -0
  29. package/dist/utils/events.js +97 -18
  30. package/dist/utils/events.js.map +1 -1
  31. package/dist/utils/paths.js +9 -3
  32. package/dist/utils/paths.js.map +1 -1
  33. package/docs/cleanup.md +15 -15
  34. package/docs/credentials.md +105 -23
  35. package/docs/events.md +129 -0
  36. package/docs/mcp.md +18 -18
  37. package/docs/outputs.md +16 -16
  38. package/docs/quickstart.md +13 -13
  39. package/docs/release.md +22 -22
  40. package/docs/skills.md +18 -16
  41. package/docs/templates.md +24 -24
  42. package/docs/testing.md +26 -27
  43. package/examples/mcp-static-bearer.ts +30 -30
  44. package/examples/quickstart.ts +23 -23
  45. package/package.json +46 -51
  46. package/references/architecture-decisions.md +473 -203
  47. package/references/implementation-plan.md +452 -527
  48. package/references/research-sources.md +41 -30
  49. package/references/testing-strategy.md +29 -108
@@ -1,527 +1,452 @@
1
- ---
2
- title: antpath implementation plan
3
- status: proposed
4
- scope: sdk-only MVP
5
- ---
6
-
7
- # antpath implementation plan
8
-
9
- ## Goal
10
-
11
- Build the SDK-only MVP for antpath with test-first development: a TypeScript SDK that runs code-defined Templates on Claude Managed Agents, using caller-held provider credentials, typed credential inputs, provider-side vaults, queued user messages, event streaming, output downloads, and manual cleanup.
12
-
13
- ## Acceptance criteria
14
-
15
- - A user can define a secret-free Template in TypeScript.
16
- - A user can create a Client with their provider key.
17
- - A user can run a Template with typed variables and credentials.
18
- - The SDK creates all required provider resources per run.
19
- - The SDK sends queued user messages, one at a time, when the session is idle.
20
- - The SDK aborts on any provider error or terminated state.
21
- - The SDK resolves completion as the first idle state after all queued messages are sent.
22
- - The SDK can stream events, wait for completion, terminate, list files, download a file, download all outputs, read usage, return a final result, and cleanup.
23
- - The SDK never persists or logs provider keys or MCP credential values.
24
- - Unit tests cover Template parsing, variable resolution, credential validation, provider request mapping, run state transitions, output handling, and cleanup.
25
- - Component integration tests cover interactions between SDK modules using fake providers and local fixtures.
26
- - Recorded API integration tests replay sanitized responses captured from real Claude API calls and are reproducible with scripts.
27
- - Live e2e tests exercise the real Claude Managed Agents lifecycle and are gated behind local credentials.
28
-
29
- ## Phase 1: Repository and package foundation
30
-
31
- Create the TypeScript package foundation.
32
-
33
- Deliverables:
34
-
35
- - package manager configuration;
36
- - TypeScript configuration;
37
- - lint/format/test commands;
38
- - source/test directory layout;
39
- - public package entrypoint;
40
- - typed error base classes;
41
- - test fixtures and fake provider harness;
42
- - four test commands: unit, component integration, recorded API integration, and live e2e;
43
- - fixture recording/sanitization scripts for real API responses.
44
-
45
- Recommended structure:
46
-
47
- ```text
48
- src/
49
- index.ts
50
- client.ts
51
- template/
52
- credentials/
53
- providers/
54
- anthropic/
55
- run/
56
- files/
57
- skills/
58
- utils/
59
- test/
60
- unit/
61
- integration/
62
- components/
63
- api-recorded/
64
- live/
65
- fixtures/
66
- api-recordings/
67
- references/
68
- scripts/
69
- record-api-fixtures.ts
70
- sanitize-api-fixtures.ts
71
- ```
72
-
73
- Validation:
74
-
75
- - `npm test` or equivalent test command passes.
76
- - test commands exist for all four layers.
77
- - TypeScript build emits declarations.
78
-
79
- ## Phase 2: Public SDK types
80
-
81
- Define the public API before provider implementation.
82
-
83
- Deliverables:
84
-
85
- - `AntpathClient`;
86
- - `Template`;
87
- - `RunOptions`;
88
- - `RunHandle`;
89
- - `RunResult`;
90
- - `RunStatus`;
91
- - `RunEvent`;
92
- - `UsageSummary`;
93
- - `OutputManifest`;
94
- - `CleanupPolicy`;
95
- - typed credential unions.
96
-
97
- Initial API shape:
98
-
99
- ```ts
100
- const client = new AntpathClient({
101
- anthropicApiKey: process.env.ANTHROPIC_API_KEY
102
- });
103
-
104
- const template = defineTemplate({
105
- name: "example",
106
- model: "claude-sonnet-4-6",
107
- system: "You are a focused automation agent.",
108
- messages: ["Do the task: {{task}}"],
109
- variables: {
110
- task: string()
111
- },
112
- mcpServers: {
113
- linear: {
114
- url: "https://mcp.linear.app/mcp",
115
- auth: requiredStaticBearer(),
116
- tools: { allow: ["list_issues"] }
117
- }
118
- },
119
- outputs: {
120
- recommendedPath: "/antpath/outputs"
121
- }
122
- });
123
-
124
- const handle = await client.run(template, {
125
- variables: { task: "Summarize open issues" },
126
- credentials: {
127
- linear: { type: "static_bearer", token: process.env.LINEAR_API_KEY! }
128
- }
129
- });
130
-
131
- const result = await handle.wait();
132
- await handle.downloadOutputs("./outputs");
133
- await handle.cleanup();
134
- ```
135
-
136
- Validation:
137
-
138
- - Compile-time tests assert supported and unsupported credential shapes.
139
- - Runtime schema tests reject invalid Template and run inputs.
140
- - Public types are defined test-first with type tests before implementation.
141
-
142
- ## Phase 3: Template compiler
143
-
144
- Implement a strict compiler from user Template to resolved, provider-neutral internal configuration.
145
-
146
- Deliverables:
147
-
148
- - immutable Template snapshot/hash;
149
- - variable declaration and resolution;
150
- - escaping for literal placeholders;
151
- - strict unresolved variable failures;
152
- - secret boundary enforcement;
153
- - MCP declaration normalization;
154
- - tool allow/deny normalization;
155
- - environment package/setup normalization;
156
- - output configuration normalization.
157
-
158
- Validation:
159
-
160
- - unresolved variables fail before provider calls;
161
- - escaped placeholders remain literal;
162
- - secrets cannot be supplied through variables;
163
- - Template hash is stable for semantically identical inputs;
164
- - Template edits produce a different hash.
165
- - Recorded fixtures do not include resolved secret values.
166
-
167
- ## Phase 4: Credential validation
168
-
169
- Implement typed credential validation independent of provider calls.
170
-
171
- Deliverables:
172
-
173
- - `static_bearer` credential type;
174
- - `oauth_access_token` credential type;
175
- - Template credential requirements;
176
- - run-time credential matching by MCP server key;
177
- - redacted error messages;
178
- - no credential values in logs/errors/result objects.
179
-
180
- Validation:
181
-
182
- - missing required credentials fail before provider calls;
183
- - unsupported arbitrary headers fail with explicit out-of-scope error;
184
- - credential values are redacted in snapshots, errors, and debug output.
185
- - redaction is tested in unit, component integration, and recorded API fixture sanitization.
186
-
187
- ## Phase 5: Anthropic Managed Agents adapter
188
-
189
- Build the provider adapter as the only MVP backend.
190
-
191
- Deliverables:
192
-
193
- - provider client wrapper;
194
- - create Environment per run;
195
- - upload local skills/resources;
196
- - create inline skills where provider supports them;
197
- - create Agent with model, system prompt, MCP servers, skills, tools, permission policies;
198
- - create per-run Vault and Credentials;
199
- - create Session;
200
- - send user messages;
201
- - stream/list events;
202
- - retrieve status/session metadata;
203
- - terminate session;
204
- - list/download session-scoped files;
205
- - retrieve usage/cost where provider exposes it;
206
- - cleanup created resources.
207
-
208
- Provider invariants:
209
-
210
- - MCP tools must not reach provider with an approval-required policy.
211
- - Enabled MCP tools must be explicitly allow/deny configured.
212
- - Credentials are submitted only to provider vault APIs, never persisted locally beyond active process memory.
213
- - Created provider IDs are captured for cleanup and debugging.
214
-
215
- Validation:
216
-
217
- - fake provider tests assert exact request order and payloads;
218
- - recorded API integration tests assert adapter parsing against sanitized real provider responses;
219
- - errors from each create step trigger correct cleanup state;
220
- - cleanup is idempotent where possible;
221
- - provider IDs are retained even after partial failures.
222
-
223
- ## Phase 6: Run state machine
224
-
225
- Implement deterministic orchestration around provider events.
226
-
227
- Deliverables:
228
-
229
- - `RunController`;
230
- - event stream consumer;
231
- - queued message scheduler;
232
- - timeout controller;
233
- - termination controller;
234
- - local status transitions;
235
- - final result builder.
236
-
237
- State-machine rules:
238
-
239
- - Send first message after Session creation.
240
- - On idle with queued messages remaining, send next message.
241
- - On idle with no queued messages remaining, succeed.
242
- - On any provider error, fail and do not send further messages.
243
- - On terminated, fail unless termination was user-requested.
244
- - On timeout, terminate and mark timed out.
245
-
246
- Validation:
247
-
248
- - table-driven tests for event sequences;
249
- - component integration tests cover event stream plus queued message scheduling;
250
- - duplicate/replayed events do not corrupt state;
251
- - queued messages are sent exactly once;
252
- - abort prevents later queued messages;
253
- - timeout cannot race into success.
254
-
255
- ## Phase 7: Run handle API
256
-
257
- Expose the MVP handle methods.
258
-
259
- Deliverables:
260
-
261
- - `status()`;
262
- - `streamEvents()`;
263
- - `wait()`;
264
- - `listFiles()`;
265
- - `downloadFile()`;
266
- - `downloadOutputs()`;
267
- - `cleanup()`;
268
- - `terminate()`;
269
- - `usage()`;
270
- - `result()`.
271
-
272
- Behavior:
273
-
274
- - `wait()` resolves once the run reaches terminal SDK state.
275
- - `streamEvents()` yields provider events plus SDK lifecycle events.
276
- - `downloadOutputs()` downloads all session-scoped files by default.
277
- - `cleanup()` is manual and can be called after success, failure, or termination.
278
- - `cleanup()` reports skipped/failed cleanup operations rather than hiding them.
279
-
280
- Validation:
281
-
282
- - methods have deterministic behavior before, during, and after terminal states;
283
- - cleanup can be retried safely;
284
- - file downloads preserve names and avoid unsafe path traversal.
285
- - live e2e test covers the happy path from Template to cleanup.
286
-
287
- ## Phase 8: Output download subsystem
288
-
289
- Implement local output handling without antpath storage.
290
-
291
- Deliverables:
292
-
293
- - session-scoped file listing;
294
- - safe local path mapping;
295
- - download all files by default;
296
- - optional filters/globs;
297
- - optional `/antpath/outputs` convention in prompts/config;
298
- - local output manifest;
299
- - checksum/size metadata when feasible.
300
-
301
- Validation:
302
-
303
- - path traversal filenames are sanitized;
304
- - duplicate filenames are handled deterministically;
305
- - partial download failures are reported clearly;
306
- - downloaded output manifest contains no secrets.
307
- - recorded API fixtures cover session-scoped file listing and download metadata.
308
-
309
- ## Phase 9: Skill packaging
310
-
311
- Support local uploads and inline skills.
312
-
313
- Deliverables:
314
-
315
- - local skill path validation;
316
- - package/zip local skill directory;
317
- - upload skill artifact/resource per run;
318
- - inline skill declaration support;
319
- - variable resolution in skill content/config;
320
- - skill resource cleanup tracking.
321
-
322
- Validation:
323
-
324
- - missing local paths fail before provider calls;
325
- - packaging is deterministic;
326
- - large/unsupported skill inputs fail with actionable errors;
327
- - inline and local skills can be combined.
328
-
329
- ## Phase 10: Cleanup and orphan handling
330
-
331
- Implement explicit cleanup as a first-class handle operation.
332
-
333
- Deliverables:
334
-
335
- - cleanup policy model;
336
- - manual default;
337
- - provider resource cleanup ordering;
338
- - vault/credential cleanup;
339
- - environment/agent/session/file cleanup where supported;
340
- - cleanup state reporting;
341
- - orphan recovery inputs.
342
-
343
- Cleanup order should prefer removing credentials first, then optional file/session resources, then Agent/Environment where safe.
344
-
345
- Validation:
346
-
347
- - cleanup works after success, failure, timeout, and partial provider creation failure;
348
- - cleanup failures are returned with provider IDs and redacted messages;
349
- - calling cleanup twice is safe;
350
- - orphan recovery can accept persisted provider IDs from a previous result.
351
- - live e2e cleanup verifies provider vault/session/resource cleanup behavior where provider APIs allow it.
352
-
353
- ## Phase 11: Observability and safe logging
354
-
355
- Add structured, redacted observability suitable for SDK users.
356
-
357
- Deliverables:
358
-
359
- - event hooks/callbacks;
360
- - debug logger interface;
361
- - redaction utility;
362
- - structured SDK lifecycle events;
363
- - provider request metadata without secrets;
364
- - error taxonomy.
365
-
366
- Validation:
367
-
368
- - tests prove secret values are never emitted through logs/events/errors;
369
- - debug logs include enough provider IDs and state to diagnose failures.
370
-
371
- ## Phase 12: Documentation and examples
372
-
373
- Create user-facing SDK docs and runnable examples.
374
-
375
- Deliverables:
376
-
377
- - quickstart;
378
- - Template guide;
379
- - credentials guide;
380
- - MCP guide;
381
- - skills guide;
382
- - cleanup/orphan guide;
383
- - output download guide;
384
- - examples for static bearer and OAuth access-token MCP credentials.
385
-
386
- Validation:
387
-
388
- - examples compile;
389
- - docs state MVP non-goals and accepted risks.
390
-
391
- ## Phase 13: Release readiness
392
-
393
- Prepare the SDK for first external use.
394
-
395
- Deliverables:
396
-
397
- - package metadata;
398
- - changelog;
399
- - versioning policy;
400
- - README;
401
- - API reference generation if practical;
402
- - CI workflow once repository hosting is established.
403
-
404
- Validation:
405
-
406
- - clean install works;
407
- - build/test pass from a fresh checkout;
408
- - package dry-run includes only intended files.
409
-
410
- ## Test strategy
411
-
412
- Use test-first development for every phase. Start each feature by adding or updating the narrowest failing test in the appropriate layer, then implement only enough code to pass it.
413
-
414
- Use a fake provider for most tests. Do not rely on live provider calls for the core state machine.
415
-
416
- ### Layer 1: Unit tests
417
-
418
- Purpose: verify pure logic and small state transitions with no provider, filesystem, or network dependency.
419
-
420
- Coverage:
421
-
422
- - Template compiler;
423
- - variable resolution and escaping;
424
- - credential parser;
425
- - redaction utilities;
426
- - state reducer/state-machine transitions;
427
- - safe local path mapping;
428
- - cleanup plan construction.
429
-
430
- Command:
431
-
432
- ```text
433
- npm run test:unit
434
- ```
435
-
436
- ### Layer 2: Component integration tests
437
-
438
- Purpose: verify interactions between SDK components using fake providers, fake clocks, and local fixtures.
439
-
440
- Coverage:
441
-
442
- - Client + Template compiler + credential validator;
443
- - RunController + fake provider event stream;
444
- - queued message scheduling;
445
- - output downloader + fake file service;
446
- - cleanup manager + fake provider resources;
447
- - logger/event hooks with redaction.
448
-
449
- Command:
450
-
451
- ```text
452
- npm run test:integration:components
453
- ```
454
-
455
- ### Layer 3: Recorded API integration tests
456
-
457
- Purpose: verify provider adapter behavior against sanitized responses captured from real Claude API calls, without requiring network access during normal CI.
458
-
459
- Requirements:
460
-
461
- - Real API responses are captured by explicit scripts.
462
- - Recordings are sanitized before being committed.
463
- - Secret values, request headers, API keys, bearer tokens, OAuth tokens, and raw sensitive prompts are never stored.
464
- - Fixtures are deterministic and versioned by provider API/beta header.
465
- - Tests fail if fixture sanitization leaves secret-shaped values.
466
-
467
- Commands:
468
-
469
- ```text
470
- npm run fixtures:record:anthropic
471
- npm run fixtures:sanitize
472
- npm run test:integration:api
473
- ```
474
-
475
- ### Layer 4: Live e2e tests
476
-
477
- Purpose: verify the full real Claude Managed Agents lifecycle using a local credential.
478
-
479
- Coverage:
480
-
481
- - create Environment;
482
- - create Agent;
483
- - create Vault/Credential when needed;
484
- - create Session;
485
- - send queued messages;
486
- - observe idle completion;
487
- - list/download session-scoped files;
488
- - retrieve usage/status where available;
489
- - terminate if needed;
490
- - cleanup if configured or explicitly requested.
491
-
492
- Rules:
493
-
494
- - Live tests are never part of default CI.
495
- - Live tests require `.env.local` with `ANTHROPIC_API_KEY`.
496
- - Live tests must use low-cost prompts, strict timeouts, and cleanup guards.
497
- - Live tests must not print the key or provider auth headers.
498
-
499
- Command:
500
-
501
- ```text
502
- npm run test:e2e:live
503
- ```
504
-
505
- Key invariants:
506
-
507
- - no provider call before Template and credentials are fully parsed;
508
- - no secret value appears in logs, errors, Template snapshots, result objects, or manifests;
509
- - no secret value appears in recorded API fixtures;
510
- - every created provider resource is tracked for cleanup;
511
- - message queue sends each message at most once;
512
- - cleanup is manual by default and retryable;
513
- - output download never writes outside the requested local directory.
514
-
515
- ## Backlog plan
516
-
517
- After MVP:
518
-
519
- 1. Add cloud metadata sync and dashboard.
520
- 2. Add encrypted run-scoped key support for guaranteed cleanup.
521
- 3. Add OpenAI adapter.
522
- 4. Add provider Environment caching by Template hash.
523
- 5. Add cost/token/iteration caps.
524
- 6. Add OAuth refresh credentials.
525
- 7. Add arbitrary MCP headers through an antpath MCP proxy.
526
- 8. Add Template registry and sharing.
527
- 9. Add artifact retention service.
1
+ ---
2
+ title: antpath implementation plan
3
+ status: accepted
4
+ scope: platform MVP
5
+ ---
6
+
7
+ # antpath implementation plan
8
+
9
+ ## Goal
10
+
11
+ Convert antpath from an SDK-only package into a pnpm TypeScript workspace containing:
12
+
13
+ - a platform SDK in `packages/sdk`;
14
+ - a dashboard app in `apps/dashboard`;
15
+ - a worker service in `apps/worker`;
16
+ - shared packages for types, schema, configuration, redaction, and database helpers as needed.
17
+
18
+ The platform must submit, dispatch, observe, store metadata for, capture outputs from, and clean up Claude Managed Agents runs while preserving tenant isolation and secret boundaries.
19
+
20
+ Implementation is test-driven. Each behavior starts with the narrowest failing test that proves the desired invariant.
21
+
22
+ ## Acceptance criteria
23
+
24
+ - The repository is a pnpm workspace and the current SDK is moved to `packages/sdk`.
25
+ - Dashboard and worker app folders exist with build/test integration.
26
+ - Auth.js authenticates dashboard users.
27
+ - SDK API tokens authenticate programmatic clients.
28
+ - Workspaces are the tenant boundary.
29
+ - BFF/server actions scope every dashboard/API operation by workspace membership or API-token scope.
30
+ - Supabase service-role credentials are never exposed to browser/client bundles.
31
+ - Supabase Postgres stores durable run, attempt, provider resource, event, output, output-capture-failure, cleanup, usage, workspace, and membership metadata. There is no persistent `provider_connections` table in MVP.
32
+ - Per-run secrets bundles (Anthropic key, optional MCP credentials, optional skill references) arrive inline on submission, are encrypted through Supabase Vault for the lifetime of that single run, and are deleted at terminal cleanup. The Vault entry is the only durable trace of the user's provider key.
33
+ - Run submission idempotency is enforced by workspace and request hash. The hash excludes the `secrets` block so re-submitting the same logical run with a new key still matches.
34
+ - Workers claim due runs with leases and `FOR UPDATE SKIP LOCKED`.
35
+ - Multiple workers can run concurrently without duplicate lifecycle ownership.
36
+ - Worker polling recovers from missed `NOTIFY`, worker restart, deploy, and expired leases.
37
+ - Worker creates per-run provider resources and journals them for cleanup.
38
+ - Worker polls provider status and events and stores only redacted metadata.
39
+ - Output capture is unconditional: every artifact Claude exposes on the run's session is written to private Supabase Storage, bounded only by the workspace storage cap. Files that would exceed the cap are persisted as `output_capture_failures` rows.
40
+ - BFF returns signed output links only after workspace authorization.
41
+ - Cleanup runs after terminal provider state and output capture, deletes the per-run Vault entry, and (by default) deletes Claude-side resources. `cleanup.claudeSession: "retain"` only affects Claude-side resources.
42
+ - Reconciliation can recover intended provider resources after partial worker crashes.
43
+ - User deletion is pending/soft until cleanup and storage deletion are complete.
44
+ - Exact tier/cap values are configurable through environment variables with conservative defaults and run-level snapshots.
45
+ - Security-sensitive BFF/API actions are audited and rate-limited.
46
+ - Workspace storage quota and workspace deletion are enforced before provider/storage side effects proceed.
47
+ - CI runs pnpm lint/test/build/package checks plus a local Supabase integration job.
48
+ - Tests follow the accepted taxonomy: deterministic unit tests (including fakes and sanitized recorded snapshots), live external-system integration tests with no skip flags, and top-to-bottom live e2e tests.
49
+
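The idempotency criterion above says the request hash excludes the `secrets` block. A minimal sketch of that idea follows. The `RunSubmission` shape, the sorted-key serialization, and the choice of SHA-256 are assumptions for illustration; the plan does not specify them.

```typescript
import { createHash } from "node:crypto";

// Hypothetical submission shape; field names are assumptions.
type RunSubmission = {
  template: string;
  variables: Record<string, string>;
  secrets?: { anthropicApiKey: string };
};

// Serialize with sorted object keys so logically equal requests hash equally.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .filter((key) => record[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(record[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash everything except the `secrets` block.
function requestHash(submission: RunSubmission): string {
  const { secrets: _secrets, ...hashable } = submission;
  return createHash("sha256").update(canonicalize(hashable)).digest("hex");
}
```

Under these assumptions, two submissions that differ only in key order or in the supplied provider key produce the same hash, while any change to the template or variables produces a different one.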
50
+ ## Phase 1: Workspace foundation
51
+
52
+ Create the workspace structure without changing runtime behavior.
53
+
54
+ Deliverables:
55
+
56
+ - Root pnpm workspace configuration.
57
+ - Current SDK moved to `packages/sdk`.
58
+ - `apps/dashboard` placeholder.
59
+ - `apps/worker` placeholder.
60
+ - Shared TypeScript config/build/test commands.
61
+ - Existing SDK exports preserved or intentionally migrated with compatibility notes.
62
+ - Root validation commands:
63
+ - `pnpm lint`
64
+ - `pnpm test`
65
+ - `pnpm build`
66
+
67
+ TDD gate:
68
+
69
+ - Add tests or CI command checks proving the moved SDK still builds/tests from the root workspace.
70
+
71
+ Validation:
72
+
73
+ - Clean install works from root.
74
+ - Existing SDK tests still pass from root and package scope.
75
+ - Build emits SDK declarations.
76
+
77
+ ## Phase 2: Shared contracts and configuration
78
+
79
+ Define shared platform contracts before implementing dashboard/worker behavior.
80
+
81
+ Deliverables:
82
+
83
+ - Shared run status and cleanup status types.
84
+ - Shared error taxonomy.
85
+ - Shared redaction helpers and secret wrapper.
86
+ - Shared Template/platform submission request schemas.
87
+ - Environment config parser with required-secret validation and conservative defaults.
88
+ - Plan/cap config defaults for:
89
+ - max run duration;
90
+ - workspace/user/token concurrency;
91
+ - polling intervals and jitter;
92
+ - provider token-bucket rates;
93
+ - retry backoffs;
94
+ - lease duration/renewal threshold;
95
+ - max attempts;
96
+ - cleanup retries;
97
+ - output caps;
98
+ - storage caps;
99
+ - signed-link TTL;
100
+ - free user allowance.
101
+
102
+ TDD gate:
103
+
104
+ - Unit tests for env parsing, missing required env failure, optional fallback defaults, cap snapshot values, redaction, and status transitions.
105
+
106
+ Validation:
107
+
108
+ - Missing required secrets/connectivity config fails service startup.
109
+ - Missing optional config falls back to conservative low limits.
110
+
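The validation rules above (required config fails startup, optional config falls back to conservative low limits) could be sketched roughly as follows. The variable names `DATABASE_URL`, `ANTPATH_MAX_RUN_DURATION_MS`, and `ANTPATH_WORKSPACE_CONCURRENCY`, and the default values, are hypothetical; the plan does not name them.

```typescript
// Hypothetical config shape; names and defaults are assumptions.
type WorkerConfig = {
  databaseUrl: string;
  maxRunDurationMs: number;
  workspaceConcurrency: number;
};

type Env = Record<string, string | undefined>;

// Required values: throw so the service refuses to start.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Optional caps: fall back to a conservative default when unset.
// The error message names the variable but never echoes its value.
function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`Invalid value for ${name}: expected a positive integer`);
  }
  return parsed;
}

function parseConfig(env: Env): WorkerConfig {
  return {
    databaseUrl: requireVar(env, "DATABASE_URL"),
    maxRunDurationMs: positiveInt(env, "ANTPATH_MAX_RUN_DURATION_MS", 300_000),
    workspaceConcurrency: positiveInt(env, "ANTPATH_WORKSPACE_CONCURRENCY", 1),
  };
}
```

Passing the environment in as a plain record, rather than reading `process.env` directly, keeps the parser testable without touching global state.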
111
+ ## Phase 3: Database foundation
112
+
113
+ Create the durable source of truth.
114
+
115
+ Deliverables:
116
+
117
+ - Migration framework.
118
+ - Tables:
119
+ - Auth.js adapter tables: `users`, `accounts`, `sessions`, `verification_token`;
120
+ - antpath `app_users`;
121
+ - `workspaces`;
122
+ - `workspace_memberships`;
123
+ - `api_tokens`;
124
+ - `runs` (includes `execution_secret_id` referencing the per-run Vault entry);
125
+ - `run_attempts`;
126
+ - `provider_resources`;
127
+ - `run_events`;
128
+ - `output_objects`;
129
+ - `output_capture_failures`;
130
+ - `cleanup_attempts`;
131
+ - `usage_ledger`.
132
+ - Constraints:
133
+ - unique `(workspace_id, idempotency_key)` on runs;
134
+ - request-hash conflict handling;
135
+ - unique `(run_attempt_id, provider_event_id)` on events;
136
+ - foreign keys for workspace and attributed `app_user_id` where practical;
137
+ - RLS enabled and `anon`/`authenticated` direct table access revoked for platform/Auth.js tables.
138
+ - DB query helpers for tenant-scoped access.
139
+
140
+ TDD gate:
141
+
142
+ - Add failing database/security integration tests for migrations, idempotency, lease claim behavior, event dedupe, usage ledger idempotency, and cross-workspace access denial.
143
+
144
+ Validation:
145
+
146
+ - Migrations apply from a clean database.
147
+ - Concurrent claims use `FOR UPDATE SKIP LOCKED`.
148
+ - Event and usage ledger writes are transactional.
149
+
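The `FOR UPDATE SKIP LOCKED` claim mentioned in the validation list follows a common PostgreSQL queue pattern. The sketch below builds such a query as a parameterized object. The `runs` table and `next_check_at` column appear in the plan; the `lease_token` and `lease_expires_at` column names are assumptions.

```typescript
// Claims one due run whose lease is absent or expired.
// SKIP LOCKED makes concurrent workers pass over rows that another
// transaction has already locked instead of blocking on them.
// Column names lease_token and lease_expires_at are hypothetical.
const CLAIM_DUE_RUN_SQL = `
UPDATE runs
SET lease_token = $1,
    lease_expires_at = now() + make_interval(secs => $2)
WHERE id = (
  SELECT id
  FROM runs
  WHERE next_check_at <= now()
    AND (lease_expires_at IS NULL OR lease_expires_at < now())
  ORDER BY next_check_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING id;
`;

type ClaimQuery = { text: string; values: [string, number] };

// Pair the SQL with its parameters: a fresh lease token and a lease length.
function buildClaimQuery(leaseToken: string, leaseSeconds: number): ClaimQuery {
  if (leaseSeconds <= 0) {
    throw new Error("leaseSeconds must be positive");
  }
  return { text: CLAIM_DUE_RUN_SQL, values: [leaseToken, leaseSeconds] };
}
```

Because the row lock is taken in the subquery and the lease is written in the same statement, a worker either claims the run atomically or gets no row back.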
150
+ ## Phase 4: Auth, BFF, and SDK API-token access
151
+
152
+ Establish the authorization boundary.
153
+
154
+ Deliverables:
155
+
156
+ - Auth.js dashboard authentication.
157
+ - User mirror or Auth.js adapter integration.
158
+ - Workspace membership resolution.
159
+ - Workspace switch/active workspace model.
160
+ - Hashed SDK API tokens with scopes, creator, revocation, and last-used tracking.
161
+ - Shared authorization helper for Auth.js sessions and API tokens.
162
+ - BFF/server-action routes for run submission/read/update operations.
163
+ - Browser/client bundle boundary preventing service-role import.
164
+
165
+ TDD gate:
166
+
167
+ - Add failing tests for membership-scoped queries, cross-workspace denial, API-token scope enforcement, revoked token rejection, attributed user freezing, and no browser service-role imports.
168
+
169
+ Validation:
170
+
171
+ - Dashboard reads and mutates only authorized workspace data.
172
+ - SDK token cannot access another workspace.
173
+ - Service-role credentials are server/worker only.
174
+
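The hashed API-token deliverable above can be illustrated with a small in-memory sketch. It assumes SHA-256 over a high-entropy random token, a hypothetical `ap_` prefix, and a hypothetical `StoredToken` record; a real implementation would look the record up by hash in the `api_tokens` table.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical stored record; only the hash is kept, never the plaintext.
type StoredToken = {
  tokenHash: string;
  workspaceId: string;
  scopes: string[];
  revokedAt: Date | null;
};

function hashToken(plaintext: string): string {
  return createHash("sha256").update(plaintext).digest("hex");
}

// The plaintext is returned once at creation time and is not recoverable later.
function issueToken(workspaceId: string, scopes: string[]): { plaintext: string; stored: StoredToken } {
  const plaintext = `ap_${randomBytes(32).toString("hex")}`;
  const stored: StoredToken = { tokenHash: hashToken(plaintext), workspaceId, scopes, revokedAt: null };
  return { plaintext, stored };
}

// Checks hash match, revocation, workspace, and scope, in that order.
function authorize(presented: string, stored: StoredToken, workspaceId: string, scope: string): boolean {
  const presentedHash = Buffer.from(hashToken(presented), "hex");
  const storedHash = Buffer.from(stored.tokenHash, "hex");
  if (presentedHash.length !== storedHash.length || !timingSafeEqual(presentedHash, storedHash)) {
    return false;
  }
  if (stored.revokedAt !== null) return false;
  return stored.workspaceId === workspaceId && stored.scopes.includes(scope);
}
```

The scope string `runs:write` used in any caller would also be an assumption; the plan only states that tokens carry scopes.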
175
+ ## Phase 5: Platform SDK client
176
+
177
+ Turn the SDK into the programmatic client for the platform while preserving Template ergonomics where practical.
178
+
179
+ Deliverables:
180
+
181
+ - SDK client for platform API base URL and API token.
182
+ - `AntpathPlatformClient` constructor accepts a default `secrets` bundle (Anthropic key + optional MCP credentials + optional skill references); every method accepts a per-call override.
183
+ - Submit run API (carries inline `secrets`).
184
+ - Get run status/detail API.
185
+ - List metadata events API.
186
+ - List outputs API (returns successful captures and capture failures).
187
+ - Create signed output link API.
188
+ - Cancel run API.
189
+ - Delete run API.
190
+ - Typed errors for auth, quota, validation, conflict, not found, and provider/platform failures.
191
+ - Compatibility path from existing Template definitions to platform submission requests.
192
+
193
+ TDD gate:
194
+
195
+ - Type/contract tests for public SDK API and runtime tests with fake platform responses.
196
+ - Tests asserting `secrets` never appears in serialized request hashes, error metadata, or retry telemetry.
197
+
198
+ Validation:
199
+
200
+ - SDK does not persist provider keys: it forwards them to the platform per submission, where they are vaulted for the lifetime of one run and then deleted.
201
+ - SDK handles idempotency conflict, unauthorized, quota, and terminal states deterministically.
202
+
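The typed-error deliverable in this phase could take a shape like the sketch below. The `PlatformError` class, the `kind` names, and the mapping from HTTP status codes are assumptions; the plan lists the error categories but not how the platform API signals them.

```typescript
// Error categories taken from the plan; the string labels are hypothetical.
type PlatformErrorKind = "auth" | "quota" | "validation" | "conflict" | "not_found" | "platform";

class PlatformError extends Error {
  readonly kind: PlatformErrorKind;
  readonly status: number;

  constructor(kind: PlatformErrorKind, status: number, message: string) {
    super(message);
    this.name = "PlatformError";
    this.kind = kind;
    this.status = status;
  }
}

// Assumed mapping from HTTP status to error category.
function classifyStatus(status: number): PlatformErrorKind {
  switch (status) {
    case 400:
    case 422:
      return "validation";
    case 401:
    case 403:
      return "auth";
    case 404:
      return "not_found";
    case 409:
      return "conflict";
    case 429:
      return "quota";
    default:
      return "platform";
  }
}

function toPlatformError(status: number, message: string): PlatformError {
  return new PlatformError(classifyStatus(status), status, message);
}
```

Callers can then branch on `error.kind` instead of parsing messages, which also keeps request bodies (and therefore `secrets`) out of error metadata.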
203
+ ## Phase 6: Worker claim loop and state machine
204
+
205
+ Implement durable lifecycle ownership independent of provider details.
206
+
207
+ Deliverables:
208
+
209
+ - Polling loop for due runs.
210
+ - Optional Postgres `NOTIFY` listener for fast wakeup.
211
+ - Lease claim/release helpers.
212
+ - Lease-guarded status update helper.
213
+ - Per-workspace rate limit hooks.
214
+ - Fair due-run ordering across workspaces.
215
+ - Cancellation/delete request checks before side effects.
216
+ - Timeout handling.
217
+ - Error classification.
218
+ - Retry/backoff scheduling through `next_check_at`.
219
+
220
+ TDD gate:
221
+
222
+ - Add failing component and database tests for concurrent fake workers, expired lease reclaim, lease-token update failures, cancellation races, timeout races, and polling fallback after missed `NOTIFY`.
223
+
224
+ Validation:
225
+
226
+ - Multiple workers do not process the same run step.
227
+ - Expired leases recover.
228
+ - Worker restart leaves no required in-memory state.
229
+
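Retry/backoff scheduling through `next_check_at` might be computed along these lines. This is a sketch assuming capped exponential backoff with additive jitter; the option names and the doubling policy are assumptions, since the plan only says that backoffs and jitter are configurable.

```typescript
// Hypothetical backoff options; the plan does not fix these names or values.
type BackoffOptions = {
  baseMs: number;      // delay for the first retry
  maxMs: number;       // upper bound on the exponential delay, before jitter
  jitterRatio: number; // extra random delay as a fraction of the capped delay
};

// Returns the timestamp to store in next_check_at for the given attempt.
// The random source is injectable so tests can be deterministic.
function nextCheckAt(
  now: Date,
  attempt: number,
  options: BackoffOptions,
  random: () => number = Math.random,
): Date {
  const exponent = Math.max(0, attempt - 1);
  const capped = Math.min(options.maxMs, options.baseMs * 2 ** exponent);
  const jitter = capped * options.jitterRatio * random();
  return new Date(now.getTime() + Math.round(capped + jitter));
}
```

Storing the result in the database rather than in a worker timer is what lets another worker pick the run up after a restart or an expired lease.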
230
+ ## Phase 7: Fake provider lifecycle harness
231
+
232
+ Prove worker behavior with deterministic provider boundaries before live provider integration.
233
+
234
+ Deliverables:
235
+
236
+ - Fake provider implementing create resources, create session, send event, retrieve status, list events, list files, download file, and cleanup.
237
+ - Fake storage adapter.
238
+ - Fake Vault adapter.
239
+ - Table-driven lifecycle tests.
240
+
241
+ TDD gate:
242
+
243
+ - Add component tests for happy path, provider errors, terminal states, duplicate events, output capture, cleanup failures, and retryable failures.
244
+
245
+ Validation:
246
+
247
+ - Core run lifecycle is correct without network calls.
248
+ - Duplicate/replayed provider events do not double-count usage.
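One way to satisfy the duplicate-event validation above is to key usage accounting on the provider event ID, so that replays become no-ops. The sketch below assumes a hypothetical event shape (`id`, `inputTokens`, `outputTokens`); the real provider fields may differ.

```typescript
// Hypothetical event shape for illustration only.
interface UsageEvent {
  id: string;
  inputTokens: number;
  outputTokens: number;
}

class UsageLedger {
  private seen = new Set<string>();
  inputTokens = 0;
  outputTokens = 0;

  // Returns true when the event was counted, false when it was a replay.
  record(event: UsageEvent): boolean {
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    this.inputTokens += event.inputTokens;
    this.outputTokens += event.outputTokens;
    return true;
  }
}
```

In a durable implementation the seen-set would presumably be a unique constraint on a stored event ID rather than process memory.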
249
+
250
+ ## Phase 8: Claude provider adapter
251
+
252
+ Adapt existing Claude Managed Agents provider code for the platform worker.
253
+
254
+ Deliverables:
255
+
256
+ - Provider client wrapper for worker.
257
+ - Create Environment per run.
258
+ - Upload skills/resources as needed.
259
+ - Create Agent with model/system/MCP/skills/tool policy.
260
+ - Create provider Vault/Credentials for MCP credentials.
261
+ - Create Session.
262
+ - Send initial user event.
263
+ - Retrieve session status.
264
+ - List session events with cursor/filter where available.
265
+ - List/download session files.
266
+ - Cleanup/archive/delete resources.
267
+ - Provider metadata naming/tagging for reconciliation.
268
+ - Provider error classification.
269
+
270
+ TDD gate:
271
+
272
+ - Add unit tests over sanitized, recorded provider snapshots for parsing and cleanup behavior before live e2e.
273
+ - Verify the exact pagination/filter semantics of Claude events and document a bounded fallback if needed.
274
+
275
+ Validation:
276
+
277
+ - No approval-required tool policy reaches the provider.
278
+ - Provider IDs are persisted for cleanup.
279
+ - Sanitized fixtures contain no secrets.
280
+
281
+ ## Phase 9: Provider resource journaling and reconciliation sweeper
282
+
283
+ Close resource leak windows.
284
+
285
+ Deliverables:
286
+
287
+ - Pre-insert intended `provider_resources` rows before provider side effects where possible.
288
+ - Deterministic provider names/metadata with antpath workspace/run/attempt identifiers.
289
+ - Sweeper for expired leases and unfinished intended resources.
290
+ - Orphan matching by provider list/search APIs where available.
291
+ - Reschedule recoverable runs.
292
+ - Clean up orphaned resources.
293
+
294
+ TDD gate:
295
+
296
+ - Add tests that simulate worker crashes after provider create succeeds but before the provider ID is persisted.
297
+
298
+ Validation:
299
+
300
+ - Sweeper can attach or clean up recoverable resources.
301
+ - Cleanup remains idempotent after partial failures.
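Deterministic provider names are what let the sweeper match an orphaned resource back to a run when the provider ID was never persisted. The sketch below assumes a hypothetical `antpath__<workspace>__<run>__a<attempt>` scheme and assumes identifiers never contain the `__` separator; the actual naming format is not specified in this plan.

```typescript
// Hypothetical naming scheme: prefix, separator, and field order are assumptions.
interface ResourceKey {
  workspaceId: string;
  runId: string;
  attempt: number;
}

// Build a provider-side name that encodes which run attempt owns the resource.
function providerResourceName(key: ResourceKey): string {
  return ["antpath", key.workspaceId, key.runId, "a" + key.attempt].join("__");
}

// Recover the owning run from a provider-side name, or null if it is not ours.
function parseProviderResourceName(name: string): ResourceKey | null {
  const match = /^antpath__(.+?)__(.+?)__a(\d+)$/.exec(name);
  if (!match) return null;
  return { workspaceId: match[1], runId: match[2], attempt: Number(match[3]) };
}
```

Because the name is derived only from identifiers known before the provider call, it can be written into the pre-inserted `provider_resources` row and later compared against provider list results.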
302
+
303
+ ## Phase 10: Output capture and Supabase Storage
304
+
305
+ Unconditionally capture every artifact the user's Claude session exposes into private Supabase Storage. The workspace storage cap is the only user-visible quota.
306
+
307
+ Deliverables:
308
+
309
+ - Worker terminal-state list of session-scoped files via the Claude Files API (`GET /v1/files?scope_id=<sessionId>`), returning `id`, `filename`, and `size_bytes`.
310
+ - Workspace storage usage accounting helper.
311
+ - Pre-download quota check: if `(workspace_used + size_bytes) > workspace cap`, write an `output_capture_failures` row with `reason = "workspace_quota_exceeded"` and continue without downloading.
312
+ - Streaming download via `GET /v1/files/{file_id}/content` with a worker-internal safety cap so malformed listing responses cannot OOM the worker; safety-cap aborts are recorded as `output_capture_failures` with `reason = "download_failed"`.
313
+ - Private Supabase Storage upload with workspace-scoped path policy.
314
+ - `output_objects` metadata insert for each successful capture.
315
+ - Per-user attribution frozen from the run row (audit/billing dimension only; not a quota).
316
+ - Signed-link BFF action/API for `output_objects` rows only; capture failures are read-only.
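The pre-download quota check can be sketched as a planning step over the file listing. The field names `id`, `filename`, `size_bytes` and the reason string come from the deliverables above; the function name and the choice to count accepted files against the cap immediately are assumptions.

```typescript
interface ListedFile {
  id: string;
  filename: string;
  size_bytes: number;
}

type CaptureDecision =
  | { kind: "download"; file: ListedFile }
  | { kind: "failure"; file: ListedFile; reason: "workspace_quota_exceeded" };

// Decide, per listed file, whether to download it or record a failure row.
function planCaptures(files: ListedFile[], workspaceUsed: number, workspaceCap: number): CaptureDecision[] {
  let used = workspaceUsed;
  const decisions: CaptureDecision[] = [];
  for (const file of files) {
    if (used + file.size_bytes > workspaceCap) {
      // Over cap: record the failure and continue with the next file.
      decisions.push({ kind: "failure", file, reason: "workspace_quota_exceeded" });
      continue;
    }
    // Assumption: an accepted file counts against the cap before it is downloaded.
    used += file.size_bytes;
    decisions.push({ kind: "download", file });
  }
  return decisions;
}
```

Under this sketch a large file in the middle of the listing does not block smaller files after it, which produces the mixed success/failure outcome the TDD gate describes.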
317
+
318
+ TDD gate:
319
+
320
+ - Add failing tests for: pre-download quota check choosing failure-row over download when over cap; safety-cap abort recording a failure row; happy path writing both an `output_objects` row and an audit entry; signed-link authorization; cross-workspace denial; listing returning zero files producing zero rows; listing returning N files with mixed sizes recording the expected mix of successes and failures.
321
+ - Tests must assert there is no user-facing knob for `capture: boolean`, `globs`, or per-file caps.
322
+
323
+ Validation:
324
+
325
+ - Output capture is unconditional; the submitter cannot opt out.
326
+ - Oversized output payloads do not OOM the worker.
327
+ - BFF only creates signed links for authorized workspace users/tokens.
328
+ - Dashboard renders successful captures and capture failures side by side.
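The worker-internal safety cap could be enforced by counting bytes as chunks arrive instead of trusting the listed `size_bytes`. This is a simplified synchronous sketch: the pull-style `ReadChunk` reader and the function name are hypothetical, and a real worker would wrap an HTTP response stream.

```typescript
// Hypothetical reader shape: returns the next chunk, or null at end of stream.
type ReadChunk = () => Uint8Array | null;

type DownloadResult =
  | { ok: true; bytes: number }
  | { ok: false; reason: "download_failed"; bytes: number };

// Forward chunks to `sink` until the stream ends or the safety cap is exceeded.
function drainWithSafetyCap(
  read: ReadChunk,
  safetyCapBytes: number,
  sink: (chunk: Uint8Array) => void
): DownloadResult {
  let total = 0;
  for (let chunk = read(); chunk !== null; chunk = read()) {
    total += chunk.byteLength;
    if (total > safetyCapBytes) {
      // Abort before forwarding the chunk that crosses the cap.
      return { ok: false, reason: "download_failed", bytes: total };
    }
    sink(chunk);
  }
  return { ok: true, bytes: total };
}
```

Since at most one chunk is held at a time, memory use stays bounded by the chunk size even when the listing under-reports a file's size.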
329
+
330
+ ## Phase 11: Cleanup, deletion, and retention
331
+
332
+ Make cleanup and deletion first-class state machines. Cleanup must also destroy the per-run Vault secret.
333
+
334
+ Deliverables:
335
+
336
+ - Cleanup ordering (per the architecture decisions doc):
337
+   - capture session-scoped files (Phase 10);
338
+   - provider session files / session where supported;
339
+   - agent / archive;
340
+   - environment / archive or delete;
341
+   - skills uploaded for this session and other ephemeral provider resources;
342
+   - provider Vault/credentials created on Claude's side for MCP wiring;
343
+   - the antpath per-run Vault entry: `vault.deleteSecret(runs.execution_secret_id)` and clear the column;
344
+   - local Supabase Storage and metadata only on user deletion.
345
+ - The Claude-side cleanup steps are skipped when `cleanup.claudeSession === "retain"` (or the worker default is `retain`); the per-run Vault deletion still runs.
346
+ - Cleanup retry/backoff with idempotent calls.
347
+ - `cleanup_attempts` records.
348
+ - Cleanup state separate from user-facing run terminal state.
349
+ - User `pending_delete` flow.
350
+ - Workspace deletion flow.
351
+ - Audit logs and rate limits for delete/cancel/signed-link/API-token mutations.
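The cleanup ordering and the `retain` override can be sketched as a small planner that returns the steps to run, in order. Step names, the `"delete"` policy label, and the assumption that file capture still runs under `retain` are illustrative; only the ordering and the rule that per-run Vault deletion always runs come from the list above.

```typescript
// "retain" appears in the plan; "delete" is an assumed label for the default policy.
type ClaudeSessionPolicy = "delete" | "retain";

interface CleanupStep {
  name: string;
  when: "always" | "claude_cleanup" | "user_deletion";
}

// Ordered as in the deliverables; step names are hypothetical.
const ORDERED_STEPS: CleanupStep[] = [
  { name: "capture_session_files", when: "always" }, // assumption: capture is not skipped by retain
  { name: "provider_session", when: "claude_cleanup" },
  { name: "agent", when: "claude_cleanup" },
  { name: "environment", when: "claude_cleanup" },
  { name: "skills_and_ephemeral_resources", when: "claude_cleanup" },
  { name: "provider_vault_credentials", when: "claude_cleanup" },
  { name: "antpath_vault_secret", when: "always" },
  { name: "local_storage_and_metadata", when: "user_deletion" },
];

// Return the names of the steps that should run for this policy.
function planCleanup(policy: ClaudeSessionPolicy, userDeletion: boolean): string[] {
  return ORDERED_STEPS.filter((step) => {
    if (step.when === "claude_cleanup") return policy !== "retain";
    if (step.when === "user_deletion") return userDeletion;
    return true;
  }).map((step) => step.name);
}
```

Expressing the plan as data makes it straightforward to record one `cleanup_attempts` row per step and to retry from the first step that has not yet succeeded.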
352
+
353
+ TDD gate:
354
+
355
+ - Add tests for cleanup after success, failure, timeout, cancellation, partial provider creation, duplicate cleanup calls, and pending-delete races.
356
+ - Add tests that the per-run Vault entry is deleted exactly once and `runs.execution_secret_id` is cleared whether or not Claude-side resources are retained.
357
+ - Add tests that a key rotated on the provider side mid-run produces a `tenant_permanent` failure on the next provider call and that cleanup still runs (and the Vault entry is still deleted).
358
+
359
+ Validation:
360
+
361
+ - Cleanup failures surface actionable redacted errors.
362
+ - Hard deletion only happens after cleanup/storage deletion succeeds.
363
+ - The per-run Vault entry is always destroyed at terminal cleanup, regardless of retention knobs.
364
+
365
+ ## Phase 12: Minimal dashboard
366
+
367
+ Build the tenant-scoped monitoring surface.
368
+
369
+ Deliverables:
370
+
371
+ - Sign-in/out.
372
+ - Workspace switcher.
373
+ - Runs list.
374
+ - Run detail page.
375
+ - Status, timestamps, attributed user, template hash, provider IDs where safe, usage, cleanup state, and redacted metadata events.
376
+ - Output list and signed-link actions.
377
+ - Cancel/delete actions.
378
+ - Quota/cap warnings.
379
+
380
+ TDD gate:
381
+
382
+ - Add component or integration tests for tenant-scoped data loading through BFF only and role/scope behavior for actions.
383
+
384
+ Validation:
385
+
386
+ - Dashboard cannot read another workspace's runs or outputs.
387
+ - Dashboard displays cleanup retry/failure separately from run success/failure.
388
+
389
+ ## Phase 13: Observability and operations
390
+
391
+ Make the platform operable for future agents.
392
+
393
+ Deliverables:
394
+
395
+ - Structured worker logs.
396
+ - Redacted error reporting.
397
+ - Run lifecycle metrics.
398
+ - Worker health endpoint.
399
+ - Queue depth/due run metrics.
400
+ - Cleanup retry/dead-letter visibility.
401
+ - Reconciliation summary logs.
402
+ - Admin-only recovery tools if needed.
403
+
404
+ TDD gate:
405
+
406
+ - Add tests proving logs/events/errors cannot serialize secret wrappers and include enough non-secret identifiers for diagnosis.
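A secret wrapper that cannot be serialized by accident is one way to make the test above pass by construction. The sketch below is an assumption about how such a wrapper might look, not the project's actual type; the class name `Secret` and the `reveal` method are hypothetical.

```typescript
// Raw values live outside the instance so they are not enumerable properties.
const secretValues = new WeakMap<object, string>();

class Secret {
  constructor(value: string) {
    secretValues.set(this, value);
  }

  // Explicit accessor for the one place where the raw value is actually needed.
  reveal(): string {
    return secretValues.get(this) as string;
  }

  // JSON.stringify calls toJSON, so structured logs see only the placeholder.
  toJSON(): string {
    return "[REDACTED]";
  }

  // String concatenation and template literals call toString.
  toString(): string {
    return "[REDACTED]";
  }
}
```

For example, `JSON.stringify({ runId: "r1", apiKey: new Secret("sk-test-123") })` yields `{"runId":"r1","apiKey":"[REDACTED]"}`, so the run ID stays available for diagnosis while the key does not appear.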
407
+
408
+ Validation:
409
+
410
+ - Worker `/health` reports readiness.
411
+ - Operational traces include run ID, workspace ID, phase, attempt ID, provider resource IDs where safe, and cleanup status.
412
+
413
+ ## Phase 14: Live-gated e2e and release readiness
414
+
415
+ Verify the complete lifecycle only when credentials are intentionally present.
416
+
417
+ Deliverables:
418
+
419
+ - Live e2e command guarded by explicit env flag.
420
+ - Low-cost Claude Managed Agents fixture Template.
421
+ - Cleanup in `finally`.
422
+ - Release/readiness docs.
423
+ - Updated README and examples for platform SDK usage.
424
+ - GitHub Actions pnpm CI and local Supabase integration job.
425
+ - Vercel dashboard and Railway worker environment-variable contracts.
426
+ - `pnpm dev:stack` for local Supabase/dashboard/worker startup with fail-fast config checks.
427
+ - Deterministic MCP/skills request-wiring tests for SDK and worker providers.
428
+ - Separately gated public MCP plus Anthropic skill full-session e2e.
429
+
430
+ TDD gate:
431
+
432
+ - Live e2e is not a TDD driver for core logic, but must prove final integration before release.
433
+ - MCP/skills request shape, permission policy, and secret non-leakage must be covered by normal deterministic tests before relying on public live e2e.
434
+
435
+ Validation:
436
+
437
+ - Full submit -> provider session -> metadata poll -> output capture -> signed link -> cleanup works.
438
+ - Public MCP/skills e2e runs through the explicit `pnpm test:e2e:live` command with live credentials, reaches success, emits SDK/provider events, records safe provider IDs, exercises the default cleanup path, and exercises the `cleanup.claudeSession = "retain"` override path with a follow-up provider API check.
439
+ - Default `pnpm lint`, `pnpm test`, and `pnpm build` pass.
440
+
441
+ ## Backlog
442
+
443
+ - Persistent workspace-level provider connections (saved Anthropic keys with rotation/revocation/re-use). The MVP submission contract carries the key inline per run.
444
+ - Provider webhooks as wakeup/reconciliation accelerator.
445
+ - SSE live event stream for richer dashboard UI.
446
+ - Supabase Realtime with explicit Auth.js-to-Supabase authorization design.
447
+ - Agent/Environment caching by Template/config hash.
448
+ - Additional provider adapters.
449
+ - Runtime human approval flow if product scope changes.
450
+ - Advanced billing and plan management.
451
+ - Cloud Template registry.
452
+ - Curated MCP adapter catalog.