antpath 0.1.0 → 0.1.1

@@ -1,527 +1,430 @@
1
- ---
2
- title: antpath implementation plan
3
- status: proposed
4
- scope: sdk-only MVP
5
- ---
6
-
7
- # antpath implementation plan
8
-
9
- ## Goal
10
-
11
- Build the SDK-only MVP for antpath with test-first development: a TypeScript SDK that runs code-defined Templates on Claude Managed Agents, using caller-held provider credentials, typed credential inputs, provider-side vaults, queued user messages, event streaming, output downloads, and manual cleanup.
12
-
13
- ## Acceptance criteria
14
-
15
- - A user can define a secret-free Template in TypeScript.
16
- - A user can create a Client with their provider key.
17
- - A user can run a Template with typed variables and credentials.
18
- - The SDK creates all required provider resources per run.
19
- - The SDK sends queued user messages, one at a time, when the session is idle.
20
- - The SDK aborts on any provider error or terminated state.
21
- - The SDK resolves completion as the first idle state after all queued messages are sent.
22
- - The SDK can stream events, wait for completion, terminate, list files, download a file, download all outputs, read usage, return a final result, and cleanup.
23
- - The SDK never persists or logs provider keys or MCP credential values.
24
- - Unit tests cover Template parsing, variable resolution, credential validation, provider request mapping, run state transitions, output handling, and cleanup.
25
- - Component integration tests cover interactions between SDK modules using fake providers and local fixtures.
26
- - Recorded API integration tests replay sanitized responses captured from real Claude API calls and are reproducible with scripts.
27
- - Live e2e tests exercise the real Claude Managed Agents lifecycle and are gated behind local credentials.
28
-
29
- ## Phase 1: Repository and package foundation
30
-
31
- Create the TypeScript package foundation.
32
-
33
- Deliverables:
34
-
35
- - package manager configuration;
36
- - TypeScript configuration;
37
- - lint/format/test commands;
38
- - source/test directory layout;
39
- - public package entrypoint;
40
- - typed error base classes;
41
- - test fixtures and fake provider harness;
42
- - four test commands: unit, component integration, recorded API integration, and live e2e;
43
- - fixture recording/sanitization scripts for real API responses.
44
-
45
- Recommended structure:
46
-
47
- ```text
48
- src/
49
- index.ts
50
- client.ts
51
- template/
52
- credentials/
53
- providers/
54
- anthropic/
55
- run/
56
- files/
57
- skills/
58
- utils/
59
- test/
60
- unit/
61
- integration/
62
- components/
63
- api-recorded/
64
- live/
65
- fixtures/
66
- api-recordings/
67
- references/
68
- scripts/
69
- record-api-fixtures.ts
70
- sanitize-api-fixtures.ts
71
- ```
72
-
73
- Validation:
74
-
75
- - `npm test` or equivalent test command passes.
76
- - test commands exist for all four layers.
77
- - TypeScript build emits declarations.
78
-
79
- ## Phase 2: Public SDK types
80
-
81
- Define the public API before provider implementation.
82
-
83
- Deliverables:
84
-
85
- - `AntpathClient`;
86
- - `Template`;
87
- - `RunOptions`;
88
- - `RunHandle`;
89
- - `RunResult`;
90
- - `RunStatus`;
91
- - `RunEvent`;
92
- - `UsageSummary`;
93
- - `OutputManifest`;
94
- - `CleanupPolicy`;
95
- - typed credential unions.
96
-
97
- Initial API shape:
98
-
99
- ```ts
100
- const client = new AntpathClient({
101
- anthropicApiKey: process.env.ANTHROPIC_API_KEY
102
- });
103
-
104
- const template = defineTemplate({
105
- name: "example",
106
- model: "claude-sonnet-4-6",
107
- system: "You are a focused automation agent.",
108
- messages: ["Do the task: {{task}}"],
109
- variables: {
110
- task: string()
111
- },
112
- mcpServers: {
113
- linear: {
114
- url: "https://mcp.linear.app/mcp",
115
- auth: requiredStaticBearer(),
116
- tools: { allow: ["list_issues"] }
117
- }
118
- },
119
- outputs: {
120
- recommendedPath: "/antpath/outputs"
121
- }
122
- });
123
-
124
- const handle = await client.run(template, {
125
- variables: { task: "Summarize open issues" },
126
- credentials: {
127
- linear: { type: "static_bearer", token: process.env.LINEAR_API_KEY! }
128
- }
129
- });
130
-
131
- const result = await handle.wait();
132
- await handle.downloadOutputs("./outputs");
133
- await handle.cleanup();
134
- ```
135
-
136
- Validation:
137
-
138
- - Compile-time tests assert supported and unsupported credential shapes.
139
- - Runtime schema tests reject invalid Template and run inputs.
140
- - Public types are defined test-first with type tests before implementation.
141
-
142
- ## Phase 3: Template compiler
143
-
144
- Implement a strict compiler from user Template to resolved, provider-neutral internal configuration.
145
-
146
- Deliverables:
147
-
148
- - immutable Template snapshot/hash;
149
- - variable declaration and resolution;
150
- - escaping for literal placeholders;
151
- - strict unresolved variable failures;
152
- - secret boundary enforcement;
153
- - MCP declaration normalization;
154
- - tool allow/deny normalization;
155
- - environment package/setup normalization;
156
- - output configuration normalization.
157
-
158
- Validation:
159
-
160
- - unresolved variables fail before provider calls;
161
- - escaped placeholders remain literal;
162
- - secrets cannot be supplied through variables;
163
- - Template hash is stable for semantically identical inputs;
164
- - Template edits produce a different hash.
165
- - Recorded fixtures do not include resolved secret values.
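A minimal sketch of the strict variable resolution described in this phase. It is not taken from the package: the `\{{` escape syntax, the `resolveTemplate` name, and the error class are assumptions made for illustration.

```ts
// Hypothetical error type; the real SDK's error taxonomy is not shown in the plan.
class UnresolvedVariableError extends Error {
  readonly variable: string;
  constructor(variable: string) {
    super(`Unresolved template variable: ${variable}`);
    this.name = "UnresolvedVariableError";
    this.variable = variable;
  }
}

// Resolves {{name}} placeholders and keeps \{{ as a literal "{{" (assumed escape syntax).
function resolveTemplate(text: string, variables: Record<string, string>): string {
  const pattern = /\\\{\{|\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return text.replace(pattern, (match: string, name: string | undefined) => {
    if (match === "\\{{") return "{{"; // escaped opener stays literal
    const key = name as string;
    const value = Object.prototype.hasOwnProperty.call(variables, key)
      ? variables[key]
      : undefined;
    // Fail loudly instead of leaving a placeholder in the provider request.
    if (value === undefined) throw new UnresolvedVariableError(key);
    return value;
  });
}
```

Because the function throws on the first unknown placeholder, a caller that resolves every message before contacting the provider gets the "fail before provider calls" behavior without extra bookkeeping.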
166
-
167
- ## Phase 4: Credential validation
168
-
169
- Implement typed credential validation independent of provider calls.
170
-
171
- Deliverables:
172
-
173
- - `static_bearer` credential type;
174
- - `oauth_access_token` credential type;
175
- - Template credential requirements;
176
- - run-time credential matching by MCP server key;
177
- - redacted error messages;
178
- - no credential values in logs/errors/result objects.
179
-
180
- Validation:
181
-
182
- - missing required credentials fail before provider calls;
183
- - unsupported arbitrary headers fail with explicit out-of-scope error;
184
- - credential values are redacted in snapshots, errors, and debug output.
185
- - redaction is tested in unit, component integration, and recorded API fixture sanitization.
186
-
187
- ## Phase 5: Anthropic Managed Agents adapter
188
-
189
- Build the provider adapter as the only MVP backend.
190
-
191
- Deliverables:
192
-
193
- - provider client wrapper;
194
- - create Environment per run;
195
- - upload local skills/resources;
196
- - create inline skills where provider supports them;
197
- - create Agent with model, system prompt, MCP servers, skills, tools, permission policies;
198
- - create per-run Vault and Credentials;
199
- - create Session;
200
- - send user messages;
201
- - stream/list events;
202
- - retrieve status/session metadata;
203
- - terminate session;
204
- - list/download session-scoped files;
205
- - retrieve usage/cost where provider exposes it;
206
- - cleanup created resources.
207
-
208
- Provider invariants:
209
-
210
- - MCP tools must not reach provider with an approval-required policy.
211
- - Enabled MCP tools must be explicitly allow/deny configured.
212
- - Credentials are submitted only to provider vault APIs, never persisted locally beyond active process memory.
213
- - Created provider IDs are captured for cleanup and debugging.
214
-
215
- Validation:
216
-
217
- - fake provider tests assert exact request order and payloads;
218
- - recorded API integration tests assert adapter parsing against sanitized real provider responses;
219
- - errors from each create step trigger correct cleanup state;
220
- - cleanup is idempotent where possible;
221
- - provider IDs are retained even after partial failures.
222
-
223
- ## Phase 6: Run state machine
224
-
225
- Implement deterministic orchestration around provider events.
226
-
227
- Deliverables:
228
-
229
- - `RunController`;
230
- - event stream consumer;
231
- - queued message scheduler;
232
- - timeout controller;
233
- - termination controller;
234
- - local status transitions;
235
- - final result builder.
236
-
237
- State-machine rules:
238
-
239
- - Send first message after Session creation.
240
- - On idle with queued messages remaining, send next message.
241
- - On idle with no queued messages remaining, succeed.
242
- - On any provider error, fail and do not send further messages.
243
- - On terminated, fail unless termination was user-requested.
244
- - On timeout, terminate and mark timed out.
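The rules above can be sketched as a pure transition function. This is an illustrative reading, not the package's `RunController`: the type names, the `terminateSession` flag, and the choice to report a user-requested termination as `"terminated"` are assumptions.

```ts
type Status = "running" | "succeeded" | "failed" | "timed_out" | "terminated";
type EventType = "idle" | "error" | "terminated" | "timeout";

interface RunState {
  status: Status;
  pending: readonly string[]; // queued messages not yet sent
  userRequestedTermination: boolean;
}

interface Transition {
  state: RunState;
  send?: string; // message to send to the session now
  terminateSession?: boolean; // ask the provider to terminate
}

// Send the first message right after Session creation.
function start(messages: readonly string[]): Transition {
  const [first, ...rest] = messages;
  if (first === undefined) throw new Error("At least one message is required");
  return {
    state: { status: "running", pending: rest, userRequestedTermination: false },
    send: first,
  };
}

function step(state: RunState, event: EventType): Transition {
  // Terminal states absorb later events, so a timeout cannot turn into success.
  if (state.status !== "running") return { state };
  switch (event) {
    case "idle": {
      const [next, ...rest] = state.pending;
      if (next === undefined) return { state: { ...state, status: "succeeded" } };
      return { state: { ...state, pending: rest }, send: next };
    }
    case "error":
      return { state: { ...state, status: "failed", pending: [] } };
    case "terminated":
      return {
        state: {
          ...state,
          status: state.userRequestedTermination ? "terminated" : "failed",
          pending: [],
        },
      };
    case "timeout":
      return { state: { ...state, status: "timed_out", pending: [] }, terminateSession: true };
  }
}
```

Keeping the function free of I/O is what makes table-driven tests over event sequences straightforward; deduplicating replayed provider events would still have to happen before events reach `step`.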
245
-
246
- Validation:
247
-
248
- - table-driven tests for event sequences;
249
- - component integration tests cover event stream plus queued message scheduling;
250
- - duplicate/replayed events do not corrupt state;
251
- - queued messages are sent exactly once;
252
- - abort prevents later queued messages;
253
- - timeout cannot race into success.
254
-
255
- ## Phase 7: Run handle API
256
-
257
- Expose the MVP handle methods.
258
-
259
- Deliverables:
260
-
261
- - `status()`;
262
- - `streamEvents()`;
263
- - `wait()`;
264
- - `listFiles()`;
265
- - `downloadFile()`;
266
- - `downloadOutputs()`;
267
- - `cleanup()`;
268
- - `terminate()`;
269
- - `usage()`;
270
- - `result()`.
271
-
272
- Behavior:
273
-
274
- - `wait()` resolves once the run reaches terminal SDK state.
275
- - `streamEvents()` yields provider events plus SDK lifecycle events.
276
- - `downloadOutputs()` downloads all session-scoped files by default.
277
- - `cleanup()` is manual and can be called after success, failure, or termination.
278
- - `cleanup()` reports skipped/failed cleanup operations rather than hiding them.
279
-
280
- Validation:
281
-
282
- - methods have deterministic behavior before, during, and after terminal states;
283
- - cleanup can be retried safely;
284
- - file downloads preserve names and avoid unsafe path traversal.
285
- - live e2e test covers the happy path from Template to cleanup.
286
-
287
- ## Phase 8: Output download subsystem
288
-
289
- Implement local output handling without antpath storage.
290
-
291
- Deliverables:
292
-
293
- - session-scoped file listing;
294
- - safe local path mapping;
295
- - download all files by default;
296
- - optional filters/globs;
297
- - optional `/antpath/outputs` convention in prompts/config;
298
- - local output manifest;
299
- - checksum/size metadata when feasible.
300
-
301
- Validation:
302
-
303
- - path traversal filenames are sanitized;
304
- - duplicate filenames are handled deterministically;
305
- - partial download failures are reported clearly;
306
- - downloaded output manifest contains no secrets.
307
- - recorded API fixtures cover session-scoped file listing and download metadata.
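One way to approach the path-traversal requirement is to reduce every provider file name to a relative path with no `..` segments before joining it to the output directory. The sketch below is an assumption about how that could look, not the package's implementation; `toSafeRelativePath` and the `"unnamed"` fallback are hypothetical.

```ts
// Maps a provider-reported file name to a relative path that cannot escape
// the directory it is later joined to. Uses "/" as the output separator.
function toSafeRelativePath(remoteName: string): string {
  const segments = remoteName
    .split(/[\\/]+/) // treat both separator styles as boundaries
    .map((segment) => segment.replace(/[\u0000-\u001f:]/g, "_")) // control chars, drive colons
    .filter((segment) => segment !== "" && segment !== "." && segment !== "..");
  // Leading slashes disappear with the empty segments, so the result is always relative.
  return segments.length > 0 ? segments.join("/") : "unnamed";
}
```

For example, `toSafeRelativePath("../../etc/passwd")` yields `etc/passwd`, which stays inside whatever local directory the caller prepends. Deterministic handling of duplicate names would need a separate step that this sketch does not cover.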
308
-
309
- ## Phase 9: Skill packaging
310
-
311
- Support local uploads and inline skills.
312
-
313
- Deliverables:
314
-
315
- - local skill path validation;
316
- - package/zip local skill directory;
317
- - upload skill artifact/resource per run;
318
- - inline skill declaration support;
319
- - variable resolution in skill content/config;
320
- - skill resource cleanup tracking.
321
-
322
- Validation:
323
-
324
- - missing local paths fail before provider calls;
325
- - packaging is deterministic;
326
- - large/unsupported skill inputs fail with actionable errors;
327
- - inline and local skills can be combined.
328
-
329
- ## Phase 10: Cleanup and orphan handling
330
-
331
- Implement explicit cleanup as a first-class handle operation.
332
-
333
- Deliverables:
334
-
335
- - cleanup policy model;
336
- - manual default;
337
- - provider resource cleanup ordering;
338
- - vault/credential cleanup;
339
- - environment/agent/session/file cleanup where supported;
340
- - cleanup state reporting;
341
- - orphan recovery inputs.
342
-
343
- Cleanup order should prefer removing credentials first, then optional file/session resources, then Agent/Environment where safe.
344
-
345
- Validation:
346
-
347
- - cleanup works after success, failure, timeout, and partial provider creation failure;
348
- - cleanup failures are returned with provider IDs and redacted messages;
349
- - calling cleanup twice is safe;
350
- - orphan recovery can accept persisted provider IDs from a previous result.
351
- - live e2e cleanup verifies provider vault/session/resource cleanup behavior where provider APIs allow it.
352
-
353
- ## Phase 11: Observability and safe logging
354
-
355
- Add structured, redacted observability suitable for SDK users.
356
-
357
- Deliverables:
358
-
359
- - event hooks/callbacks;
360
- - debug logger interface;
361
- - redaction utility;
362
- - structured SDK lifecycle events;
363
- - provider request metadata without secrets;
364
- - error taxonomy.
365
-
366
- Validation:
367
-
368
- - tests prove secret values are never emitted through logs/events/errors;
369
- - debug logs include enough provider IDs and state to diagnose failures.
370
-
371
- ## Phase 12: Documentation and examples
372
-
373
- Create user-facing SDK docs and runnable examples.
374
-
375
- Deliverables:
376
-
377
- - quickstart;
378
- - Template guide;
379
- - credentials guide;
380
- - MCP guide;
381
- - skills guide;
382
- - cleanup/orphan guide;
383
- - output download guide;
384
- - examples for static bearer and OAuth access-token MCP credentials.
385
-
386
- Validation:
387
-
388
- - examples compile;
389
- - docs state MVP non-goals and accepted risks.
390
-
391
- ## Phase 13: Release readiness
392
-
393
- Prepare the SDK for first external use.
394
-
395
- Deliverables:
396
-
397
- - package metadata;
398
- - changelog;
399
- - versioning policy;
400
- - README;
401
- - API reference generation if practical;
402
- - CI workflow once repository hosting is established.
403
-
404
- Validation:
405
-
406
- - clean install works;
407
- - build/test pass from a fresh checkout;
408
- - package dry-run includes only intended files.
409
-
410
- ## Test strategy
411
-
412
- Use test-first development for every phase. Start each feature by adding or updating the narrowest failing test in the appropriate layer, then implement only enough code to pass it.
413
-
414
- Use a fake provider for most tests. Do not rely on live provider calls for the core state machine.
415
-
416
- ### Layer 1: Unit tests
417
-
418
- Purpose: verify pure logic and small state transitions with no provider, filesystem, or network dependency.
419
-
420
- Coverage:
421
-
422
- - Template compiler;
423
- - variable resolution and escaping;
424
- - credential parser;
425
- - redaction utilities;
426
- - state reducer/state-machine transitions;
427
- - safe local path mapping;
428
- - cleanup plan construction.
429
-
430
- Command:
431
-
432
- ```text
433
- npm run test:unit
434
- ```
435
-
436
- ### Layer 2: Component integration tests
437
-
438
- Purpose: verify interactions between SDK components using fake providers, fake clocks, and local fixtures.
439
-
440
- Coverage:
441
-
442
- - Client + Template compiler + credential validator;
443
- - RunController + fake provider event stream;
444
- - queued message scheduling;
445
- - output downloader + fake file service;
446
- - cleanup manager + fake provider resources;
447
- - logger/event hooks with redaction.
448
-
449
- Command:
450
-
451
- ```text
452
- npm run test:integration:components
453
- ```
454
-
455
- ### Layer 3: Recorded API integration tests
456
-
457
- Purpose: verify provider adapter behavior against sanitized responses captured from real Claude API calls, without requiring network access during normal CI.
458
-
459
- Requirements:
460
-
461
- - Real API responses are captured by explicit scripts.
462
- - Recordings are sanitized before being committed.
463
- - Secret values, request headers, API keys, bearer tokens, OAuth tokens, and raw sensitive prompts are never stored.
464
- - Fixtures are deterministic and versioned by provider API/beta header.
465
- - Tests fail if fixture sanitization leaves secret-shaped values.
466
-
467
- Commands:
468
-
469
- ```text
470
- npm run fixtures:record:anthropic
471
- npm run fixtures:sanitize
472
- npm run test:integration:api
473
- ```
474
-
475
- ### Layer 4: Live e2e tests
476
-
477
- Purpose: verify the full real Claude Managed Agents lifecycle using a local credential.
478
-
479
- Coverage:
480
-
481
- - create Environment;
482
- - create Agent;
483
- - create Vault/Credential when needed;
484
- - create Session;
485
- - send queued messages;
486
- - observe idle completion;
487
- - list/download session-scoped files;
488
- - retrieve usage/status where available;
489
- - terminate if needed;
490
- - cleanup if configured or explicitly requested.
491
-
492
- Rules:
493
-
494
- - Live tests are never part of default CI.
495
- - Live tests require `.env.local` with `ANTHROPIC_API_KEY`.
496
- - Live tests must use low-cost prompts, strict timeouts, and cleanup guards.
497
- - Live tests must not print the key or provider auth headers.
498
-
499
- Command:
500
-
501
- ```text
502
- npm run test:e2e:live
503
- ```
504
-
505
- Key invariants:
506
-
507
- - no provider call before Template and credentials are fully parsed;
508
- - no secret value appears in logs, errors, Template snapshots, result objects, or manifests;
509
- - no secret value appears in recorded API fixtures;
510
- - every created provider resource is tracked for cleanup;
511
- - message queue sends each message at most once;
512
- - cleanup is manual by default and retryable;
513
- - output download never writes outside the requested local directory.
514
-
515
- ## Backlog plan
516
-
517
- After MVP:
518
-
519
- 1. Add cloud metadata sync and dashboard.
520
- 2. Add encrypted run-scoped key support for guaranteed cleanup.
521
- 3. Add OpenAI adapter.
522
- 4. Add provider Environment caching by Template hash.
523
- 5. Add cost/token/iteration caps.
524
- 6. Add OAuth refresh credentials.
525
- 7. Add arbitrary MCP headers through an antpath MCP proxy.
526
- 8. Add Template registry and sharing.
527
- 9. Add artifact retention service.
1
+ ---
2
+ title: antpath implementation plan
3
+ status: accepted
4
+ scope: platform MVP
5
+ ---
6
+
7
+ # antpath implementation plan
8
+
9
+ ## Goal
10
+
11
+ Convert antpath from an SDK-only package into a TypeScript workspace containing:
12
+
13
+ - a platform SDK in `packages/sdk`;
14
+ - a dashboard app in `apps/dashboard`;
15
+ - a worker service in `apps/worker`;
16
+ - shared packages for types, schema, configuration, redaction, and database helpers as needed.
17
+
18
+ The platform must submit, dispatch, observe, store metadata for, capture outputs from, and clean up Claude Managed Agents runs while preserving tenant isolation and secret boundaries.
19
+
20
+ Implementation is test-driven. Each behavior starts with the narrowest failing test that proves the desired invariant.
21
+
22
+ ## Acceptance criteria
23
+
24
+ - The repository is an npm workspace and the current SDK is moved to `packages/sdk`.
25
+ - Dashboard and worker app folders exist with build/test integration.
26
+ - Auth.js authenticates dashboard users.
27
+ - SDK API tokens authenticate programmatic clients.
28
+ - Workspaces are the tenant boundary.
29
+ - BFF/server actions scope every dashboard/API operation by workspace membership or API-token scope.
30
+ - Supabase service-role credentials are never exposed to browser/client bundles.
31
+ - Supabase Postgres stores durable run, attempt, provider resource, event, output, cleanup, usage, workspace, membership, and provider connection metadata.
32
+ - Provider keys are stored encrypted through Supabase Vault and resolved only by trusted server/worker code.
33
+ - Run submission idempotency is enforced by workspace and request hash.
34
+ - Workers claim due runs with leases and `FOR UPDATE SKIP LOCKED`.
35
+ - Multiple workers can run concurrently without duplicate lifecycle ownership.
36
+ - Worker polling recovers from missed `NOTIFY`, worker restart, deploy, and expired leases.
37
+ - Worker creates per-run provider resources and journals them for cleanup.
38
+ - Worker polls provider status and events and stores only redacted metadata.
39
+ - Output capture writes configured files to private Supabase Storage within per-file, per-run, and workspace quotas.
40
+ - BFF returns signed output links only after workspace authorization.
41
+ - Cleanup runs after terminal provider state and output capture.
42
+ - Reconciliation can recover intended provider resources after partial worker crashes.
43
+ - User deletion is pending/soft until cleanup and storage deletion are complete.
44
+ - Exact tier/cap values are configurable through environment variables with conservative defaults and run-level snapshots.
45
+ - Tests follow the accepted taxonomy: deterministic unit tests (including fakes and sanitized recorded snapshots), live external-system integration tests with no skip flags, and top-to-bottom live e2e tests.
46
+
47
+ ## Phase 1: Workspace foundation
48
+
49
+ Create the workspace structure without changing runtime behavior.
50
+
51
+ Deliverables:
52
+
53
+ - Root npm workspace configuration.
54
+ - Current SDK moved to `packages/sdk`.
55
+ - `apps/dashboard` placeholder.
56
+ - `apps/worker` placeholder.
57
+ - Shared TypeScript config/build/test commands.
58
+ - Existing SDK exports preserved or intentionally migrated with compatibility notes.
59
+ - Root validation commands:
60
+ - `npm run lint`
61
+ - `npm test`
62
+ - `npm run build`
63
+
64
+ TDD gate:
65
+
66
+ - Add tests or CI command checks proving the moved SDK still builds/tests from the root workspace.
67
+
68
+ Validation:
69
+
70
+ - Clean install works from root.
71
+ - Existing SDK tests still pass from root and package scope.
72
+ - Build emits SDK declarations.
73
+
74
+ ## Phase 2: Shared contracts and configuration
75
+
76
+ Define shared platform contracts before implementing dashboard/worker behavior.
77
+
78
+ Deliverables:
79
+
80
+ - Shared run status and cleanup status types.
81
+ - Shared error taxonomy.
82
+ - Shared redaction helpers and secret wrapper.
83
+ - Shared Template/platform submission request schemas.
84
+ - Environment config parser with required-secret validation and conservative defaults.
85
+ - Plan/cap config defaults for:
86
+ - max run duration;
87
+ - workspace/user/token concurrency;
88
+ - polling intervals and jitter;
89
+ - provider token-bucket rates;
90
+ - retry backoffs;
91
+ - lease duration/renewal threshold;
92
+ - max attempts;
93
+ - cleanup retries;
94
+ - output caps;
95
+ - storage caps;
96
+ - signed-link TTL;
97
+ - free user allowance.
98
+
99
+ TDD gate:
100
+
101
+ - Unit tests for env parsing, missing required env failure, optional fallback defaults, cap snapshot values, redaction, and status transitions.
102
+
103
+ Validation:
104
+
105
+ - Missing required secrets/connectivity config fails service startup.
106
+ - Missing optional config falls back to conservative low limits.
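A minimal sketch of an environment parser with required-secret validation and conservative defaults. The variable names (`ANTPATH_DATABASE_URL` and friends), the config fields, and the default numbers are all placeholders chosen for illustration; the plan does not specify them.

```ts
interface PlatformConfig {
  databaseUrl: string;
  maxRunDurationMs: number;
  maxAttempts: number;
  signedLinkTtlSeconds: number;
}

type Env = Record<string, string | undefined>;

function parseConfig(env: Env): PlatformConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (value === undefined || value.trim() === "") {
      // Name the key only; never echo a value that might be a secret.
      throw new Error(`Missing required environment variable: ${key}`);
    }
    return value;
  };

  const positiveInt = (key: string, fallback: number): number => {
    const raw = env[key];
    if (raw === undefined || raw.trim() === "") return fallback; // conservative default
    const parsed = Number(raw);
    if (!Number.isInteger(parsed) || parsed <= 0) {
      throw new Error(`Invalid value for ${key}: expected a positive integer`);
    }
    return parsed;
  };

  return {
    databaseUrl: required("ANTPATH_DATABASE_URL"),
    maxRunDurationMs: positiveInt("ANTPATH_MAX_RUN_DURATION_MS", 5 * 60 * 1000),
    maxAttempts: positiveInt("ANTPATH_MAX_ATTEMPTS", 3),
    signedLinkTtlSeconds: positiveInt("ANTPATH_SIGNED_LINK_TTL_SECONDS", 60),
  };
}
```

Calling `parseConfig(process.env)` once at service startup and passing the result around keeps the failure at boot time. Copying the relevant fields onto each run row would give the run-level snapshot mentioned in the acceptance criteria.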
107
+
108
+ ## Phase 3: Database foundation
109
+
110
+ Create the durable source of truth.
111
+
112
+ Deliverables:
113
+
114
+ - Migration framework.
115
+ - Tables:
116
+ - `users`;
117
+ - `workspaces`;
118
+ - `workspace_memberships`;
119
+ - `api_tokens`;
120
+ - `provider_connections`;
121
+ - `runs`;
122
+ - `run_attempts`;
123
+ - `provider_resources`;
124
+ - `run_events`;
125
+ - `output_objects`;
126
+ - `cleanup_attempts`;
127
+ - `usage_ledger`.
128
+ - Constraints:
129
+ - unique `(workspace_id, idempotency_key)` on runs;
130
+ - request-hash conflict handling;
131
+ - unique `(run_attempt_id, provider_event_id)` on events;
132
+ - foreign keys for workspace and attributed user where practical.
133
+ - DB query helpers for tenant-scoped access.
134
+
135
+ TDD gate:
136
+
137
+ - Add failing database/security integration tests for migrations, idempotency, lease claim behavior, event dedupe, usage ledger idempotency, and cross-workspace access denial.
138
+
139
+ Validation:
140
+
141
+ - Migrations apply from a clean database.
142
+ - Concurrent claims use `FOR UPDATE SKIP LOCKED`.
143
+ - Event and usage ledger writes are transactional.
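The idempotency constraint and request-hash conflict handling can be illustrated with an in-memory stand-in for the `runs` table. This is a sketch under assumptions: the `RunStore` class, the result shape, and the generated run IDs are hypothetical, and in the real system the unique `(workspace_id, idempotency_key)` index would enforce the same rule inside the database.

```ts
type SubmitResult =
  | { kind: "created"; runId: string }
  | { kind: "replayed"; runId: string } // same key, same request hash
  | { kind: "conflict"; runId: string }; // same key, different request hash

class RunStore {
  private readonly byKey = new Map<string, { runId: string; requestHash: string }>();
  private nextId = 1;

  submit(workspaceId: string, idempotencyKey: string, requestHash: string): SubmitResult {
    // Composite key mirrors the unique (workspace_id, idempotency_key) constraint.
    const key = JSON.stringify([workspaceId, idempotencyKey]);
    const existing = this.byKey.get(key);
    if (existing === undefined) {
      const runId = `run_${this.nextId++}`;
      this.byKey.set(key, { runId, requestHash });
      return { kind: "created", runId };
    }
    return existing.requestHash === requestHash
      ? { kind: "replayed", runId: existing.runId }
      : { kind: "conflict", runId: existing.runId };
  }
}
```

The same key in a different workspace creates a separate run, which is the tenant-scoping behavior the cross-workspace tests are meant to pin down.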
144
+
145
+ ## Phase 4: Auth, BFF, and SDK API-token access
146
+
147
+ Establish the authorization boundary.
148
+
149
+ Deliverables:
150
+
151
+ - Auth.js dashboard authentication.
152
+ - User mirror or Auth.js adapter integration.
153
+ - Workspace membership resolution.
154
+ - Workspace switch/active workspace model.
155
+ - Hashed SDK API tokens with scopes, creator, revocation, and last-used tracking.
156
+ - Shared authorization helper for Auth.js sessions and API tokens.
157
+ - BFF/server-action routes for run submission/read/update operations.
158
+ - Browser/client bundle boundary preventing service-role import.
159
+
160
+ TDD gate:
161
+
162
+ - Add failing tests for membership-scoped queries, cross-workspace denial, API-token scope enforcement, revoked token rejection, attributed user freezing, and no browser service-role imports.
163
+
164
+ Validation:
165
+
166
+ - Dashboard reads and mutates only authorized workspace data.
167
+ - SDK token cannot access another workspace.
168
+ - Service-role credentials are server/worker only.
169
+
170
+ ## Phase 5: Platform SDK client
171
+
172
+ Turn the SDK into the programmatic client for the platform while preserving Template ergonomics where practical.
173
+
174
+ Deliverables:
175
+
176
+ - SDK client for platform API base URL and API token.
177
+ - Submit run API.
178
+ - Get run status/detail API.
179
+ - List metadata events API.
180
+ - List outputs API.
181
+ - Create signed output link API.
182
+ - Cancel run API.
183
+ - Delete run API.
184
+ - Typed errors for auth, quota, validation, conflict, not found, and provider/platform failures.
185
+ - Compatibility path from existing Template definitions to platform submission requests.
186
+
187
+ TDD gate:
188
+
189
+ - Type/contract tests for public SDK API and runtime tests with fake platform responses.
190
+
191
+ Validation:
192
+
193
+ - SDK never accepts or stores provider keys for platform runs.
194
+ - SDK handles idempotency conflict, unauthorized, quota, and terminal states deterministically.
195
+
196
+ ## Phase 6: Worker claim loop and state machine
197
+
198
+ Implement durable lifecycle ownership independent of provider details.
199
+
200
+ Deliverables:
201
+
202
+ - Polling loop for due runs.
203
+ - Optional Postgres `NOTIFY` listener for fast wakeup.
204
+ - Lease claim/release helpers.
205
+ - Lease-guarded status update helper.
206
+ - Per-workspace/provider-key rate limit hooks.
207
+ - Fair due-run ordering across workspaces.
208
+ - Cancellation/delete request checks before side effects.
209
+ - Timeout handling.
210
+ - Error classification.
211
+ - Retry/backoff scheduling through `next_check_at`.
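Retry scheduling through `next_check_at` could be computed with capped exponential backoff plus jitter, as in the sketch below. The function name, the option names, and the "equal jitter" formula are assumptions; the plan only states that backoffs and jitter are configurable.

```ts
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number; // upper bound on any delay
  random?: () => number; // injectable for deterministic tests, defaults to Math.random
}

// Returns the timestamp (ms since epoch) to store in next_check_at.
function nextCheckAt(nowMs: number, attempt: number, options: BackoffOptions): number {
  const exponent = Math.max(0, attempt - 1);
  const ceiling = Math.min(options.maxMs, options.baseMs * Math.pow(2, exponent));
  const random = options.random ?? Math.random;
  // Equal jitter: half the ceiling is fixed, the other half is randomized,
  // so workers retrying the same failure do not wake up in lockstep.
  const delay = Math.floor(ceiling / 2 + random() * (ceiling / 2));
  return nowMs + delay;
}
```

Because the result is persisted rather than held in a timer, a restarted worker picks the run up again through the normal due-run poll.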
212
+
213
+ TDD gate:
214
+
215
+ - Add failing component and database tests for concurrent fake workers, expired lease reclaim, lease-token update failures, cancellation races, timeout races, and polling fallback after missed `NOTIFY`.
216
+
217
+ Validation:
218
+
219
+ - Multiple workers do not process the same run step.
220
+ - Expired leases recover.
221
+ - Worker restart leaves no required in-memory state.
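The lease rules in this phase can be sketched in two parts: an illustrative claim query and an in-memory model of the same semantics for component tests. `next_check_at` comes from the plan; the `lease_token` and `lease_expires_at` columns, the function names, and the `LeasedRun` shape are assumptions.

```ts
// Hypothetical PostgreSQL claim query. SKIP LOCKED makes concurrent workers
// pass over rows another transaction has already locked instead of waiting.
const CLAIM_DUE_RUN_SQL = `
  UPDATE runs
     SET lease_token = $1,
         lease_expires_at = now() + $2::interval
   WHERE id = (
     SELECT id FROM runs
      WHERE next_check_at <= now()
        AND (lease_expires_at IS NULL OR lease_expires_at < now())
      ORDER BY next_check_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id`;

interface LeasedRun {
  id: string;
  status: string;
  leaseToken: string | null;
  leaseExpiresAt: number; // ms since epoch
}

// Claim succeeds only when no live lease exists; expired leases are reclaimable.
function claim(run: LeasedRun, token: string, nowMs: number, leaseMs: number): boolean {
  if (run.leaseToken !== null && run.leaseExpiresAt > nowMs) return false;
  run.leaseToken = token;
  run.leaseExpiresAt = nowMs + leaseMs;
  return true;
}

// Lease-guarded update: a worker whose lease was lost or expired cannot write.
function guardedUpdate(run: LeasedRun, token: string, nowMs: number, status: string): boolean {
  if (run.leaseToken !== token || run.leaseExpiresAt <= nowMs) return false;
  run.status = status;
  return true;
}
```

The guard on every status write is what keeps a slow worker from overwriting progress made by the worker that reclaimed its expired lease.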
222
+
223
+ ## Phase 7: Fake provider lifecycle harness
224
+
225
+ Prove worker behavior with deterministic provider boundaries before live provider integration.
226
+
227
+ Deliverables:
228
+
229
+ - Fake provider implementing create resources, create session, send event, retrieve status, list events, list files, download file, and cleanup.
230
+ - Fake storage adapter.
231
+ - Fake Vault adapter.
232
+ - Table-driven lifecycle tests.
233
+
234
+ TDD gate:
235
+
236
+ - Add component tests for happy path, provider errors, terminal states, duplicate events, output capture, cleanup failures, and retryable failures.
237
+
238
+ Validation:
239
+
240
+ - Core run lifecycle is correct without network calls.
241
+ - Duplicate/replayed provider events do not double-count usage.
242
+
243
+ ## Phase 8: Claude provider adapter
244
+
245
+ Adapt existing Claude Managed Agents provider code for the platform worker.
246
+
247
+ Deliverables:
248
+
249
+ - Provider client wrapper for worker.
250
+ - Create Environment per run.
251
+ - Upload skills/resources as needed.
252
+ - Create Agent with model/system/MCP/skills/tool policy.
253
+ - Create provider Vault/Credentials for MCP credentials.
254
+ - Create Session.
255
+ - Send initial user event.
256
+ - Retrieve session status.
257
+ - List session events with cursor/filter where available.
258
+ - List/download session files.
259
+ - Cleanup/archive/delete resources.
260
+ - Provider metadata naming/tagging for reconciliation.
261
+ - Provider error classification.
262
+
263
+ TDD gate:
264
+
265
+ - Add sanitized recorded provider snapshot unit tests for parsing and cleanup behavior before live e2e.
266
+ - Verify exact Claude events pagination/filter semantics and document bounded fallback if needed.
267
+
268
+ Validation:
269
+
270
+ - No approval-required tool policy reaches the provider.
271
+ - Provider IDs are persisted for cleanup.
272
+ - Sanitized fixtures contain no secrets.
273
+
274
+ ## Phase 9: Provider resource journaling and reconciliation sweeper
275
+
276
+ Close resource leak windows.
277
+
278
+ Deliverables:
279
+
280
+ - Pre-insert intended `provider_resources` rows before provider side effects where possible.
281
+ - Deterministic provider names/metadata with antpath workspace/run/attempt identifiers.
282
+ - Sweeper for expired leases and unfinished intended resources.
283
+ - Orphan matching by provider list/search APIs where available.
284
+ - Reschedule recoverable runs.
285
+ - Cleanup orphaned resources.

286
+
287
+ TDD gate:
288
+
289
+ - Add tests that simulate worker crashes after provider create succeeds but before provider id persistence.
290
+
291
+ Validation:
292
+
293
+ - Sweeper can attach or clean up recoverable resources.
294
+ - Cleanup remains idempotent after partial failures.
295
+
296
+ ## Phase 10: Output capture and Supabase Storage
297
+
298
+ Capture configured outputs safely.
299
+
300
+ Deliverables:
301
+
302
+ - Output policy from Template/run request.
303
+ - Provider file metadata inspection.
304
+ - Per-file and per-run caps.
305
+ - Streaming download with hard byte cap and abort.
306
+ - Private Supabase Storage upload.
307
+ - Deterministic plus unguessable storage path policy if required.
308
+ - `output_objects` metadata.
309
+ - Workspace storage quota accounting.
310
+ - Per-user attribution from run row.
311
+ - Signed-link BFF action/API.
312
+
313
+ TDD gate:
314
+
315
+ - Add failing tests for quota checks before download, unknown-size streaming caps, storage metadata, signed-link authorization, and cross-workspace denial.
316
+
317
+ Validation:
318
+
319
+ - Oversized outputs do not OOM the worker.
320
+ - BFF only creates signed links for authorized workspace users/tokens.
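A hard byte cap on a streaming download can be enforced by counting bytes per chunk and throwing as soon as the limit is crossed, so an unknown-size file is never buffered whole. The sketch below is illustrative only; `ByteCap` and `capStream` are hypothetical names, and it assumes the provider download is exposed as an `AsyncIterable<Uint8Array>`.

```ts
// Tracks bytes received and rejects the chunk that crosses the limit.
class ByteCap {
  private received = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  accept(chunk: Uint8Array): void {
    this.received += chunk.byteLength;
    if (this.received > this.limit) {
      throw new Error(`Output exceeded hard cap of ${this.limit} bytes`);
    }
  }

  get total(): number {
    return this.received;
  }
}

// Wraps a chunk stream; chunks are forwarded one at a time, never accumulated.
async function* capStream(
  source: AsyncIterable<Uint8Array>,
  limitBytes: number,
): AsyncGenerator<Uint8Array> {
  const cap = new ByteCap(limitBytes);
  for await (const chunk of source) {
    // Throwing here exits the loop, which closes the source iterator so the
    // underlying download can be aborted by its own cleanup logic.
    cap.accept(chunk);
    yield chunk;
  }
}
```

Checking declared sizes against quotas before starting the download remains useful as a cheap first filter; the cap covers files whose size is missing or wrong.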
321
+
322
+ ## Phase 11: Cleanup, deletion, and retention
323
+
324
+ Make cleanup and deletion first-class state machines.
325
+
326
+ Deliverables:
327
+
328
+ - Cleanup ordering:
329
+ - provider credentials/vaults;
330
+ - session files/session where supported;
331
+ - agent/archive;
332
+ - environment/archive/delete;
333
+ - uploaded provider file resources;
334
+ - local storage only on user deletion.
335
+ - Cleanup retry/backoff.
336
+ - `cleanup_attempts` records.
337
+ - Cleanup state separate from user-facing run terminal state.
338
+ - User `pending_delete` flow.
339
+ - Workspace deletion flow.
340
+ - Provider key revocation behavior.
341
+
342
+ TDD gate:
343
+
344
+ - Add tests for cleanup after success, failure, timeout, cancellation, partial provider creation, duplicate cleanup calls, key revocation, and pending-delete races.
345
+
346
+ Validation:
347
+
348
+ - Cleanup failures surface actionable redacted errors.
349
+ - Hard deletion only happens after cleanup/storage deletion succeeds.
350
+
351
+ ## Phase 12: Minimal dashboard
352
+
353
+ Build the tenant-scoped monitoring surface.
354
+
355
+ Deliverables:
356
+
357
+ - Sign-in/out.
358
+ - Workspace switcher.
359
+ - Runs list.
360
+ - Run detail page.
361
+ - Status, timestamps, attributed user, template hash, provider IDs where safe, usage, cleanup state, and redacted metadata events.
362
+ - Output list and signed-link actions.
363
+ - Cancel/delete actions.
364
+ - Quota/cap warnings.
365
+
366
+ TDD gate:
367
+
368
+ - Add component or integration tests for tenant-scoped data loading through BFF only and role/scope behavior for actions.
369
+
370
+ Validation:
371
+
372
+ - Dashboard cannot read another workspace's runs or outputs.
373
+ - Dashboard displays cleanup retry/failure separately from run success/failure.
374
+
375
+ ## Phase 13: Observability and operations
376
+
377
+ Make the platform operable for future agents.
378
+
379
+ Deliverables:
380
+
381
+ - Structured worker logs.
382
+ - Redacted error reporting.
383
+ - Run lifecycle metrics.
384
+ - Worker health endpoint.
385
+ - Queue depth/due run metrics.
386
+ - Cleanup retry/dead-letter visibility.
387
+ - Reconciliation summary logs.
388
+ - Admin-only recovery tools if needed.
389
+
390
+ TDD gate:
391
+
392
+ - Add tests proving logs/events/errors cannot serialize secret wrappers and include enough non-secret identifiers for diagnosis.
393
+
394
+ Validation:
395
+
396
+ - Worker `/health` reports readiness.
397
+ - Operational traces include run id, workspace id, phase, attempt id, provider resource ids where safe, and cleanup status.
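A secret wrapper that cannot be serialized by accident might look like the sketch below. It is an assumption about the shape of such a helper, not the package's code: the `Secret` class, the `reveal` method, and the `[REDACTED]` marker are hypothetical.

```ts
// Values live outside the instance, so object inspection and JSON
// serialization of a Secret never see the underlying string.
const secretValues = new WeakMap<object, string>();

class Secret {
  constructor(value: string) {
    secretValues.set(this, value);
  }

  // Explicit, greppable access for trusted server/worker code paths only.
  reveal(): string {
    return secretValues.get(this) as string;
  }

  // Used by string interpolation and concatenation.
  toString(): string {
    return "[REDACTED]";
  }

  // Used by JSON.stringify, which covers most structured loggers.
  toJSON(): string {
    return "[REDACTED]";
  }
}
```

A test in the spirit of the TDD gate would serialize an object containing a `Secret` and assert that the raw value does not appear anywhere in the output.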
398
+
399
+ ## Phase 14: Live-gated e2e and release readiness
400
+
401
+ Verify the complete lifecycle only when credentials are intentionally present.
402
+
403
+ Deliverables:
404
+
405
+ - Live e2e command guarded by explicit env flag.
406
+ - Low-cost Claude Managed Agents fixture Template.
407
+ - Cleanup in `finally`.
408
+ - Release/readiness docs.
409
+ - Updated README and examples for platform SDK usage.
410
+
411
+ TDD gate:
412
+
413
+ - Live e2e is not a TDD driver for core logic, but must prove final integration before release.
414
+
415
+ Validation:
416
+
417
+ - Full submit -> provider session -> metadata poll -> output capture -> signed link -> cleanup works.
418
+ - Default `npm run lint`, `npm test`, and `npm run build` pass.
419
+
420
+ ## Backlog
421
+
422
+ - Provider webhooks as wakeup/reconciliation accelerator.
423
+ - SSE live event stream for richer dashboard UI.
424
+ - Supabase Realtime with explicit Auth.js-to-Supabase authorization design.
425
+ - Agent/Environment caching by Template/config hash.
426
+ - Additional provider adapters.
427
+ - Runtime human approval flow if product scope changes.
428
+ - Advanced billing and plan management.
429
+ - Cloud Template registry.
430
+ - Curated MCP adapter catalog.