npm - antpath - Versions diffs - 0.1.0 → 0.1.1 - Mend

antpath 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

package/README.md +66 -67
package/dist/credentials.js +34 -5
package/dist/credentials.js.map +1 -1
package/dist/files/downloader.js +8 -0
package/dist/files/downloader.js.map +1 -1
package/dist/index.d.ts +5 -1
package/dist/index.js +2 -0
package/dist/index.js.map +1 -1
package/dist/platform/client.d.ts +73 -0
package/dist/platform/client.js +107 -0
package/dist/platform/client.js.map +1 -0
package/dist/platform/index.d.ts +1 -0
package/dist/platform/index.js +2 -0
package/dist/platform/index.js.map +1 -0
package/dist/providers/anthropic/provider.d.ts +6 -0
package/dist/providers/anthropic/provider.js +90 -12
package/dist/providers/anthropic/provider.js.map +1 -1
package/dist/utils/paths.js +9 -3
package/dist/utils/paths.js.map +1 -1
package/docs/cleanup.md +15 -15
package/docs/credentials.md +23 -23
package/docs/mcp.md +18 -18
package/docs/outputs.md +16 -16
package/docs/quickstart.md +13 -13
package/docs/release.md +22 -22
package/docs/skills.md +16 -16
package/docs/templates.md +24 -24
package/docs/testing.md +26 -27
package/examples/mcp-static-bearer.ts +30 -30
package/examples/quickstart.ts +23 -23
package/package.json +46 -51
package/references/architecture-decisions.md +427 -203
package/references/implementation-plan.md +430 -527
package/references/research-sources.md +41 -30
package/references/testing-strategy.md +29 -108

package/references/architecture-decisions.md CHANGED

@@ -1,203 +1,427 @@
----
-title: antpath architecture decisions
-status: accepted
-scope: sdk-only MVP
----
-# antpath architecture decisions
-## Product framing
-antpath starts as a **TypeScript-first SDK** for launching autonomous agent runs on provider-managed agent infrastructure. The MVP is intentionally not a dashboard, not a cloud control plane, not an agent-loop framework, and not a sandbox provider.
-The SDK gives users a typed way to define reusable, code-first Templates and run them with caller-held provider credentials. The SDK returns a handle for monitoring, waiting, streaming events, downloading files, terminating, and cleaning up provider resources.
-## Non-goals for MVP
-- No dashboard.
-- No antpath-hosted run orchestration service.
-- No stored provider API keys.
-- No stored MCP credentials.
-- No stored output file contents.
-- No local stdio MCP bridge.
-- No arbitrary MCP auth headers.
-- No OpenAI integration in MVP.
-- No runtime human approval flow.
-- No antpath-managed sandbox.
-## Core MVP decisions
-| Area | Decision |
-| --- | --- |
-| Product surface | SDK-only MVP. |
-| Primary language | TypeScript first. |
-| Provider | Claude Managed Agents only. |
-| Future providers | OpenAI is backlog; do not force a lowest-common-denominator abstraction into MVP. |
-| Template model | Code-only, immutable, secret-free Template snapshots. |
-| Template name | Use `Template`, not `Environment`, to avoid collision with provider environment objects. |
-| Provider key handling | User creates a SDK `Client` with the provider key; antpath does not store it. |
-| MCP credential handling | Credentials are supplied at run time in typed objects; antpath creates provider vault credentials per run where needed. |
-| MCP auth scope | Provider-native static bearer and OAuth access token only. |
-| Runtime approvals | No runtime approval/HITL. Tool allow/deny policy is configured before session start. |
-| MCP server scope | User-provided remote HTTPS MCP servers only. |
-| Provider objects | SDK creates Agent, Environment, Vault/Credentials, and Session per run. |
-| Cleanup default | Manual cleanup via `handle.cleanup()` by default. |
-| Orphan risk | Accepted for MVP if the SDK process exits before cleanup. |
-| Run input | One or more queued `user.message` events. |
-| Queue behavior | First message starts the session; subsequent messages are sent whenever the session becomes idle. |
-| Completion | First idle state after all queued messages have been sent. |
-| Error behavior | Abort on any provider session error or terminated state. |
-| Budget guardrails | Timeout and manual terminate only. |
-| Outputs | `downloadOutputs()` downloads all session-scoped files by default. `/antpath/outputs` is a recommended convention, not a provider default. |
-| Skills | Support local skill upload plus inline skill definitions. |
-| Variables | Strict Template variable resolution with escaping for literal placeholders. |
-| Secret boundary | Secrets are allowed only in typed credential inputs, never interpolated as Template variables. |
-| Handle API | `status`, `streamEvents`, `wait`, `listFiles`, `downloadFile`, `downloadOutputs`, `cleanup`, `terminate`, `usage`, `result`. |
-| Development workflow | Test-first development with unit, component integration, recorded API integration, and live e2e tests. |
-## Template contract
-Templates are code-defined, immutable configuration snapshots. A Template may include:
-- model;
-- system prompt;
-- one or more user messages;
-- inline skills;
-- local skill paths to upload;
-- MCP server declarations;
-- MCP tool allow/deny policy;
-- required credential declarations;
-- environment packages and setup commands;
-- network allowlist;
-- output path conventions or file filters;
-- non-secret metadata.
-Templates may use typed variables in:
-- system prompt;
-- user messages;
-- skill content/config;
-- MCP URLs;
-- MCP tool policy;
-- environment packages/setup;
-- network allowlist;
-- output paths;
-- metadata.
-All variables must resolve before provider object creation. Unresolved variables fail fast. Literal placeholder syntax must be escaped explicitly.
-## Credential contract
-Templates declare credential requirements but never contain credential values.
-MVP credential types:
-- `static_bearer`;
-- `oauth_access_token`.
-Out of scope for MVP:
-- arbitrary headers;
-- OAuth refresh handling;
-- persisted credential vault in antpath;
-- antpath-hosted MCP proxy.
-For Claude Managed Agents, the SDK creates a per-run provider vault and credentials, passes the vault ID to session creation, and deletes the provider vault during cleanup. Since the SDK caller owns the provider key, cleanup requires the SDK process or a later caller with the same key.
-## Run lifecycle
-1. Resolve Template variables and validate the resolved Template.
-2. Validate typed credentials against Template credential requirements.
-3. Create provider Environment.
-4. Upload local skill/resources and create inline skill resources as required.
-5. Create provider Agent with tools, skills, MCP servers, and permission policies.
-6. Create per-run provider Vault and Credentials when MCP credentials are supplied.
-7. Create provider Session with Agent, Environment, resources, and vault IDs.
-8. Send the first queued `user.message`.
-9. Stream provider events and update local run state.
-10. If the session becomes idle and queued messages remain, send the next message.
-11. If any provider error or terminated state occurs, abort and return a failed result.
-12. When the session becomes idle and no queued messages remain, mark the run succeeded.
-13. On `downloadOutputs()`, list and download session-scoped files.
-14. On `cleanup()`, delete/cleanup created provider resources according to policy.
-## State model
-MVP state lives in the SDK handle and returned result. There is no cloud persistence requirement.
-Run states:
-- `initializing`;
-- `creating_provider_resources`;
-- `running`;
-- `idle_waiting_to_send_next_message`;
-- `downloading_outputs`;
-- `succeeded`;
-- `failed`;
-- `terminating`;
-- `terminated`;
-- `cleanup_pending`;
-- `cleaned_up`;
-The handle should retain enough non-secret data to support local inspection:
-- provider object IDs;
-- Template version/hash;
-- status timestamps;
-- usage/cost summary when available;
-- error details;
-- cleanup state;
-- downloaded file manifest if files were downloaded locally.
-## Cleanup policy
-Manual cleanup is the default. This preserves post-run provider access and avoids surprising users by deleting files/events before they inspect them.
-The SDK may support configurable cleanup policies:
-- manual;
-- cleanup after successful output download;
-- cleanup on success;
-- cleanup always after terminal state;
-- delete only vault/credentials.
-If the process exits before cleanup, provider resources may remain in the user's provider workspace. The SDK should expose cleanup state and enough provider IDs to support later cleanup when the user recreates a Client with the same provider key.
-## Risks accepted for MVP
-| Risk | Status | Mitigation |
-| --- | --- | --- |
-| Provider objects can be orphaned if the SDK process dies before cleanup. | Accepted | Manual cleanup default; expose cleanup state and provider IDs. |
-| Per-run Environment creation may be slow. | Accepted | Keep design simple for MVP; later cache by Template hash if needed. |
-| Timeout-only budgeting does not cap token/tool spend. | Accepted | BYOK MVP; expose usage summary where provider supports it. |
-| Arbitrary remote MCP servers can exfiltrate context. | Accepted with guardrails | Require explicit MCP declarations and tool allow/deny config; no runtime approvals. |
-| No dashboard/cloud persistence. | Accepted | SDK-only MVP; dashboard is backlog. |
-| OpenAI MCP auth differs from Claude vaults. | Backlog design risk | Add provider-specific auth adapters later; avoid pretending uniform support exists now. |
-## Verification contract
-Every feature starts with the narrowest failing test that describes the intended behavior. Implementation follows only after the test shape is clear.
-Required test layers:
-1. Unit tests for pure functions and small state transitions.
-2. Component integration tests across SDK modules using fake providers and local fixtures.
-3. Recorded API integration tests using persisted, sanitized responses captured from real provider API calls and reproducible recording scripts.
-4. Live e2e tests gated behind local credentials in `.env.local`.
-Secrets must never appear in committed fixtures, logs, snapshots, manifests, errors, or reference docs.
-## Backlog
-- Dashboard and cloud metadata sync.
-- Cloud Template registry.
-- OpenAI Responses/Agents adapter.
-- Encrypted run-scoped keys for guaranteed cleanup.
-- Cost/token/iteration caps.
-- OAuth refresh credential support.
-- Arbitrary MCP headers through an antpath MCP proxy.
-- Curated MCP adapter catalog.
-- Provider Environment caching by Template hash.
-- Artifact retention service.
-- Webhook-based monitoring.
-- Team/workspace/multi-tenant permissions.
+---
+title: antpath architecture decisions
+status: accepted
+scope: platform MVP
+---
+# antpath architecture decisions
+## Product framing
+antpath is a TypeScript-first platform plus SDK for running autonomous sessions on provider-managed agent infrastructure. Claude Managed Agents remains the execution runtime. antpath provides the durable control plane around it: submit jobs, dispatch provider sessions, track tenant-scoped metadata, observe lifecycle state, capture configured outputs, enforce quotas, and retain or clean up provider resources according to run policy.
+The platform is intentionally not a custom agent loop or sandbox runtime. The worker dispatches, observes, stores metadata, captures outputs, and applies retention/cleanup policy. It does not execute tools, approve tool calls, or participate in provider-side reasoning.
+This supersedes the earlier SDK-only MVP boundary.
+## Goals
+- Stable start-to-finish lifecycle for provider-managed agent runs.
+- Minimal tenant-scoped dashboard metadata for monitoring run execution.
+- Durable worker recovery after restarts, deploys, crashes, or missed notifications.
+- Horizontal worker scaling without duplicate run ownership.
+- Provider cleanup by default after terminal states, with optional retention when configured per-request or per-deployment.
+- BYO provider key custody with encrypted storage.
+- Private output capture with quota enforcement.
+- Programmatic SDK access for submitting and observing platform runs.
+- Test-driven implementation across SDK, dashboard, worker, database, storage, and provider adapters.
+## Non-goals for platform MVP
+- No custom agent loop.
+- No antpath-managed sandbox.
+- No runtime human approval/tool approval flow.
+- No direct browser access to provider APIs.
+- No direct browser Supabase data access.
+- No raw prompt, raw model output, raw provider event payload, MCP credential, provider key, or output file content in normal application tables.
+- No provider Agent/Environment caching in MVP.
+- No provider webhooks in MVP.
+- No Supabase Realtime in MVP.
+- No OpenAI or other provider integration in MVP.
+## Core decisions
+| Area | Decision |
+| --- | --- |
+| Product surface | Platform plus SDK. |
+| Repository | Convert the repository to an npm TypeScript workspace. |
+| SDK location | Move the existing SDK package to `packages/sdk`. |
+| SDK role | The SDK submits runs and observes status, metadata, outputs, cancellation, and deletion through the platform API. |
+| Dashboard | Build a first-party minimal dashboard. |
+| Worker | Run a Railway persistent service, designed as ephemeral and horizontally scalable. |
+| Database | Use Supabase Postgres as the source of truth. |
+| Storage | Use private Supabase Storage buckets for captured outputs. |
+| Auth | Use Auth.js for user authentication, not Supabase Auth. |
+| Tenant model | Workspace is the tenant boundary. |
+| Membership | Users access workspaces through `workspace_memberships`. |
+| Authorization boundary | Vercel BFF/server actions validate Auth.js sessions or SDK API tokens and scope every DB operation by workspace membership/scope. |
+| Browser data access | Browser code talks to the BFF/server actions, not directly to Supabase data APIs. |
+| Service credentials | Supabase service-role credentials are server/worker only. |
+| SDK auth | SDK uses hashed, workspace-scoped API tokens. Dashboard uses Auth.js sessions. |
+| API token attribution | Token-authenticated runs are attributed to the token creator at submission time unless a future service-account model is introduced. |
+| Provider key custody | Workspace BYO Anthropic key is stored encrypted through Supabase Vault. |
+| Secret lifetime | Worker resolves provider keys per claimed lifecycle step, keeps them only in memory for that step, and drops them before lease release. |
+| Provider resources | Create provider Agent, Environment, Vault/Credential, Session, and file resources per run in MVP. |
+| Provider resource caching | Backlog; later cache Agent/Environment by Template/config hash. |
+| Worker wakeup | Poll due rows from the runs table. Postgres `NOTIFY` is a latency optimization only. |
+| Worker ownership | Use DB leases, `FOR UPDATE SKIP LOCKED`, lease tokens, and expiry for horizontally scaled workers. |
+| Provider observation | Poll provider session status and list provider events since the last cursor/timestamp where available. |
+| Event dedupe | Provider event id is the dedupe authority; cursor/timestamp is advisory. |
+| Webhooks | Backlog only; if added, use as wakeup/reconciliation signals, not sole source of truth. |
+| SSE | Not the primary monitoring mechanism in MVP. |
+| MCP approvals | Disallow approval-required tools. Worker must not approve tools or return custom tool results. |
+| Template boundary | Templates are code-first, secret-free snapshots with stable hashes. |
+| Idempotency | Run submission idempotency is scoped by `(workspace_id, idempotency_key)` and request hash. Same key and same hash returns the existing run; same key and different hash returns conflict. |
+| Quotas | Enforce workspace plan quotas with per-user attribution. |
+| Run caps | Plan-based caps cover duration, concurrency, storage, polling, retries, and output capture. |
+| Operational config | Exact cap/tier values are environment-configurable defaults with conservative fallbacks and run-level snapshots. |
+| Output capture | Enabled by default when configured by the run/template and within quota. |
+| Output access | BFF returns signed links only after workspace authorization. |
+| Cleanup | Clean up Claude provider resources by default after terminal state and output capture; explicit run policy or worker default can retain them for inspection. |
+| Delete semantics | User delete is soft/pending while execution, cleanup, or storage deletion is active. Hard purge happens only after cleanup/storage deletion succeeds. |
+| Retention | Run metadata and stored outputs remain until user deletion in MVP. |
+| Realtime | Defer from MVP; use BFF-mediated refresh/polling first. Revisit with custom Supabase JWT/RLS or a server-mediated realtime bridge. |
+| Development workflow | Use test-driven development: write or update the failing test at the narrowest useful layer before implementation. |
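The idempotency row in the table above can be illustrated with a minimal in-memory sketch. It is not the package's implementation: `RunStore`, `SubmitResult`, and the `run_N` id format are hypothetical names, and the `Map` only stands in for the unique `(workspace_id, idempotency_key)` constraint that the document places in Postgres.

```typescript
type Run = { id: string; workspaceId: string; idempotencyKey: string; requestHash: string };

type SubmitResult =
  | { kind: "created"; run: Run }
  | { kind: "existing"; run: Run }
  | { kind: "conflict"; existingRunId: string };

// Hypothetical in-memory stand-in for the unique (workspace_id, idempotency_key) constraint.
class RunStore {
  private byKey = new Map<string, Run>();
  private nextId = 1;

  submit(workspaceId: string, idempotencyKey: string, requestHash: string): SubmitResult {
    // JSON-encode the pair so that distinct (workspace, key) pairs never collide.
    const key = JSON.stringify([workspaceId, idempotencyKey]);
    const existing = this.byKey.get(key);
    if (existing) {
      // Same key and same hash returns the existing run; a different hash is a conflict.
      return existing.requestHash === requestHash
        ? { kind: "existing", run: existing }
        : { kind: "conflict", existingRunId: existing.id };
    }
    const run: Run = { id: `run_${this.nextId++}`, workspaceId, idempotencyKey, requestHash };
    this.byKey.set(key, run);
    return { kind: "created", run };
  }
}
```

The same idempotency key in a different workspace creates a separate run, because the key is scoped by workspace.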
+## Workspace layout
+Target structure:
+```text
+apps/
+  dashboard/
+    app/
+    server/
+    components/
+    auth/
+    db/
+  worker/
+    src/
+      main.ts
+      polling/
+      providers/
+      lifecycle/
+      cleanup/
+      storage/
+      observability/
+packages/
+  sdk/
+    src/
+  shared/
+    src/
+      types/
+      status/
+      redaction/
+      templates/
+  db/
+    migrations/
+    schema/
+    queries/
+```
+`packages/shared` and `packages/db` may start small and grow only when invariants or schema/query helpers are shared by dashboard, worker, and SDK.
+## Core data model
+### Identity and tenancy
+- `users`
+  - Auth.js user identity mirror or Auth.js adapter user table reference.
+  - No provider secrets.
+- `workspaces`
+  - Tenant boundary.
+  - Plan, quota, status, and retention settings.
+- `workspace_memberships`
+  - Workspace, user, role, and membership status.
+  - Every dashboard/API operation must prove membership/scope.
+- `api_tokens`
+  - Workspace-scoped SDK credentials.
+  - Store hashed token material only, plus scopes, creator, last-used timestamp, and revoked timestamp.
+  - API-token submissions freeze the attributed user on the run row at creation time.
+### Provider connections
+- `provider_connections`
+  - Workspace-owned provider configuration.
+  - Provider type, display name, validation status, rotation/revocation status, and encrypted secret reference.
+  - Secret values live in Supabase Vault.
+  - Only trusted server/worker code can resolve decrypted provider keys.
+### Runs
+- `runs`
+  - User-visible logical run.
+  - Workspace, creator/attributed user, template snapshot/hash, status, lifecycle phase, plan caps, timestamps.
+  - Submission idempotency key and request hash.
+  - Unique constraint: `(workspace_id, idempotency_key)`.
+  - Lease fields: `lease_owner`, `lease_token`, `lease_expires_at`, `attempt_count`, `next_check_at`, `priority`.
+  - Cancellation/deletion fields: `cancel_requested_at`, `pending_delete_at`, `deleted_at`.
+  - Provider observation cursor/watermark where available.
+- `run_attempts`
+  - A logical run may have multiple provider attempts.
+  - Tracks provider session id, attempt state, start/end timestamps, and error classification.
+- `provider_resources`
+  - Journal of intended and created provider resources.
+  - Resource type, local idempotency key, deterministic provider name/metadata tags, provider id when known, cleanup status.
+  - Insert an intended row before provider side effects whenever possible.
+- `run_events`
+  - Metadata only.
+  - Provider event id/type, processed timestamp, redacted summary fields, usage metadata.
+  - Unique constraint on `(run_attempt_id, provider_event_id)`.
+  - No raw prompts, raw outputs, tool inputs/results, file contents, or credentials.
+### Outputs and cleanup
+- `output_objects`
+  - Workspace/run owner, storage bucket/path, size, checksum if available, content type, provider file id.
+  - Used for quota accounting and signed-link generation.
+- `cleanup_attempts`
+  - Per-resource cleanup action, status, retry count, redacted error code/message, timestamps.
+- `usage_ledger`
+  - Token/cost/storage attribution by workspace and attributed user.
+  - Written transactionally with source event/output rows to avoid drift.
+## Run lifecycle
+1. SDK/dashboard submits a run to the BFF.
+2. BFF validates Auth.js session or SDK API token.
+3. BFF resolves active workspace membership/scope.
+4. BFF validates Template/request shape, plan limits, and idempotency key/request hash.
+5. BFF writes a `runs` row with execution-affecting cap values snapshotted from plan/env defaults.
+6. BFF emits Postgres `NOTIFY` for fast wakeup.
+7. Worker claims due runs with row locking, `SKIP LOCKED`, lease token, and lease expiry.
+8. Worker resolves the provider key from Supabase Vault for the claimed step.
+9. Worker validates platform constraints: no approval-required tools, no custom antpath-executed tools, known output policy.
+10. Worker pre-journals intended provider resources.
+11. Worker creates provider resources and records provider IDs immediately after successful calls.
+12. Worker creates provider session and sends the initial user event.
+13. Worker schedules provider polling through `next_check_at`.
+14. Worker later reclaims the run and polls provider session status plus event list.
+15. Worker stores only metadata events and usage, deduped by provider event id.
+16. Before every side effect, worker checks lease token and cancellation/deletion requests.
+17. On terminal provider state or antpath timeout, worker captures configured outputs within per-file, per-run, and workspace quota caps.
+18. Worker cleans up provider resources by default, or records them as retained when the run/deployment policy asks to retain them.
+19. Worker marks the run terminal and releases the lease.
+20. Dashboard/SDK read tenant-scoped metadata and signed output links through BFF APIs.
+## Worker concurrency and recovery
+The database is the coordination primitive. Worker instances are stateless and can be added or removed without losing correctness.
+Claiming rules:
+- Query due rows ordered by priority, fairness across workspaces, and `next_check_at`.
+- Use `FOR UPDATE SKIP LOCKED`.
+- Set `lease_owner`, `lease_token`, `lease_expires_at`, and attempt counters.
+- Commit immediately.
+- Execute one bounded lifecycle step outside long transactions.
+- Persist result and either schedule next step or mark terminal.
+Safety rules:
+- Every status-mutating update includes `WHERE lease_token = $token AND lease_expires_at > now()` and verifies exactly one affected row.
+- Every side-effecting step checks `cancel_requested_at` and `pending_delete_at`.
+- Expired leases are reclaimable by any worker.
+- `next_check_at` includes jitter.
+- Claiming enforces per-workspace active-run caps and provider-key-scoped rate limits.
+- Workers handle SIGTERM by stopping new claims and relying on bounded steps, idempotency, leases, and reconciliation for in-flight work.
+- Polling is always enabled; `NOTIFY` only reduces latency.
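The claiming and safety rules above can be modelled in memory to show the intended semantics. This is a simplified sketch under stated assumptions: `RunRow`, `claimDueRun`, and `updateStatusIfLeased` are hypothetical names, and the array replaces the `runs` table. In Postgres the claim would use `FOR UPDATE SKIP LOCKED` and the guarded update would carry the `WHERE lease_token = $token AND lease_expires_at > now()` predicate. Priority, fairness, and jitter are omitted.

```typescript
type RunRow = {
  id: string;
  status: string;
  nextCheckAt: number; // epoch milliseconds
  leaseToken: string | null;
  leaseExpiresAt: number; // epoch milliseconds; 0 when never leased
};

// Claim the earliest due row whose lease is absent or expired.
function claimDueRun(rows: RunRow[], now: number, leaseMs: number, token: string): RunRow | null {
  const due = rows
    .filter((r) => r.nextCheckAt <= now && (r.leaseToken === null || r.leaseExpiresAt <= now))
    .sort((a, b) => a.nextCheckAt - b.nextCheckAt)[0];
  if (!due) return null;
  due.leaseToken = token;
  due.leaseExpiresAt = now + leaseMs;
  return due;
}

// Guarded update: succeeds only while the caller still holds an unexpired lease.
// Returns true when exactly one row was updated, false otherwise.
function updateStatusIfLeased(
  rows: RunRow[],
  id: string,
  token: string,
  now: number,
  status: string,
): boolean {
  const row = rows.find((r) => r.id === id && r.leaseToken === token && r.leaseExpiresAt > now);
  if (!row) return false;
  row.status = status;
  return true;
}
```

A worker whose lease has expired gets `false` from the guarded update and should stop acting on the run, since another worker may already have reclaimed it.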
+## Provider observation
+MVP source of truth:
+- Provider session retrieve for status.
+- Provider session events list for event metadata and usage.
+Rules:
+- Cursor/timestamp filters are used when available.
+- Provider event id is always used for dedupe.
+- Phase 5 must verify Claude Managed Agents event pagination/filter semantics.
+- If no stable cursor or since filter exists, use a bounded re-list strategy with event-id dedupe instead of unbounded full-history scans.
+Excluded from MVP:
+- SSE as primary monitoring.
+- Anthropic webhooks as primary monitoring.
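The dedupe rule above (provider event id is the authority, cursors are advisory) could look like the following sketch. `EventLog` and `ProviderEvent` are hypothetical names, and the `Set` stands in for the unique `(run_attempt_id, provider_event_id)` constraint. The sketch shows why a bounded re-list that overlaps earlier polls is safe.

```typescript
type ProviderEvent = { id: string; type: string };

// Hypothetical stand-in for the unique (run_attempt_id, provider_event_id) constraint.
class EventLog {
  private seen = new Set<string>();
  readonly stored: ProviderEvent[] = [];

  // Stores unseen events and returns how many were new; replayed events are ignored.
  ingest(runAttemptId: string, events: ProviderEvent[]): number {
    let inserted = 0;
    for (const event of events) {
      const key = JSON.stringify([runAttemptId, event.id]);
      if (this.seen.has(key)) continue;
      this.seen.add(key);
      this.stored.push(event);
      inserted++;
    }
    return inserted;
  }
}
```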
+## Auth and dashboard security
+Auth.js handles interactive user sign-in. SDK API tokens handle programmatic access.
+BFF/server actions must:
+- validate Auth.js session or API token;
+- resolve user identity and active workspace;
+- check workspace membership or token scope;
+- scope every query/mutation by workspace id;
+- return only metadata allowed by the user's role/scope;
+- keep Supabase service-role credentials out of browser bundles.
+Supabase Realtime is backlog until antpath either mints short-lived Supabase-compatible JWTs with RLS policies or exposes a server-mediated realtime bridge.
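A minimal sketch of the membership/scope check a BFF handler might run before touching the database. All names here (`Principal`, `Membership`, `resolveWorkspaceAccess`, the role values, and the `runs:write` scope string in the usage note) are assumptions for illustration; the document does not define them.

```typescript
type Membership = { workspaceId: string; userId: string; status: "active" | "suspended" };

type Principal =
  | { kind: "session"; userId: string }
  | { kind: "api_token"; workspaceId: string; createdBy: string; scopes: string[] };

// Returns the user a request should be attributed to, or null when access is denied.
function resolveWorkspaceAccess(
  principal: Principal,
  workspaceId: string,
  requiredScope: string,
  memberships: Membership[],
): { userId: string } | null {
  if (principal.kind === "session") {
    // Dashboard sessions must prove an active membership in the target workspace.
    const member = memberships.some(
      (m) => m.workspaceId === workspaceId && m.userId === principal.userId && m.status === "active",
    );
    return member ? { userId: principal.userId } : null;
  }
  // API tokens are bound to one workspace and must carry the required scope.
  const allowed =
    principal.workspaceId === workspaceId && principal.scopes.some((s) => s === requiredScope);
  return allowed ? { userId: principal.createdBy } : null;
}
```

For example, a token created by user `u1` for workspace `ws1` with scope `runs:write` resolves to `u1` for `ws1` and to `null` for any other workspace.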
+## Secret handling
+- Provider keys are workspace BYOK and stored via Supabase Vault.
+- MCP credentials follow the same no-log/no-table-secret rule.
+- Normal app tables store only secret references and validation/rotation status.
+- Worker resolves decrypted provider keys only for a claimed lifecycle step.
+- Secret values should use explicit redacted wrappers in code so they cannot serialize into logs, metrics, errors, events, or fixtures.
+- Secret redaction applies to logs, errors, run metadata, provider events, tests, fixtures, and docs.
+- If a provider key is revoked/rotated mid-run, active runs are failed or cancelled with a tenant-permanent error unless cleanup can still authenticate.
+- Revoked keys do not get a hidden grace cache in MVP. If cleanup cannot authenticate after revocation, the run/resource moves to `cleanup_failed` with an actionable tenant error.
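One possible shape for the "explicit redacted wrappers" mentioned above, sketched with a closure so the secret is never an enumerable property. `Redacted` and `redact` are hypothetical names, not identifiers from the package.

```typescript
interface Redacted {
  reveal(): string;
  toString(): string;
  toJSON(): string;
}

// Wraps a secret so that string coercion and JSON serialization yield a placeholder.
// The value lives only in the closure, so it is not an own property of the object.
function redact(value: string): Redacted {
  return {
    // The single explicit read path; call sites are easy to audit.
    reveal: () => value,
    toString: () => "[REDACTED]",
    toJSON: () => "[REDACTED]",
  };
}
```

With this sketch, `JSON.stringify({ key: redact("example-secret") })` produces `{"key":"[REDACTED]"}`, while `reveal()` still returns the original value for the provider call.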
+## Output storage and quotas
+Output flow:
+1. Worker lists provider session files.
+2. Worker checks provider file metadata, output policy, and remaining quota.
+3. Worker downloads selected files with streaming hard caps.
+4. If provider size is unknown or exceeds caps, worker aborts capture for that object and records a quota/cap warning.
+5. Worker uploads accepted files to private Supabase Storage.
+6. Worker records `output_objects` rows.
+7. Worker cleans up provider resources by default, or retains them when explicitly configured.
+8. Dashboard/SDK request signed links through the BFF.
+Quota rules:
+- Workspace plan quota is the hard enforcement boundary.
+- Per-file and per-run caps protect the worker before workspace quota is consumed.
+- Per-user attribution is frozen from the run row at submission time.
+- Free first X users is a plan/billing rule on top of workspace-level enforcement.
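The layered caps above can be sketched as a pure planning function that runs before any download. `planCapture`, `Caps`, `FileMeta`, and the reason strings are hypothetical; the ordering (unknown size, per-file cap, per-run cap, workspace quota) is one reading of the rules, not a confirmed implementation.

```typescript
type Caps = { perFileBytes: number; perRunBytes: number; workspaceRemainingBytes: number };
type FileMeta = { id: string; sizeBytes: number | null };
type Decision = { fileId: string; accept: boolean; reason?: string };

// Decides per file whether capture may proceed, consuming run and workspace budget as it goes.
function planCapture(files: FileMeta[], caps: Caps): Decision[] {
  let runUsed = 0;
  let workspaceLeft = caps.workspaceRemainingBytes;
  return files.map((file): Decision => {
    if (file.sizeBytes === null) return { fileId: file.id, accept: false, reason: "size_unknown" };
    if (file.sizeBytes > caps.perFileBytes) return { fileId: file.id, accept: false, reason: "per_file_cap" };
    if (runUsed + file.sizeBytes > caps.perRunBytes) return { fileId: file.id, accept: false, reason: "per_run_cap" };
    if (file.sizeBytes > workspaceLeft) return { fileId: file.id, accept: false, reason: "workspace_quota" };
    runUsed += file.sizeBytes;
    workspaceLeft -= file.sizeBytes;
    return { fileId: file.id, accept: true };
  });
}
```

Provider-reported sizes can be wrong, so a real worker would still enforce the streaming hard cap during download, as the output flow requires.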
+## Cleanup and reconciliation
+Claude provider-resource cleanup is mandatory by default after terminal provider state and output capture so antpath does not leave behind provider state. Retention is opt-in and policy-driven through `cleanup.claudeSession = "retain"` or `ANTPATH_WORKER_CLAUDE_SESSION_CLEANUP_DEFAULT=retain`. Retained resources are recorded in `provider_resources.cleanup_status = retained` and remain reachable via the provider API.
+Explicit cleanup order:
+1. Provider credentials/vaults.
+2. Session files/session where supported.
+3. Agent/archive.
+4. Environment/archive or delete where allowed.
+5. Uploaded provider file resources.
+6. Local output metadata/storage only when user deletes a run.
+Cleanup properties:
+- Retried independently from run execution.
+- Idempotent where provider APIs allow.
+- Redacted errors recorded in `cleanup_attempts`.
+- Runs can be terminal while cleanup remains pending or retryable-failed.
+- User deletion sets `pending_delete_at` while cleanup/storage deletion is active.
+Resource leak recovery:
+- Every provider create starts from a local intended row with deterministic provider name/metadata where provider APIs allow.
+- A reconciliation sweeper reviews unfinished intended rows, expired leases, and provider-listable resources tagged with antpath metadata.
+- If provider create succeeded but provider id was not persisted before a crash, the sweeper attempts to match by deterministic name/metadata and attach the provider id for cleanup.
+Workspace/key deletion:
+- Workspace deletion blocks new runs, requests cancellation for active runs, drains cleanup, deletes stored outputs, then purges metadata.
+- Provider key revocation blocks new runs immediately and may force existing runs to fail/cancel if cleanup can no longer authenticate.
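The explicit cleanup order for provider resources (steps 1 to 5 above) might be applied as a sort over journal rows. The resource type names and the `planCleanup` helper are illustrative assumptions; the real `provider_resources` schema may use different identifiers.

```typescript
// Illustrative type names, ordered as in the explicit cleanup order above.
const CLEANUP_ORDER = ["vault", "session", "agent", "environment", "file"] as const;
type ResourceType = (typeof CLEANUP_ORDER)[number];

type ProviderResource = {
  type: ResourceType;
  providerId: string | null; // null when the create call was journaled but no id was persisted
  cleanupStatus: "pending" | "done" | "retained";
};

// Returns pending resources with a known provider id, in cleanup order.
// Rows without a provider id are left for the reconciliation sweeper.
function planCleanup(resources: ProviderResource[]): ProviderResource[] {
  return resources
    .filter((r) => r.cleanupStatus === "pending" && r.providerId !== null)
    .sort((a, b) => CLEANUP_ORDER.indexOf(a.type) - CLEANUP_ORDER.indexOf(b.type));
}
```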
+## Run state model
+Suggested run statuses:
+- `queued`
+- `claiming`
+- `provisioning`
+- `session_created`
+- `dispatched`
+- `provider_running`
+- `provider_idle`
+- `provider_rescheduled`
+- `cancelling`
+- `capturing_outputs`
+- `cleaning_up`
+- `succeeded`
+- `failed`
+- `timed_out`
+- `cancelled`
+- `cleanup_failed`
+- `pending_delete`
+- `deleted`
+Terminal user-facing run status is separate from cleanup state so a successful provider run can still show cleanup retry warnings.
+Error classes:
+- `transient_provider`: retry with backoff and jitter.
+- `provider_permanent`: fail attempt/run and cleanup.
+- `tenant_permanent`: invalid key, quota exceeded, invalid Template, approval-required event observed.
+- `antpath_bug`: fail safely, alert, and preserve cleanup work.
+- `cancelled_by_user`: cleanup and mark cancelled.
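A small sketch of how the error classes above could drive retry decisions. Only `transient_provider` is retried, using exponential backoff with full jitter. The function names and the choice of full jitter are assumptions; the document only says "backoff and jitter".

```typescript
type ErrorClass =
  | "transient_provider"
  | "provider_permanent"
  | "tenant_permanent"
  | "antpath_bug"
  | "cancelled_by_user";

// Only transient provider errors are retried; every other class ends the attempt.
function shouldRetry(errorClass: ErrorClass): boolean {
  return errorClass === "transient_provider";
}

// Full jitter: a uniform delay in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

The computed delay would feed `next_check_at`, so that many runs failing at once do not all retry at the same instant.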
+## Deployment baseline
+- Dashboard: Vercel.
+- Worker: Railway persistent service with configurable replicas.
+- Database: Supabase Postgres.
+- Storage: Supabase Storage private bucket.
+- Secrets: Vercel/Railway environment variables plus Supabase Vault for tenant provider keys.
+Required runtime config includes:
+- database URL;
+- Supabase service credentials;
+- Supabase Storage bucket;
+- Supabase Vault access function/schema;
+- Auth.js secret/providers;
+- SDK token hashing secret/pepper;
+- worker identity;
+- provider API base/version settings.
+Environment-configurable defaults include:
+- max run duration;
+- max active runs per workspace;
+- max active runs per user/token;
+- polling base interval;
+- polling max interval;
+- polling jitter;
+- provider create/delete/poll token-bucket limits;
+- provider retry backoff;
+- lease duration;
+- lease renewal threshold;
+- max provider attempts;
+- cleanup retry count;
+- cleanup retry backoff;
+- per-file output cap;
+- per-run output cap;
+- workspace storage cap;
+- signed URL TTL;
+- free user allowance;
+- metadata retention toggles.
+Missing optional env vars must fall back to conservative low limits. Missing required secret/connectivity env vars must fail service startup.
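The fallback rule above can be sketched with two small helpers. The helper names and the variable names in the usage note are hypothetical; the document does not list the actual env var names apart from `ANTPATH_WORKER_CLAUDE_SESSION_CLEANUP_DEFAULT`.

```typescript
type Env = Record<string, string | undefined>;

// Required secret/connectivity settings: a missing value aborts startup.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required env var: ${name}`);
  }
  return value;
}

// Optional numeric limits: missing or malformed values fall back to a conservative default.
function intEnv(env: Env, name: string, conservativeDefault: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return conservativeDefault;
  const parsed = Number(raw);
  const valid = isFinite(parsed) && Math.floor(parsed) === parsed && parsed >= 0;
  return valid ? parsed : conservativeDefault;
}
```

A worker entry point might call `requireEnv(process.env, "DATABASE_URL")` and `intEnv(process.env, "ANTPATH_MAX_ACTIVE_RUNS_PER_WORKSPACE", 1)` at startup; both variable names are placeholders.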
+## Open implementation details
+These are not architecture blockers, but must be pinned during implementation planning:
+- Exact plan tier values for duration, concurrency, storage, polling, and free user allowance.
+- Exact Auth.js providers.
+- Auth.js adapter/session mode and user mirror lifecycle.
+- Supabase Vault access method and SQL privilege wrapper shape.
+- Signed URL TTL and whether storage paths include random unguessable components in addition to workspace/run ids.
+- Deletion audit requirements.
+- Provider metadata naming convention.
+- Provider list/search capabilities and reconciliation coverage for each resource type.
+- Exact Claude Managed Agents event pagination/filter semantics and bounded polling fallback.
+## Backlog
+- Provider webhooks as wakeup/reconciliation accelerator.
+- SSE live event stream for richer UI.
+- Supabase Realtime with explicit Auth.js-to-Supabase authorization design.
+- Agent/Environment caching by Template/config hash.
+- More providers.
+- Runtime human approval flow if product scope changes.
+- Advanced billing and plan management.
+- Cloud Template registry.
+- Curated MCP adapter catalog.