npm - antpath - Versions diffs - 0.3.1 → 0.4.1 - Mend

antpath 0.3.1 → 0.4.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (29)

package/README.md +13 -14
package/dist/_shared/blueprint.d.ts +263 -0
package/dist/_shared/blueprint.js +505 -0
package/dist/_shared/http.d.ts +6 -1
package/dist/_shared/http.js +10 -5
package/dist/_shared/index.d.ts +1 -0
package/dist/_shared/index.js +1 -0
package/dist/_shared/operations.d.ts +32 -9
package/dist/_shared/operations.js +73 -12
package/dist/_shared/runtime-types.d.ts +30 -0
package/dist/_shared/stable.d.ts +14 -0
package/dist/_shared/stable.js +14 -0
package/dist/_shared/submission.d.ts +55 -0
package/dist/_shared/submission.js +135 -1
package/dist/cli.mjs +114 -58
package/dist/cli.mjs.sha256 +1 -1
package/dist/client.d.ts +13 -6
package/dist/client.js +17 -16
package/dist/client.js.map +1 -1
package/docs/credentials.md +1 -3
package/docs/quickstart.md +4 -7
package/docs/release.md +57 -12
package/examples/mcp-static-bearer.ts +1 -3
package/examples/quickstart.ts +1 -3
package/package.json +2 -3
package/references/architecture-decisions.md +0 -473
package/references/implementation-plan.md +0 -452
package/references/research-sources.md +0 -41
package/references/testing-strategy.md +0 -29

package/references/implementation-plan.md DELETED

@@ -1,452 +0,0 @@
----
-title: antpath implementation plan
-status: accepted
-scope: platform MVP
----
-# antpath implementation plan
-## Goal
-Convert antpath from an SDK-only package into a pnpm TypeScript workspace containing:
-- a platform SDK in `packages/sdk`;
-- a dashboard app in `apps/dashboard`;
-- a worker service in `apps/worker`;
-- shared packages for types, schema, configuration, redaction, and database helpers as needed.
-The platform must submit, dispatch, observe, store metadata for, capture outputs from, and clean up Claude Managed Agents runs while preserving tenant isolation and secret boundaries.
-Implementation is test-driven. Each behavior starts with the narrowest failing test that proves the desired invariant.
-## Acceptance criteria
-- The repository is a pnpm workspace and the current SDK is moved to `packages/sdk`.
-- Dashboard and worker app folders exist with build/test integration.
-- Auth.js authenticates dashboard users.
-- SDK API tokens authenticate programmatic clients.
-- Workspaces are the tenant boundary.
-- BFF/server actions scope every dashboard/API operation by workspace membership or API-token scope.
-- Supabase service-role credentials are never exposed to browser/client bundles.
-- Supabase Postgres stores durable run, attempt, provider resource, event, output, output-capture-failure, cleanup, usage, workspace, and membership metadata. There is no persistent `provider_connections` table in MVP.
-- Per-run secrets bundles (Anthropic key, optional MCP credentials, optional skill references) arrive inline on submission, are encrypted through Supabase Vault for the lifetime of that single run, and are deleted at terminal cleanup. The Vault entry is the only durable trace of the user's provider key.
-- Run submission idempotency is enforced by workspace and request hash. The hash excludes the `secrets` block so re-submitting the same logical run with a new key still matches.
-- Workers claim due runs with leases and `FOR UPDATE SKIP LOCKED`.
-- Multiple workers can run concurrently without duplicate lifecycle ownership.
-- Worker polling recovers from missed `NOTIFY`, worker restart, deploy, and expired leases.
-- Worker creates per-run provider resources and journals them for cleanup.
-- Worker polls provider status and events and stores only redacted metadata.
-- Output capture is unconditional: every artifact Claude exposes on the run's session is written to private Supabase Storage, bounded only by the workspace storage cap. Files that would exceed the cap are persisted as `output_capture_failures` rows.
-- BFF returns signed output links only after workspace authorization.
-- Cleanup runs after terminal provider state and output capture, deletes the per-run Vault entry, and (by default) deletes Claude-side resources. `cleanup.claudeSession: "retain"` only affects Claude-side resources.
-- Reconciliation can recover intended provider resources after partial worker crashes.
-- User deletion is pending/soft until cleanup and storage deletion are complete.
-- Exact tier/cap values are configurable through environment variables with conservative defaults and run-level snapshots.
-- Security-sensitive BFF/API actions are audited and rate-limited.
-- Workspace storage quota and workspace deletion are enforced before provider/storage side effects proceed.
-- CI runs pnpm lint/test/build/package checks plus a local Supabase integration job.
-- Tests follow the accepted taxonomy: deterministic unit tests (including fakes and sanitized recorded snapshots), live external-system integration tests with no skip flags, and top-to-bottom live e2e tests.
-## Phase 1: Workspace foundation
-Create the workspace structure without changing runtime behavior.
-Deliverables:
-- Root pnpm workspace configuration.
-- Current SDK moved to `packages/sdk`.
-- `apps/dashboard` placeholder.
-- `apps/worker` placeholder.
-- Shared TypeScript config/build/test commands.
-- Existing SDK exports preserved or intentionally migrated with compatibility notes.
-- Root validation commands:
-  - `pnpm lint`
-  - `pnpm test`
-  - `pnpm build`
-TDD gate:
-- Add tests or CI command checks proving the moved SDK still builds/tests from the root workspace.
-Validation:
-- Clean install works from root.
-- Existing SDK tests still pass from root and package scope.
-- Build emits SDK declarations.
-## Phase 2: Shared contracts and configuration
-Define shared platform contracts before implementing dashboard/worker behavior.
-Deliverables:
-- Shared run status and cleanup status types.
-- Shared error taxonomy.
-- Shared redaction helpers and secret wrapper.
-- Shared Template/platform submission request schemas.
-- Environment config parser with required-secret validation and conservative defaults.
-- Plan/cap config defaults for:
-  - max run duration;
-  - workspace/user/token concurrency;
-  - polling intervals and jitter;
-  - provider token-bucket rates;
-  - retry backoffs;
-  - lease duration/renewal threshold;
-  - max attempts;
-  - cleanup retries;
-  - output caps;
-  - storage caps;
-  - signed-link TTL;
-  - free user allowance.
-TDD gate:
-- Unit tests for env parsing, missing required env failure, optional fallback defaults, cap snapshot values, redaction, and status transitions.
-Validation:
-- Missing required secrets/connectivity config fails service startup.
-- Missing optional config falls back to conservative low limits.
-## Phase 3: Database foundation
-Create the durable source of truth.
-Deliverables:
-- Migration framework.
-- Tables:
-  - Auth.js adapter tables: `users`, `accounts`, `sessions`, `verification_token`;
-  - antpath `app_users`;
-  - `workspaces`;
-  - `workspace_memberships`;
-  - `api_tokens`;
-  - `runs` (includes `execution_secret_id` referencing the per-run Vault entry);
-  - `run_attempts`;
-  - `provider_resources`;
-  - `run_events`;
-  - `output_objects`;
-  - `output_capture_failures`;
-  - `cleanup_attempts`;
-  - `usage_ledger`.
-- Constraints:
-  - unique `(workspace_id, idempotency_key)` on runs;
-  - request-hash conflict handling;
-  - unique `(run_attempt_id, provider_event_id)` on events;
-  - foreign keys for workspace and attributed `app_user_id` where practical;
-  - RLS enabled and `anon`/`authenticated` direct table access revoked for platform/Auth.js tables.
-- DB query helpers for tenant-scoped access.
-TDD gate:
-- Add failing database/security integration tests for migrations, idempotency, lease claim behavior, event dedupe, usage ledger idempotency, and cross-workspace access denial.
-Validation:
-- Migrations apply from a clean database.
-- Concurrent claims use `FOR UPDATE SKIP LOCKED`.
-- Event and usage ledger writes are transactional.
-## Phase 4: Auth, BFF, and SDK API-token access
-Establish the authorization boundary.
-Deliverables:
-- Auth.js dashboard authentication.
-- User mirror or Auth.js adapter integration.
-- Workspace membership resolution.
-- Workspace switch/active workspace model.
-- Hashed SDK API tokens with scopes, creator, revocation, and last-used tracking.
-- Shared authorization helper for Auth.js sessions and API tokens.
-- BFF/server-action routes for run submission/read/update operations.
-- Browser/client bundle boundary preventing service-role import.
-TDD gate:
-- Add failing tests for membership-scoped queries, cross-workspace denial, API-token scope enforcement, revoked token rejection, attributed user freezing, and no browser service-role imports.
-Validation:
-- Dashboard reads and mutates only authorized workspace data.
-- SDK token cannot access another workspace.
-- Service-role credentials are server/worker only.
-## Phase 5: Platform SDK client
-Turn the SDK into the programmatic client for the platform while preserving Template ergonomics where practical.
-Deliverables:
-- SDK client for platform API base URL and API token.
-- `AntpathClient` accepts `{ baseUrl, apiToken, workspaceId }` at construction; every submission carries its own inline `secrets` bundle (Anthropic key + optional MCP credentials + optional skill references). No client-held default bundle — secrets are explicit at every call site.
-- Submit run API (carries inline `secrets`).
-- Get run status/detail API.
-- List metadata events API.
-- List outputs API (returns successful captures and capture failures).
-- Create signed output link API.
-- Cancel run API.
-- Delete run API.
-- Typed errors for auth, quota, validation, conflict, not found, and provider/platform failures.
-- Compatibility path from existing Template definitions to platform submission requests.
-TDD gate:
-- Type/contract tests for public SDK API and runtime tests with fake platform responses.
-- Tests asserting `secrets` never appears in serialized request hashes, error metadata, or retry telemetry.
-Validation:
-- SDK does not persist provider keys: it forwards them to the platform per submission, where they are vaulted for the lifetime of one run and then deleted.
-- SDK handles idempotency conflict, unauthorized, quota, and terminal states deterministically.
-## Phase 6: Worker claim loop and state machine
-Implement durable lifecycle ownership independent of provider details.
-Deliverables:
-- Polling loop for due runs.
-- Optional Postgres `NOTIFY` listener for fast wakeup.
-- Lease claim/release helpers.
-- Lease-guarded status update helper.
-- Per-workspace rate limit hooks.
-- Fair due-run ordering across workspaces.
-- Cancellation/delete request checks before side effects.
-- Timeout handling.
-- Error classification.
-- Retry/backoff scheduling through `next_check_at`.
-TDD gate:
-- Add failing component and database tests for concurrent fake workers, expired lease reclaim, lease-token update failures, cancellation races, timeout races, and polling fallback after missed `NOTIFY`.
-Validation:
-- Multiple workers do not process the same run step.
-- Expired leases recover.
-- Worker restart leaves no required in-memory state.
-## Phase 7: Fake provider lifecycle harness
-Prove worker behavior with deterministic provider boundaries before live provider integration.
-Deliverables:
-- Fake provider implementing create resources, create session, send event, retrieve status, list events, list files, download file, and cleanup.
-- Fake storage adapter.
-- Fake Vault adapter.
-- Table-driven lifecycle tests.
-TDD gate:
-- Add component tests for happy path, provider errors, terminal states, duplicate events, output capture, cleanup failures, and retryable failures.
-Validation:
-- Core run lifecycle is correct without network calls.
-- Duplicate/replayed provider events do not double-count usage.
-## Phase 8: Claude provider adapter
-Adapt existing Claude Managed Agents provider code for the platform worker.
-Deliverables:
-- Provider client wrapper for worker.
-- Create Environment per run.
-- Upload skills/resources as needed.
-- Create Agent with model/system/MCP/skills/tool policy.
-- Create provider Vault/Credentials for MCP credentials.
-- Create Session.
-- Send initial user event.
-- Retrieve session status.
-- List session events with cursor/filter where available.
-- List/download session files.
-- Cleanup/archive/delete resources.
-- Provider metadata naming/tagging for reconciliation.
-- Provider error classification.
-TDD gate:
-- Add sanitized recorded provider snapshot unit tests for parsing and cleanup behavior before live e2e.
-- Verify exact Claude events pagination/filter semantics and document bounded fallback if needed.
-Validation:
-- No approval-required tool policy reaches the provider.
-- Provider IDs are persisted for cleanup.
-- Sanitized fixtures contain no secrets.
-## Phase 9: Provider resource journaling and reconciliation sweeper
-Close resource leak windows.
-Deliverables:
-- Pre-insert intended `provider_resources` rows before provider side effects where possible.
-- Deterministic provider names/metadata with antpath workspace/run/attempt identifiers.
-- Sweeper for expired leases and unfinished intended resources.
-- Orphan matching by provider list/search APIs where available.
-- Reschedule recoverable runs.
-- Cleanup orphaned resources.
-TDD gate:
-- Add tests that simulate worker crashes after provider create succeeds but before provider id persistence.
-Validation:
-- Sweeper can attach or cleanup recoverable resources.
-- Cleanup remains idempotent after partial failures.
-## Phase 10: Output capture and Supabase Storage
-Unconditionally capture every artifact the user's Claude session exposes into private Supabase Storage. The workspace storage cap is the only user-visible quota.
-Deliverables:
-- Worker terminal-state list of session-scoped files via the Claude Files API (`GET /v1/files?scope_id=<sessionId>`), returning `id`, `filename`, and `size_bytes`.
-- Workspace storage usage accounting helper.
-- Pre-download quota check: if `(workspace_used + size_bytes) > workspace cap`, write an `output_capture_failures` row with `reason = "workspace_quota_exceeded"` and continue without downloading.
-- Streaming download via `GET /v1/files/{file_id}/content` with a worker-internal safety cap so malformed listing responses cannot OOM the worker; safety-cap aborts are recorded as `output_capture_failures` with `reason = "download_failed"`.
-- Private Supabase Storage upload with workspace-scoped path policy.
-- `output_objects` metadata insert for each successful capture.
-- Per-user attribution frozen from the run row (audit/billing dimension only; not a quota).
-- Signed-link BFF action/API for both `output_objects` rows; capture failures are read-only.
-TDD gate:
-- Add failing tests for: pre-download quota check choosing failure-row over download when over cap; safety-cap abort recording a failure row; happy path writing both an `output_objects` row and an audit entry; signed-link authorization; cross-workspace denial; listing returning zero files producing zero rows; listing returning N files with mixed sizes recording the expected mix of successes and failures.
-- Tests must assert there is no user-facing knob for `capture: boolean`, `globs`, or per-file caps.
-Validation:
-- Output capture is unconditional and cannot be opted out by the submitter.
-- Oversized output payloads do not OOM the worker.
-- BFF only creates signed links for authorized workspace users/tokens.
-- Dashboard renders successful captures and capture failures side by side.
-## Phase 11: Cleanup, deletion, and retention
-Make cleanup and deletion first-class state machines. Cleanup must also destroy the per-run Vault secret.
-Deliverables:
-- Cleanup ordering (per the architecture decisions doc):
-  - capture session-scoped files (Phase 10);
-  - provider session files / session where supported;
-  - agent / archive;
-  - environment / archive or delete;
-  - skills uploaded for this session and other ephemeral provider resources;
-  - provider Vault/credentials created on Claude's side for MCP wiring;
-  - the antpath per-run Vault entry: `vault.deleteSecret(runs.execution_secret_id)` and clear the column;
-  - local Supabase Storage and metadata only on user deletion.
-- The Claude-side cleanup steps are skipped when `cleanup.claudeSession === "retain"` (or the worker default is `retain`); the per-run Vault deletion still runs.
-- Cleanup retry/backoff with idempotent calls.
-- `cleanup_attempts` records.
-- Cleanup state separate from user-facing run terminal state.
-- User `pending_delete` flow.
-- Workspace deletion flow.
-- Audit logs and rate limits for delete/cancel/signed-link/API-token mutations.
-TDD gate:
-- Add tests for cleanup after success, failure, timeout, cancellation, partial provider creation, duplicate cleanup calls, and pending-delete races.
-- Add tests that the per-run Vault entry is deleted exactly once and `runs.execution_secret_id` is cleared whether or not Claude-side resources are retained.
-- Add tests that a key rotated on the provider side mid-run produces a `tenant_permanent` failure on the next provider call and that cleanup still runs (and the Vault entry is still deleted).
-Validation:
-- Cleanup failures surface actionable redacted errors.
-- Hard deletion only happens after cleanup/storage deletion succeeds.
-- The per-run Vault entry is always destroyed at terminal cleanup, regardless of retention knobs.
-## Phase 12: Minimal dashboard
-Build the tenant-scoped monitoring surface.
-Deliverables:
-- Sign-in/out.
-- Workspace switcher.
-- Runs list.
-- Run detail page.
-- Status, timestamps, attributed user, template hash, provider IDs where safe, usage, cleanup state, and redacted metadata events.
-- Output list and signed-link actions.
-- Cancel/delete actions.
-- Quota/cap warnings.
-TDD gate:
-- Add component or integration tests for tenant-scoped data loading through BFF only and role/scope behavior for actions.
-Validation:
-- Dashboard cannot read another workspace's runs or outputs.
-- Dashboard displays cleanup retry/failure separately from run success/failure.
-## Phase 13: Observability and operations
-Make the platform operable for future agents.
-Deliverables:
-- Structured worker logs.
-- Redacted error reporting.
-- Run lifecycle metrics.
-- Worker health endpoint.
-- Queue depth/due run metrics.
-- Cleanup retry/dead-letter visibility.
-- Reconciliation summary logs.
-- Admin-only recovery tools if needed.
-TDD gate:
-- Add tests proving logs/events/errors cannot serialize secret wrappers and include enough non-secret identifiers for diagnosis.
-Validation:
-- Worker `/health` reports readiness.
-- Operational traces include run id, workspace id, phase, attempt id, provider resource ids where safe, and cleanup status.
-## Phase 14: Live-gated e2e and release readiness
-Verify the complete lifecycle only when credentials are intentionally present.
-Deliverables:
-- Live e2e command guarded by explicit env flag.
-- Low-cost Claude Managed Agents fixture Template.
-- Cleanup in `finally`.
-- Release/readiness docs.
-- Updated README and examples for platform SDK usage.
-- GitHub Actions pnpm CI and local Supabase integration job.
-- Vercel dashboard and Railway worker environment-variable contracts.
-- `pnpm dev:stack` for local Supabase/dashboard/worker startup with fail-fast config checks.
-- Deterministic MCP/skills request-wiring tests for SDK and worker providers.
-- Separately gated public MCP plus Anthropic skill full-session e2e.
-TDD gate:
-- Live e2e is not a TDD driver for core logic, but must prove final integration before release.
-- MCP/skills request shape, permission policy, and secret non-leakage must be covered by normal deterministic tests before relying on public live e2e.
-Validation:
-- Full submit -> provider session -> metadata poll -> output capture -> signed link -> cleanup works.
-- Public MCP/skills e2e runs through the explicit `pnpm test:e2e:live` command with live credentials, reaches success, emits SDK/provider events, records safe provider IDs, exercises the default cleanup path, and exercises the `cleanup.claudeSession = "retain"` override path with a follow-up provider API check.
-- Default `pnpm lint`, `pnpm test`, and `pnpm build` pass.
-## Backlog
-- Persistent workspace-level provider connections (saved Anthropic keys with rotation/revocation/re-use). The MVP submission contract carries the key inline per run.
-- Provider webhooks as wakeup/reconciliation accelerator.
-- SSE live event stream for richer dashboard UI.
-- Supabase Realtime with explicit Auth.js-to-Supabase authorization design.
-- Agent/Environment caching by Template/config hash.
-- Additional provider adapters.
-- Runtime human approval flow if product scope changes.
-- Advanced billing and plan management.
-- Cloud Template registry.
-- Curated MCP adapter catalog.
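
The deleted plan requires the idempotency request hash to exclude the inline `secrets` block, so that resubmitting the same logical run with a new key still matches (acceptance criteria and Phase 5 TDD gate). The minimal TypeScript sketch below shows one way to do that. `SubmissionRequest`, `canonicalJson`, and `requestHash` are hypothetical names, not antpath exports. It also assumes that key-sorted JSON plus SHA-256 is an acceptable canonical form.

```typescript
import { createHash } from "node:crypto";

// Hypothetical submission shape for illustration only; not antpath's published types.
interface SubmissionRequest {
  idempotencyKey: string;
  template: Record<string, unknown>;
  secrets?: {
    anthropicApiKey: string;
    mcpCredentials?: Record<string, string>;
    skillRefs?: string[];
  };
}

// Serialize with object keys sorted recursively so key order cannot change the hash.
// Undefined object members are dropped and undefined array items become null, as JSON.stringify does.
function canonicalJson(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const members = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${canonicalJson(v)}`);
    return `{${members.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hash everything except the secrets block, so resubmitting the same logical run
// with a rotated Anthropic key still maps to the same idempotency record.
function requestHash(request: SubmissionRequest): string {
  const hashable: SubmissionRequest = { ...request };
  delete hashable.secrets; // shallow copy, so the caller's object keeps its secrets
  return createHash("sha256").update(canonicalJson(hashable)).digest("hex");
}
```

Because the key never reaches the hash input, it cannot leak into any telemetry that records the hash either. That is the property the Phase 5 gate asks tests to assert.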

package/references/research-sources.md DELETED

@@ -1,41 +0,0 @@
----
-title: antpath research sources
-status: reference
-scope: architecture research
----
-# Research sources
-Primary sources used to frame the platform MVP architecture.
-| Topic | Source | Notes |
-| --- | --- | --- |
-| Claude Managed Agents overview | https://platform.claude.com/docs/en/managed-agents/overview.md | Agent, Environment, Session, Event primitives; provider-managed autonomous sessions. |
-| Claude Managed Agents environments | https://platform.claude.com/docs/en/managed-agents/environments.md | Environments define container/package/network configuration; sessions get isolated instances. |
-| Claude Managed Agents events | https://platform.claude.com/docs/en/managed-agents/events-and-streaming.md | Event stream, idle/running/terminated states, user messages, interruptions. |
-| Claude Managed Agents files | https://platform.claude.com/docs/en/managed-agents/files.md | Session-scoped files can be listed by `scope_id` and downloaded by file ID. |
-| Claude Managed Agents vaults | https://platform.claude.com/docs/en/managed-agents/vaults.md | Provider-side MCP credentials via vault IDs; static bearer and OAuth credentials. |
-| Claude Managed Agents MCP connector | https://platform.claude.com/docs/en/managed-agents/mcp-connector.md | Agent declares MCP URLs; session supplies vault IDs. |
-| Claude Managed Agents permission policies | https://platform.claude.com/docs/en/managed-agents/permission-policies.md | `always_allow` vs `always_ask`; MVP must avoid approval-required tools. |
-| Anthropic API and data retention | https://platform.claude.com/docs/en/manage-claude/api-and-data-retention.md | Managed Agents are stateful and not automatically deleted. |
-| Anthropic webhooks | https://docs.anthropic.com/en/api/webhooks | Webhooks are backlog; if used, treat as wakeup/reconciliation signals, not primary state. |
-| Supabase Postgres RLS | https://supabase.com/docs/guides/database/postgres/row-level-security | RLS is not the MVP browser boundary because Auth.js is primary auth and browser data access goes through BFF. |
-| Supabase Realtime | https://supabase.com/docs/guides/realtime | Realtime is backlog until Auth.js-to-Supabase authorization is explicitly solved. |
-| Supabase Storage access control | https://supabase.com/docs/guides/storage/security/access-control | Private buckets and signed/server-mediated access for captured outputs. |
-| Supabase Vault | https://supabase.com/docs/guides/database/vault | Encrypted storage for workspace BYO provider keys. |
-| Auth.js | https://authjs.dev/ | Dashboard user auth; BFF resolves identity and workspace membership. |
-| Vercel | https://vercel.com/docs | Dashboard deployment baseline. |
-| Railway | https://docs.railway.com/ | Persistent worker deployment baseline. |
-| PostgreSQL `SELECT` locking | https://www.postgresql.org/docs/current/sql-select.html | `FOR UPDATE SKIP LOCKED` for concurrent worker claims. |
-| PostgreSQL `LISTEN`/`NOTIFY` | https://www.postgresql.org/docs/current/sql-notify.html | `NOTIFY` is wakeup only; polling remains source of truth. |
-| npm workspaces | https://docs.npmjs.com/cli/using-npm/workspaces | Repository migration to a TypeScript workspace. |
-| OpenAI data controls | https://developers.openai.com/api/docs/guides/your-data | Responses retention, `store`, files, container lifecycle, ZDR behavior. |
-| OpenAI remote MCP tools | https://developers.openai.com/api/docs/guides/tools-remote-mcp | Remote MCP `authorization` is sent per request and not stored by OpenAI. |
-| MCP transports | https://modelcontextprotocol.io/specification/2025-11-25/basic/transports.md | Stdio and Streamable HTTP requirements, session IDs, Origin validation. |
-| MCP authorization | https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization.md | OAuth 2.1/OIDC discovery, protected resource metadata, scopes/resource binding. |
-| MCP tools | https://modelcontextprotocol.io/specification/2025-11-25/server/tools.md | Tool schemas, structured content, task support, HITL considerations. |
-| MCP security best practices | https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices.md | Confused deputy and token passthrough risks. |
-| MCP client best practices | https://modelcontextprotocol.io/docs/develop/clients/client-best-practices.md | Progressive discovery, dynamic server management, prompt-cache considerations. |
-| OpenAI Agents SDK JS | https://openai.github.io/openai-agents-js/ | Future comparison point for OpenAI support. |
-| Vercel AI SDK agents | https://ai-sdk.dev/docs/agents/building-agents | Future comparison point for TypeScript agent abstractions. |
-| Temporal TypeScript workflows | https://docs.temporal.io/develop/typescript/workflows/basics | Future reference if antpath adds durable cloud orchestration. |
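
The PostgreSQL `SELECT` locking row above underpins the plan's requirement that workers claim due runs with leases and `FOR UPDATE SKIP LOCKED`. The sketch below builds such a claim statement as a `{ text, values }` object, which node-postgres `client.query` accepts. It rests on several assumptions:

- The `runs` table has `lease_owner`, `lease_token`, and `lease_expires_at` columns. Only `next_check_at` is actually named in the plan.
- A `NULL` `next_check_at` means the run needs no further work.
- The server is PostgreSQL 13 or later, where `gen_random_uuid()` is built in.

Fair ordering across workspaces (Phase 6) is not modeled here.

```typescript
// Shape accepted by node-postgres `client.query(config)`; declared locally so the sketch has no dependencies.
interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

// Hypothetical schema. The CTE locks due rows and skips rows another worker already
// holds, so concurrent workers claim disjoint batches instead of blocking each other.
// Rows whose lease has expired become claimable again.
const CLAIM_DUE_RUNS_SQL = `
WITH due AS (
  SELECT id
  FROM runs
  WHERE next_check_at IS NOT NULL
    AND next_check_at <= now()
    AND (lease_expires_at IS NULL OR lease_expires_at < now())
  ORDER BY next_check_at
  LIMIT $1
  FOR UPDATE SKIP LOCKED
)
UPDATE runs AS r
SET lease_owner = $2,
    lease_token = gen_random_uuid(),
    lease_expires_at = now() + $3::int * interval '1 second'
FROM due
WHERE r.id = due.id
RETURNING r.id, r.lease_token`;

function buildClaimQuery(workerId: string, batchSize: number, leaseSeconds: number): ParameterizedQuery {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  if (!Number.isInteger(leaseSeconds) || leaseSeconds < 1) {
    throw new RangeError("leaseSeconds must be a positive integer");
  }
  // Parameter order matches $1 (batch size), $2 (worker id), $3 (lease seconds).
  return { text: CLAIM_DUE_RUNS_SQL, values: [batchSize, workerId, leaseSeconds] };
}
```

Under the same assumptions, a lease-guarded status update would also filter on the returned `lease_token`. A worker whose lease expired and was reclaimed would then update zero rows rather than overwrite the new owner's state. Polling with this query stays the source of truth; `NOTIFY` only shortens the wait.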

package/references/testing-strategy.md DELETED

@@ -1,29 +0,0 @@
----
-title: antpath testing strategy
-status: accepted
-scope: SDK package
----
-# antpath SDK testing strategy
-The SDK follows the repository taxonomy: unit, integration, and e2e only.
-## Unit
-`pnpm test` and `pnpm test:unit` run deterministic unit tests. Unit tests may use fakes, generated data, and sanitized recorded snapshots, including expensive provider responses captured once and replayed later.
-```text
-pnpm test
-pnpm test:unit
-pnpm test:property
-pnpm test:unit:recorded
-pnpm test:load:replay
-```
-## Integration
-Integration tests run live external systems with no fakes and no skip flags. SDK package integration is currently covered through the workspace Supabase integration and worker provider integration commands.
-## E2E
-`pnpm test:e2e:live` runs the live top-to-bottom Claude Managed Agents flow. The command itself is the opt-in and requires `ANTHROPIC_API_KEY`; missing credentials fail the command rather than skipping.
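
The unit tier described above allows fakes and fabricated data. As a hedged illustration, the pre-download quota check from the deleted Phase 10 plan can be written as a pure function and tested with no network or storage. The rule it applies is that a file whose size would push workspace usage past the cap gets a `workspace_quota_exceeded` failure instead of a download. `planCapture`, `ListedFile`, and `CaptureDecision` are hypothetical names, not antpath exports. The sketch also assumes files are considered in listing order and ignores the worker-internal download safety cap.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical listing entry carrying the fields the plan names for the Files API listing.
interface ListedFile {
  id: string;
  filename: string;
  size_bytes: number;
}

type CaptureDecision =
  | { kind: "download"; file: ListedFile }
  | { kind: "failure"; file: ListedFile; reason: "workspace_quota_exceeded" };

// Decide, before any download, which files fit under the workspace storage cap.
// Accepted files are added to running usage, so they count against later files.
function planCapture(files: ListedFile[], workspaceUsedBytes: number, workspaceCapBytes: number): CaptureDecision[] {
  let used = workspaceUsedBytes;
  return files.map((file): CaptureDecision => {
    if (used + file.size_bytes > workspaceCapBytes) {
      return { kind: "failure", file, reason: "workspace_quota_exceeded" };
    }
    used += file.size_bytes;
    return { kind: "download", file };
  });
}

// Deterministic unit test: fabricated listing, 500 of 1000 bytes already used.
const listing: ListedFile[] = [
  { id: "f1", filename: "report.md", size_bytes: 400 },
  { id: "f2", filename: "data.csv", size_bytes: 700 },
  { id: "f3", filename: "notes.txt", size_bytes: 100 },
];
const decisions = planCapture(listing, 500, 1_000);
assert.deepEqual(decisions.map((d) => d.kind), ["download", "failure", "download"]);
assert.deepEqual(planCapture([], 0, 1_000), []); // zero listed files produce zero rows
```

This mirrors the Phase 10 gate case of N files with mixed sizes producing a mix of successes and failures. The third file still fits because usage lands exactly on the cap rather than exceeding it.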