antpath 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +67 -0
- package/dist/client.d.ts +14 -0
- package/dist/client.js +36 -0
- package/dist/client.js.map +1 -0
- package/dist/credentials.d.ts +3 -0
- package/dist/credentials.js +27 -0
- package/dist/credentials.js.map +1 -0
- package/dist/errors.d.ts +25 -0
- package/dist/errors.js +39 -0
- package/dist/errors.js.map +1 -0
- package/dist/files/downloader.d.ts +3 -0
- package/dist/files/downloader.js +35 -0
- package/dist/files/downloader.js.map +1 -0
- package/dist/index.d.ts +5 -0
- package/dist/index.js +5 -0
- package/dist/index.js.map +1 -0
- package/dist/providers/anthropic/provider.d.ts +30 -0
- package/dist/providers/anthropic/provider.js +302 -0
- package/dist/providers/anthropic/provider.js.map +1 -0
- package/dist/providers/types.d.ts +42 -0
- package/dist/providers/types.js +2 -0
- package/dist/providers/types.js.map +1 -0
- package/dist/run/controller.d.ts +27 -0
- package/dist/run/controller.js +224 -0
- package/dist/run/controller.js.map +1 -0
- package/dist/skills/packager.d.ts +11 -0
- package/dist/skills/packager.js +76 -0
- package/dist/skills/packager.js.map +1 -0
- package/dist/template/compiler.d.ts +28 -0
- package/dist/template/compiler.js +116 -0
- package/dist/template/compiler.js.map +1 -0
- package/dist/template/index.d.ts +10 -0
- package/dist/template/index.js +14 -0
- package/dist/template/index.js.map +1 -0
- package/dist/template/types.d.ts +67 -0
- package/dist/template/types.js +2 -0
- package/dist/template/types.js.map +1 -0
- package/dist/types.d.ts +129 -0
- package/dist/types.js +2 -0
- package/dist/types.js.map +1 -0
- package/dist/utils/events.d.ts +6 -0
- package/dist/utils/events.js +41 -0
- package/dist/utils/events.js.map +1 -0
- package/dist/utils/paths.d.ts +3 -0
- package/dist/utils/paths.js +21 -0
- package/dist/utils/paths.js.map +1 -0
- package/dist/utils/secrets.d.ts +10 -0
- package/dist/utils/secrets.js +59 -0
- package/dist/utils/secrets.js.map +1 -0
- package/dist/utils/stable.d.ts +2 -0
- package/dist/utils/stable.js +20 -0
- package/dist/utils/stable.js.map +1 -0
- package/docs/cleanup.md +15 -0
- package/docs/credentials.md +23 -0
- package/docs/mcp.md +18 -0
- package/docs/outputs.md +16 -0
- package/docs/quickstart.md +13 -0
- package/docs/release.md +22 -0
- package/docs/skills.md +16 -0
- package/docs/templates.md +24 -0
- package/docs/testing.md +27 -0
- package/examples/mcp-static-bearer.ts +30 -0
- package/examples/quickstart.ts +23 -0
- package/package.json +51 -0
- package/references/architecture-decisions.md +203 -0
- package/references/implementation-plan.md +527 -0
- package/references/research-sources.md +30 -0
- package/references/testing-strategy.md +108 -0
|
package/references/implementation-plan.md
@@ -0,0 +1,527 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: antpath implementation plan
|
|
3
|
+
status: proposed
|
|
4
|
+
scope: sdk-only MVP
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# antpath implementation plan
|
|
8
|
+
|
|
9
|
+
## Goal
|
|
10
|
+
|
|
11
|
+
Build the SDK-only MVP for antpath with test-first development: a TypeScript SDK that runs code-defined Templates on Claude Managed Agents, using caller-held provider credentials, typed credential inputs, provider-side vaults, queued user messages, event streaming, output downloads, and manual cleanup.
|
|
12
|
+
|
|
13
|
+
## Acceptance criteria
|
|
14
|
+
|
|
15
|
+
- A user can define a secret-free Template in TypeScript.
|
|
16
|
+
- A user can create a Client with their provider key.
|
|
17
|
+
- A user can run a Template with typed variables and credentials.
|
|
18
|
+
- The SDK creates all required provider resources per run.
|
|
19
|
+
- The SDK sends queued user messages, one at a time, when the session is idle.
|
|
20
|
+
- The SDK aborts on any provider error or terminated state.
|
|
21
|
+
- The SDK resolves completion as the first idle state after all queued messages are sent.
|
|
22
|
+
- The SDK can stream events, wait for completion, terminate, list files, download a file, download all outputs, read usage, return a final result, and cleanup.
|
|
23
|
+
- The SDK never persists or logs provider keys or MCP credential values.
|
|
24
|
+
- Unit tests cover Template parsing, variable resolution, credential validation, provider request mapping, run state transitions, output handling, and cleanup.
|
|
25
|
+
- Component integration tests cover interactions between SDK modules using fake providers and local fixtures.
|
|
26
|
+
- Recorded API integration tests replay sanitized responses captured from real Claude API calls and are reproducible with scripts.
|
|
27
|
+
- Live e2e tests exercise the real Claude Managed Agents lifecycle and are gated behind local credentials.
|
|
28
|
+
|
|
29
|
+
## Phase 1: Repository and package foundation
|
|
30
|
+
|
|
31
|
+
Create the TypeScript package foundation.
|
|
32
|
+
|
|
33
|
+
Deliverables:
|
|
34
|
+
|
|
35
|
+
- package manager configuration;
|
|
36
|
+
- TypeScript configuration;
|
|
37
|
+
- lint/format/test commands;
|
|
38
|
+
- source/test directory layout;
|
|
39
|
+
- public package entrypoint;
|
|
40
|
+
- typed error base classes;
|
|
41
|
+
- test fixtures and fake provider harness;
|
|
42
|
+
- four test commands: unit, component integration, recorded API integration, and live e2e;
|
|
43
|
+
- fixture recording/sanitization scripts for real API responses.
|
|
44
|
+
|
|
45
|
+
Recommended structure:
|
|
46
|
+
|
|
47
|
+
```text
src/
  index.ts
  client.ts
  template/
  credentials/
  providers/
    anthropic/
  run/
  files/
  skills/
  utils/
test/
  unit/
  integration/
    components/
    api-recorded/
  live/
  fixtures/
    api-recordings/
references/
scripts/
  record-api-fixtures.ts
  sanitize-api-fixtures.ts
```
|
|
72
|
+
|
|
73
|
+
Validation:
|
|
74
|
+
|
|
75
|
+
- `npm test` or equivalent test command passes.
|
|
76
|
+
- test commands exist for all four layers.
|
|
77
|
+
- TypeScript build emits declarations.
|
|
78
|
+
|
|
79
|
+
## Phase 2: Public SDK types
|
|
80
|
+
|
|
81
|
+
Define the public API before provider implementation.
|
|
82
|
+
|
|
83
|
+
Deliverables:
|
|
84
|
+
|
|
85
|
+
- `AntpathClient`;
|
|
86
|
+
- `Template`;
|
|
87
|
+
- `RunOptions`;
|
|
88
|
+
- `RunHandle`;
|
|
89
|
+
- `RunResult`;
|
|
90
|
+
- `RunStatus`;
|
|
91
|
+
- `RunEvent`;
|
|
92
|
+
- `UsageSummary`;
|
|
93
|
+
- `OutputManifest`;
|
|
94
|
+
- `CleanupPolicy`;
|
|
95
|
+
- typed credential unions.
|
|
96
|
+
|
|
97
|
+
Initial API shape:
|
|
98
|
+
|
|
99
|
+
```ts
const client = new AntpathClient({
  anthropicApiKey: process.env.ANTHROPIC_API_KEY
});

const template = defineTemplate({
  name: "example",
  model: "claude-sonnet-4-6",
  system: "You are a focused automation agent.",
  messages: ["Do the task: {{task}}"],
  variables: {
    task: string()
  },
  mcpServers: {
    linear: {
      url: "https://mcp.linear.app/mcp",
      auth: requiredStaticBearer(),
      tools: { allow: ["list_issues"] }
    }
  },
  outputs: {
    recommendedPath: "/antpath/outputs"
  }
});

const handle = await client.run(template, {
  variables: { task: "Summarize open issues" },
  credentials: {
    linear: { type: "static_bearer", token: process.env.LINEAR_API_KEY! }
  }
});

const result = await handle.wait();
await handle.downloadOutputs("./outputs");
await handle.cleanup();
```
|
|
135
|
+
|
|
136
|
+
Validation:
|
|
137
|
+
|
|
138
|
+
- Compile-time tests assert supported and unsupported credential shapes.
|
|
139
|
+
- Runtime schema tests reject invalid Template and run inputs.
|
|
140
|
+
- Public types are defined test-first with type tests before implementation.
|
|
141
|
+
|
|
142
|
+
## Phase 3: Template compiler
|
|
143
|
+
|
|
144
|
+
Implement a strict compiler from user Template to resolved, provider-neutral internal configuration.
|
|
145
|
+
|
|
146
|
+
Deliverables:
|
|
147
|
+
|
|
148
|
+
- immutable Template snapshot/hash;
|
|
149
|
+
- variable declaration and resolution;
|
|
150
|
+
- escaping for literal placeholders;
|
|
151
|
+
- strict unresolved variable failures;
|
|
152
|
+
- secret boundary enforcement;
|
|
153
|
+
- MCP declaration normalization;
|
|
154
|
+
- tool allow/deny normalization;
|
|
155
|
+
- environment package/setup normalization;
|
|
156
|
+
- output configuration normalization.
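The variable-resolution deliverables above (resolution, escaping, strict unresolved failures) can be sketched roughly as follows. The `{{name}}` placeholder form comes from the Phase 2 example; the backslash escape (`\{{name}}`), the `resolvePlaceholders` function, and the `UnresolvedVariableError` class are hypothetical names chosen for illustration and are not confirmed parts of the package.

```typescript
// Hypothetical error type raised when a placeholder has no matching variable.
class UnresolvedVariableError extends Error {
  readonly variable: string;

  constructor(variable: string) {
    super(`Unresolved template variable: ${variable}`);
    this.name = "UnresolvedVariableError";
    this.variable = variable;
  }
}

// Resolves {{name}} placeholders strictly.
// Assumption: a backslash directly before "{{" marks a literal placeholder.
function resolvePlaceholders(
  text: string,
  variables: Record<string, string>
): string {
  // Matches either an escaped "\{{name}}" or a normal "{{ name }}".
  const pattern = /(\\)?\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return text.replace(
    pattern,
    (match: string, escaped: string | undefined, variable: string) => {
      if (escaped) {
        // Drop the backslash and keep the placeholder text literal.
        return match.slice(1);
      }
      const value = Object.prototype.hasOwnProperty.call(variables, variable)
        ? variables[variable]
        : undefined;
      if (value === undefined) {
        // Fail before any provider call can be made with a half-resolved prompt.
        throw new UnresolvedVariableError(variable);
      }
      return value;
    }
  );
}
```

Throwing on the first missing variable keeps the failure local to compilation, which matches the "unresolved variables fail before provider calls" validation item.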
|
|
157
|
+
|
|
158
|
+
Validation:
|
|
159
|
+
|
|
160
|
+
- unresolved variables fail before provider calls;
|
|
161
|
+
- escaped placeholders remain literal;
|
|
162
|
+
- secrets cannot be supplied through variables;
|
|
163
|
+
- Template hash is stable for semantically identical inputs;
|
|
164
|
+
- Template edits produce a different hash;
|
|
165
|
+
- Recorded fixtures do not include resolved secret values.
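One way to get a hash that is stable for semantically identical inputs is to serialize the Template with sorted object keys before hashing. The sketch below assumes the Template snapshot is plain JSON-like data. `stableStringify`, `fnv1a32`, and `templateHash` are hypothetical names. FNV-1a is used only to keep the example dependency-free; a real implementation would more likely use a cryptographic digest such as SHA-256.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted so key order does not affect the output.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  const keys = Object.keys(value).sort();
  const body = keys.map(
    (key) => JSON.stringify(key) + ":" + stableStringify(value[key] as Json)
  );
  return "{" + body.join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Stand-in for a real digest; not collision-resistant.
function fnv1a32(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function templateHash(snapshot: Json): string {
  return fnv1a32(stableStringify(snapshot));
}
```

Because only the canonical string is hashed, two Templates that differ in key order hash identically, while any edit to a value changes the serialized input.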
|
|
166
|
+
|
|
167
|
+
## Phase 4: Credential validation
|
|
168
|
+
|
|
169
|
+
Implement typed credential validation independent of provider calls.
|
|
170
|
+
|
|
171
|
+
Deliverables:
|
|
172
|
+
|
|
173
|
+
- `static_bearer` credential type;
|
|
174
|
+
- `oauth_access_token` credential type;
|
|
175
|
+
- Template credential requirements;
|
|
176
|
+
- run-time credential matching by MCP server key;
|
|
177
|
+
- redacted error messages;
|
|
178
|
+
- no credential values in logs/errors/result objects.
|
|
179
|
+
|
|
180
|
+
Validation:
|
|
181
|
+
|
|
182
|
+
- missing required credentials fail before provider calls;
|
|
183
|
+
- unsupported arbitrary headers fail with explicit out-of-scope error;
|
|
184
|
+
- credential values are redacted in snapshots, errors, and debug output;
|
|
185
|
+
- redaction is tested in unit, component integration, and recorded API fixture sanitization.
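A minimal sketch of run-time credential matching by MCP server key, assuming a discriminated union for the two credential types. The `static_bearer` shape with a `token` field follows the Phase 2 example; the `accessToken` field name, `CredentialValidationError`, and `validateCredentials` are assumptions for illustration.

```typescript
// Assumed credential shapes; only static_bearer's `token` field appears in the plan.
type McpCredential =
  | { type: "static_bearer"; token: string }
  | { type: "oauth_access_token"; accessToken: string };

type CredentialRequirement = McpCredential["type"];

class CredentialValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CredentialValidationError";
  }
}

// Checks supplied credentials against Template requirements without any provider call.
// Error messages mention server keys and credential types only, never secret values.
function validateCredentials(
  required: Record<string, CredentialRequirement>,
  supplied: Record<string, McpCredential>
): void {
  for (const serverKey of Object.keys(required)) {
    const expected = required[serverKey];
    const credential = Object.prototype.hasOwnProperty.call(supplied, serverKey)
      ? supplied[serverKey]
      : undefined;
    if (credential === undefined) {
      throw new CredentialValidationError(
        `Missing credential for MCP server "${serverKey}"`
      );
    }
    if (credential.type !== expected) {
      throw new CredentialValidationError(
        `MCP server "${serverKey}" requires ${expected}, got ${credential.type}`
      );
    }
    const secret =
      credential.type === "static_bearer" ? credential.token : credential.accessToken;
    if (secret.trim() === "") {
      throw new CredentialValidationError(
        `Empty credential value for MCP server "${serverKey}"`
      );
    }
  }
}
```

Keeping the secret out of every message string is what lets the redaction validation items be asserted directly on thrown errors.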
|
|
186
|
+
|
|
187
|
+
## Phase 5: Anthropic Managed Agents adapter
|
|
188
|
+
|
|
189
|
+
Build the provider adapter as the only MVP backend.
|
|
190
|
+
|
|
191
|
+
Deliverables:
|
|
192
|
+
|
|
193
|
+
- provider client wrapper;
|
|
194
|
+
- create Environment per run;
|
|
195
|
+
- upload local skills/resources;
|
|
196
|
+
- create inline skills where provider supports them;
|
|
197
|
+
- create Agent with model, system prompt, MCP servers, skills, tools, permission policies;
|
|
198
|
+
- create per-run Vault and Credentials;
|
|
199
|
+
- create Session;
|
|
200
|
+
- send user messages;
|
|
201
|
+
- stream/list events;
|
|
202
|
+
- retrieve status/session metadata;
|
|
203
|
+
- terminate session;
|
|
204
|
+
- list/download session-scoped files;
|
|
205
|
+
- retrieve usage/cost where provider exposes it;
|
|
206
|
+
- cleanup created resources.
|
|
207
|
+
|
|
208
|
+
Provider invariants:
|
|
209
|
+
|
|
210
|
+
- MCP tools must not reach provider with an approval-required policy.
|
|
211
|
+
- Enabled MCP tools must be explicitly allow/deny configured.
|
|
212
|
+
- Credentials are submitted only to provider vault APIs, never persisted locally beyond active process memory.
|
|
213
|
+
- Created provider IDs are captured for cleanup and debugging.
|
|
214
|
+
|
|
215
|
+
Validation:
|
|
216
|
+
|
|
217
|
+
- fake provider tests assert exact request order and payloads;
|
|
218
|
+
- recorded API integration tests assert adapter parsing against sanitized real provider responses;
|
|
219
|
+
- errors from each create step trigger correct cleanup state;
|
|
220
|
+
- cleanup is idempotent where possible;
|
|
221
|
+
- provider IDs are retained even after partial failures.
|
|
222
|
+
|
|
223
|
+
## Phase 6: Run state machine
|
|
224
|
+
|
|
225
|
+
Implement deterministic orchestration around provider events.
|
|
226
|
+
|
|
227
|
+
Deliverables:
|
|
228
|
+
|
|
229
|
+
- `RunController`;
|
|
230
|
+
- event stream consumer;
|
|
231
|
+
- queued message scheduler;
|
|
232
|
+
- timeout controller;
|
|
233
|
+
- termination controller;
|
|
234
|
+
- local status transitions;
|
|
235
|
+
- final result builder.
|
|
236
|
+
|
|
237
|
+
State-machine rules:
|
|
238
|
+
|
|
239
|
+
- Send first message after Session creation.
|
|
240
|
+
- On idle with queued messages remaining, send next message.
|
|
241
|
+
- On idle with no queued messages remaining, succeed.
|
|
242
|
+
- On any provider error, fail and do not send further messages.
|
|
243
|
+
- On terminated, fail unless termination was user-requested.
|
|
244
|
+
- On timeout, terminate and mark timed out.
|
|
245
|
+
|
|
246
|
+
Validation:
|
|
247
|
+
|
|
248
|
+
- table-driven tests for event sequences;
|
|
249
|
+
- component integration tests cover event stream plus queued message scheduling;
|
|
250
|
+
- duplicate/replayed events do not corrupt state;
|
|
251
|
+
- queued messages are sent exactly once;
|
|
252
|
+
- abort prevents later queued messages;
|
|
253
|
+
- timeout cannot race into success.
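The state-machine rules above can be sketched as a pure reducer, which is what makes table-driven tests straightforward. The status names, the `ProviderSignal` shape, and the `startRun` and `reduceRun` functions are assumptions; the plan names `RunStatus` and `RunController` but does not define their members. Deduplicating replayed events by event ID is not shown here; the sketch only guarantees that terminal states absorb later signals.

```typescript
// Status names are assumptions; the plan does not list RunStatus members.
type RunStatus = "running" | "succeeded" | "failed" | "terminated" | "timed_out";

type ProviderSignal =
  | { kind: "idle" }
  | { kind: "error" }
  | { kind: "terminated" }
  | { kind: "timeout" };

interface RunState {
  status: RunStatus;
  queue: readonly string[]; // messages not yet sent
  terminationRequested: boolean;
}

interface Transition {
  state: RunState;
  send?: string; // message the controller should send next
  terminate?: boolean; // whether the controller should terminate the session
}

// Rule: send the first message after Session creation.
function startRun(messages: readonly string[]): Transition {
  const [first, ...rest] = messages;
  if (first === undefined) {
    throw new Error("A run needs at least one message");
  }
  return {
    state: { status: "running", queue: rest, terminationRequested: false },
    send: first,
  };
}

function reduceRun(state: RunState, signal: ProviderSignal): Transition {
  // Terminal states absorb everything, so a late timeout cannot override success.
  if (state.status !== "running") {
    return { state };
  }
  switch (signal.kind) {
    case "idle": {
      const [next, ...rest] = state.queue;
      if (next === undefined) {
        return { state: { ...state, status: "succeeded" } };
      }
      // Removing the message from the queue as it is sent prevents a resend.
      return { state: { ...state, queue: rest }, send: next };
    }
    case "error":
      return { state: { ...state, status: "failed" } };
    case "terminated":
      return {
        state: {
          ...state,
          status: state.terminationRequested ? "terminated" : "failed",
        },
      };
    case "timeout":
      return { state: { ...state, status: "timed_out" }, terminate: true };
  }
}
```

The reducer returns instructions (`send`, `terminate`) instead of performing I/O, so the same function can be driven by a fake provider stream in component tests.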
|
|
254
|
+
|
|
255
|
+
## Phase 7: Run handle API
|
|
256
|
+
|
|
257
|
+
Expose the MVP handle methods.
|
|
258
|
+
|
|
259
|
+
Deliverables:
|
|
260
|
+
|
|
261
|
+
- `status()`;
|
|
262
|
+
- `streamEvents()`;
|
|
263
|
+
- `wait()`;
|
|
264
|
+
- `listFiles()`;
|
|
265
|
+
- `downloadFile()`;
|
|
266
|
+
- `downloadOutputs()`;
|
|
267
|
+
- `cleanup()`;
|
|
268
|
+
- `terminate()`;
|
|
269
|
+
- `usage()`;
|
|
270
|
+
- `result()`.
|
|
271
|
+
|
|
272
|
+
Behavior:
|
|
273
|
+
|
|
274
|
+
- `wait()` resolves once the run reaches terminal SDK state.
|
|
275
|
+
- `streamEvents()` yields provider events plus SDK lifecycle events.
|
|
276
|
+
- `downloadOutputs()` downloads all session-scoped files by default.
|
|
277
|
+
- `cleanup()` is manual and can be called after success, failure, or termination.
|
|
278
|
+
- `cleanup()` reports skipped/failed cleanup operations rather than hiding them.
|
|
279
|
+
|
|
280
|
+
Validation:
|
|
281
|
+
|
|
282
|
+
- methods have deterministic behavior before, during, and after terminal states;
|
|
283
|
+
- cleanup can be retried safely;
|
|
284
|
+
- file downloads preserve names and avoid unsafe path traversal;
|
|
285
|
+
- live e2e test covers the happy path from Template to cleanup.
|
|
286
|
+
|
|
287
|
+
## Phase 8: Output download subsystem
|
|
288
|
+
|
|
289
|
+
Implement local output handling without antpath storage.
|
|
290
|
+
|
|
291
|
+
Deliverables:
|
|
292
|
+
|
|
293
|
+
- session-scoped file listing;
|
|
294
|
+
- safe local path mapping;
|
|
295
|
+
- download all files by default;
|
|
296
|
+
- optional filters/globs;
|
|
297
|
+
- optional `/antpath/outputs` convention in prompts/config;
|
|
298
|
+
- local output manifest;
|
|
299
|
+
- checksum/size metadata when feasible.
|
|
300
|
+
|
|
301
|
+
Validation:
|
|
302
|
+
|
|
303
|
+
- path traversal filenames are sanitized;
|
|
304
|
+
- duplicate filenames are handled deterministically;
|
|
305
|
+
- partial download failures are reported clearly;
|
|
306
|
+
- downloaded output manifest contains no secrets;
|
|
307
|
+
- recorded API fixtures cover session-scoped file listing and download metadata.
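Safe local path mapping and deterministic duplicate handling might look like the sketch below. It works on strings only, so the caller would still join the result onto the requested output directory. `toSafeRelativePath` and `assignLocalPaths` are hypothetical helpers, and the `-1`, `-2` suffix scheme is one possible convention, not the package's confirmed behavior.

```typescript
// Maps a provider-reported file name to a relative path with no traversal segments.
function toSafeRelativePath(remoteName: string): string {
  const segments = remoteName
    .replace(/\\/g, "/") // treat Windows separators like POSIX ones
    .split("/")
    .filter((segment) => segment !== "" && segment !== "." && segment !== "..")
    .map((segment) => segment.replace(/[\u0000-\u001f<>:"|?*]/g, "_"));
  if (segments.length === 0) {
    throw new Error("File name has no usable path segments");
  }
  return segments.join("/");
}

// Inserts "-n" before the extension, or appends it when there is none.
function withSuffix(path: string, n: number): string {
  const dot = path.lastIndexOf(".");
  const slash = path.lastIndexOf("/");
  return dot > slash + 1
    ? `${path.slice(0, dot)}-${n}${path.slice(dot)}`
    : `${path}-${n}`;
}

// Assigns a unique local path to each remote name, in input order.
function assignLocalPaths(remoteNames: readonly string[]): string[] {
  const used = new Set<string>();
  return remoteNames.map((remoteName) => {
    const safe = toSafeRelativePath(remoteName);
    let candidate = safe;
    for (let n = 1; used.has(candidate); n++) {
      candidate = withSuffix(safe, n);
    }
    used.add(candidate);
    return candidate;
  });
}
```

Because suffixes depend only on input order, listing files in a stable order (for example by provider file ID) yields the same local layout on every download.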
|
|
308
|
+
|
|
309
|
+
## Phase 9: Skill packaging
|
|
310
|
+
|
|
311
|
+
Support local uploads and inline skills.
|
|
312
|
+
|
|
313
|
+
Deliverables:
|
|
314
|
+
|
|
315
|
+
- local skill path validation;
|
|
316
|
+
- package/zip local skill directory;
|
|
317
|
+
- upload skill artifact/resource per run;
|
|
318
|
+
- inline skill declaration support;
|
|
319
|
+
- variable resolution in skill content/config;
|
|
320
|
+
- skill resource cleanup tracking.
|
|
321
|
+
|
|
322
|
+
Validation:
|
|
323
|
+
|
|
324
|
+
- missing local paths fail before provider calls;
|
|
325
|
+
- packaging is deterministic;
|
|
326
|
+
- large/unsupported skill inputs fail with actionable errors;
|
|
327
|
+
- inline and local skills can be combined.
|
|
328
|
+
|
|
329
|
+
## Phase 10: Cleanup and orphan handling
|
|
330
|
+
|
|
331
|
+
Implement explicit cleanup as a first-class handle operation.
|
|
332
|
+
|
|
333
|
+
Deliverables:
|
|
334
|
+
|
|
335
|
+
- cleanup policy model;
|
|
336
|
+
- manual default;
|
|
337
|
+
- provider resource cleanup ordering;
|
|
338
|
+
- vault/credential cleanup;
|
|
339
|
+
- environment/agent/session/file cleanup where supported;
|
|
340
|
+
- cleanup state reporting;
|
|
341
|
+
- orphan recovery inputs.
|
|
342
|
+
|
|
343
|
+
Cleanup order should prefer removing credentials first, then optional file/session resources, then Agent/Environment where safe.
|
|
344
|
+
|
|
345
|
+
Validation:
|
|
346
|
+
|
|
347
|
+
- cleanup works after success, failure, timeout, and partial provider creation failure;
|
|
348
|
+
- cleanup failures are returned with provider IDs and redacted messages;
|
|
349
|
+
- calling cleanup twice is safe;
|
|
350
|
+
- orphan recovery can accept persisted provider IDs from a previous result;
|
|
351
|
+
- live e2e cleanup verifies provider vault/session/resource cleanup behavior where provider APIs allow it.
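The cleanup ordering described above (credentials first, then file/session resources, then Agent/Environment) can be expressed as a small plan builder. The `ResourceKind` names, the relative order inside each group, and `buildCleanupPlan` are assumptions for illustration. Resource IDs in any usage are placeholders.

```typescript
type ResourceKind =
  | "credential"
  | "vault"
  | "file"
  | "session"
  | "agent"
  | "environment";

interface TrackedResource {
  kind: ResourceKind;
  id: string; // provider ID captured at creation time
}

// Secrets go first, then optional file/session resources, then Agent/Environment.
const CLEANUP_ORDER: readonly ResourceKind[] = [
  "credential",
  "vault",
  "file",
  "session",
  "agent",
  "environment",
];

// Builds an ordered, de-duplicated cleanup plan from tracked provider resources.
function buildCleanupPlan(resources: readonly TrackedResource[]): TrackedResource[] {
  const seenKeys = new Set<string>();
  const unique = resources.filter((resource) => {
    const key = `${resource.kind}:${resource.id}`;
    if (seenKeys.has(key)) {
      return false;
    }
    seenKeys.add(key);
    return true;
  });
  // Array.prototype.sort is stable, so creation order is kept within a kind.
  return unique.sort(
    (a, b) => CLEANUP_ORDER.indexOf(a.kind) - CLEANUP_ORDER.indexOf(b.kind)
  );
}
```

Since the plan is built purely from tracked IDs, the same function could serve orphan recovery when it is fed provider IDs persisted from an earlier result.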
|
|
352
|
+
|
|
353
|
+
## Phase 11: Observability and safe logging
|
|
354
|
+
|
|
355
|
+
Add structured, redacted observability suitable for SDK users.
|
|
356
|
+
|
|
357
|
+
Deliverables:
|
|
358
|
+
|
|
359
|
+
- event hooks/callbacks;
|
|
360
|
+
- debug logger interface;
|
|
361
|
+
- redaction utility;
|
|
362
|
+
- structured SDK lifecycle events;
|
|
363
|
+
- provider request metadata without secrets;
|
|
364
|
+
- error taxonomy.
|
|
365
|
+
|
|
366
|
+
Validation:
|
|
367
|
+
|
|
368
|
+
- tests prove secret values are never emitted through logs/events/errors;
|
|
369
|
+
- debug logs include enough provider IDs and state to diagnose failures.
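A redaction utility along these lines could back the logger and event hooks. This is a sketch under the assumption that the SDK knows every secret value for the active run and can replace exact occurrences. `createRedactor`, `redactDeep`, and the `[REDACTED]` marker are hypothetical.

```typescript
// Returns a function that replaces every known secret value in a string.
function createRedactor(secrets: readonly string[]): (text: string) => string {
  // Longest first, so a secret that contains another secret is replaced whole.
  const values = secrets
    .filter((secret) => secret.length > 0)
    .sort((a, b) => b.length - a.length);
  return (text: string) =>
    values.reduce(
      (current, secret) => current.split(secret).join("[REDACTED]"),
      text
    );
}

// Applies a redactor to every string inside plain JSON-like data.
function redactDeep(value: unknown, redact: (text: string) => string): unknown {
  if (typeof value === "string") {
    return redact(value);
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactDeep(item, redact));
  }
  if (value !== null && typeof value === "object") {
    const output: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value)) {
      output[key] = redactDeep(item, redact);
    }
    return output;
  }
  return value;
}
```

Exact-match replacement avoids guessing at secret formats, at the cost of missing transformed values (for example base64-encoded tokens), which would need separate handling.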
|
|
370
|
+
|
|
371
|
+
## Phase 12: Documentation and examples
|
|
372
|
+
|
|
373
|
+
Create user-facing SDK docs and runnable examples.
|
|
374
|
+
|
|
375
|
+
Deliverables:
|
|
376
|
+
|
|
377
|
+
- quickstart;
|
|
378
|
+
- Template guide;
|
|
379
|
+
- credentials guide;
|
|
380
|
+
- MCP guide;
|
|
381
|
+
- skills guide;
|
|
382
|
+
- cleanup/orphan guide;
|
|
383
|
+
- output download guide;
|
|
384
|
+
- examples for static bearer and OAuth access-token MCP credentials.
|
|
385
|
+
|
|
386
|
+
Validation:
|
|
387
|
+
|
|
388
|
+
- examples compile;
|
|
389
|
+
- docs state MVP non-goals and accepted risks.
|
|
390
|
+
|
|
391
|
+
## Phase 13: Release readiness
|
|
392
|
+
|
|
393
|
+
Prepare the SDK for first external use.
|
|
394
|
+
|
|
395
|
+
Deliverables:
|
|
396
|
+
|
|
397
|
+
- package metadata;
|
|
398
|
+
- changelog;
|
|
399
|
+
- versioning policy;
|
|
400
|
+
- README;
|
|
401
|
+
- API reference generation if practical;
|
|
402
|
+
- CI workflow once repository hosting is established.
|
|
403
|
+
|
|
404
|
+
Validation:
|
|
405
|
+
|
|
406
|
+
- clean install works;
|
|
407
|
+
- build/test pass from a fresh checkout;
|
|
408
|
+
- package dry-run includes only intended files.
|
|
409
|
+
|
|
410
|
+
## Test strategy
|
|
411
|
+
|
|
412
|
+
Use test-first development for every phase. Start each feature by adding or updating the narrowest failing test in the appropriate layer, then implement only enough code to pass it.
|
|
413
|
+
|
|
414
|
+
Use a fake provider for most tests. Do not rely on live provider calls for the core state machine.
|
|
415
|
+
|
|
416
|
+
### Layer 1: Unit tests
|
|
417
|
+
|
|
418
|
+
Purpose: verify pure logic and small state transitions with no provider, filesystem, or network dependency.
|
|
419
|
+
|
|
420
|
+
Coverage:
|
|
421
|
+
|
|
422
|
+
- Template compiler;
|
|
423
|
+
- variable resolution and escaping;
|
|
424
|
+
- credential parser;
|
|
425
|
+
- redaction utilities;
|
|
426
|
+
- state reducer/state-machine transitions;
|
|
427
|
+
- safe local path mapping;
|
|
428
|
+
- cleanup plan construction.
|
|
429
|
+
|
|
430
|
+
Command:
|
|
431
|
+
|
|
432
|
+
```text
npm run test:unit
```
|
|
435
|
+
|
|
436
|
+
### Layer 2: Component integration tests
|
|
437
|
+
|
|
438
|
+
Purpose: verify interactions between SDK components using fake providers, fake clocks, and local fixtures.
|
|
439
|
+
|
|
440
|
+
Coverage:
|
|
441
|
+
|
|
442
|
+
- Client + Template compiler + credential validator;
|
|
443
|
+
- RunController + fake provider event stream;
|
|
444
|
+
- queued message scheduling;
|
|
445
|
+
- output downloader + fake file service;
|
|
446
|
+
- cleanup manager + fake provider resources;
|
|
447
|
+
- logger/event hooks with redaction.
|
|
448
|
+
|
|
449
|
+
Command:
|
|
450
|
+
|
|
451
|
+
```text
npm run test:integration:components
```
|
|
454
|
+
|
|
455
|
+
### Layer 3: Recorded API integration tests
|
|
456
|
+
|
|
457
|
+
Purpose: verify provider adapter behavior against sanitized responses captured from real Claude API calls, without requiring network access during normal CI.
|
|
458
|
+
|
|
459
|
+
Requirements:
|
|
460
|
+
|
|
461
|
+
- Real API responses are captured by explicit scripts.
|
|
462
|
+
- Recordings are sanitized before being committed.
|
|
463
|
+
- Secret values, request headers, API keys, bearer tokens, OAuth tokens, and raw sensitive prompts are never stored.
|
|
464
|
+
- Fixtures are deterministic and versioned by provider API/beta header.
|
|
465
|
+
- Tests fail if fixture sanitization leaves secret-shaped values.
|
|
466
|
+
|
|
467
|
+
Commands:
|
|
468
|
+
|
|
469
|
+
```text
npm run fixtures:record:anthropic
npm run fixtures:sanitize
npm run test:integration:api
```
|
|
474
|
+
|
|
475
|
+
### Layer 4: Live e2e tests
|
|
476
|
+
|
|
477
|
+
Purpose: verify the full real Claude Managed Agents lifecycle using a local credential.
|
|
478
|
+
|
|
479
|
+
Coverage:
|
|
480
|
+
|
|
481
|
+
- create Environment;
|
|
482
|
+
- create Agent;
|
|
483
|
+
- create Vault/Credential when needed;
|
|
484
|
+
- create Session;
|
|
485
|
+
- send queued messages;
|
|
486
|
+
- observe idle completion;
|
|
487
|
+
- list/download session-scoped files;
|
|
488
|
+
- retrieve usage/status where available;
|
|
489
|
+
- terminate if needed;
|
|
490
|
+
- cleanup if configured or explicitly requested.
|
|
491
|
+
|
|
492
|
+
Rules:
|
|
493
|
+
|
|
494
|
+
- Live tests are never part of default CI.
|
|
495
|
+
- Live tests require `.env.local` with `ANTHROPIC_API_KEY`.
|
|
496
|
+
- Live tests must use low-cost prompts, strict timeouts, and cleanup guards.
|
|
497
|
+
- Live tests must not print the key or provider auth headers.
|
|
498
|
+
|
|
499
|
+
Command:
|
|
500
|
+
|
|
501
|
+
```text
npm run test:e2e:live
```
|
|
504
|
+
|
|
505
|
+
Key invariants:
|
|
506
|
+
|
|
507
|
+
- no provider call before Template and credentials are fully parsed;
|
|
508
|
+
- no secret value appears in logs, errors, Template snapshots, result objects, or manifests;
|
|
509
|
+
- no secret value appears in recorded API fixtures;
|
|
510
|
+
- every created provider resource is tracked for cleanup;
|
|
511
|
+
- message queue sends each message at most once;
|
|
512
|
+
- cleanup is manual by default and retryable;
|
|
513
|
+
- output download never writes outside the requested local directory.
|
|
514
|
+
|
|
515
|
+
## Backlog plan
|
|
516
|
+
|
|
517
|
+
After MVP:
|
|
518
|
+
|
|
519
|
+
1. Add cloud metadata sync and dashboard.
|
|
520
|
+
2. Add encrypted run-scoped key support for guaranteed cleanup.
|
|
521
|
+
3. Add OpenAI adapter.
|
|
522
|
+
4. Add provider Environment caching by Template hash.
|
|
523
|
+
5. Add cost/token/iteration caps.
|
|
524
|
+
6. Add OAuth refresh credentials.
|
|
525
|
+
7. Add arbitrary MCP headers through an antpath MCP proxy.
|
|
526
|
+
8. Add Template registry and sharing.
|
|
527
|
+
9. Add artifact retention service.
|
|
package/references/research-sources.md
@@ -0,0 +1,30 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: antpath research sources
|
|
3
|
+
status: reference
|
|
4
|
+
scope: architecture research
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# Research sources
|
|
8
|
+
|
|
9
|
+
Primary sources used to frame the MVP architecture.
|
|
10
|
+
|
|
11
|
+
| Topic | Source | Notes |
| --- | --- | --- |
| Claude Managed Agents overview | https://platform.claude.com/docs/en/managed-agents/overview.md | Agent, Environment, Session, Event primitives; provider-managed autonomous sessions. |
| Claude Managed Agents environments | https://platform.claude.com/docs/en/managed-agents/environments.md | Environments define container/package/network configuration; sessions get isolated instances. |
| Claude Managed Agents events | https://platform.claude.com/docs/en/managed-agents/events-and-streaming.md | Event stream, idle/running/terminated states, user messages, interruptions. |
| Claude Managed Agents files | https://platform.claude.com/docs/en/managed-agents/files.md | Session-scoped files can be listed by `scope_id` and downloaded by file ID. |
| Claude Managed Agents vaults | https://platform.claude.com/docs/en/managed-agents/vaults.md | Provider-side MCP credentials via vault IDs; static bearer and OAuth credentials. |
| Claude Managed Agents MCP connector | https://platform.claude.com/docs/en/managed-agents/mcp-connector.md | Agent declares MCP URLs; session supplies vault IDs. |
| Claude Managed Agents permission policies | https://platform.claude.com/docs/en/managed-agents/permission-policies.md | `always_allow` vs `always_ask`; MVP must avoid approval-required tools. |
| Anthropic API and data retention | https://platform.claude.com/docs/en/manage-claude/api-and-data-retention.md | Managed Agents are stateful and not automatically deleted. |
| OpenAI data controls | https://developers.openai.com/api/docs/guides/your-data | Responses retention, `store`, files, container lifecycle, ZDR behavior. |
| OpenAI remote MCP tools | https://developers.openai.com/api/docs/guides/tools-remote-mcp | Remote MCP `authorization` is sent per request and not stored by OpenAI. |
| MCP transports | https://modelcontextprotocol.io/specification/2025-11-25/basic/transports.md | Stdio and Streamable HTTP requirements, session IDs, Origin validation. |
| MCP authorization | https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization.md | OAuth 2.1/OIDC discovery, protected resource metadata, scopes/resource binding. |
| MCP tools | https://modelcontextprotocol.io/specification/2025-11-25/server/tools.md | Tool schemas, structured content, task support, HITL considerations. |
| MCP security best practices | https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices.md | Confused deputy and token passthrough risks. |
| MCP client best practices | https://modelcontextprotocol.io/docs/develop/clients/client-best-practices.md | Progressive discovery, dynamic server management, prompt-cache considerations. |
| OpenAI Agents SDK JS | https://openai.github.io/openai-agents-js/ | Future comparison point for OpenAI support. |
| Vercel AI SDK agents | https://ai-sdk.dev/docs/agents/building-agents | Future comparison point for TypeScript agent abstractions. |
| Temporal TypeScript workflows | https://docs.temporal.io/develop/typescript/workflows/basics | Future reference if antpath adds durable cloud orchestration. |
|
|
package/references/testing-strategy.md
@@ -0,0 +1,108 @@
|
|
|
1
|
+
---
|
|
2
|
+
title: antpath testing strategy
|
|
3
|
+
status: accepted
|
|
4
|
+
scope: sdk-only MVP
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
# antpath testing strategy
|
|
8
|
+
|
|
9
|
+
antpath uses **test-first development**. For every behavior change, write or update the relevant failing test first, then implement the smallest maintainable change that passes it.
|
|
10
|
+
|
|
11
|
+
## Test layers
|
|
12
|
+
|
|
13
|
+
| Layer | Name | Purpose | Default CI |
| --- | --- | --- | --- |
| 1 | Unit | Pure logic and small state transitions. | Yes |
| 2 | Component integration | Multiple SDK components with fake providers and local fixtures. | Yes |
| 3 | Recorded API integration | Provider adapter behavior using sanitized real API recordings. | Yes, if fixtures are present |
| 4 | Live e2e | Full lifecycle against real Claude Managed Agents. | No |
|
|
19
|
+
|
|
20
|
+
## Layer 1: unit tests
|
|
21
|
+
|
|
22
|
+
Unit tests must have no network, provider, or real filesystem dependency unless the unit being tested is explicitly a path utility.
|
|
23
|
+
|
|
24
|
+
Targets:
|
|
25
|
+
|
|
26
|
+
- Template compiler;
|
|
27
|
+
- variable resolution and escaping;
|
|
28
|
+
- credential parser;
|
|
29
|
+
- redaction utilities;
|
|
30
|
+
- state-machine reducer;
|
|
31
|
+
- cleanup plan builder;
|
|
32
|
+
- output path sanitizer.
|
|
33
|
+
|
|
34
|
+
## Layer 2: component integration tests
|
|
35
|
+
|
|
36
|
+
Component integration tests verify collaboration between modules with fake providers.
|
|
37
|
+
|
|
38
|
+
Targets:
|
|
39
|
+
|
|
40
|
+
- Client plus Template compiler plus credential validator;
|
|
41
|
+
- RunController plus fake provider stream;
|
|
42
|
+
- queued message scheduler;
|
|
43
|
+
- output downloader plus fake file service;
|
|
44
|
+
- cleanup manager plus fake provider resources;
|
|
45
|
+
- logger/event hooks with redaction.
|
|
46
|
+
|
|
47
|
+
## Layer 3: recorded API integration tests
|
|
48
|
+
|
|
49
|
+
Recorded API integration tests use persisted responses from real provider calls. These tests make provider adapter behavior reproducible without live network calls.
|
|
50
|
+
|
|
51
|
+
Rules:
|
|
52
|
+
|
|
53
|
+
- Recordings are created only by explicit scripts.
|
|
54
|
+
- Raw recordings are ignored and must not be committed.
|
|
55
|
+
- Sanitized recordings may be committed.
|
|
56
|
+
- Sanitization must remove request headers, API keys, bearer tokens, OAuth tokens, raw credentials, and any secret-shaped values.
|
|
57
|
+
- Tests must fail if a fixture contains known secret patterns.
|
|
58
|
+
- Fixtures should include provider API version and beta headers in non-secret metadata.
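The rule that tests must fail when a fixture contains known secret patterns could be enforced with a scan like the one below. The pattern list is an assumption and deliberately small: an `sk-ant-` key prefix, bearer tokens, and auth header names. `SECRET_PATTERNS` and `findSecretPatterns` are hypothetical names, and a real sanitizer would likely need a broader list.

```typescript
// Assumed secret-shaped patterns; extend to match the credentials actually in use.
const SECRET_PATTERNS: readonly RegExp[] = [
  /sk-ant-[A-Za-z0-9_-]{8,}/, // API-key-like value with an sk-ant- prefix
  /Bearer\s+[A-Za-z0-9._~+\/-]{8,}/i, // bearer token in a header or body
  /"(authorization|x-api-key)"\s*:/i, // auth header names that should be stripped
];

// Returns the source of every pattern that matches the serialized fixture.
// An empty array means the fixture passed the scan.
function findSecretPatterns(fixtureJson: string): string[] {
  // The patterns carry no "g" flag, so test() keeps no state between calls.
  return SECRET_PATTERNS.filter((pattern) => pattern.test(fixtureJson)).map(
    (pattern) => pattern.source
  );
}
```

A recorded-fixture test would then read each committed recording, call `findSecretPatterns`, and assert that the result is empty.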
|
|
59
|
+
|
|
60
|
+
Expected scripts:
|
|
61
|
+
|
|
62
|
+
```text
npm run fixtures:record:anthropic
npm run fixtures:sanitize
npm run test:integration:api
```
|
|
67
|
+
|
|
68
|
+
## Layer 4: live e2e tests
|
|
69
|
+
|
|
70
|
+
Live e2e tests verify the complete lifecycle against Claude Managed Agents.
|
|
71
|
+
|
|
72
|
+
Rules:
|
|
73
|
+
|
|
74
|
+
- Never run by default.
|
|
75
|
+
- Require `.env.local` with `ANTHROPIC_API_KEY`.
|
|
76
|
+
- Use low-cost prompts and short timeouts.
|
|
77
|
+
- Always attempt cleanup in `finally`.
|
|
78
|
+
- Never print credentials or auth headers.
|
|
79
|
+
- Keep assertions focused on lifecycle invariants, not model prose.
|
|
80
|
+
|
|
81
|
+
Expected command:
|
|
82
|
+
|
|
83
|
+
```text
npm run test:e2e:live
```
|
|
86
|
+
|
|
87
|
+
## Secret handling
|
|
88
|
+
|
|
89
|
+
`.env.local` may be used for local live testing and must stay ignored. Do not inspect, print, snapshot, or commit it.
|
|
90
|
+
|
|
91
|
+
Secrets must never appear in:
|
|
92
|
+
|
|
93
|
+
- committed fixtures;
|
|
94
|
+
- logs;
|
|
95
|
+
- snapshots;
|
|
96
|
+
- error messages;
|
|
97
|
+
- Template snapshots;
|
|
98
|
+
- output manifests;
|
|
99
|
+
- reference documents.
|
|
100
|
+
|
|
101
|
+
## Test-first workflow
|
|
102
|
+
|
|
103
|
+
1. Choose the narrowest test layer that can prove the behavior.
|
|
104
|
+
2. Add a failing test.
|
|
105
|
+
3. Implement the behavior.
|
|
106
|
+
4. Run the target test command.
|
|
107
|
+
5. Run broader tests only when the behavior crosses module/provider boundaries.
|
|
108
|
+
6. Update recorded fixtures only through recording and sanitization scripts.
|