@pumped-fn/agent-sdk 1.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License Copyright (c) 2025 Duke
2
+
3
+ Permission is hereby granted, free of
4
+ charge, to any person obtaining a copy of this software and associated
5
+ documentation files (the "Software"), to deal in the Software without
6
+ restriction, including without limitation the rights to use, copy, modify, merge,
7
+ publish, distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to the
9
+ following conditions:
10
+
11
+ The above copyright notice and this permission notice
12
+ (including the next paragraph) shall be included in all copies or substantial
13
+ portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF
16
+ ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
17
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
18
+ EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
19
+ OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
20
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
package/PATTERNS.md ADDED
@@ -0,0 +1,458 @@
1
+ # Agent SDK Patterns
2
+
3
+ Use this package as a small convention layer over `@pumped-fn/lite`. If a use case can be expressed with `flow`, state/service, tags, and `ctx.exec`, do that before adding another primitive.
4
+
5
+ ## 0. Standalone Suspense
6
+
7
+ Use suspense when the system needs deterministic replay or external resolution, but not agents, workers, or remote routing.
8
+
9
+ ```ts
10
+ import { createScope, flow, typed } from "@pumped-fn/lite"
+ import { extension, runId, stepCounter, suspend, taskId } from "@pumped-fn/lite-extension-suspense"
11
+
12
+ const waitForCommit = flow({
13
+ name: "wait-for-commit",
14
+ parse: typed<{ revision: number }>(),
15
+ tags: [suspend(true)],
16
+ factory: () => {
17
+ throw new Error("resolved by sync service")
18
+ },
19
+ })
20
+
21
+ const scope = createScope({
22
+ extensions: [extension({ log })],
23
+ })
24
+
25
+ const ctx = scope.createContext({
26
+ tags: [
27
+ taskId("doc-123"),
28
+ runId("sync-42"),
29
+ stepCounter({ next: 0 }),
30
+ ],
31
+ })
32
+ ```
33
+
34
+ Suspense has no agent knowledge. It sees tagged `ctx.exec` calls, assigns `(taskId, runId, step)`, returns completed/resolved log entries, writes pending entries for suspended steps, and throws `SuspendSignal`.
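The keying and replay behavior described above can be pictured with a small self-contained model. This is only a sketch under assumptions: `RunCursor`, `ReplayLog`, and `logged` are hypothetical names, not exports of `@pumped-fn/lite-extension-suspense`, and the real extension is asynchronous and also handles pending entries.

```typescript
// Hypothetical model of a replay log; none of these names come from the package.
interface RunCursor {
  taskId: string
  runId: string
  next: number // step counter: advanced once per logged call
}

type ReplayLog = Map<string, unknown>

function logged<T>(log: ReplayLog, run: RunCursor, work: () => T): T {
  // Each call claims the next step number, so the key is stable across reruns.
  const key = `${run.taskId}/${run.runId}/${run.next++}`
  if (log.has(key)) return log.get(key) as T // replay: return the stored result
  const value = work()
  log.set(key, value)
  return value
}

// The same code path is executed twice against one log, as a resumed run would.
const replayLog: ReplayLog = new Map()
let fetches = 0
const fetchRevision = () => {
  fetches += 1
  return { revision: 7 }
}

function syncDoc(run: RunCursor): number[] {
  const a = logged(replayLog, run, fetchRevision)
  const b = logged(replayLog, run, fetchRevision)
  return [a.revision, b.revision]
}

const firstPass = syncDoc({ taskId: "doc-123", runId: "sync-42", next: 0 })
const secondPass = syncDoc({ taskId: "doc-123", runId: "sync-42", next: 0 })
```

The second pass starts its counter at 0 again, so both calls hit existing entries and `fetchRevision` is not invoked a third time. Replay depends on the code issuing the same calls in the same order.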
35
+
36
+ ## 1. Workflow Flow
37
+
38
+ Use a workflow flow when code chooses order, branching, retries, and fan-out.
39
+
40
+ ```ts
41
+ import { createScope, flow, tags, typed } from "@pumped-fn/lite"
42
+ import {
43
+ runtime,
44
+ step,
45
+ workflow as workflowRuntime,
46
+ workflowRun,
47
+ workerRegistry,
48
+ workers,
49
+ } from "@pumped-fn/agent-sdk"
50
+ import { kit } from "@pumped-fn/agent-sdk-test"
51
+
52
+ export const processPr = flow({
53
+ name: "process_pr",
54
+ parse: typed<PrEvent>(),
55
+ tags: [
56
+ step({ workflow: true }),
57
+ workers(workerRegistry([lint, test, security])),
58
+ ],
59
+ deps: {
60
+ workflow: tags.required(workflowRuntime),
61
+ runtime: tags.required(runtime),
62
+ },
63
+ factory: async (ctx, { workflow, runtime }) => {
64
+ const lintResult = await runtime.delegate<{ failed: boolean }>("lint", { sha: ctx.input.sha })
65
+ if (lintResult.failed) return { taskId: workflow.taskId, status: "lint-failed" }
66
+
67
+ const [tests, security] = await Promise.all([
68
+ runtime.delegate("test", { sha: ctx.input.sha }),
69
+ runtime.delegate("security", { sha: ctx.input.sha }),
70
+ ])
71
+
72
+ return { taskId: workflow.taskId, status: "ok", tests, security }
73
+ },
74
+ })
75
+
76
+ export async function runProcessPr(input: PrEvent) {
77
+ const { extensions } = kit()
78
+ const scope = createScope({ extensions })
79
+ const ctx = scope.createContext({
80
+ tags: [workflowRun({ taskId: input.sha, runId: "run-1" })],
81
+ })
82
+
83
+ try {
84
+ return await ctx.exec({ flow: processPr, input })
85
+ } finally {
86
+ await ctx.close()
87
+ await scope.dispose()
88
+ }
89
+ }
90
+ ```
91
+
92
+ Why: normal TypeScript control flow stays visible. Replay still works because expensive work is behind `ctx.exec()` through `runtime.delegate()`.
93
+
94
+ `step({ workflow: true })` marks the flow as a workflow policy surface. `workflowRun()` is a context tag for run metadata, passed through `createContext({ tags: [...] })`. The `workflow` and `runtime` tags are required deps, so missing extensions fail before the factory runs. Event-log policy and remote routing remain ordinary extension composition.
95
+
96
+ ## 2. Worker Flow
97
+
98
+ Use a worker flow for one executable unit. `step()` says how it may run.
99
+
100
+ ```ts
101
+ export const lint = flow({
102
+ name: "lint",
103
+ parse: typed<{ sha: string }>(),
104
+ tags: [step({ remote: true, kind: "code", timeoutMs: 30_000 })],
105
+ factory: async (ctx) => runLinter(ctx.input.sha),
106
+ })
107
+ ```
108
+
109
+ `remote: true` means the extension may route it to a worker runner. Without a remote runner, the default test helper runs it locally through `next()`.
110
+
111
+ ## 3. LLM Provider
112
+
113
+ Prefer exposing the AI provider as a service. The flow owns the prompt shape and output parsing.
114
+
115
+ ```ts
116
+ import { service, type Lite } from "@pumped-fn/lite"
117
+
118
+ interface Model {
119
+ complete(ctx: Lite.ExecutionContext, input: { system: string; prompt: string }): Promise<string>
120
+ }
121
+
122
+ export const model = service<Model>({
123
+ factory: () => {
124
+ const client = new ClaudeModel()
125
+ return {
126
+ complete: async (_ctx, input) => client.complete(input),
127
+ }
128
+ },
129
+ })
130
+
131
+ export const classify = flow({
132
+ name: "classify",
133
+ parse: typed<{ text: string }>(),
134
+ deps: { model },
135
+ tags: [step({ kind: "llm" })],
136
+ factory: async (ctx, { model }) => {
137
+ const raw = await model.complete(ctx, {
138
+ system: "Return JSON only.",
139
+ prompt: ctx.input.text,
140
+ })
141
+ return JSON.parse(raw) as { label: string }
142
+ },
143
+ })
144
+ ```
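Since the flow owns output parsing, it may be worth checking the parsed value instead of relying on a type cast, because model output is not guaranteed to match the requested shape. A minimal sketch, assuming the `{ label: string }` shape from the example above; `parseLabel` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper: validate the model's JSON reply before returning it from the flow.
function parseLabel(raw: string): { label: string } {
  const value: unknown = JSON.parse(raw) // throws SyntaxError on non-JSON replies
  if (typeof value !== "object" || value === null) {
    throw new Error("expected a JSON object")
  }
  const label = (value as { label?: unknown }).label
  if (typeof label !== "string") {
    throw new Error("expected a string `label` field")
  }
  return { label }
}

const parsedOk = parseLabel('{"label":"test"}')

let parseFailure = ""
try {
  parseLabel('{"label":1}')
} catch (error) {
  parseFailure = (error as Error).message
}
```

A failure here surfaces at the flow boundary, where a caller can retry or report it, rather than later as an unexpectedly shaped value.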
145
+
146
+ Test by preset, not by special agent hooks:
147
+
148
+ ```ts
149
+ const scope = createScope({
150
+ presets: [preset(model, { complete: async () => '{"label":"test"}' })],
151
+ })
152
+ ```
153
+
154
+ ## 4. Agent Application
155
+
156
+ Use `agent()` when the model should choose tools or delegate to another role. Keep the executable work as flows, and keep the model as a provider that can be tagged or faked.
157
+
158
+ ```ts
159
+ const loadTicket = tool({
160
+ description: "Load ticket details.",
161
+ flow: flow({
162
+ name: "load-ticket",
163
+ parse: typed<{ id: string }>(),
164
+ factory: (ctx) => ({ id: ctx.input.id, title: `ticket:${ctx.input.id}` }),
165
+ }),
166
+ })
167
+
168
+ const provider: Model = {
169
+ complete: (_ctx, request) => request.loadedSkills.length === 0
170
+ ? {
171
+ content: "need routing policy",
172
+ skillCalls: [{ name: "routing-policy" }],
173
+ }
174
+ : request.round === 1
175
+ ? {
176
+ content: "loading",
177
+ toolCalls: [{ name: "load-ticket", input: { id: "42" } }],
178
+ }
179
+ : {
180
+ content: "ready",
181
+ stop: true,
182
+ },
183
+ }
184
+
185
+ const triage = agent({
186
+ name: "triage",
187
+ tags: [model(provider)],
188
+ skills: [
189
+ skill({
190
+ name: "routing-policy",
191
+ description: "Support routing rules.",
192
+ content: "Route billing tickets to support.",
193
+ }),
194
+ ],
195
+ tools: [loadTicket],
196
+ })
197
+
198
+ const result = await ctx.exec({
199
+ flow: triage.turn,
200
+ input: { prompt: "triage ticket 42" },
201
+ })
202
+ ```
203
+
204
+ Why: tools and subagent turns still run through `ctx.exec()`, so the same workflow extension can replay, suspend, route, or time out the work. `events` is a boundary resource, so run inspection is testable without a global observer.
205
+
206
+ ## 5. Agent Evals
207
+
208
+ Use deterministic checks for exact requirements and judges for qualitative requirements. A subjective eval with exactly one judge is rejected.
209
+
210
+ ```ts
211
+ const accepts = judge({
212
+ name: "accepts",
213
+ evaluate: () => ({ name: "accepts", passed: true }),
214
+ })
215
+
216
+ const grounded = judge({
217
+ name: "grounded",
218
+ evaluate: () => ({ name: "grounded", passed: true }),
219
+ })
220
+
221
+ const evaluation = suite({
222
+ name: "triage-quality",
223
+ agent: triage,
224
+ cases: [
225
+ {
226
+ name: "uses the loader",
227
+ input: { prompt: "triage ticket 42" },
228
+ checks: [used("load-ticket"), includes("ready")],
229
+ },
230
+ ],
231
+ judges: [accepts, grounded],
232
+ })
233
+
234
+ const report = await runEval(ctx, evaluation)
235
+ const artifact = summary(report)
236
+ ```
237
+
238
+ ## 6. Run Inspection And HTTP
239
+
240
+ Use `inspect()` against a `RunLog` to read workflow steps by `(taskId, runId)`.
241
+
242
+ ```ts
243
+ const run = await inspect(log, { taskId: "triage-42", runId: "run-1" })
244
+ ```
245
+
246
+ Use `http()` to adapt a Fetch request to an agent turn without adding a server framework dependency.
247
+
248
+ ```ts
249
+ const handle = http({ agent: triage })
250
+ const response = await ctx.exec({
251
+ flow: handle,
252
+ input: new Request("https://agent.local/run", {
253
+ method: "POST",
254
+ body: JSON.stringify({ prompt: "triage ticket 42" }),
255
+ }),
256
+ })
257
+ ```
258
+
259
+ ## 7. Channels and Schedules
260
+
261
+ Use channel and schedule flows at the boundary. They should translate external shape into `TurnInput`, then let the agent turn own model/tool/subagent execution.
262
+
263
+ ```ts
264
+ const slack = channel({
265
+ name: "slack-message",
266
+ parse: typed<{ text: string }>(),
267
+ agent: triage,
268
+ input: (ctx) => ({ prompt: ctx.input.text }),
269
+ })
270
+
271
+ const daily = schedule({
272
+ name: "daily-digest",
273
+ agent: triage,
274
+ input: () => ({ prompt: "daily digest" }),
275
+ })
276
+ ```
277
+
278
+ Why: Slack, HTTP, cron, queues, and CLIs stay adapters. The agent runtime still sees a flow input and a scoped execution context.
279
+
280
+ ## 8. Sessions
281
+
282
+ Use `session()` for continuing message history. It is a material, so it uses the same patch and revision behavior as other task state.
283
+
284
+ ```ts
285
+ const thread = session("support-session")
286
+
287
+ await send(ctx, thread, triage, { prompt: "triage ticket 42" })
288
+ await send(ctx, thread, triage, { prompt: "summarize the route" })
289
+ ```
290
+
291
+ ## 9. Sandbox Capability
292
+
293
+ Use `sandbox` as an injected capability, not as a global file or process API.
294
+
295
+ ```ts
296
+ const readWorkspace = tool({
297
+ description: "Read a file from the workspace.",
298
+ flow: flow({
299
+ name: "read-workspace",
300
+ parse: typed<{ path: string }>(),
301
+ deps: { sandbox: tags.required(sandbox) },
302
+ factory: (ctx, deps) => deps.sandbox.readFile(ctx.input.path),
303
+ }),
304
+ })
305
+
306
+ const scope = createScope({
307
+ tags: [
308
+ sandbox({
309
+ readFile: (path) => `file:${path}`,
310
+ writeFile: () => undefined,
311
+ exec: (command, args = []) => ({
312
+ stdout: [command, ...args].join(" "),
313
+ stderr: "",
314
+ exitCode: 0,
315
+ }),
316
+ }),
317
+ ],
318
+ })
319
+ ```
320
+
321
+ ## 10. CLI Worker Adapter
322
+
323
+ Use provider packages when the runtime must call real local tools like Claude or Codex as the agent model provider. Keep the agent graph provider-free and choose the provider with scope or context tags.
324
+
325
+ ```ts
326
+ import { createScope } from "@pumped-fn/lite"
327
+ import { agent, guard, model } from "@pumped-fn/agent-sdk"
328
+ import { claude } from "@pumped-fn/agent-sdk-claude"
329
+ import { codex } from "@pumped-fn/agent-sdk-codex"
330
+
331
+ const shared = guard("review-guard")
332
+
333
+ const reviewer = agent({
334
+ name: "reviewer",
335
+ })
336
+
337
+ const codexScope = createScope({ tags: [codex({ sandbox: "read-only", guard: shared })] })
338
+ const claudeScope = createScope({ tags: [claude({ guard: shared })] })
339
+ ```
340
+
341
+ `codex()` and `claude()` are lazy `model` tags. Tagging a scope is configuration only; the CLI harness is built on first model use. Replace either provider with `model(fake)` at the same seam for tests.
342
+
343
+ `codexHarness()` runs `codex exec --ephemeral --ignore-user-config`. `claudeHarness()` runs `claude -p --no-session-persistence` and rejects `--bare`. Harness prompts request JSON with `content`, optional `guard`, and optional skill/tool/subagent calls. `guard` is the anti-goal; the first value collected from a run is kept in material state and injected into later prompts.
344
+
345
+ Harnesses default to bwrap isolation with network enabled. The default sandbox mounts only the workspace, temporary home, minimal runtime/cert/DNS paths, and explicit credential directories such as `codexHome`. Keep CLI workers at the edge. Stable domain tests should use provider state and presets.
346
+
347
+ ## 11. Durable Step
348
+
349
+ Use `step({ durable: true })` for a step that should suspend until another process resolves it.
350
+
351
+ ```ts
352
+ const approve = flow({
353
+ name: "approve",
354
+ parse: typed<{ title: string }>(),
355
+ tags: [step({ durable: true })],
356
+ factory: () => {
357
+ throw new Error("durable step should be resolved externally")
358
+ },
359
+ })
360
+ ```
361
+
362
+ First run writes a pending log entry and throws `SuspendSignal`. Replay returns the resolved value and continues.
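The pending-then-resolved lifecycle can be sketched without the SDK. Everything below is hypothetical (`DurableEntry`, `SuspendSketch`, `durableStep`, `resolveStep`); it only models the two passes described above and assumes the resolver writes into the same log the workflow reads.

```typescript
// Hypothetical sketch: a durable step that suspends until an outside actor resolves it.
type DurableEntry =
  | { status: "pending" }
  | { status: "resolved"; value: unknown }

class SuspendSketch extends Error {
  key: string
  constructor(key: string) {
    super(`suspended at ${key}`)
    this.key = key
  }
}

function durableStep<T>(log: Map<string, DurableEntry>, key: string): T {
  const entry = log.get(key)
  if (entry !== undefined && entry.status === "resolved") {
    return entry.value as T // replay path: continue with the external answer
  }
  log.set(key, { status: "pending" }) // first run: record that we are waiting
  throw new SuspendSketch(key)
}

// Called by whichever process owns the decision (for example a reviewer UI).
function resolveStep(log: Map<string, DurableEntry>, key: string, value: unknown): void {
  const entry = log.get(key)
  if (entry === undefined || entry.status !== "pending") {
    throw new Error(`no pending step at ${key}`)
  }
  log.set(key, { status: "resolved", value })
}

const durableLog = new Map<string, DurableEntry>()
const approveKey = "pr-1/run-1/0"

let suspendedAt: string | undefined
try {
  durableStep<{ approved: boolean }>(durableLog, approveKey)
} catch (error) {
  suspendedAt = (error as { key?: string }).key
}

resolveStep(durableLog, approveKey, { approved: true })
const decision = durableStep<{ approved: boolean }>(durableLog, approveKey)
```

Note that the step body never runs in either pass, which is consistent with the factory above existing only to throw if it is ever reached.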
363
+
364
+ ## 12. Remote Runner
365
+
366
+ Remote routing belongs in `RemoteRunner`, not inside workflow code.
367
+
368
+ ```ts
369
+ const scope = createScope({
370
+ extensions: [
371
+ workflowExtension({ log }),
372
+ extension({
373
+ remoteRunner: {
374
+ run: async (event, next) => {
375
+ if (canRoute(event.target)) return publishAndAwaitReply(event)
376
+ return next()
377
+ },
378
+ },
379
+ }),
380
+ ],
381
+ })
382
+ ```
383
+
384
+ The runner may short-circuit before worker dependencies resolve. If it calls `next()`, the worker runs locally.
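The short-circuit-or-`next()` contract is ordinary middleware. A rough synchronous model, with hypothetical types (`RouteEvent`, `Runner`) standing in for whatever the real `RemoteRunner` receives:

```typescript
// Hypothetical shapes; the real RemoteRunner event and return types may differ.
interface RouteEvent {
  target: string
  input: unknown
}

type Runner = (event: RouteEvent, next: () => unknown) => unknown

const remoteTargets = new Set(["lint"])
const trace: string[] = []

const runnerSketch: Runner = (event, next) => {
  if (remoteTargets.has(event.target)) {
    trace.push(`remote:${event.target}`)
    return { via: "remote", target: event.target } // stand-in for a published reply
  }
  return next() // fall through to local execution
}

// The local path would resolve dependencies and run the factory; here it only records that it ran.
const runLocally = (target: string) => () => {
  trace.push(`local:${target}`)
  return { via: "local", target }
}

const lintOutcome = runnerSketch({ target: "lint", input: { sha: "abc" } }, runLocally("lint"))
const testOutcome = runnerSketch({ target: "test", input: { sha: "abc" } }, runLocally("test"))
```

For the routed target, `next` is never called, so nothing on the local path executes. That is the property that lets routing skip dependency resolution.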
385
+
386
+ ## 13. Materials
387
+
388
+ Use materials for task state the workflow or workers must patch.
389
+
390
+ ```ts
391
+ const inventory = material("inventory", {
392
+ kind: "json",
393
+ initialState: { items: [] as string[] },
394
+ })
395
+
396
+ await patchMaterial(ctx, inventory, [
397
+ { op: "add", path: "/items/-", value: "typescript" },
398
+ ])
399
+ ```
400
+
401
+ Use derived materials for pure projections:
402
+
403
+ ```ts
404
+ const count = derivedMaterial("inventory-count", inventory, (state) => state.items.length, {
405
+ kind: "json",
406
+ })
407
+ ```
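As a mental model for patching and projection, the sketch below applies append operations under a revision check and derives a count from the result. The names (`MaterialSketch`, `applyAppend`, `itemCount`) are hypothetical, and passing an expected revision explicitly is an assumption about how a revision mismatch could be detected, not the package's actual signature. The `/items/-` path follows JSON Patch (RFC 6902), where `-` addresses the end of an array.

```typescript
// Hypothetical sketch of revisioned state with an append patch and a pure projection.
interface MaterialSketch<S> {
  state: S
  revision: number
}

interface AppendOp {
  op: "add"
  path: "/items/-" // JSON Patch syntax for "append to the items array"
  value: string
}

type Inventory = MaterialSketch<{ items: string[] }>

function applyAppend(current: Inventory, expectedRevision: number, ops: AppendOp[]): Inventory {
  if (current.revision !== expectedRevision) {
    throw new Error(`revision mismatch: expected ${expectedRevision}, found ${current.revision}`)
  }
  // Build a new state instead of mutating, so earlier revisions stay intact.
  const items = [...current.state.items, ...ops.map((op) => op.value)]
  return { state: { items }, revision: current.revision + 1 }
}

// Derived value: a pure function of the source state, never patched directly.
const itemCount = (m: Inventory): number => m.state.items.length

const v0: Inventory = { state: { items: [] }, revision: 0 }
const v1 = applyAppend(v0, 0, [{ op: "add", path: "/items/-", value: "typescript" }])

let staleWrite = ""
try {
  // A writer that still believes the revision is 0 is rejected.
  applyAppend(v1, 0, [{ op: "add", path: "/items/-", value: "rust" }])
} catch (error) {
  staleWrite = (error as Error).message
}
```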
408
+
409
+ ## 14. Event Log Boundary
410
+
411
+ The event log key is `(taskId, runId, step)`. The step counter is incremented in the standalone suspense extension's `wrapExec`; `workflowExtension()` composes that lower layer.
412
+
413
+ ```mermaid
414
+ sequenceDiagram
415
+ participant W as Workflow
416
+ participant E as Agent extension
417
+ participant L as Event log
418
+
419
+ W->>E: ctx.exec(flow)
420
+ E->>L: lookup task/run/step
421
+ alt completed
422
+ L-->>E: result
423
+ E-->>W: cached result
424
+ else absent local
425
+ E->>W: next()
426
+ W-->>E: result
427
+ E->>L: put completed
428
+ else durable
429
+ E->>L: put pending
430
+ E--xW: SuspendSignal
431
+ end
432
+ ```
433
+
434
+ Because lite wraps the full executable step, cached replay and remote routing skip both dependency resolution and factory execution.
435
+
436
+ ## 15. Failure Ownership
437
+
438
+ | Failure | Owner |
439
+ |---|---|
440
+ | Parse error | Flow boundary |
441
+ | Missing worker | `WorkerRegistry` / caller setup |
442
+ | CLI exit or timeout | `cliWorker()` |
443
+ | Material revision mismatch | Material writer |
444
+ | Pending durable step | Resolver / event log |
445
+ | Replay mismatch | Workflow determinism and event log |
446
+
447
+ Tests should prove the owning layer. Do not hide a missing dependency by adding a broad fake runner. Make the fake prove the exact behavior under test.
448
+
449
+ ## 16. Add No Primitive Unless Forced
450
+
451
+ Before adding an agent SDK primitive, ask:
452
+
453
+ 1. Can this be a tag on a `flow`?
454
+ 2. Can this be a state/service dependency?
455
+ 3. Can this be a `ctx.exec()` helper?
456
+ 4. Can this be an extension policy?
457
+
458
+ Only add a primitive when all four answers are no and the new concept has its own lifecycle or type boundary.