agentfootprint 2.3.0 → 2.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (61)
  1. package/README.md +277 -249
  2. package/dist/core/Agent.js +16 -0
  3. package/dist/core/Agent.js.map +1 -1
  4. package/dist/esm/core/Agent.js +16 -0
  5. package/dist/esm/core/Agent.js.map +1 -1
  6. package/dist/esm/index.js +1 -1
  7. package/dist/esm/index.js.map +1 -1
  8. package/dist/esm/lib/injection-engine/SkillRegistry.js +83 -0
  9. package/dist/esm/lib/injection-engine/SkillRegistry.js.map +1 -0
  10. package/dist/esm/lib/injection-engine/factories/defineSkill.js +34 -0
  11. package/dist/esm/lib/injection-engine/factories/defineSkill.js.map +1 -1
  12. package/dist/esm/lib/injection-engine/index.js +2 -1
  13. package/dist/esm/lib/injection-engine/index.js.map +1 -1
  14. package/dist/esm/lib/injection-engine/types.js.map +1 -1
  15. package/dist/index.js +3 -1
  16. package/dist/index.js.map +1 -1
  17. package/dist/lib/injection-engine/SkillRegistry.js +87 -0
  18. package/dist/lib/injection-engine/SkillRegistry.js.map +1 -0
  19. package/dist/lib/injection-engine/factories/defineSkill.js +36 -1
  20. package/dist/lib/injection-engine/factories/defineSkill.js.map +1 -1
  21. package/dist/lib/injection-engine/index.js +4 -1
  22. package/dist/lib/injection-engine/index.js.map +1 -1
  23. package/dist/lib/injection-engine/types.js.map +1 -1
  24. package/dist/types/core/Agent.d.ts +14 -0
  25. package/dist/types/core/Agent.d.ts.map +1 -1
  26. package/dist/types/index.d.ts +1 -1
  27. package/dist/types/index.d.ts.map +1 -1
  28. package/dist/types/lib/injection-engine/SkillRegistry.d.ts +50 -0
  29. package/dist/types/lib/injection-engine/SkillRegistry.d.ts.map +1 -0
  30. package/dist/types/lib/injection-engine/factories/defineSkill.d.ts +72 -0
  31. package/dist/types/lib/injection-engine/factories/defineSkill.d.ts.map +1 -1
  32. package/dist/types/lib/injection-engine/index.d.ts +2 -1
  33. package/dist/types/lib/injection-engine/index.d.ts.map +1 -1
  34. package/dist/types/lib/injection-engine/types.d.ts +10 -0
  35. package/dist/types/lib/injection-engine/types.d.ts.map +1 -1
  36. package/package.json +27 -8
  37. package/README.proposed.md +0 -258
  38. package/dist/instructions.js +0 -21
  39. package/dist/instructions.js.map +0 -1
  40. package/dist/lib/instructions/defineInstruction.js +0 -35
  41. package/dist/lib/instructions/defineInstruction.js.map +0 -1
  42. package/dist/lib/instructions/evaluator.js +0 -38
  43. package/dist/lib/instructions/evaluator.js.map +0 -1
  44. package/dist/lib/instructions/index.js +0 -48
  45. package/dist/lib/instructions/index.js.map +0 -1
  46. package/dist/lib/instructions/types.js +0 -22
  47. package/dist/lib/instructions/types.js.map +0 -1
  48. package/dist/memory/conversationHelpers.js +0 -39
  49. package/dist/memory/conversationHelpers.js.map +0 -1
  50. package/dist/types/instructions.d.ts +0 -5
  51. package/dist/types/instructions.d.ts.map +0 -1
  52. package/dist/types/lib/instructions/defineInstruction.d.ts +0 -22
  53. package/dist/types/lib/instructions/defineInstruction.d.ts.map +0 -1
  54. package/dist/types/lib/instructions/evaluator.d.ts +0 -11
  55. package/dist/types/lib/instructions/evaluator.d.ts.map +0 -1
  56. package/dist/types/lib/instructions/index.d.ts +0 -44
  57. package/dist/types/lib/instructions/index.d.ts.map +0 -1
  58. package/dist/types/lib/instructions/types.d.ts +0 -100
  59. package/dist/types/lib/instructions/types.d.ts.map +0 -1
  60. package/dist/types/memory/conversationHelpers.d.ts +0 -19
  61. package/dist/types/memory/conversationHelpers.d.ts.map +0 -1
package/README.md CHANGED
@@ -1,351 +1,379 @@
1
1
  <p align="center">
2
- <h1 align="center">AgentFootPrint</h1>
2
+ <h1 align="center">agentfootprint</h1>
3
3
  <p align="center">
4
- <strong>Context engineering, made buildable.</strong>
4
+ <strong>Context engineering, abstracted.</strong>
5
5
  </p>
6
6
  </p>
7
7
 
8
8
  <p align="center">
9
9
  <a href="https://github.com/footprintjs/agentfootprint/actions"><img src="https://github.com/footprintjs/agentfootprint/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
10
+ <a href="https://codecov.io/gh/footprintjs/agentfootprint"><img src="https://codecov.io/gh/footprintjs/agentfootprint/branch/main/graph/badge.svg" alt="Coverage"></a>
10
11
  <a href="https://www.npmjs.com/package/agentfootprint"><img src="https://img.shields.io/npm/v/agentfootprint.svg?style=flat" alt="npm version"></a>
11
- <a href="https://github.com/footprintjs/agentfootprint/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT License"></a>
12
12
  <a href="https://www.npmjs.com/package/agentfootprint"><img src="https://img.shields.io/npm/dm/agentfootprint.svg" alt="Downloads"></a>
13
- <a href="https://footprintjs.github.io/agentfootprint/"><img src="https://img.shields.io/badge/Docs-agentfootprint-facc15?style=flat&logo=typescript&logoColor=white" alt="Docs"></a>
14
- <a href="https://github.com/footprintjs/footPrint"><img src="https://img.shields.io/badge/Built_on-footprintjs-ca8a04?style=flat" alt="Built on footprintjs"></a>
13
+ <a href="https://github.com/footprintjs/agentfootprint/blob/main/LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="MIT"></a>
15
14
  </p>
16
15
 
17
16
  <br>
18
17
 
19
- **Building Generative AI applications is mostly *context engineering*** &mdash; deciding *what content lands in which slot of the LLM call, when, and why*. agentfootprint gives you a framework to build generative AI apps — single LLM calls, agents, multi-agent systems — where this discipline is **buildable at the control-flow level** (Sequence, Parallel, Conditional, Loop), not hidden inside new classes per paper.
18
+ > **PyTorch's autograd abstracted gradient computation. Express abstracted the HTTP request loop. Prisma abstracted SQL CRUD. Kubernetes abstracted reconciliation. React abstracted the DOM.**
19
+ >
20
+ > Every load-bearing dev tool of the last decade is *the same kind of move* — abstract one specific kind of bookkeeping that practitioners were doing by hand, so they can spend their attention on intent instead of plumbing.
21
+ >
22
+ > **agentfootprint is that move applied to context engineering** — the discipline of deciding what content lands in which slot of an LLM call, when, and why. You describe injections declaratively. The framework evaluates every trigger every iteration, composes the `system` / `messages` / `tools` slots, observes every decision as a typed event, and persists checkpoints you can replay six months later. So you write the **intent**, not 200 lines of slot-management bookkeeping per agent.
20
23
 
21
- ```bash
22
- npm install agentfootprint footprintjs
23
- ```
24
-
25
- ```typescript
26
- const agent = Agent.create({ provider: anthropic(...) })
27
- .steering(tone) // always-on persona
28
- .instruction(urgentRule) // rule-gated, fires when matched
29
- .skill(billingSkill) // LLM-activated body + tools
30
- .memory(causalMemory) // cross-run "why" replay
31
- .build();
32
- ```
24
+ | Framework | You write declaratively | The framework abstracts |
25
+ |---|---|---|
26
+ | **PyTorch (autograd)** | Forward computation graph | Gradient computation, backward pass, parameter bookkeeping |
27
+ | **Express / Fastify** | Routes + handlers | HTTP request loop, middleware chain, response serialization |
28
+ | **Prisma / SQLAlchemy** | Schema + query intent | SQL generation, connection pooling, migrations |
29
+ | **Kubernetes** | Desired state (manifests) | Scheduling, health checks, reconciliation loop |
30
+ | **React** | Components + state | DOM diffing, render path, event delegation |
31
+ | **agentfootprint** | Injections (slot × trigger) | Slot composition, iteration loop, observation, replay |
32
+
33
+ The closest structural parallel is **autograd**: you describe the graph, the framework traverses it, and *because the framework owns the traversal it can record everything that happens for free*. Same idea here — you describe Injections, agentfootprint runs the iteration loop, and the typed-event stream + replayable checkpoints are a consequence, not an extra feature.
34
+
35
+ <!-- ┌────────────────────────────────────────────────────────────────┐
36
+ │ 📹 30-second demo video here. │
37
+ │ Embed: paste-trace → drag time-travel slider → │
38
+ │ every slot, every decision, every tool call visible. │
39
+ │ Frame this as "agent DevTools" — the React DevTools moment.│
40
+ └────────────────────────────────────────────────────────────────┘ -->
33
41
 
34
42
  ---
35
43
 
36
- ## What is context engineering?
44
+ ## The abstraction, concretely
37
45
 
38
- Every LLM call has **three slots**:
46
+ ### Without agentfootprint: context engineering by hand
39
47
 
40
- ```
41
- ┌──────────────────────────────────────────────────────────────┐
42
- ┌─────────────────────┐ ┌──────────────┐ ┌────────────┐ │
43
- │ system-prompt slot │ │ messages slot│ │ tools slot │ │
44
- │ (instructions, │ │ (history, │ │ (functions │ │
45
- │ │ persona, rules) │ │ user input, │ │ the LLM │ │
46
- │ │ │ tool results)│ │ may call) │ │
47
- └─────────────────────┘ └──────────────┘ └────────────┘ │
48
- └──────────────────────────────────────────────────────────────┘
48
+ ```typescript
49
+ async function runAgentTurn(userMsg, state) {
50
+ let systemPrompt = baseSystem;
51
+ const messages = [...state.history, { role: 'user', content: userMsg }];
52
+ let activeTools = [...baseTools];
53
+
54
+ // 1. Apply always-on steering rules
55
+ for (const rule of steeringRules) systemPrompt += '\n' + rule.text;
56
+
57
+ // 2. Evaluate conditional instructions
58
+ for (const inst of instructions) {
59
+ if (inst.activeWhen(state)) systemPrompt += '\n' + inst.prompt;
60
+ }
61
+
62
+ // 3. Check on-tool-return triggers
63
+ if (state.lastToolResult?.toolName === 'redact_pii') {
64
+ messages.push({ role: 'system', content: 'Use redacted text only.' });
65
+ }
66
+
67
+ // 4. Resolve LLM-activated skills
68
+ for (const id of state.activatedSkills) {
69
+ systemPrompt += '\n' + skills[id].body;
70
+ activeTools.push(...skills[id].tools);
71
+ }
72
+
73
+ // 5. Load + format memory for this tenant
74
+ const memEntries = await store.list({ tenant, conversationId });
75
+ messages.unshift({ role: 'system', content: formatMemory(memEntries.slice(-10)) });
76
+
77
+ // 6. Call LLM, route tool calls, loop, capture state for resume...
78
+ // 7. Persist new turn back to memory tagged with identity...
79
+ // 8. Wire SSE for streaming, attach observability hooks...
80
+
81
+ // No replay. No audit trail. Per agent, hundreds of lines.
82
+ // Every refactor risks a slot-ordering bug nobody catches until prod.
83
+ }
49
84
  ```
50
85
 
51
- Context engineering is **deciding what flows into each slot, when**. The same content can be a *Skill* (LLM activates it), a *Steering* doc (always-on), an *Instruction* (rule-gated), a *Fact* (developer-supplied data), or a *Memory* (learned across runs). They're flavors of the same idea: **injection into a slot at the right moment**.
86
+ ### With agentfootprint: declarative
52
87
 
53
- | Flavor | Slot | When it injects |
54
- |---|---|---|
55
- | **Skill** | system-prompt + tools | When the LLM calls `read_skill('billing')` |
56
- | **Steering** | system-prompt | Always-on (persona, output format, safety) |
57
- | **Instruction** | system-prompt or messages | Rule-gated (predicate matches the turn / a tool just returned) |
58
- | **Fact** | system-prompt or messages | Always-on, but data &mdash; user profile, env, current time |
59
- | **Memory** | messages | Learned across runs &mdash; window / facts / narrative / **causal snapshots** |
60
- | **RAG** | messages | Retrieved chunks (rule + score threshold) |
88
+ ```typescript
89
+ const agent = Agent.create({ provider, model: 'claude-sonnet-4-5-20250929' })
90
+ .system('You are a support assistant.')
91
+ .steering(toneRule) // always-on
92
+ .instruction(urgentRule) // rule-gated
93
+ .skill(billingSkill) // LLM-activated
94
+ .memory(conversationMemory) // cross-run, multi-tenant
95
+ .tool(weather)
96
+ .build();
61
97
 
62
- You're not learning N new framework classes &mdash; you're learning **one model**: slot &times; flavor &times; timing.
98
+ await agent.run({ message: userInput, identity: { conversationId } });
63
99
 
64
- ---
100
+ // Every iteration is a replayable typed event stream — for free.
101
+ agent.on('agentfootprint.context.injected', (e) =>
102
+ console.log(`[${e.payload.source}] landed in ${e.payload.slot}`));
103
+ ```
65
104
 
66
- ## Where do you inject?
105
+ Same agent. The hand-rolled version is ~80 lines and growing; the declarative version is ~8 and stable. **The framework owns the wiring** — which is exactly why it can observe, replay, and audit it for you.
67
106
 
68
- Into the three slots of the LLM API call:
107
+ ---
69
108
 
70
- | LLM API field | Holds | Examples of what you'd inject here |
71
- |---|---|---|
72
- | **`system` prompt** | Persona, rules, available capabilities | Steering doc · Instruction text · Skill body · Fact data · formatted memory |
73
- | **`messages` array** | Conversation turns + tool results | Memory replay · RAG chunks · synthetic tool results · injected instructions on a recent turn |
74
- | **`tools` array** | Function schemas the LLM may call | Skill-attached tools · permission-gated subset · per-iteration dynamic registry |
109
+ ## In 30 seconds: runs offline, no API key
110
+
111
+ ```bash
112
+ npm install agentfootprint footprintjs
113
+ ```
75
114
 
76
- Every injection — Skill, Steering, Instruction, Fact, Memory, RAG, Guardrail — lands in **one of these three places**. There is no fourth slot.
115
+ ```typescript
116
+ import { Agent, defineTool, mock } from 'agentfootprint';
77
117
 
78
- ## When (and how) do we inject?
118
+ const weather = defineTool({
119
+ name: 'weather',
120
+ description: 'Get current weather for a city.',
121
+ inputSchema: {
122
+ type: 'object',
123
+ properties: { city: { type: 'string' } },
124
+ required: ['city'],
125
+ },
126
+ execute: async ({ city }: { city: string }) => `${city}: 72°F, sunny`,
127
+ });
79
128
 
80
- Context engineering happens at runtime. Three timing levels of expressiveness:
129
+ const agent = Agent.create({
130
+ provider: mock({ reply: 'I checked: it is 72°F and sunny.' }), // ← deterministic, no API key
131
+ model: 'mock',
132
+ })
133
+ .system('You answer weather questions using the weather tool.')
134
+ .tool(weather)
135
+ .build();
81
136
 
82
- ```
83
- 1. LLMCall ← one-shot. Inject once before the call.
84
-
85
- 2. Agent (ReAct loop) ← inject before EVERY iteration.
86
-
87
- 3. Dynamic Agent ← inject DIFFERENTLY per iteration based on
88
- tool results, reasoning state, or user input.
137
+ const result = await agent.run({ message: 'Weather in Paris?' });
138
+ console.log(result); // "I checked: it is 72°F and sunny."
89
139
  ```
90
140
 
91
- The third level is where context engineering pays off. The agent calls a `redact_pii` tool → on the *next* iteration, an Instruction with `trigger: 'on-tool-return'` fires that says *"use the redacted text only, don't paraphrase the original"*. That kind of just-in-time injection is what separates an LLM that follows the rules from one that drifts.
92
-
93
- agentfootprint handles all three timing levels through the **same** primitive (`Injection`), evaluated by the **same** engine, observed by the **same** event (`agentfootprint.context.injected`).
141
+ Swap `mock(...)` for `anthropic(...)` / `openai(...)` / `bedrock(...)` / `ollama(...)` for production. Nothing else changes.
94
142
 
95
143
  ---
96
144
 
97
- ## Building a generative app = deciding when/how to inject
98
-
99
- That's the discipline. agentfootprint abstracts it for you in two layers, both built on the [footprintjs](https://github.com/footprintjs/footPrint) flowchart substrate:
100
-
101
- ### Layer 1 — Single agent: one `Injection` primitive
145
+ ## The mental model: three slots, four triggers, one Injection
102
146
 
103
- For ONE agent, every Skill / Steering / Instruction / Fact / Memory / RAG is the same `Injection` primitive: *"this content lands in this slot when this trigger matches."* You define them; the engine evaluates them per iteration; observability flows through one event.
147
+ Every LLM call has three slots. **Every "agent feature" (Skill, Steering doc, Instruction, Fact, Memory replay, RAG chunk) is content flowing into one of them, under one of four triggers.** That's the entire abstraction.
104
148
 
105
149
  ```
106
- Agent ─► InjectionEngine ─► ┌─ system-prompt slot ─► CallLLM ─► Tools ─► loop
107
- ├─ messages slot
108
- └─ tools slot
150
+ ┌─────────────────────────────────────┐
151
+ │ │
152
+ │ Your LLM call has 3 slots: │
153
+ │ │
154
+ │ system messages tools │
155
+ │ ▲ ▲ ▲ │
156
+ └───────┼──────────┼──────────┼───────┘
157
+ │ │ │
158
+ │ one │ one │
159
+ │ Injection│ Injection│
160
+ │ fires… │ fires… │
161
+ │ │ │
162
+ ┌──────────────┴────┐ ┌──┴───┐ ┌──┴────┐
163
+ │ defineSteering │ │ ... │ │ ... │
164
+ │ defineInstruction │ │ │ │ │
165
+ │ defineSkill │ │ │ │ │
166
+ │ defineFact │ │ │ │ │
167
+ │ defineMemory(read) │ │ │ │ │
168
+ │ defineRAG │ │ │ │ │
169
+ │ …your next idea │ │ │ │ │
170
+ └────────────────────┘ └──────┘ └───────┘
171
+
172
+ …under one of:
173
+ always · rule · on-tool-return · llm-activated
109
174
  ```
110
175
 
111
- ### Layer 2 - Multi-agent: connect agents through control flow
176
+ There's no fourth slot. There won't be. Every named pattern in the agent literature (Reflexion, Tree-of-Thoughts, Skills, RAG, Constitutional AI) reduces to *which slot* + *which trigger*. **You learn one model; the field's growth lands as new factories on the same primitive.**
112
177
 
113
- For MULTIPLE agents, you don't need a new primitive. Connect them with the same control-flow building blocks that connect any flowchart stages:
178
+ ### The four triggers: *who decides* this injection is needed right now?
114
179
 
115
- | Composition | What it does | Multi-agent example |
116
- |---|---|---|
117
- | **Sequence** | A → B → C | `Sequence(Researcher, Writer, Editor)`: output flows downstream |
118
- | **Parallel** | fan-out, merge | `Parallel(Critic1, Critic2, Critic3) + merge`: multi-perspective review |
119
- | **Conditional** | predicate-based routing | route to specialist Agent based on intent classification |
120
- | **Loop** | repeat with budget | `Loop(Agent + judge)`: iterate until quality bar hit |
180
+ | Trigger | Who decides | Fires when | Real-world example |
181
+ |---|---|---|---|
182
+ | `always` | nobody (always on) | Every iteration, every turn | *"Be friendly and concise."* (`defineSteering`) |
183
+ | `rule` | **you**, via predicate | A `(ctx) => boolean` you wrote returns true | *"If user wrote 'urgent', prioritize fastest path."* (`defineInstruction({ activeWhen })`) |
184
+ | `on-tool-return` | **the system** | A specific tool just returned (recency-first injection on the next iteration) | *"After `redact_pii` ran, use redacted text only."* (Dynamic ReAct) |
185
+ | `llm-activated` | **the LLM** | The LLM called your activation tool (e.g. `read_skill('billing')`) | Skill body + unlocked tools land on the next iteration (`defineSkill`) |
121
186
 
122
- That's it. **No `MultiAgentSystem` class. No `Orchestrator` class. No new vocabulary.** Multi-agent is just compositions of single Agents through control flow.
187
+ Why exactly four? Because *who decides activation* is a closed axis: nobody / the developer / the system / the LLM. Together those four exhaust the meaningful "when does this content matter?" cases. A fifth would require introducing a new agent of decision, and there isn't one. That's why the primitive surface stays this small even as named patterns proliferate above it.
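
The slot × trigger model described above can be sketched in a few lines. This is a minimal illustration only: the `Injection`, `TurnContext`, `fires`, and `composeSlots` names below are hypothetical and are not agentfootprint's actual exports; the real engine may evaluate triggers differently.

```typescript
// Hypothetical types for illustration; not agentfootprint's real API.
type Slot = "system" | "messages" | "tools";
type Trigger = "always" | "rule" | "on-tool-return" | "llm-activated";

interface TurnContext {
  userMessage: string;
  lastToolName?: string;     // tool that returned on the previous iteration
  activatedIds: Set<string>; // ids the LLM activated via an activation tool
}

interface Injection {
  id: string;
  slot: Slot;
  trigger: Trigger;
  content: string;
  activeWhen?: (ctx: TurnContext) => boolean; // consulted by "rule"
  afterTool?: string;                         // consulted by "on-tool-return"
}

// Decide whether one injection fires on this iteration.
function fires(inj: Injection, ctx: TurnContext): boolean {
  switch (inj.trigger) {
    case "always":
      return true;
    case "rule":
      return inj.activeWhen ? inj.activeWhen(ctx) : false;
    case "on-tool-return":
      return inj.afterTool !== undefined && inj.afterTool === ctx.lastToolName;
    case "llm-activated":
      return ctx.activatedIds.has(inj.id);
  }
}

// Group the content of every firing injection by its target slot.
function composeSlots(injections: Injection[], ctx: TurnContext): Record<Slot, string[]> {
  const slots: Record<Slot, string[]> = { system: [], messages: [], tools: [] };
  for (const inj of injections) {
    if (fires(inj, ctx)) slots[inj.slot].push(inj.content);
  }
  return slots;
}

const injections: Injection[] = [
  { id: "tone", slot: "system", trigger: "always", content: "Be friendly and concise." },
  { id: "urgent", slot: "system", trigger: "rule", content: "Prioritize the fastest path.",
    activeWhen: (ctx) => /urgent/i.test(ctx.userMessage) },
  { id: "redacted", slot: "messages", trigger: "on-tool-return", afterTool: "redact_pii",
    content: "Use redacted text only." },
  { id: "billing", slot: "system", trigger: "llm-activated", content: "Billing skill body." },
];

const composed = composeSlots(injections, {
  userMessage: "This is urgent!",
  lastToolName: "redact_pii",
  activatedIds: new Set<string>(), // the LLM has not activated any skill yet
});
```

In this sketch the `billing` injection stays out of the `system` slot until its id appears in `activatedIds`, which mirrors the "LLM decides" row of the table.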
123
188
 
124
- ### Same abstraction → native patterns
189
+ ---
125
190
 
126
- Because the substrate is so small (Agent + Sequence/Parallel/Conditional/Loop), every named multi-agent pattern is just a recipe — and we ship runnable examples for the canonical ones:
191
+ ## Why this isn't just an ergonomics win
127
192
 
128
- | Pattern | Recipe | Source |
129
- |---|---|---|
130
- | **ReAct** | `Agent` with the default loop | Yao 2022 |
131
- | **Reflexion** | `Sequence(Agent, critique-LLM, Agent)` | Shinn 2023 |
132
- | **Tree-of-Thoughts** | `Parallel(Agent × N) + rank` | Yao 2023 |
133
- | **Self-Consistency** | `Parallel(Agent × N) + majority-vote` | Wang 2022 |
134
- | **Debate** | `Loop(Agent × 2 + judge)` | Du 2023 |
135
- | **Map-Reduce** | `Parallel(Agent × N) + merge` | Dean 2004 |
136
- | **Swarm** | `Agent` whose tools are other `Agent`s | OpenAI 2024 |
193
+ The React parallel goes one layer deeper than "less code." Because the framework owns the wiring, the framework can do things you couldn't do by hand:
137
194
 
138
- Browse them in [`examples/patterns/`](examples/patterns/). Every file runs end-to-end with `npm run example examples/patterns/<file>.ts`. **You compose. We don't ship a `ReflexionAgent` class.**
195
+ | You write declaratively | The framework does for you |
196
+ |---|---|
197
+ | `.steering(rule)` | Evaluates every iteration, composes into `system` slot |
198
+ | `.instruction(activeWhen, prompt)` | Re-evaluates predicate per iteration; routes to `system` or `messages` for attention positioning |
199
+ | `.skill(billing)` | Auto-attaches `read_skill` tool; LLM activates by id; body + unlocked tools land in next iteration |
200
+ | `.memory(causal)` | Persists footprintjs decision-evidence snapshots; embeds queries; cosine-matches on follow-up runs |
201
+ | `.tool(weather)` | Schemas to LLM, dispatches calls, captures args/results, gates by permission policy |
202
+ | `.attach(recorder)` | Subscribes to 47 typed events across 13 domains as the chart traverses |
203
+ | `agent.run({...})` | Captures every decision, every commit, every tool call as a JSON checkpoint that's replayable cross-server |
139
204
 
140
- > **Show me the smallest one:** [`examples/patterns/02-reflection.ts`](examples/patterns/02-reflection.ts) implements Reflexion in ~30 lines. Run it: `npm run example examples/patterns/02-reflection.ts`.
205
+ **The flowchart-pattern substrate** ([footprintjs](https://github.com/footprintjs/footPrint)) is what makes the observation automatic. Every stage execution is a typed event during one DFS traversal — no instrumentation, no post-processing. Same way React DevTools shows you the component tree because React owns the render path, agentfootprint shows you the slot composition because agentfootprint owns the prompt path.
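
To make the "observation falls out of owning the traversal" point concrete, here is a minimal sketch of a depth-first walk that records an event around every stage it executes. The `Stage`, `StageEvent`, and `traverse` names are assumptions made for illustration and do not reflect footprintjs's real interfaces or event names.

```typescript
// Hypothetical shapes for illustration; not footprintjs's real API.
interface Stage {
  name: string;
  run: () => string; // the stage's own work; returns a short result
  children?: Stage[];
}

interface StageEvent {
  type: "stage.entered" | "stage.exited";
  stage: string;
  depth: number;
  result?: string;
}

// Depth-first traversal owned by the "framework". Because it invokes every
// stage itself, it can record events around each call without the stage
// containing any instrumentation code.
function traverse(root: Stage, events: StageEvent[] = [], depth = 0): StageEvent[] {
  events.push({ type: "stage.entered", stage: root.name, depth });
  const result = root.run();
  for (const child of root.children ?? []) traverse(child, events, depth + 1);
  events.push({ type: "stage.exited", stage: root.name, depth, result });
  return events;
}

const chart: Stage = {
  name: "agent",
  run: () => "ok",
  children: [
    { name: "compose-slots", run: () => "3 slots" },
    { name: "call-llm", run: () => "reply" },
  ],
};

const trace = traverse(chart);
const order = trace.map((e) => `${e.type}:${e.stage}`);
```

None of the three stages knows it is being observed; the event list is a by-product of the walk, which is the property the paragraph above attributes to the substrate.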
141
206
 
142
207
  ---
143
208
 
144
- ## Why a context-engineering framework
209
+ ## What you can build
145
210
 
146
- If you're going to build generative AI apps on a framework, pick the one whose **core stays small as the field grows**. agentfootprint's core has *one* Injection primitive. Every current flavor reduces to it &mdash; and so will every flavor that hasn't been invented yet.
211
+ Three example shapes, all runnable end-to-end with `npm run example examples/<file>.ts`.
147
212
 
148
- ```
149
- Skill = Injection { trigger: 'llm-activated', slots: [system-prompt, tools] }
150
- Steering = Injection { trigger: 'always-on', slots: [system-prompt] }
151
- Instruction = Injection { trigger: 'rule', slots: [system-prompt | messages] }
152
- Fact = Injection { trigger: 'always-on', slots: [system-prompt | messages] }
153
- RAG = Injection { trigger: 'rule + score', slots: [messages] } (v2.1)
154
- Guardrail = Injection { trigger: 'on-tool-return',slots: [system-prompt] } (v2.x)
155
- ??? = Injection { trigger: ?, slots: ? } (your idea)
213
+ ### Customer support agent (skills + memory + audit trail)
214
+
215
+ ```typescript
216
+ const agent = Agent.create({ provider, model: 'claude-sonnet-4-5-20250929' })
217
+ .system('You are a friendly support assistant.')
218
+ .skill(billingSkill) // LLM activates with read_skill('billing')
219
+ .steering(toneGuidelines) // always-on
220
+ .memory(conversationMemory) // remembers across .run() calls, per-tenant
221
+ .build();
156
222
  ```
157
223
 
158
- Adding the next flavor is **one new factory file** &mdash; no engine change, no slot subflow change, no consumer-API change. Lens chips, observability events, audit trails all flow through the same plumbing.
224
+ [`examples/context-engineering/06-mixed-flavors.ts`](examples/context-engineering/06-mixed-flavors.ts)
159
225
 
160
- | | Frameworks growing class-per-paper | agentfootprint |
161
- |---|---|---|
162
- | Adding a new flavor (e.g. *guardrail*) | New `GuardrailAgent` class, new event type, new UI surface | One factory file, same `Injection` shape, same `context.injected` event |
163
- | Cross-run "why was X rejected?" | LLM reconstructs from messages | Replay EXACT past decisions from causal snapshots |
164
- | Training-data export | Manual, lossy, optional | Same snapshot shape → SFT / DPO / process-RL ready (v2.1+) |
165
- | Decision evidence | Lost &mdash; only the final answer survives | First-class events from `decide()` / `select()` captured during traversal |
226
+ ### Research pipeline (multi-agent fan-out + merge)
166
227
 
167
- ---
228
+ ```typescript
229
+ const research = Parallel.create()
230
+ .branch(optimist).branch(skeptic).branch(historian)
231
+ .merge(synthesizer)
232
+ .build();
168
233
 
169
- ## Quick Start
234
+ await research.run({ message: 'Should we adopt microservices?' });
235
+ ```
236
+
237
+ → [`examples/patterns/05-tot.ts`](examples/patterns/05-tot.ts) (Tree-of-Thoughts) · [`examples/patterns/01-self-consistency.ts`](examples/patterns/01-self-consistency.ts)
238
+
239
+ ### Streaming chat agent (token-by-token to a browser)
240
+
241
+ <!-- ┌────────────────────────────────────────────────────────────────┐
242
+ │ 📹 Streaming demo clip here. │
243
+ │ Short loop: user types → tokens stream → tool call │
244
+ │ surfaces mid-stream → final answer. │
245
+ └────────────────────────────────────────────────────────────────┘ -->
170
246
 
171
247
  ```typescript
172
- import {
173
- Agent, defineTool, defineSteering, defineInstruction,
174
- defineMemory, MEMORY_TYPES, MEMORY_STRATEGIES,
175
- InMemoryStore, anthropic,
176
- } from 'agentfootprint';
248
+ agent.on('agentfootprint.stream.token', (e) => res.write(e.payload.content));
249
+ agent.on('agentfootprint.stream.tool_start', (e) => res.write(`\n→ ${e.payload.toolName}...\n`));
250
+ await agent.run({ message: userInput });
251
+ ```
177
252
 
178
- // Want $0 testing? Swap `anthropic({...})` for `mock({ reply: '...' })`
179
- // — same agent, same flowchart, no API key needed.
253
+ [`docs-site/guides/streaming/`](docs-site/src/content/docs/guides/streaming.mdx)
180
254
 
181
- // 1. A tool the agent can call
182
- const weather = defineTool({
183
- schema: {
184
- name: 'weather',
185
- description: 'Current weather for a city.',
186
- inputSchema: {
187
- type: 'object',
188
- properties: { city: { type: 'string' } },
189
- required: ['city'],
190
- },
191
- },
192
- execute: async (args) => `${(args as { city: string }).city}: 72°F, sunny`,
193
- });
255
+ ---
194
256
 
195
- // 2. Context engineering: one steering doc + one rule-gated instruction
196
- const tone = defineSteering({
197
- id: 'tone',
198
- prompt: 'Be friendly and concise. Acknowledge feelings before facts.',
199
- });
257
+ ## The differentiator: the trace is a cache of the agent's thinking
200
258
 
201
- const urgent = defineInstruction({
202
- id: 'urgent',
203
- activeWhen: (ctx) => /urgent|asap|emergency/i.test(ctx.userMessage),
204
- prompt: 'The user marked this urgent. Prioritize the fastest resolution.',
205
- });
259
+ Other agent frameworks' memory remembers *what was said*. agentfootprint's `defineMemory({ type: CAUSAL })` records the **decision evidence** — every value the agent's flowchart captured during the run, persisted as a JSON-portable snapshot.
206
260
 
207
- // 3. Memory across runs
208
- const memory = defineMemory({
209
- id: 'short-term',
210
- type: MEMORY_TYPES.EPISODIC,
211
- strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
212
- store: new InMemoryStore(),
213
- });
261
+ That changes the cost structure of *everything that happens after the agent runs.* The expensive thinking happened once; the recorded trace makes consuming that thinking cheap, three different ways:
214
262
 
215
- // 4. Build: every layer composes
216
- const agent = Agent.create({
217
- provider: anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! }),
218
- model: 'claude-sonnet-4-5-20250929',
219
- })
220
- .system('You are a helpful weather assistant.')
221
- .tool(weather)
222
- .steering(tone)
223
- .instruction(urgent)
224
- .memory(memory)
225
- .build();
263
+ ### 1. Audit / explain: cross-run, six months later, exact past facts
226
264
 
227
- // 5. Run with multi-tenant identity
228
- const id = { conversationId: 'alice-session' };
229
- await agent.run({ message: 'Weather in SF urgently?', identity: id });
265
+ ```typescript
266
+ const causal = defineMemory({
267
+ id: 'causal',
268
+ type: MEMORY_TYPES.CAUSAL,
269
+ strategy: { kind: MEMORY_STRATEGIES.TOP_K, topK: 1, threshold: 0.7, embedder },
270
+ store,
271
+ projection: SNAPSHOT_PROJECTIONS.DECISIONS, // inject "why" only, not "what"
272
+ });
273
+
274
+ // Monday: agent decides loan #42 should be rejected (creditScore=580, threshold=600).
275
+ // Friday: user asks "Why was my application rejected?"
276
+ // → Causal memory loads the exact decision evidence from Monday.
277
+ // → LLM answers from the SOURCE, not reconstruction.
230
278
  ```
231
279
 
232
- Every `.steering` / `.instruction` / `.memory` / `.tool` call adds an injection or a binding. The Agent's flowchart shows them as visible subflows in the narrative &mdash; you can read exactly *what landed in which slot, when, and why* for any run.
280
+ [`examples/memory/06-causal-snapshot.ts`](examples/memory/06-causal-snapshot.ts) runs end-to-end with a mock embedder in ~50 lines.
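
The `TOP_K` strategy with `topK: 1` and `threshold: 0.7` in the snippet above suggests a retrieval step along the following lines. This is a sketch under assumptions: the `Snapshot` shape, the `cosine` and `topK` helpers, and the toy three-dimensional embeddings are all hypothetical, and the library's real store and embedder interfaces may differ.

```typescript
// Hypothetical shapes; the real store and embedder interfaces may differ.
interface Snapshot {
  id: string;
  embedding: number[];
  summary: string;
}

// Cosine similarity of two vectors, assumed to have the same length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Keep snapshots scoring at or above the threshold, best first, at most k.
function topK(query: number[], snapshots: Snapshot[], k: number, threshold: number): Snapshot[] {
  return snapshots
    .map((s) => ({ s, score: cosine(query, s.embedding) }))
    .filter((x) => x.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.s);
}

// Toy data: real embeddings would come from an embedder, not be hand-written.
const stored: Snapshot[] = [
  { id: "loan-42", embedding: [1, 0, 0], summary: "rejected: creditScore=580, threshold=600" },
  { id: "weather", embedding: [0, 1, 0], summary: "Paris: 72°F, sunny" },
];

const hits = topK([0.9, 0.1, 0], stored, 1, 0.7);  // query close to loan-42
const misses = topK([0, 0, 1], stored, 1, 0.7);    // query unrelated to both
```

The threshold matters as much as `k`: when nothing clears it, the sketch injects nothing rather than the least-bad match.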
233
281
 
234
- ---
282
+ ### 2. Cheap-model triage — the trace *is* the reasoning
235
283
 
236
- ## Build with mocks first &mdash; swap real infra later
284
+ A trace recorded from your expensive production model (Sonnet-4, GPT-4) is a perfectly good *input* for a small, fast, cheap model (Haiku, GPT-4o-mini) answering follow-up questions about that run. The expensive model already did the work; the cheap model just **reads what's in the trace**.
237
285
 
238
- Generative AI app development is expensive when every iteration hits a paid API. agentfootprint is designed so you can **build the entire app &mdash; agent, context engineering, tool chains, memory, RAG &mdash; against in-memory mocks**, prove the logic and patterns end-to-end with zero API cost, then swap real infrastructure in piece by piece.
286
+ Reading recorded decision evidence is structurally simpler than re-deriving the answer from first principles, so a smaller model is enough. You can compose the routing yourself: when causal memory injected a snapshot on the next turn, send that turn to a cheaper provider.
239
287
 
240
288
  ```typescript
241
- import {
242
- Agent, defineTool, defineSteering, defineMemory,
243
- MEMORY_TYPES, MEMORY_STRATEGIES,
244
- mock, InMemoryStore, // ← the mock surfaces
245
- } from 'agentfootprint';
289
+ const heavy = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
290
+ const cheap = anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
246
291
 
247
- const agent = Agent.create({
248
- provider: mock({ reply: 'Refunds take 3 business days.' }), // ← no API key
249
- model: 'mock',
250
- })
251
- .steering(defineSteering({ id: 'tone', prompt: 'Be friendly.' }))
252
- .tool(defineTool({
253
- schema: { name: 'lookup', description: '...', inputSchema: {} },
254
- execute: async () => 'mock data', // ← inline mock
255
- }))
256
- .memory(defineMemory({
257
- id: 'short-term',
258
- type: MEMORY_TYPES.EPISODIC,
259
- strategy: { kind: MEMORY_STRATEGIES.WINDOW, size: 10 },
260
- store: new InMemoryStore(), // ← ephemeral
261
- }))
292
+ // Production turn — heavy model, full reasoning, snapshot persisted.
293
+ const productionAgent = Agent.create({ provider: heavy, model: 'claude-sonnet-4-5-20250929' })
294
+ .memory(causal)
262
295
  .build();
296
+ await productionAgent.run({ message: 'Should we approve loan #42?', identity });
263
297
 
264
- await agent.run({ message: 'How long does a refund take?' });
298
+ // Follow-up turn: cheaper model reads the snapshot, lower cost per turn.
299
+ const followUpAgent = Agent.create({ provider: cheap, model: 'claude-haiku-4-5-20251001' })
300
+ .memory(causal)
301
+ .build();
302
+ await followUpAgent.run({ message: 'Why was loan #42 rejected?', identity });
265
303
  ```
266
304
 
267
- The whole flow runs offline. Iterate on context engineering, narrative, control-flow patterns, error handling, multi-agent compositions &mdash; **without** spending a cent.
305
+ This is memoization for agent reasoning: do the expensive work once, then serve many queries from the cached result. Across a production system that handles audit / explain / "why did the agent do X?" traffic, the savings compound with every follow-up question.
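
The routing itself is left to the caller. One way it might be written is sketched below, assuming a hypothetical `TurnContext` shape with a `snapshotInjected` flag. Neither `TurnContext` nor `pickModel` is part of agentfootprint, and how you detect an injected snapshot depends on your own recorder setup. The model ids are the ones used in the example above.

```typescript
// Hypothetical shape for illustration; agentfootprint does not export this type.
interface TurnContext {
  message: string;
  // True when causal memory injected a prior-run snapshot into this turn.
  snapshotInjected: boolean;
}

interface ModelChoice {
  tier: 'heavy' | 'cheap';
  model: string;
}

const HEAVY: ModelChoice = { tier: 'heavy', model: 'claude-sonnet-4-5-20250929' };
const CHEAP: ModelChoice = { tier: 'cheap', model: 'claude-haiku-4-5-20251001' };

// If the decision evidence is already in context, the turn only has to read it,
// so route it to the cheaper model. Otherwise pay for full reasoning.
function pickModel(turn: TurnContext): ModelChoice {
  return turn.snapshotInjected ? CHEAP : HEAVY;
}
```

The returned `model` string would then be passed to whichever agent instance handles the turn. A real router may also want a fallback to the heavy model when the cheap model's answer fails a check.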
268
306
 
269
- When the logic is right, swap one boundary at a time:
307
+ ### 3. Training data: every successful run becomes a labeled trajectory
270
308
 
271
- | Boundary | Mock for development | Production swap |
272
- |---|---|---|
273
- | **LLM provider** | `mock({ reply })` &middot; `mock({ replies })` for scripted ReAct | `anthropic()` &middot; `openai()` &middot; `bedrock()` &middot; `ollama()` |
274
- | **Embedder** | `mockEmbedder()` | OpenAI / Cohere / Bedrock embedder (factories on roadmap) |
275
- | **Memory store** | `InMemoryStore` | `RedisStore` (`agentfootprint/memory-redis`) &middot; `AgentCoreStore` (`agentfootprint/memory-agentcore`) &middot; DynamoDB / Postgres / Pinecone (planned) |
276
- | **MCP server** | `mockMcpClient({ tools })` &mdash; in-memory, no SDK | `mcpClient({ transport })` to a real server |
277
- | **Tool execution** | `defineTool({ execute: async () => '...' })` | Same `defineTool`, real implementation |
309
+ The same snapshot data shape is the input to SFT / DPO / process-RL training pipelines (`causalMemory.exportForTraining({ format: 'sft' | 'dpo' | 'process' })` is on the roadmap — see below). You don't run a separate data-collection phase — **your production traffic IS your training set.** Every successful customer interaction is a positive trajectory; every escalation or override is a counter-example.
278
310
 
279
- Each swap is one line. The flowchart, narrative, recorders, and tests don't change. Ship the patterns first; pay for tokens last.
280
-
281
- > Why this matters: it's the difference between *learning context engineering by trying things* and *learning by burning your API budget*. The library treats $0 development as a first-class workflow, not an afterthought.
311
+ The same JSON shape that powered the audit trail and the cheap-model follow-up is the training payload. One recording, three downstream consumers, no extra instrumentation.
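
Until the export API ships, the idea can be illustrated by hand. The sketch below assumes a hypothetical `RunSnapshot` shape (the README does not show the real snapshot format) and turns successful runs into prompt/completion pairs in JSONL. It is not the roadmap `exportForTraining` implementation.

```typescript
// Hypothetical snapshot shape for illustration only; the real causal-memory
// snapshot format is not shown in this README.
interface RunSnapshot {
  query: string;
  evidence: string[];
  answer: string;
  outcome: 'success' | 'escalated' | 'overridden';
}

interface SftRecord {
  prompt: string;
  completion: string;
}

// Keep only successful runs as positive trajectories and fold the recorded
// decision evidence into the prompt.
function toSftRecords(snapshots: RunSnapshot[]): SftRecord[] {
  return snapshots
    .filter((s) => s.outcome === 'success')
    .map((s) => ({
      prompt: [s.query, ...s.evidence.map((e) => `Evidence: ${e}`)].join('\n'),
      completion: s.answer,
    }));
}

// One JSON object per line, the layout most fine-tuning pipelines accept.
function toJsonl(records: SftRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join('\n');
}
```

Escalated or overridden runs are dropped here. A DPO-style export would instead pair them with a preferred answer for the same prompt, which needs more information than this sketch assumes.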
282
312
 
283
313
  ---
284
314
 
285
- ## Memory &mdash; one factory, four types, seven strategies
315
+ ## Mocks first, prod second
286
316
 
287
- `defineMemory({ type, strategy, store })` &mdash; one factory dispatches `type &times; strategy.kind` onto the right pipeline.
317
+ Generative AI development is expensive when every iteration hits a paid API. agentfootprint is designed so you build the entire app — agent, context engineering, memory, RAG — against in-memory mocks, prove the logic end-to-end with **zero API cost**, then swap real infrastructure in one boundary at a time.
288
318
 
289
- | Type | What's stored |
290
- |---|---|
291
- | `EPISODIC` | Raw conversation messages |
292
- | `SEMANTIC` | Extracted structured facts |
293
- | `NARRATIVE` | Beats / summaries of prior runs |
294
- | **`CAUSAL`** | **footprintjs decision-evidence snapshots** &mdash; replay "why" cross-run, zero hallucination |
295
-
296
- | Strategy | How content is selected |
297
- |---|---|
298
- | `WINDOW` | Last N entries (rule, no LLM, no embeddings) |
299
- | `BUDGET` | Fit-to-tokens (decider) |
300
- | `SUMMARIZE` | LLM compresses older turns |
301
- | `TOP_K` | Score-threshold semantic retrieval |
302
- | `EXTRACT` | LLM distills facts/beats on write |
303
- | `DECAY` | Recency-weighted (planned) |
304
- | `HYBRID` | Compose multiple |
305
-
306
- **Causal memory** is the differentiator: footprintjs's `decide()` and `select()` capture decision evidence as first-class events. Causal memory persists those snapshots tagged with the user's original query. New questions cosine-match past queries → inject the prior decision evidence → the LLM answers from EXACT past facts. Cross-run "why was X rejected last week?" follow-ups answer correctly without reconstruction.
319
+ | Boundary | Dev (mock) | Prod (swap one line) |
320
+ |---|---|---|
321
+ | LLM provider | `mock({ reply })` · `mock({ replies })` for scripted multi-turn | `anthropic()` · `openai()` · `bedrock()` · `ollama()` |
322
+ | Embedder | `mockEmbedder()` | OpenAI / Cohere / Bedrock embedder (factories on roadmap) |
323
+ | Memory store | `InMemoryStore` | `RedisStore` (`agentfootprint/memory-redis`) · `AgentCoreStore` (`agentfootprint/memory-agentcore`) · DynamoDB / Postgres / Pinecone (planned) |
324
+ | MCP server | `mockMcpClient({ tools })` — in-memory, no SDK | `mcpClient({ transport })` to a real server |
325
+ | Tool execution | inline closure | real implementation |
307
326
 
308
- The same snapshot data shape becomes RL/SFT/DPO training data in v2.1+.
327
+ The flowchart, recorders, narrative, and tests don't change between dev and prod. **Ship the patterns first; pay for tokens last.**
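
The boundary swap can be pictured with a small stand-alone sketch. `ChatProvider`, `mockProvider`, `realProvider`, and `selectProvider` below are hypothetical stand-ins written for this illustration, not agentfootprint's provider types. In the library itself the swap is between `mock(...)` and a real provider factory such as `anthropic()`.

```typescript
// Stand-in for a provider boundary; NOT agentfootprint's actual provider type.
interface ChatProvider {
  complete(prompt: string): Promise<string>;
}

// Dev: canned reply, no network, no API key.
function mockProvider(reply: string): ChatProvider {
  return { complete: async () => reply };
}

// Prod placeholder: a real implementation would call a vendor SDK with `apiKey`.
function realProvider(apiKey: string): ChatProvider {
  return {
    complete: async () => {
      throw new Error(`real provider not wired (key length ${apiKey.length})`);
    },
  };
}

// The only line that differs between dev and prod is which factory runs.
function selectProvider(env: Record<string, string | undefined>): ChatProvider {
  const key = env.ANTHROPIC_API_KEY;
  return key ? realProvider(key) : mockProvider('Refunds take 3 business days.');
}

// Application logic depends on the interface, so it never changes.
async function answer(provider: ChatProvider, question: string): Promise<string> {
  return provider.complete(question);
}
```

In a Node process you would call `selectProvider(process.env)`. Taking the environment as a parameter keeps the sketch testable without touching global state.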
309
328
 
310
329
  ---
311
330
 
312
- ## Documentation
331
+ ## Pick your starting door
313
332
 
314
- | Resource | Link |
333
+ | If you are... | Start here |
315
334
  |---|---|
316
- | **Getting Started** | [Quick Start](https://footprintjs.github.io/agentfootprint/getting-started/quick-start/) &middot; [Key Concepts](https://footprintjs.github.io/agentfootprint/getting-started/key-concepts/) &middot; [Why agentfootprint?](https://footprintjs.github.io/agentfootprint/getting-started/why/) |
317
- | **Guides** | [Agent](https://footprintjs.github.io/agentfootprint/guides/agent/) &middot; [Memory](https://footprintjs.github.io/agentfootprint/guides/memory/) &middot; [Skills](https://footprintjs.github.io/agentfootprint/guides/skills-explained/) &middot; [Instructions](https://footprintjs.github.io/agentfootprint/guides/instructions/) &middot; [Streaming](https://footprintjs.github.io/agentfootprint/guides/streaming/) |
318
- | **Examples** | [33 runnable examples](https://github.com/footprintjs/agentfootprint/tree/main/examples) &mdash; primitives, compositions, patterns, context engineering, memory, runtime features |
319
- | **Integrations** | [Anthropic](https://footprintjs.github.io/agentfootprint/integrations/anthropic/) &middot; [OpenAI](https://footprintjs.github.io/agentfootprint/integrations/openai/) &middot; [AWS Bedrock](https://footprintjs.github.io/agentfootprint/integrations/aws-bedrock/) &middot; [Ollama](https://footprintjs.github.io/agentfootprint/integrations/ollama/) |
320
- | **Built on** | [footprintjs](https://github.com/footprintjs/footPrint) &mdash; the flowchart pattern for backend code |
335
+ | 🎓 **New to agents** | [5-minute Quick Start](https://footprintjs.github.io/agentfootprint/getting-started/quick-start/): first agent runs offline |
336
+ | 🛠️ **A LangChain / CrewAI / LangGraph user** | [Migration sketch](https://footprintjs.github.io/agentfootprint/getting-started/vs/): same patterns, fewer classes |
337
+ | 🏗️ **Architecting an enterprise rollout** | [Production guide](https://footprintjs.github.io/agentfootprint/guides/deployment/): multi-tenant identity, audit trails, redaction, OTel |
338
+ | 🔬 **Researcher / extending the framework** | [Extension guide](https://footprintjs.github.io/agentfootprint/contributing/extension-guide/): add a new flavor in 50 lines |
339
+
340
+ Every code snippet on the docs site is imported from a real, runnable file in [`examples/`](examples/) — every example is also an end-to-end test in CI. There is no docs-only code in this repo.
321
341
 
322
342
  ---
323
343
 
324
- ## What v2.0 ships (today)
344
+ ## What ships today
345
+
346
+ - **2 primitives** — `LLMCall`, `Agent` (the ReAct loop)
347
+ - **4 compositions** — `Sequence`, `Parallel`, `Conditional`, `Loop`
348
+ - **6 LLM providers** — Anthropic · OpenAI · Bedrock · Ollama · Browser-Anthropic · Browser-OpenAI · Mock (with `mock({ replies })` for scripted multi-turn)
349
+ - **One Injection primitive** — `defineSkill` / `defineSteering` / `defineInstruction` / `defineFact` (one engine, four typed factories, all reduce to `{ trigger, slot }`)
350
+ - **One Memory factory** — `defineMemory({ type, strategy, store })` — 4 types × 7 strategies including **Causal**
351
+ - **RAG** — `defineRAG()` + `indexDocuments()` (sugar over Semantic + TopK)
352
+ - **MCP** — `mcpClient({ transport })` for real servers · `mockMcpClient({ tools })` for in-memory development
353
+ - **Memory store adapters** — `InMemoryStore` · `RedisStore` (subpath `agentfootprint/memory-redis`) · `AgentCoreStore` (subpath `agentfootprint/memory-agentcore`)
354
+ - **47 typed observability events** across 13 domains — context · stream · agent · cost · skill · permission · eval · memory · …
355
+ - **Pause / resume** — JSON-serializable checkpoints; pause via `askHuman` / `pauseHere`, resume hours later on a different server
356
+ - **Resilience** — `withRetry`, `withFallback`, `resilientProvider`
357
+ - **AI-coding-tool support** — bundled instructions for Claude Code · Cursor · Windsurf · Cline · Kiro · Copilot
358
+ - **Runnable examples** organized by DNA layer (core · core-flow · patterns · context-engineering · memory · features) — every example is also an end-to-end CI test
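
To make the resilience bullet in the list above concrete, here is a generic retry-then-fallback sketch. It illustrates the pattern only. `retry` and `retryThenFallback` are hypothetical helpers written for this example, and the actual signatures of `withRetry`, `withFallback`, and `resilientProvider` are not shown in this README.

```typescript
type AsyncFn<T> = () => Promise<T>;

// Call `fn` up to `attempts` times and rethrow the last error if all fail.
async function retry<T>(fn: AsyncFn<T>, attempts: number): Promise<T> {
  let lastError: unknown = new Error('retry called with attempts < 1');
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Try the primary with retries; if it still fails, use the fallback once.
async function retryThenFallback<T>(
  primary: AsyncFn<T>,
  fallback: AsyncFn<T>,
  attempts = 3,
): Promise<T> {
  try {
    return await retry(primary, attempts);
  } catch {
    return fallback();
  }
}
```

A production version would usually add backoff between attempts and retry only on errors known to be transient.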
359
+
360
+ ## What's next (clearly marked roadmap)
361
+
362
+ | Theme | Focus |
363
+ |---|---|
364
+ | **Reliability subsystem** | `CircuitBreaker` · 3-tier output fallback · auto-resume-on-error · Skills upgrades (`surfaceMode`, `refreshPolicy`) · `MockEnvironment` composer |
365
+ | **Causal training-data exports** | `causalMemory.exportForTraining({ format: 'sft' \| 'dpo' \| 'process' })` — production traffic becomes labeled SFT / DPO / process-RL trajectories |
366
+ | **Governance** | `Policy` · `BudgetTracker` · DynamoDB / Postgres / Pinecone memory adapters · production embedder factories |
367
+ | **Deep Agents · A2A protocol** | Planning-before-execution · agent-to-agent protocol · Lens UI deep-link |
325
368
 
326
- - **2 primitives** — `LLMCall`, `Agent` (ReAct loop)
327
- - **3 compositions + Loop** — Sequence · Parallel · Conditional · Loop
328
- - **6 LLM providers** — Anthropic · OpenAI · Bedrock · Ollama · Browser-Anthropic · Browser-OpenAI · Mock (for $0 testing)
329
- - **InjectionEngine** — one `Injection` primitive + 4 typed factories (`defineSkill` / `defineSteering` / `defineInstruction` / `defineFact`); covers Dynamic ReAct via `on-tool-return` triggers
330
- - **Memory subsystem** — `defineMemory` factory, 4 types (Episodic / Semantic / Narrative / **Causal** ⭐) × 7 strategies (Window / Budget / Summarize / TopK / Extract / Decay / Hybrid)
331
- - **Multi-agent through control flow** — no separate `MultiAgentSystem` class; agents compose via Sequence / Parallel / Conditional / Loop
332
- - **6 canonical patterns** runnable as examples — ReAct · Reflexion · ToT · Self-Consistency · Debate · Map-Reduce · Swarm
333
- - **Observability** — 47 typed events × 13 domains; recorders for context · stream · agent · cost · skill · permission · eval · memory
334
- - **Resilience helpers** — `withRetry`, `withFallback`, `resilientProvider`
335
- - **Pause / resume** — JSON-serializable checkpoints; agent can pause via `askHuman`/`pauseHere` and resume hours later on a different server
336
- - **AI-coding-tool support** — bundled instructions for Claude Code / Cursor / Windsurf / Cline / Kiro / Copilot
337
- - **33 runnable end-to-end examples** — every example is a real test exercising the documented surface
369
+ For shipped features per release see [CHANGELOG.md](./CHANGELOG.md). Roadmap items are *not* claims about the current API: if a feature isn't in `npm install agentfootprint` today, it's listed here, not in the documentation.
338
370
 
339
- ## What's next
371
+ ---
340
372
 
341
- | Release | Focus |
342
- |---|---|
343
- | ~~v2.1~~ ✓ | RAG flavor (`defineRAG`) — shipped in 2.1.0 |
344
- | v2.2 | MCP integration (`mcpClient`) ✓ · Redis memory store adapter · CircuitBreaker primitive · 3-tier structured-output fallback |
345
- | v2.2 | Governance subsystem (`Policy`, `BudgetTracker`, role-based access) · DynamoDB / Postgres / Pinecone store adapters |
346
- | v2.3 | Causal training-data exports — `causalMemory.exportForTraining({ format: 'sft' \| 'dpo' \| 'process' })` for HuggingFace / OpenAI / Anthropic batch fine-tune |
347
- | v2.4+ | Deep Agents (planning-before-execution) · A2A protocol · Lens UI deep-link |
373
+ ## Built on
348
374
 
349
- ---
375
+ [footprintjs](https://github.com/footprintjs/footPrint) — the flowchart pattern for backend code. The decision-evidence capture, narrative recording, and time-travel checkpointing this library uses are footprintjs primitives. The same way autograd's forward-pass traversal is what makes gradient inspection automatic, footprintjs's flowchart traversal is what makes agentfootprint's typed-event stream and replayable traces automatic. You don't need to learn footprintjs to use agentfootprint — but if you want to build your own primitives at this depth, [start there](https://footprintjs.github.io/footPrint/).
376
+
377
+ ## License
350
378
 
351
- [MIT](./LICENSE) &copy; [Sanjay Krishna Anbalagan](https://github.com/sanjay1909)
379
+ [MIT](./LICENSE) © [Sanjay Krishna Anbalagan](https://github.com/sanjay1909)